OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: (Fixed) Memory access bug
PostPosted: Sun Sep 29, 2019 1:58 pm 
Joined: Fri Aug 07, 2015 6:13 am
Posts: 1134
Hi,

I've encountered a bug and I'm out of ideas.
To sum it up:
  • My OS uses 32 bit PAE paging, 4 KB pages
  • Accessing (reading/writing) anything from 0x800000 to 0x81E000 causes a page fault
  • Error code 0x0B (Protection violation (page present), page written, reserved violation, kernel mode)
  • Only happens on real hardware (no problems with emulators)
  • Only happens when using -O0, with -O2 it disappears
  • My error label gets corrupted -> "Page Fault" turns into some random chars and symbols. (memory corruption perhaps?)
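
For readers following along, the error code 0x0B from the list above can be unpacked bit by bit. This is only a sketch: `PageFaultInfo` and `decode_page_fault` are made-up names, not part of the kernel discussed here; only the bit layout comes from the x86 architecture.

```cpp
#include <cstdint>

// Architectural bits of the error code the CPU pushes on a page fault (#PF).
struct PageFaultInfo {
    bool present;      // bit 0: 1 = protection violation, 0 = page not present
    bool write;        // bit 1: 1 = write access, 0 = read access
    bool user;         // bit 2: 1 = fault happened in user mode, 0 = supervisor mode
    bool reserved;     // bit 3: 1 = a reserved bit was set in a paging-structure entry
    bool instr_fetch;  // bit 4: 1 = fault was caused by an instruction fetch
};

// Hypothetical helper: splits the raw error code into its flags.
constexpr PageFaultInfo decode_page_fault(std::uint32_t error_code) {
    return PageFaultInfo{
        (error_code & 0x01u) != 0,
        (error_code & 0x02u) != 0,
        (error_code & 0x04u) != 0,
        (error_code & 0x08u) != 0,
        (error_code & 0x10u) != 0,
    };
}
```

For 0x0B (binary 01011) this gives present, write and reserved set, with user and instruction fetch clear, which matches the description in the list.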

Here is an example that causes a crash. (not this exact code, this is a simplified version, but it boils down to this):
Code:
   uint32_t* test = (uint32_t*) 0x81E000;
   test[0] = 0xCAFEBEEF;
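
To see which paging entries are involved, a faulting address can be split into its PAE indices. The sketch below assumes the addresses are virtual addresses mapped with 4 KB pages, as stated above; `PaeIndices` and `split_pae` are hypothetical names.

```cpp
#include <cstdint>

// Index of each paging structure used to translate a 32-bit linear address
// under PAE paging with 4 KB pages.
struct PaeIndices {
    std::uint32_t pdpt;    // bits 31:30, selects one of 4 PDPT entries
    std::uint32_t pd;      // bits 29:21, selects one of 512 page-directory entries
    std::uint32_t pt;      // bits 20:12, selects one of 512 page-table entries
    std::uint32_t offset;  // bits 11:0, byte offset inside the page
};

// Hypothetical helper: decomposes a linear address into the indices above.
constexpr PaeIndices split_pae(std::uint32_t vaddr) {
    return PaeIndices{
        vaddr >> 30,
        (vaddr >> 21) & 0x1FFu,
        (vaddr >> 12) & 0x1FFu,
        vaddr & 0xFFFu,
    };
}
```

Under those assumptions, both 0x800000 and 0x81E000 go through PDPT entry 0 and page-directory entry 4, and they use page-table entries 0 and 30 respectively, so the whole faulting range sits in the first 31 entries of a single page table.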


Any random guesses, have you ever encountered anything similar?

_________________
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader


Last edited by Octacone on Mon Oct 14, 2019 8:16 am, edited 1 time in total.

 Post subject: Re: Memory access bug
PostPosted: Mon Sep 30, 2019 9:23 am 
Joined: Wed Aug 30, 2017 8:24 am
Posts: 1605
Reserved violation? That means one of the relevant page translation entries has a reserved bit set.

It appears that something is corrupting your paging structures. This also explains why your constants aren't working anymore: The page translation entries for those might have been corrupted. But if the code continues to work then the paging entries for the code are still working. So there's a hint. Where is the kernel code and where are the kernel data? And are you putting your strings into a write-protected section or not?

Do you use write protection in kernel mode? If not, it might be worth it to start.
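
As a rough illustration of what "reserved bit set" means for a 64-bit PAE entry, here is a sketch of a checker one could run over the page directory and page tables. It follows Intel's documented layout for PAE entries that map 4 KB pages; the function names are hypothetical, and `max_phys_bits` (MAXPHYADDR, reported by CPUID leaf 0x80000008) and `nxe` (EFER.NXE) must be supplied by the caller.

```cpp
#include <cstdint>

// Reserved bits of a PAE PDE (pointing to a page table) or PTE (4 KB page):
// bits 62:MAXPHYADDR always, plus bit 63 (XD/NX) while EFER.NXE is 0.
inline std::uint64_t pae_reserved_mask(unsigned max_phys_bits, bool nxe) {
    std::uint64_t mask = ((1ULL << 63) - 1) & ~((1ULL << max_phys_bits) - 1);
    if (!nxe) {
        mask |= 1ULL << 63;
    }
    return mask;
}

// Hypothetical helper: true if a present entry would raise a reserved-bit fault.
// Entries with P = 0 are not checked by the CPU, so they are skipped here too.
inline bool pae_entry_has_reserved_bits(std::uint64_t entry,
                                        unsigned max_phys_bits, bool nxe) {
    const bool present = (entry & 1ULL) != 0;
    return present && (entry & pae_reserved_mask(max_phys_bits, nxe)) != 0;
}
```

Since every PAE entry is 64 bits wide, stray data in the upper 32 bits of an entry is enough to trip this check even when the lower half looks fine.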

_________________
Carpe diem!


 Post subject: Re: Memory access bug
PostPosted: Sat Oct 05, 2019 6:00 am 
Joined: Fri Aug 07, 2015 6:13 am
Posts: 1134
nullplan wrote:
Reserved violation? That means one of the relevant page translation entries has a reserved bit set.

It appears that something is corrupting your paging structures. This also explains why your constants aren't working anymore: The page translation entries for those might have been corrupted. But if the code continues to work then the paging entries for the code are still working. So there's a hint. Where is the kernel code and where are the kernel data? And are you putting your strings into a write-protected section or not?

Do you use write protection in kernel mode? If not, it might be worth it to start.


I know what a reserved violation means; I'm just unable to relate it to my code.
There just isn't a way for those reserved bits to get set. Also optimization shouldn't affect my paging code since it's all written in assembly and compiled by NASM, but it somehow does.
It does continue to work since the only thing wrong is that string at that given moment (crash screen).
My kernel is located at 1 MB physical, 3 GB virtual.
What do you mean by write-protected?

_________________
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader


 Post subject: Re: Memory access bug
PostPosted: Sat Oct 05, 2019 8:44 am 
Joined: Wed Aug 30, 2017 8:24 am
Posts: 1605
Octacone wrote:
My kernel is located at 1 MB physical, 3 GB virtual.
What do you mean by write-protected?


Are your constants located in a significantly different place than your code? That's what I'm asking here. In my case, .rodata is linked directly following .text, so there's only a few KB difference. If anything made me unable to access the former but not the latter, it would have to touch pretty much only one page table entry, but leave the other PTEs, PDEs, PDPEs and PML4Es untouched.

Write-protection is what you can do with the WP bit in CR0. Once set, even the kernel can no longer write into write-protected pages. I set it after applying alternatives, so I will know if anything tries to write into the kernel. Otherwise the write would just happen.
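
A minimal sketch of setting the bit, assuming GCC-style inline assembly on x86 and code running in ring 0. The helper names are hypothetical; the only architectural fact used is that WP is bit 16 of CR0.

```cpp
#include <cstdint>

// CR0.WP (bit 16): when set, supervisor-mode writes to read-only pages fault.
constexpr unsigned long CR0_WP = 1ul << 16;

// Pure helper so the bit arithmetic can be checked outside the kernel.
constexpr unsigned long cr0_with_wp(unsigned long cr0) {
    return cr0 | CR0_WP;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
// Ring 0 only: reading or writing CR0 from user mode raises a fault.
// Call this once the read-only mappings are final.
inline void enable_kernel_write_protect() {
    unsigned long cr0;
    asm volatile("mov %%cr0, %0" : "=r"(cr0));
    asm volatile("mov %0, %%cr0" : : "r"(cr0_with_wp(cr0)) : "memory");
}
#endif
```

With WP set, a stray kernel write into a page mapped read-only raises a page fault at the offending instruction instead of silently corrupting memory.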

_________________
Carpe diem!


 Post subject: Re: Memory access bug
PostPosted: Sun Oct 06, 2019 1:06 am 
Joined: Sat Mar 31, 2012 3:07 am
Posts: 4597
Location: Chichester, UK
Octacone wrote:
There just isn't a way for those reserved bits to get set. Also optimization shouldn't affect my paging code since it's all written in assembly and compiled by NASM, but it somehow does.

Well, of course, there is a way because it is happening. And the fact that your paging code is written in assembler is irrelevant if the code affecting the memory location is elsewhere.

As this only happens on real hardware and not emulators, the most likely cause is that somewhere you are assuming that uninitialized memory is set to 0. If the error happened in an emulator it would be easy to track - run under a debugger and set a watch on the offending memory location - but as it is you are just going to have to inspect your code to try to narrow down the likely cause.
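
One cheap way to rule out the "uninitialized memory is zero" assumption is to clear every paging structure explicitly before filling it in. The sketch below is an assumption about how such a table might be declared, not the poster's code; a freestanding kernel would also need its own `memset`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// A PAE page directory or page table holds 512 entries of 64 bits each.
constexpr std::size_t kPaeEntries = 512;

// Hypothetical table type, aligned to a 4 KB page boundary.
struct alignas(4096) PaeTable {
    std::uint64_t entries[kPaeEntries];
};

// Clears all 64 bits of every entry, so nothing left over from the firmware
// or the bootloader can survive in the upper halves of the entries.
inline void clear_table(PaeTable& table) {
    std::memset(table.entries, 0, sizeof table.entries);
}
```

Emulators usually hand out zeroed RAM while real machines often do not, so a table that was never cleared can behave differently on the two.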


 Post subject: Re: Memory access bug
PostPosted: Sun Oct 06, 2019 4:48 am 
Joined: Fri Aug 07, 2015 6:13 am
Posts: 1134
nullplan wrote:
Octacone wrote:
My kernel is located at 1 MB physical, 3 GB virtual.
What do you mean by write-protected?


Are your constants located in a significantly different place than your code? That's what I'm asking here. In my case, .rodata is linked directly following .text, so there's only a few KB difference. If anything made me unable to access the former but not the latter, it would have to touch pretty much only one page table entry, but leave the other PTEs, PDEs, PDPEs and PML4Es untouched.

Write-protection is what you can do with the WP bit in CR0. Once set, even the kernel can no longer write into write-protected pages. I set it after applying alternatives, so I will know if anything tries to write into the kernel. Otherwise the write would just happen.


Got you, my .rodata is located right after .text, that's all good.
I don't use the WP bit at all.

There is something odd, though. Take a look at my section headers for a moment. I noticed this behavior a long time ago; maybe it's related:
With -O0:
[Attachment: O0.png, section headers with -O0]

With -O2:
[Attachment: O2.png, section headers with -O2]

Notice with -O0 there are two additional items that shouldn't be there: [2] and [3].
Also there is something weird going on with my program headers as well:
[Attachment: phO0.png, program headers with -O0]

Take a look at those highlighted items; they shouldn't be there.
This is what it should look like (with -O2):
[Image: program headers with -O2]

_________________
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader


 Post subject: Re: Memory access bug
PostPosted: Sun Oct 06, 2019 4:53 am 
Joined: Fri Aug 07, 2015 6:13 am
Posts: 1134
iansjack wrote:
Octacone wrote:
There just isn't a way for those reserved bits to get set. Also optimization shouldn't affect my paging code since it's all written in assembly and compiled by NASM, but it somehow does.

Well, of course, there is a way because it is happening. And the fact that your paging code is written in assembler is irrelevant if the code affecting the memory location is elsewhere.

As this only happens on real hardware and not emulators, the most likely cause is that somewhere you are assuming that uninitialized memory is set to 0. If the error happened in an emulator it would be easy to track - run under a debugger and set a watch on the offending memory location - but as it is you are just going to have to inspect your code to try to narrow down the likely cause.


It is quite hard to debug things like these because you can't use a debugger or anything. All my paging structures are initialized to zero.
I'm just wondering why accessing a certain address would cause a fault with -O0 but not with -O2. So the offending code cannot be the part written in assembly; it must be the C++ part.

_________________
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader


 Post subject: Re: Memory access bug
PostPosted: Sun Oct 06, 2019 5:16 am 
Joined: Sat Mar 31, 2012 3:07 am
Posts: 4597
Location: Chichester, UK
I suspect that the faulty code is nothing to do with paging. It just happens to be overwriting the memory used by the page table. It's not uncommon for optimization to reveal a bug that wasn't apparent before. Memory allocation/deallocation routines are a good candidate; a bug here can cause any part of your code to break.


 Post subject: Re: Memory access bug
PostPosted: Sun Oct 06, 2019 10:14 pm 
Joined: Wed Aug 30, 2017 8:24 am
Posts: 1605
Octacone wrote:
Got you, my .rodata is located right after .text, that's all good.
I don't use the WP bit at all.

Notice with -O0 there are two additional items that shouldn't be there: [2] and [3].
Also there is something weird going on with my program headers as well:


Oof, let's take these one at a time. So with -O0, your constants are located at 0xc010fXXX, and with -O2 they are at 0xc010aXXX. That's five entries difference in the page table, so your memory corruption may just hit this small range of memory.

You are not using WP, but you might want to reconsider. ATM we don't know if the page tables get corrupted or your constants are written to. I suggest changing your linker script to put a page break between the read-only sections and the read-write sections (as easy as ". += 0x1000" at that point). This should cause your output file to have two LOAD segments, one RX and the other RW. Then map your RX segment with write protection and set the WP bit once you are done modifying any code you may need to and have installed the IDT. Of course, this makes the initial paging a bit more challenging. But not a lot.

The additional .text sections are auto-generated by the C++ compiler when it instantiates a template. Perhaps that becomes unnecessary with optimization because the code is unreachable. In any case, there is nothing weird going on with your segments. The .text sections are part of the RX segment.

It is also wrong to say that you cannot use a debugger in that case. You can still use the debugging facilities of the machine in question. In this case, once you have your final page mapping, set a watchpoint on the page tables (use the debug registers) and print out any debug exceptions that may occur.
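
A sketch of the debug-register arithmetic, for illustration. `dr7_write_watch` is a hypothetical helper that only computes the DR7 bits for one slot; loading DR0..DR3 and DR7 themselves needs ring-0 `mov` instructions that are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Computes the DR7 bits that arm debug-register slot `slot` (0..3) as a
// data-write watchpoint covering `len` bytes (1, 2 or 4).
inline std::uint32_t dr7_write_watch(unsigned slot, unsigned len) {
    assert(slot < 4);
    assert(len == 1 || len == 2 || len == 4);
    // LENn encoding: 00b = 1 byte, 01b = 2 bytes, 11b = 4 bytes.
    const std::uint32_t len_bits = (len == 1) ? 0u : (len == 2) ? 1u : 3u;
    // R/Wn encoding: 01b = break on data writes only.
    const std::uint32_t rw_bits = 1u;
    return (1u << (2 * slot + 1))           // Gn: global enable for this slot
         | (rw_bits << (16 + 4 * slot))     // R/Wn field
         | (len_bits << (18 + 4 * slot));   // LENn field
}
```

The watched linear address goes into the matching DR0..DR3 register and must be aligned to the watch length. A hit raises the debug exception (vector 1), and the low four bits of DR6 tell which slot fired. A PAE entry is 8 bytes, so two 4-byte watchpoints are needed to cover one entry completely.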

_________________
Carpe diem!


 Post subject: Re: Memory access bug
PostPosted: Wed Oct 09, 2019 2:15 pm 
Joined: Fri Aug 07, 2015 6:13 am
Posts: 1134
nullplan wrote:
Octacone wrote:
Got you, my .rodata is located right after .text, that's all good.
I don't use the WP bit at all.

Notice with -O0 there are two additional items that shouldn't be there: [2] and [3].
Also there is something weird going on with my program headers as well:


Oof, let's take these one at a time. So with -O0, your constants are located at 0xc010fXXX, and with -O2 they are at 0xc010aXXX. That's five entries difference in the page table, so your memory corruption may just hit this small range of memory.

You are not using WP, but you might want to reconsider. ATM we don't know if the page tables get corrupted or your constants are written to. I suggest changing your linker script to put a page break between the read-only sections and the read-write sections (as easy as ". += 0x1000" at that point). This should cause your output file to have two LOAD segments, one RX and the other RW. Then map your RX segment with write protection and set the WP bit once you are done modifying any code you may need to and have installed the IDT. Of course, this makes the initial paging a bit more challenging. But not a lot.

The additional .text sections are auto-generated by the C++ compiler when it instantiates a template. Perhaps that becomes unnecessary with optimization because the code is unreachable. In any case, there is nothing weird going on with your segments. The .text sections are part of the RX segment.

It is also wrong to say that you cannot use a debugger in that case. You can still use the debugging facilities of the machine in question. In this case, once you have your final page mapping, set a watchpoint on the page tables (use the debug registers) and print out any debug exceptions that may occur.


Interesting, why did the compiler choose those two .text sections and not some other random ones?
I did some debugging and found out something strange:
On QEMU my constants are located at 0xC010F2B1 (and contain real data) and on real hardware they're at 0x350046 (and contain garbage data)! How? That is impossible; they should be at least 3 GB higher.
There is definitely something fishy going on.
There is really not much space when it comes to debugging on real hardware. The only thing I can do is dump registers and memory (if mapped) and that's about it.
What do I gain by using the WP bit?

_________________
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader


 Post subject: Re: Memory access bug
PostPosted: Wed Oct 09, 2019 10:25 pm 
Joined: Wed Aug 30, 2017 8:24 am
Posts: 1605
Octacone wrote:
Interesting, why did the compiler choose those two .text sections and not some other random ones?
When a template is instantiated, the compiler will generate a new section with the name indicating what exactly it contains. You can push the name through c++filt to see the cleartext version of it. If multiple CPP files instantiate the same template, their object files will contain the same template text sections. They are marked to only be linked once, so the linker throws all but one instance of the code away.
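
A tiny example of the kind of code that produces such a section. The names `largest` and `pick` are made up for illustration. With GCC targeting ELF, compiling this at -O0 typically places the instantiation in its own section named `.text.` followed by the mangled function name, which shows up in the output of `readelf -S`; at -O2 the call is usually inlined and no separate section is emitted.

```cpp
// Hypothetical template: every distinct T used somewhere gets its own
// instantiation, emitted in each translation unit that needs it.
template <typename T>
T largest(T a, T b) {
    return (a < b) ? b : a;
}

// Using largest<int> here forces the compiler to instantiate it in this file.
int pick(int a, int b) {
    return largest<int>(a, b);
}
```

If a second source file also uses `largest<int>`, its object file carries the same section, and the linker keeps only one copy.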

This allows you to compile and link C++ like you'd do with C. And believe me, it is a blessing. At work I have to deal with a compiler that uses a prelinker. In that system, the compiler will not instantiate templates at all, but rather, the linker is run again and again. The prelinker identifies undefined references to template instances and recompiles certain source files after telling the compiler to instantiate a certain template in there. It takes ages to complete and is extremely fragile in case the source file is no longer available at the time of the final link. Which can happen with libraries, for example.

Octacone wrote:
On QEMU my constants are located at 0xC010F2B1 (and contain real data) and on real hardware they're at 0x350046 (and contain garbage data)! How? That is impossible; they should be at least 3 GB higher.
OK, is your linear mapping not long enough?

Octacone wrote:
There is really not much space when it comes to debugging on real hardware. The only thing I can do is dump registers and memory (if mapped) and that's about it.
That's about all you'll ever need. Once you've found the corrupt memory, you can use the debug registers to find out who wrote to it.

Octacone wrote:
What do I gain by using the WP bit?
If bad code writes into read-only sections, you get an exception immediately instead of memory corruption that crashes down the line somewhere.

_________________
Carpe diem!


 Post subject: Re: Memory access bug
PostPosted: Sat Oct 12, 2019 4:38 pm 
Joined: Fri Aug 07, 2015 6:13 am
Posts: 1134
nullplan wrote:
When a template is instantiated, the compiler will generate a new section with the name indicating what exactly it contains. You can push the name through c++filt to see the cleartext version of it. If multiple CPP files instantiate the same template, their object files will contain the same template text sections. They are marked to only be linked once, so the linker throws all but one instance of the code away.

This allows you to compile and link C++ like you'd do with C. And believe me, it is a blessing. At work I have to deal with a compiler that uses a prelinker. In that system, the compiler will not instantiate templates at all, but rather, the linker is run again and again. The prelinker identifies undefined references to template instances and recompiles certain source files after telling the compiler to instantiate a certain template in there. It takes ages to complete and is extremely fragile in case the source file is no longer available at the time of the final link. Which can happen with libraries, for example.

That is interesting, I didn't know this. I will leave it as is.

nullplan wrote:
OK, is your linear mapping not long enough?

It is long enough, my kernel is very small and only the first 12 MB or so are mapped.

I'll have to rewrite a large chunk of my paging code in order to use the WP bit. That'll take a while.

Now this is interesting:
I tried mapping a few more MB, going from 12 to 14-ish, and something changed.
My debugger suggests that my constants are now located at 0x0. I can't come up with a way for that to be possible.

_________________
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader


 Post subject: (Fixed) Memory access bug
PostPosted: Mon Oct 14, 2019 8:15 am 
Joined: Fri Aug 07, 2015 6:13 am
Posts: 1134
Fixed!
It had nothing to do with my code, it was my toolchain.
I wanted to check and see if there was something wrong with my USB and BINGO!
I noticed that initialized variables were not initialized and that I could replicate all the faults on my emulator, so that was it.
Looks like mtools suck and don't know how to overwrite files properly, so something unexplained happens and stuff gets moved around to random places.
Sorry for bothering, but this took me quite a while to figure out.
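
For anyone hitting something similar, a cheap early-boot sanity check can catch a badly written image before it turns into mysterious faults. This is only a sketch; the canary names and the magic value are hypothetical and would need to be called from your own startup code.

```cpp
#include <cstdint>

// Hypothetical canaries. The first one has a non-zero initializer, so its
// value must come from the image on disk (.data). The second one is
// zero-initialized, so it checks that the startup code really cleared .bss.
// `volatile` stops the compiler from folding the comparisons away.
volatile std::uint32_t g_data_canary = 0xCAFEBEEFu;
volatile std::uint32_t g_bss_canary;

// Returns true if initialized data arrived intact and .bss was zeroed.
inline bool image_loaded_intact() {
    return g_data_canary == 0xCAFEBEEFu && g_bss_canary == 0u;
}
```

Calling `image_loaded_intact()` first thing in the kernel and halting with a clear message if it fails would have pointed at the image on the USB stick rather than at the paging code.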

_________________
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader

