OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 8:37 am 
For the last week and a half I have been trying to implement a VMM, but every attempt ends in a reboot (probably a triple fault) as soon as I replace the page tables set up by the bootloader, i.e. load the address of my new PML4 into CR3.

I can't understand why a triple fault occurs after the page tables are loaded. I looked through the code, and as far as I can see all tables are zeroed first (so no old data interferes with addresses or flags), the R/W and present flags are set on the entries, and all of them are filled with entries starting from where the kernel is. What could the problem be?

Also, even if I manage to fix my VMM (hopefully someone will help me with that), it would probably still be in a pretty messy state. Are there any books that talk about designing a VMM? The ones I have read so far (Tanenbaum and several others; not cover to cover, but I looked through all of the memory management chapters) explain what paging is and then talk a lot about the TLB and processes, which is not what I currently want.

Also, is there any good VMM code that I could read? Ideally it would be for x86, and even better cross-platform. I would prefer something not too hard, as this is my first time writing a VMM.

Finally, I have several questions about the VMM:
- Should I make a separate VMM for 32-bit and 64-bit mode?
- What should I take into consideration to make it portable?


Last edited by rpio on Sun Jan 23, 2022 4:30 am, edited 1 time in total.

 
 Post subject: Re: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 10:16 am 
ngx wrote:
For the last week and I half, I was trying to implement a vmm, but for some reason it fails every time with a reboot(triple fault probably) when I reload the tables set up by the bootloader(load the address of my new PML4 into CR3).
Run qemu with "-d int"; its log will tell you which exception occurred.

ngx wrote:
I can't understand why does there occur a triple fault after page tables are loaded, I looked through the code and as I see all pages are zeroed(so no old data interferes with addresses or flags), R/W and present flags are set on them and all of them are filed with entries starting from where the kernel is, what could the problem be?
Use bochs. It has a very capable debugger: once you see the faulting address in CR2, you can use the "page" command to list the translation, walking through the page tables. I know of no other debugger that dumps the tables like that, and it is very useful for figuring out what's wrong.
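For readers without a working bochs, the kind of translation walk that the "page" command prints can be approximated by hand. The following is only a minimal sketch in C, assuming 4-level x86-64 paging with 4 KiB pages (plus 2 MiB and 1 GiB large pages). The names walk_fail_level, demo_read and demo_fail_level, the toy "physical memory" array, and the example higher-half address are all hypothetical and exist only for the illustration. It reports the level at which a lookup stops on a non-present entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT 0x001ULL               /* bit 0: entry is valid           */
#define PTE_WRITE   0x002ULL               /* bit 1: writes allowed           */
#define PTE_LARGE   0x080ULL               /* bit 7: 2 MiB / 1 GiB page       */
#define PTE_ADDR    0x000FFFFFFFFFF000ULL  /* bits 51:12: next table or frame */

/* Hypothetical accessor: returns the 512-entry table stored at a physical address. */
typedef const uint64_t *(*table_reader)(uint64_t phys);

/* Walks from the PML4 (level 4) down to the PT (level 1).
 * Returns 0 and stores the physical address on success, otherwise the
 * level whose entry was not present. */
static inline int walk_fail_level(table_reader read, uint64_t cr3,
                                  uint64_t vaddr, uint64_t *phys_out)
{
    uint64_t table = cr3 & PTE_ADDR;
    assert(phys_out != NULL);
    for (int level = 4; level >= 1; level--) {
        unsigned shift = 12 + 9 * (unsigned)(level - 1);
        uint64_t entry = read(table)[(vaddr >> shift) & 0x1FF];
        if (!(entry & PTE_PRESENT))
            return level;
        if (level == 1 || (level <= 3 && (entry & PTE_LARGE))) {
            uint64_t offset_mask = (1ULL << shift) - 1;
            *phys_out = (entry & PTE_ADDR & ~offset_mask) | (vaddr & offset_mask);
            return 0;
        }
        table = entry & PTE_ADDR;
    }
    return 0; /* not reached */
}

/* Toy "physical memory": four tables, table n lives at physical n * 4096. */
static uint64_t demo_mem[4][512];

static const uint64_t *demo_read(uint64_t phys)
{
    return demo_mem[phys >> 12];
}

/* Maps only the page at 0xffffffff80100000 to frame 0x100000, then walks vaddr. */
static inline int demo_fail_level(uint64_t vaddr)
{
    uint64_t phys = 0;
    memset(demo_mem, 0, sizeof demo_mem);
    demo_mem[0][511] = 0x1000 | PTE_PRESENT | PTE_WRITE;   /* PML4[511] -> PDPT  */
    demo_mem[1][510] = 0x2000 | PTE_PRESENT | PTE_WRITE;   /* PDPT[510] -> PD    */
    demo_mem[2][0]   = 0x3000 | PTE_PRESENT | PTE_WRITE;   /* PD[0]     -> PT    */
    demo_mem[3][256] = 0x100000 | PTE_PRESENT | PTE_WRITE; /* PT[256]   -> frame */
    return walk_fail_level(demo_read, 0x0, vaddr, &phys);
}
```

In this toy setup only one higher-half page is mapped, so a lookup of any low address stops right at the PML4 (level 4), while the neighbouring higher-half page stops at the PT (level 1).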

ngx wrote:
Also even if I manage to fix my VMM(hopefully someone will help me with that), that would probably still leave it in a pretty messy state. So are there any books that talk about designing a VMM, as from what I read(not everything, but I looked through all of the MM chapter) now(Tanenbaum and several others), they talk about what paging is, then talk a lot about TLB and processes which is not what I currently want?
Well, yes, there are books, but the thing is, every OS likes to do things differently. There's no golden rule or algorithm you can use. Memory management is a risky business where you must make compromises, and it's up to you which you prefer: a smaller footprint with slower execution, or a larger footprint with faster execution.

ngx wrote:
Also is there any good vmm code(it would be good if it was for x86, and even better if it is cross-platform) that I could read(I would like some not really hard VMM as it is my first time making something like VMM)?
Probably try Minix3 or xv6, but as I've said, these are pretty OS-specific things. My OS/Z uses a totally different approach from those: it separates address spaces from processes into a new layer. I call the architecture-specific vmm_new() from the architecture-independent task_new() function, which in turn is called during process creation. This isn't necessarily the best approach, just one of many.

ngx wrote:
- Should I make a separate VMM for 32-bit and 64-bit mode?
That depends. The paging tables differ between 32-bit and 64-bit mode, but you can use some ifdefs and share most of the code. Or you can have two entirely separate and optimized implementations. Up to you; no "best" solution exists.
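As a rough illustration of the ifdef approach, here is a minimal sketch in C. The build-time switch ARCH_X86_64 and the names vaddr_t, PT_LEVELS, PT_INDEX_BITS and pt_index are assumptions made up for this example, and the 32-bit branch assumes classic two-level paging without PAE. Only the constants differ per mode; the index calculation is shared.

```c
#include <assert.h>
#include <stdint.h>

/* ARCH_X86_64 is a hypothetical build-time switch (e.g. passed with -D). */
#ifdef ARCH_X86_64
typedef uint64_t vaddr_t;
#define PT_LEVELS     4   /* PML4, PDPT, PD, PT       */
#define PT_INDEX_BITS 9   /* 512 entries of 8 bytes   */
#else
typedef uint32_t vaddr_t;
#define PT_LEVELS     2   /* page directory, page table (no PAE) */
#define PT_INDEX_BITS 10  /* 1024 entries of 4 bytes             */
#endif
#define PAGE_SHIFT 12     /* 4 KiB pages in both modes */

/* Shared code: index into the table at `level` (1 = lowest level). */
static inline unsigned pt_index(vaddr_t vaddr, int level)
{
    assert(level >= 1 && level <= PT_LEVELS);
    unsigned shift = PAGE_SHIFT + PT_INDEX_BITS * (unsigned)(level - 1);
    return (unsigned)((vaddr >> shift) & ((1u << PT_INDEX_BITS) - 1));
}
```

Built without the switch, pt_index(0xC0100000u, 2) selects page directory entry 768 and pt_index(0xC0100000u, 1) selects page table entry 256.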

ngx wrote:
- What should I take into consideration to make it portable?
That's a difficult question. I'd suggest studying the architecture manuals for all platforms you want to support, and figuring out what they have in common and what is platform-specific.

Cheers,
bzt


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 10:31 am 
bzt wrote:
Run qemu with "-d int" that will tell you.

Does it output the state of all registers for every instruction? In that case, how should I find the relevant entry and debug what happened? Should I look through a million addresses to find the problem?

bzt wrote:
Use bochs. It has a very capable debugger, once you see the faulting address in CR2, you can do a "page" command to list the translation walking through the page tables. No other debugger can dump the tables like that, it is very useful to figure out what's wrong.

For some reason it does not work on my system. More precisely, it starts, but then it does not work.

bzt wrote:
Well, yes, there are books, but the thing is, all OS likes to do things differently. There's no golden rule or algorithm you can use. Memory management is a risky business where you must make compromises, and it's up to you what you prefer: smaller footprint and slower execution or larger footprint and faster execution.

Yes, I understand that. It is just a bit hard for me to think of ways to implement it when I haven't seen examples from others or some theory, and this is my first time writing an OS.

bzt wrote:
Probably try Minix3 or xv6, but as I've said these are pretty OS-specific things. My OS/Z uses a totally different approach to those by separating address spaces from processes into a new layer (and then I call that architecture specific vmm_new() in the architecture-independent task_new() function, which in turn is called during process creation. This isn't necessarily the best approach, just one of the many).

OK, thanks, I will look at it. Is there any other good code I could look at?

Also, am I reloading the page tables correctly? I just load the new address (of the PML4) into CR3.



 
 Post subject: Re: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 12:20 pm 
ngx wrote:
Does it output statuses of all registers for every instruction?
No, it will output one block per exception. Look for the first one. (You can use "-d in_asm" to output every instruction).

ngx wrote:
For some reason it would not start on my system, more correctly it would - but it would not work
My advice is to try to solve that issue. It's always better to test your OS in multiple VMs, plus bochs' debugger is really spectacular (since it's an emulator, you can debug things with it that no other VM is capable of). Here's a very (very) minimal configuration file you could try; it looks for a file named "disk-x86.img" as the disk image. If you're using Ubuntu, you'll have to install multiple packages: one with the front-end (called bochs-sdl or bochs-x), one with the BIOS images (bochsbios), and another with the VGA ROM images (vgabios). This config uses X11, so if you install the SDL version (bochs-sdl), replace "display_library: x" with "display_library: sdl" in it. Also make sure that the BIOS and ROM filenames match the ones you installed, and then it should work.

ngx wrote:
Also, am I reloading page tables correctly - I just load the new address(of PML4) into the CR3?
Yes, that should do the trick. It also flushes the entire TLB (all non-global entries), so whenever you only change a few mappings, try to use the INVLPG instruction instead.
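The two operations could be wrapped roughly as follows. This is a sketch only, assuming GCC or Clang inline assembly on x86-64, no use of PCIDs, and the PWT/PCD bits of CR3 left clear. The names cr3_value_is_plausible, load_cr3 and invlpg are made up for this example, and assert stands in for whatever check a freestanding kernel would use. The plausibility check is there because CR3 takes a physical, 4 KiB aligned address, so a higher-half virtual address can never be a valid value.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 51:12 of CR3 hold the physical address of the PML4. */
#define CR3_ADDR_MASK 0x000FFFFFFFFFF000ULL

/* A value with bits outside 51:12 set is either misaligned or not a
 * physical address at all (for example a higher-half virtual address). */
static inline int cr3_value_is_plausible(uint64_t pml4_phys)
{
    return (pml4_phys & ~CR3_ADDR_MASK) == 0;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Privileged: only usable in ring 0. Switches the whole address space and
 * drops all non-global TLB entries. */
static inline void load_cr3(uint64_t pml4_phys)
{
    assert(cr3_value_is_plausible(pml4_phys));
    __asm__ volatile("mov %0, %%cr3" : : "r"(pml4_phys) : "memory");
}

/* Privileged: invalidates the TLB entry for the one page containing vaddr. */
static inline void invlpg(const volatile void *vaddr)
{
    __asm__ volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
}
#endif
```

After changing a single page table entry, calling invlpg on the affected virtual address is enough; load_cr3 is only needed when switching to a different set of tables.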

Cheers,
bzt


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 1:59 pm 
bzt wrote:
My advice is try to solve that issue. It's always better if you test your OS in multiple VMs, plus bochs' debugger is really spectacular (since it's an emulator, you can debug things with it that no other VM capable of.) Here's a very (very) minimal configuration file you could try, it looks for the file named "disk-x86.img" as the disk image. If you're using Ubuntu, you'll have to install multiple packages, one with the front-end (called bochs-sdl or bochs-x), one with the bios images (bochsbios) and another with the vga rom images (vgabios). This config uses X11, so if you install the SDL version (bochs-sdl), then replace "display_library: x" with "display_library: sdl" in it. Also make sure that the bios and rom filenames matches the one you install, and then it should work..


Thanks for your help. I tried the qemu debugging method and found out that it is a page fault. The problem is that a page fault pushes the error code on top of the stack, so how do I obtain the error code from the stack right after the exception? There is no way I will be fast enough to pause qemu and dump the RAM.

And also:
A page fault puts an address into CR2. Is it the address of the instruction that caused the fault, or the address that the instruction tried to access? And if CR2 holds a physical address, how would I debug with it (it does not correspond to the virtual address, because the kernel is mapped into the higher half) and obtain the virtual address from it?

Do I need to map the page tables into virtual memory for them to work, or will the CPU work fine with physical addresses that are not mapped anywhere?


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 5:41 pm 
ngx wrote:
And also
Page fault puts the address into the CR2 - is it the address of instruction that caused fault or the address that the instruction tried to access the address, and if it is physical address in CR2 how would I debug(as it does not correspond to virtual because kernel is mapped into higher half) and obtain the virtual address from it?


CR2 holds the linear (virtual) address whose access caused the page fault. It might be the address of the instruction being fetched, or of data the instruction is referencing. You probably shouldn't care beyond whether the desired reference is valid. You'll get an error code pushed onto the #PF fault stack, which will indicate whether the page fault was:

- A result of user or kernel mode access.
- A result of a read or a write.
- A result of page not present or page protection violation (such as a write to a read-only page)

Your page fault handler then just looks up the address, determines whether the desired access is allowed based on the permissions of the address in question, and then either fixes up the page table to allow the access or delivers a signal to indicate that the access is not allowed (SIGSEGV), before returning to user mode.

If the access is not allowed, at this point UNIX would deliver SIGSEGV to the process's handler if it has one, or kill the process with a core dump if not.

If the access is allowed, we just return to the faulting code which will retry the operation and continue.
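The bits of the error code mentioned above can be decoded along these lines. This is a minimal sketch in C; the struct and function names are made up for the example, and only the five low bits defined for x86 page faults are decoded (newer CPUs define further bits that are ignored here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the low bits of the #PF error code. */
struct pf_info {
    bool protection;   /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;        /* bit 1: 1 = write access, 0 = read access              */
    bool user;         /* bit 2: 1 = fault happened in user mode                */
    bool reserved_bit; /* bit 3: 1 = reserved bit set in a paging structure     */
    bool instr_fetch;  /* bit 4: 1 = fault was an instruction fetch             */
};

static inline struct pf_info pf_decode(uint64_t error_code)
{
    struct pf_info info;
    info.protection   = (error_code & 0x01) != 0;
    info.write        = (error_code & 0x02) != 0;
    info.user         = (error_code & 0x04) != 0;
    info.reserved_bit = (error_code & 0x08) != 0;
    info.instr_fetch  = (error_code & 0x10) != 0;
    return info;
}
```

For example, an error code of 0x10 decodes to a kernel-mode instruction fetch from a page that is not present, and 0x7 to a user-mode write that violated page protection.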

ngx wrote:
Do I need to map page tables into virtual RAM in order for them to work or the CPU will work fine with physical addresses that are not mapped anywhere?


You can map the page tables into virtual memory using recursive page mapping. It makes inspecting and updating the page tables easier, as each table is then just a mapped array that we can access through a simple C pointer.

But that is a convenience for you and your code. The CPU works with physical addresses and doesn't require the page tables to be mapped into virtual memory. Of course, when updating any paging structures, your code needs that memory to be mapped somehow anyway, so just map it recursively.
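To make the recursive mapping idea concrete, here is a small sketch in C of how the virtual address of a page table entry could be computed. It assumes 4-level x86-64 paging and that PML4 slot 510 has been pointed back at the PML4 itself; the slot number and the names RECURSIVE_SLOT, canonical and pte_vaddr are choices made for this example, not anything the CPU prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical choice: PML4[510] holds the physical address of the PML4 itself. */
#define RECURSIVE_SLOT 510ULL

/* Sign-extends bit 47 so the result is a canonical 48-bit address. */
static inline uint64_t canonical(uint64_t addr)
{
    return (addr & (1ULL << 47)) ? (addr | 0xFFFF000000000000ULL)
                                 : (addr & 0x0000FFFFFFFFFFFFULL);
}

/* Virtual address at which the level-1 entry (PTE) for vaddr is visible.
 * Shifting right by 9 moves the four 9-bit indices of vaddr down one level
 * and turns the page offset into an 8-byte-aligned entry offset. */
static inline uint64_t pte_vaddr(uint64_t vaddr)
{
    assert(RECURSIVE_SLOT < 512);
    uint64_t addr = (RECURSIVE_SLOT << 39) |
                    ((vaddr >> 9) & 0x0000007FFFFFFFF8ULL);
    return canonical(addr);
}
```

With slot 510 the page tables appear from 0xFFFFFF0000000000 upward. Dereferencing the result, e.g. through a uint64_t pointer, is only valid once the recursive entry is installed and the intermediate tables for vaddr exist.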

Page_Tables


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 9:15 pm 
ngx wrote:
Thanks for your help. I tried using the qemu debugging method and found out that it is a page fault, but the problem is that page fault puts the error on top of the stack - so how should I obtain the error code off the stack right after the exception(as there is no way I will be fast enough to pause qemu and dump the ram)?

It's included in the debug output.


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 11:29 pm 
Octocontrabass wrote:
It's included in the debug output.

There are a lot of entries; do you know which one exactly?


Last edited by rpio on Sun Jan 23, 2022 4:49 am, edited 1 time in total.

 
 Post subject: Re: Paging resources and wrong code
PostPosted: Sun Apr 11, 2021 11:52 pm 
ngx wrote:
There are a lot of entries, do you know which one exactly?

Code:
195: v=08 e=0000 i=0 cpl=0 IP=0008:ffffffff8010203a pc=ffffffff8010203a SP=0010:ffffffff801116a0 env-

v=08 is the vector number, so 8 means "double fault".
e=0000 is the error code, so 0 in this case.
IP tells you the address of the instruction that triggered this exception.

To debug the page fault, you want to look at the dump where "v=0e".

Code:
check_exception old: 0x8 new 0xe

This is saying that the previous exception was a double fault (0x8) and that the next one is a page fault (0xe).
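When the log is long, the entries can also be filtered mechanically. Below is a small sketch in C that pulls the vector and error code out of a line shaped like the one quoted above. The function names are invented for the example, and the parser assumes only the "v=.. e=...." layout visible in that line, not any documented log format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and fills *vector / *error if the line contains
 * "v=<hex> e=<hex>", otherwise returns 0. */
static inline int parse_int_line(const char *line, unsigned *vector, unsigned *error)
{
    assert(line != NULL && vector != NULL && error != NULL);
    const char *p = strstr(line, "v=");
    if (p == NULL)
        return 0;
    return sscanf(p, "v=%x e=%x", vector, error) == 2;
}

/* Convenience wrapper: the vector number, or -1 if the line has none. */
static inline int int_line_vector(const char *line)
{
    unsigned vector = 0, error = 0;
    if (!parse_int_line(line, &vector, &error))
        return -1;
    return (int)vector;
}

/* True for exception entries with vector 0x0e (page fault). */
static inline int is_page_fault_line(const char *line)
{
    return int_line_vector(line) == 0x0e;
}
```

Feeding each line of the log through is_page_fault_line and printing the first match would show the entry whose error code and CR2 are worth inspecting.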

_________________
https://github.com/kiznit/rainbow-os


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Mon Apr 12, 2021 12:02 am 
kzinti wrote:
ngx wrote:
There are a lot of entries, do you know which one exactly?

Code:
195: v=08 e=0000 i=0 cpl=0 IP=0008:ffffffff8010203a pc=ffffffff8010203a SP=0010:ffffffff801116a0 env-

v=08 is the vector number, so 8 means "double fault".
e=0000 is the error code, so 0 in this case.
IP tells you the address of the instruction that triggered this exception.

To debug the page fault, you want to look at the dump where "v=0e".

Code:
check_exception old: 0x8 new 0xe

This is saying that the previous exception was a double fault (0x8) and that the next one is a page fault (0xe).



Thanks, but now I have the vector
Code:
check_exception old: 0x8 new 0xe
, the error code
Code:
e=0000
and the address
Code:
CR2=00000000074470f8
The problem is that there is no way my kernel would even try to access anything near this address after enabling interrupts, as it is mapped in the last 2 GB of the higher half (0xffffffff80100000) and I haven't mapped anything below that address. What's more, immediately after reloading the page tables there is an endless loop, which won't even allow jumping back to the main kernel code from the VMM initialization.


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Mon Apr 12, 2021 12:13 am 
thewrongchristian wrote:
CR2 is the linear address that caused the page fault. It might be the instruction being executed, or it might be data the instruction is referencing. You probably shouldn't care beyond whether the desired reference is valid. You'll get an error code pushed onto the #PF fault stack, which will indicate whether the page fault was:

- A result of user or kernel mode access.
- A result of a read or a write.
- A result of page not present or page protection violation (such as a write to a read-only page)

Your page fault handler will then just lookup the address, determine if the desired access is allowed based on the permissions of the address in question, then fix up the page table to allow the access or deliver a signal to indicate the access is not allowed (SIGSEGV), then start returning to user mode.


The address in CR2 can't be linear. I am fairly sure it is physical, since in my case it is
Code:
CR2=00000000074470f8

while my kernel is in the higher half (0xffffffff80100000). Here is the code that runs right after reloading the page tables (so no faulting instructions are in the way):
Code:
while(1)
  ;


Last edited by rpio on Sun Jan 23, 2022 4:49 am, edited 2 times in total.

 
 Post subject: Re: Paging resources and wrong code
PostPosted: Mon Apr 12, 2021 12:42 am 
CR2 is only used with page faults. You didn't show us the dump for the page fault (vector 0xe); you showed us the dump for a double fault (vector 0x8).

ngx wrote:
the problem is that there is no possibility my kernel would even try to access anything near this address after enabling interrupts as it is mapped in the last 2GB of the higher half(0xffffffff80100000) and I haven't mapped anything below that address

Your kernel can access all of the virtual address space. Not having mapped anything in low memory doesn't mean that you don't have a bug where your code tries to access it. But again, you haven't shown us the proper dump for the page fault, so we still don't know which address the CPU was trying to access.

_________________
https://github.com/kiznit/rainbow-os


Last edited by kzinti on Mon Apr 12, 2021 12:45 am, edited 2 times in total.

 
 Post subject: Re: Paging resources and wrong code
PostPosted: Mon Apr 12, 2021 12:44 am 
kzinti wrote:
CR2 is only used with page faults. You are not looking at a page fault (vector 0xe), you are looking at a double fault (vector 0x8).

Then why does it say "check_exception old: 0x8 new 0xe" at the end?


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Mon Apr 12, 2021 12:47 am 
ngx wrote:
Then why is there written - check_exeption old: 0x8 new 0xe in the end?

This line is printed when a new exception happens. It is telling us that the last exception was a double fault (we know that, you copied the dump) and that the next one is a page fault (which you haven't shown us).

It's not printed at the end of an exception's dump; it's printed at the beginning of it.

_________________
https://github.com/kiznit/rainbow-os


 
 Post subject: Re: Paging resources and wrong code
PostPosted: Mon Apr 12, 2021 12:51 am 
kzinti wrote:
ngx wrote:
Then why is there written - check_exeption old: 0x8 new 0xe in the end?

This line is printed when a new exception happens. It is telling us that the last exception was a double fault (we know that, you copied the dump) and that the next one is a page fault (which you haven't showed us).

It's not showing this at the end of an exception, it's showing this at the beginning of it.


Sorry, I have just noticed that this entry was a 0x8 and the previous one was a 0xe (before, I thought that all of them were 0x8 and hadn't bothered looking). The one before is: https://pastebin.com/z2sHpyLv, which indicates an instruction fetch from a non-present page (error code 0010), while according to my checks the page is present and writable: https://pastebin.com/9sUDyKEw (line 139)

