OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Memory Management in Micro-Kernels (mmap and execve)
PostPosted: Sun May 08, 2022 4:59 pm 

Joined: Mon Aug 27, 2018 12:50 pm
Posts: 43
Hey everyone,

I've been thinking quite a bit about this lately, so I thought I'd ask how others have tackled the issue. Apologies if this fits better in General Ramblings, it's meant to be about design.

I am writing a new micro-kernel and I've just reached the section on memory management. Previously, I just had a very simple manager in the ukernel which just allocated page frames to a process's address space. I never supported mmap(), because I don't want the kernel to have to send calls to user-space directly. In my head only user programs should IPC to other user programs, and the kernel should just facilitate this. To load a file into memory (for an execve()), I was literally calling read() on the FS server. My method of executing a file was extremely bad, and involved the kernel replacing the current program with an "exec server" which would then contact FS, load the file and jump to the entry point.

I want to do something better. I want proper memory management, with different regions of memory - something higher level than raw pages. I want an mmap() that can load files, anonymous regions and memory shares, and I want an execve() that doesn't need a special loader program. I've come up with all the technical designs, but I don't know where to put it. I can't figure out quite how it works without something being in the kernel - which I don't really want.

If the MM (memory manager) is a server, then it can cleanly request files from FS in an mmap(), but won't be able to force another process to jump to the entry point in an execve().
If the MM is in the kernel, then it can't cleanly request from the FS (kernel shouldn't be IPCing to a program directly), but it can force the entry point jump.
If both the MM and FS are in the kernel, then it's not really a micro-kernel anymore.

The only solutions I can think of are:
  • have a special syscall just for the MM server, that lets it modify any PCB (to set the EIP)
  • have the MM server run in ring-0 but as its own process still, with its own address space.
  • map the kernel data structure into MM's address space as RW/User/Present
  • have both as servers, make execve() occur in my libc, and have it mmap() the new file in first, then jump itself to the entry point - only issue is if the libc gets (re)moved in memory as part of the mmap() (this is why it's normally in the kernel)
All of these seem to have their own issues with them though.

Does anyone have any better solutions? (I'm sure there are many)
Has anybody got cautionary tales from when they tried to do something similar?
It seems like the MM and FS are pretty closely intertwined - how have people separated them into different servers?

Thanks,
Barry


 Post subject: Re: Memory Management in Micro-Kernels (mmap and execve)
PostPosted: Mon May 09, 2022 9:45 am 

Joined: Tue Apr 03, 2018 2:44 am
Posts: 281

I've never understood the idea of removing paging entirely from a microkernel. Handling CPU page faults is inherently a kernel task, and fixing up mappings similarly so.

What isn't necessarily a micro-kernel task is loading the page that will be mapped. This is where the separation needs to come in.

So, in the (micro)kernel, you'll need a page fault handler, which maps the virtual address to some sort of region, and that region will need some sort of handler to load missing pages.

The microkernel page fault handler then becomes a matter of looking up the region, getting its handler, and asking that handler to provide the details of the page that will service the page fault in this region at the given offset.

Now, this handler would be some sort of memory segment driver, which in a monolithic kernel might ultimately resolve to a file system driver that will load the corresponding page from disk. Or it may be an anonymous page driver that provides fresh zero-filled pages for new mappings, or loads existing pages from swap for existing swapped data.

But the point is the (micro)kernel doesn't need to know these details. It just has a region, and a page offset, which it defers to something else to handle.

This is where your user level paging or file system kicks in. The page fault handler can just send a message to your user process to load the page desired however it should be handled. The process then just returns the possibly newly read page to the page fault handler, which maps it into the virtual address required, and returns from the page fault handler.
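
A minimal sketch of that flow in C, in case it helps. Everything here is an assumption for illustration: the names (`struct region`, `pager_fn`, `handle_page_fault`, `demo_pager`) are made up, the pager is a plain function pointer standing in for what would be an IPC round trip to a user process, and the "mapping" is only recorded in two fields rather than written to real page tables.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

struct region;

/* Hypothetical pager interface: whoever backs a region (FS server, swap,
 * anonymous memory) returns the physical frame for a given page index.
 * In a real microkernel this call would be a message to a user process. */
typedef uintptr_t (*pager_fn)(struct region *r, size_t page_index);

struct region {
    uintptr_t start;     /* first virtual address covered (page aligned) */
    size_t    length;    /* size in bytes, a multiple of PAGE_SIZE */
    pager_fn  get_page;  /* handler that provides missing pages */
};

struct address_space {
    struct region *regions;
    size_t         nregions;
    /* Stand-in for the page tables: remembers the last mapping made. */
    uintptr_t      last_vaddr;
    uintptr_t      last_frame;
};

/* Linear search is enough for a sketch; a real kernel would use a tree. */
static struct region *find_region(struct address_space *as, uintptr_t addr)
{
    for (size_t i = 0; i < as->nregions; i++) {
        struct region *r = &as->regions[i];
        if (addr >= r->start && addr - r->start < r->length)
            return r;
    }
    return NULL;
}

/* Returns 0 once the page is "mapped", -1 if no region covers the address
 * (where a real kernel would report the fault to the process instead). */
static int handle_page_fault(struct address_space *as, uintptr_t fault_addr)
{
    struct region *r = find_region(as, fault_addr);
    if (r == NULL)
        return -1;

    uintptr_t page_vaddr = fault_addr & ~(uintptr_t)(PAGE_SIZE - 1);
    size_t index = (page_vaddr - r->start) / PAGE_SIZE;

    /* The kernel does not know or care how the page is produced. */
    uintptr_t frame = r->get_page(r, index);

    as->last_vaddr = page_vaddr;  /* placeholder for a real map_page() */
    as->last_frame = frame;
    return 0;
}

/* Example pager: pretends this region's frames start at 0x100000. */
static uintptr_t demo_pager(struct region *r, size_t page_index)
{
    (void)r;
    return 0x100000u + page_index * PAGE_SIZE;
}
```

With a region at 0x400000 backed by `demo_pager`, a fault at 0x401234 would ask the pager for page index 1 and record a mapping of 0x401000 to frame 0x101000.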

So, for your microkernel abstraction, your mmap will create a microkernel region, that will be used as a handle to whatever provides the missing pages to be mapped. Your virtual address space manager can live in the microkernel, since the virtual address space and the set of regions mapped therein are sufficiently abstract already.

Your exec can then live in user space, creating the regions in the exec client address space as required (pointing them however you want to your file system code that does that actual page read/write), then once you've set up the regions, set the client process in motion by pointing it at the entry point.

The other thing I struggle with, microkernel-wise, is the separation of MM/FS. Minix provided a separate MM and FS, which I never understood as it would make page faults horribly inefficient.

I'm personally of the opinion that the mechanism of MM (such as page fault handling and mapping) lives entirely within the kernel. So long as the page desired is known to the kernel, there is no reason why we should transition to user space or filesystem code to resolve a page fault. That would also imply the MM handles or knows about the filesystem caching, caching by file identity/file offset, allowing the resolution of page faults to pages that are in the cache, without filesystem specific intervention code. So long as your management of working set pages is handled well in your kernel, you won't be reaching out to the filesystem code that often, and when you do, it's likely that you'll be doing I/O to resolve the fault anyway, so a trip to user space is not a big deal in the grand scheme of things.
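
As a rough illustration of caching by file identity/offset, here is a small C sketch. It is not code from my kernel; the names (`cached_page`, `resolve_file_page`, `demo_fs_read`), the fixed-size open-addressing table and the lack of eviction are all simplifying assumptions. The point is only that the filesystem callback runs on a miss and never on a hit.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64u
#define PAGE_SIZE   4096u

struct cached_page {
    int       in_use;
    uint64_t  file_id;     /* hypothetical stable file identity */
    uint64_t  page_index;  /* file offset divided by the page size */
    uintptr_t frame;       /* physical frame holding the data */
};

static struct cached_page cache[CACHE_SLOTS];

static size_t slot_for(uint64_t file_id, uint64_t page_index)
{
    /* Any reasonable hash would do; this just mixes the two keys. */
    uint64_t h = (file_id * 0x9E3779B97F4A7C15ull) ^ page_index;
    return (size_t)(h % CACHE_SLOTS);
}

/* Linear probing. Entries are never removed in this sketch, so an empty
 * slot ends the probe chain. Returns NULL on a miss. */
static struct cached_page *cache_lookup(uint64_t file_id, uint64_t page_index)
{
    size_t s = slot_for(file_id, page_index);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct cached_page *p = &cache[(s + i) % CACHE_SLOTS];
        if (!p->in_use)
            return NULL;
        if (p->file_id == file_id && p->page_index == page_index)
            return p;
    }
    return NULL;
}

/* Returns 0 on success, -1 if the table is full (eviction omitted). */
static int cache_insert(uint64_t file_id, uint64_t page_index, uintptr_t frame)
{
    size_t s = slot_for(file_id, page_index);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct cached_page *p = &cache[(s + i) % CACHE_SLOTS];
        if (!p->in_use) {
            p->in_use = 1;
            p->file_id = file_id;
            p->page_index = page_index;
            p->frame = frame;
            return 0;
        }
    }
    return -1;
}

/* Stand-in for the slow path: a request to the (possibly user space)
 * filesystem driver that does the actual I/O. */
typedef uintptr_t (*fs_read_fn)(uint64_t file_id, uint64_t page_index);

static uintptr_t resolve_file_page(uint64_t file_id, uint64_t page_index,
                                   fs_read_fn fs_read)
{
    struct cached_page *p = cache_lookup(file_id, page_index);
    if (p != NULL)
        return p->frame;              /* hit: no filesystem involvement */

    uintptr_t frame = fs_read(file_id, page_index);
    (void)cache_insert(file_id, page_index, frame);
    return frame;
}

/* Demo filesystem that counts how often it is asked for a page. */
static unsigned fs_calls;

static uintptr_t demo_fs_read(uint64_t file_id, uint64_t page_index)
{
    (void)file_id;
    fs_calls++;
    return 0x200000u + (uintptr_t)page_index * PAGE_SIZE;
}
```

Resolving the same (file, page) pair twice should bump `fs_calls` only once, which is the property the page fault path relies on.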

That's what I'm aiming at in my kernel. It's not a microkernel, it still presents as a monolithic kernel in that the system call interface is handled by the kernel proper.

But my MM/VFS is integrated to cache exclusively by file identity/offset, which the page fault handling can use without filesystem intervention, and my file system drivers have the option to be moved to user space, where they'll only be invoked for the relatively slow operations of doing actual I/O.


 Post subject: Re: Memory Management in Micro-Kernels (mmap and execve)
PostPosted: Mon May 09, 2022 10:51 am 

Joined: Mon Aug 27, 2018 12:50 pm
Posts: 43
Thanks - that's a nice little insight into how it works for your system. I'll probably end up taking quite a bit of inspiration from it.
It sounds like you opted for having the MM in the kernel and letting it spit out calls to servers directly. I'm starting to think this is the best approach. I've been debating putting MM and the VFS in the kernel completely, but having all the FS drivers still as their own servers - or maybe I'll just switch to a mono-kernel.
I'm just struggling to understand how a completely separate process can update a different process - only the kernel should be able to do this, right?

I know exactly what you mean about Minix though, it's been confusing me for the last few weeks straight. The Minix model for exec / page-fault handling seems really inefficient. It just seems like there will always be some things that have to occur outside of an isolated "process". Obviously they're still in a process, but they're running as the calling process, not their own thing.

Is having the micro-kernel send messages itself normal for micro-kernel designs? I always assumed only user-mode programs should.

Thanks again,
Barry


 Post subject: Re: Memory Management in Micro-Kernels (mmap and execve)
PostPosted: Mon May 09, 2022 11:16 am 

Joined: Sun Jun 23, 2019 5:36 pm
Posts: 590
Location: North Dakota, United States
I mean, if I can figure out paging and all that properly (the paging wiki article uses graphics all over the place and for some reason the Intel manuals just confuse me), I plan to let the kernel send messages to processes via IPC or something else. I don't see why it shouldn't be okay. There's no reason to strictly follow the microkernel or unikernel design; it's your OS and you should feel free to build it however you like. If you want to mix microkernel and monokernel together, go for it. If you want to follow a design strictly, go for it.


 Post subject: Re: Memory Management in Micro-Kernels (mmap and execve)
PostPosted: Mon May 09, 2022 12:13 pm 

Joined: Mon Aug 27, 2018 12:50 pm
Posts: 43
Ethin wrote:
I don't see why it shouldn't be okay. There's no reason to strictly follow the microkernel or unikernel design; it's your OS and you should feel free to build it however you like. If you want to mix microkernel and monokernel together, go for it. If you want to follow a design strictly, go for it.
This is a very good point, and I think I'll just put MM in the kernel and let it send messages itself. This seems to be what most other people are doing.


Ethin wrote:
I mean, if I can figure out paging and all that properly (the paging wiki article uses graphics all over the place and for some reason the Intel manuals just confuse me)
If you haven't taken a look at recursive paging before, it really simplifies it all, even though it probably takes a good bit of thinking to understand initially.

Quick rundown of paging (classic 32-bit x86 paging without PAE; pretty sure everyone has trouble understanding it at first, you're not alone at all):
CR3 should hold the physical address of a page (4096 bytes)
that page (the page directory) will hold 1024 entries, each holding the physical address of another page (plus flag bits in the low 12 bits)
each of those pages (the page tables) will hold 1024 entries, each holding the physical address of a page frame (again plus flag bits)
those page frames are accessible in memory byte-by-byte at a virtual address given by where they sit in the page directory / page tables overall, e.g.
table#0, page#0 is at 0x00000000
table#0, page#1 is at 0x00001000
table#0, page#1023 is at 0x003ff000
table#1, page#0 is at 0x00400000
each page is accessed by reading/writing at that virtual location, but the access actually goes to whatever physical address is stored in the corresponding page table entry.

Hopefully that helps a little, and best of luck with paging. Recursive paging is very similar, but you just point the last page directory entry at the page directory itself, such that the last 4MB of virtual memory is the contiguous, in-order array of page table entries (the physical page frame addresses), and the last 4KB of that is the page directory itself (the page table addresses). Look into it further if you want to.
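
The arithmetic behind the rundown above can be written down in a few lines of C. This is only a sketch for 32-bit paging without PAE; the helper names are made up, and the recursive slot is assumed to be the last page directory entry (index 1023) as described above.

```c
#include <stdint.h>

/* A 32-bit virtual address splits into a 10-bit page directory index,
 * a 10-bit page table index and a 12-bit offset within the page. */
static uint32_t pd_index(uint32_t vaddr)    { return vaddr >> 22; }
static uint32_t pt_index(uint32_t vaddr)    { return (vaddr >> 12) & 0x3FFu; }
static uint32_t page_offset(uint32_t vaddr) { return vaddr & 0xFFFu; }

/* Virtual address of the first byte of page `page` in table `table`. */
static uint32_t vaddr_of(uint32_t table, uint32_t page)
{
    return (table << 22) | (page << 12);
}

/* Assumption: the last page directory entry points at the directory. */
#define RECURSIVE_SLOT 1023u

/* Where the page table entry for `vaddr` shows up in virtual memory:
 * the last 4MB is one big array of 4-byte entries, one per page. */
static uint32_t pte_vaddr(uint32_t vaddr)
{
    return (RECURSIVE_SLOT << 22) + (vaddr >> 12) * 4u;
}

/* Where the page directory entry for `vaddr` shows up: the last 4KB
 * of virtual memory is the page directory itself. */
static uint32_t pde_vaddr(uint32_t vaddr)
{
    return 0xFFFFF000u + pd_index(vaddr) * 4u;
}
```

For example, `vaddr_of(1, 0)` gives 0x00400000, matching the list above, and with the recursive slot in place the entries describing that address sit at `pte_vaddr(0x00400000)` = 0xFFC01000 and `pde_vaddr(0x00400000)` = 0xFFFFF004.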


 Post subject: Re: Memory Management in Micro-Kernels (mmap and execve)
PostPosted: Mon May 09, 2022 3:56 pm 

Joined: Tue Apr 03, 2018 2:44 am
Posts: 281
Barry wrote:
Thanks - that's a nice little insight into how it works for your system. I'll probably end up taking quite a bit of inspiration from it.
It sounds like you opted for having the MM in the kernel and letting it spit out calls to servers directly. I'm starting to think this is the best approach. I've been debating putting MM and the VFS in the kernel completely, but having all the FS drivers still as their own servers - or maybe I'll just switch to a mono-kernel.
I'm just struggling to understand how a completely separate process can update a different process - only the kernel should be able to do this, right?


gdb can trace other processes for debugging purposes. Same thing here. It's not unreasonable for related processes to be able to manipulate each other's state, security permitting.

Microkernels just take that a step further. Setting up a child state could even be done using the same messages used to manipulate a process' own resources. For example, a mmap syscall message might be something like:

Code:
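#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* pid_t, off_t */

/* Message body for an mmap request. */
struct dommap {
  pid_t pid;      /* target process: your own pid, or a process being exec'd */
  void * addr;    /* requested virtual address */
  size_t length;  /* length of the mapping in bytes */
  int prot;       /* protection bits, as in mmap() */
  int flags;      /* mapping flags, as in mmap() */
  int fd;         /* file descriptor backing the mapping */
  off_t offset;   /* offset into the file */
};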


Now, you can use that same message format to set up your own mmap (using your own pid) or a new process being exec'd (using that process' pid).

Starting the exec'd process off as well could be a message sent to that process. The kernel might have a core of simple message primitives that it can interpret on the process's behalf, to manipulate state (such as mapping data, setting register state) that could be used not only for debugging, but for the entire exec state of a process.
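
To show how that might hang together, here is a hedged sketch in C of an exec server building the messages for a new process. Only `struct dommap` comes from the example above; `struct dostart`, `struct message`, `struct segment` and `build_exec_messages` are hypothetical names, the segment list stands in for whatever the executable format parser produces, and stack setup and actually sending the messages are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>   /* PROT_*, MAP_* constants */

struct dommap {
    pid_t  pid;
    void  *addr;
    size_t length;
    int    prot;
    int    flags;
    int    fd;
    off_t  offset;
};

/* Hypothetical companion message: set the initial register state of a
 * process and make it runnable. */
struct dostart {
    pid_t     pid;
    uintptr_t entry;
    uintptr_t stack_top;
};

enum msg_type { MSG_MMAP, MSG_START };

struct message {
    enum msg_type type;
    union {
        struct dommap  mmap;
        struct dostart start;
    } u;
};

/* One loadable segment, as an exec server might extract it from the
 * executable's headers (simplified). */
struct segment {
    uintptr_t vaddr;
    size_t    length;
    off_t     file_offset;
    int       prot;
};

/* Fills `out` with one MSG_MMAP per segment, all targeting `child`,
 * followed by a final MSG_START. Returns the number of messages
 * written, or 0 if `out` is too small. */
static size_t build_exec_messages(pid_t child, int fd,
                                  const struct segment *segs, size_t nsegs,
                                  uintptr_t entry, uintptr_t stack_top,
                                  struct message *out, size_t out_cap)
{
    if (out_cap < nsegs + 1)
        return 0;

    for (size_t i = 0; i < nsegs; i++) {
        out[i].type = MSG_MMAP;
        out[i].u.mmap = (struct dommap){
            .pid    = child,               /* not the sender's own pid */
            .addr   = (void *)segs[i].vaddr,
            .length = segs[i].length,
            .prot   = segs[i].prot,
            .flags  = MAP_PRIVATE | MAP_FIXED,
            .fd     = fd,
            .offset = segs[i].file_offset,
        };
    }

    out[nsegs].type = MSG_START;
    out[nsegs].u.start = (struct dostart){ child, entry, stack_top };
    return nsegs + 1;
}
```

The same `MSG_MMAP` message with the sender's own pid would be an ordinary mmap, so the exec path needs no extra kernel mechanism beyond the start message.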

Barry wrote:
I know exactly what you mean about Minix though, it's been confusing me for the last few weeks straight. The Minix model for exec / page-fault handling seems really inefficient. It just seems like there will always be some things that have to occur outside of an isolated "process". Obviously they're still in a process, but they're running as the calling process, not their own thing.


I might be coloured by my memory of Minix. Going from memory, when a process forks, a message is sent to the MM to do any per-process work, and also to the FS, for it to do per-process work. Now, I can understand that a process has some per-process state, but that's merely a thin veneer of open file descriptor information. Open file state itself is not per-process, at least in UNIX, and that information is file system agnostic.

The actual FS state for an open file, such as tracking allocation data and inodes etc. is very much process agnostic. Files don't live in the context of a process (again, in UNIX at least) so it seems strange that a FS server would need to know when a process has forked and to track such information.

The other thing to remember is that much microkernel research was done on architectures that made address space isolation much cheaper than contemporary x86 designs. I think AST could envision RISC surging ahead in units shipped in the 1990s, and the MMU design of architectures like MIPS and SPARC had address space ids built in from the start, so switching address spaces wasn't nearly as expensive as the CR3 update the i386 required (with its corresponding complete TLB flush).

Unfortunately, Microsoft DOS and later Windows put paid to that rosy future, and PC operating systems lagged probably 10-15 years as a result, as well as tying us to x86 compatibility in the long term.

Barry wrote:
Is having the micro-kernel send messages itself normal for micro-kernel designs? I always assumed only user-mode programs should.


I don't see why a kernel cannot send messages. They're just packets of data to convey intent or information, so it makes sense for the kernel to do that in the same way any other message sender would. Then you just need a single mechanism to receive and act on those messages.


 Post subject: Re: Memory Management in Micro-Kernels (mmap and execve)
PostPosted: Mon May 16, 2022 2:34 pm 

Joined: Mon Aug 27, 2018 12:50 pm
Posts: 43
Small update:
I've ended up putting the Memory Manager in the kernel, and the VFS is still a server. At a later point I may move MM into a server, or VFS into the kernel, depending on how my system changes. I'm opting to have some user-mode pagers (necessary when the VFS is in user-space), but anonymous memory is still handed out by the kernel, and it calls out to the disk driver server. It's a little bit clunky, but it works. I had an idea to either just have an actual swap server, use a swap file, or some kind of hybrid file system that stores the actual memory regions on disk (backing each region with its own file). The only thing I actually have to be careful about is that the kernel doesn't move the DISK/VFS servers into swap, since then it won't be able to call them to recover them.
To unify my interfaces a bit, and to make it easy to move stuff into / out of the kernel, I've decided that my MM calls (such as mmap) are going to be messages that you can send to the kernel, rather than actual interrupts. This just keeps my syscall list small, and saves me from rewriting a lot of lines if I decide later to put MM in userspace or VFS in the kernel.
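
Roughly what I mean, as a C sketch rather than my real code: clients address a service, and a small table says which endpoint currently implements it. The names (`service_endpoint`, `send_request`, `ENDPOINT_KERNEL`) and the endpoint numbers are made up for illustration, and the two handlers only count calls.

```c
#include <stddef.h>

enum service { SVC_MM, SVC_VFS, SVC_COUNT };

/* Hypothetical reserved endpoint id meaning "handled by the kernel". */
enum { ENDPOINT_KERNEL = 0 };

struct request {
    enum service svc;  /* which service the client wants */
    int          op;   /* operation code, e.g. an mmap request */
    long         arg;  /* single argument, to keep the sketch short */
};

/* Where each service currently lives. Moving MM out of the kernel (or
 * VFS into it) is a change to this table, not to the callers. */
static int service_endpoint[SVC_COUNT] = {
    [SVC_MM]  = ENDPOINT_KERNEL,
    [SVC_VFS] = 5,
};

/* Counters standing in for real work, so the routing can be observed. */
static int handled_in_kernel;
static int queued_for_server;
static int last_endpoint;

static void kernel_handle(const struct request *r)
{
    (void)r;
    handled_in_kernel++;
}

static void enqueue_for(int endpoint, const struct request *r)
{
    (void)r;
    last_endpoint = endpoint;
    queued_for_server++;
}

/* The single entry point a client uses for every service. */
static void send_request(const struct request *r)
{
    int ep = service_endpoint[r->svc];
    if (ep == ENDPOINT_KERNEL)
        kernel_handle(r);
    else
        enqueue_for(ep, r);
}
```

With the table as shown, an MM request is handled in the kernel; setting `service_endpoint[SVC_MM]` to a server's endpoint reroutes the same request without touching the client.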

thewrongchristian wrote:
Your exec can then live in user space, creating the regions in the exec client address space as required (pointing them however you want to your file system code that does that actual page read/write), then once you've set up the regions, set the client process in motion by pointing it at the entry point.
This is a good idea, and I'm actually tempted to make exec its own server, having re-read this. That'd be really nice because then I wouldn't have something bulky like ELF parsing in the kernel (in case you couldn't tell, I really hate putting stuff in the kernel). I guess I'd just need to do some privilege checking on the caller of mmap() to make sure a malicious program isn't changing another program's memory layout. I just don't like the idea of having a feature, i.e. the id parameter in mmap(), just for a single program to use.
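
One possible shape for that privilege check, as a C sketch. The policy itself is an assumption, not something settled in this thread: a process may always map into itself, and may map into another process only while that process has not started running and the caller either created it or is flagged as the exec server. All names (`struct proc`, `PROC_EMBRYO`, `may_map_into`) are hypothetical.

```c
#include <stdbool.h>
#include <sys/types.h>

enum proc_state { PROC_EMBRYO, PROC_RUNNING };

struct proc {
    pid_t           pid;
    pid_t           creator;         /* pid of the process that created it */
    enum proc_state state;           /* EMBRYO until its first instruction */
    bool            is_exec_server;  /* hypothetical privilege flag */
};

/* Decides whether `caller` may change the memory layout of `target`. */
static bool may_map_into(const struct proc *caller, const struct proc *target)
{
    /* Ordinary mmap(): a process changing its own address space. */
    if (caller->pid == target->pid)
        return true;

    /* Once a process is running, nobody else may remap it. */
    if (target->state != PROC_EMBRYO)
        return false;

    /* Before that, only its creator or the exec server may set it up. */
    return caller->is_exec_server || target->creator == caller->pid;
}
```

Under this policy the extra pid parameter is usable by any parent setting up a child, not just by one special program, which might make it feel less like a one-off feature.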

Thanks for the ideas,
Barry

