Message passing, allocation and how to copy

OSwhatever · **Joined:** Mon Jul 05, 2010 4:15 pm **Posts:** 595

I've reached the point where I'm going to implement a message passing system for my kernel.

The messages is of asynchronous type. Each thread has a message queue.

QNX seems to have solved it with that you copy contents of your local buffer to the receivers buffer. All buffers are allocated locally, for example dynamically or on the stack. This also give you a limit on the receiver side how large message it can receive.

I want to avoid this and I want to have a methods where you can allocate and delete the messages yourself in the code. This because if the message is big enough it could be transferred with just a page flip without any copy. The message size could be any size up to a certain limit, maybe a few MB.

Now, I'm a bit confused how to implement such thing. For kernel threads it is easy, just give the pointer to the other thread.

For user processes it becomes more complex. You allocate some message in user space (should it be in a separate virtual area or just or just somewhere on the heap?). Then it decides to send it, switch to kernel mode. For messages that are small that should be copied over result in a problem since the memory of the receiving process isn't available until you have switched context. So basically how do you copy from one user space to another? You could copy it in a kernel area first but that would require two memory copies until it has reached its destination which is something you would want to avoid.

Also message allocation is something that should be done quite fast and with as little fragmentation as possible. What do you usually use here? Buddy+stack based one, or slabs?

gravaera · **Posted:** Tue Jul 20, 2010 4:34 pm

That or you could look into inventing a way to walk foreign page tables. When you can walk process A's tables from process B, you'll suddenly see a whole world of design opportunities unfold before you.

--Good luck,
gravaera

gerryg400 · **Posted:** Tue Jul 20, 2010 7:02 pm

Quote:

So basically how do you copy from one user space to another?

There are many ways.
1) Use a segmented memory model so that more than one process can appear in memory at a time. Then memcpy.
2) Map part (or all) of the 2nd process into the 1st process' address space. Then memcpy.
3) Convert the 2nd process addresses to physical and map required pages. Then memcpy.
4) If messages are small, memcpy to kernel buffer, switch context then memcpy from kernel buffer to process.
5) Permanently map part (or all in x86_64) of physical mem into every address space. Convert one of the messages from virt to phys. Then memcpy page at a time.
6) Pass messages in a shared memory area, either globally shared, or shared between pairs of processes.
7) Have a kernel call that allocates a message buffer that can be transferred by the kernel from process to process.
etc.

For synchronous messages, in my 32bit x86 kernel I used 1, 4 and 3 (stopped using 3 when I bought more than 1G of RAM). In my 64bit x86 kernel I use 4 and 5.

For asynchronous messages, if the receive side isn't ready, I use fixed size allocations from the slaballocator and do a double copy.

OSwhatever · **Joined:** Mon Jul 05, 2010 4:15 pm **Posts:** 595

gerryg400 wrote:

Quote:

So basically how do you copy from one user space to another?

There are many ways.
1) Use a segmented memory model so that more than one process can appear in memory at a time. Then memcpy.
2) Map part (or all) of the 2nd process into the 1st process' address space. Then memcpy.
3) Convert the 2nd process addresses to physical and map required pages. Then memcpy.
4) If messages are small, memcpy to kernel buffer, switch context then memcpy from kernel buffer to process.
5) Permanently map part (or all in x86_64) of physical mem into every address space. Convert one of the messages from virt to phys. Then memcpy page at a time.
6) Pass messages in a shared memory area, either globally shared, or shared between pairs of processes.
7) Have a kernel call that allocates a message buffer that can be transferred by the kernel from process to process.
etc.

For synchronous messages, in my 32bit x86 kernel I used 1, 4 and 3 (stopped using 3 when I bought more than 1G of RAM). In my 64bit x86 kernel I use 4 and 5.

For asynchronous messages, if the receive side isn't ready, I use fixed size allocations from the slaballocator and do a double copy.

Actually, 6 looks interesting. You could temporary read-only map the pages in a transfer area in the receiving process virtual address space. Each page has one message and an offset points to the location of the message in the page. If there are several messages in the queue, it will map each page after each other. Now you can copy each message into the message heap of the receiving process. The question is when. If you have several messages in the queue it will take some time to copy all the messages and the execution is delayed. You could copy on demand but that is kind of complex and it delays when the message buffer is freed for the sending process.

gerryg400 · **Posted:** Tue Jul 20, 2010 8:11 pm

Perhaps the best way to decide is to make some use cases. Try to predict how you will use your message passing system. How big will messages be? Which/how many processes will be sources and sinks ? Synchronous or asynchronous ? Try to consider real world situations, for example how a database will be opened and read, a log file written, the fs-cache flushed to disk, an mpeg played or how a process will be loaded. It's not always obvious which way is best until you do that.

This will help you find the best solution for you.

NickJohnson · **Posted:** Tue Jul 20, 2010 8:17 pm

I use something like 2 or 6, and it works very well. When a process sends a message, the kernel copies the physical addresses of the pages containing the message (the message must be page aligned), unmaps them from the sender's address space, and gives them as a sort of capability to the receiving process, which can then map them at its leisure wherever it wants. There's no shared memory at any point, and therefore no locking or waiting on the sender's side, but nothing really has to be copied, so it's effectively as fast as shared memory.

gerryg400 · **Posted:** Tue Jul 20, 2010 9:50 pm

NickJohnson, do you do process loading across your message passing interface ? Or is your process loader in the kernel ?

I have found this to be the thorniest issue. My process loader is part of my memory manager, which is a process itself. The question of how to get exe images from the file system to the memory manager is my current obsession. In my previous O/S, the mm opened the exe and did a plain old Posix 'read'. I'm hoping this time for something a bit smarter that allows for demand-loading.

Similarly, how should one implement mmap'd files when one's file system and memory manager are separate ?

OSwhatever, I think this is something you should consider before committing to a scheme.

computafreak · **Joined:** Sun Dec 14, 2008 1:53 pm **Posts:** 76

I use 3 almost exclusively because it was the simplest way that I could think of, and when I was learning how to set up and manage virtual address spaces it was easy for me to get my head around. Recently, I've been toying with the idea of using COW, so that I don't use as much physical memory; obviously there'd be constraints on it, so that the amount of data being copied is minimal in every case.

Brendan · **Posted:** Wed Jul 21, 2010 5:59 am

Hi,

OSwhatever wrote:

4) If messages are small, memcpy to kernel buffer, switch context then memcpy from kernel buffer to process.

I use 4), but for medium sized messages (between about 1 KiB and 1024 KiB) I move page table entries (from sender's address space to message queue to receiver's address space) instead of copying the data; and for large message (more than about 1024 KiB) I move page directory entries.

To minimise hassle messages are moved and not copied. For speed, each thread has 2 message buffers at fixed virtual addresses (e.g. in a "thread local storage" area) and the virtual memory manager understands that these buffers are special; so that the message data can be sent/received without checking if a pointer to the message data is dodgy or points to something strange (e.g. a memory mapped device).

Apart from the implementation details; there's about 5 characteristics that should be considered when designing/describing messaging:

Blocking: Can the message sender be blocked as part of sending a message (and under which conditions)? Can a message receiver block when receiving a message (and under which conditions)? For me, sending a message never causes a thread to block. When receiving a message there's 2 kernel API functions - one that is guaranteed not to block ("check_for_message()") and one that will cause a thread to block if the message queue is empty ("get_message()"). Most asynchronous messaging systems are similar to this, and synchronous messaging systems have a lot more reasons for when a task might block.
Reliability Guarantees: For me, messages that were sent correctly aren't guaranteed to be received correctly (e.g. due to network failures), unless the receiver is the virtual file system.
Ordering: For me, messages from a specific thread are guaranteed to arrive in order (if they arrive at all); but messages from different threads aren't guaranteed to arrive in any specific order. For example, imagine if threadA sends a message to threadC and then sends another message to threadB; and when threadB receives its message it sends a message to threadC. In this case threadC may receive the message from threadB before it receives the (earlier) message from threadA.
Scope: For some systems the IPC only works when the sender and receiver are running on the same computer, or there's different guarantees/behaviour when the receiver isn't local. For me, the messaging is "cluster-wide" (e.g. the behaviour is the same regardless of whether the receiver is local or remote, and the sender doesn't need to care if the receiver is on the same computer or not).
Protection/permissions: Most OSs have some way of restricting who can talk to who (one example is capabilities). For my OS there's none of that - anyone can send messages to anyone, and the message receiver is responsible for deciding if it should ignore the message or not (if necessary). To allow the receiver to do these checks the OS always tells the receiver who sent the message and guarantees that the "senderID" can't be forged.

Cheers,

Brendan

NickJohnson · **Posted:** Wed Jul 21, 2010 1:40 pm

gerryg400 wrote:

NickJohnson, do you do process loading across your message passing interface ? Or is your process loader in the kernel ?

I have found this to be the thorniest issue. My process loader is part of my memory manager, which is a process itself. The question of how to get exe images from the file system to the memory manager is my current obsession. In my previous O/S, the mm opened the exe and did a plain old Posix 'read'. I'm hoping this time for something a bit smarter that allows for demand-loading.

I currently have the kernel load executables, but I'm planning to modify that in the future. It seems like as long as the memory manager can catch page faults, it could easily implement demand paging.

OSDev.org

Message passing, allocation and how to copy

Who is online