In my microkernel, I have given up on async messaging completely. Async messages have the same problems Mach had with its messaging. The approach I plan on using is like what @AndrewAPrice mentioned. When a message is sent, it works as follows:
1) User calls sendmsg(), passing a memory address that holds the message data, the message ID and other fields, plus the thread ID of the destination.
2) Kernel acquires a semaphore whose count tracks how many threads are waiting to send a message to the destination thread.
3) Kernel obtains the physical address of the message data, then alters the PTEs and PDEs of the destination thread to map that data at a virtual address. In my kernel, memory management lives outside of the kernel (save for the MMU code, a kernel-only physical memory allocator, and a kernel object pool), so the region 0x0 - 0x1000 is unmapped, 0x1000 - 0x2000 contains the TCB of the current thread, and 0x3000 - 0x300000 is a message area that holds message data. This does limit the size of a message, especially if a process has multiple messages at once, but it was the best solution I came up with.
4) The kernel then temporarily boosts the priority of the receiving thread so that it preempts the sender, and initiates a context switch to that thread.
5) Receiver thread handles the message.
6) Receiver thread calls ackmsg(), which wakes the sender that is waiting for the message to be processed.
7) Sender wakes up and goes about its business.
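To make the steps above easier to poke at, here is a rough single-threaded model of the bookkeeping in C. It is only a sketch, not my actual kernel code: the struct fields, the return conventions, the "larger number means higher priority" rule, and the 4 KiB page size are all assumptions made for illustration. The semaphore is reduced to a plain counter, no real page tables are touched, and it only models one in-flight message per receiver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE      0x1000u      /* assumption: 4 KiB pages */
#define MSG_AREA_BASE  0x3000u      /* message area start, from step 3 */
#define MSG_AREA_END   0x300000u    /* message area end, from step 3 */
#define MSG_AREA_PAGES ((MSG_AREA_END - MSG_AREA_BASE) / PAGE_SIZE) /* 765 */

enum thread_state { RUNNABLE, BLOCKED_ON_ACK };

/* Hypothetical per-thread bookkeeping; field names are illustrative. */
struct thread {
    int base_prio;
    int prio;                   /* assumption: larger value = higher priority */
    enum thread_state state;
    unsigned waiting_senders;   /* stand-in for the semaphore count (step 2) */
    uint32_t mapped_pages;      /* message-area pages currently in use */
    struct thread *sender;      /* thread blocked until we call ackmsg() */
};

/* Steps 1-4. Returns the receiver-side virtual address of the message,
 * or 0 if the message cannot be delivered in this simplified model. */
uint32_t sendmsg(struct thread *self, struct thread *dest, size_t len)
{
    if (len == 0 || len > (size_t)MSG_AREA_PAGES * PAGE_SIZE)
        return 0;               /* cannot fit in the message area at all */
    if (dest->sender != NULL)
        return 0;               /* model limit: one message in flight */

    size_t pages = (len + PAGE_SIZE - 1) / PAGE_SIZE;
    if (pages > MSG_AREA_PAGES - dest->mapped_pages)
        return 0;               /* message area is full */

    dest->waiting_senders++;    /* step 2: take the semaphore */

    /* Step 3: a real kernel would rewrite the PTEs/PDEs here. */
    uint32_t vaddr = MSG_AREA_BASE + dest->mapped_pages * PAGE_SIZE;
    dest->mapped_pages += (uint32_t)pages;

    /* Step 4: boost the receiver above the sender, then block the sender. */
    if (dest->prio <= self->prio)
        dest->prio = self->prio + 1;
    dest->sender = self;
    self->state = BLOCKED_ON_ACK;
    return vaddr;
}

/* Step 6: the receiver acknowledges, which undoes the boost and wakes
 * the sender (step 7). */
void ackmsg(struct thread *self)
{
    struct thread *sender = self->sender;
    assert(sender != NULL && sender->state == BLOCKED_ON_ACK);

    self->sender = NULL;
    self->mapped_pages = 0;     /* unmap the message pages */
    self->prio = self->base_prio;
    self->waiting_senders--;
    sender->state = RUNNABLE;
}
```

With 4 KiB pages, the message area from 0x3000 to 0x300000 comes out to 765 pages (just under 3 MiB) shared by every message mapped into a thread at once, which is the size limit I mentioned in step 3.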
Please point out any kinks you see in this system.