Hi,
darkinsanity wrote:
Another problem that bothers me is, when and how do I clean up threads? When a thread exits, the kernel uses the kernel-stack of the thread, so I can't immediately free it. But when do I do it, then? Should I let the scheduler check if there are threads that can be cleaned up, or do I add them to a list so the idle-thread does the cleanup (and what would happen if the CPU is kept busy)?
I normally have a FIFO queue of "thread data blocks" (which includes the thread's kernel stack) for threads that have been terminated but not completely cleaned up yet. This solves a few problems. For my messaging, messages are sent to a thread, and if the same thread ID is re-used too quickly then software might not know the old thread was terminated and accidentally send messages to the new thread. The FIFO queue prevents that by ensuring the same thread ID isn't re-used too quickly. The other thing it helps with is resource usage. When a thread is being spawned the kernel checks that FIFO queue of "thread data blocks", and if the oldest one is old enough it's recycled (which helps performance a little because it's already allocated and mapped into kernel space). Finally; when the kernel is running low on free physical pages it goes looking for pages it can free; which includes checking the FIFO queue of "thread data blocks" and freeing any entries that are old enough. There's a rough sketch of this queue below.
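Something along these lines is what I mean (a minimal C sketch only; every name here - retire_thread, try_recycle_thread_data, get_tick_count, MIN_RECYCLE_AGE_TICKS, etc - is invented for the example, and locking/per-CPU details are omitted):

[code]
#include <stdint.h>
#include <stddef.h>

#define MIN_RECYCLE_AGE_TICKS 1000       /* invented tuning value */

typedef struct thread_data {
    struct thread_data *next;            /* FIFO link (oldest first) */
    uint32_t            thread_id;
    uint64_t            time_terminated;
    void               *kernel_stack;    /* still allocated and mapped */
} thread_data_t;

static thread_data_t *terminated_head = NULL;    /* oldest entry */
static thread_data_t *terminated_tail = NULL;    /* newest entry */

extern uint64_t get_tick_count(void);            /* assumed kernel time source */
extern void free_thread_data(thread_data_t *t);  /* assumed: unmaps stack, frees block */

/* Called by the scheduler after it has switched away from a dying thread,
 * so the thread's kernel stack is no longer in use. */
void retire_thread(thread_data_t *t)
{
    t->next = NULL;
    t->time_terminated = get_tick_count();
    if (terminated_tail == NULL) {
        terminated_head = terminated_tail = t;
    } else {
        terminated_tail->next = t;
        terminated_tail = t;
    }
}

/* Called when spawning a thread: recycle the oldest entry if it's old enough,
 * which also guarantees a thread ID can't be re-used too quickly. */
thread_data_t *try_recycle_thread_data(void)
{
    thread_data_t *t = terminated_head;

    if (t != NULL && get_tick_count() - t->time_terminated >= MIN_RECYCLE_AGE_TICKS) {
        terminated_head = t->next;
        if (terminated_head == NULL) {
            terminated_tail = NULL;
        }
        return t;   /* kernel stack is already allocated and mapped */
    }
    return NULL;    /* caller allocates a fresh thread data block instead */
}

/* Called when the kernel is running low on free physical pages. */
void reclaim_old_thread_data(void)
{
    thread_data_t *t;

    while ((t = try_recycle_thread_data()) != NULL) {
        free_thread_data(t);
    }
}
[/code]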
darkinsanity wrote:
Or should I switch to a one-stack-per-CPU-approach?
Switching from one kernel stack per thread to one kernel stack per CPU involves changing semantics that a lot of the micro-kernel's existing code may rely on. You'll have to weigh up the advantages and disadvantages and decide if it's something you want to do.
To help you make that decision, the advantages are:
- Less memory usage
- Slightly faster thread spawning and termination
- Better cache locality for the kernel (e.g. fewer cache and TLB misses when the kernel accesses its stack after a thread switch)
And the disadvantages:
- No easy way to have "kernel threads" that do things like cleaning up kernel data and/or doing work before it's needed (pre-fetching, pre-calculating, whatever) when there's nothing more important to do.
- When the kernel is in the middle of doing something less important; there's no way to stop and switch to something more important (e.g. you can't abandon the creation of a new lower priority process when a high priority IRQ occurs). This means that to avoid potential latency problems you might end up splitting up the more expensive operations into multiple small pieces so that you've got a chance to switch to more important things in-between those smaller pieces. For example; rather than creating a new process in one go; you might create some sort of "process data block", check if there's something more important to switch to, create a new/empty virtual address space for the process, check if there's something more important to switch to, then create a new "thread data block" for the process' initial thread (there's a rough sketch of this just below the list).
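A rough sketch of that kind of "split it into pieces" process creation (plain C; all of the helper names - create_process_data_block, higher_priority_work_pending, and so on - are invented for the example):

[code]
#include <stdbool.h>
#include <stddef.h>

typedef struct process_data process_data_t;
typedef struct thread_data  thread_data_t;

/* Assumed helpers - each one is a relatively cheap piece of the whole operation. */
extern process_data_t *create_process_data_block(void);
extern void            create_virtual_address_space(process_data_t *p);
extern thread_data_t  *create_initial_thread(process_data_t *p, void *entry);

/* Assumed checks - e.g. "is there a pending IRQ or higher priority syscall?" */
extern bool higher_priority_work_pending(void);
extern void do_higher_priority_work(void);

process_data_t *create_process(void *entry_point)
{
    process_data_t *p;

    p = create_process_data_block();
    if (higher_priority_work_pending()) {
        do_higher_priority_work();          /* e.g. handle a high priority IRQ */
    }

    create_virtual_address_space(p);
    if (higher_priority_work_pending()) {
        do_higher_priority_work();
    }

    create_initial_thread(p, entry_point);
    return p;
}
[/code]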
Note: For both of the "disadvantages" above; I'd seriously consider splitting things up into "tasklettes" (little pieces of work the kernel needs to do - e.g. "handle system call from thread 0x1234", "create virtual address space for new process 0x4321", etc); and giving each tasklette a priority and having a (per CPU?) list of tasklettes for each priority. When anything happens (IRQ, kernel syscall, etc) you'd put a tasklette on the appropriate list. The kernel would do whichever tasklettes are higher priority than the current highest priority (user-space) thread before returning to the current highest priority (user-space) thread. It's like having a miniature "tasklette scheduler". Of course you'd want to make sure that creating and freeing tasklettes is very fast, and if it's faster to do something immediately instead of creating a tasklette you'd do it immediately. There's a rough sketch of this below too.
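Something like this (again just a sketch in C; the names queue_tasklette, run_pending_tasklettes, current_thread_priority, etc are all invented, only one CPU's lists are shown, and it assumes higher numbers mean higher priority):

[code]
#include <stddef.h>

#define MAX_TASKLETTE_PRIORITY 8

typedef struct tasklette {
    struct tasklette *next;
    void            (*func)(void *);   /* e.g. "handle system call from thread 0x1234" */
    void             *arg;
} tasklette_t;

/* One FIFO list per priority (per CPU in practice; one CPU shown for simplicity). */
static tasklette_t *tasklette_head[MAX_TASKLETTE_PRIORITY];
static tasklette_t *tasklette_tail[MAX_TASKLETTE_PRIORITY];

extern tasklette_t *alloc_tasklette(void);         /* assumed: very fast allocator */
extern void         free_tasklette(tasklette_t *t);
extern int          current_thread_priority(void); /* current highest priority user-space thread */

/* Queue a piece of kernel work at a given priority (called from IRQ handlers, syscalls, etc). */
void queue_tasklette(int priority, void (*func)(void *), void *arg)
{
    tasklette_t *t = alloc_tasklette();

    t->next = NULL;
    t->func = func;
    t->arg  = arg;
    if (tasklette_tail[priority] == NULL) {
        tasklette_head[priority] = tasklette_tail[priority] = t;
    } else {
        tasklette_tail[priority]->next = t;
        tasklette_tail[priority] = t;
    }
}

/* Before returning to user-space: run every tasklette that is higher priority
 * than the current highest priority (user-space) thread. */
void run_pending_tasklettes(void)
{
    for (int p = MAX_TASKLETTE_PRIORITY - 1; p > current_thread_priority(); p--) {
        while (tasklette_head[p] != NULL) {
            tasklette_t *t = tasklette_head[p];

            tasklette_head[p] = t->next;
            if (tasklette_head[p] == NULL) {
                tasklette_tail[p] = NULL;
            }
            t->func(t->arg);
            free_tasklette(t);
        }
    }
}
[/code]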
Cheers,

Brendan