Korona wrote:
There are several separate problems that you have to solve:
The first one is allowing nested IRQs. x86 allows multiple IRQ priorities (determined by the vector number) and potentially there can be one active IRQ per priority level. In order to allow this you would either have to (i) run IRQs on the usual kernel stack and not via a separate task gate (on i386) or IST stack (on x86_64) or (ii) have some scheme to handle IRQs that overwrite live stacks. Both of these hurdles can possibly be overcome by some engineering. (i) increases the required size of all kernel stacks (which is a problem if your kernel uses one kernel stack per thread) while (ii) is much more difficult to implement correctly. Note that usage of syscall on x86_64 basically demands (ii) because if you run IRQs on the default stack there is a race between stack switching (i.e. movq gs:some_offset, %rsp; swapgs; sysret) and IRQs. This could probably be solved by swapping (no pun intended) swapgs and sysret, at the cost of an additional clobbered register.
The second problem then becomes the implementation of a lockfree scheduler. While something like the O(1) scheduler can be implemented using lockfree linked lists, advanced schedulers require more complex data structures. I do not know of any competitive lockfree tree implementation that would enable something like a lockfree CFS. Note that lockfree data structures usually perform much worse than locking data structures unless there is heavy contention. Another problem might be serializing access to thread data structures: For example my scheduler locks threads during scheduling in order to make sure that it is not killed/ptrace()d/whatever during waiting -> running and running -> waiting state transitions. That would have to be solved in a lockfree manner, too.
The last problem is how to perform a context switch with interrupts enabled. This involves multiple problems: For example on x86_64 the kernel might have to do a swapgs during context switch. The IRQ handler would have to deal with situations where there the interrupt occurred after swapgs but before iret. This adds further complications (and thus runtime overhead) to each IRQ dispatch.
Because of these difficulties and overheads I doubt whether such a system would be even desirable.
Sorry, I wasn't clear enough what I meant with "lockfree scheduler". You can use lockfree primitives for a completely lockfree transition but "quick" locks and disable interrupt are also allowed. This as opposed to a scheduler where you disable the interrupts through out most the transition and interrupts a first allowed again "on the other side" in the new context. I'm currently investigating how you can avoid disabling the interrupts for that long in the scheduler as this is really "forever" in computer time.
Correct me if I'm wrong but I think it can be possible if you run in a new context every time including interrupts. In this case an irq stack isn't enough, you would need a new fresh context and stack for every irq handler entry. You can solve this by preallocating those thread for the interrupt handler but for a system with 256 interrupt priorities, do you preallocate 256 threads? Also allocating on demand wouldn't work that well in interrupt handlers as it would likely involve the memory subsystem and you will end up in deadlocks.