OSDev.org

The Place to Start for Operating System Developers
 Post subject: Thread-safe kernel stacks in Multi-CPU
PostPosted: Thu Aug 10, 2017 5:08 am 
Agalloch

Joined: Wed Aug 09, 2017 6:53 am
Posts: 2
Hi,

While rewriting my Scheduler for Multi-CPU, I've run into a race condition that I'm struggling to find an elegant solution to. The Kernel uses per-thread Kernel stacks and currently (single-CPU), when I switch directly to a new thread, I put the current thread on some kind of list - possibly a ready-to-run list - then I search the ready-to-run list for a thread to run. Because it's single-CPU and interrupts are disabled, there is no chance of anything trying to run the thread I just placed on the list. Later I do something like the following…

Code:
PUSHAD                           ; save the old thread's registers on its Kernel stack
... (other stuff)
MOV    [currentThreadData], ESP  ; record the old thread's stack pointer
MOV    ESP, [newThreadData]      ; switch to the new thread's Kernel stack
... (other stuff, CR3 etc)
POPAD                            ; restore the new thread's registers
RET                              ; resume the new thread


Which means that between placing the Thread on the list and moving to a new Thread - including all the code to find the next task to run - I will be using the Kernel stack of the thread I placed on the list. Now (multi-CPU), I put the current thread on some kind of list - possibly another CPU's ready-to-run list - then I search my own ready-to-run list. At this point, I am using the first Thread's Kernel stack while another CPU might remove that same thread from its own ready-to-run queue and begin using the same stack.
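
Roughly, in C-style pseudocode (the helper names below are just placeholders for illustration, not my actual code), the window looks like this:

Code:
struct cpu;
struct thread;                       /* holds the saved ESP etc. */

/* Placeholder helpers, purely for illustration. */
extern void           ready_queue_push(struct cpu *cpu, struct thread *t);
extern struct thread *pick_next_thread(struct cpu *cpu);
extern struct cpu    *this_cpu(void);
extern void           switch_stacks(struct thread *from, struct thread *to);  /* the asm above */

void yield_current(struct cpu *target, struct thread *cur)
{
    ready_queue_push(target, cur);            /* cur is now visible to other CPUs... */

    /* ...but this CPU keeps executing on cur's Kernel stack while it
       searches for a successor.  Another CPU may dequeue cur and resume
       it here, so two CPUs end up running on the same stack. */
    struct thread *next = pick_next_thread(this_cpu());

    switch_stacks(cur, next);                 /* PUSHAD / MOV ESP / POPAD / RET */
}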

I can think of three solutions, and was hoping for some input on each and whether there is something I have overlooked...

    1. Either a new thread state (scheduling) or a lock in the thread data (see the sketch after this list). This would mean a new CPU has to wait after removing the thread from the list until the first CPU has chosen the next task to run, which I don't like as it seems like an unnecessary stall.
    2. Guarantee that all code that can be used following a decision to perform a task switch avoids using the Stack. This would mean leaving Interrupts disabled from the moment I decide to yield the current task, through all code to put it on a ready-to-run, sleep or I/O queue, choosing a new Task and saving possibly large amounts of context (FPU/SSE etc).
    3. Switch to a per-CPU Kernel stack when yielding the current thread, until switching to the next. This would allow me to enable IRQs at points throughout scheduling - with task switching postponed. However, it also means I should probably just switch to per-CPU Kernel stacks instead of per-Thread Kernel stacks entirely, which I don't like because it limits the ability to have Kernel-mode threads, something I really want to keep.
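
For the first option, what I have in mind is a per-thread flag that the old CPU clears only once it has left the thread's Kernel stack, and that a new CPU must wait on before resuming the thread - a minimal sketch with C11 atomics (names made up):

Code:
#include <stdatomic.h>

struct thread {
    atomic_bool on_cpu;   /* true while some CPU is still on this thread's Kernel stack */
    /* saved ESP, state, ... */
};

/* Old CPU: clear the flag only once it has switched off this thread's stack. */
static void release_thread_stack(struct thread *t)
{
    atomic_store_explicit(&t->on_cpu, false, memory_order_release);
}

/* New CPU: after dequeuing the thread, wait until the old CPU lets go.
   This spin is exactly the stall I would like to avoid. */
static void acquire_thread_stack(struct thread *t)
{
    while (atomic_load_explicit(&t->on_cpu, memory_order_acquire))
        ;   /* spin */
}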

Thanks!


 Post subject: Re: Thread-safe kernel stacks in Multi-CPU
PostPosted: Thu Aug 10, 2017 5:54 am 
LtG
Member

Joined: Thu Aug 13, 2015 4:57 pm
Posts: 384
If the old core is still using the thread's resources (its kernel stack), then you _may not_ put it in any "free to use" list, including "ready to run", because it's not free - its resources are still in use.

One option that you mentioned is to switch to a per-core kernel stack that is used when there's no active userland thread. The alternative is to keep the old kernel stack locked and thus the thread itself locked.

Agalloch wrote:
Which means that between placing the Thread on the list and moving to a new Thread - including all the code to find the next task to run - I will be using the Kernel stack of the thread I placed on the list. Now (multi-CPU), I put the current thread on some kind of list - possibly another CPU's ready-to-run list - then I search my own ready-to-run list. At this point, I am using the first Thread's Kernel stack while another CPU might remove that same thread from its own ready-to-run queue and begin using the same stack.

Why are you moving the "running thread" from core A to B, just so A can then schedule some other thread? Maybe it's just because of your example, but it would probably be good to ensure that you never do something like that. Instead, core B should schedule that "some other thread".

Agalloch wrote:
2. Guarantee that all code that can be used following a decision to perform a task switch avoids using the Stack. This would mean leaving Interrupts disabled from the moment I decide to yield the current task, through all code to put it on a ready-to-run, sleep or I/O queue, choosing a new Task and saving possibly large amounts of context (FPU/SSE etc).

This is probably going to be a nightmare to maintain; an extra 4KiB (or less?) stack per core is a lot better.


 Post subject: Re: Thread-safe kernel stacks in Multi-CPU
PostPosted: Thu Aug 10, 2017 6:15 am 
Agalloch

Joined: Wed Aug 09, 2017 6:53 am
Posts: 2
LtG wrote:
If the old core is still using the thread's resources (its kernel stack), then you _may not_ put it in any "free to use" list, including "ready to run", because it's not free - its resources are still in use.

One option that you mentioned is to switch to a per-core kernel stack that is used when there's no active userland thread. The alternative is to keep the old kernel stack locked and thus the thread itself locked.


Yes, that is the problem. However, I'd rather stop using its resources than change when I put it onto a list. If I were to move the point at which I put it onto a queue then, for example, the current CPU wouldn't find it on its own ready-to-run queue and might select a less important task to run. I'd like to avoid this in an elegant (read: faster, fewer special cases and branch mispredictions) manner, and not add special checks for the case where the current task would be ready.

Your first option (my third) is the one I am currently using, but I'd like to know whether I'm missing some obvious advantages or disadvantages. And if I'm using per-core stacks anyway, is there a good reason not to switch to them entirely? I think that would make Kernel-bound threads difficult/impossible.

Your alternative (my first) means a coupling between CPUs that I think should be unnecessary. I try to keep all CPUs as separate as possible, to improve scalability. Obviously this problem would only ever affect two CPUs at a time though, so do you think it would have little impact on scalability?

LtG wrote:
Agalloch wrote:
Which means that between placing the Thread on the list and moving to a new Thread - including all the code to find the next task to run - I will be using the Kernel stack of the thread I placed on the list. Now (multi-CPU), I put the current thread on some kind of list - possibly another CPU's ready-to-run list - then I search my own ready-to-run list. At this point, I am using the first Thread's Kernel stack while another CPU might remove that same thread from its own ready-to-run queue and begin using the same stack.

Why are you moving the "running thread" from core A to B, just so A can then schedule some other thread? Maybe it's just because of your example, but it would probably be good to ensure that you never do something like that. Instead, core B should schedule that "some other thread".


I don't know; it's not really relevant to the question. Assume that CPU A is running faster for whatever reason (faster chip, thermal throttling, power states chosen based on temperature, etc.) and has been preempted by a higher-priority task that wants to run on the faster CPU, while CPU B has been considered a reasonable place to continue running the lower-priority task. Arbitrarily restricting this would reduce performance unnecessarily - the way in which I schedule wouldn't automatically check for this, so it'd be an unnecessary extra branch or more. I will think about how easy it is to add, though.

LtG wrote:
Agalloch wrote:
2. Guarantee that all code that can be used following a decision to perform a task switch avoids using the Stack. This would mean leaving Interrupts disabled from the moment I decide to yield the current task, through all code to put it on a ready-to-run, sleep or I/O queue, choosing a new Task and saving possibly large amounts of context (FPU/SSE etc).

This is probably going to be a nightmare to maintain, an extra 4KiB (or less?) stack per core is a lot better.


This is my least favourite option too; I just wanted to be thorough in case I missed something.

Thanks!


 Post subject: Re: Thread-safe kernel stacks in Multi-CPU
PostPosted: Thu Aug 10, 2017 7:20 am 
LtG
Member

Joined: Thu Aug 13, 2015 4:57 pm
Posts: 384
Agalloch wrote:
And if I'm using per-core stacks anyway, is there a good reason not to switch to them entirely? I think that would make Kernel-bound threads difficult/impossible.

I think this depends on the overall system design.

Suppose tA (thread A) makes a syscall and the syscall blocks in the kernel (maybe on I/O) - what happens? Is the entire _core_ now essentially blocked?

If you have a kernel thread for each user thread, then the kernel thread can block without issues and you just switch to another one.

In the end, I don't think that your current _reason_ makes much of a difference for any of the three options. However, your larger overall design would probably give better guidance for this decision.

If you have an absolutely tiny kernel, then I might even seriously consider not using a kernel stack at all and writing the entire kernel in assembly. But only if it's really tiny and not expected to change at all. If it's a monolithic kernel, then in practice you probably need a kernel stack/thread for each userland thread, which means you're back to your three options.

If you can easily switch to the new task first and only then put the old one in a "ready to run" queue, that might be the simplest solution. So keep the old thread "locked" until you've got yourself a new thread and a new kernel stack. If that's not an option for some reason, then I'd probably go with an additional per-core kernel stack that can be used in the interim.
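
Something along these lines (names are made up, and I'm assuming a stack-switch routine that hands back the thread you came from, as such routines commonly do):

Code:
struct thread;
struct cpu { struct thread *current; /* ... */ };

/* Placeholders for illustration. */
extern struct thread *pick_next_thread(struct cpu *cpu);
extern void           ready_queue_push(struct cpu *cpu, struct thread *t);
/* Switches kernel stacks; once running on next's stack it returns the
   thread that was switched away from. */
extern struct thread *switch_stacks(struct thread *from, struct thread *to);

void reschedule(struct cpu *cpu)
{
    struct thread *old  = cpu->current;
    struct thread *next = pick_next_thread(cpu);

    cpu->current = next;
    struct thread *prev = switch_stacks(old, next);   /* resumes on next's stack */

    /* Only now is prev's kernel stack idle, so only now is it published
       where other cores can pick it up. */
    ready_queue_push(cpu, prev);
}

That way the old thread never appears runnable while any core still needs its stack.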


 Post subject: Re: Thread-safe kernel stacks in Multi-CPU
PostPosted: Thu Aug 10, 2017 10:02 am 
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 999
I solve this problem using your third option. Specifically, there are per-CPU scheduler stacks and per-thread syscall stacks. The per-thread stacks are still required so that threads can block at arbitrary points in syscalls.

Thread switching behaves as follows (a rough C sketch follows the list):
  1. Scheduling starts on a per-thread syscall stack.
  2. The scheduler switches to a per-cpu scheduling stack.
  3. The current thread's state is saved to some per-thread structure.
  4. The current thread is marked as suspended. The thread state is protected by a per-thread IRQ-disabling lock. Among other things this is required to prevent the thread from concurrently being marked as killed. At this point other processors are free to schedule/kill/interrupt the current thread.
  5. The thread is added to a run queue.
  6. Either a new thread is selected or an idle procedure is entered.
  7. The new thread is marked as active. As above, this requires holding the new thread's state lock. At this point other processors are forbidden to schedule/kill/interrupt the new thread.
  8. IRQs are disabled.
  9. The new thread's state is restored.
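
A condensed C sketch of this sequence (the types and helpers are only placeholders, not my actual code; what matters is the ordering):

Code:
struct cpu;
struct irq_lock { int locked; };                       /* placeholder IRQ-disabling spinlock */

enum thread_state { THREAD_SUSPENDED, THREAD_ACTIVE };

struct thread {
    enum thread_state state;
    struct irq_lock   state_lock;
    /* saved registers, FPU/SSE area, ... */
};

/* Placeholders for the real primitives. */
extern struct thread *current_thread(struct cpu *cpu);
extern void           switch_to_scheduler_stack(struct cpu *cpu);
extern void           save_thread_state(struct thread *t);
extern void           restore_thread_state(struct thread *t);     /* does not return */
extern void           irq_lock_acquire(struct irq_lock *l);
extern void           irq_lock_release(struct irq_lock *l);
extern void           run_queue_push(struct cpu *cpu, struct thread *t);
extern struct thread *run_queue_pop(struct cpu *cpu);
extern struct thread *idle_thread(struct cpu *cpu);
extern void           irqs_disable(void);

void schedule(struct cpu *cpu)
{
    struct thread *cur = current_thread(cpu);          /* step 1: on cur's syscall stack */

    switch_to_scheduler_stack(cpu);                     /* step 2 */
    save_thread_state(cur);                             /* step 3 */

    irq_lock_acquire(&cur->state_lock);                 /* step 4: serialize against kill */
    cur->state = THREAD_SUSPENDED;
    irq_lock_release(&cur->state_lock);

    run_queue_push(cpu, cur);                           /* step 5: now fair game for other CPUs */

    struct thread *next = run_queue_pop(cpu);           /* step 6: pick a successor, or idle */
    if (!next)
        next = idle_thread(cpu);

    irq_lock_acquire(&next->state_lock);                /* step 7: others keep hands off next */
    next->state = THREAD_ACTIVE;
    irq_lock_release(&next->state_lock);

    irqs_disable();                                     /* step 8 */
    restore_thread_state(next);                         /* step 9: does not return */
}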

Your second option would require writing all of this scheduling code in assembly, which is not acceptable to me. The first option has an obvious scalability bottleneck.

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].

