The love of fibers

Programming, for all ages and all languages.
AndrewAPrice
Member
Posts: 2294
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

The love of fibers

Post by AndrewAPrice »

How are others going about threads, fibers, and parallelism in their OSes?

I spent time implementing fibers completely in user space. My interpretation of the System V calling convention is that the callee is only required to preserve seven registers and the stack, so the context switch is much lighter weight than switching threads. I also use an object pool to avoid dynamic allocation when I rapidly create and destroy fibers.
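Roughly, the bookkeeping looks like this (a simplified sketch assuming the x86-64 System V ABI; the names and layout are illustrative, not the exact code in Perception):

Code: Select all

#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical: the state a user-space fiber switch has to preserve under the
// System V AMD64 ABI -- the callee-saved registers, including the stack pointer.
struct FiberContext {
  uint64_t rbx, rbp, r12, r13, r14, r15, rsp;
};

struct Fiber {
  FiberContext context;
  std::unique_ptr<char[]> stack;  // Reused across the fiber's lifetimes.
};

// Hypothetical free-list pool so rapidly created/destroyed fibers don't touch
// the allocator every time.
class FiberPool {
 public:
  Fiber* Acquire() {
    if (!free_.empty()) {
      Fiber* fiber = free_.back();
      free_.pop_back();
      return fiber;  // Recycled: keeps its previously allocated stack.
    }
    Fiber* fiber = new Fiber();
    fiber->stack = std::make_unique<char[]>(kStackSize);
    return fiber;
  }

  void Release(Fiber* fiber) { free_.push_back(fiber); }

 private:
  static constexpr size_t kStackSize = 64 * 1024;
  std::vector<Fiber*> free_;
};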

Now I can do:

Code: Select all

#include <iostream>

int main() {
  Scheduler::Defer([]() {
    std::cout << "world" << std::endl;
  });

  std::cout << "Hello ";
  // While this fiber sleeps, here's our chance to run the deferred function above.
  Sleep(100);
  std::cout << "!" << std::endl;

  return 0;
}


The reason for implementing fibers is that any time user code blocks (waiting on an RPC or kernel message), we can keep the processor busy without the overhead of a thread switch (a system call, plus storing and loading all registers). Any incoming RPC or message is handled as a fiber.

There are two functions:

Code: Select all

Scheduler::FinishAnyPendingWork();

FinishAnyPendingWork() runs any queued fibers, polls the kernel for incoming messages, and returns when there's nothing more to do. The intention is that you could call it from a game loop.
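For example, a game loop might look something like this (the game-logic stubs are placeholders just for illustration; only Scheduler::FinishAnyPendingWork() is the real call):

Code: Select all

// Placeholder game-logic stubs, just for illustration.
void UpdateWorld() { /* advance the simulation */ }
void RenderFrame() { /* draw the frame */ }

int main() {
  bool running = true;
  while (running) {
    UpdateWorld();
    RenderFrame();
    // Between frames, drain queued fibers and poll the kernel for messages.
    // Returns as soon as there's nothing left to do.
    Scheduler::FinishAnyPendingWork();
  }
  return 0;
}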

Code: Select all

Scheduler::HandOverControl();

HandOverControl() never returns; it sleeps the kernel thread until there is something to do. The intention is to turn the program into an event loop (such as a UI application waiting for clicks, or a server waiting for RPCs).

Here's my PS/2 keyboard and mouse driver, which shows how I set up some handlers, then hand over control:

Code: Select all

#include <memory>

std::unique_ptr<PS2MouseDriver> mouse_driver;
std::unique_ptr<PS2KeyboardDriver> keyboard_driver;

int main() {
  // These are services that handle incoming RPCs.
  mouse_driver = std::make_unique<PS2MouseDriver>();
  keyboard_driver = std::make_unique<PS2KeyboardDriver>();
  InitializePS2Controller();

  // Calls InterruptHandler() when IRQ1 or IRQ12 occurs.
  RegisterInterruptHandler(1, InterruptHandler);
  RegisterInterruptHandler(12, InterruptHandler);

  // Switch over to being an event loop that sleeps unless we're dispatching interrupts and RPCs.
  HandOverControl();
  return 0;
}
My OS is Perception.
PeterX
Member
Posts: 590
Joined: Fri Nov 22, 2019 5:46 am

Re: The love of fibers

Post by PeterX »

I had to look up fibers. It is cooperative multitasking, right?

Not sure if I like the concept. Could coop multitasking (or more precisely the context switch) be created by a compiler? Or do you have to explicitly write the application code in blocks plus some context-switch/idle function/macro (like your HandOverControl() or Sleep())?

Greetings
Peter
AndrewAPrice
Member
Posts: 2294
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: The love of fibers

Post by AndrewAPrice »

PeterX wrote:I had to look up fibers. It is cooperative multitasking, right?

Not sure if I like the concept. Could coop multitasking (or more precisely the context switch) be created by a compiler? Or do you have to explicitly write the application code in blocks plus some context-switch/idle function/macro (like your HandOverControl() or Sleep())?

Greetings
Peter

It is cooperative multitasking. I also have threads, which are preemptive, and fibers complement threads.

You normally wouldn't manually call yield() in a fiber, because the point isn't to give the illusion of code running in parallel (CPU-intensive code should still live in threads), but to maximize throughput by keeping the processor busy whenever there's work to do.

The low-level framework you're using should be the one that worries about context switching, not the end programmer. So any operation that would normally cause the thread to wait (e.g. an RPC, waiting for IO, waiting on a future, etc.) would check that there are no other fibers (or "tasks", if you want to call them that) to run before putting the thread to sleep.
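As a rough sketch of what I mean (every name here is hypothetical, not Perception's real API), a blocking primitive could look something like:

Code: Select all

// Everything here is a hypothetical sketch: the point is only the shape of a
// blocking primitive that hides fiber switching from the caller.
using MessageId = int;

bool MessageHasArrived(MessageId id);        // Poll the kernel (stub).
void SleepThreadUntilMessage(MessageId id);  // Block the kernel thread (stub).

namespace Scheduler {
bool HasRunnableFiber();  // Any queued fibers ready to run? (stub)
void RunNextFiber();      // Cooperative switch; returns when we're resumed. (stub)
}  // namespace Scheduler

// A blocking wait that keeps the processor busy before sleeping.
void WaitForMessage(MessageId id) {
  while (!MessageHasArrived(id)) {
    if (Scheduler::HasRunnableFiber()) {
      Scheduler::RunNextFiber();
    } else {
      SleepThreadUntilMessage(id);
    }
  }
}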
My OS is Perception.
nullplan
Member
Posts: 1643
Joined: Wed Aug 30, 2017 8:24 am

Re: The love of fibers

Post by nullplan »

I'm not a big fan of fibers, because they are easy to mess up, and the whole point of an OS is to allow parallel execution without the possibility of mess-ups. But I can't prevent userspace from using them, since they are entirely transparent to the kernel, so I guess applications can do what they like.
AndrewAPrice wrote:The reason for implementing fibers is that anytime user code blocks (waiting on an RPC or kernel message), we can continue to keep the processor busy, without the overhead of thread switching (system calling, and storing + loading all registers).
That would have been an argument twenty or thirty years ago, but now? Now CPUs are so fast that this whole overhead barely takes any measurable time at all, and you have so many of them that a better way to keep the CPU busy is to spread the load to multiple cores. I rarely have more than one runnable process at any given time, and eight CPUs to run it on. And this isn't a particularly high-end system. So, in all, threads avoid all switching overhead by running on multiple cores simultaneously. Implement this and a dynamic tick system, and compute-intensive tasks can run uninterrupted for long periods of time.

And interactive tasks are blocking all the time, anyway. They don't need fibers because they have nothing to do.
Carpe diem!
moonchild
Member
Posts: 73
Joined: Wed Apr 01, 2020 4:59 pm
Freenode IRC: moon-child

Re: The love of fibers

Post by moonchild »

nullplan wrote:I'm not a big fan of fibers, because they are easy to mess up, and the whole point of an OS is to allow parallel execution without the possibility of mess-ups.


I would argue threads are much easier to mess up than fibers. Ever had an impossible-to-debug heisenrace?

nullplan wrote:
AndrewAPrice wrote:The reason for implementing fibers is that anytime user code blocks (waiting on an RPC or kernel message), we can continue to keep the processor busy, without the overhead of thread switching (system calling, and storing + loading all registers).
That would have been an argument twenty or thirty years ago, but now? Now CPUs are so fast that this whole overhead barely takes any measurable time at all, and you have so many of them that a better way to keep the CPU busy is to spread the load to multiple cores.


Demonstrably false. CPUs are much faster now than they used to be, yes, but the overhead of context switches compared with other commonly performed operations has only increased. See: the up to 25% performance degradation observed on I/O-heavy workloads after the Spectre/Meltdown mitigations went into the Linux kernel. See: nginx, which is massively more performant than Apache httpd largely because it uses an event loop instead of forking.
AndrewAPrice
Member
Posts: 2294
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: The love of fibers

Post by AndrewAPrice »

There are alternatives to fibers for blocking operations such as reading a file. Some alternatives I can think of are:
  • Single tasking. Your program does nothing else while it waits.
  • Threading. Every task that might block is made into a thread, and we use the kernel's mechanism for sleeping, yielding, waking up, etc.
  • Callbacks. Make every operation asynchronous.
  • Promises. I like promise chains.
  • Coroutines. C++20 supports them now, and you can co_await long operations (see the sketch after this list).
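As a generic illustration of that last bullet (this is plain C++20, nothing Perception-specific; the Task type is just the minimal boilerplate the language requires):

Code: Select all

#include <coroutine>
#include <iostream>

// Minimal coroutine return type -- just enough machinery for the sketch below.
struct Task {
  struct promise_type {
    Task get_return_object() { return {}; }
    std::suspend_never initial_suspend() { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
};

// Pretends to be a long operation. A real awaitable would suspend here and
// resume the coroutine once the I/O completes; suspend_never keeps this
// sketch self-contained.
Task ReadFileAndPrint() {
  co_await std::suspend_never{};
  std::cout << "file contents would be printed here" << std::endl;
}

int main() {
  ReadFileAndPrint();
  return 0;
}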

Callbacks are my least favorite of the above because they can lead to spaghetti code as you jump between handlers all over the place, and I like code that reads sequentially.

Callbacks, promises, and coroutines all require everyone up the call stack that depends on the operation to know about it. For example, let's say you had a function GetBusinessDescription(business) that returns a description of a business ("A romantic restaurant"), you have many functions that call GetBusinessDescription(), and many functions that call those functions. Then one day you want to support other languages, and GetBusinessDescription() might need to query a datastore (on disk or over the network) to get the description in Bengali or Portuguese. With callbacks, promises, and coroutines, GetBusinessDescription() now has a different function signature, so any function that depends on GetBusinessDescription() has to change its signature too, and so forth. There might be some purists who believe all blocking operations should be explicit, but this throws much of the C/C++ standard library out the door.
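Here's a contrived, self-contained example of that ripple with callbacks (the business type and helper names are made up for illustration):

Code: Select all

#include <functional>
#include <iostream>
#include <string>

// Made-up types and names, just to show the signature ripple.
struct Business { std::string name; };

// Synchronous version: callers can simply call it.
std::string GetBusinessDescription(const Business& business) {
  return "A romantic restaurant";
}

// Once it has to block (say, a datastore lookup for a translation) and we make
// it callback-based, the signature changes...
void GetBusinessDescriptionAsync(const Business& business,
                                 std::function<void(std::string)> on_done) {
  on_done("A romantic restaurant");  // Real code would defer this until I/O completes.
}

// ...and every caller has to grow a callback parameter too, all the way up.
void DescribeSearchResultAsync(const Business& business,
                               std::function<void(std::string)> on_done) {
  GetBusinessDescriptionAsync(business, [on_done](std::string description) {
    on_done(description + " (open now)");
  });
}

int main() {
  DescribeSearchResultAsync(Business{"Luigi's"}, [](std::string text) {
    std::cout << text << std::endl;
  });
  return 0;
}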

Fibers also never give up control until they block. I can't imagine a situation (outside of a game loop) where a fiber would call yield() to hand over control while it still has processing work to do.

Fibers and threads co-exist. Imagine a 3D rendering program in which you want to ray-trace an image (a CPU-heavy task that could take hours). You would kick off a thread to do the actual ray tracing in parallel, while your main thread continues to respond to UI events (and if the ray-tracing thread did disk operations to read textures off a disk, it could have fibers of its own).
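Something along these lines (RayTraceScene is a made-up stand-in for the heavy work; HandOverControl() is the real call from earlier):

Code: Select all

#include <thread>

// Made-up stand-in for hours of CPU-heavy work.
void RayTraceScene() { /* trace rays, write out the image */ }

int main() {
  // CPU-heavy work gets a real, preemptively scheduled thread...
  std::thread ray_tracer(RayTraceScene);

  // ...while the main thread becomes an event loop whose fibers service UI
  // events and RPCs as they arrive.
  HandOverControl();  // Never returns.

  ray_tracer.join();  // Not reached; shown for completeness.
  return 0;
}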

You are right that we could make everything a thread and functionally achieve the same result. My OS is a microkernel, and I haven't benchmarked threads vs. fibers, but it's hard to believe that creating a thread for every incoming RPC would be as efficient as fibers (recycled through an object pool to avoid dynamic allocation), where the next RPC is processed only when the current RPC reaches a blocking operation.
My OS is Perception.
PeterX
Member
Posts: 590
Joined: Fri Nov 22, 2019 5:46 am

Re: The love of fibers

Post by PeterX »

AndrewAPrice wrote:There are alternatives to fibers for blocking operations such as reading a file. Some alternatives I can think of are:
  • Single tasking. Your program does nothing else while it waits.
  • Threading. Every task that might block is made into a thread, and we use the kernel's mechanism for sleeping, yielding, waking up, etc.
  • Callbacks. Make every operation asynchronous.
  • Promises. I like promise chains.
  • Coroutines. C++20 supports them now, and you can co_await long operations.

I wonder if callbacks are useful for a performance boost at all.
And I wonder if single tasking and callbacks are really different when implemented. What will the caller do anyway while waiting for the callback?

I must admit that I didn't understand promises and coroutines.

I personally would probably go for single tasking, since (if I understood you rightly) it isn't what is commonly understood as "single tasking". You probably mean "synchronous" or simply "sleeping", where the scheduler decides what (other) code runs while waiting.

Greetings
Peter