OSDev.org

The Place to Start for Operating System Developers
 Post subject: Re: Are Microkernels really better?
PostPosted: Tue Sep 01, 2020 11:52 am 

Joined: Wed Mar 09, 2011 3:55 am
Posts: 509
zaval wrote:
nexos wrote:
Modular monolithic, which is where the kernel and drivers run in kernel mode, but drivers are stored in modules that are loaded and linked into the kernel at runtime. This is more flexible, but it generally means third-party code running in kernel mode, so it is less secure. Linux and the BSDs are this.

Monolithic hybrid, which is where the kernel and drivers run in kernel mode, but the kernel is structured like a microkernel. It may be able to run user-mode drivers and keep more things in user mode, but most drivers will run in kernel mode. Windows NT, ReactOS, and XNU (macOS) are this. Arguably, this is just marketing, as Linus Torvalds said.

"Monolithic" implies that the kernel is a SINGLE image, thus - "monolithic", if that qualifier term is supposed to mean something. Neither Windows, nor ReactOS and, I believe, OS X as well, aren't single image kernels. Windows literally constructs itself at load time from numerous modules. and even unloads portions of them after and loads new ones at run time. It's definitely a modular kernel. there is no just one image. Linux on the other hand IS monolithic and IS NOT modular, since its "modules" are just object files, left to be linked into a SINGLE kernel image later. it is NOT modular, rather, delayed linked monolithic.
Torvalds maybe "said" that because he sh1ts everything what is not like his UNIX clone and maybe for linux not looking like a retarted clone, not bringing anything new. what that "just marketing" means? it's BS, because those are two different organization strategies and they do have visible consequenses for kernels that employ them.


Modular simply means that the kernel is spread over multiple files, and new components can be added to the kernel at runtime. The alternative is having everything in a single file that is loaded in its entirety at boot, with no ability to add components at runtime. Pretty much every kernel that exists these days is modular, because otherwise the core kernel would balloon to ridiculous sizes. (macOS might get away with a non-modular kernel if Apple wanted to do that, given that they control the hardware tightly, but no other OS that wants to be competitive, especially in the PC market, has that luxury.) Back in the day, before computers were a huge consumer item, and when the selection of devices available for a given platform was more limited, many monolithic kernels were non-modular, but that time is long gone.
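For example, a Linux-style loadable module is little more than an object file with registered entry and exit hooks; a minimal sketch using the usual <linux/module.h> machinery, trimmed to the bare bones:

Code:
/* hello.c - minimal loadable module skeleton (Linux-style) */
#include <linux/init.h>
#include <linux/module.h>

static int __init hello_init(void)
{
        pr_info("hello: linked into the running kernel\n");
        return 0;               /* 0 = success, module stays resident */
}

static void __exit hello_exit(void)
{
        pr_info("hello: unloading\n");
}

module_init(hello_init);        /* runs at insmod/modprobe time */
module_exit(hello_exit);        /* runs at rmmod time */
MODULE_LICENSE("GPL");

The same source can also be linked into the kernel image at build time, in which case the init hook simply runs during boot and the exit hook is discarded.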

Colloquially, "monolithic" would imply "non-modular", but that usage is rare, if present, in the literature. "Monolithic kernel" is pretty much always used as an antonym to "microkernel". Monolithic kernels, as the term is used in the literature, are monolithic in the sense that they are a single unit as far as the CPU is concerned: there are no protection transitions made when execution is transferred between different kernel components.


 Post subject: Re: Are Microkernels really better?
PostPosted: Sat Sep 05, 2020 5:27 am 

Joined: Thu Oct 13, 2016 4:55 pm
Posts: 1584
Don't mix the concepts of modular/single and monolithic/microkernel. Those are totally different things.

A monolithic kernel runs everything in the same address space in privileged mode (ring 0). A microkernel uses more address spaces and tries to push everything to normal (ring 3) mode as much as possible, usually keeping just a small fraction of kernel code in ring 0. So this category refers to the run-time mode.

On the other hand, modular refers to how the kernel is stored on disk. This is completely different from the run-time mode.

For example:
- monolithic single: xv6, or a Linux kernel with static modules only (typical in embedded systems, like a wifi router or NAS), built with CONFIG_MODULES=n
- monolithic modular: the Linux kernel of a typical distro, compiled with CONFIG_MODULES=y and at least one Kconfig option set to "m"
- microkernel single: Minix (uses one statically linked kernel file, but runs many separate tasks at run time)
- microkernel modular: L4, for example, or any microkernel that uses an initrd with multiple files, one for each task

As you can see, all combinations of monolithic/micro and single/modular exist, so these are independent categories. As a matter of fact, the Linux kernel, which is monolithic, can be compiled in both modular and single-image mode. And the Minix kernel, which is a microkernel, isn't modular.

Cheers,
bzt


 Post subject: Re: Are Microkernels really better?
PostPosted: Sat Sep 05, 2020 6:11 am 

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
bzt wrote:
A microkernel uses more address spaces and tries to push everything to normal (ring 3) mode as much as possible, usually keeping just a small fraction of kernel code in ring 0

On x86 microkernels, this is true. But remember that some architectures (like the 68k) had no memory protection or protection rings as far as I know; yet AmigaOS, a microkernel, still ran on the 68k. A microkernel is where drivers and system services run as separate executables, away from the kernel. They are communicated with via messages. AmigaOS was probably faster than most monolithic kernels, as it didn't have expensive context switches, and there was no memory protection or protection rings in the 68k. The microkernel's problem is not the design. The design is great. It is x86's protection features that make it slow. And although these features are very important, they make kernels a lot slower in general.
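In a single address space the messaging machinery itself can be tiny: a message is just a node on a queue, and the payload never moves. A rough sketch (hypothetical names and structures, not the actual AmigaOS exec API):

Code:
/* Hypothetical single-address-space message passing, AmigaOS-like in spirit. */
#include <stddef.h>

struct message {
    struct message *next;      /* intrusive FIFO link */
    void           *payload;   /* just a pointer - no copying, same address space */
    size_t          length;
};

struct msg_port {
    struct message *head, *tail;
};

/* "Sending" is a couple of pointer writes plus waking the receiver.
   No locking shown; a real kernel would disable interrupts or take a lock. */
static void put_msg(struct msg_port *port, struct message *msg)
{
    msg->next = NULL;
    if (port->tail)
        port->tail->next = msg;
    else
        port->head = msg;
    port->tail = msg;
    /* wake_receiver(port);  - scheduler hook, omitted in this sketch */
}

static struct message *get_msg(struct msg_port *port)
{
    struct message *msg = port->head;
    if (msg) {
        port->head = msg->next;
        if (!port->head)
            port->tail = NULL;
    }
    return msg;                /* NULL if nothing is queued */
}

With no protection rings involved, the send path costs about as much as an ordinary function call, which is exactly the point being made about AmigaOS.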

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg


 Post subject: Re: Are Microkernels really better?
PostPosted: Sat Sep 05, 2020 8:35 am 

Joined: Thu Oct 13, 2016 4:55 pm
Posts: 1584
nexos wrote:
bzt wrote:
A microkernel uses more address spaces and tries to push everything to normal (ring 3) mode as much as possible, usually keeping just a small fraction of kernel code in ring 0

On x86 microkernels, this is true. But remember that some architectures (like the 68k) had no memory protection or protection rings as far as I know; yet AmigaOS, a microkernel, still ran on the 68k. A microkernel is where drivers and system services run as separate executables, away from the kernel.
Well, yes and no. You're right that the 68k did not have protection rings, and therefore all tasks ran at the same privilege level. But Minix, for example, does not use separate executables; everything is linked into one single file. So I guess the best approach would be to make a distinction between executables (files on the disk, either modules or services) and processes (distinct run-time entities, probably with separate address spaces if the arch supports that).

So I'd like to rephrase: a microkernel uses more processes, and tries to push everything out of the kernel process into its own separate process. In a monolithic kernel there's only one process, so modules can see each other's memory directly. In a microkernel, processes *should* only communicate using a common API provided by the kernel, without directly accessing each other's memory. (With memory protection and address spaces, they not only *shouldn't*, they simply *can't* circumvent that API.)
nexos wrote:
They are communicated to with messages.
Having a message bus seems to be a very common part of microkernels. Although other methods could be used (the port abstraction was pretty common in the early days), I tend to say messaging is the simplest and also the most effective solution (when implemented correctly), which is why it's so widespread.
nexos wrote:
AmigaOS was probably faster than most monolithic kernels, as it didn't have expensive context switches, and there was no memory protection or protection rings in the 68k.
Yes, absolutely. Calling the kernel to pass a message was as simple as calling a shared library function is these days. And because there was only one address space, it only needed to pass pointers around, which is the fastest possible way. No monolithic kernel that implements address spaces has that luxury.
nexos wrote:
The microkernel's problem is not the design. The design is great. It is x86's protection features that make it slow. And although these features are very important, they make kernels a lot slower in general.
I'm not entirely sure I agree.

First, switching privilege level has an overhead, true, but that also affects monolithic kernels. That's not specific to microkernels.

Second, using shared pages you can still have separate address spaces but pass messages as pointers, so you can eliminate the overhead of copying messages from one address space to another (a rough user-space sketch of the idea follows after the third point). But it is easy to produce a bad, extremely slow implementation here if you're not 100% familiar with the hardware's capabilities. Implementing simple messaging is easy, but implementing an effective messaging solution is notoriously hard.

Third, memory paging and protection levels are not x86-specific; all modern architectures have them. For example on ARM you have similar paging tables (actually even more sophisticated: you have separate user and kernel address spaces, TTBR0 and TTBR1), and you also have supervisor (EL1, the equivalent of ring 0) and user (EL0, the equivalent of ring 3) privilege levels.
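To make the shared-pages point concrete, here is a user-space approximation using a POSIX shared mapping; in a real kernel the mapping would be set up by the kernel itself, and only a tiny notification would ever cross the protection boundary (the name "/ipc-demo" and the fixed size are arbitrary choices for the sketch):

Code:
/* Sketch: two processes share one mapping; a "message" is just an offset into it. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/ipc-demo"
#define SHM_SIZE 4096

int main(void)
{
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, SHM_SIZE) < 0)
        return 1;

    char *buf = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED)
        return 1;

    /* Sender side: write the payload once, directly into the shared page. */
    strcpy(buf, "open /etc/passwd, flags=O_RDONLY");

    /* Only a small notification has to cross the kernel (a pipe write, a
       futex wake, or a kernel message); the payload itself is never copied
       between address spaces - the receiver maps the very same page. */
    printf("payload staged at shared offset 0: %s\n", buf);

    munmap(buf, SHM_SIZE);
    close(fd);
    shm_unlink(SHM_NAME);
    return 0;
}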

Cheers,
bzt


 Post subject: Re: Are Microkernels really better?
PostPosted: Sat Sep 05, 2020 12:42 pm 

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
bzt wrote:
I'm not entirely sure I agree.

First, switching privilege level has an overhead, true, but that also affects monolithic kernels. That's not specific to microkernels.

Second, using shared pages you can still have separate address spaces but pass messages as pointers, so you can eliminate the overhead of copying messages from one address space to another. But it is easy to produce a bad, extremely slow implementation here if you're not 100% familiar with the hardware's capabilities. Implementing simple messaging is easy, but implementing an effective messaging solution is notoriously hard.

Third, memory paging and protection levels are not x86-specific; all modern architectures have them. For example on ARM you have similar paging tables (actually even more sophisticated: you have separate user and kernel address spaces, TTBR0 and TTBR1), and you also have supervisor (EL1, the equivalent of ring 0) and user (EL0, the equivalent of ring 3) privilege levels.


I meant protection features in general, not just x86 :D .

The first one is true. It affects monolithic kernels as well.

The second one can be true. If you used shared pages, you might be able to get away with it, but imagine this scenario. Say we send a message to a process, giving it data. The process we sent the message to expects a certain value. The sender then changes that data value, or frees it from the heap. Now the receiver has an invalid pointer. One solution I have is copy-on-write: the page the data is on is mapped read-only, and when the sender or receiver writes to it, the page fault handler does a copy-on-write. This would remove some performance overhead while not making it less stable. Also, I did say kernels in general. Plus, an address space swap is a very expensive operation :D .
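Sketched as kernel-side C with hypothetical helper names (every kernel structures this differently), the copy-on-write path in the page fault handler would look something like this:

Code:
/* Sketch of copy-on-write for a shared message page.
   struct vm_area, struct page and the helpers below are hypothetical. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

void handle_page_fault(struct vm_area *vma, uintptr_t addr, bool is_write)
{
    struct page *shared = lookup_page(vma, addr);

    if (is_write && page_is_cow(shared)) {
        /* The faulting side is writing to a page still shared with the
           other end of the message: give the writer a private copy and
           leave the other side's view of the data untouched. */
        struct page *copy = alloc_page();
        memcpy(page_address(copy), page_address(shared), PAGE_SIZE);
        remap_page(vma, addr, copy, /* writable = */ true);
        put_page(shared);          /* drop one reference to the shared page */
        return;
    }

    /* ...demand paging, genuine protection violations, etc. go here... */
}

Only the side that actually writes pays for the copy; if both sides merely read the message after it is sent, no copy ever happens.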

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg


 Post subject: Re: Are Microkernels really better?
PostPosted: Sat Sep 05, 2020 6:07 pm 

Joined: Tue Apr 03, 2018 2:44 am
Posts: 402
bzt wrote:
nexos wrote:
bzt wrote:
A microkernel uses more address spaces and tries to push everything to normal (ring 3) mode as much as possible, usually keeping just a small fraction of kernel code in ring 0

On x86 microkernels, this is true. But remember that some architectures (like the 68k) had no memory protection or protection rings as far as I know; yet AmigaOS, a microkernel, still ran on the 68k. A microkernel is where drivers and system services run as separate executables, away from the kernel.


Well, yes and no. You're right that the 68k did not have protection rings, and therefore all tasks ran at the same privilege level.


68k has always had the concept of a protected supervisor mode. With an external MMU, it also supported protected memory domains. Sun-1 was based on the 68k, as were a number of other workstations from the early 80s (from HP, SGI, Apollo). A description of the Sun-1 MMU can be found in http://i.stanford.edu/pub/cstr/reports/csl/tr/82/229/CSL-TR-82-229.pdf

What the initial 68k lacked was the ability to save sufficient state to restart faulting instructions, so it was unable to fully support paged virtual memory. The 68010 fixed that, I believe.


 Post subject: Re: Are Microkernels really better?
PostPosted: Sun Sep 06, 2020 5:30 am 

Joined: Thu Oct 13, 2016 4:55 pm
Posts: 1584
nexos wrote:
I meant protection features in general, not just x86 :D .
Ah, ok, you wrote "x86's protection features", which is why I thought you meant x86 only. This makes more sense :-)
nexos wrote:
The second one can be true. If you used shared pages, you might be able to get away with it, but imagine this scenario. Say we send a message to a process, giving it data. The process we sent the message to expects a certain value. The sender then changes that data value, or frees it from the heap. Now the receiver has an invalid pointer. One solution I have is copy-on-write: the page the data is on is mapped read-only, and when the sender or receiver writes to it, the page fault handler does a copy-on-write. This would remove some performance overhead while not making it less stable.
Now you understand why I wrote that implementing messaging is easy, while implementing it efficiently is hard! You've made good points!
nexos wrote:
Also, I did say kernels in general. Plus, an address space swap is a very expensive operation :D .
Yes, but an address space swap is expensive for monolithic kernels too.

To sum it up, the efficiency of a microkernel is not affected by protection at all (a monolithic kernel suffers exactly the same overhead). Instead it mostly depends on how cleverly the message passing is implemented.

The latter of course includes how system structures are spread across system services, and with that, how many messages are needed for basic operations. In this regard Minix is terrible (it needs lots of messages, and therefore lots of context switches), while L4 is pretty good (it not only maps the messages, but also minimizes the number of required messages, and with that, the number of required context switches). For example, under Minix, a simple open() needs way too many context switches:
1. message from user process to kernel
2. message from kernel to VFS
(step 3 is repeated once for each mount point that the path crosses)
3.1 message from VFS to FS driver service(s)
3.2 reply message from FS to VFS
4. reply message from VFS to kernel
5. reply message from kernel to user process
But that's okay, since Minix was designed with a clear structure in mind rather than efficiency.
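From the caller's side, the whole exchange hides behind an ordinary-looking open(); a generic sketch of such a stub (the message layout, MSG_OPEN, VFS_SERVER and the ipc_send()/ipc_receive() calls are all hypothetical, not Minix's real ones):

Code:
/* Generic sketch of open() as a message exchange (hypothetical IPC API). */
#include <string.h>

struct open_request { int type; char path[256]; int flags; };
struct open_reply   { int type; int fd;         int error; };

int open(const char *path, int flags)
{
    struct open_request req = { .type = MSG_OPEN, .flags = flags };
    struct open_reply   rep;

    strncpy(req.path, path, sizeof(req.path) - 1);
    req.path[sizeof(req.path) - 1] = '\0';

    /* One send and one blocking receive from the caller's point of view;
       the kernel, VFS and FS servers exchange the other messages listed
       above behind the scenes. */
    ipc_send(VFS_SERVER, &req, sizeof(req));
    ipc_receive(VFS_SERVER, &rep, sizeof(rep));

    return rep.error ? -1 : rep.fd;
}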

thewrongchristian wrote:
68k has always had the concept of a protected supervisor mode. With an external MMU
My point was that the original AmigaOS did not use supervisor mode for exec (it looked like any other library), and it had only one address space. BTW you're right; according to Wikipedia, it was the 68010 that first made virtual memory possible.

Cheers,
bzt


 Post subject: Re: Are Microkernels really better?
PostPosted: Tue Sep 08, 2020 4:02 pm 

Joined: Wed Mar 09, 2011 3:55 am
Posts: 509
bzt wrote:
Yes, absolutely. Calling the kernel to pass a message was as simple as calling a shared library function is these days. And because there was only one address space, it only needed to pass pointers around, which is the fastest possible way. No monolithic kernel that implements address spaces has that luxury.


On current hardware, at least. If the hardware can automatically make a transition between protection domains on control transfers between two programs operating at the lowest privilege level, you could have the best of both worlds (assuming that this functionality doesn't impair the performance of the hardware too much to be viable, as with the Intel TSS mechanism). On hardware that allowed programs to load supplementary address spaces and had some way of restricting when a given address space could be accessed, you could have a message passing system that looked like this:

1) Your thread starts out with various address spaces loaded (such as the one containing the code the thread is currently executing, the one containing the thread's stack and thread-local storage, the one containing the heap for whatever job the thread is doing, maybe a few memory-mapped files, etc.), and a few tables, something like an x86 LDT, specifying what address spaces may currently be accessed (such as one table containing the list of libraries that may be called from the current code address space, another containing a list of data address spaces that may be loaded when the current stack/thread-local storage address space is loaded, another containing a list of address spaces with memory mappings of open file descriptors, etc.).
2) The thread wants to pass a message to some microkernel component to request some service or other.
3) The thread marks any loaded address spaces that will not be needed for the system call as inaccessible on the next control transfer to a different code address space. It also loads any address spaces that will be needed but are not yet loaded (though it is probably making the service request in order to operate on data that it has recently manipulated, so those address spaces are likely to be loaded already).
4) The thread constructs a stack frame, including any pointers into the address spaces that will be needed for the service request.
5) The thread executes a far call to a function in the address space for the microkernel component it is requesting service from. This automatically causes the register pointing at the address space descriptor table for the current code space to be loaded with a new value, pointing at the descriptor table listing the address spaces callable by the microkernel component.
6) The function call and stack frame in 4) and 5) serve as the message from the program to the microkernel component. The microkernel component is able to use any pointers passed directly. It queues up the work requested (or, if it can be done quickly, or the request is blocking, does the work immediately).
7) The microkernel component returns execution to the code that called it, and the address spaces that were marked inaccessible on the original control transfer are re-marked as valid. For non-blocking requests, it calls a callback in the application code, which functions as a "request completed" message (if this is needed for the request in question). For blocking requests, the "request completed" message is the return itself.

Note that there's not really a requirement for a core kernel to implement a message-passing service: the message is sent by a direct function call from the application to the microkernel component in question, just as is possible on a system with no protection whatsoever.

The interesting thing about such a system is that you retain the concepts of executables, libraries, and threads, but it becomes hard to say what a "process" is. I think the idea of a "process" is largely a product of being unable, on traditional architectures, to make protection domain transitions without coming out of the lowest privilege level. It's basically a body of code, one or more threads working on that code, and a common heap for the threads to use, all bundled together in one protection domain.

The Intel TSS mechanism basically implemented this, but it used a reload-the-world approach that made it prohibitively expensive. They almost managed to implement a more workable system with segmentation, but they structured it wrong. Imagine a system where, instead of referencing the GDT for segment selectors with bit 2 clear and the LDT for bit 2 set, you instead have two more segment registers (for a total of eight) and an LDT for each segment register (basically folding the LDTR into the segment registers), with the LDT referenced by a given selector being determined by bits 2-4 of the selector (I'd actually use the high-order bits, but for the purpose of illustration I'm making as few changes to the protected-mode Intel segmentation architecture as I can while still ending up with a system with the desired properties). So now an eighth of your segment selectors reference the LDT for the CS register, an eighth reference the LDT for the DS register, and so on. The next change is that the LDT entry format is not the same as the GDT format, but rather an index into the GDT, plus some metadata. Then you double the size of each GDT descriptor, and add a base and limit for an LDT to the descriptor format.

When a program loads a segment register, the CPU selects an LDT based on bits 2-4 of the selector, then uses the rest of the selector to index into that LDT. The LDT entry is then used to index into the GDT, and the contents of the GDT descriptor are loaded into the segment register. Because the new GDT descriptor format contains a base and limit for an LDT, and the LDTR has been folded into the segment registers, this causes an LDT switch (for one of the eight LDTs) any time a segment register is loaded. This allows every segment to have a list of other segments that may be loaded when that segment is loaded.
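To make the layout concrete, the modified descriptors might be pictured roughly like this (purely illustrative struct definitions for the hypothetical scheme above, not any real x86 format):

Code:
/* Illustrative layouts for the hypothetical scheme above - not real x86. */
#include <stdint.h>

/* Widened GDT descriptor: the usual base/limit plus the base/limit of the
   LDT that becomes active whenever this segment is loaded (the folded-in
   LDTR described above). */
struct gdt_descriptor {
    uint32_t seg_base;
    uint32_t seg_limit;
    uint32_t ldt_base;
    uint32_t ldt_limit;
    uint16_t attributes;       /* type, DPL, present bit, etc. */
};

/* LDT entry: no base/limit of its own, just a reference into the GDT plus
   metadata saying what this protection domain may do with that segment. */
struct ldt_entry {
    uint16_t gdt_index;        /* which global segment this slot may load */
    uint16_t metadata;         /* rights granted through this slot */
};

/* Selector decoding: bits 2-4 pick which segment register's LDT to use,
   and the remaining high bits index into that LDT. */
static inline unsigned selector_ldt(uint16_t sel)   { return (sel >> 2) & 0x7; }
static inline unsigned selector_index(uint16_t sel) { return sel >> 5; }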

The tricky bit is figuring out how to handle returns from intersegment calls. The code segment containing an executable might have all of the libraries it uses referenced in its LDT, so it can call those libraries, but the libraries probably won't have all the executables that call them referenced in their LDTs, and even if they do, figuring out what selector to push to the stack to make sure that the right segment is returned to would be tricky. One could push a GDT selector, but that would defeat the system: a program could access an arbitrary segment if it knew that segment's GDT selector, simply by writing the selector onto the stack and then popping it. One solution I've considered is a separate "segment escrow stack", which cannot be directly addressed by programs, but to which segment registers can be pushed or popped.

Such a system would have been, I think, both much more powerful and much more performant than the TSS mechanism. If I were introducing a new architecture, I wouldn't base the mechanism for transitioning between protection domains on base:offset segmentation, but it shows how close Intel was to something much more useful than what they ended up designing.

