Why separate the kernel binary and the initial ramdisk?

Stijn · Post by **Stijn** » Sun Jun 24, 2018 5:02 am

The initial ramdisk is supposed to contain essential drivers for loading the rest of the OS, the most obvious one being a file system driver.

What is the goal of separating this functionality into an initial ramdisk, as opposed to including it in the kernel binary? I assume the kernel will need to know the exact content of the initial ramdisk anyway, so the kernel and the initial ramdisk would be tightly coupled.

Some context to sketch my current level of knowledge: I have a two-stage bootloader that loads an ELF kernel binary. The kernel currently has interrupt handling (mostly stubs), a terminal, and a very basic keyboard input handler coupled to the terminal.

iansjack · Post by **iansjack** » Sun Jun 24, 2018 6:28 am

The kernel is generic to all machines that the OS runs on. The initial ramdisk contains code that is needed for a particular installation, but not all installations. The alternative would be a kernel that contained all the drivers that any machine might require to boot, perhaps even all drivers it ever needs.

In older versions of Unix the entire kernel was compiled for a particular machine. Nowadays there is a generic kernel with loadable modules. But some modules must be loaded before the system can reach a state where it can load modules. These are placed in the initial ramdisk, which is created when the kernel is installed.

Stijn · Post by **Stijn** » Sun Jun 24, 2018 7:20 am

I hadn't considered that the ramdisk is created during installation. It makes sense now, thank you.

Brendan · Post by **Brendan** » Sun Jun 24, 2018 5:38 pm

Hi,

Stijn wrote: The initial ramdisk is supposed to contain essential drivers for loading the rest of the OS, the most obvious one being a file system driver.

What is the goal of separating this functionality into an initial ramdisk, as opposed to including it in the kernel binary?

For micro-kernels (where the kernel "can't"/shouldn't contain things like file systems and drivers) you need to load a whole bunch of stuff in addition to the kernel (because the kernel alone can't access the disk to load disk drivers, etc), and it makes things a lot easier if you combine all of the little files into a single larger "initial RAM disk" file.

For monolithic kernels there are 3 choices:

1. Recompile the kernel for each specific computer and build the drivers, etc directly into the kernel at compile time. This is relatively inflexible, makes it hard to have a generic bootable CD/USB (where you don't know anything about the specific computer/s) and isn't very user-friendly; but (in theory) can also give you the highest performance (the compiler and link-time optimiser can optimise everything).
2. Use "kernel modules" and end up with the same problem micro-kernels have.
3. Use a combination of both (some things compiled directly into the kernel, and some things implemented as a "kernel module").

Also; once you have some sort of "initial RAM disk" it may be used for other things (and not just drivers and file system code), including:

- EDID information for the monitor (in case the information can't be obtained from the monitor itself for some reason, or in case the information from the monitor is dodgy and needs to be overridden)
- ACPI tables, etc (in case the information from the firmware is dodgy and needs to be overridden)
- Some kind of "splash screen" graphics data, so that the boot loader can (e.g.) display your operating system's logo before the kernel has started
- One or more kernels. For example, my boot code typically auto-detects the computer's capabilities (from CPUID, memory map, etc) and then uses this information to auto-select the best micro-kernel.
- "Boot modules" that run before the kernel is started. For example, I have "boot modules" for things like outputting the boot log (one for video, one for serial, one for the "0xE9 hack" in Bochs, etc) where my "boot abstraction layer" starts whichever boot modules make sense for the computer; plus another module for CPU identification (parsing information from CPUID, setting flags for CPU errata, etc).
- Caching "unnecessary" files to improve boot times. Typically after the kernel is started it takes a little while to initialise drivers and file systems (and/or networking); and this time is mostly various delays (waiting for a device to reset, waiting for a device to identify itself, waiting for a device to respond to various initialisation commands, ...) where the CPU is doing nothing. By putting some extra files in the initial RAM disk the OS can start doing some things sooner instead of doing nothing/waiting (e.g. start a "user log-in" prompt so that the user can start typing their username & password while the drivers are initialising).
- Fault recovery. The "initial RAM disk" can contain emergency recovery tools (e.g. maybe some tools the user can use to back up and restore the file system).
- Fault tolerance. If the file system is corrupt or some essential files have been deleted but there are copies of those files in the "initial RAM disk", then you can use the copies in RAM to restore/fix the file system (so that critical files end up "self-healing").
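
To make the "one or more kernels" item concrete, here is a rough sketch in C of how boot code could map detected CPU capabilities to a kernel file stored in the RAM disk. Everything in it is hypothetical: the capability bits, the file names and `pick_kernel` are invented for illustration and are not taken from any real boot code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits, as boot code might derive from CPUID. */
enum {
    CPU_LONG_MODE = 1u << 0,   /* 64-bit capable */
    CPU_PAE       = 1u << 1,   /* physical address extension */
};

struct kernel_choice {
    unsigned required;         /* capability bits this kernel needs */
    const char *file;          /* hypothetical file name in the RAM disk */
};

/* Ordered from most to least demanding; the last entry needs nothing. */
static const struct kernel_choice kernels[] = {
    { CPU_LONG_MODE, "kernel64.bin" },
    { CPU_PAE,       "kernel32pae.bin" },
    { 0,             "kernel32.bin" },
};

/* Return the first kernel whose requirements are all met by `caps`. */
const char *pick_kernel(unsigned caps)
{
    for (size_t i = 0; i < sizeof kernels / sizeof kernels[0]; i++) {
        if ((caps & kernels[i].required) == kernels[i].required)
            return kernels[i].file;
    }
    assert(!"unreachable: the last entry matches any CPU");
    return NULL;
}
```

The table is ordered so that the first match is also the most capable kernel the machine can run, which keeps the selection logic to a single loop.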

Stijn wrote: I assume the kernel will need to know the exact content of the initial ramdisk anyway, so the kernel and the initial ramdisk would be tightly coupled.

Not necessarily - you'd have some kind of directory structure built into the "initial RAM disk" so that (during early boot) files can be found by name (e.g. you could have normal "open()" and "read()" functions that work on files in RAM). In my case; the files from the "initial RAM disk" are used to prepare the VFS cache, and VFS does an "if request can't be satisfied from cache and we're still booting; put the request in a queue and worry about it after file systems are mounted" thing so that processes can access files using normal file IO without caring if file systems are mounted yet.
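
A minimal sketch in C of the "files can be found by name" idea. The archive layout used here (a count, then fixed-size directory entries, then raw file data) and the names `rd_header`, `rd_entry` and `rd_find` are assumptions made up for this example, not any real initrd format; Linux's initramfs, for instance, is a cpio archive instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat archive: header, then `count` directory entries,
 * then raw file data. Offsets are relative to the start of the image. */
struct rd_entry {
    char     name[32];   /* NUL-terminated file name */
    uint32_t offset;     /* start of file data within the image */
    uint32_t size;       /* length of file data in bytes */
};

struct rd_header {
    uint32_t count;      /* number of directory entries */
};

/* Look up `name` in the image at `image` (`image_size` bytes long).
 * Returns a pointer to the file data and stores its size in *size_out,
 * or returns NULL if the file is absent or the image is malformed. */
const void *rd_find(const void *image, size_t image_size,
                    const char *name, uint32_t *size_out)
{
    assert(image != NULL && name != NULL && size_out != NULL);

    const uint8_t *base = image;
    struct rd_header hdr;

    if (image_size < sizeof hdr)
        return NULL;
    memcpy(&hdr, base, sizeof hdr);   /* memcpy avoids alignment issues */

    if (hdr.count > (image_size - sizeof hdr) / sizeof(struct rd_entry))
        return NULL;                  /* directory would overrun the image */

    for (uint32_t i = 0; i < hdr.count; i++) {
        struct rd_entry e;
        memcpy(&e, base + sizeof hdr + (size_t)i * sizeof e, sizeof e);
        e.name[sizeof e.name - 1] = '\0';   /* defend against bad input */

        if (strcmp(e.name, name) != 0)
            continue;
        if (e.offset > image_size || e.size > image_size - e.offset)
            return NULL;                    /* entry points outside image */
        *size_out = e.size;
        return base + e.offset;
    }
    return NULL;
}
```

With a lookup like this, the kernel only depends on the archive format and on the names it asks for, not on the exact contents of a particular image, which is the loose coupling described above.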

Cheers,

Brendan

OSwhatever · Post by **OSwhatever** » Mon Jun 25, 2018 3:34 pm

The answer is simple: it keeps the kernel generic. The rest of the system is loaded from memory, so that additional system-specific modules can be started after the kernel starts. There are different methods for doing this. Linux uses a RAM disk. Microkernels often allow additional user processes to be appended to the kernel image.

simeonz · Post by **simeonz** » Mon Jun 25, 2018 4:43 pm

Everything said so far is correct. I just want to say that the most generic reason for the ramdisk's existence is to escape the vicious circle that you get from having to load your external storage stack drivers (disk and fs) without a working storage stack. To break the circle, the kernel uses a prepackaged RAM device that can hopefully be loaded using only the pre-boot environment. Thus, the ramdisk fs ends up being the singular simplistic driver (loosely speaking) that the kernel has to have built-in support for, no matter which filesystem actually hosts the ramdisk file.

In the Linux case however, considering that GRUB uses its own flavor of drivers, this line of reasoning is not valid. Instead the burden gets shifted to the bootloader. GRUB supports its own kind of ramdisk, called the core image, which it embeds in the space between the MBR and the first partition, after the GPT, or in system-reserved clusters of the filesystem. Loading GRUB's core image does not require filesystem support, but once loaded it provides a module that is capable of parsing the boot volume filesystem.

simeonz · Post by **simeonz** » Sun Jul 01, 2018 5:36 am

Sorry for bumping the thread, but after contemplating the scheme presented in the recent "Silcos Initializer - usable on other kernel" post, I have to revise my earlier statement. I am not actually sure why Linux would continue using a ramdisk in the future, except for LILO support.

On BIOS systems, GRUB uses the MBR gap to store a packed image with a filesystem support module, so that it can load the kernel initrd image directly from a file, so that the kernel can load its own filesystem support modules. This appears overly complicated to me. If the Linux boot protocol assumed that the kernel will be loaded by a filesystem-aware bootloader, then it could require a file I/O interface (e.g. an array of callbacks in a known location) and use it to load any number of kernel modules directly from files. The only reason to continue using a ramdisk in the future is to support LILO, which uses blocklists. This is perfectly fine if the Linux boot protocol continues to endorse blocklist-based bootloaders, because there is no duplication of effort there. But if the idea is to eventually support only filesystem-aware bootloaders, it will be natural to remove the kernel ramdisk as well.
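
The "array of callbacks in a known location" idea might look roughly like the sketch below. Nothing like this exists in the Linux boot protocol; `boot_file_ops`, its two fields, `load_module` and the path `/boot/modules/ext2.ko` are all invented for illustration. The bootloader side is faked with one in-memory file, and `malloc` stands in for whatever early allocator a kernel would really use, so that the example runs as an ordinary hosted C program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table a filesystem-aware bootloader would hand to the kernel. */
struct boot_file_ops {
    /* Returns the file size in bytes, or -1 if the file does not exist. */
    long (*size)(const char *path);
    /* Reads up to `len` bytes starting at `off`; returns bytes read or -1. */
    long (*read)(const char *path, size_t off, void *buf, size_t len);
};

/* Kernel side: load one module into freshly allocated memory using only
 * the callbacks. Returns NULL on failure; the caller frees the buffer. */
void *load_module(const struct boot_file_ops *ops, const char *path,
                  size_t *len_out)
{
    assert(ops != NULL && path != NULL && len_out != NULL);

    long size = ops->size(path);
    if (size < 0)
        return NULL;

    void *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL)
        return NULL;

    if (ops->read(path, 0, buf, (size_t)size) != size) {
        free(buf);
        return NULL;
    }
    *len_out = (size_t)size;
    return buf;
}

/* Stand-in for the bootloader: a single fake file kept in memory. */
static const char fake_path[] = "/boot/modules/ext2.ko";
static const char fake_data[] = "module-bytes";

long fake_size(const char *path)
{
    return strcmp(path, fake_path) == 0 ? (long)(sizeof fake_data - 1) : -1;
}

long fake_read(const char *path, size_t off, void *buf, size_t len)
{
    long size = fake_size(path);
    if (size < 0 || off > (size_t)size)
        return -1;
    if (len > (size_t)size - off)
        len = (size_t)size - off;   /* clamp the read to the end of file */
    memcpy(buf, fake_data + off, len);
    return (long)len;
}
```

The kernel never sees how the bootloader finds the file, so the same `load_module` would work whether the callbacks are backed by a filesystem driver, a blocklist or a network transfer.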

Brendan · Post by **Brendan** » Wed Jul 04, 2018 3:41 am

Hi,

simeonz wrote: Sorry for bumping the thread, but after contemplating the scheme presented in the recent "Silcos Initializer - usable on other kernel" post, I have to revise my earlier statement. I am not actually sure why Linux would continue using a ramdisk in the future, except for LILO support.

On BIOS systems, GRUB uses the MBR gap to store a packed image with a filesystem support module, so that it can load the kernel initrd image directly from a file, so that the kernel can load its own filesystem support modules. This appears overly complicated to me. If the Linux boot protocol assumed that the kernel will be loaded by a filesystem-aware bootloader, then it could require a file I/O interface (e.g. an array of callbacks in a known location) and use it to load any number of kernel modules directly from files. The only reason to continue using a ramdisk in the future is to support LILO, which uses blocklists. This is perfectly fine if the Linux boot protocol continues to endorse blocklist-based bootloaders, because there is no duplication of effort there. But if the idea is to eventually support only filesystem-aware bootloaders, it will be natural to remove the kernel ramdisk as well.

Think of it as a 3-part sequence, where:

1. OS installs its own drivers (for disk, network, ...) and therefore must break any previous "boot code" support for these devices (if there was any)
2. OS examines the devices and does some auto-detection (e.g. looks for any "RAID superblock" metadata on the disk); and starts any "optional middleware", like RAID layers and encryption; to create logical volumes (after any "boot code" support for the underlying device has been broken)
3. OS mounts the logical volume/s with file system/s (after any "boot code" support for the underlying device has been broken)

Note that this works fine for various cases where there is no file system, like "Linux in ROM" (the original idea behind Coreboot) and "Linux boot protocol in ROM" (the way Xeon Phi accelerator cards and probably lots of embedded systems boot), and also works for various other scenarios (e.g. "boot from network, but mount local disk/s after boot").

Also, (as far as I can tell) Linux developers themselves agree with you about it being over-complicated - I even remember seeing a conversation about GRUB 2 (likely on the Linux kernel developers' mailing list) where someone suggested using a Linux kernel as a boot loader instead because a whole Linux kernel is simpler than GRUB 2.

Cheers,

Brendan

simeonz · Post by **simeonz** » Wed Jul 04, 2018 11:05 am

Brendan wrote: Think of it as a 3-part sequence, where:

1. OS installs its own drivers (for disk, network, ...) and therefore must break any previous "boot code" support for these devices (if there was any)
2. OS examines the devices and does some auto-detection (e.g. looks for any "RAID superblock" metadata on the disk); and starts any "optional middleware", like RAID layers and encryption; to create logical volumes (after any "boot code" support for the underlying device has been broken)
3. OS mounts the logical volume/s with file system/s (after any "boot code" support for the underlying device has been broken)

The filesystem can be recognized by reading the volume using only firmware IO services. At least for the boot filesystem. It may not be applicable for something like network attached devices, etc, but you wouldn't be booting from storage that cannot be easily supported by the bootloader anyway. Hence there should be enough information to decide which modules to pre-load before beginning to initialize the storage stack. There will be some architectural complexity in separating the identification and mounting steps, but this is already desirable for on-demand fs module loading.

There is actually a major issue, which breaks this proposition. Depending on the module organization and the integrity guarantees of the filesystem operations, a crash after a partially completed update may leave the partition unbootable. A single ramdisk image sort-of works around the problem. On the other hand, this leads me to a different thought: that booting from arbitrary filesystems is probably not such a good idea. They may not support atomicity at all, or the bootloaders may not have the necessary logic to properly replay journals, which they usually don't. Using a specific filesystem designed for booting and atomic updates is, I think, much better. In which case, the above approach could still be feasible.
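
One plausible reading of why a single image "sort-of works around the problem": a single file can be replaced with the usual write-then-rename pattern, whereas a set of separate module files cannot be swapped in one step. The sketch below shows that pattern in C for the installer side on a POSIX system. The function name `replace_image` and its parameters are made up for this example, and it does nothing about the bootloader side (a bootloader that ignores the journal may still see stale metadata). A production version would also fsync the containing directory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replace `path` with `len` bytes from `data` so that a reader sees either
 * the old file or the new one, never a half-written mix.
 * `tmp_path` must be on the same filesystem as `path`.
 * Returns 0 on success, -1 on failure. */
int replace_image(const char *path, const char *tmp_path,
                  const void *data, size_t len)
{
    assert(path != NULL && tmp_path != NULL && data != NULL);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);   /* write() may be partial */
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Push the new contents to stable storage before making them visible. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* POSIX rename() replaces the old name with the new file in one step. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

If the machine crashes before the `rename`, the old image is still in place under its original name; if it crashes after, the new one is.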

Brendan wrote: Note that this works fine for various cases where there is no file system, like "Linux in ROM" (the original idea behind Coreboot) and "Linux boot protocol in ROM" (the way Xeon Phi accelerator cards and probably lots of embedded systems boot), and also works for various other scenarios (e.g. "boot from network, but mount local disk/s after boot").

Of course, if they wanted to, they could have introduced in-memory devices and mounted them as usual. It would obviously complicate things for this particular use case, but not every solution is best for every situation.

Brendan wrote: Also, (as far as I can tell) Linux developers themselves agree with you about it being over-complicated - I even remember seeing a conversation about GRUB 2 (likely on the Linux kernel developers' mailing list) where someone suggested using a Linux kernel as a boot loader instead because a whole Linux kernel is simpler than GRUB 2.

They do actually support loading the kernel directly from the EFI boot manager. Although "directly" is probably a little overstated here. In Linux this means loading an EFI stub that jumps into a generic stub that decompresses the kernel code and jumps to it, but all of this happens from code and data in one image. This is in theory. In practice however, since secure boot won't normally allow loading the kernel, they use a shim signed by MS that is loaded first. (Microsoft uses its private key to sign other people's generic loaders.) And because they prefer not to overwrite the firmware's boot entry on every update, yet another boot proxy is introduced - systemd-boot (formerly gummiboot). So, they have jumped through a lot of hoops by the end, but the idea is in principle there.

OSDev.org