rdos wrote:
I actually fail to see how you effectively could test OSes with automated tests. Just getting it to build & boot is simply not enough for a real test. I always do testing on real hardware and not in emulators as I find that emulators often work even when real hardware wouldn't.
I totally agree that on VMs / emulators it's easy to get the stuff working: the real test is on the real HW. But that doesn't stop you from having automated tests. Obviously, just getting the project to build & boot is nothing: there's much more that can be done. For example, you could test your syscalls in a variety of scenarios using a program running on your OS; a 100k LoC project has an incredible number of code paths. You can also have kernel self-tests and trigger them through a special syscall interface, which allows you to exercise code paths that are not easily triggerable from the outside. And if that's still not enough, you can write multi-threaded stress/chaos tests that will expose race conditions in your code (see the sketches below). All of that can run both in VMs and on real hardware.
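
To make the syscall-testing idea concrete, here is a minimal sketch of what such a userspace test could look like, assuming a POSIX-like syscall interface (the path, the values and the exact error codes are just illustrative, not any particular project's API):

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[16];
    int fd = open("/tmp/syscall_test", O_CREAT | O_RDWR | O_TRUNC, 0600);

    /* Happy path: write, seek back, read the same data. */
    assert(fd >= 0);
    assert(write(fd, "hello", 5) == 5);
    assert(lseek(fd, 0, SEEK_SET) == 0);
    assert(read(fd, buf, sizeof(buf)) == 5);
    assert(memcmp(buf, "hello", 5) == 0);
    assert(close(fd) == 0);

    /* Error paths: the kernel must fail cleanly, not crash or corrupt. */
    assert(read(-1, buf, sizeof(buf)) == -1 && errno == EBADF);
    assert(read(fd, buf, sizeof(buf)) == -1 && errno == EBADF); /* closed fd */

    return 0; /* the CI runner only checks the exit status */
}

A CI job can simply run binaries like this inside a VM (or on a test machine) and fail the build whenever the process exits with a non-zero status.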
rdos wrote:
About the worst thing that can happen to an OS project is the introduction of random crashes. I've had a few of those where I had to revert code back 100s of revisions, like when I introduced multicore operation, but also, to a lesser extent, when I modified the physical memory handler. I don't think automated testing would catch these scenarios. Often, lengthy tests on as many different machines as possible are required.
Oh yeah, random crashes are terrible. Automated testing with CI helped me catch many of those cases; actually, I started writing more tests precisely to increase the stability of my kernel. I observed that even when certain bugs are hard to reproduce in a VM on my local machine, running the same tests on a CI system in the "cloud" catches more of them, because there my code gets preempted much more often, interrupts arrive later, and so on. In other words, it's the opposite of a real-time system. That's a terrible environment, but it's also a good one, because it "shakes" the whole timing and triggers real bugs (race conditions).
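
A race-hunting stress test doesn't need to be fancy, either. A minimal sketch, assuming POSIX threads and a VFS that honors O_APPEND semantics (thread and iteration counts are arbitrary; compile with -pthread): N threads append to the same file, and with correct kernel locking not a single byte may get lost.

#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

#define THREADS 8
#define ITERS   10000

static void *worker(void *arg)
{
    int fd = *(int *)arg;
    char c = 'x';

    for (int i = 0; i < ITERS; i++) {
        /* Concurrent O_APPEND writes stress the kernel's file-offset
         * and inode locking paths. */
        assert(write(fd, &c, 1) == 1);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[THREADS];
    int fd = open("/tmp/stress_test", O_CREAT | O_WRONLY | O_APPEND | O_TRUNC, 0600);

    assert(fd >= 0);
    for (int i = 0; i < THREADS; i++)
        assert(pthread_create(&t[i], NULL, worker, &fd) == 0);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);

    /* If the kernel's locking is correct, the file has exactly this size. */
    assert(lseek(fd, 0, SEEK_END) == (off_t)THREADS * ITERS);
    return close(fd) == 0 ? 0 : 1;
}

On an oversubscribed CI machine the threads get preempted at the worst possible moments, which is exactly what makes broken locking surface.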
Obviously, I'm talking about pure software bugs. When you have bugs in your drivers related to how you interact with real HW (which is much trickier to work with than emulated devices), well, there you have no choice other than running your OS on the real machine and (possibly) trying to reproduce the problem there, with tests.
OK, just for completeness, I'll add that, to my knowledge, big companies can do automated tests with hardware as well: they have special HW devices (assumed to be reliable) that simulate real input (PS/2, USB, Ethernet, PCIe, whatever), so they can treat a whole machine (HW + software) as a sort of "black box", exactly as we can test an OS using a VM. Each of those special devices looks like a regular device of type XYZ to the tested machine's kernel, but in reality its behavior is controlled from the outside. Not an expert here, I just know such things exist.