Advice for debugging a memory allocator

Scoopta · **Joined:** Tue Jun 13, 2017 3:17 am **Posts:** 23

A while ago I wrote 2 memory allocators for my kernel, one for virtual memory and one for physical memory. From the limited amount they're currently used they both seem to work properly however as a method of procrastinating the next part of the kernel I'm wanting to sort of stress test them to make sure they work properly and don't leak like a sieve. I've ported the virtual allocator to Linux userspace using mmap with the hope that it'd be easier to catch problems but I'm not actually sure how to test it. How do you find memory leaks or problems in a memory allocator? All my experience is around userland C with tools like libasan which I don't believe will work for this. Additionally how do I test my physical allocator or even port it to userspace?

devc1 · **Posted:** Mon Aug 07, 2023 6:11 am

If you want to find memory leaks, try to allocate as much memory as you have (for e.g. page per page), and set all the allocated memory to -1 then test the free function, use your memory map and free every address page per page, try to free it another time without allocating to check if the free function works properly.
Repeat it multiple times.
Check if the variables "UsedMemory" or "FreeMemory" in your allocator are properly set each time after you stress test.

If everything works fine, you should be able to do that and redo it multiple times without any crash or weird behaviour.

Also, if you want to benchmark your allocator, just use a timer like HPET and allocate for example a million pages, then read the time that has passed while allocating those pages, you should know how to convert it to allocations/s and display it. Same thing for the free function.

If your system supports multiprocessing, try creating different threads that execute the stress test, for example 3 threads per processor.

thewrongchristian · **Joined:** Tue Apr 03, 2018 2:44 am **Posts:** 403

Scoopta wrote:

I've ported the virtual allocator to Linux userspace using mmap with the hope that it'd be easier to catch problems but I'm not actually sure how to test it. How do you find memory leaks or problems in a memory allocator? All my experience is around userland C with tools like libasan which I don't believe will work for this.

Problems in the memory allocator itself should be relatively easy to test. Just allocate and deallocate memory, with perhaps instrumentation turned on in debug to verify the allocator meta-data.

I don't worry about memory leaks myself. I have a mark/sweep garbage collector handle freeing memory, but the same code can also be used to look for memory that is not referenced and should have been freed, but hasn't been.

Basically, when you want to find a leak, you:

- Start from known roots - global variables, thread stacks, thread CPU register state. For each datum that looks like a pointer to the heap, look up in the heap whether there is an allocation for that pointer.
- For any allocations found from the roots, scan those allocations looking for data that looks like a pointer in the heap, as you did for the roots.
- This will result in more allocations to scan. Keep doing that until you run out of allocations to scan (basically, until you have no more allocations you haven't already scanned)
- This will leave you with two types of allocations - Allocations that have been resolved from roots, possibly via other allocations. These allocations are still in use. And allocations that have not been resolved from roots. These are your leaked allocations.

In a mark-sweep garbage collection, these leaked allocations are the garbage, and my code sweeps those away and reuses them.

You can instead report them as leaks, though.

See Boehm GC for details.

My own allocator is here, which operates similarly to the Boehm GC, but is simpler (it's not entirely general purpose, it doesn't handle arbitrarily big allocations.)

I also came across a small GC library here, which is a lot simpler than mine, but has unbounded stack growth as the scanning is recursive (you can't really do that with an 8K kernel stack.) Something like this might be worth studying to form the basis of a leak detector.

Just like with GC, you won't want threads running, mutating your graph of references, so you'll want to "stop the world" when doing this leak detection. Perhaps it can be an idle time task, when other threads are not running anyway?

Refs:

https://hboehm.info/gc/leak.html
http://thewrongchristian.org.uk:8082/fi ... ibk/slab.c
https://github.com/mkirchner/gc

devc1 wrote:

use your memory map and free every address page per page, try to free it another time without allocating to check if the free function works properly.

Of course, the only reasonable thing for an allocator to do when faced with a double free is some sort of panic. While it is perhaps relatively easy to catch this case and preserve the meta-data in a correct form, this is a case that already spells disaster for the wider kernel.

If you've freed a block of memory twice, it's clear at least two pieces of code thinks it owns the region, and at least one has been using the region after it is invalid.

Such a situation should be fatal for a kernel, with as much information preserved in the panic report as possible (such as a stack trace where the second free has come from) so that the source of the error can be found.

It is not safe to continue in this circumstance, though (IMO.) The kernel is compromised, and these sorts of faults are often sources of exploits.

Scoopta wrote:

Additionally how do I test my physical allocator or even port it to userspace?

I wouldn't bother. Physical memory allocation is at a different level to virtual address space mapping, and there isn't really an equivalent in user space.

If you were really desperate to test it though, you could use a fixed size file to represent your physical memory, and use lots of small mmaps to map each page of that file to a user virtual address at the page level. You can then allocate "physical pages" from your file, and map them into your address space using mmap.

But your underlying kernel is unlikely to thank you, though I think most modern kernels use some sort of tree based virtual address space map these days, rather than a linear sorted list more common in older kernels.

linguofreak · **Joined:** Wed Mar 09, 2011 3:55 am **Posts:** 509

thewrongchristian wrote:

If you've freed a block of memory twice, it's clear at least two pieces of code thinks it owns the region, and at least one has been using the region after it is invalid.

Such a situation should be fatal for a kernel, with as much information preserved in the panic report as possible (such as a stack trace where the second free has come from) so that the source of the error can be found.

It is not safe to continue in this circumstance, though (IMO.) The kernel is compromised, and these sorts of faults are often sources of exploits.

I'm not sure that you can safely assume that free-after-free is a guaranteed indicator that use-after-free has occurred. For a small personal project where you know your own coding practices (assuming you adhere to them rigorously) it may be, but if you accept external contributions, or come back to code after not having touched it for a while, or aren't disciplined about adhering to your own rules, I can imagine scenarios where a free call simply ends up duplicated accidentally at two different points after the last use of the pointer in question, possibly even in the same function.

thewrongchristian · **Joined:** Tue Apr 03, 2018 2:44 am **Posts:** 403

linguofreak wrote:

thewrongchristian wrote:

If you've freed a block of memory twice, it's clear at least two pieces of code thinks it owns the region, and at least one has been using the region after it is invalid.

Such a situation should be fatal for a kernel, with as much information preserved in the panic report as possible (such as a stack trace where the second free has come from) so that the source of the error can be found.

It is not safe to continue in this circumstance, though (IMO.) The kernel is compromised, and these sorts of faults are often sources of exploits.

I'm not sure that you can safely assume that free-after-free is a guaranteed indicator that use-after-free has occurred. For a small personal project where you know your own coding practices (assuming you adhere to them rigorously) it may be, but if you accept external contributions, or come back to code after not having touched it for a while, or aren't disciplined about adhering to your own rules, I can imagine scenarios where a free call simply ends up duplicated accidentally at two different points after the last use of the pointer in question, possibly even in the same function.

It doesn't have to be a guaranteed indicator. Most buffer overflows also don't affect program control, but a buffer overflow is a desperately bad indicator. If nothing else, a panic will draw your attention to it quicker.

Still, it raises another point about use after frees. For use after frees to go unnoticed, the previous contents must appear valid, or sufficiently valid for the code to continue to appear to work.

So, some more debugging advice on allocators, when you free memory, spoil the contents of the allocation so it's unlikely to be valid any more. Then, if the contents are used by anything else, they're likely to crash when used with the spoiled contents. Spoiling it with values that, when interpreted as pointers, can be easily spotted and diagnosed by the kernel would be best.

Scoopta · **Joined:** Tue Jun 13, 2017 3:17 am **Posts:** 23

devc1 wrote:

If you want to find memory leaks, try to allocate as much memory as you have (for e.g. page per page), and set all the allocated memory to -1 then test the free function, use your memory map and free every address page per page, try to free it another time without allocating to check if the free function works properly.
Repeat it multiple times.
Check if the variables "UsedMemory" or "FreeMemory" in your allocator are properly set each time after you stress test.

If everything works fine, you should be able to do that and redo it multiple times without any crash or weird behaviour.

Hmmm, for some reason I figured it'd be pretty difficult to test this in kernel space but the way you described it seems pretty straight forward. Albeit I actually don't have a global count of utilized memory...but I probably should and then adding that makes it easier to test this

devc1 wrote:

Also, if you want to benchmark your allocator, just use a timer like HPET and allocate for example a million pages, then read the time that has passed while allocating those pages, you should know how to convert it to allocations/s and display it. Same thing for the free function.

If your system supports multiprocessing, try creating different threads that execute the stress test, for example 3 threads per processor.

I'm not that worried about bench-marking it and I don't have HPET support as the HPET isn't always present. I opted for calibrating the APIC timer using the ACPI timer instead...not entirely sure how accurate my timers are but I'm not that concerned about the performance of the allocator right now anyway. Too early in development for that to be a major concern, I just want to make sure it's working.

Scoopta · **Joined:** Tue Jun 13, 2017 3:17 am **Posts:** 23

thewrongchristian wrote:

Scoopta wrote:

I've ported the virtual allocator to Linux userspace using mmap with the hope that it'd be easier to catch problems but I'm not actually sure how to test it. How do you find memory leaks or problems in a memory allocator? All my experience is around userland C with tools like libasan which I don't believe will work for this.

Problems in the memory allocator itself should be relatively easy to test. Just allocate and deallocate memory, with perhaps instrumentation turned on in debug to verify the allocator meta-data.

...right...having easily accessible metadata isn't a thing I have right now...guess I should fix that first.

thewrongchristian wrote:

I don't worry about memory leaks myself. I have a mark/sweep garbage collector handle freeing memory, but the same code can also be used to look for memory that is not referenced and should have been freed, but hasn't been.

That's wayyy too fancy for my kernel right now. I'm just trying to get basic functionality working.

thewrongchristian wrote:

Basically, when you want to find a leak, you:

- Start from known roots - global variables, thread stacks, thread CPU register state. For each datum that looks like a pointer to the heap, look up in the heap whether there is an allocation for that pointer.
- For any allocations found from the roots, scan those allocations looking for data that looks like a pointer in the heap, as you did for the roots.
- This will result in more allocations to scan. Keep doing that until you run out of allocations to scan (basically, until you have no more allocations you haven't already scanned)
- This will leave you with two types of allocations - Allocations that have been resolved from roots, possibly via other allocations. These allocations are still in use. And allocations that have not been resolved from roots. These are your leaked allocations.

I need substantially better kernel debugging to have any hope of doing half of what you've mentioned here. That's why I ported the memory allocator to userspace. I have qemu running a gdb server so I'm not completely blind but it's not massively easy to investigate all that stuff...or I'm just very bad with gdb, probably the latter.

thewrongchristian wrote:

In a mark-sweep garbage collection, these leaked allocations are the garbage, and my code sweeps those away and reuses them.

You can instead report them as leaks, though.

See Boehm GC for details.

My own allocator is here, which operates similarly to the Boehm GC, but is simpler (it's not entirely general purpose, it doesn't handle arbitrarily big allocations.)

I also came across a small GC library here, which is a lot simpler than mine, but has unbounded stack growth as the scanning is recursive (you can't really do that with an 8K kernel stack.) Something like this might be worth studying to form the basis of a leak detector.

Just like with GC, you won't want threads running, mutating your graph of references, so you'll want to "stop the world" when doing this leak detection. Perhaps it can be an idle time task, when other threads are not running anyway?

Refs:

https://hboehm.info/gc/leak.html
http://thewrongchristian.org.uk:8082/fi ... ibk/slab.c
https://github.com/mkirchner/gc

Soooo GC would be really cool to have but I'm trying not to use outside code in my kernel, I plan on porting a good chunk of existing userspace stuff when I get there but I'm trying to keep the kernel to my own project as importing code that I don't really understand bothers me. Unless I'm working on something where the details don't matter to me, but writing an OS is about learning stuff so I'm not really interested in trying to port a GC to my kernel. I might try to learn how to write a GC at some point but that's not currently on my roadmap.

thewrongchristian wrote:

Scoopta wrote:

Additionally how do I test my physical allocator or even port it to userspace?

I wouldn't bother. Physical memory allocation is at a different level to virtual address space mapping, and there isn't really an equivalent in user space.

If you were really desperate to test it though, you could use a fixed size file to represent your physical memory, and use lots of small mmaps to map each page of that file to a user virtual address at the page level. You can then allocate "physical pages" from your file, and map them into your address space using mmap.

But your underlying kernel is unlikely to thank you, though I think most modern kernels use some sort of tree based virtual address space map these days, rather than a linear sorted list more common in older kernels.

Fair enough...I didn't think about the file idea although it's not too bad...although you're right that linux might not like me for it.

OSDev.org

Advice for debugging a memory allocator

Who is online