Hi,
Rusky wrote:
Brendan wrote:
Of course if you allow processes to use the area at 0x00000000, very few processes will use the area, so it's very unlikely that it'll make any difference to kernel bug/exploit detection.
Unmapping the page at 0x00000000 is a useful exploit mitigation regardless of whether you have any third party code in kernel space, and regardless of whether the code you're protecting runs in kernel or user mode. Really, the fact that kernel code uses null pointers
and shares its address space with user mode means whoever controls 0x00000000 also has a bit of extra control over the kernel.
For example, first-party kernel code that accidentally skipped a null check could wind up reading or writing what it thinks is its own data from user-space controlled memory. Obviously the fix is to stop the kernel from dereferencing potentially-null pointers in the first place, since null pointer dereferences can cause other vulnerabilities (e.g. when the user controls the offset and you're also missing a bounds check). But that doesn't make unmapping 0x00000000 useless: experience shows that there will be kernel bugs, no matter how competent and experienced the developers are.
Let's do a conservative estimate!
If there's a 0.1% chance that a process actually uses the area at 0x00000000 in the first place, multiplied by a 0.01% chance that the micro-kernel has a NULL pointer bug that isn't detected by other methods (e.g. unit tests, the fact that the data "disappears" when you switch virtual address space, the fact that something in user-space got its code or data trashed, etc.); then there'd be a 0.00001% chance that the NULL pointer bug won't be detected if it only happens once, a 0.00000001% chance that it won't be detected if it happens twice (both dereferences would have to land in processes that use the area), and an effectively zero chance that it won't be detected if it happens 1 million times (e.g. every time a new process is spawned or something).
More realistically, (for a micro-kernel) it's probably more likely that a shark will be struck by lightning while that shark is biting the computer.
On the other side, if you assume that there's a 0.1% chance that a process uses the area at 0x00000000, then you're also assuming that 1 process out of 1000 has a reason for wanting to use that area; and by not allowing those processes to use the area at 0x00000000 you'd ruin some kind of potential benefit (performance and/or complexity and/or other) for 1 process out of 1000.
Rusky wrote:
The true solution, aside from exploit mitigation, is an automated and systematic check. For example, the number of null pointer dereference vulnerabilities in safe Java programs is zero: all they can do is terminate the program in a controlled way. You don't have to do it at runtime, either: exclude null as a valid value from pointer types and the compiler will ensure that an attacker can't even cause a NullPointerException.
The true solution is to realise that "NULL" is only one of many possible "bad pointer" values, and NULL pointer checks are only a partial solution to the "bad pointer value" problem. The worst case is a pointer that points to the wrong piece of kernel data (e.g. a pointer that points to "process data for the old process" and not "process data for the new process"). It still points to kernel data and therefore can't be detected by any hardware-based trick (not-present pages, SMAP, etc.), and it could still point to the correct type of data and therefore might not be detectable by compile-time tricks (e.g. type checking) either.
If you solve the full "bad pointer value" problem (e.g. using unit tests, mathematical proofs, etc) you have no reason to care about a small subset of the problem (NULL pointers).
Cheers,
Brendan