How can I impliment Paging

Octocontrabass · Post by **Octocontrabass** » Mon Feb 26, 2024 11:00 am

Locks by themselves aren't enough to protect you from speculative TLB fills. Any CPU sharing the same set of page tables might mispredict an address and load garbage into its TLB.

Atomics can protect you from speculative TLB fills, but only if you use them to ensure no CPU ever sees an unwanted present entry. You'd still need to clear memory before using it as part of your page tables. The x86 memory model is pretty strict, so there's not much overhead to this strategy - it's mostly preventing compiler optimizations from reordering specific writes.
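A minimal sketch of the "clear first, then publish atomically" idea, assuming 32-bit non-PAE x86 paging and C11 atomics. The names `paging_table_t` and `install_page_table` are hypothetical and not from any particular kernel. On x86 the release store mostly acts as a compiler barrier, as described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PTE_PRESENT  0x1u
#define PTE_WRITABLE 0x2u
#define ENTRIES      1024u

/* One 32-bit x86 paging structure (page directory or page table). */
typedef struct {
    _Atomic uint32_t entry[ENTRIES];
} paging_table_t;

/* Hypothetical helper: link a new page table into a page directory.
 * new_table_phys is the 4 KiB-aligned physical address of new_table. */
static void install_page_table(paging_table_t *dir, unsigned slot,
                               paging_table_t *new_table,
                               uint32_t new_table_phys)
{
    assert(slot < ENTRIES);
    assert((new_table_phys & 0xFFFu) == 0);

    /* 1. Clear the table while nothing in the paging tree points at it,
     *    so no page walk can reach its stale contents. */
    for (unsigned i = 0; i < ENTRIES; i++)
        atomic_store_explicit(&new_table->entry[i], 0, memory_order_relaxed);

    /* 2. Publish it with a single release store. The zeroing above may
     *    not be reordered after this write, so any walk that sees the
     *    present directory entry also sees an all-zero table. */
    atomic_store_explicit(&dir->entry[slot],
                          new_table_phys | PTE_PRESENT | PTE_WRITABLE,
                          memory_order_release);
}
```

The directory entry goes from not present to fully formed in one store, so there is no intermediate state that another CPU could cache.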

Flushing the TLB is always an option. It's expensive, but maybe you'll decide that expense is acceptable for your OS.

rdos · Post by **rdos** » Tue Feb 27, 2024 1:55 am

Octocontrabass wrote:Locks by themselves aren't enough to protect you from speculative TLB fills. Any CPU sharing the same set of page tables might mispredict an address and load garbage into its TLB.

Well, in the kernel, the page directory level is typically protected by region type. For instance, the byte linear memory allocator and the page based linear memory allocator both have locks before an allocation can start. If a linear address is generated that doesn't have a page directory entry, this will cause a page fault, regardless if the address is generated speculatively or not. If this occurs outside of locks, then it is a serious error in kernel code that would be discovered before it gets a chance to corrupt the TLB. Also, kernel level page directory entries are never freed, so this cannot happen because an old address is used which once was related to a valid page directory entry.

Also, kernel memory access is done through selectors, and so addresses in the above two regions cannot be speculatively generated since only specific selectors map them.

For user space, you might have a point. I probably should look into this, in addition to fixing the problems with demand paging.

Octocontrabass · Post by **Octocontrabass** » Tue Feb 27, 2024 12:29 pm

rdos wrote:If a linear address is generated that doesn't have a page directory entry, this will cause a page fault, regardless if the address is generated speculatively or not.

If the speculative page table walk ends in a not-present entry, then there is no problem. Even if the CPU loads the not-present entry into its TLB, a not-present entry can't cause unexpected translations. (Though, if I recall correctly, Intel and AMD CPUs will never load a not-present entry into the TLB.)
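To make the "walk ends in a not-present entry" case concrete, here is a small software model of a 32-bit non-PAE page walk. It is only a sketch: physical memory is simulated as an array of 32-bit words, 4 MiB pages are ignored, and the function name `walk` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u

/* Model of a two-level walk. phys_mem[i] holds the 32-bit word at
 * physical address i * 4. Returns false if the walk stops at a
 * not-present entry, in which case no translation is produced. */
static bool walk(const uint32_t *phys_mem, uint32_t cr3,
                 uint32_t vaddr, uint32_t *paddr_out)
{
    assert(paddr_out != NULL);
    assert((cr3 & 0xFFFu) == 0);

    /* Level 1: page directory entry, indexed by bits 31..22. */
    uint32_t pde = phys_mem[(cr3 >> 2) + (vaddr >> 22)];
    if (!(pde & PTE_PRESENT))
        return false;

    /* Level 2: page table entry, indexed by bits 21..12. */
    uint32_t pte = phys_mem[((pde & ~0xFFFu) >> 2) + ((vaddr >> 12) & 0x3FFu)];
    if (!(pte & PTE_PRESENT))
        return false;

    *paddr_out = (pte & ~0xFFFu) | (vaddr & 0xFFFu);
    return true;
}
```

A walk that returns false leaves nothing that could translate an address later; the dangerous case is a walk that succeeds through entries the kernel never meant to be present.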

Speculative accesses don't cause page faults unless they are committed.

rdos wrote:Also, kernel memory access is done through selectors, and so addresses in the above two regions cannot be speculatively generated since only specific selectors map them.

Speculative execution may bypass privilege checks. You must always assume that any CPU may load any page table entry into its TLB at any time.

rdos · Post by **rdos** » Tue Feb 27, 2024 1:33 pm

Octocontrabass wrote:
rdos wrote:Also, kernel memory access is done through selectors, and so addresses in the above two regions cannot be speculatively generated since only specific selectors map them.
Speculative execution may bypass privilege checks. You must always assume that any CPU may load any page table entry into its TLB at any time.

My design is not sensitive to these attacks. The user mode code & data selectors both have a 3GB limit, and kernel data resides from 3GB to the top of memory. Thus, even if a successful attack on the paging system occurs, reading the data is stopped by limit checking at the segmentation level. Segment base & limits are always evaluated before paging, which is also why Intel claims that not using 4G selectors will incur penalties. However, this design makes sure that user mode cannot access kernel space even if it can bypass paging. It also makes sure that pointers passed from user mode in registers cannot access kernel data indirectly in kernel space.
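For reference, the architectural limit check being described can be modelled as below. This is a sketch for an expand-up data segment whose limit is the last valid offset; `segment_t` and `segment_to_linear` are hypothetical names. It models only what a committed access does and says nothing about what the CPU may do speculatively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified expand-up segment: limit is the last valid byte offset. */
typedef struct {
    uint32_t base;
    uint32_t limit;
} segment_t;

/* Returns true and the linear address if [offset, offset + size) lies
 * entirely within the segment, false if the access would raise a
 * limit violation. */
static bool segment_to_linear(const segment_t *seg, uint32_t offset,
                              uint32_t size, uint32_t *linear_out)
{
    assert(size > 0);

    /* Written to avoid overflow in offset + size - 1. */
    if (offset > seg->limit || size - 1 > seg->limit - offset)
        return false;

    *linear_out = seg->base + offset;
    return true;
}
```

With a base of 0 and a limit of `0xBFFFFFFF` (3 GB), any committed access that touches `0xC0000000` or above is rejected before paging is consulted.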

Octocontrabass · Post by **Octocontrabass** » Tue Feb 27, 2024 1:40 pm

rdos wrote:My design is not sensitive to these attacks. The user mode code & data selectors both have a 3GB limit, and thus do not allow access to kernel data, regardless if a successful attack on the paging system can occur or not.

It's not an attack on the paging system, it's an attack on the privilege system. It bypasses segment limits just as easily as it bypasses page permissions. You must always assume that any CPU may load any page table entry into its TLB at any time.

(And the fact that it's an attack is irrelevant: the point is that you can't use segment limits to stop a CPU from filling its TLB.)

rdos · Post by **rdos** » Tue Feb 27, 2024 1:46 pm

Octocontrabass wrote:
rdos wrote:My design is not sensitive to these attacks. The user mode code & data selectors both have a 3GB limit, and thus do not allow access to kernel data, regardless if a successful attack on the paging system can occur or not.
It's not an attack on the paging system, it's an attack on the privilege system. It bypasses segment limits just as easily as it bypasses page permissions. You must always assume that any CPU may load any page table entry into its TLB at any time.

(And the fact that it's an attack is irrelevant: the point is that you can't use segment limits to stop a CPU from filling its TLB.)

It is an attack on the paging system using the TLB. Segment checks always precede accessing linear memory, particularly the base must always be added before a linear address can be used. This is true regardless if the CPU is doing speculative execution or not. And even if the limit checking occurs in parallel, the second stage in the attack that reads the cached TLB cannot execute since the limit ALWAYS must be checked before the instruction is completed. It's only poor emulators that lack limit checking. There is no CPU that I know of right now that will not do limit checking in protected mode.

If we are not talking about an attack, rather user mode accidentally trying to access kernel mode and messing up the TLB, that's a serious error that should be caught during testing.

Octocontrabass · Post by **Octocontrabass** » Tue Feb 27, 2024 2:58 pm

rdos wrote:And even if the limit checking occurs in parallel, the second stage in the attack that reads the cached TLB cannot execute since the limit ALWAYS must be checked before the instruction is completed.

But, if the instruction never completes, the limit doesn't need to be checked. Since the limit doesn't need to be checked until the instruction completes, speculative accesses can bypass the limit check.

rdos wrote:If we are not talking about an attack, rather user mode accidentally trying to access kernel mode and messing up the TLB, that's a serious error that should be caught during testing.

How would you catch this during testing? User mode doesn't try to access kernel mode, it's only speculative execution. You might need extremely specific conditions for speculative execution to cause a TLB fill while another CPU is manipulating the page tables.

rdos · Post by **rdos** » Tue Feb 27, 2024 3:37 pm

Octocontrabass wrote:
rdos wrote:And even if the limit checking occurs in parallel, the second stage in the attack that reads the cached TLB cannot execute since the limit ALWAYS must be checked before the instruction is completed.
But, if the instruction never completes, the limit doesn't need to be checked. Since the limit doesn't need to be checked until the instruction completes, speculative accesses can bypass the limit check.

Speculative access might be able to access the page tables if limits are checked in parallel, but user mode can never read or write the actual data since reading something requires passing the limit check. This is not so in Windows or Linux since they have 4G selectors, and so can access the paging structure of kernel mode too, and only the page privilege checks would stop it from doing so. So, if usermode can fill the TLB with invalid permissions, it can read out the data with no problem. That is what the attack exploits.

You might also note that if long mode maps physical memory at a fixed position to allow for easy access, then it would be rather easy to attack this and gain access to all physical memory in user space. Which is something of a nightmare, and a pretty good reason why physical memory should NOT be mapped in linear address space. Because what is not mapped cannot easily be accessed, neither by an attacker, nor by buggy kernel code.

Octocontrabass wrote:
rdos wrote:If we are not talking about an attack, rather user mode accidentally trying to access kernel mode and messing up the TLB, that's a serious error that should be caught during testing.
How would you catch this during testing? User mode doesn't try to access kernel mode, it's only speculative execution. You might need extremely specific conditions for speculative execution to cause a TLB fill while another CPU is manipulating the page tables.

Possibly, but it is so highly unlikely that I don't think I need to consider it. There probably are many more likely problems to solve first.

Octocontrabass · Post by **Octocontrabass** » Tue Feb 27, 2024 5:16 pm

rdos wrote:Speculative access might be able to access the page tables if limits are checked in parallel, but user mode can never read or write the actual data since reading something requires passing the limit check.

Whether you can or can't use bad TLB entries to exploit the kernel is beside the point. Bad TLB entries will cause your kernel to misbehave or crash, so you need to ensure that you modify page tables in a way that prevents bad TLB entries from being created.

iProgramInCpp · Post by **iProgramInCpp** » Sat Mar 02, 2024 3:53 am

nullplan wrote:You end up adding uninitialized memory to the paging structures, which can lead to the processor adding all sorts of nonsense to the TLB that would need to be cleared out once the page is initialized.

That is incorrect, as long as you aren't doing it wrong. Namely, you'll want to zero out the specific page before you insert it into the page table tree. Then no nonsense would be inserted into the TLB.

Also the processor doesn't "insert" TLB entries until you actually access them. This is why you don't have to use "invlpg" when you map memory in, only unmap.

Recursive paging is a very fine method of handling it, and I think a very clean one. (I'm not yet using it because I'm having skill issues regarding implementation details, but I could implement it relatively easily if I wrote my MM from scratch)

iProgramInCpp · Post by **iProgramInCpp** » Sat Mar 02, 2024 3:54 am

Octocontrabass wrote:Locks by themselves aren't enough to protect you from speculative TLB fills. Any CPU sharing the same set of page tables might mispredict an address and load garbage into its TLB.

Garbage will not be loaded into the TLB unless you specifically insert garbage into your page table tree. Simply zero out the page before inserting it.

Octocontrabass · Post by **Octocontrabass** » Sat Mar 02, 2024 1:14 pm

iProgramInCpp wrote:That is incorrect, as long as you aren't doing it wrong. Namely, you'll want to zero out the specific page before you insert it into the page table tree. Then no nonsense would be inserted into the TLB.

That requires a temporary mapping, which is the sort of thing recursive mappings are supposed to help you avoid.

iProgramInCpp wrote:Also the processor doesn't "insert" TLB entries until you actually access them. This is why you don't have to use "invlpg" when you map memory in, only unmap.

The processor can insert TLB entries at any time, whether you access them or not. Intel and AMD processors either don't allow non-present entries in their TLBs or automatically flush and retry before raising a page fault, so there's no need to flush the TLB when a page changes from not present to present, but in general you must flush the TLB when a present page changes, even if you never accessed the page with the earlier mapping.
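The rule above can be written down as a small helper. This is a sketch that assumes the Intel/AMD behaviour just described (not-present entries are never served from the TLB); `set_pte` is a hypothetical name, and a real kernel would write the entry with a volatile or atomic store and then issue the actual invalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u

/* Writes a new value into a page table entry and reports whether the
 * caller must invalidate the TLB for the affected address. */
static bool set_pte(uint32_t *pte, uint32_t new_value)
{
    assert(pte != NULL);

    uint32_t old = *pte;
    *pte = new_value;

    /* not present -> anything: nothing stale can be cached.
     * present -> different value: a stale translation may exist on any
     * CPU, whether or not software ever touched the page. */
    return (old & PTE_PRESENT) && old != new_value;
}
```

Note that the decision depends only on the old entry being present, not on whether the page was ever accessed.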

iProgramInCpp · Post by **iProgramInCpp** » Sat Mar 02, 2024 11:17 pm

Octocontrabass wrote: That requires a temporary mapping, which is the sort of thing recursive mappings are supposed to help you avoid.

Yep, it does require a temporary mapping. No, this is not a thing that recursive mapping is supposed to help you avoid. Recursive mappings are supposed to help you avoid loading many page table levels into the TLB by faulting on the lowest level first, which recursively faults into higher levels, but only if necessary.
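For readers unfamiliar with the mechanics, this is how addresses are usually computed under a recursive mapping on 32-bit non-PAE x86. It is a sketch that assumes one page directory slot points back at the page directory itself; the function names are hypothetical, and slot 1023 is just the common choice.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES 1024u

/* Virtual address at which the PTE for vaddr is visible when page
 * directory entry `slot` maps the page directory itself. */
static uint32_t recursive_pte_vaddr(unsigned slot, uint32_t vaddr)
{
    assert(slot < ENTRIES);
    /* The recursive slot exposes all page tables as one 4 MiB array
     * of 4-byte entries, indexed by virtual page number. */
    return ((uint32_t)slot << 22) + (vaddr >> 12) * 4u;
}

/* Virtual address at which the PDE for vaddr is visible. */
static uint32_t recursive_pde_vaddr(unsigned slot, uint32_t vaddr)
{
    assert(slot < ENTRIES);
    /* Going through the recursive slot twice lands on the page
     * directory itself, indexed by bits 31..22 of vaddr. */
    return ((uint32_t)slot << 22) + ((uint32_t)slot << 12)
         + (vaddr >> 22) * 4u;
}
```

With slot 1023 the page tables appear at `0xFFC00000` and the page directory at `0xFFFFF000`.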

You can zero out a page without inserting it in the space in the page table tree where it actually belongs in one of the following ways (or maybe some other way):
- Reserve one page map slot in kernel space that you can quickly map something to, to zero out a page when faulting and putting the page in. (Ideally per core so you don't have to issue TLB shootdowns to other processors, if on SMP)
- Use an identity/offset mapping to access the physical memory region directly (no reason you shan't do this if you have plenty of address space like on x86_64)
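The second option can be sketched as follows, assuming the kernel maps all physical memory at a fixed offset. The names `direct_map_base`, `phys_to_virt` and `zero_frame` are hypothetical; in this sketch the base is a plain pointer so the code can also run against a simulated RAM buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical: virtual address where physical address 0 is mapped.
 * A real kernel would set this once during early boot. */
static uint8_t *direct_map_base;

/* Translate a physical address to its alias in the offset mapping. */
static void *phys_to_virt(uintptr_t phys)
{
    return direct_map_base + phys;
}

/* Zero a physical frame through the offset mapping, before it is
 * linked into any paging structure. */
static void zero_frame(uintptr_t frame_phys)
{
    assert(direct_map_base != NULL);
    assert((frame_phys & (PAGE_SIZE - 1)) == 0);
    memset(phys_to_virt(frame_phys), 0, PAGE_SIZE);
}
```

Because the frame is reached through a mapping that already exists, no page table entry has to be changed and no TLB invalidation is needed just to clear it.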

This is, however, NOT actually necessary when you map in the final level of the page table tree, because the contents of the page before you zero it aren't potentially being read by a processor to fill in the TLB.

Octocontrabass wrote: you must flush the TLB when a present page changes, even if you never accessed the page with the earlier mapping.

Correct. Notice how I said "map memory in". Changing what memory is mapped in implies unmapping the previous page.

thewrongchristian · Post by **thewrongchristian** » Sun Mar 03, 2024 10:12 am

iProgramInCpp wrote: This is, however, NOT actually necessary when you map in the final level of the page table tree, because the contents of the page before you zero it aren't potentially being read by a processor to fill in the TLB.

They can potentially be read not just by the current processor, but also by any processor in the system.

Loading in uninitialised memory into the paging structure is very dangerous, as another thread on another processor can trigger a read of that page table even from user space, without intervention from the kernel. If the page table has a "mapping" for the address, because the uninitialised data happens to have the VALID bit set, then that mapping may be to a valid page that can be read by the referencing thread.
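A quick way to see the risk: count how many entries a page walker would treat as valid if a frame with leftover contents were linked in as a page table. This is only an illustrative sketch for 32-bit x86 entries, and `count_present_entries` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u
#define ENTRIES     1024u

/* Counts the entries in a 4 KiB table whose present bit is set, i.e.
 * the entries a page walk would follow. */
static unsigned count_present_entries(const uint32_t *table)
{
    assert(table != NULL);

    unsigned count = 0;
    for (unsigned i = 0; i < ENTRIES; i++) {
        if (table[i] & PTE_PRESENT)
            count++;
    }
    return count;
}
```

For a zeroed frame the count is 0. For a frame holding arbitrary old data, any word with bit 0 set counts as a mapping to whatever physical address its upper bits happen to encode.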

Would it be an easy exploit? Probably not. But it is a possible exploit that must be closed.

You must initialise the memory before you put it in the paging structure; not doing so is a mistake.

iProgramInCpp · Post by **iProgramInCpp** » Sun Mar 03, 2024 11:06 am

thewrongchristian wrote:Loading in uninitialised memory into the paging structure is very dangerous

I said specifically the last level. The memory associated with the last level (whose entry belongs in a PT on x86 32-bit) isn't associated with the paging structure, rather being actual usable memory. It doesn't matter if the TLB reads the entry associated with that page before it's mapped. In fact you don't strictly have to zero that page out at all! Sure, you will be able to read what was in that page, but you're the kernel, and when dishing out pages to user space you of course would zero them out anyway.

Of course the other levels need to be zeroed out before being inserted. I was saying that the whole time!

OSDev.org
