OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 4:19 am 

Joined: Wed Aug 10, 2016 3:07 am
Posts: 31
Hello everyone,
I have a very simple question: how do I fix the Meltdown vulnerability in my OS? I have read that the kernel code (but above all the data in kernel space) should be separated from the user code, but how?
Thank you for your answer.


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 6:06 am 

Joined: Mon Mar 27, 2006 12:00 am
Posts: 59
Location: UK
It depends on whether you have any sensitive data in your kernel - that is, data you don't want userspace code to be able to read (like keys, etc). Since Meltdown only allows reading kernel memory, you only need protection if the kernel contains sensitive data.

If you're writing a monolithic kernel then you probably need it. If you're writing a microkernel then maybe not.

It also depends on the platform you're developing for. If your OS is just running on Raspberry Pis or something similar, those processors are not vulnerable and so you don't need any mitigation.

If you do need to protect against Meltdown, then the first thing you need is some way of detecting whether you're on a vulnerable CPU (since you don't want to cause a massive slowdown if you don't have to). Second, you'll want to implement the feature called "kernel page table isolation". I'm not familiar with the specifics of how this is done (maybe someone else here has looked into it?) but I'd suggest looking at the Linux kernel source for an example implementation.

Finally, I should point out that you can't "fix" Meltdown since it's a bug in Intel processors. Technically you can only "mitigate" it :)
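
The "detect whether you're on a vulnerable CPU" step mentioned above could look roughly like the sketch below. It assumes the boot code has already gathered the CPUID vendor string and, where CPUID.(EAX=7,ECX=0):EDX bit 29 says it exists, the IA32_ARCH_CAPABILITIES MSR (0x10A), whose bit 0 (RDCL_NO) means "not susceptible to Meltdown". The struct and function names are made up for illustration, and treating every AMD CPU as unaffected is an assumption based on what is said later in this thread.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit 0 of the IA32_ARCH_CAPABILITIES MSR (0x10A): RDCL_NO.
 * When set, the CPU reports that it is not susceptible to Meltdown. */
#define ARCH_CAP_RDCL_NO (1ull << 0)

/* Hypothetical summary of what the boot code read via CPUID/RDMSR. */
struct cpu_info {
    char     vendor[13];     /* CPUID leaf 0 vendor string, NUL-terminated */
    bool     has_arch_caps;  /* CPUID.(EAX=7,ECX=0):EDX bit 29 */
    uint64_t arch_caps;      /* value of MSR 0x10A, valid if has_arch_caps */
};

/* Conservative policy: assume vulnerable unless there is a reason not to. */
static bool cpu_needs_kpti(const struct cpu_info *cpu)
{
    if (strcmp(cpu->vendor, "AuthenticAMD") == 0)
        return false;  /* assumption: AMD CPUs are not affected */
    if (cpu->has_arch_caps && (cpu->arch_caps & ARCH_CAP_RDCL_NO))
        return false;  /* the CPU itself reports it is not affected */
    return true;
}
```

Defaulting to "needs KPTI" means an unknown CPU pays the slowdown rather than silently running unprotected.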


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 6:12 am 

Joined: Thu Jul 05, 2007 8:58 am
Posts: 223
The basic idea of kernel page table isolation is that different page tables are used in kernel mode (ring 0) versus user mode (ring 3) of the processor. This requires some small pieces of kernel code to still be present in the user space page tables, which are then responsible for doing the page table swap.

In essence, this piece of code is (one of) the first things to run after a mode switch, and would be responsible for switching in the kernel page tables when going from ring 3 to ring 0, and switching to the userspace one when going from ring 0 to ring 3.
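
A minimal sketch of that swap, assuming a 32-bit kernel that keeps two page-directory roots per process. The struct, the function names and the `fake_cr3` variable are hypothetical; `fake_cr3` stands in for the real CR3 register so the logic can be compiled and tried outside a kernel.

```c
#include <stdint.h>

/* Hypothetical per-process paging state: two page-directory roots. */
struct address_space {
    uint32_t kernel_cr3;  /* full view: kernel + user mappings */
    uint32_t user_cr3;    /* user view: user mappings + entry/exit stubs only */
};

/* Stand-in for the CR3 register; a real kernel would write CR3 with
 * a `mov` to %cr3 in inline assembly instead. */
static uint32_t fake_cr3;

static void load_cr3(uint32_t root) { fake_cr3 = root; }

/* First thing the stub does after a ring 3 -> ring 0 transition. */
static void kernel_entry(const struct address_space *as)
{
    load_cr3(as->kernel_cr3);
}

/* Last thing the stub does before returning to ring 3. */
static void kernel_exit(const struct address_space *as)
{
    load_cr3(as->user_cr3);
}
```

The stubs themselves, their stack, and the descriptor tables the CPU needs during the transition have to be mapped in both views, since they run before the swap on entry and after it on exit.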


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 7:01 am 

Joined: Tue May 13, 2014 3:02 am
Posts: 280
Location: Private, UK
One thing you can do is always immediately terminate a program if it causes a protection violation page fault.

Meltdown relies on the OS giving applications a way to handle/ignore such faults so they can examine the state of the CPU cache after such a fault occurs. If the OS doesn't provide any way of doing that, there's no way to exploit the issue.


Also, one small advantage for those of us whose kernels are only 32-bit: hardware task switching can be used to improve the efficiency of KPTI...

Can anyone more familiar with x86 CPUs comment on whether simply performing a "WBINVD" in response to a privilege violation page fault would provide significant mitigation? The only issue I can see there is that in a multi-core processor a thread running on another core may get the opportunity to inspect the CPU cache before the instruction runs (I suppose that's the same with the "immediately terminate" mitigation, but at least then the process's entire exploit has a very short time window, rather than just each read attempt)... Privilege violation page faults shouldn't be common enough for it to cause a significant performance issue.

EDIT: Turns out that improving the handling of protection violation page faults is not enough, see below...



Last edited by mallard on Tue Jan 09, 2018 7:51 am, edited 3 times in total.

 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 7:07 am 

Joined: Wed Aug 10, 2016 3:07 am
Posts: 31
So the idea would be to map the kernel space when a system call is made and unmap it once the system call has finished.
Does marking the kernel pages as non-present protect against the flaw?
My operating system is designed to run on x86 and is a monolithic kernel.
On a page fault, my OS kills the whole process if it accessed memory it isn't authorized to touch (outside its address space). Does that mean my OS is not affected, even though the problem is in the hardware?


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 7:16 am 

Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7612
Location: Germany
mallard wrote:
One thing you can do is always terminate a program if it causes a protection violation page fault.

Meltdown relies on the OS giving applications a way to handle/ignore such faults...


That is not correct.

Meltdown works because the CPU speculatively prefetches memory contents into the cache. That fetch may be based on information that should trigger a page fault if that execution branch were actually executed -- but it isn't. The attack is done by checking what's in cache and what's not.

The following is a visualization of the process, not a demonstration of how it's actually done:

Code:
unsigned kindex = 0xff80000; // kernel memory address
char * dummy = 0;

if ( /* some false condition */ )
{
    int v1 = dummy[ kindex ]; // WOULD trigger page fault
    unsigned uindex = ( v1 & 1 ) * 0x100; // 0x0 or 0x100
    int v2 = array[ uindex ]; // some array we have
}

// check whether array[ 0 ] or array[ 0x100 ] is in cache


The "if" part is never actually executed, so no page fault is ever triggered -- but we have just determined the lowest bit of the byte stored at 0xff80000.

This is a CPU bug, not something an OS "allows" for. The solution is to not have any critical memory mapped in ring 3 page tables (as davidv1992 described).

_________________
Every good solution is obvious once you've found it.


Last edited by Solar on Tue Jan 09, 2018 7:25 am, edited 1 time in total.

 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 7:24 am 

Joined: Tue May 13, 2014 3:02 am
Posts: 280
Location: Private, UK
Solar wrote:
mallard wrote:
One thing you can do is always terminate a program if it causes a protection violation page fault.

Meltdown relies on the OS giving applications a way to handle/ignore such faults...


That is not correct.

Meltdown works because the CPU speculatively prefetches memory contents into the cache. That fetch may be based on information that should trigger a page fault if that execution branch were actually executed.

The following is a visualization of the process, not a demonstration of how it's actually done:

Code:
unsigned kindex = 0xff80000; // kernel memory address
char * dummy = 0;

if ( /* some false condition */ )
{
    int v1 = dummy[ kindex ]; // WOULD trigger page fault
    unsigned uindex = ( v1 & 1 ) * 0x100; // 0x0 or 0x100
    int v2 = array[ uindex ]; // some array we have
}

// check whether array[ 0 ] or array[ 0x100 ] is in cache


The "if" part is never actually executed, so no page fault is ever triggered -- but we have just determined the lowest bit of the byte stored at 0xff80000.


Thanks, that's a good example of why just improving the handling of privilege violation page faults is not enough. I thought there had to be more to it, since that approach isn't mentioned anywhere. Researching concrete information about this (and Spectre, which is much harder to mitigate) is difficult when 99.99% of articles are horribly dumbed-down summaries for the masses.

Also, by "never actually executed", I think you mean "never logically executed". The whole issue is that code that's not logically executed is actually executed thanks to speculative execution.

Still, implementing KPTI via hardware task switching is (as far as I can tell) a valid mitigation for 32-bit systems, where that's still possible (now if it emerges that AMD were aware of the issue back when they were designing x86_64, we'd have a nice little conspiracy theory there).

Solar wrote:
This is a CPU bug, not something an OS "allows" for.

No need to get patronising...
Solar wrote:
The solution is to not have any critical memory mapped in ring 3 page tables (as davidv1992 described)

Yes, KPTI...



 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 10:31 am 

Joined: Thu May 17, 2007 1:27 pm
Posts: 999
Why is KPTI + hardware switching faster than KPTI + software switching? Are there any benchmarks on that?

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 10:54 am 

Joined: Tue May 13, 2014 3:02 am
Posts: 280
Location: Private, UK
Korona wrote:
Why is KPTI + hardware switching faster than KPTI + software switching? Are there any benchmarks on that?


It's not "KPTI + hardware switching", it's "KPTI implemented using hardware task switching". I'm not talking about using hardware task switching for ordinary switching between processes. That's known to be slow. I'm talking about using it to switch from userspace to kernelspace (and vice versa), keeping their page tables separate.

Currently, conventional OSs have one task (TSS) per core and use "interrupt gates" in the IDT to handle interrupts. Implementing KPTI "simply" means having two tasks per core (one for userspace and another for kernelspace) and using "task gates" to handle interrupts.

The "software" method requires a two-stage context switch (userspace to constantly-mapped mini-kernel, then mini-kernel to full kernel); the "hardware task switching" method reduces this to one stage and eliminates the need for the "mini-kernel". While I've not tested it (yet), it's hard to believe that's going to be any slower.

I doubt any "major" OS will choose to do it this way, since it's only possible on a 32-bit OS (we can speculate that had Meltdown been discovered earlier and hardware task switching become the standard way of dealing with it, it might have been preserved in long mode, although CPUs would likely have been fixed instead, rendering it unnecessary), but it's well within the "crazy ideas for a hobby OS" sphere.
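
For anyone who wants to try the task-gate idea, the IDT entry itself is simple to build. The sketch below encodes a 32-bit protected-mode task-gate descriptor. The function name and the selector value 0x28 in the usage note are hypothetical, and setting up the second TSS (with its own CR3 field) is left out.

```c
#include <stdint.h>

/* Build a 32-bit protected-mode IDT task-gate descriptor.
 * Layout: bits 16-31 = TSS selector, bits 40-44 = type 0b00101,
 * bits 45-46 = DPL, bit 47 = present. All other bits are reserved (0). */
static uint64_t make_task_gate(uint16_t tss_selector, unsigned dpl)
{
    uint64_t d = 0;
    d |= (uint64_t)tss_selector << 16;  /* which TSS to switch to */
    d |= (uint64_t)0x5 << 40;           /* type: task gate */
    d |= (uint64_t)(dpl & 3) << 45;     /* privilege needed for software INT */
    d |= (uint64_t)1 << 47;             /* present */
    return d;
}
```

With a kernel TSS at a GDT selector such as 0x28, `make_task_gate(0x28, 0)` gives the 8-byte value to store in the IDT slot. On delivery the CPU loads CR3 from that TSS as part of the task switch, which is what would replace the software CR3 write.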



 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 10:59 am 

Joined: Thu May 17, 2007 1:27 pm
Posts: 999
Well, it's not that there is a separate "mini-kernel"; the IRQ stubs do the same work that they normally do (e.g. push registers, swap segment registers etc.) and just have to do an additional CR3 switch, so there is really no wasted work here. It is hard to believe that a CR3 switch by hardware tasking is so much faster than a normal CR3 switch that the additional costs of hardware tasking are offset by that. On the contrary, I would expect the hardware tasking (including CR3 switch) to be heavily microcoded and to be much slower than the manual CR3 switch.



 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 1:22 pm 

Joined: Fri Feb 17, 2017 4:01 pm
Posts: 640
Location: Ukraine, Bachmut
Quote:
Hello everyone,
I have a very simple question: how do I fix the Meltdown vulnerability in my OS? I have read that the kernel code (but above all the data in kernel space) should be separated from the user code, but how?
Thank you for your answer.

"Should" is a strong word. It might, if you want to turn your hobby OS into a dumb slowpoke just to reassure your non-existent "enterprise" users that their private keys are OK. :lol:

Seriously, I feel this hype will do more harm through "mitigations" in the form of ugly compiler patches and other crutches in the code than any real security "dangers". Not a problem for hobby OSes, forget it! Or jump into programming non-OoO Cortex-A53s and the like. Like me! xD

_________________
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to lost of all the target park boards (russians destroyed our town).


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 1:27 pm 

Joined: Thu Mar 27, 2014 3:57 am
Posts: 568
Location: Moscow, Russia
In addition to what has been said here also see this (it's about improving KPTI performance with PCID): https://groups.google.com/forum/m/#!top ... 9mHTbeQLNU
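
As a rough illustration of why PCID helps: with CR4.PCIDE set (available only in long mode, so not on a 32-bit kernel), bits 0-11 of CR3 carry a process-context ID, and setting bit 63 of the value written to CR3 asks the CPU not to flush TLB entries tagged with that ID. Giving the user and kernel page tables different IDs lets the KPTI switch keep both sets of TLB entries. The helper and macro names below are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_PCID_MASK 0xFFFull      /* bits 0-11: PCID when CR4.PCIDE = 1 */
#define CR3_NOFLUSH   (1ull << 63)  /* bit 63: keep TLB entries for this PCID */

/* Compose the value to write to CR3 for a given top-level table and PCID. */
static uint64_t make_cr3(uint64_t pml4_phys, uint16_t pcid, bool noflush)
{
    uint64_t v = (pml4_phys & ~CR3_PCID_MASK) | (pcid & CR3_PCID_MASK);
    if (noflush)
        v |= CR3_NOFLUSH;
    return v;
}
```

The entry stub would then write something like `make_cr3(kernel_root, 1, true)` and the exit stub `make_cr3(user_root, 2, true)`, flushing only when an ID is reused for a different address space.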

_________________
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
- Alan Kay


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 1:43 pm 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

mallard wrote:
Korona wrote:
Why is KPTI + hardware switching faster than KPTI + software switching? Are there any benchmarks on that?


It's not "KPTI + hardware switching", it's "KPTI implemented using hardware task switching". I'm not talking about using hardware task switching for ordinary switching between processes. That's known to be slow. I'm talking about using it to switch from userspace to kernelspace (and vice versa), keeping their page tables separate.


Hardware task switching is known to be slow regardless of what you use it for - almost everything that can be done with a single micro-code instruction can be done faster with multiple simpler instructions that don't use micro-code; and there are things that hardware task switching does (e.g. managing the "busy" bit for both TSSs) that could be skipped to make doing it in software even faster.

Note that for kernel system calls (but not IRQs) there's a chance the CPU supports SYSENTER or SYSCALL (that avoid GDT accesses and protection checks for CS and SS segment loads) and therefore there's a chance that hardware task switching would be even worse in comparison.

For interrupts (IRQs, exceptions, etc); if the interrupt occurs at CPL=3 you'd have to switch CR3 but if the interrupt occurs at CPL=0 you don't need to change CR3. In this case, you'd probably want to duplicate all of the interrupt handlers to avoid using hardware task switching for the "interrupt occurs at CPL=0" case - e.g. one IDT where all the IDT entries use task gates and a second IDT where all the IDT entries use interrupt/trap gates, with separate "interrupt handling stubs" for each case; where the kernel does an "LIDT" to change which IDT the CPU is using immediately after any switch from CPL=3 to CPL=0 and immediately before any switch from CPL=0 to CPL=3.
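
With software switching (ordinary interrupt/trap gates rather than task gates), the same "only switch CR3 when the interrupt came from CPL=3" rule can be checked in the stub by looking at the CS selector the CPU pushed on entry. This is a sketch with hypothetical function names, not the two-IDT scheme described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* The low two bits of the CS selector pushed on interrupt entry hold
 * the privilege level of the code that was interrupted. */
static bool interrupted_user_mode(uint16_t saved_cs)
{
    return (saved_cs & 3) == 3;
}

/* Returns the CR3 value the handler should run with: switch to the
 * kernel tables only if the interrupt arrived while in ring 3. */
static uint32_t cr3_for_handler(uint16_t saved_cs, uint32_t current_cr3,
                                uint32_t kernel_cr3)
{
    return interrupted_user_mode(saved_cs) ? kernel_cr3 : current_cr3;
}
```

The return path needs the mirror-image check, so that CR3 is switched back to the user tables only when the saved CS shows a return to ring 3.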

Also don't forget that (as far as I know) "meltdown" doesn't affect AMD CPUs and doesn't affect very old CPUs (the original Pentium and earlier, and all Cyrix, Transmeta, NSC, SiS and IBM CPUs, and at least most of VIA's CPUs); and (because they didn't do "out-of-order") I suspect it doesn't affect the earliest Atom CPUs or the earliest Xeon Phi; and won't affect future Intel CPUs. I'd also assume that all good operating systems will eventually end up with logic to disable the meltdown mitigations for "trusted processes" running on CPUs that are affected. What this means is that you will want some sort of "if(CPU is affected and the process isn't trusted) { enable meltdown mitigation } else { don't enable meltdown mitigation }" logic; and you will want to minimise the differences between "mitigations enabled" and "mitigations disabled" throughout the kernel to minimise the impact on code maintenance.
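
One way to keep the "enabled" and "disabled" paths nearly identical, sketched under the assumption that each process stores two CR3 values: when the mitigation is off, both fields simply hold the same root, so the entry/exit code stays the same and only the setup differs. The struct, the `trusted` flag and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct process {
    bool     trusted;     /* hypothetical flag set by the OS's own policy */
    uint32_t kernel_cr3;  /* full page tables */
    uint32_t user_cr3;    /* what the exit stub loads before ring 3 */
};

/* Decide once, at address-space setup, whether this process gets the
 * stripped-down user tables or just shares the full ones. */
static void setup_address_space(struct process *p, bool cpu_affected,
                                uint32_t full_root, uint32_t stripped_root)
{
    p->kernel_cr3 = full_root;
    p->user_cr3 = (cpu_affected && !p->trusted) ? stripped_root : full_root;
}

/* The stubs can skip the CR3 write when the two roots are identical. */
static bool mitigation_enabled(const struct process *p)
{
    return p->user_cr3 != p->kernel_cr3;
}
```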


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 5:23 pm 

Joined: Wed Oct 27, 2010 4:53 pm
Posts: 1150
Location: Scotland
How often does an OS actually need to access data that needs to be kept hidden? Wouldn't it be possible for the kernel itself to run under two different sets of page tables, with one of them shutting out any memory that you don't want apps to be able to access through these vulnerabilities? That way, if an interrupt occurs at CPL=3 you wouldn't need a CR3 switch unless the kernel actually needs to access private data, and I suspect that in most cases it doesn't.

_________________
Help the people of Laos by liking - https://www.facebook.com/TheSBInitiative/?ref=py_c

MSB-OS: http://www.magicschoolbook.com/computing/os-project - direct machine code programming


 
 Post subject: Re: How to fix Meltdown on my OS ?
PostPosted: Tue Jan 09, 2018 10:08 pm 

Joined: Wed Dec 01, 2010 3:41 am
Posts: 1761
Location: Hong Kong
The problem is you never know what information is classified, and given the urgency, the best fix is to just unmap the whole kernel.

And I see your point; yes, in future the kernel could remap itself lazily.

