OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: That pesky FPU again
PostPosted: Mon Dec 30, 2019 5:05 am 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1604
Hi all,

I was just designing the task-switching for my OS, and as part of that, there is the question of FPU handling. That is, the handling of any register set besides the general purpose ones (for simplicity referred to as FPU from here on). First, there is the question of laziness or eagerness. In an age of increasing logical CPU counts, lazy FPU save seems like a bad idea, since a sleeping task with an unsaved FPU cannot be transferred to another CPU. So lazy FPU save would make the scheduler more complicated, and it already is plenty complicated.

The other thing is FPU restoration. I know that Linux uses lazy FPU restore, and this seems prudent, since apart from a slight simplification of the task-switching code, eager FPU restore doesn't really help anything, and lazy FPU restore makes for faster code (nothing is faster than the line not executed, right?). So I would argue for eager FPU save and lazy FPU restore. Any thoughts on this?

And finally, there is the matter of x86's implementation of this stuff. It is horrible. There are at least three implementations, FSAVE, FXSAVE and XSAVE, and the last one is also extensible. My kernel is AMD64-only, so FSAVE can be ignored. My idea at the moment is to determine the size of the XSAVE area at boot time, and allocate one XSAVE area behind each task descriptor (that is, when allocated, the task descriptor will be sandwiched between the kernel stack and the XSAVE area). The problem is that this burdens every process with the memory necessary to save FPU state, even though it may not need it, or at least not all of it. And thanks to AVX-512, the register file is getting to be quite big.
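For reference, the boot-time size query could look roughly like the sketch below. It assumes GCC or Clang and their `<cpuid.h>` helpers; the function name `xsave_max_area_size` is made up for illustration. CPUID leaf 0DH, sub-leaf 0 reports in ECX the area size needed for every state component the CPU supports, and in EBX the size needed for the components currently enabled in XCR0.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: returns the largest XSAVE area this CPU can
 * require, in bytes, or 0 if XSAVE is not available (or not x86). */
uint32_t xsave_max_area_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.1:ECX bit 26 says whether the XSAVE feature set exists. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 26)))
        return 0;

    /* CPUID.(EAX=0DH, ECX=0): ECX = size for all supported components,
     * EBX = size for the components currently enabled in XCR0. */
    if (!__get_cpuid_count(0x0D, 0, &eax, &ebx, &ecx, &edx))
        return 0;

    /* Any XSAVE area holds at least the 512-byte legacy region
     * plus the 64-byte XSAVE header. */
    assert(ecx >= 576);
    return ecx;
#else
    return 0;
#endif
}
```

A kernel that enables only some components would more likely read EBX after writing XCR0, since that value shrinks with the enabled set.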

However, I cannot allocate memory in the FPU fault handler, because that might fail. And if it does I have to kill the current process. Which is bad because I have no interface to tell the user that the process died for running out of memory. If allocation fails during task creation, I do. So I'm stuck between a rock (allocating too much memory, but always being able to tell the user what is going on) and a hard place (allocating too little memory, and if push comes to shove, suddenly killing processes).
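To make the "allocate at task creation" side of that trade-off concrete, here is a minimal user-space sketch of the sandwich layout (kernel stack, then task descriptor, then XSAVE area) in a single allocation. Everything here is an assumption for illustration: `struct task`, `task_create`, `task_destroy`, the 16 KiB stack size, and the use of `aligned_alloc` in place of a kernel allocator. The only hard requirement taken from the hardware is that the XSAVE area be 64-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KSTACK_SIZE 16384u   /* hypothetical kernel stack size */
#define XSAVE_ALIGN 64u      /* XSAVE areas must be 64-byte aligned */

struct task {
    int id;
    void *xsave_area;
    size_t xsave_size;
};

static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* One block per task: [kernel stack][descriptor][XSAVE area].
 * Returns NULL on failure, so the creator can report the error
 * instead of a fault handler having to kill a running process. */
struct task *task_create(int id, size_t xsave_size)
{
    assert(xsave_size > 0);
    size_t desc_off = KSTACK_SIZE;
    size_t xsave_off = round_up(desc_off + sizeof(struct task), XSAVE_ALIGN);
    size_t total = round_up(xsave_off + xsave_size, XSAVE_ALIGN);

    unsigned char *block = aligned_alloc(XSAVE_ALIGN, total);
    if (block == NULL)
        return NULL;

    struct task *t = (struct task *)(block + desc_off);
    t->id = id;
    t->xsave_area = block + xsave_off;
    t->xsave_size = xsave_size;
    /* Start from a zeroed area; building a valid initial FPU image
     * is left out of this sketch. */
    memset(t->xsave_area, 0, xsave_size);
    return t;
}

void task_destroy(struct task *t)
{
    /* The descriptor sits KSTACK_SIZE bytes into the block. */
    free((unsigned char *)t - KSTACK_SIZE);
}
```

With this shape, running out of memory shows up as a failed task creation, which is the case where an error can actually be reported to the caller.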

More interesting still would be the question how to deal with all the different components of the XSAVE area. Only a handful of processes is ever going to use AVX-512, so why save and restore those registers all the time? My current design is to enable all the bits I know of in XCR0 at boot time and then just save and restore everything, but another possibility would be to change XCR0 on task switch, and only enable those parts the process is actually using. But that is only saving on run time, and as the Intel SDM explains, that is not going to be a lot, since the XSAVE and XRSTOR instructions already optimize for those parts of the state that are unchanged.
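The "enable all the bits I know of" step can be sketched as a pure function over the supported-component mask (which a kernel would read from CPUID leaf 0DH, sub-leaf 0, EDX:EAX). The function name and the choice of which components count as "known" are assumptions for this example. The dependency rules it encodes are the architectural ones: bit 0 (x87) must always be set, AVX requires SSE, and the three AVX-512 bits must be set together and require AVX.

```c
#include <assert.h>
#include <stdint.h>

#define XCR0_X87       (1ull << 0)
#define XCR0_SSE       (1ull << 1)
#define XCR0_AVX       (1ull << 2)
#define XCR0_OPMASK    (1ull << 5)
#define XCR0_ZMM_HI256 (1ull << 6)
#define XCR0_HI16_ZMM  (1ull << 7)
#define XCR0_AVX512    (XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* Hypothetical helper: given the components the CPU supports,
 * return the XCR0 value this kernel would enable at boot.
 * Components the kernel does not know about are left disabled. */
uint64_t xcr0_boot_mask(uint64_t supported)
{
    uint64_t mask = XCR0_X87;                 /* bit 0 must be 1 */

    if (supported & XCR0_SSE)
        mask |= XCR0_SSE;
    if ((mask & XCR0_SSE) && (supported & XCR0_AVX))
        mask |= XCR0_AVX;                     /* AVX needs SSE */
    if ((mask & XCR0_AVX) && (supported & XCR0_AVX512) == XCR0_AVX512)
        mask |= XCR0_AVX512;                  /* all three or none */

    /* Sanity check: the AVX-512 bits are never partially set. */
    assert((mask & XCR0_AVX512) == 0 || (mask & XCR0_AVX512) == XCR0_AVX512);
    return mask;
}
```

For example, `xcr0_boot_mask(0xE7)` keeps every bit, while a mask with only some of bits 5 to 7 set falls back to x87, SSE and AVX.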

So, how do you guys deal with FPU in your OSes?

_________________
Carpe diem!


 
 Post subject: Re: That pesky FPU again
PostPosted: Mon Dec 30, 2019 10:50 am 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5137
nullplan wrote:
I know that Linux uses lazy FPU restore, and this seems prudent, since apart from a slight simplification of the task-switching code, eager FPU restore doesn't really help anything, and lazy FPU restore makes for faster code (nothing is faster than the line not executed, right?). So I would argue for eager FPU save and lazy FPU restore. Any thoughts on this?

The Linux kernel developers found that lazy FPU restore was actually more expensive in most cases, due to the popularity of SSE optimizations in compilers and standard libraries. And thanks to CVE-2018-3665, eager FPU restore is now the default everywhere.

nullplan wrote:
So I'm stuck between a rock (allocating too much memory, but always being able to tell the user what is going on) and a hard place (allocating too little memory, and if push comes to shove, suddenly killing processes).

How many threads do you expect to have running at once? I think you're disproportionately worried about the memory use.

nullplan wrote:
Only a handful of processes is ever going to use AVX-512, so why save and restore those registers all the time?

Failing to "restore" empty ZMM registers will introduce a significant performance penalty, even if you've disabled their use in XCR0. I suspect the extra work to conditionally save/restore ZMM registers while taking this into account will be more expensive than unconditionally saving/restoring everything and letting the CPU decide how to optimize.

nullplan wrote:
So, how do you guys deal with FPU in your OSes?

With not a single line of relevant code written, this may change in the future, but the current plan for x64 is to use eager FPU save/restore since most tasks will be using SSE anyway. For actually performing the save/restore, the different methods will most likely be preferred in this order: XSAVES, XSAVEOPT, XSAVEC, XSAVE, FXSAVE.
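As a rough sketch of that preference order: the feature bits come from CPUID leaf 0DH, sub-leaf 1, EAX (bit 0 = XSAVEOPT, bit 1 = XSAVEC, bit 3 = XSAVES). The enum and function names are invented for this example, and whether XSAVE itself is available is passed in as a flag rather than queried.

```c
#include <stdbool.h>
#include <stdint.h>

enum fpu_save_method {
    SAVE_FXSAVE,
    SAVE_XSAVE,
    SAVE_XSAVEC,
    SAVE_XSAVEOPT,
    SAVE_XSAVES
};

/* Hypothetical selector following the order from the post:
 * XSAVES, XSAVEOPT, XSAVEC, XSAVE, FXSAVE.
 * leaf_d1_eax is EAX from CPUID.(EAX=0DH, ECX=1). */
enum fpu_save_method pick_save_method(bool has_xsave, uint32_t leaf_d1_eax)
{
    if (!has_xsave)
        return SAVE_FXSAVE;        /* always present on x86-64 */
    if (leaf_d1_eax & (1u << 3))
        return SAVE_XSAVES;
    if (leaf_d1_eax & (1u << 0))
        return SAVE_XSAVEOPT;
    if (leaf_d1_eax & (1u << 1))
        return SAVE_XSAVEC;
    return SAVE_XSAVE;
}
```

One thing to keep in mind with a selector like this is that XSAVEC and XSAVES write the compacted format, so the matching restore path and the area size calculation have to agree with whichever method is picked.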


 
 Post subject: Re: That pesky FPU again
PostPosted: Mon Dec 30, 2019 1:46 pm 

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3192
I only use fsave on demand. No applications use SSE anyway, simply because the compiler was made before SSE and so doesn't use them.


 
 Post subject: Re: That pesky FPU again
PostPosted: Mon Dec 30, 2019 2:25 pm 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1604
rdos wrote:
I only use fsave on demand. No applications use SSE anyway, simply because the compiler was made before SSE and so doesn't use them.
So, no SMP support then? Or do you have an FSAVE IPI?

_________________
Carpe diem!


 
 Post subject: Re: That pesky FPU again
PostPosted: Mon Dec 30, 2019 2:52 pm 

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3192
nullplan wrote:
rdos wrote:
I only use fsave on demand. No applications use SSE anyway, simply because the compiler was made before SSE and so doesn't use them.
So, no SMP support then? Or do you have an FSAVE IPI?


You need to handle SSE state (if you use it) on single core too, so it is not only an SMP issue. It gets more complicated when threads that use the FPU can be migrated to different cores. I solve it by having different logic on single core and multiple cores. If the operating system is running on multiple cores, FPU state is always saved in the context switch if the thread used the FPU since the last switch. If the operating system is running on a single core, the TS flag is set if the thread used the FPU, and lazy saving is used.
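The two policies described above can be modelled with a small user-space simulation. This is only a toy under stated assumptions, not how any particular kernel does it: the structures, the function names and the save/restore counters are invented, the "used the FPU since the last switch" flag is assumed to be set by the device-not-available trap, and CR0.TS is represented by a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct thread { bool used_fpu; int saves; int restores; };
struct cpu { bool ts; struct thread *current; struct thread *fpu_owner; };

/* Models the device-not-available trap: taken on the first FPU
 * instruction while TS is set. */
static void on_fpu_trap(struct cpu *c)
{
    struct thread *t = c->current;
    c->ts = false;
    if (c->fpu_owner != t) {
        if (c->fpu_owner != NULL)
            c->fpu_owner->saves++;   /* lazy save of the old owner */
        t->restores++;               /* load this thread's state */
        c->fpu_owner = t;
    }
    t->used_fpu = true;
}

/* Models the current thread executing an FPU instruction. */
void thread_uses_fpu(struct cpu *c)
{
    if (c->ts)
        on_fpu_trap(c);
}

void context_switch(struct cpu *c, struct thread *next, bool multi_core)
{
    assert(next != NULL);
    struct thread *prev = c->current;
    if (multi_core && prev != NULL && prev->used_fpu) {
        prev->saves++;               /* eager save: prev may migrate */
        c->fpu_owner = NULL;
    }
    if (prev != NULL)
        prev->used_fpu = false;
    c->current = next;
    c->ts = true;                    /* trap on the next FPU use */
}
```

In the single-core mode, a thread that is switched out and back in without anyone else touching the FPU costs no save or restore at all. In the multi-core mode it pays one save per switch-out in which it used the FPU, plus a restore on its next FPU use.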


 
 Post subject: Re: That pesky FPU again
PostPosted: Fri Jan 03, 2020 6:35 pm 

Joined: Sat Feb 27, 2010 8:55 pm
Posts: 147
Octocontrabass wrote:
nullplan wrote:
So I'm stuck between a rock (allocating too much memory, but always being able to tell the user what is going on) and a hard place (allocating too little memory, and if push comes to shove, suddenly killing processes).

How many threads do you expect to have running at once? I think you're disproportionately worried about the memory use.

Nullplan, I'm not a big fan of the whole write-huge-memory-hog-programs-just-because-we-have-lots-of-ram-now thinking, but even I gotta agree with Octo here. I see where you're coming from; it's a bit stressful to think we've gone from needing a few dozen bytes to save the state to needing 1000 or so bytes when you get to AVX-512 -- it's such an extreme increase! Nevertheless, even my oldest computer -- a 486 from the early 90s -- has more than enough ram to justify a kilobyte for every task.

Octocontrabass wrote:
Failing to "restore" empty ZMM registers will introduce a significant performance penalty, even if you've disabled their use in XCR0.


I'm really lost here; how does not restoring empty regs cause a penalty? Especially if they're disabled?


 
 Post subject: Re: That pesky FPU again
PostPosted: Sat Jan 04, 2020 4:42 am 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5137
azblue wrote:
I'm really lost here; how does not restoring empty regs cause a penalty? Especially if they're disabled?

Intel SDM volume 3A section 13.5.3 wrote:
XSAVE state components can subsequently be disabled in XCR0. However, disabling state components of AVX or AVX-512 that are not in initial configuration may incur power and performance penalty on SSE and AVX instructions respectively. If AVX state is disabled when it is not in its initial configuration, subsequent SSE instructions may incur a penalty. If AVX-512 state is disabled when it is not in its initial configuration, subsequent SSE and AVX instructions may incur a penalty. It is recommended that the operating systems and VMM set AVX or AVX-512 state components to their initial configuration before disabling them.

Based on the description in the manual, it seems pretty safe to assume that the penalty is due to a data dependency on the contents of the upper parts of the registers.

If the CPU throttles as a result of the increased power usage, the penalty could also affect non-vector code running on another core.
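One place where "initial configuration" becomes visible to software is the XSAVE header: the XSAVE-family instructions write an XSTATE_BV bitmap at byte offset 512 of the save area, and for the requested components a clear bit means the component was in its initial configuration when saved. A set bit only means it may not have been. The helper below is a hypothetical sketch of reading that bitmap from a saved area; the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_HEADER_OFFSET 512u  /* header follows the legacy region */

/* Hypothetical helper: reports whether the XSTATE_BV bit for the
 * given state component (e.g. 2 = AVX, 5..7 = AVX-512) is set in a
 * previously saved XSAVE area. */
bool component_was_in_use(const void *xsave_area, unsigned component)
{
    assert(component < 64);
    uint64_t xstate_bv;
    /* XSTATE_BV is the first 8 bytes of the XSAVE header. */
    memcpy(&xstate_bv,
           (const unsigned char *)xsave_area + XSAVE_HEADER_OFFSET,
           sizeof xstate_bv);
    return (xstate_bv >> component) & 1u;
}
```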


 
 Post subject: Re: That pesky FPU again
PostPosted: Sat Jan 04, 2020 7:06 am 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1604
I do have one question remaining, then. If saving FPU is so cheap, both in terms of time and memory, why then do CPUs provide these mechanisms to be notified of FPU activity? Is that all just a vestige? It's not just x86, either; I've seen similar interrupts on other architectures as well. If saving and restoring FPU on task switch unconditionally is both simpler and safer, why the effort?

_________________
Carpe diem!


 
 Post subject: Re: That pesky FPU again
PostPosted: Sat Jan 04, 2020 8:30 am 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5137
It does still take time. If applications go enough context switches without using the FPU, there comes a point where lazy save/restore will be faster.

For x64, where every CPU is guaranteed to have SSE2 and compilers are smart enough to use it, applications usually do not go very long without using the FPU. This is the situation where eager save/restore makes the most sense.

Most other CPU architectures don't guarantee an FPU will be present, so an operating system may need to optimize for the case where most applications don't use the FPU often or at all. Additionally, some workloads may not be able to take advantage of the FPU, so there can be pressure to minimize task switch costs while still allowing use of the FPU.
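The break-even point mentioned above can be illustrated with a deliberately simple cost model. The model and the function name are assumptions for this example: eager switching pays one save/restore on every context switch, while lazy switching pays a trap plus a save/restore only on switches after which the task actually touches the FPU. The cycle counts are inputs, not measurements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: returns true when lazy save/restore costs
 * fewer cycles in total than eager save/restore. */
bool lazy_is_cheaper(uint64_t save_restore_cycles, uint64_t trap_cycles,
                     uint64_t switches_using_fpu, uint64_t total_switches)
{
    assert(switches_using_fpu <= total_switches);
    uint64_t eager = total_switches * save_restore_cycles;
    uint64_t lazy = switches_using_fpu * (trap_cycles + save_restore_cycles);
    return lazy < eager;
}
```

With made-up numbers of 200 cycles per save/restore and 800 cycles per trap, lazy wins if 1 switch in 10 is followed by FPU use (1000 vs. 2000 cycles) and loses if 5 in 10 are (5000 vs. 2000 cycles). On x64, where nearly every task touches SSE registers, the second case is the usual one.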


 
 Post subject: Re: That pesky FPU again
PostPosted: Mon Jan 06, 2020 5:21 pm 

Joined: Sat Feb 27, 2010 8:55 pm
Posts: 147
Octocontrabass wrote:
azblue wrote:
I'm really lost here; how does not restoring empty regs cause a penalty? Especially if they're disabled?

Intel SDM volume 3A section 13.5.3 wrote:
XSAVE state components can subsequently be disabled in XCR0. However, disabling state components of AVX or AVX-512 that are not in initial configuration may incur power and performance penalty on SSE and AVX instructions respectively. If AVX state is disabled when it is not in its initial configuration, subsequent SSE instructions may incur a penalty. If AVX-512 state is disabled when it is not in its initial configuration, subsequent SSE and AVX instructions may incur a penalty. It is recommended that the operating systems and VMM set AVX or AVX-512 state components to their initial configuration before disabling them.

Based on the description in the manual, it seems pretty safe to assume that the penalty is due to a data dependency on the contents of the upper parts of the registers.

If the CPU throttles as a result of the increased power usage, the penalty could also affect non-vector code running on another core.


Crazy! Sometimes I wish it was still the 90s when things were simpler, lol! Thank you for the info!

