nullplan wrote:
I know that Linux uses lazy FPU restore, and this seems prudent, since apart from a slight simplification of the task-switching code, eager FPU restore doesn't really help anything, and lazy FPU restore makes for faster code (nothing is faster than the line not executed, right?). So I would argue for eager FPU save and lazy FPU restore. Any thoughts on this?
The Linux kernel developers found that lazy FPU restore was actually more expensive in most cases: with compilers and standard libraries leaning heavily on SSE, nearly every task touches the FPU state anyway, so the #NM trap taken on first use ends up costing more than simply restoring at the context switch. And thanks to CVE-2018-3665 (the "LazyFP" speculative-execution leak), eager FPU restore is now the default everywhere.
nullplan wrote:
So I'm stuck between a rock (allocating too much memory, but always being able to tell the user what is going on) and a hard place (allocating too little memory, and if push comes to shove, suddenly killing processes).
How many threads do you expect to have running at once? I think you're disproportionately worried about the memory use.
nullplan wrote:
Only a handful of processes is ever going to use AVX-512, so why save and restore those registers all the time?
Failing to "restore" empty ZMM registers will introduce a significant performance penalty, even if you've disabled their use in XCR0. I suspect the extra work to conditionally save/restore ZMM registers while taking this into account will be more expensive than unconditionally saving/restoring everything and letting the CPU decide how to optimize.
nullplan wrote:
So, how do you guys deal with FPU in your OSes?
Not a single line of relevant code has been written yet, so this may change, but the current plan for x64 is to use eager FPU save/restore, since most tasks will be using SSE anyway. For actually performing the save/restore, the different methods will most likely be preferred in this order: XSAVES, XSAVEOPT, XSAVEC, XSAVE, FXSAVE.