Long-mode Kernels and the AMD64 ABI 'Red Zone'

Darwish · Post by **Darwish** » Sat Mar 20, 2010 7:46 am

It took me 6 days to debug and fix this, but it was really worth the effort.

Interestingly, no OSDev resource seems to have discussed this important topic before. On further inspection, a good share of hobby x86-64 kernels appear to have their leaf functions' stack data silently overwritten whenever an interrupt fires at just the right spot.

Now to the story: somehow the periodic PIT interrupts, firing every millisecond, badly corrupted my kernel state. At first I thought the handler code might have corrupted the kernel stack, but reducing it to nothing more than acknowledging the local APIC:

Code:

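		push   %rax                                          /* preserve the only register we use */
		movq   $(VIRTUAL(APIC_PHBASE) + APIC_EOI), %rax      /* local APIC EOI register */
		movl   $0,(%rax)                                     /* writing 0 signals end-of-interrupt */
		pop    %rax
		iretq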

led to the same buggy behaviour.

It was weird. As soon as I enabled interrupts and programmed the PIT to fire at a high rate, things went insane: failed assert()s and page-fault exceptions were triggered at random all over the place. I minimized the handler even further by ditching the I/O APIC and using the PIC in Automatic EOI mode, which led to the absolute architectural minimum for an x86 IRQ handler:

Code:

		iretq

but nothing really changed: the same ugly symptoms prevailed.

After days and days of disassembly and hex dumps, I found that GCC had generated this assembly for memcpy() at -O0:

Code:

		ffffffff80109c88:       55                      push   %rbp
		ffffffff80109c89:       48 89 e5                mov    %rsp,%rbp
		[snip]

		/* Bochs magic breakpoint */
		ffffffff80109caa:       66 87 db                xchg   %bx,%bx

		/* Our manual software interrupt */
		ffffffff80109cad:       cd f0                   int    $0xf0

		/* Failing code, especially the last line */
		ffffffff80109caf:       48 8b 45 f8             mov    -0x8(%rbp),%rax
		ffffffff80109cb3:       0f b6 10                movzbl (%rax),%edx
		ffffffff80109cb6:       48 8b 45 f0             mov    -0x10(%rbp),%rax
		ffffffff80109cba:       88 10                   mov    %dl,(%rax)

Noticed something? Look again, especially at how these instructions access the stack. The prologue never subtracts anything from %rsp (after mov %rsp,%rbp, %rbp equals %rsp), so -0x8(%rbp) and -0x10(%rbp) lie below the stack pointer: GCC kept parts of the leaf function's local state there. When the interrupt was soft-triggered, the CPU rightfully pushed the interrupt frame (%ss, %rsp, %rflags, %cs, %rip) onto the current stack, which meant that parts of the kernel state got corrupted through the CPU's implicit stack usage!

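The generated code is certainly interrupt-unsafe. Scanning the AMD64 ABI document for any paragraph that mentions the stack, I found the reason: the red zone. The red zone is the 128-byte area just below the stack pointer (%rsp), which the ABI guarantees will not be modified by signal or interrupt handlers. Leaf functions may therefore keep their entire stack frame there without adjusting %rsp. Non-leaf functions may use it too, but only for data that doesn't need to survive a call; anything that must persist across a call has to be reserved first by moving %rsp down. That guarantee has to be provided by the OS: in user space it holds because an interrupt arriving in ring 3 switches to a separate ring-0 stack (and the kernel must place signal frames below the red zone), but kernel code running in ring 0 gets no stack switch, so the CPU pushes its interrupt frame straight into the red zone.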

All that was needed to fix the bug, like a magic pill, was to instruct GCC not to use this interrupt-unsafe zone:

Code:

		-mno-red-zone

Note that the bug was much easier to trigger at -O0 because -O0 uses the stack heavily. At -O2 and -O3 it triggered much less often, presumably because more locals stay in registers there.

Everything became sane afterwards: the heavy test cases now work well while the PIT fires rapidly, at all optimization levels. I would really like to thank Brendan for advising me to investigate the issue further with Bochs' binary single-stepping debugger when I was stuck.

torshie · Post by **torshie** » Sat Mar 20, 2010 9:20 am

Very good article. Maybe put it in wiki?
I'm lucky. Having read the ABI, I see that I already pass the -mno-red-zone flag to g++. You know why? I just followed Linux, without really knowing the difference between "red zone" and "no red zone".

nedbrek · Post by **nedbrek** » Sat Mar 20, 2010 10:43 am

Is this kernel or user code? If user code, are you not using a separate stack for ring 0?

Darwish · Post by **Darwish** » Sat Mar 20, 2010 12:09 pm

nedbrek wrote:Is this kernel or user code? If user code, are you not using a separate stack for ring 0?

That's purely kernel context; I don't have user-space support yet.

Darwish · Post by **Darwish** » Sat Mar 20, 2010 12:23 pm

torshie wrote:Very good article. Maybe put it in wiki?

Added here, with extra flags needed to disable emitting SSE ops.

torshie · Post by **torshie** » Sun Mar 21, 2010 12:18 am

Darwish wrote:
torshie wrote:Very good article. Maybe put it in wiki?
Added here, with extra flags needed to disable emitting SSE ops.

You can enable SSE instructions by setting the OSFXSR bit (bit 9) of CR4; of course, you then need to save the SSE registers when entering an interrupt handler.

quanganht · Post by **quanganht** » Sun Mar 21, 2010 12:42 am

Great post! This is the first time I've heard of the 'red zone'.

xmm15 · Post by **xmm15** » Mon Dec 16, 2013 6:57 pm

Arrrrrrrggh!!! This has driven me insane for the past two weeks.

Yes, I know I'm resurrecting an old thread, but this one needs to be pinned for everyone to see. That red-zone thing drove me insane, and I only found out about it when I disassembled a leaf function: to my surprise, there was no sub xxx,%rsp at the beginning of the function. I wasn't sure what terms to type into Google, but I finally found the answer (Google did return quite a lot of results for "why gcc doesn't subtract anything from rsp to create stack frame"; we're not in 1997 anymore).

But let me ask this: how can interrupts guarantee that this 128-byte red zone stays safe? The CPU will obviously clobber the first 8 bytes when it saves RIP before jumping to the handler, right?

thepowersgang · Post by **thepowersgang** » Mon Dec 16, 2013 7:20 pm

It's standard to disable the red zone in kernel code; either that, or you make sure to provide the red zone before calling the C code.

xmm15 · Post by **xmm15** » Mon Dec 16, 2013 8:31 pm

thepowersgang wrote:either that or you make sure to provide the red zone before calling the C code.

I'm not exactly sure what you mean. How would you do that? By changing the value of RSP before calling the C code?
Even then, what happens if an interrupt is generated once you are in the C code? I don't think it would be any less dangerous, am I right?

The only way I can think of right now is to run the handlers at a different privilege level from the rest of the code, so that the handlers get their own stack. Since handlers usually run in ring 0, that would mean there is no option other than disabling the red zone with the compiler flag for kernel code, unless there is a way to tell the CPU to subtract 128 from rsp when delivering an interrupt.

thepowersgang · Post by **thepowersgang** » Mon Dec 16, 2013 9:07 pm

Sorry, actually, I'd forgotten what the red zone really was. You disable the red zone in kernel code precisely because of this (the CPU writes within this zone when an interrupt fires).

sortie · Post by **sortie** » Tue Dec 17, 2013 6:05 pm

xmm15 wrote:Arrrrrrrggh!!! This has driven me insane for the past two weeks.

I share your pain. This drove me insane for three months. Surely you know by now to compile your kernel code and kernel libc code (if any) with -mno-red-zone. Note that if you are using libgcc, it is built with the red zone enabled, so there's an infinitesimal chance that some odd call into it blows up, though few such calls happen on x86_64 normally. You may well wish to actually implement the red zone instead, though that may be difficult because the x86_64 architecture is silly. I do remember some interrupt stack switching mechanism that may be exploited for your needs.

GloverTex · Post by **GloverTex** » Thu Jan 07, 2016 10:34 am

thepowersgang wrote:Sorry, actually, I'd forgotten what the red zone really was. You disable red zone in kernel code exactly because of this (the CPU will write within this zone when an interrupt fires)

Where do you place the code? Sorry if it's a dumb question but I'm a struggling newb lol.

mallardest · Post by **mallardest** » Thu Jan 07, 2016 10:52 am

GloverTex wrote:All that was needed to fix the bug, like a magic pill, was to instruct GCC not to use this interrupt-unsafe zone:
Code:
      -mno-red-zone
Where do you place the code? Sorry if it's a dumb question but I'm a struggling newb lol.

It's a compiler flag. You need to pass it as a parameter to gcc.

SpyderTL · Post by **SpyderTL** » Fri Jan 08, 2016 5:14 pm

This has come up several times in the past. It's actually mentioned on the Calling Conventions page, in a footnote. But I'm all for adding more references to it to the Wiki.

Edit: just realized how old this thread is.

OSDev.org
