OSDev.org https://forum.osdev.org/ |
Cyclical GP fault on iretq of timer interrupt in 64 bit OS https://forum.osdev.org/viewtopic.php?f=1&t=33115 |
Author: | adrianmay [ Sun Aug 12, 2018 3:39 am ] |
Post subject: | Cyclical GP fault on iretq of timer interrupt in 64 bit OS |
I'm trying to write a 64-bit OS. It throws a GP fault on the iretq from the timer interrupt handler, then repeatedly throws more GP faults from the iretq of the GP handler. I know this because my generic handler prints the ISR number on the serial port, and it goes 32, 13, 13, 13, ... The error code for the first GP is 0x10, which is my data segment selector. I'm debugging it in qemu, so I can see quite a bit. Here's the situation at the iretq from the timer handler:

Code:
(gdb) disas isr_common,isr_head_2
Dump of assembler code from 0x8189 to 0x81c4:
   0x0000000000008189 <isr_common+0>: callq 0x8125 <sayN100>
   0x000000000000818e <isr_common+5>: cmp $0x20,%eax
   0x0000000000008191 <isr_common+8>: jl 0x81a8 <isr_common.no_more_acks>
   0x0000000000008193 <isr_common+10>: cmp $0x30,%eax
   0x0000000000008196 <isr_common+13>: jge 0x81a8 <isr_common.no_more_acks>
   0x0000000000008198 <isr_common+15>: cmp $0x28,%al
   0x000000000000819a <isr_common+17>: jl 0x81a2 <isr_common.ack_master>
   0x000000000000819c <isr_common+19>: push %rax
   0x000000000000819d <isr_common+20>: mov $0x20,%al
   0x000000000000819f <isr_common+22>: out %al,$0xa0
   0x00000000000081a1 <isr_common+24>: pop %rax
   0x00000000000081a2 <isr_common.ack_master+0>: push %rax
   0x00000000000081a3 <isr_common.ack_master+1>: mov $0x20,%al
   0x00000000000081a5 <isr_common.ack_master+3>: out %al,$0x20
   0x00000000000081a7 <isr_common.ack_master+5>: pop %rax
   0x00000000000081a8 <isr_common.no_more_acks+0>: cmp $0x24,%ax
   0x00000000000081ac <isr_common.no_more_acks+4>: pop %rax
   0x00000000000081ad <isr_common.no_more_acks+5>: pop %rax
=> 0x00000000000081ae <isr_common.end+0>: iretq
   0x00000000000081b0 <isr_head_0+0>: pushq $0x55 ;DUMMY ERROR CODE
   0x00000000000081b2 <isr_head_0+2>: mov $0x0,%eax
   0x00000000000081b7 <isr_head_0+7>: push %rax
   0x00000000000081b8 <isr_head_0+8>: jmp 0x8189 <isr_common>
   0x00000000000081ba <isr_head_1+0>: pushq $0x55 ;DUMMY ERROR CODE
   0x00000000000081bc <isr_head_1+2>: mov $0x1,%eax
   0x00000000000081c1 <isr_head_1+7>: push %rax
   0x00000000000081c2 <isr_head_1+8>: jmp 0x8189 <isr_common>

That also shows a couple of the "isr_head" stubs, which are what the IDT entries point to; each one pushes a dummy error code where needed and jumps to isr_common. The stack looks correct to me:

Code:
(gdb) bt
#0 0x00000000000081ae in isr_common.end ()
#1 0x0000000000008123 in LongMode.Nirv ()
#2 0x0000000000000010 in ?? ()
#3 0x0000000000000216 in ?? ()
#4 0x0000000000015000 in Pd ()
#5 0x0000000000000010 in ?? ()
#6 0x000000b8e5894855 in ?? ()
#7 0x78bf00000332e800 in ?? ()
#8 0x000003e3e8000000 in ?? ()

where:

Code:
0x0000000000008122 <LongMode.Nirv+0>: hlt
0x0000000000008123 <LongMode.Nirv+1>: jmp 0x8122 <LongMode.Nirv>

To be careful:

Code:
(gdb) info registers
rax 0x55 85
rbx 0x80000011 2147483665
rcx 0xc0000080 3221225600
rdx 0x3f8 1016
rsi 0xb 11
rdi 0x3fc 1020
rbp 0x0 0x0
rsp 0x14fd8 0x14fd8 <Pd+36824>
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
r13 0x0 0
r14 0x0 0
r15 0x0 0
rip 0x81ae 0x81ae <isr_common.end>
eflags 0x97 [ CF PF AF SF ]
cs 0x8 8
ss 0x10 16
ds 0x10 16
es 0x10 16
fs 0x10 16
gs 0x10 16
(gdb) x/32xg 0x14f00
0x14f00 <Pd+36608>: 0x0000000000841f0f 0x000000841f0f2e66
0x14f10 <Pd+36624>: 0x00841f0f2e660000 0x1f0f2e6600000000
0x14f20 <Pd+36640>: 0x2e66000000000084 0x0000000000841f0f
0x14f30 <Pd+36656>: 0x000000841f0f2e66 0x00841f0f2e660000
0x14f40 <Pd+36672>: 0x1f0f2e6600000000 0x2e66000000000084
0x14f50 <Pd+36688>: 0x0000000000841f0f 0x000000841f0f2e66
0x14f60 <Pd+36704>: 0x00841f0f2e660000 0x1f0f2e6600000000
0x14f70 <Pd+36720>: 0x2e66000000000084 0x0000000000841f0f
0x14f80 <Pd+36736>: 0x000000841f0f2e66 0x00841f0f2e660000
0x14f90 <Pd+36752>: 0x1f0f2e6600000000 0x2e66000000000084
0x14fa0 <Pd+36768>: 0x0000000000000020 0x0000000000008144
0x14fb0 <Pd+36784>: 0x0000000080000011 0x0000000000000020
0x14fc0 <Pd+36800>: 0x0000000000000020 0x0000000000000020
0x14fd0 <Pd+36816>: 0x0000000000000055 0x0000000000008123
0x14fe0 <Pd+36832>: 0x0000000000000010 0x0000000000000216
0x14ff0 <Pd+36848>: 0x0000000000015000 0x0000000000000010

Now I'll let it run to the GP handler head:

Code:
(gdb) break isr_head_13
Breakpoint 3 at 0x8236
(gdb) c
Continuing.
Breakpoint 3, 0x0000000000008236 in isr_head_13 ()
(gdb) bt
#0 0x0000000000008236 in isr_head_13 ()
#1 0x0000000000000010 in ?? ()
#2 0x00000000000081ae in isr_common.no_more_acks ()
#3 0x0000000000000008 in ?? ()
#4 0x0000000000000097 in ?? ()
#5 0x0000000000014fd8 in Pd ()
#6 0x0000000000000010 in ?? ()
#7 0x0000000000000055 in ?? ()
#8 0x0000000000008123 in LongMode.Nirv ()
#9 0x0000000000000010 in ?? ()
#10 0x0000000000000216 in ?? ()
#11 0x0000000000015000 in Pd ()
#12 0x0000000000000010 in ?? ()

We see that it pushed the error code 0x10 on top of the usual frame (stack pointer with its selector, flags, and return address with its selector), but the interesting thing is that my dummy error code from the timer (0x55) is back from the dead. We already know it was popped before the first iretq, and I didn't push it this time:

Code:
(gdb) disas isr_head_13
Dump of assembler code for function isr_head_13:
=> 0x0000000000008236 <+0>: mov $0xd,%eax
   0x000000000000823b <+5>: push %rax
   0x000000000000823c <+6>: jmpq 0x8189 <isr_common>

I guess that's just 16-byte alignment, but I'm not really involved in that. The stack was 16-byte aligned before the timer went off, but the CPU pushed an odd number of quadwords.

So why would it crash? The Intel docs say that a GP with a selector means it tried to pop something out of range, but I see no such problem. Any help much appreciated.
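For readers following along, the 0x10 error code can be taken apart by hand. The sketch below decodes a selector-format exception error code into its fields; the struct and function names are hypothetical and not from adrianmay's code, and it only assumes the standard x86 layout (EXT in bit 0, IDT in bit 1, TI in bit 2, table index in bits 3 to 15).

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a selector-format exception error code (hypothetical helper type). */
struct selector_error {
    unsigned ext;   /* bit 0: fault caused by an event external to the program */
    unsigned idt;   /* bit 1: index refers to a gate in the IDT */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT (only meaningful when idt == 0) */
    unsigned index; /* bits 3..15: index into the selected descriptor table */
};

/* Split an error code into its fields. Only the low 16 bits carry information. */
static struct selector_error decode_selector_error(uint32_t code)
{
    struct selector_error e;

    assert(code <= 0xFFFFu); /* upper bits are reserved */
    e.ext   = code & 1u;
    e.idt   = (code >> 1) & 1u;
    e.ti    = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;
    return e;
}

/* Rebuild the selector (with RPL cleared) that the error code refers to. */
static uint16_t selector_from_error(struct selector_error e)
{
    return (uint16_t)((e.index << 3) | (e.ti << 2));
}
```

Fed the 0x10 from this thread, it gives ext = 0, idt = 0, ti = 0 and index = 2, i.e. the fault refers to GDT entry 2, selector 0x10.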
Author: | Brendan [ Sun Aug 12, 2018 5:51 am ] |
Post subject: | Re: Cyclical GP fault on iretq of timer interrupt in 64 bit |
Hi,

adrianmay wrote:
So why would it crash? The Intel docs say that GP with a selector means it tried to pop something out of range, but I see no such problem.

It's very likely that the stack is messed up when you IRETQ (e.g. forgot to POP something), causing the CPU to complain because (e.g.) the value for CS or SS it's trying to load from the stack isn't where the CPU thinks it should be.

If that's the problem, then it's extremely unlikely that the compiler would have generated wrong code, which means that it's likely that the problem is in your assembly stubs (and the common interrupt handler if that's also in assembly).

Would you mind posting the original assembly source code for the stubs (and the common interrupt handler if that's also in assembly), so we can see the whole thing (and not just fragments excluding "not taken" branches)?

Note that this looks wrong:

Code:
(gdb) disas isr_head_13
Dump of assembler code for function isr_head_13:
=> 0x0000000000008236 <+0>: mov $0xd,%eax
   0x000000000000823b <+5>: push %rax
   0x000000000000823c <+6>: jmpq 0x8189 <isr_common>

...because it's modifying RAX before pushing it (causing the original value in RAX from the interrupted code to be trashed); but that can't cause a GPF by itself.

Cheers,
Brendan
Author: | Octocontrabass [ Sun Aug 12, 2018 6:10 am ] |
Post subject: | Re: Cyclical GP fault on iretq of timer interrupt in 64 bit |
adrianmay wrote:
The stack looks correct to me:

Code:
#2 0x0000000000000010 in ?? ()

That looks an awful lot like your data selector was in CS when the IRQ occurred, which might explain the initial GPF.

Try running your OS in Bochs. Bochs logs a lot of detail by default, including which protection check is causing each GPF. It's often enough to pinpoint the issue, but if not, you can post the log here along with a link to your code.
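To make that observation concrete, here is a minimal sketch that lays the five quadwords IRETQ pops in 64-bit mode over a stack dump and checks the saved CS. The type and function names are hypothetical. The 0x08 code selector is an assumption taken from the `cs 0x8` line in the register dump, and the sample values are the quadwords at 0x14fd8 in the memory dump.

```c
#include <assert.h>
#include <stdint.h>

/* What IRETQ pops in 64-bit mode, lowest address first. */
struct iretq_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Interpret five consecutive quadwords starting at the stack pointer as a frame. */
static struct iretq_frame read_iretq_frame(const uint64_t *sp)
{
    assert(sp != 0);
    struct iretq_frame f = { sp[0], sp[1], sp[2], sp[3], sp[4] };
    return f;
}

/* Nonzero when the saved CS slot holds the expected code selector. */
static int saved_cs_is(const struct iretq_frame *f, uint16_t expected_cs)
{
    return (uint16_t)f->cs == expected_cs;
}
```

With the dump from the first post the frame at RSP is { 0x8123, 0x10, 0x216, 0x15000, 0x10 }, so the saved CS is 0x10 and `saved_cs_is(&f, 0x08)` is false. That matches the `#2 0x0000000000000010` line quoted above.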
Author: | adrianmay [ Mon Aug 13, 2018 5:35 am ] |
Post subject: | Re: Cyclical GP fault on iretq of timer interrupt in 64 bit |
Indeed, it was because I had the data segment selector on the stack where the code segment selector should have been. When I put 8: in front of the target of an earlier jump, making it a far jump that loads CS with the code selector, the problem went away. Thanks everybody!
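As a footnote on why the CPU objected: IRETQ raises #GP with the selector as the error code when the descriptor it is asked to load into CS is not a code segment. The sketch below classifies GDT descriptors by their access bits. The GDT contents are an assumption, since the thread never shows the real table; the values used are a commonly seen flat 64-bit layout (null, code at 0x08, data at 0x10), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GDT layout, NOT taken from the thread: null, 64-bit code, data. */
static const uint64_t example_gdt[3] = {
    0x0000000000000000ULL, /* selector 0x00: null descriptor */
    0x00209A0000000000ULL, /* selector 0x08: present, code, L bit set */
    0x0000920000000000ULL, /* selector 0x10: present, writable data */
};

/* A descriptor is a usable code segment when P (bit 47), S (bit 44)
   and the executable type bit (bit 43) are all set. */
static int is_code_descriptor(uint64_t d)
{
    return ((d >> 47) & 1) && ((d >> 44) & 1) && ((d >> 43) & 1);
}

/* Look up a GDT selector (TI and RPL bits ignored) and report whether it names code. */
static int selector_names_code(const uint64_t *gdt, size_t entries, uint16_t sel)
{
    size_t index = sel >> 3;

    assert(index < entries); /* selector must be inside the table */
    return is_code_descriptor(gdt[index]);
}
```

Under that assumed table, selector 0x08 names a code segment and 0x10 does not, which is consistent with the #GP(0x10) seen when the saved CS slot held 0x10.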
All times are UTC - 6 hours