LtG wrote:
rdos wrote:
I always inline "neutral" code that generates an exception, and then patch it in the exception handler to whatever code the setup uses.
The "neutral" code is #UD or something like "int 0x80"?
But what's the benefit of doing it that way, compared to putting it in the places the binary image (ELF, PE, whatever) explicitly requests? Don't you also need padding to account for possibly longer "syscall" sequences later?
Also, isn't that like self-modifying code? I'm not against it, but won't it interfere with an application that wants to modify its own code (it is usually problematic when two entities modify the same thing)? I understand that you might not want to support self-modifying code, so for you it's a moot point, but from a theoretical perspective, why pick a solution that is more limiting with no extra benefits?
Simply put, is there anything that makes that solution better than the others?
No, it is not a simple int x code.
It's like this:
Code:
db 55h
db 67h
db 9Ah
dd gate_nr
dw 3
db 5Dh
When disassembled, it will be like this:
Code:
push ebp
call far 0003:gate_nr
pop ebp
The middle instruction will raise a GP fault because it references a null selector, and the GP fault handler then typically patches it into an interrupt:
Code:
db 55h
db 0CDh
db 9Ah
dd gate_nr
dw 3
db 5Dh
Resulting code:
Code:
push ebp
int 9Ah
dd gate_nr
dw 3
pop ebp
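The first patch step can be modeled in C as a single one-byte write. This is only a sketch of the byte layout shown above, not the actual fault handler; the names `SITE_LEN`, `make_site` and `patch_stage1` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { SITE_LEN = 10 };  /* 55 | 67 | 9A | dd gate_nr | dw 3 | 5D */

/* Build the unpatched call site; fields are stored little-endian as on x86. */
void make_site(uint8_t site[SITE_LEN], uint32_t gate_nr)
{
    site[0] = 0x55;                      /* push ebp */
    site[1] = 0x67;                      /* prefix byte, later overwritten */
    site[2] = 0x9A;                      /* call far ptr16:32 */
    for (int i = 0; i < 4; i++)          /* offset field carries gate_nr */
        site[3 + i] = (uint8_t)(gate_nr >> (8 * i));
    site[7] = 0x03;                      /* selector 0003h: null selector, RPL 3 */
    site[8] = 0x00;
    site[9] = 0x5D;                      /* pop ebp */
}

/* Stage 1 (hypothetical helper): turn "67 9A" into "CD 9A". */
void patch_stage1(uint8_t site[SITE_LEN])
{
    assert(site[1] == 0x67 && site[2] == 0x9A);  /* only patch a pristine site */
    site[1] = 0xCD;                              /* bytes 1..2 now read "int 9Ah" */
}
```

Only one byte changes, so a core either sees the old far call (and faults) or the complete int 9Ah; the gate number and selector bytes stay in place for the next handler to read.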
When the faulting instruction is re-executed, it now executes int 9Ah, which enters the kernel's int 9Ah handler. This handler sets the gate number to 0 and the selector to the new call gate (creating the call gate if it didn't already exist). Lastly, it overwrites the int opcode with 90h:
Code:
db 55h
db 90h
db 9Ah
dd 0
dw gate_sel
db 5Dh
Corresponding to:
Code:
push ebp
nop
call far gate_sel:0
pop ebp
The reason for the two-step process is to make the patching multicore-safe, which makes the fault handlers a little more complex. Because of this, multiple cores can execute this code at the same time, and all of them will end up calling the correct syscall.
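To illustrate why the int opcode is patched last, here is a small single-threaded C model that checks what a core would decode after every individual byte write. It is a sketch under assumptions: writes are byte-granular and become visible in program order, instruction-fetch and cache effects of cross-modifying code are ignored, and any state other than "faults", "traps" or "complete call" is counted as broken. `decode`, `patch_stage2_checked` and the selector value used below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SITE_LEN = 10 };

/* What a core that starts executing at site[1] would do. */
enum outcome { TRAP_GP, TRAP_INT, CALL_GATE, BROKEN };

enum outcome decode(const uint8_t s[SITE_LEN])
{
    uint16_t sel = (uint16_t)(s[7] | (s[8] << 8));
    uint32_t off = (uint32_t)s[3] | ((uint32_t)s[4] << 8) |
                   ((uint32_t)s[5] << 16) | ((uint32_t)s[6] << 24);
    switch (s[1]) {
    case 0x67:  /* unpatched: far call through the null selector */
        return (s[2] == 0x9A && sel <= 3) ? TRAP_GP : BROKEN;
    case 0xCD:  /* int: the operand bytes behind it are not executed */
        return TRAP_INT;
    case 0x90:  /* nop, then the far call: operand must be complete */
        return (s[2] == 0x9A && sel > 3 && off == 0) ? CALL_GATE : BROKEN;
    default:
        return BROKEN;
    }
}

/* Apply stage 2 byte by byte; returns 1 if no intermediate state is BROKEN. */
int patch_stage2_checked(uint8_t s[SITE_LEN], uint16_t gate_sel, int opcode_last)
{
    int ok = 1;
    if (!opcode_last) {                  /* wrong order, for comparison */
        s[1] = 0x90;
        ok &= decode(s) != BROKEN;
    }
    for (int i = 3; i <= 6; i++) {       /* offset field := 0 */
        s[i] = 0;
        ok &= decode(s) != BROKEN;
    }
    s[7] = (uint8_t)(gate_sel & 0xFF);   /* selector, low byte */
    ok &= decode(s) != BROKEN;
    s[8] = (uint8_t)(gate_sel >> 8);     /* selector, high byte */
    ok &= decode(s) != BROKEN;
    if (opcode_last) {                   /* order described in the post */
        s[1] = 0x90;
        ok &= decode(s) != BROKEN;
    }
    return ok;
}
```

With `opcode_last` set, every intermediate state still starts with the int opcode, so a racing core just traps into the kernel. With the opcode written first, the very first write exposes a far call whose operand is still the old gate number and null selector.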
The reason the initial code is a far call is so that the application debugger knows it is a call (and thus traceable), and so that it can set a breakpoint after it to step over it. Invalid opcodes don't have these properties.
When patching for sysenter instead, the code will be modified like this in the second stage (again, the int opcode is patched last):
Code:
db 55h
db 90h
db 0E8h
dd app_stub - ($ + 4)    ; rel32, relative to the end of the call
dw 9090h
db 5Dh
In code:
Code:
push ebp
nop
call app_stub
nop
nop
pop ebp
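Since E8 is a near relative call, the dword after it is a displacement from the end of the call instruction rather than an absolute address. A minimal C sketch of how the operand bytes could be computed follows; `call_rel32`, `write_near_call` and the addresses in the usage note are hypothetical, and the int opcode at offset 1 is deliberately left for the caller to overwrite last.

```c
#include <assert.h>
#include <stdint.h>

enum { SITE_LEN = 10, CALL_OFS = 2, CALL_LEN = 5 };

/* rel32 for an E8 call whose opcode byte sits at site_addr + CALL_OFS. */
uint32_t call_rel32(uint32_t site_addr, uint32_t stub_addr)
{
    uint32_t next_eip = site_addr + CALL_OFS + CALL_LEN;
    return stub_addr - next_eip;         /* wraps modulo 2^32, like EIP */
}

/* Write "E8 rel32 90 90" into bytes 2..8; byte 1 is not touched here. */
void write_near_call(uint8_t s[SITE_LEN], uint32_t site_addr, uint32_t stub_addr)
{
    uint32_t rel = call_rel32(site_addr, stub_addr);
    s[2] = 0xE8;                         /* call rel32 */
    for (int i = 0; i < 4; i++)          /* little-endian displacement */
        s[3 + i] = (uint8_t)(rel >> (8 * i));
    s[7] = 0x90;                         /* nop */
    s[8] = 0x90;                         /* nop */
}
```

For example, with the site at 00401000h and the stub at 00400F00h, the call ends at 00401007h, so the displacement is -107h, stored as F9 FE FF FF.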
One app stub needs to be created in user-space per syscall, and it looks like this:
Code:
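push ecx            ; save the registers used to pass parameters
push edx
mov ecx,esp         ; ecx = user stack pointer
mov edx,gate_nr     ; edx = gate number
sysenter            ; enter the kernel
pop edx             ; restore the saved registers
pop ecx
ret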
As can be seen, this is less efficient, but so is the use of sysenter. Note that the OS must also implement an int 0E8h handler that is a copy of the int 9Ah handler. It covers the case where a core executes the int instruction after the byte following it (the vector) has been changed from 9Ah to 0E8h, but before the int opcode itself has been overwritten.
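The window that makes the int 0E8h handler necessary can be shown with a small C model that records which interrupt vector a racing core could raise after each byte write of the sysenter patch. This is a sketch: the post only states that the int opcode is written last, so the order of the other writes, and the names `pending_vector`, `note` and `sysenter_patch_vectors`, are assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum { SITE_LEN = 10 };

/* Vector raised if a core executed site[1] now, or -1 if it is no longer an int. */
int pending_vector(const uint8_t s[SITE_LEN])
{
    return s[1] == 0xCD ? s[2] : -1;
}

/* Record the vector that is reachable in the current state. */
void note(const uint8_t s[SITE_LEN], int seen[256])
{
    int v = pending_vector(s);
    if (v >= 0)
        seen[v] = 1;
}

/* Apply the sysenter patch to a stage-1 site, noting every reachable vector. */
void sysenter_patch_vectors(uint8_t s[SITE_LEN], uint32_t rel32, int seen[256])
{
    assert(pending_vector(s) == 0x9A);   /* expects "CD 9A ..." */
    note(s, seen);                       /* int 9Ah before any write */
    s[2] = 0xE8;                         /* "CD E8": int 0E8h from here on */
    note(s, seen);
    for (int i = 0; i < 4; i++) {        /* rel32 displacement */
        s[3 + i] = (uint8_t)(rel32 >> (8 * i));
        note(s, seen);
    }
    s[7] = 0x90; note(s, seen);          /* trailing nops */
    s[8] = 0x90; note(s, seen);
    s[1] = 0x90;                         /* int opcode goes last */
    note(s, seen);
}
```

In this model exactly two vectors are reachable during the patch, 9Ah and 0E8h, which matches the need for a second handler.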