Writing your own Kernel Debugger (and using Windows)

psjarlo · **Joined:** Wed Sep 04, 2019 9:51 am **Posts:** 13

Hi,

For reasons I decided to try and write a kernel debugger for my kernel project. It's not a general tool as much as a "project", but it turned out to be remarkably interesting and a lot easier than I imagined.
In case anyone is interested in this sort of thing I did a little writeup of it (and the code is public):

https://github.com/jarlostensen/joKDbg/ ... umentation

Part of the reason why I did this is that I insist on doing my kernel development on Windows, which means I don't have all the nice facilities for debugging that Linux provides. If anybody else has done or is doing something like this, in a more serious and proper way, then please let me know.

Disclaimer: this is not production code, it's just a fun project, but it works.

psjarlo · **Joined:** Wed Sep 04, 2019 9:51 am **Posts:** 13

Just for reference: the core of the debugger in the kernel is (I think) quite simple and is handled entirely inside the int 3 / int 1 / fault ISR:

Code:

//this is the core of the debugger and handles int3, int1, faults, and breakpoints.
static void _debugger_isr_handler(interrupt_stack_t * stack) {
    
    if ( debugger_is_connected() ) {
        
        debugger_breakpoint_t* bp = 0;
        
        // -------------------------------------- running in debugger
        switch ( stack->handler_id ) {
            case 1: // TRAP 
            case 3: // breakpoint
            {               
                // see below: we have to restore breakpoint instructions 
                // whenever we pass a dynamic (runtime) breakpoint and that is done here.
                // _last_rt_bp is set to the last bp location and used to restore it
                if ( _last_rt_bp._active ) {

                    uint8_t instr = ((uint8_t*)_last_rt_bp._at)[0];
                    _last_rt_bp._instr_byte = instr;
                    // reset to int 3
                    ((uint8_t*)_last_rt_bp._at)[0] = _BREAKPOINT_INSTR;
                    _last_rt_bp._active = false;

                    // if we're not in the middle of a genuine trace we can clear TF
                    // if we don't do this then a runtime bp followed by a trace/single-step command 
                    // will result in the program continuing execution immediately, instead 
                    // of waiting in the debugger loop. 
                    if (_last_command != kDebuggerPacket_TraceStep) {
                        _CLEAR_TF(stack);
                        // we're done here
                        return;
                    }
                }
                
                // check if we've hit a programmatic breakpoint
                bp = _debugger_breakpoint_at(stack->rip - 1);
                if ( bp && bp->_active ) {
                    //_JOS_KTRACE_CHANNEL(kDebuggerChannel, "hit programmatic bp @ 0x%llx", bp->_at);

                    // re-set instruction byte
                    ((uint8_t*)bp->_at)[0] = bp->_instr_byte;
                    // back up so that we'll execute the full original instruction next
                    --stack->rip;
                }

                if ( !bp || bp->_active ) {
                    //_JOS_KTRACE_CHANNEL(kDebuggerChannel, "breakpoint hit at 0x%016llx", bp->_at);
                    
                    debugger_packet_bp_t bp_info;
                    _fill_in_debugger_packet(&bp_info, stack);
                    
                    // unwind the call stack
                    // here we just look up stack entries to see if they point to executable code.
                    // the debugger will check these for actual call sites
                    task_context_t* this_task = tasks_this_task();
                    uint64_t* rsp = (uint64_t*)stack->rsp;
                    const uint64_t* stack_end = (const uint64_t*)this_task->_stack_top;
                    
                    static vector_t callstack;
                    static bool callstack_initialised = false;
                    if (callstack_initialised) {
                        // re-use
                        vector_reset(&callstack);
                    } else {
                        vector_create(&callstack, 16, sizeof(uint64_t), _allocator);
                        callstack_initialised = true;
                    }
                    while (rsp < stack_end) {
                        if (peutil_phys_is_executable(_pe_ctx, *rsp, 0)) {
                            vector_push_back(&callstack, (void*)rsp);
                        }
                        ++rsp;
                    }
                    
                    bp_info._call_stack_size = (uint16_t)vector_size(&callstack);
                    debugger_send_packet(kDebuggerPacket_Breakpoint, &bp_info, sizeof(bp_info));
                    
                    if (bp_info._call_stack_size) {
                        debugger_send_packet(kDebuggerPacket_BreakpointCallstack, vector_data(&callstack), bp_info._call_stack_size * sizeof(uint64_t));
                    }
                }
            }
            break;
            case 6: // UD#
            {
                debugger_packet_bp_t bp_info;
                _fill_in_debugger_packet(&bp_info, stack);
                debugger_send_packet(kDebuggerPacket_UD, &bp_info, sizeof(bp_info));
            }
            break;
            case 13: // #GPF
            {
                debugger_packet_bp_t bp_info;
                _fill_in_debugger_packet(&bp_info, stack);
                debugger_send_packet(kDebuggerPacket_GPF, &bp_info, sizeof(bp_info));
            }
            break;
            case 14: // #PF
            {
                debugger_packet_bp_t bp_info;
                _fill_in_debugger_packet(&bp_info, stack);
                debugger_send_packet(kDebuggerPacket_PF, &bp_info, sizeof(bp_info));
            }
            break;
            default:;
        }
        
        // enter loop waiting for further instructions from the debugger, exit when we get kDebuggerPacket_Continue 
        _debugger_loop(stack);
        
        if ( bp && bp->_active && !bp->_transient ) {
            // if we are coming out of a runtime breakpoint that's still active we need to re-set it
            _last_rt_bp._active = true;
            _last_rt_bp._at = stack->rip;
            _last_rt_bp._instr_byte = bp->_instr_byte;
            // make sure we trap immediately after this instruction again so that we can restore it
            _SET_TF(stack);
        }
        else {
            // bp is null when we got here without hitting a known breakpoint (trace step, fault)
            if (bp && bp->_transient) {
                // the bp was transient; remove it completely
                _remove_breakpoint(bp);
            }
            _last_rt_bp._active = false;
        }
    }    
}
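
The `_SET_TF` and `_CLEAR_TF` macros are not shown in the listing above. On x86-64 the trap flag is bit 8 of RFLAGS, and because the handler edits the saved copy on the interrupt stack, the change only takes effect once `iretq` reloads it. A minimal sketch, assuming the ISR frame exposes the saved RFLAGS as a plain field (the `isr_frame_t` type and the `frame_*` helpers are hypothetical stand-ins, not the real `interrupt_stack_t`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical ISR frame: just the values the CPU pushes on an interrupt in long mode.
typedef struct {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
} isr_frame_t;

// TF (trap flag) is bit 8 of RFLAGS.
#define RFLAGS_TF (UINT64_C(1) << 8)

// Request a single-step trap after the next instruction executed post-iretq.
static inline void frame_set_tf(isr_frame_t* frame) {
    assert(frame);
    frame->rflags |= RFLAGS_TF;
}

// Resume normal (non-stepping) execution after iretq.
static inline void frame_clear_tf(isr_frame_t* frame) {
    assert(frame);
    frame->rflags &= ~RFLAGS_TF;
}

static inline bool frame_tf_is_set(const isr_frame_t* frame) {
    assert(frame);
    return (frame->rflags & RFLAGS_TF) != 0;
}
```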

psjarlo · **Joined:** Wed Sep 04, 2019 9:51 am **Posts:** 13

And here is the _debugger_loop code, which handles commands between the debugger and the kernel while we're in "breakpoint" mode:

Code:

// both macros are wrapped in do/while(0) so that they behave as a single statement
#define _DISABLE_BP_IF_ACTIVE(bp)\
do {\
    if ((bp)->_active) {\
        ((uint8_t*)(bp)->_at)[0] = (bp)->_instr_byte;\
        (bp)->_active = false;\
    }\
} while (0)
#define _RESTORE_BP_IF_INACTIVE(bp)\
do {\
    if (!(bp)->_active) {\
        (bp)->_instr_byte = ((uint8_t*)(bp)->_at)[0];\
        ((uint8_t*)(bp)->_at)[0] = _BREAKPOINT_INSTR;\
        (bp)->_active = true;\
    }\
} while (0)

// wait for debugger commands.
// if isr_stack == 0 this will not allow continuing or single stepping (used by asserts)
static void _debugger_loop(interrupt_stack_t * isr_stack) {   

    if ( !debugger_is_connected() )
        return;

    //_JOS_KTRACE_CHANNEL(kDebuggerChannel, "entering debugger, stack is 0x%llx", isr_stack);
   
    bool continue_run = false;
    while(!continue_run) {
        debugger_serial_packet_t packet;
        debugger_read_packet_header(&packet);
        _last_command = packet._id;

        switch(packet._id) {
            case kDebuggerPacket_UpdateBreakpoints:
            {
                // NOTE:
                // this assumes that the debugger and kernel maintain a synchronised list of breakpoints
                // and that the debugger sends the full list of breakpoints whenever something changes.
                // if the list of breakpoints is large (hundreds) then this is not efficient but that is an optimisation problem 
                // to be addressed later - if at all required.

                debugger_packet_breakpoint_info_t info;
                int num_packets = packet._length / sizeof(info);
                if (num_packets == 0) {
                    // clear all breakpoints

                    // first restore all instruction bytes
                    const size_t num_bps = vector_size(&_breakpoints);               
                    for (size_t bpi = 0; bpi < num_bps; ++bpi) {
                        debugger_breakpoint_t* bp = (debugger_breakpoint_t*)vector_at(&_breakpoints, bpi);                  
                        _DISABLE_BP_IF_ACTIVE(bp);
                    }               
                    vector_clear(&_breakpoints);
                }
                else {
                    // update specific breakpoints
                    void* packet_buffer = _allocator->alloc(_allocator, packet._length);
                    _JOS_ASSERT(packet_buffer);
                    debugger_read_packet_body(&packet, packet_buffer, packet._length);
                    debugger_packet_breakpoint_info_t* bpinfo = (debugger_packet_breakpoint_info_t*)packet_buffer;
                    
                    // _JOS_KTRACE_CHANNEL(kDebuggerChannel, "updating %d breakpoints", num_packets);
                    
                    while (num_packets-- > 0) {

                        debugger_breakpoint_t* bp = _debugger_breakpoint_at(bpinfo->_at);
                        if (bp) {
                            switch (bpinfo->_edc) {
                                case _BREAKPOINT_STATUS_ENABLED:
                                    _RESTORE_BP_IF_INACTIVE(bp);
                                    break;
                                case _BREAKPOINT_STATUS_DISABLED:
                                    _DISABLE_BP_IF_ACTIVE(bp);
                                    break;
                                case _BREAKPOINT_STATUS_CLEARED:
                                {
                                    _DISABLE_BP_IF_ACTIVE(bp);
                                    _remove_breakpoint(bp);
                                }
                                break;
                                default:;
                            }
                        }
                        else {
                            // new breakpoint, just add it to the list
                            debugger_breakpoint_t new_bp = { 
                                ._at = bpinfo->_at, 
                                ._instr_byte = ((uint8_t*)bpinfo->_at)[0], 
                                ._active = bpinfo->_edc == _BREAKPOINT_STATUS_ENABLED 
                            };
                            if (new_bp._active) {
                                ((uint8_t*)bpinfo->_at)[0] = _BREAKPOINT_INSTR;
                            }                     
                            vector_push_back(&_breakpoints, &new_bp);
                            
                            //_JOS_KTRACE_CHANNEL(kDebuggerChannel, "adding new breakpoint @ 0x%llx", bpinfo->_at);
                        }
                        // advance to the next breakpoint record in the packet
                        ++bpinfo;
                    }
                    
                    _allocator->free(_allocator, packet_buffer);               
                }
            }
            break;
            case kDebuggerPacket_ReadTargetMemory:
            {                
                //_JOS_KTRACE_CHANNEL("debugger", "kDebuggerPacket_ReadTargetMemory");
                debugger_packet_rw_target_memory_t rt_packet;
                // read at most sizeof(rt_packet) so an oversized packet cannot overrun the local struct
                debugger_read_packet_body(&packet, (void*)&rt_packet, sizeof(rt_packet));
                //_JOS_KTRACE_CHANNEL("debugger", "kDebuggerPacket_ReadTargetMemory 0x%llx, %d bytes", rt_packet._address, rt_packet._length);
                if( rt_packet._length ) {                    
                    // serialise directly from memory
                    debugger_send_packet(kDebuggerPacket_ReadTargetMemory_Resp, (void*)rt_packet._address, rt_packet._length);
                }
            }
            break;            
            case kDebuggerPacket_WriteTargetMemory:
            {
                debugger_packet_rw_target_memory_t rt_packet;
                debugger_read_packet_body(&packet, (void*)&rt_packet, sizeof(rt_packet));
                //TODO: sanity checks!
                if( rt_packet._length ) {
                    // serialise directly to memory
                    serial_read(kCom1, (char*)rt_packet._address, rt_packet._length);
                }
            }
            break;
            case kDebuggerPacket_TraversePageTable:
            {
                debugger_packet_page_info_t page_info_packet;
                debugger_read_packet_body(&packet, (void*)&page_info_packet, sizeof(page_info_packet));
                debugger_packet_page_info_resp_t resp_packet;
                resp_packet._address = page_info_packet._address;
                pagetables_traverse_tables((void*)page_info_packet._address, resp_packet._entries, 4);
                debugger_send_packet(kDebuggerPacket_TraversePageTable_Resp, (void*)&resp_packet, sizeof(resp_packet));
            }
            break;
            case kDebuggerPacket_RDMSR:
            {
                debugger_packet_rdmsr_t rdmsr_packet;
                debugger_read_packet_body(&packet, (void*)&rdmsr_packet, sizeof(rdmsr_packet));
                debugger_packet_rdmsr_resp_t resp_packet;
                resp_packet._msr = rdmsr_packet._msr;
                uint32_t lo, hi;
                x86_64_rdmsr(rdmsr_packet._msr, &lo, &hi);
                resp_packet._lo = lo;
                resp_packet._hi = hi;
                debugger_send_packet(kDebuggerPacket_RDMSR_Resp, (void*)&resp_packet, sizeof(resp_packet));
            }
            break;
            case kDebuggerPacket_TraceStep:
            {
                if ( isr_stack ) {
                    // switch on the trap flag so that it will trigger on the next instruction after our iret
                    _SET_TF(isr_stack);
                    continue_run = true;
                }
            }
            break;
            case kDebuggerPacket_SingleStep:
            {
                if ( isr_stack ) {
                    // check if the next instruction is indeed something to skip, i.e. a call
                    ZydisDecodedInstruction instruction;
                    if (ZYAN_SUCCESS(ZydisDecoderDecodeBuffer(&_zydis_decoder, (void*)isr_stack->rip, INTEL_AMD_MAX_INSTRUCTION_LENGTH, &instruction)) ) {
                        if ( instruction.mnemonic == ZYDIS_MNEMONIC_CALL ) {
                            // we can skip this instruction so we'll set a bp after it and continue execution
                            debugger_breakpoint_t* bp = _set_breakpoint(isr_stack->rip + instruction.length);
                            // this is a TRANSIENT breakpoint, i.e. it will be removed as soon as it's hit
                            bp->_transient = true;
                            _CLEAR_TF(isr_stack);
                        }
                        else {
                            // if it's not a CALL we just treat it as a normal single instruction step (and we will in fact pretend it is)                     
                            _last_command = kDebuggerPacket_TraceStep;
                            _SET_TF(isr_stack);
                        }
                        continue_run = true;
                    }
                }
            }
            break;
            case kDebuggerPacket_Continue:
            {
                if ( isr_stack ) {
                    _CLEAR_TF(isr_stack);
                    continue_run = true;
                }
            }
            break;
            default:
            {
                _JOS_KTRACE_CHANNEL(kDebuggerChannel, "unhandled packet id %d, length %d", packet._id, packet._length);
            }
            break;
        }
    }
}
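
The breakpoint bookkeeping that both listings rely on comes down to three steps: save the original first byte, overwrite it with `int3` (0xCC), and, when the trap fires, put the byte back and resume one byte earlier so the full original instruction runs. A minimal sketch of that cycle on an ordinary byte array rather than live kernel code (the `sw_breakpoint_t` type and the `sw_bp_*` names are hypothetical, not the ones used in joKDbg):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC  // the single-byte x86 breakpoint instruction

typedef struct {
    uint8_t* at;      // address of the patched instruction
    uint8_t  saved;   // original first byte of that instruction
    bool     active;  // true while the int3 byte is written at `at`
} sw_breakpoint_t;

// Patch in the breakpoint, remembering the byte it replaces.
static inline void sw_bp_enable(sw_breakpoint_t* bp) {
    assert(bp && bp->at);
    if (!bp->active) {
        bp->saved = *bp->at;
        *bp->at = INT3_OPCODE;
        bp->active = true;
    }
}

// Put the original byte back.
static inline void sw_bp_disable(sw_breakpoint_t* bp) {
    assert(bp && bp->at);
    if (bp->active) {
        *bp->at = bp->saved;
        bp->active = false;
    }
}

// After int3 executes, the saved RIP points one byte past the breakpoint.
// If the trap came from `bp`, restore the instruction and return the address
// to resume at; otherwise return the trap address unchanged.
static inline uint8_t* sw_bp_resume_address(sw_breakpoint_t* bp, uint8_t* rip_after_trap) {
    assert(bp && bp->at);
    if (bp->active && rip_after_trap == bp->at + 1) {
        sw_bp_disable(bp);
        return bp->at;
    }
    return rip_after_trap;
}
```

Re-arming the breakpoint after the restored instruction has run is the part that needs the single-step trap, which is what the `_last_rt_bp` handling in the ISR is for.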

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

Yeah, the bare bones of a debugger are pretty simple. Things get more problematic when you want to set up breakpoints in different processes and single-step, when you want it to be multiplatform, and when you realize that so many things are not covered by the debugger's protocol that you're better off writing your own (gdb does not support dumping physical memory, the paging tables, etc.).

My debugger is structured like this: the platform-independent code (approx. 1100 SLoC), the platform-dependent hooks (approx. 200 SLoC), and the disassembler (approx. 1500 SLoC), and it looks like this (interface also available through a VT terminal on a serial port): screenshot.

For a simple and easy to be integrated, multiplatform debugger which has a disassembler, I'd recommend minidbg (MIT licensed, the debugger as well as the disassembler).

Cheers,
bzt

rdos · **Joined:** Wed Oct 01, 2008 1:55 pm **Posts:** 3194

I can use my application debugger to trace into the kernel. I have a debugger stub that runs in user mode and uses syscalls to implement the debugger functions. While the debugger is running user-mode code, it uses the normal debug interface defined for applications. When a debugged thread traces into a syscall, the normal single-step trap interrupt will block the thread and mark it as "debugged" (by adding it to a specific scheduling list). The debug stub will detect this and signal to the debugger that a mode switch happened. The debugger I use is the ordinary Watcom debugger, and it runs on Windows over TCP/IP. The memory access function in the debugger allows inspecting and modifying kernel data. It also allows source-level debugging of kernel device drivers. This is useful when these are C based, which a minority are. I cannot directly hook into the kernel, but to debug random code I link it to the "test" syscall, and then run the test application that uses the test syscall. This way I can debug many initialization scenarios, like how the HID device parses descriptors, or how the HDA device handles codec configurations.

I have a pure kernel monitor too, but it is standalone, running in its own process in kernel mode. It can show the register state of all threads that are blocked as "debugged". It can also trace, pace and run them. There are functions to view memory (and modify it) as well as a function to view & modify physical memory. This is useful for assembly-based drivers, but a bit awkward for C-based ones. The debug monitor also has a disassembler.

psjarlo · **Joined:** Wed Sep 04, 2019 9:51 am **Posts:** 13

Very cool, and thanks for the confirmation that writing your own is a realistic alternative.
I'll definitely use yours as a reference point for further work (I quite enjoyed doing this so I'll probably keep at it; I really want source line debugging support too).

psjarlo · **Joined:** Wed Sep 04, 2019 9:51 am **Posts:** 13

You're basing it on this? https://open-watcom.github.io/open-watcom-v2-wikidocs/pguide.html Can you share some more info on how you're doing that, have you just written your own kernel side back end to hook into it?
TCP/IP is a bit too soon for me, I've got serial but no network stack in my kernel.

rdos · **Joined:** Wed Oct 01, 2008 1:55 pm **Posts:** 3194

psjarlo wrote:

You're basing it on this? https://open-watcom.github.io/open-watcom-v2-wikidocs/pguide.html

Yes, that's the tool chain I'm using.

psjarlo wrote:

Can you share some more info on how you're doing that, have you just written your own kernel side back end to hook into it?

No, I use a user process to start the debug server process. Unlike more typical OSes, the debugger doesn't freeze the application (or core) that is debugged; instead, the scheduler blocks it on a debug list. This can happen when the debugged application is in user space, in which case the normal debug interface is activated, or in kernel space, in which case the current register state is saved in the thread control block and the scheduler picks another thread to run. The debug server has a list of active threads in the application, and can switch between those and trace them. Tracing in the kernel is done by setting TF in the register state and then unblocking the thread. The debugged thread will then execute one instruction and get a single-step exception, and the single-step exception handler will block the thread again. Breakpoints in kernel space are handled with hardware breakpoints, as writing an int 3 instruction to kernel space is not a good idea. This limits kernel debugging to four breakpoints per thread (actually three, since one is used when pacing).
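
The limit of four comes from the x86 debug registers: DR0 to DR3 hold the breakpoint addresses and DR7 holds the enable bits plus a 4-bit type/length field per slot. A minimal sketch of the DR7 arithmetic for execution breakpoints only; the function names are hypothetical (not RDOS code), and actually loading the value into DR7 needs a privileged `mov` that is not shown:

```c
#include <assert.h>
#include <stdint.h>

// DR7 layout used here:
//   bit  2*slot            Ln, local enable for slot n (0..3)
//   bit  2*slot + 1        Gn, global enable for slot n
//   bits 16 + 4*slot ...   R/Wn (2 bits) followed by LENn (2 bits)
// An execution breakpoint uses R/W = 00 and must use LEN = 00.

// Returns `dr7` with slot `slot` armed as an execution breakpoint.
static inline uint64_t dr7_enable_exec_breakpoint(uint64_t dr7, unsigned slot) {
    assert(slot < 4);
    dr7 &= ~(UINT64_C(0xF) << (16 + 4 * slot));  // R/W = 00, LEN = 00
    dr7 |= UINT64_C(1) << (2 * slot);            // set Ln
    return dr7;
}

// Returns `dr7` with both enable bits of slot `slot` cleared.
static inline uint64_t dr7_disable_breakpoint(uint64_t dr7, unsigned slot) {
    assert(slot < 4);
    return dr7 & ~(UINT64_C(3) << (2 * slot));
}
```

Since the address goes into a register instead of the code stream, nothing in kernel memory is modified, which is the property rdos is after. A scheme with per-thread breakpoints would presumably save and restore these registers on a thread switch, but that part is an assumption, not something described in the post.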

psjarlo wrote:

TCP/IP is a bit too soon for me, I've got serial but no network stack in my kernel.

I can run it through a serial port too, or use the local RDOS version of the debugger, but I prefer to use TCP/IP.

To make it work, I have adapted the Watcom debugger to understand RDOS executables, both applications and device drivers. The toughest change was to implement per-thread blocking instead of freezing the whole application, which is standard in the debugger.

Ethin · **Posted:** Sun Jul 04, 2021 1:43 pm

bzt wrote:

For a simple and easy to be integrated, multiplatform debugger which has a disassembler, I'd recommend minidbg (MIT licensed, the debugger as well as the disassembler).

My (only) fault with this debugger implementation is that it uses a lot of C-specific stuff (and things that I as a Rust programmer would consider quite *unsafe*). E.g.: It uses `goto`, which makes it a bit troublesome to port to other languages. I could probably hack together a port if I tried hard enough though. Rust doesn't have goto (though I still to this day don't really understand why, it's a pretty useful construct), but it does have an awesome and blazingly fast x86 disassembler that I'd use instead of the handwritten one.
Additionally looking at the code it also defines the uint8_t, uint16_t, ..., types manually instead of using stdint.h. But those are the main issues that I have with it. It's a good implementation to get going with, it's just not very portable across other languages without a lot of back-bending, so to speak.

psjarlo · **Joined:** Wed Sep 04, 2019 9:51 am **Posts:** 13

Thanks for the tips, both. I've got some ground to cover before I'm at the level of your debugger efforts, but so far it's been a very interesting effort and has helped me find bugs in my nascent kernel.

My main effort now is getting source-level debugging working. I'm working on a UEFI kernel, so the executable is PE and the debug information is PDB. This is one area where Python helps, with its many modules and the relative ease of getting things working (PDB is not a well-documented format and it has a very particular way of storing information).
I do have the ability to look up variables in memory and dump their contents with type information, which is really useful, and I know the source line -> instruction offset information is in the PDB, so it's just a matter of time before I manage to pry it open.
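
Once the line information has been extracted, mapping an instruction offset back to a source line comes down to a search over a table sorted by offset. A minimal sketch in C, to match the kernel code in this thread (the debugger side here is Python); the `line_entry_t` layout is hypothetical and is not how PDB stores the data:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical flattened line table entry.
typedef struct {
    uint32_t offset;  // offset of the first instruction generated for this line
    uint32_t line;    // 1-based source line number
} line_entry_t;

// Binary search for the entry with the greatest offset <= `offset`.
// `table` must be sorted by ascending offset.
// Returns NULL if `offset` lies before the first entry.
static inline const line_entry_t* line_for_offset(const line_entry_t* table,
                                                  size_t count,
                                                  uint32_t offset) {
    assert(table || count == 0);
    const line_entry_t* best = NULL;
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].offset <= offset) {
            best = &table[mid];  // candidate; a later entry may still fit
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}
```

The search takes "greatest offset not above the target" and not an exact match, because a stopped RIP will usually land in the middle of the instructions generated for a line.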

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

Ethin wrote:

It uses `goto`, which makes it a bit troublesome to port to other languages.

First, `goto` is a valid ANSI C keyword, it is perfectly fine to use it (just do not overuse it, which is also true for any other language feature). Second, it's mostly used in sprintf to make it compact, so if your preferred language supports sprintf there's no reason to port that function in the first place. The one and only "goto dis" outside of sprintf could be avoided easily by duplicating the "disassemble bytecode" block (lines 347 and 355 to 365). That's no more than 11 additional SLoC.
Under no circumstances would I call this "troublesome".

Ethin wrote:

it does have an awesome and blazingly fast x86 disassembler that I'd use instead of the handwritten one.

I've just checked, iced has dependencies, even some Google code, and it's over 1 Mbytes in size. My implementation works for AArch64 too, not handwritten, it is generated by a script for speed and compact size, and is just ca. 40Kbytes. (Feel the difference: 1024K vs. 40K, even x86 and ARM disassemblers combined is no more than 187K)

Ethin wrote:

Additionally looking at the code it also defines the uint8_t, uint16_t, ..., types manually instead of using stdint.h.

Yeah, because not all bare metal projects have stdint.h. For userspace code you can rely on stdint (either as a header file or as a compiler built-in, but it must exist), but for freestanding mode it depends on the compiler (as there might be no include files at all, unless you compile a cross-compiler with sysroot support, and gcc might have a built-in version of that header but other compilers might not). If this bothers you so much, just replace the typedefs with an include, I've used the standard names so this should be no prob, this is hardly a roadblock for porting.

Ethin wrote:

it's just not very portable across other languages without a lot of back-bending, so to speak.

I'm not so sure about that, but granted, being easily portable to other languages was never its goal, being usable without dependency in any C project was.

Cheers,
bzt

Ethin · **Posted:** Tue Jul 06, 2021 12:00 pm

bzt wrote:

Ethin wrote:

It uses `goto`, which makes it a bit troublesome to port to other languages.

First, `goto` is a valid ANSI C keyword, it is perfectly fine to use it (just do not overuse it, which is also true for any other language feature). Second, it's mostly used in sprintf to make it compact, so if your preferred language supports sprintf there's no reason to port that function in the first place. The one and only "goto dis" outside of sprintf could be avoided easily by duplicating the "disassemble bytecode" block (lines 347 and 355 to 365). That's no more than 11 additional SLoC.
Under no circumstances would I call this "troublesome".

Did I ever say that it wasn't a perfectly legitimate usage of ANSI C? I've used it before -- I know exactly what goto does and I'm okay with its use. It just makes it a bit more troublesome to port to other languages, because it requires violating the DRY principle when those languages don't possess that keyword.

bzt wrote:

Ethin wrote:

it does have an awesome and blazingly fast x86 disassembler that I'd use instead of the handwritten one.

I've just checked, iced has dependencies, even some Google code, and it's over 1 Mbytes in size. My implementation works for AArch64 too, not handwritten, it is generated by a script for speed and compact size, and is just ca. 40Kbytes. (Feel the difference: 1024K vs. 40K, even x86 and ARM disassemblers combined is no more than 187K)

The difference is that it's a single x86 disassembler and it supports a lot more functionality than yours does. Yours might support more architectures, but that one supports more functionality specific to x86: all the various ISA differences that have accumulated over the years, all the output formats, etc. The compiler is smart enough to eliminate dead code, and that crate's size is controllable via crate features. The various features specify what's included:

decoder: enables instruction decoding/disassembly
encoder: enables instruction encoding/reassembly
block_encoder: enables the block encoder, which also enables the encoder
op_code_info: enables the retrieval and examination of instruction opcodes
instr_info: enables retrieval and examination of full instructions
gas, intel, masm, nasm: enables an instruction disassembly format
fast_fmt: enables the fast formatting routines, speeding up formatting by at least 3.3x
std: enables depending on the standard library
exhaustive_enums: enables exhaustive enumerations (covering all possible values). Definitely increases code size
no_vex, no_evex, no_xop, no_d3now: disables various instruction subsets

So, as you can see, the disassembler is highly customizable and you can configure precisely what you want. The compiler will take care of the rest. Toggling a crate feature determines whether the associated code is emitted at all. It is a direct analog of the C preprocessor's conditional compilation. The reason your disassembler is smaller is that it is highly tuned for your use case. The iced disassembler, on the other hand, is not only more general, but is written with a cross-language architecture in mind.
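
For comparison, the C-side equivalent of a feature toggle is a preprocessor switch around the optional code. A minimal sketch, where the `DISASM_WITH_FORMATTER` switch and the `disasm_mnemonic` function are hypothetical and not taken from minidbg or iced:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical feature switch; a build would normally pass -DDISASM_WITH_FORMATTER=1.
#ifndef DISASM_WITH_FORMATTER
#define DISASM_WITH_FORMATTER 0
#endif

// Returns a mnemonic for a few one-byte opcodes, or NULL if the opcode is
// unknown or the formatter was compiled out.
static inline const char* disasm_mnemonic(unsigned opcode) {
    assert(opcode <= 0xFF);
#if DISASM_WITH_FORMATTER
    switch (opcode) {
        case 0x90: return "nop";
        case 0xC3: return "ret";
        case 0xCC: return "int3";
        default:   return NULL;
    }
#else
    // With the switch off, the table above is never compiled, so it costs no space.
    return NULL;
#endif
}
```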

bzt wrote:

Ethin wrote:

Additionally looking at the code it also defines the uint8_t, uint16_t, ..., types manually instead of using stdint.h.

Yeah, because not all bare metal projects have stdint.h. For userspace code you can rely on stdint (either as a header file or as a compiler built-in, but it must exist), but for freestanding mode it depends on the compiler (as there might be no include files at all, unless you compile a cross-compiler with sysroot support, and gcc might have a built-in version of that header but other compilers might not). If this bothers you so much, just replace the typedefs with an include, I've used the standard names so this should be no prob, this is hardly a roadblock for porting.

Have you ever seen a compiler that does not include or generate stdint.h? I would consider it a defect in the compiler if it didn't for cross-compilation purposes. The C standard mandates that uint[8/16/32/64]_t have exactly the bit widths their names state, but does not mandate such conditions for [signed/unsigned] char, short, long, and long long. Though it is unlikely to go wrong, it is still a gamble to depend on those types having the bit widths that you expect. (E.g. on a RISC-V system, unsigned long long may be 128 bits, not 64.) See section 6.2.5 of C18 for more info, as well as footnotes 38, 39, and 40.
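A small C11 sketch of that distinction: where the exact-width types exist, their widths can be asserted at compile time, while for the basic types the standard only fixes minimum ranges, so the actual width has to be queried per compiler. The helper names are made up for this example.

```c
#include <limits.h>
#include <stdint.h>

/* Exact-width types have no padding bits, so these hold on every
 * implementation that provides them (they do not exist if CHAR_BIT != 8). */
_Static_assert(sizeof(uint8_t)  * CHAR_BIT == 8,  "uint8_t is 8 bits");
_Static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t is 16 bits");
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits");
_Static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t is 64 bits");

/* For the basic types only minimums are guaranteed (at least 32 bits for
 * long, at least 64 for long long), so the storage size is reported rather
 * than asserted. */
int bits_of_unsigned_long(void)
{
    return (int)(sizeof(unsigned long) * CHAR_BIT);
}

int bits_of_unsigned_long_long(void)
{
    return (int)(sizeof(unsigned long long) * CHAR_BIT);
}
```

For instance, unsigned long is 64 bits on typical 64-bit Linux targets but 32 bits on 64-bit Windows, which is exactly the kind of difference the fixed-width names avoid.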

bzt wrote:

Ethin wrote:

it's just not very portable across other languages without a lot of back-bending, so to speak.

I'm not so sure about that, but granted, being easily portable to other languages was never its goal, being usable without dependency in any C project was.

Cheers,
bzt

Fair enough.
Edit: to clarify: the op_code_info and instr_info features explicitly enable retrieving and examining extra information about the instructions that are decoded. (And, yes, there are *a lot* of functions that one can use on an instruction -- see the InstructionInfo and the OpcodeInfo struct for the info that these features enable, and see the Instruction struct for a list of all the functions that are available on individual instructions.) I could very easily see the iced-x86 crate being used in, say, professional disassemblers or debuggers (or, hell, hobby disassemblers/debuggers, even), if only because it allows a deep-dive look at instructions, as well as in-place modification of them and moving them around. So the additional reason it's so large is that it's an assembler and disassembler in one package. Granted, other disassemblers offer the same level of analysis, but you have to admit that this is definitely a neat project.

nexos · **Joined:** Tue Feb 18, 2020 3:29 pm **Posts:** 1071

bzt wrote:

Yeah, because not all bare metal projects have stdint.h.

According to ISO/IEC C, all freestanding and hosted libraries have stdint.h. If a compiler doesn't provide stdint.h, then it's breaking ISO C, and should be ignored.
Your debugger is great, @bzt, nice and simple. I guess, @Ethin, it wasn't ever designed to support Rust.
Also, large code doesn't really mean it's "bad". Listen, I'm all for KISS, but if a feature will add lots of functionality and speed, then I'm all for it. Remember, less code != more speed.

Ethin · **Posted:** Tue Jul 06, 2021 12:39 pm

nexos wrote:

bzt wrote:

Yeah, because not all bare metal projects have stdint.h.

According to ISO/IEC C, all freestanding and hosted libraries have stdint.h. If a compiler doesn't provide stdint.h, then it's breaking ISO C, and should be ignored.
Your debugger is great, @bzt, nice and simple. I guess, @Ethin, it wasn't ever designed to support Rust.
Also, large code doesn't really mean it's "bad". Listen, I'm all for KISS, but if a feature will add lots of functionality and speed, then I'm all for it. Remember, less code != more speed.

Precisely. The relevant clauses of ISO C (C18) are clauses 1-9 of section 6.2.5.

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

Ethin wrote:

Did I ever say that it wasn't a perfectly legitimate usage of ANSI C? I've used it before -- I know exactly what goto does and I'm okay with its use. It just makes it a bit more troublesome to port it to other languages because it requires violating the DRY principle when those languages don't possess that keyword.

Ahhhhhh. My issue was that you called that "troublesome". Just for you, here's a patch that removes that single goto.

Code:

        if(dbg_cmd[0]=='n') {
-           dbg_regs[31]=disasm(dbg_regs[31]?dbg_regs[31]:dbg_regs[30],NULL);
-           dbg_cmd[0]='i';
-           goto dis;
+           os=oe=dbg_regs[31]=disasm(dbg_regs[31]?dbg_regs[31]:dbg_regs[30],NULL);
+           while(os<=oe) {
+                a=os;
+                os=disasm(os,str);
+                dbg_printf("%8x:",a);
+                for(j = 32; a < os; a++, j -= 3) {
+                    dbg_printf(" %02x",*((unsigned char*)a));
+                }
+                for(; j > 0; j--)
+                    dbg_printf(" ");
+                dbg_printf("%s\n",str);
+            }
+            continue;
        } else

And voilà, dbg_main is entirely, 100% goto-free. This is the modification that you called "troublesome". And as for the sprintf, as I've said, you can pick any implementation that you like; you're not tied to the minimal implementation minidbg is shipped with. You probably already have a printf / sprintf in your klib anyway.
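The byte-column logic of that loop (print each instruction byte as " xx", then pad with spaces up to a fixed width) can be tried outside the debugger with a sketch like the one below. The function name and signature are hypothetical and not part of minidbg; standard snprintf stands in for dbg_printf, and the output goes into a caller-provided buffer so that no allocation is needed.

```c
#include <stdio.h>

/* Formats `len` bytes at `p` as " xx" groups, then pads with spaces so the
 * column is `width` characters wide (the patch above uses 32).
 * Writes into the caller's buffer and returns the number of characters
 * written, not counting the terminating NUL. Output is cut short if the
 * buffer is too small. */
int format_byte_column(char *out, size_t outsz,
                       const unsigned char *p, size_t len, int width)
{
    int n = 0;
    int j = width;
    size_t i;

    /* Each byte needs 3 characters plus room for snprintf's NUL. */
    for (i = 0; i < len && (size_t)n + 4 <= outsz; i++, j -= 3)
        n += snprintf(out + n, outsz - (size_t)n, " %02x", p[i]);

    /* Pad the rest of the column, keeping one slot for the NUL. */
    for (; j > 0 && (size_t)n + 1 < outsz; j--)
        out[n++] = ' ';

    if (outsz > 0)
        out[n] = '\0';
    return n;
}
```

Because every line's byte column ends up the same width, the mnemonics printed after it line up regardless of instruction length.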

Ethin wrote:

The difference is that's a single x86 disassembler and it supports a lot more functionality than yours does.

That's exactly why it sucks big time. A disassembler has one, and only one job: decode an instruction into a mnemonic without using further libraries or allocating memory. That's all.

Ethin wrote:

So, as you can see, the disassembler is highly customizable and you can configure precisely what you want.

Functionality that nobody needs and nobody wants. Say what happens if you forget to configure something in and it won't be able to disassemble instructions because of that? See? Useless and just asks for trouble. A disassembler should decode instructions, as many as possible, with as little effort and memory footprint and dependencies as possible. Every dependency and configuration complexity is just a punch to the face of portability.

Ethin wrote:

Have you ever seen a compiler that does not include or generate stdint.h?

Of course I have. MSVC likes to use its own defines instead of the standard ones. But to give you an exact example within the OSDev realm, you have neither stdint.h nor the uint*_t defines in a UEFI environment (yes, you can integrate my debugger into a UEFI loader too). But just for you, I've added ifdef guards around the typedefs.

Ethin wrote:

but you have to admit that this is definitely a neat project.

I can't. It is exactly the kind of bloated software, full of useless and unnecessary features and additional library dependencies, that I can't stand. Seriously, iced depends on Google Hashing? WTF? And does iced require dynamic memory allocation? I bet it does, so how do you call it from an ISR? Or what would you do if you have to debug the memory allocation in your kernel? (Which is a valid use case, and don't tell me that Rust has its own allocator, because the Linux guys are right now having a really hard time replacing Rust "alloc" with a kernel-provided one because the standard crate is buggy as hell and causes kernel panics.)

Plus there's really no need for op_code_info and instr_info and all the other stuff in a debugger. That's why code analysis is a compile time C header generation feature in my disassembler, because I never turn this feature on. (But if you want, you can generate a version of my disassembler that outputs detailed instruction info into a JSON string without any additional dependencies, it will add a few kilobytes to the code, still not megabytes.)

Cheers,
bzt
