OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Adding new code breaks loading the kernel
PostPosted: Sat Jan 14, 2023 10:39 am 

Joined: Sat Jan 14, 2023 9:55 am
Posts: 3
Hi, I've been reading around here for the past couple months while learning about OS internals.

Goal: Write a basic, minimal x86 bootloader and 32-bit kernel to satisfy my curiosity about OS internals (my full-time job is unrelated; it's all Python).
Code: https://github.com/eliaonceagain/edu-x86-bootloader-kernel

Checkpoints
- Real mode 16bit bootloader
- load gdt
- setup video mode
- enable protected mode
- setup interrupts
- setup tss
- create processes
- scheduler to round robin created processes on every clock interrupt
- enable paging

All of the above "works". I'm writing "works" because I'm sure some things are misconfigured and only working by luck.

Current problem:
Adding any new C code makes the kernel not load.
Stage 1 bootloader (src/bootloader.asm) makes far jump to stage 2 (src/kernel_init.asm) and it remains stuck there.
New code could be as simple as:
Code:
echo "void helloworld(){}" > src/filler.c

Suspect
Using bochs gui I've managed to pinpoint this to the instruction that loads the tss
src/kernel_init.asm -> setup_task_register -> ltr ax -> hang

Reproduction
Code:
git clone https://github.com/EliaOnceAgain/edu-x86-bootloader-kernel.git
cd edu-x86-bootloader-kernel && echo "void helloworld(){}" > src/filler.c
make clean && make run

Requesting help
It seems that the core code is not stable enough but I'm missing a direction to follow.
Some questions in random order:
- Why does adding new code break loading the kernel?
- How do I properly configure the TSS? (TSS segment defined in src/gdt.asm)
- Should I set up the stack differently? (currently it's set in src/bootloader.asm, starting at 0x7C00 and growing downwards)

Any tips / suggestions / changes related or unrelated to the above questions would be highly appreciated

Thanks,
Elia


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Tue Jan 17, 2023 8:45 am 

Joined: Fri Aug 26, 2016 1:41 pm
Posts: 670
How big is your kernel when it stops loading properly? Looking at your bootloader, you read a maximum of 15 sectors (7.5 KiB). Is there any chance that your kernel has grown beyond that? I'd try building your kernel, but there are multiple redefinition errors when linking.
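
One quick way to check this is to work out how many 512-byte sectors the flat kernel binary occupies and compare that with the 15 sectors the bootloader reads. A minimal sketch in C; the names `SECTOR_SIZE`, `SECTORS_LOADED`, `sectors_needed` and `kernel_fits` are made up for illustration and are not from the repository:

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE    512u  /* bytes per sector for a BIOS disk read */
#define SECTORS_LOADED 15u   /* sectors the stage 1 loader reads (7.5 KiB) */

/* Round the binary size up to whole sectors. */
size_t sectors_needed(size_t kernel_bytes)
{
    return (kernel_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* True if the whole binary is covered by the sectors the loader reads. */
bool kernel_fits(size_t kernel_bytes)
{
    return sectors_needed(kernel_bytes) <= SECTORS_LOADED;
}
```

A binary of exactly 7680 bytes still fits; one more byte needs a 16th sector. Note that this counts the whole file, not just its non-zero part.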

I see the multiple redefinitions occur because you define global objects in your header files. If you include such a header in more than one C file, those objects will be defined again in each C file that includes it. Put the definitions of global objects in a .c file and an `extern` declaration in the header.

As an example in process.h you have these definitions:
Code:
process_t *process_table[15];
int processes_count, curr_pid;
Declare them extern instead:
Code:
extern process_t *process_table[15];
extern int processes_count, curr_pid;
Then in process.c you can define them somewhere after you include process.h, like this:
Code:
#include "process.h"
#include "vsa.h"        /* vsa_t, alloc()                                   */

process_t *process_table[15];
int processes_count, curr_pid;
There is a similar problem in paging.c/paging.h and scheduler.c/scheduler.h


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Tue Jan 17, 2023 9:32 am 

Joined: Fri Apr 08, 2022 3:12 pm
Posts: 54
A debugger is your friend. I use gdb. While it is a bit of a pain to use in real mode, you can make your life easier with some predefined macros.
Checking the actual state of memory is the best way to see what happened, and single-stepping through the loading process shows what is happening in real time. In my bootloader I put a signature at the end of the loader, defined in the linker script as .signature, which let me quickly check in memory whether the whole image was loaded.
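
A rough sketch of such a signature check in C. The magic value, the function name `image_fully_loaded` and the assumption that the signature occupies the last four bytes of the image are all illustrative, not taken from the loader described above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOAD_SIGNATURE 0x4C4F4144u  /* hypothetical magic value */

/* Return true if the last 4 bytes of the loaded image hold the signature.
 * image: start of the loaded image in memory; image_size: expected size. */
bool image_fully_loaded(const uint8_t *image, size_t image_size)
{
    uint32_t tail;

    if (image == NULL || image_size < sizeof tail)
        return false;
    /* memcpy avoids an unaligned 32-bit read. */
    memcpy(&tail, image + image_size - sizeof tail, sizeof tail);
    return tail == LOAD_SIGNATURE;
}
```

If the loader reads too few sectors, the tail of the image is never written and the comparison fails.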

The stack should be placed so that, as it grows down, it doesn't overwrite your data or hit a relevant region of memory (such as the BIOS data area at 0:400h).

There are far more qualified people here who could help you; I keep my notes on a sort of standard memory layout in my pmbr code.


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Tue Jan 17, 2023 9:58 am 

Joined: Sat Mar 31, 2012 3:07 am
Posts: 4591
Location: Chichester, UK
As Michael says, problems when you add code almost certainly mean that you are not loading the whole kernel.


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Tue Jan 17, 2023 11:43 am 

Joined: Sat Jan 14, 2023 9:55 am
Posts: 3
Thank you for the input, I will work on sorting out the multiple redefinitions.

Meanwhile, the size of the non-zero area in the kernel binary is <5 KB, so I expect 15 sectors to cover it.

While debugging further, I encountered this:
Code:
00017824976i[BIOS ] Booting from 0000:7c00
00017923971e[CPU0 ] LTR: doesn't point to an available TSS descriptor!
00017923971e[CPU0 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
00017923971e[CPU0 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
00017923971i[CPU0 ] CPU is in protected mode (active)
00017923971i[CPU0 ] CS.mode = 16 bit
00017923971i[CPU0 ] SS.mode = 16 bit
00017923971i[CPU0 ] EFER   = 0x00000000
00017923971i[CPU0 ] | EAX=60000028  EBX=00001c00  ECX=00092000  EDX=00000018
00017923971i[CPU0 ] | ESP=00007bd8  EBP=00007bfe  ESI=000e7ca9  EDI=0000ffac
00017923971i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf af PF cf
00017923971i[CPU0 ] | SEG sltr(index|ti|rpl)     base    limit G D
00017923971i[CPU0 ] |  CS:0900( 0004| 0|  0) 00009000 0000ffff 0 0
00017923971i[CPU0 ] |  DS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00017923971i[CPU0 ] |  SS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00017923971i[CPU0 ] |  ES:0900( 0005| 0|  0) 00009000 0000ffff 0 0
00017923971i[CPU0 ] |  FS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00017923971i[CPU0 ] |  GS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00017923971i[CPU0 ] | EIP=0000025d (0000025d)
00017923971i[CPU0 ] | CR0=0x60000011 CR2=0x00000000
00017923971i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
00017923971e[CPU0 ] exception(): 3rd (13) exception with no resolution, shutdown status is 00h, resetting
00017923971i[SYS  ] bx_pc_system_c::Reset(HARDWARE) called
00017923971i[CPU0 ] cpu hardware reset
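
For context on the `LTR` line in the log: in protected mode `ltr` expects the selector to reference a GDT descriptor that is present, is a system descriptor (S = 0), and has type 0x9 (available 32-bit TSS); type 0xB marks a TSS that is already busy. A small C sketch of that check on the descriptor's access byte (byte 5 of the 8-byte entry). The helper name `is_available_tss32` and the macro names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define ACCESS_PRESENT   0x80u  /* P bit */
#define ACCESS_S         0x10u  /* S bit: 0 = system descriptor */
#define ACCESS_TYPE_MASK 0x0Fu  /* low 4 bits: descriptor type */
#define TYPE_TSS32_AVAIL 0x09u  /* available 32-bit TSS */

/* True if the access byte describes a present, available 32-bit TSS. */
bool is_available_tss32(uint8_t access)
{
    return (access & ACCESS_PRESENT) != 0 &&
           (access & ACCESS_S) == 0 &&
           (access & ACCESS_TYPE_MASK) == TYPE_TSS32_AVAIL;
}
```

An access byte of 0x89 passes this check; 0x8B (busy TSS) and 0x9A (a code segment) do not. If the bytes at the selector's GDT slot are not what the source intended, for example because the wrong code ran first, the check fails as well.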


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Tue Jan 17, 2023 11:58 am 

Joined: Fri Apr 08, 2022 3:12 pm
Posts: 54
I only glanced at your code but I see gdt_size_in_bytes has to be (6*8)-1. Similar correction is needed for IDT.
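
To illustrate the limit: the GDTR limit is the offset of the last valid byte, so a table of 6 eight-byte descriptors has a limit of 6*8-1 = 47, and a selector is only usable if all 8 bytes of its descriptor lie within that limit. A minimal sketch in C; the function names are made up for illustration, and the selector 0x28 used in the note below is the low 16 bits of EAX in the Bochs dump above:

```c
#include <stdbool.h>
#include <stdint.h>

#define DESCRIPTOR_SIZE 8u

/* GDTR/IDTR limit for a table with `entries` descriptors (entries >= 1):
 * size in bytes minus 1. */
uint16_t table_limit(unsigned entries)
{
    return (uint16_t)(entries * DESCRIPTOR_SIZE - 1u);
}

/* Index into the table: selector bits 15..3 (bits 2..0 are TI and RPL). */
unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}

/* True if the last byte of the selected descriptor is within the limit. */
bool selector_within_limit(uint16_t selector, uint16_t limit)
{
    return selector_index(selector) * DESCRIPTOR_SIZE + (DESCRIPTOR_SIZE - 1u) <= limit;
}
```

Selector 0x28 is index 5, whose last byte is at offset 47, so it needs a limit of at least 47. The same formula gives 2047 for a 256-entry IDT.
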
Some exceptions push an error code and require different cleanup on exit; you are probably not iret-ing properly from some of them.
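
As a sketch of the error-code point: on 32-bit x86 the CPU pushes an error code for vectors 8 (#DF), 10 (#TS), 11 (#NP), 12 (#SS), 13 (#GP), 14 (#PF) and 17 (#AC), and handlers for those have to remove the 4-byte error code from the stack before `iret`. The helper below is hypothetical and leaves out exceptions introduced on newer CPUs:

```c
#include <stdbool.h>

/* True if the CPU pushes an error code for this exception vector. */
bool exception_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8:   /* #DF double fault */
    case 10:  /* #TS invalid TSS */
    case 11:  /* #NP segment not present */
    case 12:  /* #SS stack-segment fault */
    case 13:  /* #GP general protection */
    case 14:  /* #PF page fault */
    case 17:  /* #AC alignment check */
        return true;
    default:
        return false;
    }
}
```

A common approach is to have the stubs for all other vectors push a dummy 0, so that every handler sees the same stack layout.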


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Tue Jan 17, 2023 1:06 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5069
You load your kernel to 0x9000, and your kernel is linked to run at 0x9000, but you jump to 0x0900:0x0000 to run it. You've already added several hacks to try to work around this discrepancy (using "- start" to fix addresses) when you could have instead jumped to 0x0000:0x9000 and avoided the problem entirely.
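
To make the discrepancy concrete: in real mode the linear address is segment*16 + offset, so 0x0900:0x0000 and 0x0000:0x9000 both reach linear address 0x9000, but only the second leaves IP equal to 0x9000, which is the offset that code linked at 0x9000 expects. A minimal C sketch; `real_mode_linear` is a name made up for illustration:

```c
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset.
 * The 20-bit wraparound with A20 disabled is ignored here. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Both `real_mode_linear(0x0900, 0x0000)` and `real_mode_linear(0x0000, 0x9000)` evaluate to 0x9000; the difference is only in the offset the CPU uses for addresses inside the code.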

In your bootloader, you store DL (the boot drive) into memory using a memory reference relative to DS before you've set DS.

In load_gdt you disable interrupts, then in init_video_mode you call INT 0x10 which may return with interrupts enabled.

In enable_protected_mode you set CR0.PE without immediately loading CS with a new valid code selector. You must not place any instructions between the MOV that sets CR0.PE and the instruction that sets CS. It's also important to load data segment registers with protected mode data segments before using them to access memory in protected mode.

In remap_pic you enable all IRQs, including IRQs you are not yet prepared to handle.

You never set the upper bits of ESP.

Your kernel_main() returns, but start_kernel is not prepared to handle a return.

There may be other issues, I didn't look at everything.


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Tue Jan 17, 2023 1:40 pm 

Joined: Fri Aug 26, 2016 1:41 pm
Posts: 670
Octocontrabass wrote:
There may be other issues, I didn't look at everything.
Entering protected mode, then doing a `ret` and calling other functions in quasi 16-bit protected mode, stood out to me.

Another issue I noticed is that the TSS itself is only a DWORD in size.


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Tue Jan 17, 2023 3:24 pm 

Joined: Fri Aug 26, 2016 1:41 pm
Posts: 670
I had a few more minutes to look at things. I think I understand why adding a new file (in your case) could make things fail where they previously worked (by luck), even though the kernel doesn't appear to exceed 7.5 KiB yet (another issue that will bite you later).

The other issue is this - In your `makefile` you have:
Code:
$(LNK) $(LDFLAGS) $(BUILDDIR)/*.$(OBJEXT) -o $(BUILDDIR)/kernel.elf
In general there isn't anything wrong with this, but you are at the mercy of the order in which the `*.$(OBJEXT)` wildcard expands. Since you are ultimately generating a flat binary file, the entry point will be the first code in memory, loaded at 0x0900:0x0000. The problem is that there is no guarantee that `kernel_init.o` is listed first on your linker command line, and that is the code you want executed before anything else. If the first object file happens to be `filler.o`, its code will run first and probably `ret` into no man's land.

To fix this you can alter your `linker.ld` file to ensure the `.text` section of obj/kernel_init.o is always first. So this should probably fix that issue:
Code:
.text 0x09000 :
  {
    code = .; _code = .; __code = .;
    obj/kernel_init.o(.text)
    *(.text)
  }
This way no matter what order the objects are listed when linking, the .text (code) section of obj/kernel_init.o will be first.

Octocontrabass previously commented on a number of things that should be fixed. But once you start getting into ring 3 with interrupts/exceptions you are going to need a proper TSS structure. For the moment you have it defined as a DWORD, and that is quite a substantial problem for user mode (ring 3). I wrote a Stack Overflow question/answer about the TSS here https://stackoverflow.com/questions/54876039/creating-a-proper-task-state-segment-tss-structure-with-and-without-an-io-bitm that contains information, a structure, and some external links. The TSS has a bit of a sordid history.
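
For orientation, a minimal sketch of the 32-bit TSS layout in C (104 bytes when no I/O bitmap follows it). The type and field names are chosen for illustration; the linked answer remains the fuller reference:

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS layout. Segment selector fields occupy the low 16 bits
 * of their 32-bit slot; the upper 16 bits are reserved. */
typedef struct tss32 {
    uint32_t link;          /* selector of the previous task */
    uint32_t esp0, ss0;     /* stack used on a transition to ring 0 */
    uint32_t esp1, ss1;
    uint32_t esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldtr;
    uint16_t trap;          /* bit 0: debug trap on task switch */
    uint16_t iomap_base;    /* offset of the I/O permission bitmap */
} tss32_t;

_Static_assert(sizeof(tss32_t) == 104, "32-bit TSS must be 104 bytes");
_Static_assert(offsetof(tss32_t, esp0) == 4, "esp0 at offset 4");
_Static_assert(offsetof(tss32_t, iomap_base) == 102, "iomap_base at offset 102");

/* Minimum descriptor limit for a TSS without an I/O bitmap: size - 1 (0x67). */
enum { TSS32_LIMIT = sizeof(tss32_t) - 1 };
```

With software task switching, usually only `ss0` and `esp0` (the kernel stack for ring 3 to ring 0 transitions) and `iomap_base` need meaningful values, but the descriptor still has to describe the full 104-byte structure.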


 Post subject: Re: Adding new code breaks loading the kernel
PostPosted: Mon Jan 23, 2023 12:32 pm 

Joined: Sat Jan 14, 2023 9:55 am
Posts: 3
Thank you, everyone. Appreciate your insights and comments. It was of tremendous help.
I fixed (most) of the mentioned points, and still have to dive deeper into TSS / IDT to make sure they behave correctly.

Quote:
The problem is that there isn't a guarantee that `kernel_init.o` is listed first with your linker command line

This was it. In alphabetical order it always came first.

Cheers,
Elia

PS. always a pleasure to read your SO answers MichaelPetch.

