OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Qemu Raspberry Pi 3B Adding/Removing NOPs Results in UDF
PostPosted: Mon May 01, 2023 11:49 am 

Joined: Mon May 01, 2023 6:24 am
Posts: 1
Edit: I've solved this issue. The problem was how Qemu behaves depending on how you tell it which image to load. To get all 4 cores running and avoid the infinite loop at the `ret` instruction, I had to replace `-kernel ./build/kernel8.elf` with `-device loader,file=./kernel8.img`. This post made it clear what each flag does: https://stackoverflow.com/questions/584 ... 7#58434837

Hello. I've been working through https://github.com/s-matyukevich/raspberry-pi-os to learn bare metal programming on the Raspberry Pi and have run into an issue. One of the exercises for lesson 01 is to get your code running on all 4 cores, which I've done (with some help from the exercise solutions), but I ran into an odd problem:

I ran the code and it worked fine on all 4 cores, producing the correct output:

Code:
❯ qemu-system-aarch64 -M raspi3b -kernel ./build/kernel8.elf -smp 4 -serial null -serial stdio
primary: 0
secondary: 1
secondary: 2
secondary: 3


The primary and secondary strings are sent over the UART, and Qemu prints them to stdout.

I should note that I'm passing
Code:
-kernel kernel8.elf
to qemu so that it starts running all of my code on all 4 cores. I've read that passing an ELF file makes the boot process follow the conventions expected by aarch64 Linux, so perhaps all of this can be solved by passing in the IMG file and having my code explicitly start all 4 cores. I also don't have access to a real Pi at the moment, so I haven't tested this on real hardware.

I later made some small changes and noticed that adding or removing any instruction (even a `nop` or `mov x0, x0`) could leave the code stuck in an infinite loop that never sends output to the UART. I've done some digging but am not sure where else to look. Adding or removing nops until it works might be one workaround, but it feels like I've done something wrong, and figuring this out is a great learning opportunity.

The multi-core UART printing works with this code at the top of `_start`:
Code:
.globl _start
_start:
    nop
    nop
    mrs   x0, mpidr_el1


and fails if I make this change:
Code:
.globl _start
_start:
    nop
    nop
    nop // Including this line will cause an exception
    mrs   x0, mpidr_el1


Here is my code in its current state, where it runs and sends the text to the UART for every core: https://github.com/seusher/rpiosdev/blo ... src/boot.S
The README in /lesson01/ has the commands I'm using to build, run, and debug the code. `boot.S` has three `nop` instructions beneath `_start`, one of which is commented out. None of those nops are required, but I've been using them to test various conditions; uncommenting the third `nop` produces the behavior that is unexpected (to me). It also doesn't matter where the nops are: the failure also occurs when I add or remove instructions elsewhere in the file.

There are two cases:

1. Failure case (uncommenting //nop)
2. Success case (leaving //nop commented out)

I've done some digging with lldb and have found a few things:

1. In all cases, when lldb connects and Qemu execution is stopped, all the cores are waiting at `uart_send()`, the first function in the `.text` section. Perhaps that shouldn't surprise me, but I expected execution to start at 0x00, where the `.text.boot` code is. The code in `.text.boot` does eventually get executed in the success case, and breakpoints in that code get hit.

2. In the failure case, the cores reach the `ret` at the bottom of `uart_init()` and jump to an unknown address (x30 is reloaded just before by `ldp x29, x30, [sp], #32`). That causes an exception, and the next instruction executed is a `UDF` at 0x0000000000000000.

3. In the success case, the cores also reach the `ret` at the bottom of `uart_init()` and jump to an unknown address (for the same reason), but then the PC jumps to 0x200, which is back in the `uart_init()` code. I'm not sure why the behavior differs.

I've included a bunch of other debugging notes in boot.S to help me stay organized, so perhaps someone with more experience in this will be able to see the problem immediately.

At this point, I'm not sure how to track down why adding a single nop instruction changes the behavior this much. My suspicion is that the solution isn't actually working properly even in the success case, and that the extra instruction somehow makes that apparent, but I'd love some help figuring out where to go from here.

Thanks!

