OSDev.org
https://forum.osdev.org/

[AArch64 / Bare metal] Need help with CPU communication
https://forum.osdev.org/viewtopic.php?f=1&t=33444

Author:  dublevsky [ Fri Jan 18, 2019 11:36 am ]
Post subject:  [AArch64 / Bare metal] Need help with CPU communication

Been trying to solve this for a week, decided to reach out for help.

What I have is code running on bare metal (Raspberry Pi 3 Model B+).
I'm trying to do some general initialization on every CPU and then have the other CPUs wait for CPU0 to zero out the BSS (and, in the future, do things like MMU setup).
After CPU0 has initialized everything it needs to, it is supposed to release the secondary CPUs, and then every CPU jumps into the kernel by calling kmain.
kmain contains a very primitive waiting function (for now, just to check whether the other CPUs get there) and prints out each CPU's id.
The problem is only CPU0 gets to kmain.

start.S
Code:
#include "asm/macros.h"
#include "arch/arch.h"
#include "board/spec.h"

.section .bss.stack
        .align 8
        .skip ARCH_STACK_SIZE * BOARD_NUM_CPUS
DATA(___stack_end)
   
.section .data
        .align 8
DATA(cpu_barrier)
        .quad 1                 // 64-bit: read and written with 64-bit ldr/str below

.section .text
cpuid .req x9

FUNCTION(_start)
        // ----------------------------------------
        // Initialization to carry out on every CPU
        // ----------------------------------------

        // Find out which CPU we are running at
        mrs cpuid, mpidr_el1
        and cpuid, cpuid, #0xff

        // Set up the stack
        adr x0, ___stack_end
        ldr x1, =ARCH_STACK_SIZE
        mul x1, x1, cpuid
        sub sp, x0, x1

        // -----------------------------------
        // Initialization to carry out on CPU0
        // -----------------------------------
        cbnz cpuid, .Lwait_for_primary_cpu

        // Zero out the bss section
        // Note: relies on ___bss and ___bss_end being 16 byte aligned
        adr x0, ___bss
        adr x1, ___bss_end
        sub x1, x1, x0
        cbz x1, .Lbss_init_done
.Lbss_init_loop:
        stp xzr, xzr, [x0], #16
        sub x1, x1, #16
        cbnz x1, .Lbss_init_loop
.Lbss_init_done:

        // Release secondary cpus
        adr x0, cpu_barrier
        str xzr, [x0]
        b .Lwait_for_primary_cpu_done

        // Wait for primary cpu
.Lwait_for_primary_cpu:
        adr x0, cpu_barrier
.Lwait_for_primary_cpu_loop:
        ldr x1, [x0]
        cbnz x1, .Lwait_for_primary_cpu_loop
.Lwait_for_primary_cpu_done:

        // Jump into the kernel
.Lkernel_entry:
        mov x0, cpuid
        bl kmain
   
.Lhang:
        wfe
        b .Lhang


asm/macros.h
Code:
#ifndef INCLUDE_ASM_MACROS_H
#define INCLUDE_ASM_MACROS_H

#define FUNCTION(x)             .global x; .type x, STT_FUNC; x:
#define DATA(x)                 .global x; .type x, STT_OBJECT; x:
   
#define LOCALFUNCTION(x)        .type x, STT_FUNC; x:
#define LOCALDATA(x)            .type x, STT_OBJECT; x:
   
#endif /*INCLUDE_ASM_MACROS_H*/


kmain.c
Code:
#include <stdint.h>
#include "peripherals/mu.h"
   
static void wait(const uint64_t c)
{
        for (uint64_t i = 0; i < c; ++i) {
                __asm__ volatile("nop");
        }
}

void kmain(const uint64_t cpuid)
{
        if (cpuid) {
                wait(1000000 * cpuid);
        } else {
                mu_init(9600);
        }
        mu_putc((char)cpuid + '0');
}


link64.ld
Code:
ENTRY(_start)

SECTIONS
{
   . = 0x80000;

   .text : {
      *(.text)
   }

   .rodata : {
      *(.rodata)
   }

   .data : {
      *(.data)
   }

   .bss : {
      . = ALIGN(8);
      *(.bss.stack)
      . = ALIGN(16);
      ___bss = .;
      *(.bss)
      . = ALIGN(16);
      ___bss_end = .;
   }
}


Would love some help, because I'm going crazy at this point.

Author:  Octocontrabass [ Fri Jan 18, 2019 1:48 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

How are you telling all of the CPUs to jump to _start?

Author:  zaval [ Fri Jan 18, 2019 1:52 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

I honestly haven't even touched multiprocessing yet, so I'm hardly going to be helpful, but seriously, looking at your code, I am wondering: why do you think the secondary CPUs are even running? Where do you see that? They won't run just because your bootstrap CPU writes 0 into some variable; you need to wake them up first! And that all comes down to how it's done on the RPi. With all that VideoCore stuff... who knows. But I guess your secondary CPUs aren't running. The firmware starts on CPU0, your code takes over on it too, and that's all. No secondary CPUs on the scene. Read up on secondary CPU bring-up for the RPi.

Author:  dublevsky [ Fri Jan 18, 2019 1:57 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

Octocontrabass wrote:
How are you telling all of the CPUs to jump to _start?


zaval wrote:
Why do you think the secondary CPUs are even running? Where do you see that? They won't run just because your bootstrap CPU writes 0 into some variable; you need to wake them up first!


The RPi bootloader does the setup it needs, then every single core enters _start.

Author:  Octocontrabass [ Fri Jan 18, 2019 2:18 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

Which bootloader are you using that sends every CPU to _start? The official bootloaders only start one CPU and leave the others halted.

Author:  dublevsky [ Fri Jan 18, 2019 2:23 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

Octocontrabass wrote:
Which bootloader are you using that sends every CPU to _start? The official bootloaders only start one CPU and leave the others halted.


I'm using the official RPi one. Pretty sure every single CPU is awake as this simple code works.

Author:  nullplan [ Fri Jan 18, 2019 3:02 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

Cache coherency problems? Is it possible the other cores never see the update to cpu_barrier? Do you need a barrier in that loop and at the point where you write the variable?

Author:  dublevsky [ Fri Jan 18, 2019 3:13 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

nullplan wrote:
Cache coherency problems? Is it possible the other cores never see the update to cpu_barrier? Do you need a barrier in that loop and at the point where you write the variable?


Neither MMU nor I/D caches are enabled yet.

Author:  Octocontrabass [ Fri Jan 18, 2019 3:24 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

dublevsky wrote:
I'm using the official RPi one. Pretty sure every single CPU is awake as this simple code works.

That code uses "kernel_old=1" in config.txt to bypass the boot stub.

You are not using "kernel_old=1" in your config.txt, so the firmware's default boot stub is running (or armstub8.bin from your SD card), and that boot stub is halting all but one of the CPUs.

Author:  dublevsky [ Fri Jan 18, 2019 3:29 pm ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

Octocontrabass wrote:
That code uses "kernel_old=1" in config.txt to bypass the boot stub.

You are not using "kernel_old=1" in your config.txt, so the firmware's default boot stub is running (or armstub8.bin from your SD card), and that boot stub is halting all but one of the CPUs.


You have a point. BRB, checking this out.

EDIT: It's midnight for me, but I checked some resources and Octocontrabass's answer seems to be right. I will work on this tomorrow and will reply with a full solution if it works.

Author:  bzt [ Sat Jan 19, 2019 5:21 am ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

Hi,

Your code will be executed on all cores no matter what you do in config.txt. This is the case even if config.txt does not exist (recommended).

The memory cache is wired per core, but you have one RAM. Therefore, if you change memory from one core, you need to refresh the cache in the other cores. To do that, either map the memory as non-cacheable and outer-shareable, or explicitly use a data barrier (dsb).

Cheers,
bzt
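The barrier advice above can be illustrated with a minimal C sketch of a release flag using the GCC/Clang `__atomic` builtins: CPU0 clears the flag with a store-release (so earlier writes such as BSS zeroing are ordered before it), and waiting cores poll it with a load-acquire, parking in WFE between polls. On AArch64 these builtins typically compile to `stlr`/`ldar`. The names `cpu_barrier_flag`, `release_flag` and `wait_for_release_flag` are hypothetical, and the sketch assumes the secondary cores are already executing this code; it does nothing to wake cores that a boot stub has parked.

```c
#include <stdint.h>

/* Hypothetical release flag: non-zero means "keep waiting". */
static volatile uint64_t cpu_barrier_flag = 1;

/* Run on CPU0 once shared state (e.g. the BSS) has been initialized. */
static void release_flag(void)
{
    /* Store-release: all earlier memory writes are ordered before this store. */
    __atomic_store_n(&cpu_barrier_flag, 0, __ATOMIC_RELEASE);
#if defined(__aarch64__)
    /* SEV is not ordered against memory accesses by itself, so complete the
     * store with a DSB first, then wake cores sleeping in WFE. */
    __asm__ volatile("dsb sy\n\tsev" ::: "memory");
#endif
}

/* Run on each secondary core before it touches shared state. */
static void wait_for_release_flag(void)
{
    /* Load-acquire: later reads cannot be satisfied before the flag is seen clear. */
    while (__atomic_load_n(&cpu_barrier_flag, __ATOMIC_ACQUIRE) != 0) {
#if defined(__aarch64__)
        /* Sleep until an event. A SEV issued after the check above still sets
         * the event register, so WFE returns and the release is not missed. */
        __asm__ volatile("wfe" ::: "memory");
#endif
    }
}
```

The WFE in the loop is optional (the original start.S spins without it), but it keeps idle cores from hammering the bus while they wait.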

Author:  dublevsky [ Sat Jan 19, 2019 7:31 am ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

bzt wrote:
Your code will be executed on all cores no matter what you do in config.txt. This is the case even if config.txt does not exist (recommended).

The memory cache is wired per core, but you have one RAM. Therefore, if you change memory from one core, you need to refresh the cache in the other cores. To do that, either map the memory as non-cacheable and outer-shareable, or explicitly use a data barrier (dsb).


Hi,

I/D caches are not enabled yet.

I'm currently working on Octocontrabass's answer. I revisited this answer on /r/asm and decided to google 'raspberry pi cpu-release-addr', which led me to Device Tree Blobs. After decompiling bcm2710-rpi-3-b-plus.dtb back to .dts format and looking into it, there's indeed a cpu-release-addr property for every CPU. I'm currently writing a quick and dirty mailbox interface implementation to check this, so no progress yet.

EDIT: OK, I checked the code with kernel_old=1 and disable_commandline_tags=1 in config.txt and it still doesn't work, so I'm gonna try using dsb and report back.
EDIT2: wrapping every single load and store with 'dsb sy' didn't work either.
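The `cpu-release-addr` property mentioned above is, per the Devicetree specification's spin-table binding, a `<u64>`: two 32-bit cells stored big-endian in the blob. Assuming the raw property bytes have already been located (for example with libfdt's `fdt_getprop`, not shown here), a hypothetical helper to decode them could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a `cpu-release-addr` property value from a
 * flattened device tree. The value is a <u64> (two 32-bit cells), stored
 * big-endian. Returns 0 if the property is not exactly 8 bytes long. */
static uint64_t fdt_release_addr(const uint8_t *prop, size_t len)
{
    if (len != 8)
        return 0;

    uint64_t value = 0;
    for (size_t i = 0; i < 8; ++i)
        value = (value << 8) | prop[i];   /* most significant byte first */
    return value;
}
```

Reading the address from the DTB rather than hard-coding it keeps the kernel independent of the exact stub layout, at the cost of needing a DTB parser early in boot.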

Author:  dublevsky [ Sat Jan 19, 2019 9:10 am ]
Post subject:  Re: [AArch64 / Bare metal] Need help with CPU communication

OK. After being a complete idiot for ~1 week I finally got it working.

Big thanks to Octocontrabass and /u/TNorthover.

Solution:
If you have no custom boot options in config.txt, the RPi bootloader will load your image at 0x8000 for kernel7.img (32-bit kernel) or 0x80000 for kernel8.img (64-bit kernel).
The stubs used for booting in that case are armstub7.S and armstub8.S. As I'm writing a 64-bit kernel for AArch64, I looked into the boot process in armstub8.S.

After some minimal CPU initialization, the stub loads the Device Tree Blob (Flattened Device Tree) address into x0 and the kernel entry address (_start in my case) into x4 on CPU0, and CPU0 jumps to that address.
CPU1-CPU3, on the other hand, sit in a loop that consists of two steps: Waiting For Event (WFEing), then loading x4 from their own release slot ('barrier') and checking it for a non-zero value. Once it is non-zero, they jump to that address.

x4 is loaded from the address x5 + (x6 << 3), where
x5 = address of spin_cpu0, basically the base address of the per-CPU 'barriers'; it equals 0xd8
x6 = CPU id
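The stub behaviour described above can be modelled in C to make the address arithmetic and the polling loop explicit. This is a model, not armstub8.S itself: `SPIN_TABLE_BASE`, `spin_slot_addr` and `secondary_spin` are hypothetical names, `table` stands in for the memory at 0xd8, and the instruction comments follow the stub as described in the post; they should be checked against the armstub8.S your firmware actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of spin_cpu0 in armstub8.S, as given in the post (hypothetical macro). */
#define SPIN_TABLE_BASE 0xd8u

/* Address of the release slot that core `cpu` polls: x5 + (x6 << 3). */
static uintptr_t spin_slot_addr(unsigned cpu)
{
    return (uintptr_t)SPIN_TABLE_BASE + ((uintptr_t)cpu << 3);
}

/* C model of the secondary-core loop (not the stub itself).
 * `table` stands in for memory at SPIN_TABLE_BASE; `wait_event` stands in
 * for WFE and may be NULL to busy-poll instead. Returns the entry address
 * the core would branch to. */
static uint64_t secondary_spin(const volatile uint64_t *table, unsigned cpu,
                               void (*wait_event)(void))
{
    uint64_t entry;
    do {
        if (wait_event != NULL)
            wait_event();       /* wfe */
        entry = table[cpu];     /* load x4 from the slot at x5 + (x6 << 3) */
    } while (entry == 0);       /* keep waiting while the slot is zero */
    return entry;               /* the stub then branches to x4 */
}
```

The key point the model makes visible: x5 + (x6 << 3) is where the core looks, and x4 receives whatever value CPU0 has written there.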

So by writing the value 0x80000 (i.e. &_start) to 0xe0, 0xe8 and 0xf0 and then Sending an EVent (SEVing) from CPU0, CPU1-CPU3 wake up and jump to _start.
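Following the solution above, CPU0's side can be sketched in C: store the entry point into each secondary core's slot, make the stores complete with a DSB, then SEV. `release_secondary_cores`, `SPIN_TABLE_BASE` and `NUM_CPUS` are hypothetical names; the slot layout (base 0xd8, 8 bytes per core) is taken from the post, and the table is passed as a parameter so the logic can be exercised off-target.

```c
#include <stdint.h>

#define SPIN_TABLE_BASE 0xd8u   /* spin_cpu0 in armstub8.S, per the post */
#define NUM_CPUS        4u      /* hypothetical; start.S uses BOARD_NUM_CPUS */

/* Store `entry` into the release slot of every secondary core (1..NUM_CPUS-1),
 * then wake the cores parked in WFE. On the Pi, `table` would be
 * (volatile uint64_t *)(uintptr_t)SPIN_TABLE_BASE. */
static void release_secondary_cores(volatile uint64_t *table, uint64_t entry)
{
    for (unsigned cpu = 1; cpu < NUM_CPUS; ++cpu)
        table[cpu] = entry;                    /* slots 0xe0, 0xe8, 0xf0 */
#if defined(__aarch64__)
    /* Complete the stores before signalling; SEV alone does not order them. */
    __asm__ volatile("dsb sy\n\tsev" ::: "memory");
#endif
}
```

On target, CPU0 might call `release_secondary_cores((volatile uint64_t *)(uintptr_t)SPIN_TABLE_BASE, (uint64_t)(uintptr_t)&_start);` after declaring `extern void _start(void);`. The woken cores re-enter `_start`, so they still pass through the `cpu_barrier` wait in start.S before reaching kmain.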

Attachments:
Screenshot from 2019-01-19 16-56-01.png

All times are UTC - 6 hours