OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: G++ behavior of storing char* literals causing issues
PostPosted: Thu Aug 25, 2022 4:46 pm 

Joined: Thu Aug 25, 2022 3:54 pm
Posts: 27
I've been fumbling around with writing a protected mode operating system in C++ using G++ and NASM. I've written a pretty basic real mode command line interface in assembly before, but this is my first attempt at anything like this in a language other than assembly.

Anyway I've got a dynamic memory allocation system set up that I'm happy with as well as a custom made "dynamic array" data container that's kind of like the vector but with a few differences.

One problem that has had me stuck for a while is the way G++ allocates and stores char* literals. I never can reliably access the strings without them getting corrupted by.. something. I'm talking about when you do something like:
Code:
char *foo = "bar";

or
Code:
char foo[] = "bar";


My kernel loads into RAM at location 0x7e00. The size of the program (right now) is 7329 bytes. This means any memory above 0x9AA1 or below 0x7e00 should theoretically be a safe place to store stuff. Well, any time there's something surrounded by quotes in the form of "blah blah blah", it gets stored somewhere, but the characters don't always get copied to that location, or at least are no longer in that location by the time any other code gets to them.

For example, if I run the code:
Code:
char test[] = "test111";

Running it will result in the string "test111" getting copied to memory location 28364 (0x6ECC), with the pointer getting set to 0x6ECC. Reading that memory shows that it worked correctly: "t" is at 0x6ECC, "e" is at 0x6ECD, and so on and so forth. However, most of the time this doesn't work. If I do the exact same thing but modify the string slightly, it fails. For example:
Code:
char test[] = "test11";

will set the pointer to a value of 28349 or 0x6EBD. The memory at location 0x6EBD will be 0, the next one will be 0 and everything will just be zeros.
Also, doing
Code:
char *test = "foo";
never works at all. Only the [] operator makes it somewhat work.
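
For readers following along, the two declarations behave differently in the language itself. The sketch below assumes a normal hosted C++ build, and the helper name is made up for illustration. A `char[]` initialized from a literal is a private array inside the function, while a `const char *` only holds the address of the literal's own static storage. With GCC on ELF targets that storage is typically the .rodata section, but that placement is a toolchain detail and not a language guarantee.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper for illustration only.
// Returns true if writing to the local array leaves the literal untouched.
bool array_is_a_private_copy() {
    const char *literal = "test111";  // address of the literal's static storage
    char copy[] = "test111";          // separate 8-byte array owned by this function

    static_assert(sizeof(copy) == 8, "7 characters plus the terminating NUL");
    assert(literal != copy);          // two different objects, two different addresses

    copy[0] = 'T';                    // fine: only the local array changes
    return std::strcmp(copy, "Test111") == 0 &&
           std::strcmp(literal, "test111") == 0;
}
```

In both cases the bytes have to come from somewhere in the loaded image, so if that part of the image never reaches memory, neither form can be relied on.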

Now you may be wondering: why is this a problem? Because manually allocating with calloc doesn't solve the problem. I can do:
Code:
char *test = (char*)calloc(5, sizeof(char));
test = "test";

While calloc will indeed allocate an array of the specified size where it's intended to go, using the char array literal of test = "test" changes the value of the pointer and attempts to copy the text to that location instead of where calloc initially allocated the array. It doesn't matter if the = "test" assignment was way shorter, the same size or way longer, it always reallocates it. This means it usually doesn't work, except for sometimes. This problem is especially detrimental to allowing me to accomplish anything because it uses the same broken behavior for copying char array literals as function parameters, which means that doesn't work either. I've even written functions to search for strings in all of RAM. Any time the chars just get lost like that, they don't appear elsewhere. It's not just a miscalculated pointer, the data is just completely gone out of existence.

Is there a way to modify the way gcc allocates this stuff? I mean besides spending days, perhaps weeks, trying to figure out where in the source code gcc deals with allocating char arrays and modifying it to work with my memory allocation system instead of whatever memory allocation convention it's currently using that's messing things up. But I guess if that's the only way then it is what it is.


 
 Post subject: Re: G++ behavior of storing char* literals causing issues
PostPosted: Fri Aug 26, 2022 8:21 am 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1593
What you describe is indicative of the .rodata section not being stored or linked correctly. Investigate where it is in your kernel file. Maybe you aren't loading enough sectors? I've certainly heard of that before.

SomeGuyWithAKeyboard wrote:
Code:
char *test = (char*)calloc(5, sizeof(char));
test = "test";
That on the other hand is indicative of a programmer who doesn't know either C or C++. You create a pointer, assign to it the result of calloc(), then overwrite the pointer with the value of the string literal "test". Meaning you have now leaked the 5-char object allocated with calloc(), and "test" points somewhere completely different. Maybe you meant to copy the string into the pointer, but that would have been a call to memcpy(). But since strings currently don't work for you, even that wouldn't help.
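
To make the difference concrete, here is a minimal sketch of copying into the allocated buffer instead of overwriting the pointer. It assumes a hosted standard library (`std::calloc`, `std::memcpy`) purely for illustration; in the kernel these would be the poster's own implementations. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocates 5 bytes and copies "test" into them.
// The caller owns the returned buffer and must free() it.
char *make_test_string() {
    char *buf = static_cast<char *>(std::calloc(5, sizeof(char)));
    if (buf == nullptr) {
        return nullptr;               // allocation failed
    }
    char *const allocated = buf;      // remember where calloc put the buffer

    // Writing `buf = "test";` here would only change the pointer and leak
    // the allocation. memcpy copies the bytes and leaves the pointer alone.
    std::memcpy(buf, "test", 5);      // 4 characters plus the terminating NUL

    assert(buf == allocated);         // still pointing at the calloc'd memory
    return buf;
}
```

The copy still reads its source bytes from the literal, so it only helps once the literal's storage is actually present in memory.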

You know, if strings never work but most other code does, then that's indicative of your program not running where you think it does. A debugger would help you figure this out.

_________________
Carpe diem!


 
 Post subject: Re: G++ behavior of storing char* literals causing issues
PostPosted: Fri Aug 26, 2022 2:08 pm 

Joined: Thu Aug 25, 2022 3:54 pm
Posts: 27
nullplan wrote:
What you describe is indicative of the .rodata section not being stored or linked correctly. Investigate where it is in your kernel file. Maybe you aren't loading enough sectors? I certainly heard of that before.

I know I'm loading enough sectors. I have this for my loader code, which runs while still in real mode:
Code:
; start putting in values:
mov ah, 2h    ; int13h function 2
mov al, 50    ; we want to read a lot of sectors I guess. 50 * 512 = ~25kb = a lot
mov ch, 0     ; from cylinder number 0
mov cl, 2     ; the sector number 2 - second sector (starts from 1, not 0)
mov dh, 0     ; head number 0
xor bx, bx   
mov es, bx    ; es should be 0
mov bx, 7e00h ; load to 0000:7e00h, 512 bytes after the boot sector at 7c00h
int 13h


512*50 + 0x7e00 = 0xE200, which *should* work since it's not copying more than 64 KB.
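
The figures quoted in this thread can be checked at compile time. The sketch below uses only numbers stated in the posts (load address 0x7e00, 50 sectors of 512 bytes, a 7329-byte program); the constant names are invented for illustration.

```cpp
#include <cstdint>

// Values taken from the posts; names are hypothetical.
constexpr std::uint32_t kLoadAddress = 0x7e00;  // where the kernel is loaded
constexpr std::uint32_t kSectorSize  = 512;     // bytes per sector
constexpr std::uint32_t kSectorsRead = 50;      // AL passed to int 13h, function 2
constexpr std::uint32_t kKernelBytes = 7329;    // reported program size

constexpr std::uint32_t kLoadEnd   = kLoadAddress + kSectorsRead * kSectorSize;
constexpr std::uint32_t kKernelEnd = kLoadAddress + kKernelBytes;

static_assert(kSectorsRead * kSectorSize == 25600, "50 sectors is 25600 bytes");
static_assert(kLoadEnd == 0xE200, "end of the region filled by the read");
static_assert(kLoadEnd <= 0x10000, "read stays inside the 64 KiB segment at ES=0");
static_assert(kKernelEnd == 0x9AA1, "end of the 7329-byte program");
static_assert(kKernelBytes <= kSectorsRead * kSectorSize, "program fits in the read");
```

This only shows that enough bytes are read from disk. It says nothing about whether the string data is part of the file being read in the first place.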

One of the few things I haven't yet exhausted every option for is the way it links.

Now I had a lot of problems getting makefiles and linking to actually work. I wasn't able to get anything to compile using my own intuition. I wasn't able to get anything to work from looking at "working" examples either. I had to basically make a script that has g++ compile it into a standalone file, compile the assembly separately with nasm into a standalone file and then combine those 2 files together. This isn't the same as that elf and link.ld business everyone else does but it makes everything except string literals work which is more than any of my attempts at using elfs and the ld linker was able to achieve.

My compile script is:
Code:
g++ -march=i486 -m32 -nostartfiles -ffreestanding -nostdlib -nolibc -nodefaultlibs -Ttext 0x7e00 system.cpp -o system
objcopy -O binary -j .text system system.raw
nasm bootloader.asm -o bootloader.bin
cat system.raw >> bootloader.bin


This does introduce the problem of .rodata indeed not getting included. I can get .rodata with "objcopy -O binary -j .rodata system system_rodata.raw" of course, but it's essentially useless because gcc expects .rodata at a specific address in memory, and copying it to bootloader.bin with cat doesn't put it in the right place. There doesn't seem to be an easy way to make g++ put .rodata where you want, like you can with the text block via the "-Ttext 0x7e00" parameter.
From what I've gathered, possibly the only way to specify where .rodata goes is with a linker script. Unfortunately, I cannot for the life of me get ld or link.ld scripts to work for some reason.

I've tried using a linker script with something like:
Code:
g++ -march=i486 -m32 -nostartfiles -ffreestanding -nostdlib -nolibc -nodefaultlibs -Tlink.ld system.cpp -o system
nasm bootloader.asm -o bootloader.bin
cat system.raw >> bootloader.bin

with a link.ld of
Code:
OUTPUT_FORMAT("elf32-i386")
ENTRY(begin)
SECTIONS
{
    . = 0x7e00;

    .text BLOCK(8K) : ALIGN(4K)
    {
        *(.text)
    }

    .rodata BLOCK(4K) : ALIGN(4K)
    {
        *(.rodata)
    }

    .data BLOCK(4K) : ALIGN(4K)
    {
        *(.data)
    }

    .bss BLOCK(4K) : ALIGN(4K)
    {
        *(.bss)
    }

    end = .;
}


but it never links correctly enough to even boot. I've tried fidgeting around, trying things from other projects on the internet, and extensively trying all kinds of different command line parameters from the man pages, but the best I can ever get is the warning
Quote:
warning: cannot find entry symbol lf_i386; defaulting to 0000000000008000
Copying this to memory location 0x8000 doesn't allow it to boot, naming a function "lf_i386" anywhere in my C++ code doesn't fix it, and trying something like ".lf_i386 = 0x7e00" in the linker file doesn't fix it either. That's about all the things there are to try.

What can I do to make a link.ld script actually work and potentially solve my rodata string problems?


 
 Post subject: Re: G++ behavior of storing char* literals causing issues
PostPosted: Fri Aug 26, 2022 2:41 pm 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1593
SomeGuyWithAKeyboard wrote:
Quote:
warning: cannot find entry symbol lf_i386; defaulting to 0000000000008000
Well, that warning means you are somehow passing the option "-elf_i386" to the linker, which ld reads as "-e lf_i386", i.e. a request for an entry symbol named "lf_i386". That obviously doesn't work. If anything it should be "-m elf_i386". But simpler still would be to just get a cross-compiler for i386-elf going. Then you don't need any emulation options.

Your current approach strips out all sections not named ".text", and so you will likely have a problem as soon as templates enter the mix. There is a fix, but obviously you need to fix your linker instead.

_________________
Carpe diem!


 
 Post subject: Re: G++ behavior of storing char* literals causing issues
PostPosted: Fri Aug 26, 2022 2:54 pm 

Joined: Thu Aug 25, 2022 3:54 pm
Posts: 27
Trying the following
Code:
g++ -march=i486 -m32 -std=c++17 -nostartfiles -ffreestanding -fPIE -Ttext 0x7e00 system.cpp
nasm -g -F dwarf bootloader.asm
ld -o bootloader.bin bootloader.o -Ttext 0x7c00 --oformat=binary
ld -o system.raw system -Tlink.ld
cat system.raw >> bootloader.bin

Doesn't work. It complains the dwarf parameter isn't valid, and when I remove that, ld refuses to touch anything nasm spits out. ld will just report "file format not recognized, treating as link script".
From further investigation, it seems .rodata wants to be before the executable text block. Since .rodata changes in size, this means I can't jump to my program from assembly without some kind of global label that somehow allows nasm to see stuff in C++ code, which is another thing I haven't been able to get to work.

Here is my github repository for this:
https://github.com/Xeraster/SimpleProtectedModeOS

If anyone knows what exactly I need to do to accomplish this, I would love to hear it. It's really frustrating because I just can't get the compiler or linker to cooperate. If ONLY there was a parameter to manually define the rodata location on g++ without having to go through ld to do so. I suppose I could make a really ugly hack where I declare a character array in the begin function using string literals, search memory for that exact string (to find the rodata that I manually attached to the end of .text with a script) and then memcpy that of whatever size the rodata is to wherever the pointer address of the character array is. Would be way better if I could figure out how to get the linker to work so I don't have to do this. I'm pretty stumped and I feel as though I have exhausted every possible thing to make ld work but hopefully someone on here will know the secret after seeing my source code.


 
 Post subject: Re: G++ behavior of storing char* literals causing issues
PostPosted: Fri Aug 26, 2022 11:10 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5099
SomeGuyWithAKeyboard wrote:
Code:
g++ -march=i486 -m32 -std=c++17 -nostartfiles -ffreestanding -fPIE -Ttext 0x7e00 system.cpp
nasm -g -F dwarf bootloader.asm
ld -o bootloader.bin bootloader.o -Ttext 0x7c00 --oformat=binary
ld -o system.raw system -Tlink.ld
cat system.raw >> bootloader.bin

That's not going to work. Maybe you were trying to do something like this instead?

Code:
i686-elf-g++ -c -march=i486 -std=c++17 -ffreestanding -o system.o system.cpp
nasm -g -f elf32 -o bootloader.o bootloader.asm
i686-elf-g++ -nostdlib -o disk.bin -T link.ld bootloader.o system.o -lgcc

If you don't yet have a cross-compiler, something along the lines of "g++ -m32 -no-pie" might work as a temporary substitute for "i686-elf-g++", but you need a cross-compiler.

Your linker script is... odd. ALIGN() and BLOCK() mean the same thing, so it doesn't make sense to have any BLOCK() statements. Other than that, it's pretty close. Some small changes will get you what you want.

Code:
OUTPUT_FORMAT("binary")
SECTIONS
{
    . = 0x7c00;

    .text : ALIGN(1K)
    {
        bootloader.o(.text)
        *(.text .text.*)
    }

    .rodata : ALIGN(4K)
    {
        *(.rodata .rodata.*)
    }

    /* .data and .bss go here, following the same pattern as .rodata */
}


This is incomplete, but I think you can fill in the rest by copying the .rodata section. I changed the output format to a flat binary, but you can use ELF and objcopy it into a flat binary if you prefer. The wildcards I've used may not be enough to catch all of the sections GCC emits.

There are problems with bootloader.asm as well. The first few lines should look like this:
Code:
CPU 586
bits 16

SECTION .text

Note the removal of the org statement, the addition of the bits statement, the removal of square brackets around the cpu statement, and the change from "text" to ".text".

To reference symbols in another object file, declare the symbol with "extern" in your assembly. This will allow your bootloader to do things like call global constructors and jump to the correct entry point.
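
On the C++ side, the symbol also needs a name the assembler can predict. A minimal sketch, assuming a hypothetical entry point called `kernel_entry` (the thread does not say what the real one is named): declaring it `extern "C"` switches off C++ name mangling, so the object file exports the plain symbol `kernel_entry`, which the assembly can then declare with "extern" and call.

```cpp
// Hypothetical entry point name, chosen for illustration only.
// extern "C" gives the function C linkage, so its symbol is not mangled
// and can be referenced by that exact name from the assembly file.
extern "C" int kernel_entry() {
    // A real kernel would initialize its screen output, allocator, and so on.
    // Returning a status keeps this sketch testable on a hosted compiler.
    return 0;
}
```

Without `extern "C"`, g++ would emit a mangled name along the lines of `_Z12kernel_entryv`, and the linker would report an undefined reference for the plain name used in the assembly.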

There are too many problems with your C++ code for me to fix it. You can't include <cmath> or <sys/io.h>. You don't need code to initialize global variables - those are in the .data section. You do need code to initialize global constructors - I'm not familiar enough with the C++ ABI to tell you how, but the wiki has some information that might be helpful if it's not too outdated.
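
As a rough sketch of what "initializing global constructors" amounts to: the toolchain collects pointers to constructor functions into a table, and startup code has to call each one before the main kernel code runs. Everything below is an assumption for illustration. The function and type names are invented, and the section that holds the table (.init_array or .ctors) and the symbols that mark its start and end depend on the toolchain and linker script, so check the wiki and your compiler's documentation for the real details.

```cpp
#include <cstddef>

// A constructor entry is a plain function taking and returning nothing.
using constructor_fn = void (*)();

// Calls every non-null function pointer in [begin, end), in order, and
// returns how many were called. In a kernel, begin and end would be symbols
// that the linker script places around the constructor table (hypothetical
// here; the thread's linker script defines no such symbols).
inline std::size_t run_global_constructors(constructor_fn *begin,
                                           constructor_fn *end) {
    std::size_t called = 0;
    for (constructor_fn *entry = begin; entry != end; ++entry) {
        if (*entry != nullptr) {
            (*entry)();
            ++called;
        }
    }
    return called;
}
```

The bootloader or an early startup routine would call this once with the table bounds before jumping to the kernel's entry point.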


 
 Post subject: Re: G++ behavior of storing char* literals causing issues
PostPosted: Sat Aug 27, 2022 1:55 pm 

Joined: Thu Aug 25, 2022 3:54 pm
Posts: 27
I was able to use your suggestions to actually get it to compile and link. Thanks a lot! I didn't know you were supposed to / could use g++ a second time to link. :oops:

I needed to use "-fno-pie" instead of "-no-pie" to make it work or else it would spit out a bunch of undefined reference to "`_GLOBAL_OFFSET_TABLE_'" errors.


 
 Post subject: Re: G++ behavior of storing char* literals causing issues
PostPosted: Sun Aug 28, 2022 12:36 am 

Joined: Sun Jun 23, 2019 5:36 pm
Posts: 618
Location: North Dakota, United States
You do know that if you built a cross-compiler instead of trying to hack it together with a hosted build of GCC, most of your problems would go away, right? I mean, you'd have a new set of problems, like trying to include files that don't exist and all that. But you've completely skipped the cross-compiler step, which is going to cause you all kinds of problems because your compiler is going to assume that you're running in a hosted environment, which means it's going to let you do things that you can't actually do (or that you shouldn't do).

