Studying C programming

~ · **Joined:** Tue Mar 06, 2007 11:17 am **Posts:** 1225

Kevin wrote:

~ wrote:

Why are we using binary-mode libraries and references instead of optimizing by using text-mode libraries/applications and produce binary-mode files?

Is the keyword that you're looking for Link Time Optimisation (LTO)? Or what other kind of optimisation are you thinking of?

Quote:

Why are we using binary-mode-only executable header generation intermixed (inefficiently at least at system-level programming) with text-mode-only sources, when their format looks trivial and very cheap (just like casual hand-made databases written before Fox Pro for DOS) when converted to C or assembly code, but becomes very confusing if only the tool-chain ever produces them for making for a more ignorant programmer?

Essentially for the same reason why we write in a high-level language rather than assembly if we can, or at least use an assembler rather than a hex editor to create binaries. It's something that the machine is perfectly able to generate, which saves me time and avoids mistakes.

I very much prefer working on actually creative tasks rather than doing stupid legwork that the computer could be doing for me.

This is no "stupid legwork" for me. We only need to create a few different executable skeletons, so it's much more lightweight than using exclusively pre-built binary-generation utilities without ever getting involved in how things work. I cannot expect to call myself, let alone properly function as, a system programmer if I don't even know how the executables are built and loaded without the help of a third-party tool.

I need real portability. Being a single person who needs things be written and work most of the time for years (why not script and trivialize compiled languages not by automating the build process but by increasing the knowledge and full code resources on how things work without only ever using binary-blob header stubs?) I cannot afford to leave my code trapped to just a past version of a Linux distro or toolchain. Have you seen how the bigger a project is the more things it needs to detect from a system and how there are old and useful programs that cannot be compiled or run in a practical way unless an old machine with an older Linux distro is used? I don't want to dramatically update and having to modify my code and linking scripts just to keep being able to compile my OS without errors or strange bugs under newer distros. It defeats any portability that the application-level code could have if it gets stalled in having to use specific external compiler/linker/library versions.

I cannot afford that to happen, so I need to build real code portability which doesn't depend on the libraries installed on a system nor in the toolchain, but in producing proper executable structures (after building many executables). Plus, I need that knowledge for things like being able to load drivers from CD installation disks, under DOS or under my own OS (what I mean is loading the Windows drivers for 9x or NT that come in the original CDs in other OS, so I need to know all about the executable format to implement that driver layer).

Anything that you don't know and get to learn is actually a creative work, especially if you need to discover and learn the most basic functions and structures of a system. From there more creative and original work can be done, but if we just use the executables that the compiler/linker make we are actually being much less creative. Also, those sort of executables will never stop compiling and running properly, and then we will also be able to take our creative work to many more platforms than if we didn't know how to build an executable by hand (a simple casual-database-like piece of code to write in C or assembly that will become familiar after a few compiles and studying iterations).

C needs more fundamental structures and library functions that we can easily hand-pick, copy and paste instead of just including header files and stub/header binary blobs. Do you see how only using generated executables is the same as copying and pasting source code without understanding it, just that here we are copy/pasting binary blobs?

Assembly needs more automatic-sized instructions and data types built into the language itself. It's a very easy to update language for Intel/AMD.

With that, they all (C and assembly, and even C++ with hand-picked library code for the C++ run-time) will become dramatically similar and exchangeable.

Schol-R-LEA · **Posted:** Mon Sep 05, 2016 8:42 pm

~ wrote:

Assembly needs more automatic-sized instructions and data types built into the language itself. It's a very easy to update language for Intel/AMD.

That's a tall order, given that there are no data types in assembly language at all. There are instructions that will treat the values given to them as integers, or bit fields, or addresses, or floating-point reals, and operate on them as if they were a certain size, but there is nothing keeping an assembly programmer from performing a 32-bit integer divide of bytes 3-6 of an 80-bit FP value by bytes 7-10 of a 10-byte string of ASCII characters. Assembly values don't have types or sizes; assembly operations do.
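The same point can be sketched in C, which lets you deliberately step around its type system. This is only an illustration: `load_u32` and `divide_raw` are hypothetical helper names, the byte offsets are whatever the caller passes in, and the result depends on the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Read 4 bytes starting at `offset` and treat them as a 32-bit unsigned
 * integer in host byte order. Nothing checks what the bytes "really" are. */
uint32_t load_u32(const unsigned char *bytes, size_t offset)
{
    uint32_t value;
    memcpy(&value, bytes + offset, sizeof value);
    return value;
}

/* Hypothetical helper: integer-divide 4 raw bytes of one buffer by 4 raw
 * bytes of another. The buffers could hold part of a floating-point value
 * and part of an ASCII string; the division does not care. */
uint32_t divide_raw(const unsigned char *a, size_t a_off,
                    const unsigned char *b, size_t b_off)
{
    uint32_t dividend = load_u32(a, a_off);
    uint32_t divisor  = load_u32(b, b_off);
    return divisor == 0 ? 0 : dividend / divisor; /* avoid dividing by zero */
}
```

Calling `divide_raw(fp_bytes, 2, text, 6)` on a 10-byte FP buffer and a 10-byte string does exactly what the paragraph above describes: the operation decides the size and interpretation, not the data.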

(There are a handful of systems which do have hardware support for things like tagged memory, such as the LispMs, but they are very much in the minority. Even in those cases, my understanding is that the type tags are mainly used for things like garbage collection, and aren't inspected by the majority of instructions, so it is not like the typing systems in most high-level languages.)

This isn't something you can add to assembly language, either, not without enough macrology to make (as I said earlier) a sort of pocket compiler. The entire point of assembly language is to put a very thin, human-readable layer over the stream of machine opcodes that form the program instructions, and give a few basic tools such as automatically calculated labels, macro substitutions for common repeated forms, and a few directives for controlling how the assembler performs its job. Assembly language mnemonics are just a name for the opcode (or a group of related opcodes) given as a convenience over trying to code in some more direct representation of opcodes such as octal or hex.
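To make "a name for the opcode" concrete, here is a minimal sketch of the lookup at the heart of an assembler, restricted to four one-byte x86 instructions that take no operands. The table and the function name `assemble_one` are made up for illustration; a real assembler also has to handle operands, labels, and directives.

```c
#include <stddef.h>
#include <string.h>

/* One mnemonic and the single opcode byte it stands for. */
struct mnemonic {
    const char   *name;
    unsigned char opcode;
};

/* A few x86 instructions that encode as exactly one byte. */
static const struct mnemonic table[] = {
    { "nop",  0x90 },
    { "ret",  0xC3 },
    { "hlt",  0xF4 },
    { "int3", 0xCC },
};

/* Look up `name`; on success store its opcode in *out and return 1,
 * otherwise return 0 and leave *out untouched. */
int assemble_one(const char *name, unsigned char *out)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(name, table[i].name) == 0) {
            *out = table[i].opcode;
            return 1;
        }
    }
    return 0;
}
```

There is no place in this table for a "type": the mnemonic maps to bytes and nothing else.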

glauxosdever · **Posted:** Tue Sep 06, 2016 2:56 am

Hi,

I'm posting it here too for completeness.

I think he should start researching the differences between different CPU architectures (x86, ARM, SPARC, MIPS), so he sees that his idea of portable assembler does not make sense.

Next, what does the phrase "DOS gives unrestricted access to hardware" even mean? If you want unrestricted access to hardware, realise that you can do either of the following without DOS's limitations:

- Write Linux kernel modules (as already suggested numerous times)
- Write your own OS (are we not in http://osdev.org/?)

I sincerely expect that ~ takes the minimal effort to do the above.

Regards,
glauxosdever

Kevin · **Posted:** Tue Sep 06, 2016 3:18 am

~ wrote:

This is no "stupid legwork" for me. We only need to create a few different executable skeletons, so it's much more lightweight than using exclusively pre-built binary-generation utilities without ever getting involved in how things work.

So what do you think a linker does? Of course, it has some kind of an executable skeleton that it uses.

I don't know what you mean exactly by "lightweight", but given that the linker doesn't do anything different than you would be doing manually (otherwise one of you wouldn't be producing correct binaries) I don't think it's true.
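For a sense of what such a skeleton looks like, here is a rough sketch of filling in an ELF32 file header by hand for a 32-bit x86 executable with a single program header. The struct mirrors the field layout of the ELF32 header; `fill_skeleton` is a hypothetical name, the entry address is whatever the caller chooses, and writing the struct straight to disk would only be correct on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* ELF32 file header, field for field (52 bytes, no padding needed). */
struct elf32_ehdr {
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

/* Hypothetical helper: fill the fixed parts of the header. */
void fill_skeleton(struct elf32_ehdr *h, uint32_t entry)
{
    memset(h, 0, sizeof *h);
    h->e_ident[0] = 0x7f;            /* magic: 0x7f 'E' 'L' 'F' */
    h->e_ident[1] = 'E';
    h->e_ident[2] = 'L';
    h->e_ident[3] = 'F';
    h->e_ident[4] = 1;               /* ELFCLASS32 */
    h->e_ident[5] = 1;               /* ELFDATA2LSB (little-endian) */
    h->e_ident[6] = 1;               /* EV_CURRENT */
    h->e_type      = 2;              /* ET_EXEC */
    h->e_machine   = 3;              /* EM_386 */
    h->e_version   = 1;              /* EV_CURRENT */
    h->e_entry     = entry;          /* chosen by the caller */
    h->e_phoff     = (uint32_t)sizeof *h;  /* program header follows */
    h->e_ehsize    = (uint16_t)sizeof *h;
    h->e_phentsize = 32;             /* size of one ELF32 program header */
    h->e_phnum     = 1;
}
```

A linker fills in these same fields, plus the program headers, section headers, and relocations that make the rest of the file consistent with them.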

Quote:

I cannot expect to call myself, let alone properly function as a system programmer if I don't even know how the executables are built and loaded without the help of a third-party tool.

See, that's a problem. In fact, it is the problem with your endeavour. You're enthusiastic about improving something that you don't even understand. But if you don't understand something, you can't possibly know enough to improve it. This is probably the reason why some ideas look brilliant to you, but I and many others on this forum (who can rightly be called system programmers) are vehemently opposed to them.

Now what can you do about this? Start actually writing an OS. Not a fancy one that solves all the hard problems, but just something like the monolithic Unix clone that everyone and their dog does here. Doing this will teach you a lot of stuff (among other things, how executable formats work, because you will need to load application binaries) and then, once you know how the existing stuff works, you can propose something better.

Quote:

I need real portability. Being a single person who needs things be written and work most of the time for years (why not script and trivialize compiled languages not by automating the build process but by increasing the knowledge and full code resources on how things work without only ever using binary-blob header stubs?) I cannot afford to leave my code trapped to just a past version of a Linux distro or toolchain.

That's fair enough. Do you know when the executable format changed for the last time in Linux? ELF was introduced in 1995. The older a.out format that was used before that is still supported today. And that's it. Two formats in 25 years, where the current one has been in use for 21 years. I don't think that's the part that you need to worry about.

I think being a single person who needs things be written and work most of the time for years, you should concentrate on not doing unnecessary work. Replacing a perfectly fine toolchain is unnecessary work, and a lot of it, even if it may look simple to you at the moment. It will grow more and more complex as you need more advanced features from the toolchain. It's the same problem as with a handcrafted bootloader. It seems simple at the beginning, but it will be very limiting in the end unless you invest a lot of time in it.

Quote:

Have you seen how the bigger a project is the more things it needs to detect from a system and how there are old and useful programs that cannot be compiled or run in a practical way unless an old machine with an older Linux distro is used?

Yes. This has nothing to do with executable formats but with being able to build against different library versions. You could avoid it by bundling a specific library version with the source of every application and then linking statically (you could technically still link dynamically, but the result would be the same as static linking as the library will be used only by that application), so that you can unconditionally assume that the library has all the features that are present in this specific version.

Of course, this would be a horrible approach. Not only would the library be duplicated both on disk and in memory for every application, using up much more space than a single copy, but also, if a bug in the library is fixed, you'd have to fix the copy in every single application and then distribute updates for all those full applications instead of just one small library. Imagine that the bug fixed has some security impact and you're set for a disaster.

So, yes, configure scripts are annoying sometimes. They are a price I'm willing to pay for having each library installed centrally and only once. You may choose a different strategy for your OS to avoid this and I would consider it a bad idea, but then people are still using Windows where shipping libraries with applications is still a widely used practice as far as I know.

Quote:

I don't want to dramatically update and having to modify my code and linking scripts just to keep being able to compile my OS without errors or strange bugs under newer distros. It defeats any portability that the application-level code could have if it gets stalled in having to use specific external compiler/linker/library versions.

How often did your build chain already break your OS project, and in which ways? We've ruled out the executable format as a problem, and I think you're rather unlikely to use external libraries in it, whose features might need checking. So what's left that could make you "dramatically update and having to modify my code and linking scripts just to keep being able to compile my OS"?

I think you're wasting endless time to solve a non-problem.

By the way, further improving the stability (as in "doesn't change") of the build environment is one of the reasons why it's generally recommended in this forum that you use a cross-compiler and toolchain built specifically for your OS project. This gives you a defined version of everything involved, so no unexpected changes unless you actively and knowingly update your cross-toolchain.

Quote:

I cannot afford that to happen, so I need to build real code portability which doesn't depend on the libraries installed on a system nor in the toolchain, but in producing proper executable structures (after building many executables).

Whether or not your code relies on external libraries is a decision that isn't made by your toolchain, but by your code. If you don't want to rely on tools that could possibly change in new versions - even though there is no real reason to avoid them - you should stop using an assembler and instead write machine code in a hex editor.

Quote:

Plus, I need that knowledge for things like being able to load drivers from CD installation disks, under DOS or under my own OS (what I mean is loading the Windows drivers for 9x or NT that come in the original CDs in other OS, so I need to know all about the executable format to implement that driver layer).

Yes, you will learn about executable formats when you write a loader for them. One more reason why you don't need to write the format manually in order to learn about it.
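As a small taste of the loader side, here is a sketch of the first check a loader for Windows binaries might make: find the "MZ" stub, follow the 32-bit little-endian offset stored at 0x3C, and confirm the "PE\0\0" signature there. The function name `find_pe_signature` is hypothetical, and a real driver loader would go on to parse the headers, sections, imports, and relocations.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the "PE\0\0" signature inside `image`,
 * or -1 if the buffer does not look like a PE file. */
long find_pe_signature(const unsigned char *image, size_t len)
{
    /* The DOS stub header is at least 0x40 bytes and starts with "MZ". */
    if (len < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;

    /* Offset of the PE header: little-endian 32-bit value at 0x3C. */
    uint32_t off = (uint32_t)image[0x3C]
                 | (uint32_t)image[0x3D] << 8
                 | (uint32_t)image[0x3E] << 16
                 | (uint32_t)image[0x3F] << 24;

    /* The 4-byte signature must lie entirely inside the buffer. */
    if (off > len || len - off < 4)
        return -1;

    if (image[off] == 'P' && image[off + 1] == 'E' &&
        image[off + 2] == 0 && image[off + 3] == 0)
        return (long)off;

    return -1;
}
```

Reading the fields defensively like this, from a file someone else produced, teaches the format at least as well as emitting it by hand.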

Quote:

Also, those sort of executables will never stop compiling and running properly

You're not doing anything different from a normal toolchain, so why would the result differ? If you link statically (which is what everyone who starts with OS development does), the only reason why a program stops running is that the environment it runs in (e.g. syscall interface) changes. And then your handcrafted executables will stop running, too.

Quote:

With that, they all (C and assembly, and even C++ with hand-picked library code for the C++ run-time) will become dramatically similar and exchangeable.

We'll leave that for another time, but apparently you still have to learn about languages, too.
