OSDev.org
https://forum.osdev.org/

C/C++ Compiler
https://forum.osdev.org/viewtopic.php?f=2&t=33397
Page 1 of 1

Author:  ~ [ Thu Dec 27, 2018 11:58 am ]
Post subject:  C/C++ Compiler

COMPILER-2018-12-27.zip

http://sourceforge.net/p/c-compiler/

I've been developing a C compiler throughout 2018.
The idea is a compiler capable of handling any C/C++ code for any compiler transparently, without the real need of including header files but exposing transparently the whole environment just like in JavaScript, and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.

Now that there are only 4 days left in the year, I am announcing it.

I just managed to implement a set of basic text processing functions to recognize C keywords and some syntax.

The most important thing was developing a file called C.ASM, which helps to generate code for implementing the C conventions for calling, declaring function bodies and declaring local or global variables.

The most complex thing I managed was being able to make cout << "Hello"; work in pure assembly using MSVCIRT.DLL (C:\COMPILER\C\X86\EXAMPLES\HAND_ASM\CPP\OOP_CPP\01_00.cpp).


I would like to keep developing it, but next year I will work on implementing full 32-bit paging for a proper memory allocator based on quickly finding free/used/reserved pages. I will work on the compiler more slowly, as I need it, until I find time in some later year to study how to process complex expressions and text in general for source code and structured binaries.

Author:  alexfru [ Thu Dec 27, 2018 7:57 pm ]
Post subject:  Re: C/C++ Compiler

~ wrote:
The idea is a compiler capable of ... and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.


FYI, gcc has an option to put functions into individual sections. So, if your .c (and therefore .o) file has 10 functions and only one is being pulled at link time, you will get just that one function, not all 10.
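
To make the point above concrete, here is a minimal sketch. It assumes GCC with GNU ld on an ELF target; the file name util.c, the function names, and the separate main.o are hypothetical and only serve the illustration. The GCC option in question is -ffunction-sections, and the linker only discards the unreferenced sections when it is also given --gc-sections.

```c
/* util.c: a hypothetical translation unit with two unrelated functions.
 *
 * Assumed build steps (GCC with GNU ld):
 *   gcc -Os -ffunction-sections -c util.c
 *   gcc -Wl,--gc-sections -o demo main.o util.o
 *
 * If main.o (not shown) calls only count_spaces(), then
 *   nm demo | grep count_digits
 * should print nothing, because the section holding count_digits()
 * was discarded at link time.
 */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Referenced by the program, so its section is kept. */
size_t count_spaces(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == ' ')
            n++;
    return n;
}

/* Never referenced in this scenario, so its own section
   (.text.count_digits on ELF) can be garbage-collected. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (isdigit((unsigned char)*s))
            n++;
    return n;
}
```

Independently of these options, a static library (.a archive) is already linked per object file: the linker pulls in only the archive members that resolve an undefined symbol, not the whole library.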

Author:  Schol-R-LEA [ Thu Dec 27, 2018 8:40 pm ]
Post subject:  Re: C/C++ Compiler

Having seen the last version of it you posted, I am approaching this with trepidation. I fear that bleeding eyes may be in my near future.

EDIT #1: I am currently (at 2150 EST on 2018-12-27) on my fourth attempt to download a 20MB file from the unstable piece of effluvia that is Archefire. Tilde, you do know you can attach files to posts, right? Also, 20MB for a hand-coded compiler? Please tell me you didn't include the executables and object files, because seriously, that would be just rude.

EDIT #2: Yep, it's all there. *sigh* Why do I even bother...

Author:  Solar [ Thu Dec 27, 2018 8:51 pm ]
Post subject:  Re: C/C++ Compiler

~ wrote:
The idea is a compiler capable of handling any C/C++ code for any compiler transparently, without the real need of including header files but exposing transparently the whole environment just like in JavaScript...


Note that this concept is breaking the language specification.

For example, unless I explicitly...

Code:
#include <stdio.h>


...there may not be a declaration of a function named printf(). This is a requirement of the language C.

And as include statements are part of the language specification, not implementing include functionality is also breaking the language (and most, if not all, existing code).

Also, I assume that "the whole environment" refers to the respective standard libraries for C and C++. Aside from wondering how you intend to provide these in this "header-less environment" of yours, note that the very purpose of C / C++, or any real programming language actually, is to interface with third-party libraries that the compiler vendor may never have heard of.

The way to interface with these third-party libraries, in C/C++, is through header files that provide the declarations necessary for the compiler to actually do its job.

I.e., you're implementing a new language that is neither C nor C++, and will not work with existing code, only with code explicitly written for your "language that isn't really C or C++".

~ wrote:
...and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.


If you think this is how existing compilers / linkers work, you are sorely mistaken. Which makes me question your qualification to make a project like this happen in the first place.

I've written a partial implementation of the C standard library. I've worked as a professional C++ coder for the last, oh, about 16 years now. I'd say what you're presenting here is...

  • ...based on a faulty understanding of how existing C/C++ toolchains work,
  • ...not solving any problems anybody (except you?) is actually having with C/C++,
  • ...far outside the scope that you, or even a team of a handful of "you's", can pull off in any realistic time scale.

You want to toy with a custom compiler for a custom language, which might be quite similar to C/C++ even, go right ahead.

But please don't call it "a C/C++ compiler", as it isn't. These two languages explicitly forbid what you are describing, and there is absolutely no need for doing it in the first place.

I'd be happy to explain the various details I hinted at.

~ wrote:
...to study how to process complex expressions and text in general for source code and structured binaries.


If you still have to study that, stay away from C++. Seriously. That language isn't just "C plus a bit", it's the litmus test for compiler builders, as C++ is among the ugliest beasts imaginable as far as parsing the language is concerned. C isn't quite that simple to begin with, but it's a walk in the park compared to C++.

You're setting yourself up for a train wreck. Try lower hanging fruit...

Author:  Schol-R-LEA [ Thu Dec 27, 2018 9:11 pm ]
Post subject:  Re: C/C++ Compiler

Solar wrote:
You're setting yourself up for a train wreck. Try lower hanging fruit...


Clearly you haven't read the code yet. The train is well and truly wrecked already.

Seriously, the few parts of it that are intelligible at all show a massive, deliberate ignorance of every rule of writing clear and concise code which I know of, and not a shred of knowledge about compiler design can be found in any part of it.

I did note that ~ took absolutely no heed of the previous advice, as almost everything I critiqued about the early cuts of the program is not only still there, but greatly expanded upon. Reading this makes me want to cry in frustration and horror.

~, I am going to be blunt: STOP. You don't know what you are doing.

Go read a book on compiler design - any book on compiler design, because, honestly, even a bad one would be better than what you think you are doing now. You are not merely trying to reinvent the wheel; you are trying to reinvent the high-tech wheels from a Mars rover using a screwdriver and bits of bubble gum, and it isn't working.

As I have said before, there is no other subject in computer science that has been as thoroughly studied as compilers and interpreters. You are hurting yourself by not learning more about it before trying to write one.

Author:  iansjack [ Fri Dec 28, 2018 8:25 am ]
Post subject:  Re: C/C++ Compiler

Quote:
only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers
With that lack of understanding you are going to have a tough time writing anything approaching a compiler.

It's not true for static linking, let alone the situation when dynamic libraries are used.

Author:  nullplan [ Fri Dec 28, 2018 3:28 pm ]
Post subject:  Re: C/C++ Compiler

iansjack wrote:
It's not true for static linking, let alone the situation when dynamic libraries are used.

Well, if dynamic libraries are used, the entire library is linked in, but hey, at least the text sections are shared across processes. If the kernel supports that. Not the data sections, though. And of course you have to pay for the position-independent code and the relocations at every start (or, with lazy binding, when you call a function). Oh, and the text section sharing doesn't help if a process is the only user of a library (version).

I am not a fan of dynamic linking.

Author:  iansjack [ Fri Dec 28, 2018 3:42 pm ]
Post subject:  Re: C/C++ Compiler

I look at the number of running processes on my Linux installation. I look at how many of them use standard C library functions.

It's a no-brainer.

(And, to be strictly accurate, dynamic libraries are not linked into the executable.)

Author:  MichaelFarthing [ Sat Dec 29, 2018 4:01 am ]
Post subject:  Re: C/C++ Compiler

iansjack wrote:
I look at the number of running processes on my Linux installation. I look at how many of them use standard C library functions.

It's a no-brainer.



Hm. Well I agree with the words here, Ian, but not I'm afraid with the meaning you were wanting to convey! :| :)

Author:  Schol-R-LEA [ Sat Dec 29, 2018 9:37 am ]
Post subject:  Re: C/C++ Compiler

Ordinarily, I would want to try and bring this thread back onto the original topic, but given what that topic was, I can understand why no one wants to go back to it...

Still, let's at least give ~ some help. He clearly needs it. I'll start with the obvious and necessary part which Tilde doesn't seem to have yet: the grammar.

So, I'll write a simple grammar for the lexical analyzer. The grammar for the parser can wait; in many ways, the lexer is more crucial, as it is where most compilers spend 80% or more of their time. I don't expect ~ to write a high-performance Deterministic Finite Automaton for it the way professional compilers do, but at least knowing what you are looking for will help.

Actually, to make it even simpler, let's start with the lexer for the preprocessor, which should really be a separate thing from the compiler (mostly; it is useful for them to share some symbols, but that's getting ahead of things).

So, a regular grammar for the lexemes of a subset of the C preprocessor in Extended Backus-Naur Form:

Code:
token ::= keyword | identifier | paren
keyword ::= "#" ("include" | "define" | "if" | "ifdef" | "ifndef" | "elif" | "else" | "endif" | "pragma")
identifier ::= alpha {alphanum}
alpha ::= "A" | "a" | "B" | "b" | "C" | "c" | ... | "Z" | "z"
alphanum ::= alpha | digit
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
paren ::= lparen | rparen
lparen ::= "("
rparen ::= ")"


I am ignoring the contents of the body of a #define, as a simple preprocessor won't actually look at the body of the defined macros. A more complete preprocessor will, but I am trying to keep things simple here.

For those unfamiliar with EBNF, these are a set of what are called 'production rules', which describe in a compact way what makes up a given type of grammar element. This particular grammar translates to:

  • A preprocessor token is either a keyword, an identifier, or a parenthesis.
  • A keyword is a hash character ("#") followed by one of a set of literals: "include", "define", "if", "ifdef", "ifndef", "elif", "else", "endif", or "pragma".
  • An identifier consists of a letter, followed by zero or more letters or digits.
  • A parenthesis is either a left parenthesis "(" or a right parenthesis ")".

Now, at this point you might be asking why you would go to the trouble of making something like this. The reason is simple: you can use this as a guide for how to code the lexer itself, either directly for a simple ad-hoc lexer, or for defining the states of a Deterministic Finite State Automaton for a more formal lexer. I can discuss that in greater detail later if Tilde wants.
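
As a rough illustration of the "simple ad-hoc lexer" route, the sketch below turns the preprocessor grammar above into C more or less rule by rule. Everything named here (Token, TokenKind, next_token, scan_word, and the extra TOK_END and TOK_ERROR kinds) is an assumption of the sketch, not something from Tilde's code or from any existing compiler. It also assumes that whitespace separates tokens, which the grammar does not spell out, and it follows the grammar in not allowing underscores in identifiers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds from the grammar, plus two bookkeeping kinds
   (TOK_END, TOK_ERROR) that are assumptions of this sketch. */
typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_LPAREN, TOK_RPAREN,
    TOK_END, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind   kind;
    const char *start;   /* first character of the lexeme */
    size_t      length;  /* number of characters in the lexeme */
} Token;

/* The literals allowed after "#" in the keyword rule. */
static const char *const directives[] = {
    "include", "define", "if", "ifdef", "ifndef",
    "elif", "else", "endif", "pragma"
};

/* identifier ::= alpha {alphanum}
   Returns the length of such a run at s, or 0 if s does not start
   with a letter. */
static size_t scan_word(const char *s)
{
    size_t n = 0;

    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* token ::= keyword | identifier | paren
   Reads one token at *cursor and advances *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p;
    Token tok;
    size_t n, i;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    while (isspace((unsigned char)*p))   /* assumed token separator */
        p++;

    tok.kind = TOK_ERROR;
    tok.start = p;
    tok.length = 1;                      /* default: consume one character */

    if (*p == '\0') {
        tok.kind = TOK_END;
        tok.length = 0;
    } else if (*p == '(') {
        tok.kind = TOK_LPAREN;
    } else if (*p == ')') {
        tok.kind = TOK_RPAREN;
    } else if (*p == '#') {
        /* Scan the whole word first, then compare it as a unit, so that
           "#ifdef" is never mistaken for "#if" followed by "def". */
        n = scan_word(p + 1);
        tok.length = n + 1;
        for (i = 0; i < sizeof directives / sizeof directives[0]; i++) {
            if (strlen(directives[i]) == n &&
                strncmp(p + 1, directives[i], n) == 0) {
                tok.kind = TOK_KEYWORD;
                break;
            }
        }
    } else if ((n = scan_word(p)) > 0) {
        tok.kind = TOK_IDENTIFIER;
        tok.length = n;
    }

    *cursor = p + tok.length;
    return tok;
}
```

A caller would point a const char * at the start of a line and call next_token() repeatedly until it returns TOK_END. Each branch of the if-chain corresponds to one alternative of the token rule, which is the sense in which the grammar serves as a guide for the code.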

The lexer for the C code itself is a good deal more complicated; to give you a leg up on that, I will write out the EBNF for basic number recognition for you as well:

Code:
number ::= "0" | non-zero-digit [integer] | "0" octal-integer | "0x" hex-integer | fp-number | signed-number
non-zero-octal-digit ::= "1" | "2" | "3" | "4" | "5" | "6" | "7"
octal-digit ::= "0" | non-zero-octal-digit
non-zero-digit ::= non-zero-octal-digit | "8" | "9"
digit ::= "0" | non-zero-digit
non-zero-hex-digit ::= non-zero-digit | "A" | "a" | "B" | "b" | "C" | "c" | "D" | "d" | "E" | "e" | "F" | "f"
hex-digit ::= "0" | non-zero-hex-digit
integer ::= digit {digit}
octal-integer ::= octal-digit {octal-digit}
hex-integer ::= hex-digit {hex-digit}
fp-number ::= "." integer | integer "." integer
signed-number ::= ("+" | "-" ) (integer | fp-number)


Once again, I'm deliberately ignoring some things like exponential notation, in order to keep it simple. Note also that as it is now, it would not be strictly deterministic, as there are some potential issues with the definition of fp-number; this can be ironed out later.
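
One possible way to hand-code a recognizer for the number grammar above is sketched below. The names NumberKind, classify_number, run, and is_octal are made up for this sketch. It follows the grammar as posted, quirks included: only a lowercase "0x" prefix is accepted, a sign may precede only a decimal integer or an fp-number, and a signed integer may carry leading zeros. The fp-number ambiguity is sidestepped by scanning the leading run of digits once and only then checking whether a "." follows.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum {
    NUM_INVALID, NUM_DECIMAL, NUM_OCTAL, NUM_HEX, NUM_FLOAT
} NumberKind;

static int is_octal(int c)
{
    return c >= '0' && c <= '7';
}

/* Length of the leading run of characters accepted by is_member. */
static size_t run(const char *s, int (*is_member)(int))
{
    size_t n = 0;

    while (s[n] != '\0' && is_member((unsigned char)s[n]))
        n++;
    return n;
}

/* Classifies a complete string according to the number rule.
   Returns NUM_INVALID if the string as a whole does not match. */
NumberKind classify_number(const char *s)
{
    size_t whole, frac, n;
    int is_signed = 0;

    assert(s != NULL);

    /* signed-number ::= ("+" | "-") (integer | fp-number) */
    if (*s == '+' || *s == '-') {
        is_signed = 1;
        s++;
    }

    /* "0x" hex-integer (not reachable through signed-number) */
    if (!is_signed && s[0] == '0' && s[1] == 'x') {
        n = run(s + 2, isxdigit);
        return (n > 0 && s[2 + n] == '\0') ? NUM_HEX : NUM_INVALID;
    }

    /* Scan the digits once, then decide: integer or fp-number? */
    whole = run(s, isdigit);
    if (s[whole] == '.') {
        /* fp-number ::= "." integer | integer "." integer */
        frac = run(s + whole + 1, isdigit);
        return (frac > 0 && s[whole + 1 + frac] == '\0')
                   ? NUM_FLOAT : NUM_INVALID;
    }
    if (whole == 0 || s[whole] != '\0')
        return NUM_INVALID;

    /* "0" octal-integer: unsigned, leading zero, at least one more digit */
    if (!is_signed && s[0] == '0' && whole > 1)
        return run(s + 1, is_octal) == whole - 1 ? NUM_OCTAL : NUM_INVALID;

    /* "0", non-zero-digit [integer], or a signed integer */
    return NUM_DECIMAL;
}
```

Under these rules "08" is rejected (8 is not an octal digit) and so is "5." (the grammar requires digits after the point). A real C lexer would accept more forms than this, such as suffixes and exponents, which the grammar above leaves out on purpose.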

I might go over the actual preprocessor grammar (that is, the grammar for parsing the preprocessor directives) later, once I am convinced that ~ has actually understood this post and why it is relevant.

All times are UTC - 6 hours