OSDev.org

The Place to Start for Operating System Developers
It is currently Mon Sep 23, 2019 1:02 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 10 posts ] 
Author Message
 Post subject: C/C++ Compiler
PostPosted: Thu Dec 27, 2018 11:58 am 
Offline
Member
Member
User avatar

Joined: Tue Mar 06, 2007 11:17 am
Posts: 1146
Image COMPILER-2018-12-27.zip

http://sourceforge.net/p/c-compiler/

I've been developing a C compiler all this year 2018.
The idea is a compiler capable of handling any C/C++ code for any compiler transparently, without the real need of including header files but exposing transparently the whole environment just like in JavaScript, and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.

Now that there are only 4 days left for this year, I am announcing it.

I just managed to implement a set of basic text processing functions to recognize C keywords and some syntax.

The most important thing was developing a file called C.ASM, which helps to generate code for implementing the C conventions for calling, declaring function bodies and declaring local or global variables.

The most complex thing I managed was being able to make cout << "Hello"; work in pure assembly using MSVCIRT.DLL (C:\COMPILER\C\X86\EXAMPLES\HAND_ASM\CPP\OOP_CPP\01_00.cpp).


I would like to keep developing it, but next year I will work on implementing full 32-bit paging functionality for a formal memory allocator based on finding fast free/used/reserved pages. I will work more slowly on my compiler as I need to use it, until I find time, another year, to study how to process complex expressions and text in general for source code and structured binaries.

_________________
http://190.53.3.113/api (My OS compatible with DOS)

(udocproject@yahoo.com)
-----------------------------
IP for hosts file (all domains):
190.53.3.113 api.exe


Last edited by ~ on Fri Dec 28, 2018 12:30 am, edited 1 time in total.

Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Thu Dec 27, 2018 7:57 pm 
Offline
Member
Member

Joined: Tue Mar 04, 2014 5:27 am
Posts: 976
~ wrote:
The idea is a compiler capable of ... and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.


FYI, gcc has an option to put functions into individual sections. So, if your .c (and therefore .o) file has 10 functions and only one is being pulled at link time, you will get just that one function, not all 10.


Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Thu Dec 27, 2018 8:40 pm 
Offline
Member
Member
User avatar

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1489
Location: Athens, GA, USA
Having seen the last version of it you posted, I am approaching this with trepidation. I fear that bleeding eyes may be in my near future.

EDIT #1: I am currently (at 2150 EST on 2018-12-27) on my fourth attempt to download a 20MB file from the unstable piece of effluvia that is Archefire. Tilde, you do know you can attach files to posts, right? Also, 20MB for a hand-coded compiler? Please tell me you didn't include the executables and object files, because seriously, that would be just rude.

EDIT #2: Yep, it's all there. sigh Why do I even bother...

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
μή εἶναι βασιλικήν ἀτραπόν ἐπί γεωμετρίαν
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.


Last edited by Schol-R-LEA on Thu Dec 27, 2018 8:59 pm, edited 3 times in total.

Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Thu Dec 27, 2018 8:51 pm 
Offline
Member
Member
User avatar

Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7412
Location: Germany
~ wrote:
The idea is a compiler capable of handling any C/C++ code for any compiler transparently, without the real need of including header files but exposing transparently the whole environment just like in JavaScript...


Note that this concept is breaking the language specification.

For example, unless I explicitly...

Code:
#include <stdio.h>


...there may not be a declaration of a function named printf(). This is a requirement of the language C.

And as include statements are part of the language specification, not implementing include functionality is also breaking the language (and most, if not all, existing code).

Also, I assume that "the whole environment" refers to the respective standard libraries for C and C++. Aside from wondering how you intend to provide these in this "header-less environment" of yours, note that the very purpose of C / C++, or any real programming language actually, is to interface with third-party libraries that the compiler vendor may never have heard of.

The way to interface with these third-party libraries, in C/C++, is through header files that provide the declarations necessary for the compiler to actually do its job.

I.e., you're implementing a new language that is neither C nor C++, and will not work with existing code, only with code explicitly written for your "language that isn't really C or C++".

~ wrote:
...and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.


If you think this is how existing compilers / linkers work, you are sorely mistaken. Which makes me question your qualification to make a project like this happen in the first place.

I've written a partial implementation of the C standard library. I've worked as a professional C++ coder for the last, oh, about 16 years now. I'd say what you're presenting here is...

  • ...based on a faulty understanding of how existing C/C++ toolchains work,
  • ...not solving any problems anybody (except you?) is actually having with C/C++,
  • ...is far outside the scope you, or even a team of a handful of "you's", can pull off in any realistic time scale.

You want to toy with a custom compiler for a custom language, which might be quite similar to C/C++ even, go right ahead.

But please don't call it "a C/C++ compiler", as it isn't. These two languages explicitly forbid what you are describing, and there is absolutely no need for doing it in the first place.

I'd be happy to explain the various details I hinted at.

~ wrote:
...to study how to process complex expressions and text in general for source code and structured binaries.


If you have to study that yet, stay away from C++. Seriously. That language isn't just "C plus a bit", it's the litmus test for compiler builders, as C++ is among the ugliest beasts imaginable as far as parsing the language is concerned. C isn't quite that simple to begin with, but it's a walk in the park compared to C++.

You're setting yourself up for a train wreck. Try lower hanging fruit...

_________________
Every good solution is obvious once you've found it.


Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Thu Dec 27, 2018 9:11 pm 
Offline
Member
Member
User avatar

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1489
Location: Athens, GA, USA
Solar wrote:
You're setting yourself up for a train wreck. Try lower hanging fruit...


Clearly you haven't read the code yet. The train is well and truly wrecked already.

Seriously, the few parts of it that are intelligible at all show a massive, deliberate ignorance of every rule of writing clear and concise code which I know of, and not a shred of knowledge about compiler design can be found in any part of it.

I did note that ~ took absolutely no heed of the previous advice, as almost everything I critiqued about the early cuts of the program are not only still there, but greatly expanded upon. Reading this make me want to cry in frustration and horror.

~, I am going to be blunt: STOP. You don't know what you are doing.

Go read a book on compiler design - any book on compiler design, because, honestly, even a bad one would be better than what you think you are doing now. You are not merely trying to reinvent the wheel; you are trying to reinvent the high-tech wheels from a Mars rover using a screwdriver and bits of bubble gum, and it isn't working.

As I have said before, there is no other subject in computer science that has been as thoroughly studied as compilers and interpreters. You are hurting yourself by not learning more about it before trying to write one.

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
μή εἶναι βασιλικήν ἀτραπόν ἐπί γεωμετρίαν
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.


Last edited by Schol-R-LEA on Thu Dec 27, 2018 9:29 pm, edited 4 times in total.

Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Fri Dec 28, 2018 8:25 am 
Offline
Member
Member
User avatar

Joined: Sat Mar 31, 2012 3:07 am
Posts: 3466
Location: Chichester, UK
Quote:
only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers
With that lack of understanding you are going to have a tough time writing anything approaching a compiler.

It's not true for static linking, let alone the situation when dynamic libraries are used.


Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Fri Dec 28, 2018 3:28 pm 
Offline
Member
Member

Joined: Wed Aug 30, 2017 8:24 am
Posts: 241
iansjack wrote:
It's not true for static linking, let alone the situation when dynamic libraries are used.

Well, if dynamic libraries are used, the entire library is linked in, but hey, at least the text sections are shared across processes. If the kernel supports that. Not the data sections, though. And of course you have to pay for the position-independent code and the relocations at every start (or, with lazy binding, when you call a function). Oh, and the text section sharing doesn't help if a process is the only user of a library (version).

I am not a fan of dynamic linking.


Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Fri Dec 28, 2018 3:42 pm 
Offline
Member
Member
User avatar

Joined: Sat Mar 31, 2012 3:07 am
Posts: 3466
Location: Chichester, UK
I look at the number of running processes on my Linux installation. I look at how many of them use standard C library functions.

It's a no-brainer.

(And, to be strictly accurate, dynamic libraries are not linked in to the executable.)


Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Sat Dec 29, 2018 4:01 am 
Offline
Member
Member
User avatar

Joined: Thu Mar 10, 2016 7:35 am
Posts: 163
Location: Lancaster, England, Disunited Kingdom
iansjack wrote:
I look at the number of running processes on my Linux installation. I look at how many of them use standard C library functions.

It's a no-brainer.



Hm. Well I agree with the words here, Ian, but not I'm afraid with the meaning you were wanting to convey! :| :)


Top
 Profile  
 
 Post subject: Re: C/C++ Compiler
PostPosted: Sat Dec 29, 2018 9:37 am 
Offline
Member
Member
User avatar

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1489
Location: Athens, GA, USA
Ordinarily, I would want to try and bring this thread back onto the original topic, but given what that topic was, I can understand why no one wants to go back to it...

Still, let's at least give ~ some help. He clearly needs it. I'll start with the obvious and necessary part which Tilde doesn't seem to have yet: the grammar.

So, I'll write a simple grammar for the lexical analyzer. The grammar for the parser can wait; in many ways, the lexer is more crucial, as it is where most compilers spend 80% or more of their time. I don't expect ~ to write a high-performance Deterministic Finite Automaton for it the way professional compilers do, but at least knowing what you are looking for will help.

Actually, to make it even simpler, let's start with the lexer for the preprocessor, which should really sort of be a separate thing from the compiler (mostly, it is useful for them to share some symbols but that's getting ahead of things).

So, a regular grammar for the lexemes of a subset of the C preprocessor in Extended Backus-Naur Form:

Code:
token ::= keyword | identifier | paren
keyword ::= "#"("include" | "define" | "if" | "ifdef" | "ifndef" |  "elif" | "else" | "endif" | pragma")
identifer ::= alpha {alphanum}
alpha ::= "A" | "a" |"B" | "b" | "C" | "c"  ... | Z" | "z"
alphanum ::= alpha | digit
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
paren ::= lparen | rparen
lparen ::= "("
rparen ::= ")"


I am ignoring the contents of the body of a #define, as a simple preprocessor won't actually look at the body of the defined macros. A more complete preprocessor will, but I am trying to keep things simple here.

For those unfamiliar with EBNF, these are a set of what are called 'production rules', which describe in a compact way what makes up a given type of grammar element. This particular grammar translates to:

  • A preprocessor token is either a keyword, an identifier, or a parenthesis.
  • A keyword is a hash character ("#") followed by one of a set of literals: "include", "define", "if, "ifdef", "ifndef", "elif", "else", "endif", or "pragma".
  • An identifier consists of a letter, followed by zero or more letters or digits.
  • A parenthesis is either a left parenthesis "(" or a right parenthesis ")".

Now, at this point you might be asking why you would go to the trouble of making something like this. The reason is simple: you can use this as a guide for how to code the lexer itself, either directly for a simple ad-hoc lexer, or for defining the states of a Deterministic Finite State Automaton for a more formal lexer. I can discuss that in greater detail later if Tilde wants.

The lexer for the C code itself is a good deal more complicated; to give you a leg up on that, I will write out the EBNF for basic number recognition for you as well:

Code:
number ::= "0" | non-zero-digit [integer] | "0" octal-integer | "0x" hex-integer | fp-number |  signed-number
non-zero-octal-digit ::= "1" | "2" | "3" | "4" | "5" | "6" | "7"
octal-digit ::= "0" | non-zero-octal-digit
non-zero-digit ::= non-zero-octal-digit | "8" | "9"
digit ::= "0" | non-zero-digit
non-zero-hex-digit ::= non-zero-digit | "A" | "a" |"B" | "b" | "C" | "c" | "D" | "d" | "E" | "e" | "F" | "f"
hex-digit ::= "0" | non-zero-hex-digit
integer ::= digit [{digit}]
octal-integer ::= octal-digit [{octal-digit}]
hex-integer ::= hex-digit [{hex-digit}]
fp-number ::= "." integer | integer "." integer
signed-number ::= ("+" | "-" ) (integer | fp-number)


Once again, I'm deliberately ignoring some things like exponential notation, in order to keep it simple. Note also that as it is now, it would not be strictly deterministic, as there are some potential issues with the definition of fp-number; this can be ironed out later.

I might go over the actual preprocessor grammar (that is, the grammar for parsing the preprocessor directives) later, once I am convinced that ~ has actually understood this post and why it is relevant.

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
μή εἶναι βασιλικήν ἀτραπόν ἐπί γεωμετρίαν
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 10 posts ] 

All times are UTC - 6 hours


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group