OSDev.org
https://forum.osdev.org/

C/C++ Compiler
https://forum.osdev.org/viewtopic.php?f=2&t=33397
Page 1 of 2

Author:  ~ [ Thu Dec 27, 2018 11:58 am ]
Post subject:  C/C++ Compiler

COMPILER-2018-12-27.zip

http://sourceforge.net/p/c-compiler/

I've been developing a C compiler throughout 2018.
The idea is a compiler capable of handling any C/C++ code for any compiler transparently, without the real need of including header files but exposing transparently the whole environment just like in JavaScript, and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.

Now that there are only 4 days left in the year, I am announcing it.

I just managed to implement a set of basic text processing functions to recognize C keywords and some syntax.

The most important thing was developing a file called C.ASM, which helps to generate code for implementing the C conventions for calling, declaring function bodies and declaring local or global variables.

The most complex thing I managed was being able to make cout << "Hello"; work in pure assembly using MSVCIRT.DLL (C:\COMPILER\C\X86\EXAMPLES\HAND_ASM\CPP\OOP_CPP\01_00.cpp).


I would like to keep developing it, but next year I will work on implementing full 32-bit paging functionality for a formal memory allocator based on finding fast free/used/reserved pages. I will work more slowly on my compiler as I need to use it, until I find time, another year, to study how to process complex expressions and text in general for source code and structured binaries.

Author:  alexfru [ Thu Dec 27, 2018 7:57 pm ]
Post subject:  Re: C/C++ Compiler

~ wrote:
The idea is a compiler capable of ... and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.


FYI, gcc has an option to put functions into individual sections. So, if your .c (and therefore .o) file has 10 functions and only one is being pulled at link time, you will get just that one function, not all 10.
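
A minimal sketch of what that looks like in practice, assuming GCC with GNU ld on an ELF target. The file and function names are made up for illustration; the flags are the ones GCC and GNU ld document for this purpose (-ffunction-sections and --gc-sections).

```c
/* sections_demo.c (hypothetical file name)
 *
 * Assumed build commands for GCC and GNU ld on an ELF target:
 *   gcc -c -ffunction-sections -fdata-sections sections_demo.c
 *   gcc -Wl,--gc-sections main.o sections_demo.o -o demo
 *
 * With -ffunction-sections each function below goes into its own
 * section (for example .text.used_helper), so the linker can drop
 * unused_helper() when nothing references it.
 */

/* Referenced by the program, so its section is kept. */
int used_helper(int x)
{
    return x * 2;
}

/* Never referenced; a candidate for removal by --gc-sections. */
int unused_helper(int x)
{
    return x + 1000;
}
```

Comparing the linker map (-Wl,-Map=demo.map) with and without --gc-sections is one way to check which functions actually ended up in the executable.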

Author:  Schol-R-LEA [ Thu Dec 27, 2018 8:40 pm ]
Post subject:  Re: C/C++ Compiler

Having seen the last version of it you posted, I am approaching this with trepidation. I fear that bleeding eyes may be in my near future.

EDIT #1: I am currently (at 2150 EST on 2018-12-27) on my fourth attempt to download a 20MB file from the unstable piece of effluvia that is Archefire. Tilde, you do know you can attach files to posts, right? Also, 20MB for a hand-coded compiler? Please tell me you didn't include the executables and object files, because seriously, that would be just rude.

EDIT #2: Yep, it's all there. Sigh. Why do I even bother...

Author:  Solar [ Thu Dec 27, 2018 8:51 pm ]
Post subject:  Re: C/C++ Compiler

~ wrote:
The idea is a compiler capable of handling any C/C++ code for any compiler transparently, without the real need of including header files but exposing transparently the whole environment just like in JavaScript...


Note that this concept is breaking the language specification.

For example, unless I explicitly...

Code:
#include <stdio.h>


...there may not be a declaration of a function named printf(). This is a requirement of the language C.

And as include statements are part of the language specification, not implementing include functionality is also breaking the language (and most, if not all, existing code).

Also, I assume that "the whole environment" refers to the respective standard libraries for C and C++. Aside from wondering how you intend to provide these in this "header-less environment" of yours, note that the very purpose of C / C++, or any real programming language actually, is to interface with third-party libraries that the compiler vendor may never have heard of.

The way to interface with these third-party libraries, in C/C++, is through header files that provide the declarations necessary for the compiler to actually do its job.

I.e., you're implementing a new language that is neither C nor C++, and will not work with existing code, only with code explicitly written for your "language that isn't really C or C++".
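
To make the point about declarations concrete, here is a minimal sketch. The names thirdlib_scale, thirdlib.h and format_scaled are hypothetical, standing in for any third-party library the compiler vendor has never heard of; everything is kept in one file only so the example is self-contained.

```c
#include <stdio.h>   /* declares snprintf() and size_t */

/* --- what a hypothetical third-party header "thirdlib.h" would contain --- */
/* The compiler needs this prototype to type-check the call and to pass
 * the arguments correctly. */
double thirdlib_scale(double value, int factor);

/* --- what the library's own source file would contain --- */
double thirdlib_scale(double value, int factor)
{
    return value * factor;
}

/* --- user code --- */
/* Since C99, calling a function with no visible declaration is not
 * valid C, so this relies on both declarations above being in scope. */
int format_scaled(char *buf, size_t size, double value, int factor)
{
    return snprintf(buf, size, "%.1f", thirdlib_scale(value, factor));
}
```

A compiler that wanted to do without headers would have to obtain the same information (names, parameter types, return types) from somewhere else, for every library a program might ever use.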

~ wrote:
...and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.


If you think this is how existing compilers / linkers work, you are sorely mistaken. Which makes me question your qualification to make a project like this happen in the first place.

I've written a partial implementation of the C standard library. I've worked as a professional C++ coder for the last, oh, about 16 years now. I'd say what you're presenting here is...

  • ...based on a faulty understanding of how existing C/C++ toolchains work,
  • ...not solving any problems anybody (except you?) is actually having with C/C++,
  • ...far outside the scope that you, or even a team of a handful of "you's", can pull off in any realistic time scale.

You want to toy with a custom compiler for a custom language, which might be quite similar to C/C++ even, go right ahead.

But please don't call it "a C/C++ compiler", as it isn't. These two languages explicitly forbid what you are describing, and there is absolutely no need for doing it in the first place.

I'd be happy to explain the various details I hinted at.

~ wrote:
...to study how to process complex expressions and text in general for source code and structured binaries.


If you have to study that yet, stay away from C++. Seriously. That language isn't just "C plus a bit", it's the litmus test for compiler builders, as C++ is among the ugliest beasts imaginable as far as parsing the language is concerned. C isn't quite that simple to begin with, but it's a walk in the park compared to C++.

You're setting yourself up for a train wreck. Try lower hanging fruit...

Author:  Schol-R-LEA [ Thu Dec 27, 2018 9:11 pm ]
Post subject:  Re: C/C++ Compiler

Solar wrote:
You're setting yourself up for a train wreck. Try lower hanging fruit...


Clearly you haven't read the code yet. The train is well and truly wrecked already.

Seriously, the few parts of it that are intelligible at all show a massive, deliberate ignorance of every rule of writing clear and concise code which I know of, and not a shred of knowledge about compiler design can be found in any part of it.

I did note that ~ took absolutely no heed of the previous advice, as almost everything I critiqued about the early cuts of the program is not only still there, but greatly expanded upon. Reading this makes me want to cry in frustration and horror.

~, I am going to be blunt: STOP. You don't know what you are doing.

Go read a book on compiler design - any book on compiler design, because, honestly, even a bad one would be better than what you think you are doing now. You are not merely trying to reinvent the wheel; you are trying to reinvent the high-tech wheels from a Mars rover using a screwdriver and bits of bubble gum, and it isn't working.

As I have said before, there is no other subject in computer science that has been as thoroughly studied as compilers and interpreters. You are hurting yourself by not learning more about it before trying to write one.

Author:  iansjack [ Fri Dec 28, 2018 8:25 am ]
Post subject:  Re: C/C++ Compiler

Quote:
only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers

With that lack of understanding you are going to have a tough time writing anything approaching a compiler.

It's not true for static linking, let alone the situation when dynamic libraries are used.

Author:  nullplan [ Fri Dec 28, 2018 3:28 pm ]
Post subject:  Re: C/C++ Compiler

iansjack wrote:
It's not true for static linking, let alone the situation when dynamic libraries are used.

Well, if dynamic libraries are used, the entire library is linked in, but hey, at least the text sections are shared across processes. If the kernel supports that. Not the data sections, though. And of course you have to pay for the position-independent code and the relocations at every start (or, with lazy binding, when you call a function). Oh, and the text section sharing doesn't help if a process is the only user of a library (version).

I am not a fan of dynamic linking.

Author:  iansjack [ Fri Dec 28, 2018 3:42 pm ]
Post subject:  Re: C/C++ Compiler

I look at the number of running processes on my Linux installation. I look at how many of them use standard C library functions.

It's a no-brainer.

(And, to be strictly accurate, dynamic libraries are not linked into the executable.)

Author:  MichaelFarthing [ Sat Dec 29, 2018 4:01 am ]
Post subject:  Re: C/C++ Compiler

iansjack wrote:
I look at the number of running processes on my Linux installation. I look at how many of them use standard C library functions.

It's a no-brainer.



Hm. Well, I agree with the words here, Ian, but not, I'm afraid, with the meaning you were wanting to convey! :| :)

Author:  Schol-R-LEA [ Sat Dec 29, 2018 9:37 am ]
Post subject:  Re: C/C++ Compiler

Ordinarily, I would want to try and bring this thread back onto the original topic, but given what that topic was, I can understand why no one wants to go back to it...

Still, let's at least give ~ some help. He clearly needs it. I'll start with the obvious and necessary part which Tilde doesn't seem to have yet: the grammar.

So, I'll write a simple grammar for the lexical analyzer. The grammar for the parser can wait; in many ways, the lexer is more crucial, as it is where most compilers spend 80% or more of their time. I don't expect ~ to write a high-performance Deterministic Finite Automaton for it the way professional compilers do, but at least knowing what you are looking for will help.

Actually, to make it even simpler, let's start with the lexer for the preprocessor, which should really sort of be a separate thing from the compiler (mostly, it is useful for them to share some symbols but that's getting ahead of things).

So, a regular grammar for the lexemes of a subset of the C preprocessor in Extended Backus-Naur Form:

Code:
token ::= keyword | identifier | paren
keyword ::= "#"("include" | "define" | "if" | "ifdef" | "ifndef" | "elif" | "else" | "endif" | "pragma")
identifier ::= alpha {alphanum}
alpha ::= "A" | "a" | "B" | "b" | "C" | "c" ... | "Z" | "z"
alphanum ::= alpha | digit
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
paren ::= lparen | rparen
lparen ::= "("
rparen ::= ")"


I am ignoring the contents of the body of a #define, as a simple preprocessor won't actually look at the body of the defined macros. A more complete preprocessor will, but I am trying to keep things simple here.

For those unfamiliar with EBNF, these are a set of what are called 'production rules', which describe in a compact way what makes up a given type of grammar element. This particular grammar translates to:

  • A preprocessor token is either a keyword, an identifier, or a parenthesis.
  • A keyword is a hash character ("#") followed by one of a set of literals: "include", "define", "if", "ifdef", "ifndef", "elif", "else", "endif", or "pragma".
  • An identifier consists of a letter, followed by zero or more letters or digits.
  • A parenthesis is either a left parenthesis "(" or a right parenthesis ")".

Now, at this point you might be asking why you would go to the trouble of making something like this. The reason is simple: you can use this as a guide for how to code the lexer itself, either directly for a simple ad-hoc lexer, or for defining the states of a Deterministic Finite State Automaton for a more formal lexer. I can discuss that in greater detail later if Tilde wants.
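
As a rough illustration of the "simple ad-hoc lexer" route, here is a minimal sketch in C that follows the preprocessor grammar above rule by rule. The names (TokenKind, next_token, and so on) are my own inventions for the example, and it assumes the default "C" locale so that isalpha() and isalnum() match exactly the letters and digits in the grammar.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_LPAREN, TOK_RPAREN, TOK_END, TOK_ERROR
} TokenKind;

/* The directive names from the "keyword" production. */
static const char *const directives[] = {
    "include", "define", "if", "ifdef", "ifndef",
    "elif", "else", "endif", "pragma"
};

static int is_directive(const char *word, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof directives / sizeof directives[0]; i++) {
        if (strlen(directives[i]) == len && memcmp(directives[i], word, len) == 0)
            return 1;
    }
    return 0;
}

/* identifier ::= alpha {alphanum}
 * Returns the length of the identifier at s, or 0 if there is none. */
static size_t scan_identifier(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Reads one token starting at *cursor and advances *cursor past it.
 * *start and *len describe the lexeme inside the input string. */
TokenKind next_token(const char **cursor, const char **start, size_t *len)
{
    const char *p = *cursor;
    TokenKind kind;

    while (*p == ' ' || *p == '\t')      /* skip blanks between tokens */
        p++;
    *start = p;

    if (*p == '\0') {
        *len = 0;
        kind = TOK_END;
    } else if (*p == '(' || *p == ')') {  /* paren ::= lparen | rparen */
        *len = 1;
        kind = (*p == '(') ? TOK_LPAREN : TOK_RPAREN;
    } else if (*p == '#') {               /* keyword ::= "#" directive-name */
        size_t n = scan_identifier(p + 1);
        *len = 1 + n;
        kind = is_directive(p + 1, n) ? TOK_KEYWORD : TOK_ERROR;
    } else if ((*len = scan_identifier(p)) > 0) {
        kind = TOK_IDENTIFIER;
    } else {
        *len = 1;                         /* character outside the grammar */
        kind = TOK_ERROR;
    }
    *cursor = p + *len;
    return kind;
}
```

Each branch of next_token() corresponds to one alternative of the token production, which is the sense in which the grammar serves as a guide for the code. Calling it in a loop until it returns TOK_END walks a whole line.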

The lexer for the C code itself is a good deal more complicated; to give you a leg up on that, I will write out the EBNF for basic number recognition for you as well:

Code:
number ::= "0" | non-zero-digit [integer] | "0" octal-integer | "0x" hex-integer | fp-number |  signed-number
non-zero-octal-digit ::= "1" | "2" | "3" | "4" | "5" | "6" | "7"
octal-digit ::= "0" | non-zero-octal-digit
non-zero-digit ::= non-zero-octal-digit | "8" | "9"
digit ::= "0" | non-zero-digit
non-zero-hex-digit ::= non-zero-digit | "A" | "a" |"B" | "b" | "C" | "c" | "D" | "d" | "E" | "e" | "F" | "f"
hex-digit ::= "0" | non-zero-hex-digit
integer ::= digit [{digit}]
octal-integer ::= octal-digit [{octal-digit}]
hex-integer ::= hex-digit [{hex-digit}]
fp-number ::= "." integer | integer "." integer
signed-number ::= ("+" | "-" ) (integer | fp-number)


Once again, I'm deliberately ignoring some things like exponential notation, in order to keep it simple. Note also that as it is now, it would not be strictly deterministic, as there are some potential issues with the definition of fp-number; this can be ironed out later.
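
For comparison, a minimal sketch of a hand-written recognizer for the number grammar above. NumberKind and classify_number are hypothetical names; the function checks a whole string rather than scanning inside a larger input, and it sidesteps the fp-number ambiguity by scanning the leading digits first and only then looking for a ".". Like the grammar, it knows nothing about exponents or suffixes.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { NUM_INVALID, NUM_DECIMAL, NUM_OCTAL, NUM_HEX, NUM_FLOAT } NumberKind;

/* Counts the leading characters of s for which pred() is true. */
static size_t span(const char *s, int (*pred)(int))
{
    size_t n = 0;
    while (s[n] != '\0' && pred((unsigned char)s[n]))
        n++;
    return n;
}

static int is_octal_digit(int c) { return c >= '0' && c <= '7'; }

NumberKind classify_number(const char *s)
{
    size_t digits;
    int is_signed = 0;

    if (*s == '+' || *s == '-') {   /* signed-number: sign, then integer or fp-number */
        is_signed = 1;
        s++;
    }

    /* "0x" hex-integer (the grammar does not allow a sign in front of it) */
    if (!is_signed && s[0] == '0' && s[1] == 'x') {
        digits = span(s + 2, isxdigit);
        return (digits > 0 && s[2 + digits] == '\0') ? NUM_HEX : NUM_INVALID;
    }

    digits = span(s, isdigit);

    /* fp-number ::= "." integer | integer "." integer */
    if (s[digits] == '.') {
        size_t frac = span(s + digits + 1, isdigit);
        return (frac > 0 && s[digits + 1 + frac] == '\0') ? NUM_FLOAT : NUM_INVALID;
    }

    if (digits == 0 || s[digits] != '\0')
        return NUM_INVALID;

    /* signed-number admits any integer; a bare "0" counts as decimal */
    if (is_signed || s[0] != '0' || digits == 1)
        return NUM_DECIMAL;

    /* "0" octal-integer: every digit after the leading zero must be octal */
    return (span(s + 1, is_octal_digit) == digits - 1) ? NUM_OCTAL : NUM_INVALID;
}
```

Note that "09" is rejected: it starts like an octal literal but 9 is not an octal-digit, which is exactly the kind of case the grammar makes visible before any code is written.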

I might go over the actual preprocessor grammar (that is, the grammar for parsing the preprocessor directives) later, once I am convinced that ~ has actually understood this post and why it is relevant.

Author:  ~ [ Fri Oct 18, 2019 7:36 am ]
Post subject:  Re: C/C++ Compiler

How to Code an stdcall or cdecl Function

Here, if we set .ret_bytes to anything other than 0, the function becomes stdcall, automatically popping parameter bytes from the stack.

With this skeleton we can see how easy it is to make a C/C++ translator, and how easy it is to write C-like functions with local variables, stack parameters, return value in WIDEAX, even by hand or with an integrated compiler translator for regular functions.

Look how easy it is to assign local labels to parameters and variables, just as if we were programming in plain C:
Code:
;Inputs (push order):
;       Param 1
;       Param 0
;
;;
C_function_skeleton:
;Create stack frame:
;;
  push widebp
  pushfwide
  mov widebp,widesp
  add widebp,wideword_sz*3   ;Go past saved flags, WIDEBP,
  ;;                         ;and return address to
                             ;directly access parameters

;Stack parameters:
;;
  %xdefine .Param0   wideword[widebp]
  %xdefine .Param1   wideword[widebp+wideword_sz]



;Variables start:
;;
  %xdefine .Var0 wideword[widebp-((wideword_sz*3)+(wideword_sz*1))]
  %xdefine .Var0_byte byte[widebp-((wideword_sz*3)+(wideword_sz*1))]
  %xdefine .Var1 wideword[widebp-((wideword_sz*3)+(wideword_sz*2))]
  sub widesp,wideword_sz*2

;Number of parameter bytes to discard by the function on return.
;If this is NOT 0, the function is stdcall, otherwise it's cdecl:
;;
  %xdefine .ret_bytes wideword_sz*2


;Save used registers:
;;
  push widecx
  push widedx
  push widedi


;Code start:
;;
  mov .Var0,53  ;Initialize Var0 and copy it to Var1
  push .Var0
  pop .Var1

;Code end:
;;


;Restore used registers:
;;
  pop widedi
  pop widedx
  pop widecx

  add widesp,wideword_sz*2      ;Release local stack variables


;Discard stack frame:
;;
  popfwide
  pop widebp
  retwide .ret_bytes  ;Return discarding N parameter bytes
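
For readers who think more easily in C than in macros, the effect of .ret_bytes can be modelled with plain arithmetic on a stack pointer. This is only a toy model under my own assumptions, not part of the skeleton: WORD_SZ stands in for wideword_sz, STACK_TOP is an arbitrary starting value, and the function names are invented for the example.

```c
#include <stddef.h>

#define WORD_SZ   sizeof(long)   /* stand-in for wideword_sz */
#define STACK_TOP 1024u          /* arbitrary initial stack pointer for the model */

/* Models "retwide .ret_bytes": pops the return address, then discards
 * ret_bytes of parameters. Returns the stack pointer the caller sees. */
static size_t callee_return(size_t sp_at_entry, size_t ret_bytes)
{
    return sp_at_entry + WORD_SZ + ret_bytes;
}

/* Simulates a call with two word-sized parameters.
 * ret_bytes == 0           models cdecl   (caller cleans up);
 * ret_bytes == 2 * WORD_SZ models stdcall (callee cleans up).
 * Returns the caller's stack pointer once the call sequence is done. */
size_t simulate_call(size_t ret_bytes)
{
    size_t sp = STACK_TOP;

    sp -= WORD_SZ;                 /* push Param 1 */
    sp -= WORD_SZ;                 /* push Param 0 */
    sp -= WORD_SZ;                 /* call: push return address */

    sp = callee_return(sp, ret_bytes);

    if (ret_bytes == 0)
        sp += 2 * WORD_SZ;         /* cdecl: caller discards the parameters */

    return sp;
}
```

In both cases the stack pointer ends up where it started; the only difference is which side performs the adjustment, so the caller and the callee have to agree on the convention.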

Author:  iansjack [ Fri Oct 18, 2019 8:06 am ]
Post subject:  Re: C/C++ Compiler

Note that, if working in 64-bits, most ABIs use registers rather than the stack to pass parameters/return values.

This is one of the problems with trying to write assembler code that is portable to 32 and 64 bits. It seems to me that, in chasing the rather illusory goal of compatibility, you are ending up with an inefficient implementation.

Author:  ~ [ Sun Nov 24, 2019 3:08 pm ]
Post subject:  Re: C/C++ Compiler

Defining Typed Variables for Immediate Access in Assembler

Instead of writing the usual code:
Code:
;Declaration:
;;
variable0 dd 0


;Low-level access:
;;
mov dword[variable0],0





we can write C-like code:
Code:
;Declaration:
;;
%xdefine variable0 dword[_variable0]
%macro variable0. 2   ;Manual type-cast macro for this variable
   ;%1 can be char, short, etc., which
   ;in turn expand to byte, word, dword, wideword
   ;specific to 16, 32 or 64-bit mode:
   ;;
    mov %1[_variable0],%2
%endmacro
_variable0 dd 0


;High-level access:
;;
mov variable0,0



This is equivalent to declaring an int on 32-bit platforms, or an int32_t/uint32_t.
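
For reference, a rough C counterpart of what the macros above emulate. The function names are hypothetical, and memcpy is used for the narrow access only to keep the example free of aliasing concerns; a compiler would typically turn it into a single byte move.

```c
#include <stdint.h>
#include <string.h>

/* Counterpart of "_variable0 dd 0": four bytes of storage, zero-initialized. */
static int32_t variable0 = 0;

/* Counterpart of "mov variable0,value": a full-width, typed store. */
void store_dword(int32_t value)
{
    variable0 = value;
}

/* Counterpart of "variable0. byte,value": writes only the byte at the
 * variable's lowest address and leaves the other three bytes untouched. */
void store_low_address_byte(uint8_t value)
{
    memcpy(&variable0, &value, sizeof value);
}

/* Reads back the byte at the variable's lowest address. */
uint8_t load_low_address_byte(void)
{
    uint8_t value;
    memcpy(&value, &variable0, sizeof value);
    return value;
}

/* Full-width, typed load. */
int32_t load_dword(void)
{
    return variable0;
}
```

The difference is that in C the type is attached to the variable once and the compiler checks every access against it, whereas the %xdefine only fixes the default operand size and the cast macro can override it freely.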

-----------------------------
iansjack wrote:
Note that, if working in 64-bits, most ABIs use registers rather than the stack to pass parameters/return values.

This is one of the problems with trying to write assembler code that is portable to 32 and 64 bits. It seems to me that, in chasing the rather illusory goal of compatibility, you are ending up with an inefficient implementation.

You can make functions specific to an ABI. Those would assemble with that style in all modes, and then you can make other functions that don't follow any format. You can arrange your code elegantly to choose the best internal functions for each environment for which the program will be built.

Author:  Schol-R-LEA [ Mon Nov 25, 2019 11:09 am ]
Post subject:  Re: C/C++ Compiler

Should I ever look into this project again, I foresee nasal demons and bleeding eyes... seriously, you seem to be determined to break both ABI compatibility (for any and all OSes other than perhaps your own) and the C language standard at every turn.

At this point, you'd be better off designing a whole new language - and yes, I am fully aware of how large a project that would be - rather than folding, spindling, and mutilating the existing language on a whim. While you can make a dialect of C if you like - it isn't as if there's some sort of Parser Police who would hunt you down for it - if you then claim that it is still compliant C, then you are going to anger anyone else using your hybrid language expecting it to follow the standard language's rules.

As for ABI compliance, well, if you are bound and determined to ensure that your compiler will never interop with any existing libraries, ever, then you do you, I guess, but don't be surprised when it blows up in your face.

Of course, all of this is predicated on your success in compiler development, and given what you've done with that so far, I think everyone save you can see that this is several bridges too far.

Let me ask you again: have you read any of the books or tutorials (or watched any of the videos) we've recommended in the past, and are you applying any of what they say? Please, just give us some sort of answer on this, since as things stand, we don't know what you know, and don't know how to give you any more advice.

I do know that, from your own statements, you've read the old Crenshaw "Let's Build a Compiler" tutorial, yet so far you seem to be ignoring most of what it says, which puzzles me to no end. It speaks to something I've said before: if you write and act like a crackpot, and there is no evidence that you know or understand what you are saying, we can only conclude that you are a crackpot, even if you aren't, because that's what the evidence is pointing to.

That having been said, I have noticed that you haven't updated the ZIP file with your source code - and I'll address the matter of misusing Sourceforge shortly - which means that, even if you have made significant progress on your compiler (hopefully including fixing all of the problems I have previously mentioned), we can't see any of that, so we can only base our opinions of your current progress on a single archive file from a year ago.

On that topic, you seem to have misunderstood the point of version control repositories such as Sourceforge. While it is possible - and distressingly common - for projects to upload a single archive there for quick download, the real intent of a site such as SF or Github is to serve as a host for your VCS repo. We've discussed this topic at length on this forum, as well as in the wiki, but let me repeat the point: IF YOU DON'T USE A VCS, EVENTUALLY YOU WILL LOSE YOUR WORK, or worse, be unable to perform a regression on a hidden bug and have to scrap a whole section of existing code. While there are other ways to solve such problems, version control software is designed to facilitate this, and using it should be a no-brainer.

Uploading a single archive file with everything in it doesn't count as version control. You need to use something like Subversion, Git, or Mercurial, and use it consistently. You need to ensure that only the source and resource files get included in the repo, not the object files and executables. You need to have the individual source files visible to anyone browsing your repo, so they can review it (and maybe contribute bug fixes and additional code, should someone want to bother). What you have now is a flat-out misuse of Sourceforge.

Take a look at some other people's code repos, both on SF and on other hosts, and see how they do things correctly. On Sourceforge, a good example of how to use it with SVN is FreeDOS, which I expect you are at least passingly familiar with. For GIT and Github, try Mezzano OS. You should be able to see right away how they differ from what you are doing. Note that both of these hosting platforms have services which automatically pack the latest code release into either a ZIP or a tarball, while at the same time allowing access to individual files through the version control systems.

Please, for your own sake, try emulating their approaches for this and any other projects you're working on.

Author:  dseller [ Wed Nov 27, 2019 3:55 am ]
Post subject:  Re: C/C++ Compiler

Any compiler that doesn't use a lexer and a parser component is bound to be ridiculously complex and full of redundant code. I am really curious to see if this compiler will ever actually work, and produce a valid executable from any arbitrary piece of C code that it processes.

Also, this post:

Quote:
Defining Typed Variables for Immediate Access in Assembler


I am absolutely 100% not understanding what those assembly snippets/macros have to do with writing a C compiler :| Unless of course it has no real backend and it would emit assembler, which would be a proper design decision from ~. At least that would restrict his scope somewhat.

All times are UTC - 6 hours