Public Domain C/C++ Compiler

Luca1 · **Joined:** Tue Apr 11, 2017 11:18 am **Posts:** 15

Schol-R-LEA wrote:

And to the shock and amazement of absolutely no one.

Thanks for making me laugh after having to facepalm. =D>

Schol-R-LEA · **Posted:** Tue Dec 26, 2017 7:38 pm

OK, I am taking a somewhat more detailed look at it now, and have some questions.

What is this supposed to mean?
Code:
//Comments with PHP-style verbosity for the full
//effects that each of them cause. It will make
//for a much better reference from the code itself
//and much easier to read and understand.
Why did you comment out some of the header inclusions, and insert their details into the file itself? Why are some not commented out, and more specifically, why did you add some of their function prototypes anyway?
Are you aware that <unistd.h>, <sys/stat.h>, and <sys/types.h> are Unix/POSIX specific, and in particular, that there is no requirement for a C library on a non-POSIX OS such as Windows to support functions such as stat() and mkdir()?
Why did you prefix every function in the file with OPCODE__Application_ or OPCODE__Compiler_, when the whole thing is a single file and there are no external declarations for them? Even from a namespace-control perspective, this makes no sense.
Why all of these specialized versions of standard operations, defined in terms of those standard operations but in a way that is less clear than the standard versions? This isn't what I was talking about when I mentioned buffer control! The whole point of that is to hide these details, not bring them front and center!!!!!
Code:
int OPCODE__Application_get_Executable_Path(char **Executable_Path);

FILE * OPCODE__Application_open_Local_ReadOnly_ASCII_7bit_Name_BinaryFile(char *file_name);

FILE * OPCODE__Application_open_Local_ReadWrite_ASCII_7bit_Name_BinaryFile(char *file_name);

int OPCODE__Application_read_C_long_BinaryFile(FILE *srcfilehandle, long *longvalue);

int OPCODE__Application_write_C_long_BinaryFile(FILE *srcfilehandle, long longvalue);

int OPCODE__Application_write_C_long_BinaryFile_Position(FILE *srcfilehandle, long longvalue, unsigned long filepos);

int OPCODE__Application_write_C_long_BinaryFile_PositionPreserve(FILE *srcfilehandle, long longvalue, unsigned long filepos);

int OPCODE__Application_read_C_unsigned_char_byte_BinaryFile(FILE *srcfilehandle, unsigned char *charvalue);

long OPCODE__Application_read_C_unsigned_char_buffer_BinaryFile(FILE *srcfilehandle, unsigned char **charvalue, long length);

long OPCODE__Application_write_C_unsigned_char_buffer_BinaryFile(FILE *srcfilehandle, unsigned char *charvalue, unsigned long length);
long OPCODE__Application_writeln_C_unsigned_char_buffer_BinaryFile(FILE *srcfilehandle, unsigned char *charvalue, unsigned long length);

long OPCODE__Application_write_C_unsigned_char_buffer_BinaryFile_PositionPreserve(FILE *srcfilehandle, unsigned char *charvalue, unsigned long length);
long OPCODE__Application_writeln_C_unsigned_char_buffer_BinaryFile_PositionPreserve(FILE *srcfilehandle, unsigned char *charvalue, unsigned long length);

long OPCODE__Application_get_Opened_File_Size(FILE *filehandle);

int OPCODE__Application_move_File_Position(FILE *srcfilehandle, long newabspos);

long OPCODE__Application_get_File_Position(FILE *srcfilehandle);

int OPCODE__Application__is_EOF(FILE *filehandle);

long OPCODE__Compiler_BinarySourceFile_ASCII_8bit_find_Nearest_NewlineMarkers(FILE *src, FILE *lines_dat_file);

void OPCODE__Compiler_copy_ASCIIZ_string(unsigned char *src, unsigned char **dest);

int OPCODE__Compiler_Print_ASCII_7bit_String(char *str);

int OPCODE__Compiler_Println_ASCII_7bit_String(char *str);

int OPCODE__Compiler_Print_ASCII_7bit_C_long_HexString(long value);

int OPCODE__Compiler_is_Version_String_Request(char *cmdlinearg);

void OPCODE__Compiler_Print_Version_to_Screen();

int OPCODE__Compiler_Create_build_Directory(char *str);

int OPCODE__Compiler_Open_Global_Compilation_Files();

int OPCODE__Compiler_define_ASCII_text_lines(FILE *lines_metadata_file, FILE *srcfilehandle);

int OPCODE__Compiler_Load_ASCII_Text_File(FILE *filehandle, FILE *line_data_file);

void OPCODE__Compiler_Unload_ASCII_Text_File(FILE *filehandle);

int OPCODE__Compiler_Switch_ASCII_Text_File(FILE *filehandle);

long OPCODE__Compiler_BinarySourceFile_ASCII_8bit_readline(FILE *src, FILE *lines_dat_file, unsigned long linenumber);

long OPCODE__Compiler_BinarySourceFile_ASCII_8bit_readline_char_size(FILE *src, FILE *lines_dat_file, unsigned long linenumber, unsigned long charpos, unsigned long size);

long OPCODE__Compiler_BinarySourceFile_ASCII_8bit_getLength_line_char(FILE *lines_dat_file, unsigned long linenumber, unsigned long charpos);

long OPCODE__Application_read_C_ASCIIZ_byte_string_BinaryFile(FILE *srcfilehandle, unsigned long length);

long OPCODE__Compiler_BinarySourceFile_ASCII_8bit_goto_line_char(FILE *src, FILE *lines_dat_file, unsigned long linenumber, unsigned long charpos);

long OPCODE__Compiler_BinarySourceFile_ASCII_8bit_get_line_char_from_fileOffset(FILE *src, FILE *lines_dat_file, unsigned long fileOffset, unsigned long *linenumber, unsigned long *charpos);

int OPCODE__Compiler_BinarySourceFile_ASCII_8bit_compare_string(FILE *src0, unsigned long file0ptrpos, FILE *src1, unsigned long file1ptrpos, unsigned long length, int movePointer);

int OPCODE__Compiler_ASCII_8bit_compare_string(unsigned char *str0, unsigned char *str1, unsigned long str0len);
Why is a compiler targeting an assembler even handling binary files at all?
Binary source files? WTF are you talking about with that?
Why specify 7-bit ASCII vs 8-bit ASCII, when there is no such thing as a standard 8-bit ASCII character set? What are you trying to say with those names? Is it to allow for UTF-8 (which is not ASCII, even if the ASCII character set is a subset of the standard UTF-8 code points), or some other 8-bit+ ASCII-derived encoding like Latin-1, and if so, why not call it that?
Why are all of your 'buffers' declared as pointers? Are you intending to dynamically allocate the buffers in every single case, and if so, why, and how do you intend to determine the sizes for the buffers before you have read them? The usual practice is to allocate a large-ish buffer (say, 1024 bytes), fill that, and process that buffer's worth of data; you would then re-load that buffer as needed, rather than trying to find the file size ahead of time. Why isn't that adequate for your needs, when the alternative you seem to want involves filesystem-specific details that the standard C library can't access?
Why are you hardcoding file paths, of all things? Seriously, file paths????
Reading the source file line by line... don't. Just don't. Seriously, if you can't figure that much out yet, you have no business writing code in C, never mind writing a C compiler in C.
This, just... THIS!
Code:
//General use variables:
///
//General use variables:
///
unsigned char *COMPILER__fileBuf0000=NULL;
long COMPILER__fileBuf0000_logical_file_offset=0;
long COMPILER__fileBuf0000_file_buffer_offset=0;
long COMPILER__fileBuf0000_charbuflen=0;
long COMPILER__fileBuf0000_origfilepos0=0;
long COMPILER__fileBuf0000_newlineMarkerSZ=0;
long COMPILER__fileBuf0000_bufferOverrun=0;

Appalling naming 'convention' aside, it never occurred to you to bundle these into a struct?
Why are you using a long for the newline marker size - hell, why is it there at all, rather than, say, a null-delimited string of the newline for this system? Actually, let me re-phrase that as: WHY THE HECK WOULD THE COMPILER NEED TO EVEN KNOW WHAT THE NEWLINE IS, IN A LANGUAGE WHICH TREATS ALL WHITESPACE IDENTICALLY (not counting line comments and preprocessor operations)? You don't need to know the size of the newline, because no matter what else, it is going to be either a whitespace character, or a group of two or more whitespace characters - and any sequence of one or more whitespace characters is treated as a single space. Period, end of subject.
You might object to this in saying that you need to know how many lines the program has. This, however, is only true for the purposes of producing a program listing and error messages - the compiler itself doesn't need that information at all. For those things that do need it? The tokenizer should handle it. THAT'S WHY YOU HAVE ONE. For most of the things in the compiler, the exact text of the token isn't important, and for those things where it does matter, such as literals, it should be handled in the symbol table.
What in Eris' name is the 'bufferOverrun' counter for, and why would it need to be a long? Are you expecting to be compiling source files larger than 4 GiB, and if so... no, I won't even finish that sentence, I honestly don't want to know.
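Here is a minimal sketch of two of the suggestions above: bundle the buffer state into a struct, and refill a fixed-size buffer as needed instead of sizing it from the file up front. The names `SourceReader`, `reader_init`, `reader_next` and `reader_skip_space` are hypothetical, and 1024 bytes is just the example size from the text. The line counter lives in the reader, which is the only place that needs it. Everywhere else, a newline is simply whitespace.

```c
#include <ctype.h>
#include <stdio.h>

#define READER_BUF_SIZE 1024   /* example size from the discussion */

/* All of the buffer bookkeeping in one place, instead of a family of
   loose globals. (Hypothetical type, not from the project.) */
typedef struct {
    FILE *file;                          /* source file being read */
    unsigned char buf[READER_BUF_SIZE];  /* fixed buffer, refilled on demand */
    size_t len;                          /* valid bytes currently in buf */
    size_t pos;                          /* index of the next unread byte */
    unsigned long line;                  /* current line, for diagnostics only */
} SourceReader;

void reader_init(SourceReader *r, FILE *file)
{
    r->file = file;
    r->len = 0;
    r->pos = 0;
    r->line = 1;
}

/* Returns the next byte of the source, or EOF at end of input. The file
   size is never needed: when the buffer runs dry it is simply refilled. */
int reader_next(SourceReader *r)
{
    if (r->pos == r->len) {
        r->len = fread(r->buf, 1, sizeof r->buf, r->file);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    int c = r->buf[r->pos++];
    if (c == '\n')          /* "\r\n" also ends in '\n', so it counts once */
        r->line++;
    return c;
}

/* Consumes a run of whitespace (newlines included) and returns the first
   non-whitespace byte, or EOF. To the tokenizer a newline is just another
   whitespace character. */
int reader_skip_space(SourceReader *r)
{
    int c;
    do {
        c = reader_next(r);
    } while (c != EOF && isspace(c));
    return c;
}
```

A tokenizer would call `reader_skip_space()` before each token and copy `r->line` into the token for error messages. Nothing else in the compiler needs to know how the platform spells a newline.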

And this is still only scratching the surface. I am genuinely at a loss as to what you think you are doing with this, or what you think any of this garbage has to do with compiling C code.

zaval · **Posted:** Wed Dec 27, 2017 3:36 pm

it gets ugly, you all. leave him alone. you complainers behave as if he were forcing you to use his compiler, or as if you had nothing better to do than mock another member with your unstoppable stupid sneering. does anybody really need this? all these "omg", "wtf", "holy guacamole", "I was laughing so hard", "unobtrusively" suggest you see yourselves as f&cking supercoders or whatever supersomething, which you are not. everybody here is free to talk about his/her work no matter how stupid it looks to some, and it is absolutely unnecessary to throw sh!t on that person just because you are bored. and as for criticism, if you say something, then it should at least be really helpful and objective, as Schol-R-LEA did (I hope), and if you have nothing like that to say and all you can offer is that garbage, it's better to STFU.

Wajideus · **Joined:** Thu Nov 16, 2017 3:01 pm **Posts:** 47

@Schol-R-LEA
That wall of text made me want to gouge my eyes out...

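For someone who wants to make a C compiler, he seems unaware that the C standard only guarantees that the first 31 characters of an external identifier are significant (63 for other identifiers, as of C99), and that under C89 the guarantee for external identifiers was a much lower 6 characters.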

Now, I'm not going to argue whether verbose or terse identifiers are better. There are good arguments for both. However, C in particular doesn't lend itself well to long identifier names. It has no way of implementing namespacing, aside from using structs with function pointers or macro-hackery like this:

Code:

#ifndef LIB_H
#define LIB_H 1

extern int lib_var;

extern void lib_func();

#ifdef USING_LIB
#define var lib_var
#define func lib_func
#endif

#endif

Code:

#define USING_LIB

#include "lib.h"

...

And this shouldn't be considered a solution at all.
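The other option mentioned above is a struct of function pointers acting as a namespace. It might look like the following sketch. `lib_api`, `lib`, `lib_add` and `lib_negate` are made-up names for illustration.

```c
/* Private implementations: 'static' keeps these names out of the
   global namespace entirely. */
static int lib_add(int a, int b) { return a + b; }
static int lib_negate(int a) { return -a; }

/* The "namespace": a single exported object whose members form the
   public API. In a real project, the struct type and an
   'extern const struct lib_api lib;' declaration would go in lib.h. */
struct lib_api {
    int (*add)(int, int);
    int (*negate)(int);
};

const struct lib_api lib = {
    .add = lib_add,
    .negate = lib_negate,
};
```

Callers write `lib.add(2, 3)`. The cost is an indirect call through a pointer, which the compiler may or may not optimize away. It also does nothing for types and macros, which still need prefixes.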

My usual trick for maintaining C code is to treat every source file as if it's a class. All variables and functions that should be private are given the 'static' attribute, and each of these 'classes' has its own header to expose the public members.
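Here is a minimal sketch of that convention, using a hypothetical `counter` module. The header part and the implementation part are shown in one block so it compiles on its own. In practice they would be `counter.h` and `counter.c`.

```c
/* ---- counter.h: the public interface ---- */
#ifndef COUNTER_H
#define COUNTER_H
void counter_increment(void);
int counter_value(void);
void counter_reset(void);
#endif

/* ---- counter.c: the implementation ---- */
#define COUNTER_MAX 3   /* small limit, for illustration */

/* Private state: 'static' gives it internal linkage, so no other
   source file can see or clash with it. */
static int count = 0;

/* Private helper, equally invisible outside this file. */
static int at_limit(void) { return count >= COUNTER_MAX; }

/* Public members, declared in counter.h. */
void counter_increment(void) { if (!at_limit()) count++; }
int counter_value(void) { return count; }
void counter_reset(void) { count = 0; }
```

Other files include `counter.h` and can only call the three public functions. Because `count` and `at_limit` have internal linkage, they cannot clash with names in any other file, and no long prefixes are needed.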

When writing a parser, I typically start by writing out the grammar rules for the language in something like EBNF notation:

Code:

assignment : identifier, "=", expression ;
identifier : /[A-Z_a-z][0-9A-Z_a-z]*/ ;
...

and then for each rule, I create a data structure and a function that matches that exact rule and nothing else. In pseudo-code, it's something like this:

Code:

#include <stdbool.h>

// Identifier, Expression, Token, TOKEN_EQUALS, Context, getIdentifier(),
// getToken() and getExpression() are assumed to be defined elsewhere.

typedef struct TreeNode {
    struct TreeNode *nextSibling;   // next node at the same level
    struct TreeNode *firstChild;    // first node one level down
} TreeNode;

typedef struct {
    TreeNode treeNode;
    Identifier identifier;
    Expression expression;
} Assignment;

bool getAssignment(Assignment *assignment, Context *context) {
    Token token;

    // expect identifier
    if (!getIdentifier(&assignment->identifier, context))
        return false;

    // expect "="
    if (!getToken(&token, context) || token.type != TOKEN_EQUALS)
        return false;

    // expect expression
    if (!getExpression(&assignment->expression, context))
        return false;

    return true;
}

I left out the memory management details for clarity, but it's a quick and dirty way to build a syntax tree. That being said, I only use this technique for prototyping or writing simple tools to get the job done because it utilizes recursion. If it's for production code, I typically go to my dry-erase board and plan out a state machine which I implement in a single function.
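For comparison, here is a sketch of the "state machine in a single function" approach for a deliberately simplified version of the assignment rule. This is not Wajideus's code. `match_assignment` and the state names are hypothetical, and the function only recognizes input; it builds no tree. The expression part is restricted to `term { "+" term }` so that no stack is needed. Full C expressions with parentheses would need an explicit stack (a pushdown automaton) instead.

```c
#include <ctype.h>
#include <stdbool.h>

/* Recognizer for:  assignment : identifier "=" term { "+" term } ";"
   where term is an identifier or a decimal integer. This rule is
   regular (no nesting), so a finite state machine is enough. */
typedef enum {
    ST_START, ST_LHS, ST_EQUALS, ST_TERM, ST_TERM_IDENT, ST_TERM_NUM,
    ST_AFTER_TERM, ST_DONE
} State;

static bool is_ident_start(int c) { return isalpha(c) || c == '_'; }
static bool is_ident_char(int c) { return isalnum(c) || c == '_'; }

bool match_assignment(const char *s)
{
    State st = ST_START;
    for (;; s++) {
        int c = (unsigned char)*s;
        switch (st) {
        case ST_START:          /* leading whitespace, then the target */
            if (isspace(c)) break;
            if (!is_ident_start(c)) return false;
            st = ST_LHS;
            break;
        case ST_LHS:            /* rest of the target identifier */
            if (is_ident_char(c)) break;
            if (isspace(c)) { st = ST_EQUALS; break; }
            if (c == '=') { st = ST_TERM; break; }
            return false;
        case ST_EQUALS:         /* whitespace before '=' */
            if (isspace(c)) break;
            if (c != '=') return false;
            st = ST_TERM;
            break;
        case ST_TERM:           /* start of a term */
            if (isspace(c)) break;
            if (is_ident_start(c)) st = ST_TERM_IDENT;
            else if (isdigit(c)) st = ST_TERM_NUM;
            else return false;
            break;
        case ST_TERM_IDENT:
        case ST_TERM_NUM:
            if ((st == ST_TERM_IDENT && is_ident_char(c)) ||
                (st == ST_TERM_NUM && isdigit(c)))
                break;
            st = ST_AFTER_TERM; /* term ended; re-examine c below */
            /* fall through */
        case ST_AFTER_TERM:     /* expect '+' (another term) or ';' */
            if (isspace(c)) break;
            if (c == '+') { st = ST_TERM; break; }
            if (c == ';') { st = ST_DONE; break; }
            return false;
        case ST_DONE:           /* only trailing whitespace may follow */
            if (c == '\0') return true;
            if (!isspace(c)) return false;
            break;
        }
        if (c == '\0') return false;   /* safety net: never read past the end */
    }
}
```

The whole rule is now one loop over characters with no recursion. The price is that the grammar is no longer visible in the code, which is presumably why the post mentions planning it on a whiteboard first.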

From here, walk the syntax tree and keep track of temporary variables to generate SSA-form IR code. The IR code gets shovelled into the optimizer, which generates the assembly code. If you know what you're doing, writing a compiler isn't that hard. Writing an optimizer is what's hard.
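What keeps the output in SSA form is handing out a fresh temporary for every interior node of the tree, so that each temporary is assigned exactly once. The sketch below covers straight-line expressions only. The names (`Expr`, `gen_ir`) and the textual IR format are hypothetical. Real SSA construction also needs phi functions wherever control flow merges, which this does not attempt.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal expression tree: leaves are variable names,
   interior nodes are binary operators. */
typedef struct Expr {
    char op;                        /* '+', '-', '*', '/', or 0 for a leaf */
    const char *name;               /* variable name when op == 0 */
    const struct Expr *lhs, *rhs;   /* operands when op != 0 */
} Expr;

typedef struct {
    char *out;       /* output buffer for the textual IR */
    size_t cap;      /* its capacity (must be at least 1) */
    size_t len;      /* bytes written so far */
    int next_temp;   /* counter for fresh temporaries t1, t2, ... */
} Gen;

/* Post-order walk: generate both operands, then one instruction that
   defines a brand-new temporary. Every temporary is assigned exactly
   once, so straight-line code produced this way is already in SSA form.
   The name holding the value of 'e' is written to 'result'. */
static void gen_expr(Gen *g, const Expr *e, char *result, size_t rsize)
{
    if (e->op == 0) {               /* leaf: the value is the variable itself */
        snprintf(result, rsize, "%s", e->name);
        return;
    }
    char a[32], b[32];              /* names longer than 31 chars get truncated */
    gen_expr(g, e->lhs, a, sizeof a);
    gen_expr(g, e->rhs, b, sizeof b);
    snprintf(result, rsize, "t%d", ++g->next_temp);

    size_t room = g->cap - g->len;
    int n = snprintf(g->out + g->len, room, "%s = %s %c %s\n", result, a, e->op, b);
    if (n < 0 || (size_t)n >= room)
        g->len = g->cap - 1;        /* buffer full: drop further output */
    else
        g->len += (size_t)n;
}

/* Generates IR for 'e' into 'buf' and returns the length of the IR text. */
size_t gen_ir(const Expr *e, char *buf, size_t cap, char *result, size_t rsize)
{
    Gen g = { buf, cap, 0, 0 };
    buf[0] = '\0';
    gen_expr(&g, e, result, rsize);
    return strlen(buf);
}
```

For `(a + b) * (a - c)` this emits `t1 = a + b`, `t2 = a - c`, `t3 = t1 * t2`. The final value ends up in `t3`.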

Schol-R-LEA · **Posted:** Fri Dec 29, 2017 8:03 am

I will admit that I was a bit more... upset than I meant to be, and that I did veer into hysteria and mockery in places. My intention was to review the code and make recommendations, or at least ask for clarification about the parts I couldn't follow, but... well, let's just say that I was rather put off by it.

A lot of what has been worrying me is that ~ doesn't seem to want to listen to advice. The code demonstrates not so much incompetence as ignorance, and it is an ignorance for which there is no real justification - these are well-studied topics, and we have given ~ many pointers towards reference material about it.

While there is always room for new ideas, this is one place where you would really need to know the existing methods first before trying to innovate. It is really unlikely for anyone to just stumble into a better solution in such a thoroughly explored problem space.

Also, the problem itself is inherently difficult, and is one to which programmers applied themselves for two decades before solid foundations were laid for it. Just because that work was done 50-60 years ago, by the first two generations of programmers, doesn't mean it is something one can single-handedly reinvent today. It was solved early on, yes, but it was solved then because it had to be solved before almost anything else in computer science could really progress.

Tilde isn't just turning his back on compiler technology by ignoring the foundations, he is turning his back on the heart of computer science itself - material without which the entire field is lost. This isn't just the basis for modern compilers; it is the basis for more than half of the things done in programming today, period.

While programmers rarely apply it directly, they can get away with that precisely because modern language theory, and the things it spawned such as finite state automata, lexer generators, and Backus-Naur Form grammar descriptions, has been so wildly successful. Many things used every day, from web browsers to RDBMS engines to command-line shell interpreters to 'intelligent' editors to TCP stacks, rely on these foundations in ways that are often surprising to those who never looked underneath the layers of abstraction that shield them from the details. It is possible for most programmers to get by without understanding them, in the same way it is possible to live one's entire life without knowing Newton's Laws of Motion, but a compiler writer can no more ignore them than a physicist could ignore F = ma.

Wajideus · **Joined:** Thu Nov 16, 2017 3:01 pm **Posts:** 47

I'd argue that part of the reason that most software these days is so bloated and slow despite processors being millions of times faster has to do with the fact that fewer and fewer people understand those core concepts of software engineering. We keep wanting to create higher and higher levels of abstraction that don't really solve many problems rather than teaching fundamental skills for designing, understanding, and maintaining complex systems.

Schol-R-LEA · **Posted:** Mon Jan 01, 2018 11:41 pm

I don't know when ~ will be back again, but I did find this Spanish-language ebook on compiler design which he might want to look at: Compiladores. Teoría e implementación by Jacinto Ruíz Catalán.

Solar · **Posted:** Tue Jan 02, 2018 1:27 am

zaval wrote:

it gets ugly, you all.

~ keeps advertising his project, and not only in this thread. Among the (well-earned) derision there are lots of helpful remarks, pointers, etc.; these have always been there, while the derision has grown over time.

None of us is to be any judge of what ~ does with his spare time. But as soon as you use a public forum to advertise your ideas, projects etc., you have to live with people replying to it based on its merits. The problem is that ~ keeps coming back for more, without any hint of him having even read what we all suggested to him.

All replies are very much based in what ~ is showcasing.

Keeping quiet about it is, effectively, endorsement.

Schol-R-LEA · **Posted:** Sat Jan 13, 2018 12:41 pm

@~: You haven't responded to what Solar and I have been saying, which makes it really hard for us to offer advice or feedback of any kind. Assuming you haven't simply blocked the two of us, would you mind answering a few questions:

Have you taken a look at any of the resources I and the others here have recommended, and if not, why not?
If you have, what, if anything, did you take away from them? Do you have any questions about them, or any problems with them, that we might be able to offer advice on?

We really are trying to help you when we suggest you read things like the Dragon book. It is simply foolish not to at least try to use the resources that are available, don't you think?

MichaelFarthing · **Posted:** Sat Jan 13, 2018 3:42 pm

Solar wrote:

Keeping quiet about it is, effectively, endorsement.

On the contrary, keeping quiet shows a total lack of interest. Much more effective.

bluemoon · **Posted:** Sun Jan 14, 2018 1:28 am

Good luck dealing with user-defined literals using that naive parser.

glauxosdever · **Posted:** Sun Jan 14, 2018 8:06 am

Hi,

MichaelFarthing wrote:

Solar wrote:

Keeping quiet about it is, effectively, endorsement.

On the contrary, keeping quiet shows a total lack of interest. Much more effective.

Definitely. I don't fight with ~ anymore, since it's been proven it accomplishes nothing. I instead went and reported yesterday some of his posts in the most recent topics to bring moderator attention. Furthermore, I don't think people joking about ~ or about hooking a bot into ~'s account or something is ok either, since it makes us look hostile (in the eyes of those who don't know about ~). I reported some such posts too yesterday.

Effectively, what we have here is an amateur but stubborn developer (despite having been registered here for around 11 years) who refuses any advice (or maybe doesn't understand it; I don't know and I don't want to know, since it's not my business). I don't know whether banning him or educating him would be better overall. Banning him would benefit the forums but not ~, while educating him might benefit ~ and would distract the forums temporarily, until ~ becomes less stubborn and more professional. It's up to the moderators to decide, however, and I'd say we should give him a final chance. Either way, the current situation shouldn't continue, i.e. ~ spreading misinformation and refusing advice, while others get into fights with ~. This benefits absolutely no one: ~ doesn't seem to improve (or stop and rethink whatever he is doing), while others get angry at ~ and spend time fighting with him.

---

Now, I acknowledge ~ has some principles, i.e. providing public domain knowledge/code for others to learn from. But to be useful, such code has to be good and has to reflect good programming practices. So, ~, please take time to actually let others teach you before attempting to teach others yourself. Of course, not everyone is a good teacher (see my university C professors in the other thread), but the majority of people here are good teachers and, whenever they happen to be mistaken, they allow themselves to be corrected (unlike you).

Recently, seeing how badly C is taught at my university, I really wanted to write a C book that actually describes at least the most common cases of undefined behaviour (UB) and implementation-defined behaviour (IB), explains the differences between C89, C99 and C11, emphasises good programming practices, etc. But I realised I really am not yet good enough in C to do so, so I postponed this plan. You probably should do the same with your plans for now; you really don't seem to understand enough to implement a good <insert software here>, let alone teach others how to implement <insert software here>.

I made this mistake twice: when I started compilerdev.org along with another member from here and when I started cpudev.org.

In the case of compilerdev.org, which lasted no more than a couple of months, we the founders didn't have much to say, and we let Schol-R-LEA make more than 90% of the contributions. I doubt you would allow that with your current attitude. Sadly, the co-founder messed up the server and everything disappeared (I really hope Schol-R-LEA isn't angry). Schol-R-LEA: I think I have some backups in case you want them.
In the case of cpudev.org, which has lasted for almost 5 months now and still exists (although mostly inactive), a group of 3 or 4 people, including me, decided to make a #cpudev IRC channel. Then I founded a wiki, aiming to provide some quick-start tutorials (like Bare Bones on our wiki), with the intention of turning it into a full wiki with help from other contributors. You, however, seem to refuse help unless the contributors agree with you (and no one will agree with your bad programming practices).

In both cases, I can see in retrospect it was a bad idea. Just because I'm interested in something doesn't mean I can really teach it. So, if you haven't learned from your mistakes yet, at least learn from mine.

---

I sincerely expect ~ to acknowledge his weaknesses and try to fix them. I also expect the rest of us not to resort to such behaviour that benefits no one.

Regards,
glauxosdever

Schol-R-LEA · **Posted:** Sun Jan 14, 2018 8:31 am

glauxosdever wrote:

Definitely. I don't fight with ~ anymore, since it's been proven it accomplishes nothing. I instead went and reported yesterday some of his posts in the most recent topics to bring moderator attention. Furthermore, I don't think people joking about ~ or about hooking a bot into ~'s account or something is ok either, since it makes us look hostile (in the eyes of those who don't know about ~).

OK, I will admit that the snipe I made about the post being made by a chatbot was in poor taste. I wasn't seriously suggesting that someone had done that; I was simply commenting on how impenetrable ~'s post was, and comparing him to another poster (Trident), who was known for posts that were so hard to follow that they seemed to have come out of a Markov Chain text generator (we'd joked about that regarding him, too, back in the day).

I am still willing to help ~ out, but so far ~ hasn't shown any willingness to listen to advice. It is frustrating to see someone who seems to have good intentions and different ideas about things (even if I disagree with them), but who is so stubborn about how they try to accomplish things.

Also, no hard feelings about the way compilerdev.org turned out. I would indeed be interested in the backups. Please contact me about them in PM.

~ · **Joined:** Tue Mar 06, 2007 11:17 am **Posts:** 1225

I've managed to implement more code for the main compiler loop. It actually has 3 loops (preprocessor, global declarations and local/function body declarations). Each loop has an outermost loop (mostly to stop each stage loop and to change/reopen files as directed by #includes and end of files), and an innermost loop, where the whole syntax tree will be, working as just a driver that calls the actual syntax OPCODE routines capable of interpreting/configuring/gathering/generating assembly code.

So I need to start by writing code to record an index of the included source files, along with the line at which we need to switch to another source file from that index, and the point where each file was last left. At this point, only ideas for the actual code, and writing or explaining code, will help. I already have the idea of how to proceed, so I will simply add that code. By February the loops with the syntax tree skeletons should be well implemented, and I will then complete and extend them towards the end of the year.
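One common way to handle this is a stack of open files rather than an index of lines. When an `#include` is seen, push the current file and the position to resume at, process the included file, and pop at its end. Below is a minimal sketch under that assumption. `IncludeStack`, `include_push` and `include_pop` are hypothetical names, and the depth and path limits are arbitrary. A real preprocessor might instead keep each `FILE *` open in its frame, rather than re-opening the file and seeking back to the saved offset.

```c
#include <string.h>

#define MAX_INCLUDE_DEPTH 16   /* arbitrary limit for this sketch */
#define MAX_PATH_LEN 256       /* arbitrary limit for this sketch */

/* Where to resume a file once the file it included has been processed. */
typedef struct {
    char path[MAX_PATH_LEN];   /* file that contained the #include */
    unsigned long line;        /* line to resume at (for diagnostics) */
    long offset;               /* byte offset to resume at, e.g. from ftell() */
} IncludeFrame;

typedef struct {
    IncludeFrame frames[MAX_INCLUDE_DEPTH];
    int depth;                 /* number of suspended files */
} IncludeStack;

/* Called on '#include': remember where the current file was left.
   Returns 0 on success, -1 if nesting is too deep or the path too long. */
int include_push(IncludeStack *s, const char *path, unsigned long line, long offset)
{
    if (s->depth == MAX_INCLUDE_DEPTH || strlen(path) >= MAX_PATH_LEN)
        return -1;
    IncludeFrame *f = &s->frames[s->depth++];
    strcpy(f->path, path);
    f->line = line;
    f->offset = offset;
    return 0;
}

/* Called at the end of an included file: fetch the position to resume at.
   Returns -1 when the stack is empty, i.e. the main file has ended. */
int include_pop(IncludeStack *s, IncludeFrame *out)
{
    if (s->depth == 0)
        return -1;
    *out = s->frames[--s->depth];
    return 0;
}
```

Nested includes fall out naturally, because the most recently suspended file is always the one to resume.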

What I need to do next is writing a function that lets me see whether the first non-blank character in the current line is '#', and if so, see if it's an include directive, and then see if the file is enclosed in <> or "" to search for the included file in the corresponding compiler "include" or current source directories.
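Preprocessing directives are the one part of C that really is line-oriented, so this check can work on one line at a time. The sketch below assumes comments have already been replaced and backslash-newline continuations already joined, as the C translation phases require before directives are processed. `parse_include` and `IncludeKind` are hypothetical names, and the macro-expanded form `#include MACRO` is not handled.

```c
#include <ctype.h>
#include <string.h>

typedef enum {
    LINE_NOT_INCLUDE,    /* not an #include directive at all */
    INCLUDE_SYSTEM,      /* #include <...>: search the compiler's include dirs */
    INCLUDE_LOCAL,       /* #include "...": search the source's directory first */
    INCLUDE_MALFORMED    /* looks like #include but could not be parsed */
} IncludeKind;

/* If 'line' is an #include directive, copies the header name (without
   its delimiters) into 'name' and reports which form was used. */
IncludeKind parse_include(const char *line, char *name, size_t name_size)
{
    const char *p = line;
    while (*p == ' ' || *p == '\t') p++;     /* first non-blank character */
    if (*p != '#') return LINE_NOT_INCLUDE;
    p++;
    while (*p == ' ' || *p == '\t') p++;     /* "#  include" is legal too */
    if (strncmp(p, "include", 7) != 0 ||
        isalnum((unsigned char)p[7]) || p[7] == '_')
        return LINE_NOT_INCLUDE;             /* e.g. #define, #included */
    p += 7;
    while (*p == ' ' || *p == '\t') p++;

    char closer;
    IncludeKind kind;
    if (*p == '<') { closer = '>'; kind = INCLUDE_SYSTEM; }
    else if (*p == '"') { closer = '"'; kind = INCLUDE_LOCAL; }
    else return INCLUDE_MALFORMED;           /* includes #include MACRO */
    p++;

    const char *end = strchr(p, closer);
    if (end == NULL || end == p || (size_t)(end - p) >= name_size)
        return INCLUDE_MALFORMED;            /* unterminated, empty or too long */
    memcpy(name, p, (size_t)(end - p));
    name[end - p] = '\0';
    return kind;
}
```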

@Schol-R-LEA, you would probably thrive with cpudev.org if you dedicate this year entirely to a key topic on it, to be implemented as a practical thing to explain and use.

For example, I could suggest that you investigate personally how computers get to read ROM/RAM, and provide working schematics for ROM readers. For example, I find that modern computing at a level understandable by everyone starts with the Atari 2600: it has RAM and ROM, and is simple and capable. So figuring out how the cartridges are read and building a working ROM dumper would be extremely valuable for learning how to access ever more complex RAM/ROM in other architectures, but this one is so simple that anyone will be able to build it. Nobody said that you cannot sell a book about that and then put it in cpudev.org, or make YouTube videos about it and earn from that with Adsense, or both (books and YouTube/Adsense).

I've been investigating and building a socket for reading the 8 data lines and driving the address lines (11 or 12...) in the Atari 2600 cartridge, but I need to figure out how to do that using only parallel/serial ports, or looking for another way to build a simple cartridge dumper that can dump directly to the PC, simple in the sense that it's possible to see how the dumper is implemented from scratch to later learn how to build similar ROM memory controllers, not just using Arduino with prebuilt components since it seems that an Atari 2600 ROM dumper can be made with much less electronically.

iansjack · **Posted:** Sun Jan 14, 2018 11:24 am

~ wrote:

What I need to do next is writing a function that lets me see whether the first non-blank character in the current line is '#', and if so, see if it's an include directive, and then see if the file is enclosed in <> or "" to search for the included file in the corresponding compiler "include" or current source directories.

Well, that's a good five minutes' work. You could have done that instead of making your last post.

OSDev.org
