OSDev.org

The Place to Start for Operating System Developers
It is currently Sat Dec 16, 2017 2:48 pm

All times are UTC - 6 hours




Post new topic Reply to topic  [ 28 posts ]  Go to page Previous  1, 2
 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 8:57 am 

Joined: Sun Oct 22, 2006 7:01 am
Posts: 2562
Location: Devon, UK
Wajideus wrote:
Everything you've said makes absolutely no sense at all. To begin with, there are a bunch of open-source C compilers out there. There's literally no reason to write one at all aside from personal ego.


I feel that I need to at least defend "~" on this point. After all, we are on a forum dedicated to hobby OS development for which you could make a similar argument.

I find the other criticisms in this thread to be pretty valid, however.

Cheers,
Adam


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 9:55 am 

Joined: Thu Nov 16, 2017 3:01 pm
Posts: 31
Quote:
I feel that I need to at least defend "~" on this point. After all, we are on a forum dedicated to hobby OS development for which you could make a similar argument.

I find the other criticisms in this thread to be pretty valid, however.


I would make the same argument to any people working on a hobby OS. In fact, I already have:

Wajideus wrote:
I honestly just don't see the point in creating Unix again. There are already enough Unixes. Lets quit rubbing our crotches for a second here, and look at where we've made mistakes so we can make improvements.
(From the TumuxOS Post)

It makes absolutely no sense to reimplement something for the umpteen-billionth time, especially when there are already open-source implementations of it available. If you're going to go out of your way to design a programming language, write a compiler, or make an OS, at least try to do something new and/or different.


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 11:03 am 

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1040
Location: Athens, GA, USA
AJ wrote:
Wajideus wrote:
Everything you've said makes absolutely no sense at all. To begin with, there are a bunch of open-source C compilers out there. There's literally no reason to write one at all aside from personal ego.


I feel that I need to at least defend "~" on this point. After all, we are on a forum dedicated to hobby OS development for which you could make a similar argument.


I don't think that Solar has any problem with that; he basically said so in the next paragraph, even if he was a bit dismissive (mostly regarding ~'s ability to accomplish the task, I suspect, rather than the idea itself).

I think that you missed the parts where ~ was saying that he was writing one because no suitable one exists, and when several examples were mentioned, he never acknowledged them or said what he thought 'suitable' meant. ~'s statements in this thread and others make it clear that he doesn't see it as just him climbing the mountain because it is there.

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
μή εἶναι βασιλικήν ἀτραπόν ἐπί γεωμετρίαν
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 12:17 pm 

Joined: Tue Mar 06, 2007 11:17 am
Posts: 1032
Solar wrote:
~ wrote:
I only answered Schol-R-LEA about which data structures I was going to use to parse the code.


Only, you didn't.

Schol-R-LEA explicitly asked you about the token data structures.
I will simply make a file called "elements.dat" that will contain a long at the start with the number of elements and the sequence of elements as they appear on the whole code, their type (start of line, end of line, preprocessor, keyword, identifier, number, operator, blank space block, comment block, string, open parenthesis, close parenthesis, backslash, asterisk...). I need a main loop with helper functions capable of recognizing every element with precedence (spaces, comments, preprocessor, strings, numbers, keywords, identifiers...). They have to record the start and end of each element, then a tree of IFs will call a specialized program only for each element to fully process it separated from the rest of compiler/language elements. The end of the processing of an element or set of elements results in default assembly to then write to the assembly to assemble with NASM. You will be able to see the on-file structure array to handle #includes because that's what I need to implement now.



Solar wrote:
Linking is also suspiciously absent from your deliberations.
This is why I made pure assembly skeletons for PE EXE and DLL. The compiler is supposed to be capable of producing assembly only, instead of producing object code to link, even for the executable file formats. In the meantime I can use the capabilities of NASM to produce Linux, DOS or Win16 binaries, but it's something that the compiler should do once we learn how. It's aimed to produce raw assembly code for raw binaries, so if I have executable skeletons or NASM, producing PE EXEs or the like will be doable.

____________________________________
Compilers could be made simple to compile by using only plain C, so that they can be built with any compiler available while still providing additional language support beyond that of the producing compiler.

The code from the latest GCC or other compilers could be inspected and made fully portable to any producing compiler for old OSes/machines, but it would need to be massively rewritten in plain C, and then modified further to support extensions from all the other compilers, to make a potent tool that doesn't fragment the language along compiler brands.

So we only have a few main choices:

- Modify existing compilers (practically the same as knowing how to develop one from scratch).

- Write a compiler from scratch gradually as we find existing good code to use/clean so it compiles anywhere.

- Maybe write a set of libraries private to GCC, based only on the most basic OS/system features, so that the latest GCC truly compiles anywhere. It's little more than a text processor, so it shouldn't be so difficult to make OS-independent.


In any case, if we manage to reimplement everything or modify the code toward old C, we will be doing the very same job of cleaning up existing software technology to make it freely accessible, because modern software, libraries and the modern C/C++/Java/JavaScript languages are currently only accessible on half-decade-old NT and UNIX systems.

The language standards don't move nearly as fast as the rest of software, so if we have a compiler written in plain C that recognizes all existing compiler extensions, we will break out of the trap of having to use only the latest OS releases just to port/run applications written in the newer language versions.

_________________
http://www.archefire.org/_PROJECTS_/ (udocproject@yahoo.com)

YouTube Development Videos:
http://www.youtube.com/user/AltComp126/videos

Current IP address for hosts file (all subdomains):
190.150.9.244 archefire.org


Last edited by ~ on Wed Nov 29, 2017 12:32 pm, edited 1 time in total.

 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 12:28 pm 

Joined: Sat Mar 31, 2012 3:07 am
Posts: 3027
Location: Chichester, UK
Quote:
It's aimed to produce raw assembly code for raw binaries
So, no ability to use libraries. That's going to be a little restrictive.


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 12:37 pm 

Joined: Tue Mar 06, 2007 11:17 am
Posts: 1032
iansjack wrote:
Quote:
It's aimed to produce raw assembly code for raw binaries
So, no ability to use libraries. That's going to be a little restrictive.
The code can use includes for libraries. The EXE skeletons or NASM can be provided with import data. It would be the same as building a project file/compiler makefile.

Producing assembly only makes it easy to link if desired.

The difference is that it wouldn't use additional IDE/suite layers beyond the actual file formats, library data, etc. It's clearer, at least for OS development, and is portable since it deals with knowledge of the raw formats, adding more library skeletons as more programs compile without errors.

_________________
http://www.archefire.org/_PROJECTS_/ (udocproject@yahoo.com)

YouTube Development Videos:
http://www.youtube.com/user/AltComp126/videos

Current IP address for hosts file (all subdomains):
190.150.9.244 archefire.org


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 1:12 pm 

Joined: Thu Nov 16, 2017 3:01 pm
Posts: 31
iansjack wrote:
Quote:
It's aimed to produce raw assembly code for raw binaries
So, no ability to use libraries. That's going to be a little restrictive.

I'm not sure what he's doing, but I had thought of doing this before in my compiler to segregate the assembly language from the object file format. To pull it off, I had planned on dumping the relocations, symbols, and load mapping to separate files. The relocation file would be a sort of binary diff format, the symbol file would be something like an ini, and the load mapping would be something like a linker script.

There are a couple of neat things you can do with this, like streaming assembly code to stdout and piping it into a virtual machine, or consolidating load maps into a table for a class loader (something I planned on doing for loading and unloading actors in a game engine).


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 2:21 pm 

Joined: Sat Mar 31, 2012 3:07 am
Posts: 3027
Location: Chichester, UK
I don't quite understand how you can use dynamic libraries if your output is raw binary. How does relocation work without relocation information?


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 5:54 pm 

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1040
Location: Athens, GA, USA
~ wrote:
Solar wrote:
Schol-R-LEA explicitly asked you about the token data structures.
I will simply make a file called "elements.dat" that will contain a long at the start with the number of elements and the sequence of elements as they appear on the whole code, their type (start of line, end of line, preprocessor, keyword, identifier, number, operator, blank space block, comment block, string, open parenthesis, close parenthesis, backslash, asterisk...).


OK, I can see that this is indeed a table of the tokens... sort of... but why are you putting it in a file? Unless there is a memory crunch - and generally speaking, even large programs won't eat 64KiB for their symbol tables, and even in real mode I think you can spare a whole data segment for something this important - there is no reason for a C compiler to save it to a file, and every reason for it to keep it as a tree in memory, unless you intend to have the lexer and the parser as separate programs with no sharing of memory.

Such compiler designs have existed in the past; the Microsoft Pascal and Fortran 77 compilers for MS-DOS, circa 1983, come to mind. But they were designed that way to accommodate the CP/M version of the compiler, and the design was retained for the first few MS-DOS versions to allow it to run on 64KiB IBM PCs; even by 1983, those were only a small fraction of PCs, with newer ones shipping with at least 256KiB and many IBM PC/XTs and Compaq Deskpros already hitting the 640KiB limit (in fact, memory-hungry programs such as Lotus 1-2-3 were already running into problems with that limit, and in 1984 both bank-switched Expanded Memory for 8088s and Extended Memory for 80286s were introduced to get around it).

It wasn't really necessary even before that, though. After Turbo Pascal came around in late 1983 - a single-pass, all-in-memory compiler that ran in 64KiB under both 8080 CP/M and 8088 MS-DOS, even including its simple full-screen text editor, and which blew the older multi-pass compilers away in terms of speed and useful error messages - that technique vanished even in the 8-bit world (where there were still plenty of Apple //es, IIcs, and Commodore 64s in use).

Why you think that bringing that approach back is a good idea isn't at all clear to me, unless you are actually talking about for the program listing rather than the symbol table, in which case my question becomes, why are you talking about that instead of the symbol table and the in-memory Token struct/class?

~ wrote:
I need a main loop with helper functions capable of recognizing every element with precedence (spaces, comments, preprocessor, strings, numbers, keywords, identifiers...).


In other words... the tokens. And yes, you would definitely need a set of helper functions for this - specifically, a set of functions that combine to form a lexical analyzer.

Also, you would usually handle precedences later, in the parser. More on this later.

~ wrote:
They have to record the start and end of each element, then a tree of IFs will call a specialized program only for each element to fully process it separated from the rest of compiler/language elements.


In other words, a lexical analyzer, specifically an ad-hoc lexer. This is indeed one of the things I was talking about, though I get the sense that you don't know all of the English names for these things, which may be part of the problem we are having. I can't tell whether this is due to a communication problem (from your README file on Archefire, I gather that your native tongue is Spanish, and I get the impression that your English isn't particularly strong - though if this is so, then I have to say your writing is still better than many native English speakers), or because you haven't read up on the existing techniques, or both, and I am willing to give you some benefit of the doubt on this.

~ wrote:
The end of the processing of an element or set of elements results in default assembly to then write to the assembly to assemble with NASM. You will be able to see the on-file structure array to handle #includes because that's what I need to implement now.


OK, now this is a worrying statement, because it sounds as if you are skipping a few steps. My impression is that you are combining three roles - the lexer, the parser, and the code generator - by doing substring matches on the input stream, and structuring the parser so that it is calling the matching function repeatedly against the input strings, walking through the set of possible matches, and then working from there until you collect the end of the expression (what in parsing is called a 'terminal' of the grammar), at which point you output a stream of one or more lines of assembly code.

It is entirely possible to do it this way - it is how the original versions of Small C did it - but you seem to be missing some details as to how you can make that approach work.

The approach in question is called 'recursive descent parsing', a type of top-down, left-to-right, leftmost-derivation (LL) parsing. It is an old, tried, and true method for writing a simple compiler by hand, and is the starting point for almost every compiler course and textbook that doesn't jump directly into using tools like flex and bison. It was developed in the early to mid 1960s, and was probably first investigated by Edsger Dijkstra some time before his 1961 paper on the topic; a number of others experimented with it around that time, and Tony Hoare seems to have been one of the first to write a complete compiler in that fashion, the Elliott ALGOL compiler.

In the early 1970s, Niklaus Wirth popularized it for use in the first formal compiler courses, as a method easier to use when writing a parser by hand than the earlier canonical LR parsing method developed in 1965 by Donald Knuth (canonical LR parsers, and bottom-up parsers in general, require large tables to represent the grammar, and are an unholy nightmare to develop entirely by hand, but they are much more efficient than R-D parsers and are well suited to automatic parser generation).

Recursive descent works pretty well... for small projects done by hand. It is where just about everyone studying compilers starts out, and I can't fault you for going that route... except that I am not sure you really understand it yet, as I get the impression that you still haven't read up on formal compiler design.

This is almost certainly a mistake. Lexical analysis and parsing are, far and away, the best understood topics in the entire field of computer programming, with the possible exception of relational algebra, and they have uses far beyond compilers and interpreters. Notice the dates I quoted - most of them are from over 50 years ago. These are topics that academic computer scientists and working programmers alike understand better than anything else, and the techniques for them are varied, effective, and solid.

If you don't at least try to learn more about the prior art before tackling writing an actual compiler, even a toy one, then you are doing yourself a disservice.

Maybe I am wrong, and you are simply having trouble expressing what you are doing in a foreign language. But you have to understand that we are trying to help you, trying to give you what we consider the best advice possible.

It's dangerous to go alone - take this! hands ~ a copy of the free PDF version of Compiler Construction by Wirth

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
μή εἶναι βασιλικήν ἀτραπόν ἐπί γεωμετρίαν
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.


Last edited by Schol-R-LEA on Wed Nov 29, 2017 11:22 pm, edited 9 times in total.

 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Wed Nov 29, 2017 10:23 pm 

Joined: Thu Nov 16, 2017 3:01 pm
Posts: 31
iansjack wrote:
I don't quite understand how you can use dynamic libraries if your output is raw binary. How does relocation work without relocation information?


There's still relocation information; it's just in a separate file. Basically, a relocation is just a tuple of a pointer into the raw assembly code and an algorithm for decoding and encoding the offset from a base address (usually the load address).


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Thu Nov 30, 2017 8:59 am 

Joined: Tue Mar 06, 2007 11:17 am
Posts: 1032
@iansjack, At least for PE DLLs, relocation is currently left to the OS via its internal paging. The only detail is that the DLL has a high base address that is fairly generic and common to other normal DLLs, so the OS has the chance to virtualize that address. I've tested it with my skeleton DLL/EXE, and it can load as many instances of the program as desired without failing.

For other formats like ELF I will have to use NASM at this point, so I have to learn the actual details of relocation so that I can generate them from the compiler.

You can inspect the base address used for the skeleton PE DLL and EXE here. The one from the DLL is much higher in this case:
http://sourceforge.net/projects/x86winapiasmpedllskeleton32/files/



@Schol-R-LEA, One reason to put structure arrays in files and load elements individually in memory is that I also want the compiler to serve as a "source code explainer", which can be able to display the list of all functions in order, the list of global variables, and solve them from their custom data types down to generic-only C types as well as down to their assembly code for the target platform. It will be very informative.

Another reason is that I suspect that there could be a point where the programs will be so big that I will have to free and reload certain source files to hold their identifiers in memory.

In any case, the compiler is full of wrapper functions that are opcode-like, so they can easily be rewritten internally if in-memory structures are used later. Besides, I will have to write the data to disk anyway to be able to study the pieces that the source code is using (list of files, identifiers, variables, functions...).

_________________
http://www.archefire.org/_PROJECTS_/ (udocproject@yahoo.com)

YouTube Development Videos:
http://www.youtube.com/user/AltComp126/videos

Current IP address for hosts file (all subdomains):
190.150.9.244 archefire.org


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Thu Nov 30, 2017 9:51 am 

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1040
Location: Athens, GA, USA
~ wrote:
@Schol-R-LEA, One reason to put structure arrays in files and load elements individually in memory is that I also want the compiler to serve as a "source code explainer", which can be able to display the list of all functions in order, the list of global variables, and solve them from their custom data types down to generic-only C types as well as down to their assembly code for the target platform. It will be very informative.


I still think you don't really get what I am trying to tell you. This approach simply isn't sufficient. Source code in a high-level language can't simply be matched to a string of opcodes - you actually need to parse it. I am not seeing any evidence so far that you even understand what that means, or that you need to start with at least a basic defined grammar, in some notation such as Backus-Naur Form or Railroad Diagrams, in order to know how to parse it.

In fact, this talk about 'structure arrays' makes me wonder if you even know the sorts of basic data structures you will need - this is something which really calls for a tree, as an array is simply too monolithic to be suitable (you can use one, but you'd waste a lot of time and effort doing so). At the very least, if you insist on using a linear data structure, a linked list would make more sense, as it is a lot better for handling data whose size isn't known ahead of time, and is a lot less likely to waste memory than a fixed-size array (or even a dynamic array).

My overall impression so far is that you have lemon juice all over your face, and don't realize that it isn't hiding anything.

But again, that impression on my part could be simply because you haven't made enough of your knowledge clear to us for me to judge it. I at least am aware that I am ignorant of how much you actually know. If you do know these things already, I would appreciate it if you demonstrated that knowledge, because so far, all you've shown is ignorance of a particularly pig-headed, stubborn, and utterly willful sort.

I am going to give some free advice again (relating to a PM conversation I had on this subject with someone else here, actually). Both MIT OpenCourseWare and Stanford Open Classroom have free video courses and e-texts on their websites that cover compiler theory and design, and while I will admit that I haven't gone over their courses, I intend to - I want to see which one I think is the better one so I can be more focused in my recommendations, and besides, the topics are deep enough that you can almost always learn something new from a different presentation of them.

If you don't feel comfortable taking such a course in English, you could try taking one in Spanish, or some other language which you feel more comfortable with (I don't know what languages you know other than Spanish and English). I would expect that there is at least one Spanish-language online course on the topic, and would guess that there are several around if you look.

But the point is, you don't seem to know enough about even the basics of this to proceed as things are. You need to Get Good, and for this topic, that means you need to study.

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
μή εἶναι βασιλικήν ἀτραπόν ἐπί γεωμετρίαν
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.


 Post subject: Re: Public Domain C/C++ Compiler
PostPosted: Fri Dec 01, 2017 1:09 pm 

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1040
Location: Athens, GA, USA
Sorry for the double post, but I didn't want to edit the previous one in case ~ had already read it.

I was re-reading some recent articles on The Daily WTF, and came across one which I think should serve as a warning to ~ about where his current approach appears to be going: Theory vs. Reality

TL;DR: in this case it is the lack of applying theory to reality that is the problem - specifically, not considering beforehand whether there might be an approach better suited to the task using an appropriate data structure - which is why I thought it apt.

@~: Again, I don't know if my impression is correct or not, but based on what you have said about how you are going about this project, you will soon find yourself drowning in ad-hoc code, just as the storyteller did, and at that point you will very likely end up having to scrap your current approach and reconsider it in much the same manner. All Solar and I are trying to do is save you the trouble.

Your choices are these:
  • Follow our advice by setting the code aside for a while; study up on the known solutions; choose the ones you feel would work best for your goals; plan out a design for the compiler; then get back to writing it only after all of that; or,
  • Try out your current approach, fail, then do everything I just said.

This isn't something where you can wing it and expect success. No matter how long reading up on the topic might take, trying to do it without reading up on it first will certainly take longer.

The only advantage to how you are doing it now is that, by failing at it, you will be forced to learn that lesson in a very forceful and painful manner, though at least you can expect the lesson to stick ("Experience keeps a dear school, but fools will learn in no other." - Benjamin Franklin). The choice, at this point, is yours.

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
μή εἶναι βασιλικήν ἀτραπόν ἐπί γεωμετρίαν
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.


Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group