embryo2 wrote:
alexfru wrote:
The problem of selecting the optimal jump/branch instructions (when you have a few forms of different range and therefore length) is actually an NP problem in the general case. So, every assembler either limits the number of labels and jumps/branches (since trying every possible option means "hanging" on large inputs) or at best produces suboptimal object code. And we haven't gotten anywhere near compilers yet. The battle is already lost in assemblers.
But what is the problem in the case of compilers? It is absolutely possible to include the jump optimization pass in a compiler. And even if this kind of optimization is also quite natural for assemblers, that doesn't mean we have to stop using it in compilers.
Let's go back a little to where you expressed discontent with the lack of a theory for arriving at the optimal solution:
embryo2 wrote:
It's craft and art mostly. There's no sound foundation like derivative-based optimization in mathematics, for example. A bunch of hints and many graph-processing algorithms. It's like many different bullets for different guns, but there's no theory for how to get an optimal bullet for the target. Trial and error, heuristics and other black magic.
It doesn't matter whether jump/branch optimization remains in an assembler (possibly a standalone assembler) or is integrated into a compiler. Moving the problem around doesn't change its nature. The point I'm making is that a compiler as a whole (including everything required to make an executable out of source code, e.g. the assembler, the linker, etc.) needs to solve problems which have no optimal solutions in practice. I'm talking about NP problems. And there are more of them than this one. So, inevitably you end up with suboptimal solutions. With heuristics, trial and error, black magic, craft, art, you name it. I don't see this as much different from some math problems having no closed-form solutions. You can still solve them, but for that you either simplify/redefine them or use numerical methods, both of which result in suboptimal or inexact solutions.
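To make the jump-sizing case concrete, here's a rough sketch (in C, with a made-up data layout, not any particular assembler's implementation) of the kind of fixed-point "relaxation" pass assemblers typically fall back on: encode every branch in its short form first, then widen any branch whose displacement no longer fits, and repeat until nothing changes. It's fast, but once alignment directives and similar constraints enter the picture it's exactly the sort of heuristic I mean.

Code:
/* Sketch of branch relaxation: start every branch short, widen whatever
 * stops fitting, iterate to a fixed point.  Data layout is hypothetical. */
#include <stdio.h>

enum { SHORT_LEN = 2, LONG_LEN = 5 };   /* e.g. x86 jmp rel8 vs jmp rel32 */

typedef struct {
    int offset;        /* index of the branch instruction in the item list */
    int target;        /* index of the item the branch jumps to */
    int len;           /* current encoded length: SHORT_LEN or LONG_LEN */
} Branch;

/* items[i] is the byte length of instruction i; branches[] refers into it */
static void relax(int *items, int nitems, Branch *branches, int nbranches)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 0; b < nbranches; b++) {
            if (branches[b].len == LONG_LEN)
                continue;                       /* never shrink back */
            /* address of each item = sum of lengths of the items before it */
            int addr = 0, from = 0, to = 0;
            for (int i = 0; i < nitems; i++) {
                if (i == branches[b].offset) from = addr + items[i];
                if (i == branches[b].target) to = addr;
                addr += items[i];
            }
            int disp = to - from;               /* rel8 is end-relative */
            if (disp < -128 || disp > 127) {    /* short form no longer reaches */
                branches[b].len = LONG_LEN;
                items[branches[b].offset] = LONG_LEN;
                changed = 1;                    /* sizes moved; iterate again */
            }
        }
    }
}

int main(void)
{
    /* toy program: a short branch over a block that grows past 127 bytes */
    int items[] = { SHORT_LEN, 60, 70, 1 };     /* branch, filler, filler, target */
    Branch branches[] = { { 0, 3, SHORT_LEN } };
    relax(items, 4, branches, 1);
    printf("branch encoded as %d-byte form\n", branches[0].len);
    return 0;
}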
Another important thing to note is that the optimal solution can only be obtained when we have defined the problem completely. IOW, a compiler that can produce perfect executables for a given target environment (hardware + OS + other software running at the same time) must know (pretty much?) everything about said environment. And then, to make use of that info, one would need to recompile the software of interest from source code. However, as end users we don't do that often. We get pre-compiled OSes, pre-compiled compilers, pre-compiled browsers, etc., and use those.
Oh, did I say that a compiler would also need to understand what it compiles?
But maybe we don't need optimal code all the time after all? Maybe we're OK with the 2x or 5x or whatever perf gains an optimizing compiler gives us, despite it never generating perfect code?