Opinions On A New Programming Language

Wajideus · **Joined:** Thu Nov 16, 2017 3:01 pm **Posts:** 47

I've been toying with a new programming language inspired by C, D, Swift, Lisp, and Golang. It's not really intended to be a new competing language; I just couldn't find one that fit the scope of the problems I was interested in. Below is a snippet of the syntax:

Code:

// type definition
typedef tval = struct {
    type: int;
    value: unif {
        i: int;
        f: float;
    }
}

// equivalent to `const foo = struct { ... }`
struct foo {
    elem1: cstr = "bar";
    elem2 = "baz";       // type inference
}

// equivalent to `const main = func(argc: int, argv: [cstr]) -> int`
func main(argc: int, argv: [cstr]) -> int {

    var x = 5;

    var y: *int = &x;            // '*int' is equivalent to 'ptr -> int'
    var z: int32ptr -> int = y;  // custom sized relative pointers

    // array slicing and iteration
    for argv[1..argc] -> arg {
        puts(arg);
    }

    return 0;
}

/* A '$' defines a pattern, which gets substituted.
 * In this case, we're substituting types. */
func add(val1: $T, val2: T) -> T {
    return val1 + val2;
}

add(5, 3);    // add(val1: int, val2: int) -> int


// A '?' in the place of the return type is shorthand for '-> bool'
func valid(x: int)? {
    return x > 0 && x < 10;
}
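
For readers mapping this back to C, here is a rough C rendering of the `tval` tagged union and the `valid` predicate. It is a sketch under assumptions: the tag values `TVAL_INT`/`TVAL_FLOAT` and the helper `tval_as_double` are hypothetical, because the snippet only says `type: int`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tag values: the post only says `type: int`. */
enum { TVAL_INT, TVAL_FLOAT };

/* Rough C equivalent of `typedef tval = struct { type: int; value: unif {...} }`. */
typedef struct {
    int type;          /* says which member of `value` is active */
    union {            /* the post's `unif` */
        int i;
        float f;
    } value;
} tval;

/* Reads the union safely by checking the tag first. */
static double tval_as_double(tval v)
{
    return v.type == TVAL_INT ? (double)v.value.i : (double)v.value.f;
}

/* C equivalent of `func valid(x: int)?`, i.e. a predicate returning bool. */
static bool valid(int x)
{
    return x > 0 && x < 10;
}
```

The tag and the union have to be kept in sync by hand in C. A language-level tagged type could have the compiler check that instead.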

Pretty much the same as any other C-based language, so no need to bother getting too technical with the examples. Feature-wise:

From D, I want to borrow the notion of UFCS (Uniform Function Call Syntax). The first parameter of a function determines which type of object the function is associated with.
From Swift, I'm borrowing a bit of the syntax and plan on implementing optional types. I also want to dabble a bit with memory ownership.
From Lisp, I want to borrow the idea of being able to modify the syntax tree of the source code before it's evaluated or compiled.
From Golang, I want to borrow CSP (Communicating Sequential Processes) channels, structural typing, the error handling mechanism, and deferred statements.
From C#, I'd really like to borrow custom attributes and support for type introspection and/or reflection, but I haven't really set the type system in stone yet, and I want it to actually be useful (unlike C++'s RTTI). Properties are a bit iffy.
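
To make the UFCS point concrete: under UFCS, a call like `v.scale(2).dot(w)` would be rewritten to `dot(scale(v, 2), w)`, because the first parameter's type associates each function with `vec2`. C has no UFCS, so this C sketch shows only the plain-call form that the rewrite would target. `vec2`, `dot`, and `scale` are hypothetical examples.

```c
#include <assert.h>

typedef struct { float x, y; } vec2;

/* First parameter is a vec2, so under UFCS this could be called as v.dot(w). */
static float dot(vec2 a, vec2 b)
{
    return a.x * b.x + a.y * b.y;
}

/* Could live in a different module and still "extend" vec2,
 * without touching the struct definition: v.scale(k) -> scale(v, k). */
static vec2 scale(vec2 v, float k)
{
    vec2 r = { v.x * k, v.y * k };
    return r;
}
```

The design payoff is that adding `scale` later needs no change to `vec2` itself.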

Some other differences from C include:

There is no preprocessor, and thus, no header files. Each source file is considered to be an independent module which can be imported.
Instead of "union", there's "unif", and instead of "const char *", there's "cstr" (which the compiler guarantees to be interned).
Enums are namespaced, unless they're anonymous.
Structs will be able to be 'flattened'. That is to say, you can place a 'vec3' struct inside of a 'vec4' struct in such a way that the members of the 'vec3' struct can be accessed transparently. I originally planned on using an anonymous label-based syntax for this, but I'm not happy with the idea, so it's back to being a WIP.
You'll be able to import the namespace of a struct into the body of a function, so you don't have to constantly use indirection (kinda like the 'with' keyword in VB, but without creating a new block scope).
The ability to execute arbitrary code at compile time, so you can compile resources, pre-compute expensive stuff, perform tests, or whatever else suits your fancy. I'm actually considering approaching this from the opposite direction: the language would be interpreted (JIT-compiled) directly from source by default, but the compiler would be a library, and since the language can manipulate its own syntax trees, it could compile itself statically.
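
Standard C11 can approximate the struct 'flattening' idea with anonymous members. This sketch assumes C11. The anonymous union overlays a named `vec3` member with an anonymous struct of identical layout. As a result, `v.x` works directly while `v.xyz` still behaves as a whole `vec3`. The cost is that the member list has to be spelled out twice, which is presumably what a dedicated language feature would avoid.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;

/* C11 anonymous union + anonymous struct: both views share the same storage. */
typedef struct {
    union {
        vec3 xyz;                    /* the embedded struct as a unit */
        struct { float x, y, z; };   /* the same members, flattened */
    };
    float w;
} vec4;
```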

Like C, there will be no built-in memory management. Ideally, you should be able to compile for a freestanding environment. After all, I intend to write a kernel in this language eventually.

If you're wondering why there are no classes, it has nothing to do with some bold anti-OOP philosophy. On the contrary, I am very much for OOP, but there are a few stipulations:

I think static classes and singletons are a nonsensical way of trying to create a module. If you're going to divide a huge singleton into several partial classes split amongst multiple files based on groups of functionality, just let files be modules in the first place.
Don't mix virtual and nonvirtual elements in a class. Let the whole class be virtual, or let it not be virtual at all. Mixing the two just creates problems in the future when the implementation of a class needs to change. In this sense, a class is nothing more than an interface. I've considered an 'interf' keyword to act as a virtual struct, but the idea is still a WIP.
Classes should only contain fields, not methods, unless the methods are delegates. This has to do with the fact that most functionality is static, and requiring all of the static functionality to be placed inside of the class makes it annoying or impossible to extend the class with more functionality later. This is ultimately why I opted for UFCS.

As far as licensing goes, I honestly don't care much. I'm not looking for kudos or cash. Public domain is fine, as long as no one patents the darn thing and prevents me from using something I've personally invested years of my life into designing.

EDIT:
A small update: the pattern substitution shown above has been rolled back, because I decided that, semantically, structs and arrays should be passed to and from functions by reference by default, and that the `$` symbol should be used to override that (i.e., to pass by value).

Code:

func A() {
    var a1 = [1, 2];
    var a2 = [1, 2];
    B(a1, a2);
    // a1 = [3, 4]
    // a2 = [1, 2]
}

func B(a1: [int], a2: $[int]) {
    a1 = [3, 4];
    a2 = [3, 4];
}

Above, you can see what I mean: 'a1' is passed by reference, so its state gets mutated by B, whereas 'a2' is passed by value, so it doesn't. Ultimately, there will be a lot less noise in the source code than there is in C.
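
For comparison, this is roughly what the same A/B example takes in C. C arrays decay to pointers, so a plain array parameter already behaves like the default "by reference" case. Getting a copy (the `$[int]` case) needs the usual trick of wrapping the array in a struct. The `int2` wrapper is a hypothetical name used only for this sketch.

```c
#include <assert.h>

/* Wrapping the array in a struct makes C copy it when passed by value. */
typedef struct { int v[2]; } int2;

static void B(int a1[2], int2 a2)
{
    a1[0] = 3; a1[1] = 4;      /* writes through to the caller's array */
    a2.v[0] = 3; a2.v[1] = 4;  /* modifies only the local copy */
}
```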

Well, that's the gist of it. What are your thoughts on the design and its applications?

OSwhatever · **Joined:** Mon Jul 05, 2010 4:15 pm **Posts:** 595

Wajideus wrote:

Don't mix virtual and nonvirtual elements in a class. Let the whole class be virtual, or let it not be virtual at all. Mixing the two just creates problems in the future when the implementation of a class needs to change. In this sense, a class is nothing more than an interface. I've considered an 'interf' keyword to act as a virtual struct, but the idea is still a WIP.

I quite often mix non-virtual and virtual methods in C++. The mixing allows specialized virtual methods for the particular class as well as generic ones that are non-virtual. Non-virtual methods ensure that the compiler can optimize better by inlining. This is impossible with a fully virtual class.

Wajideus · **Joined:** Thu Nov 16, 2017 3:01 pm **Posts:** 47

OSwhatever wrote:

Wajideus wrote:

Don't mix virtual and nonvirtual elements in a class. Let the whole class be virtual, or let it not be virtual at all. Mixing the two just creates problems in the future when the implementation of a class needs to change. In this sense, a class is nothing more than an interface. I've considered an 'interf' keyword to act as a virtual struct, but the idea is still a WIP.

I quite often mix non-virtual and virtual methods in C++. The mixing allows specialized virtual methods for the particular class as well as generic ones that are non-virtual. Non-virtual methods ensure that the compiler can optimize better by inlining. This is impossible with a fully virtual class.

At some point, the architecture of the system changes; and then you end up breaking a whole bunch of code that depends on those generic non-virtual methods because you can't replace the implementation to maintain backwards compatibility. In short, non-virtual methods are high-coupling abstractions.

This is one of those situations where C++ tends to get people into the mindset of optimizing the wrong things.

EDIT:
I should reaffirm that I'm not against physical methods; my opinion is merely that you should choose between low coupling (virtual structs) and high coupling (physical structs) rather than mixing the two, because a single physical member in a virtual struct introduces a vector for future problems.

EDIT #2:
I'm currently considering using the struct/interf keywords as an annotation that describes whether a function operates on an object physically or virtually, rather than making that distinction part of the type itself. For example:

Code:

func method1(o: object struct);

func method2(o: object interf);

Both methods take an 'object' as their first parameter; however, 'method1' takes the physical object and 'method2' takes the virtual object. By default, omitting the 'struct' and 'interf' keywords should probably pass the object physically, but that's debatable.
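
Here is a rough C analogue of the physical/virtual split. It assumes a "virtual" object is represented as a pointer to a table of operations plus an opaque `self` pointer, which is the usual C vtable pattern. `rect`, `shape`, and `shape_ops` are hypothetical names. The proposed `struct`/`interf` annotations are not modelled directly.

```c
#include <assert.h>

/* Physical object: callers depend on its exact layout. */
typedef struct {
    int width, height;
} rect;

/* Virtual view: callers only see a table of operations and an opaque pointer. */
typedef struct {
    int (*area)(const void *self);
} shape_ops;

typedef struct {
    const shape_ops *ops;
    const void *self;
} shape;

/* Like `method1(o: object struct)`: reads fields directly, easy to inline. */
static int rect_area_physical(const rect *r)
{
    return r->width * r->height;
}

/* Adapter so a rect can be used through the virtual interface. */
static int rect_area_virtual(const void *self)
{
    return rect_area_physical(self);
}

static const shape_ops rect_ops = { rect_area_virtual };

static shape rect_as_shape(const rect *r)
{
    shape s = { &rect_ops, r };
    return s;
}

/* Like `method2(o: object interf)`: only goes through the ops table,
 * so the rect layout can change without touching this caller. */
static int shape_area(shape s)
{
    return s.ops->area(s.self);
}
```

This also illustrates the trade-off OSwhatever raised. The physical call can be inlined. The virtual one goes through a function pointer but is decoupled from the layout.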

Solar · **Posted:** Mon Nov 20, 2017 1:49 am

Wajideus wrote:

This is one of those situations where C++ tends to get people into the mindset of optimizing the wrong things.

One of the things I always liked in C++, and very much disliked in a great many other languages, is this whole "getting into a mindset" thing.

I understand why people would want an easy-to-learn language that makes it difficult to make silly mistakes.

The problems arise when you're past the "silly" stage and find that the language ties one hand behind your back. Or, worse, you get past the "silly" stage and don't realize one hand is tied behind your back.

As Bjarne Stroustrup put it, "C++ is not aspiring to be a beginner-friendly language, but an expert-friendly language".

I'm not saying your language is bad if it takes some architectural decisions away from the developer. I am just saying that you should be open about it, and realize that everything has two sides.

~ · **Joined:** Tue Mar 06, 2007 11:17 am **Posts:** 1225

I'm currently trying to implement a compiler that can correctly compile both C and C++.

COMPILER-2017-11-08start.zip

I've come to the conclusion that we need to start by writing a simple set of compilers, beginning with C and C++, because if we don't, we won't be able to communicate with the rest of the programmers, whose languages are built on engines written in C/C++.

You can help me write this public domain compiler. You can learn how to write the compiler's functions, and you can reuse those functions in any other compiler. You can read how I'm implementing the code here (for now it's only in Spanish, but you can roughly translate it with Google and figure out what the code does).

I explain step by step what to add in the code and exactly where, every time I add new functionality to complete the compiler:

Source code-only topic:
http://devel.archefire.org/forum/viewtopic.php?t=2362

Theory-only topic:
http://devel.archefire.org/forum/viewtopic.php?t=2359


If we write our own compiler we will be able to immediately use the existing open source code as a starting point through wrapper libraries, and replace the code gradually with our own functions.

Because as compilers are right now, if we don't learn to build our own compilers we will be in the same situation as if we didn't know how to speak any language. We really need to know how to parse source code by ourselves. The code is there, we just need to learn the algorithms to interpret and convert it to assembly/machine code, just like when we interpret a ZIP file or an image file format. It's really just a file format just like XML.

It's like metadata that tells us how to create the binary, nothing more.

It has to be capable of taking a main source file and just producing assembly code for the target CPU, for example NASM code for x86. So we would need a separate, specialized version of the compiler and header files for each CPU architecture, and with that we could define assembly code to build the structure of the executable itself.

Solar · **Posted:** Mon Nov 20, 2017 10:37 am

You do realize that the contents of that archive you've posted will, basically, scare away everybody actually competent enough to help you with that not-very-well-defined endeavor of yours?

Please don't hijack Wajideus' thread.

~ · **Joined:** Tue Mar 06, 2007 11:17 am **Posts:** 1225

I think I could help the user who is asking for help. He might not have much experience, but even a non-programmer's brain always knows how to formulate a solution as a sequence of steps: it knows what the next step should be, even when the problem (writing a compiler, in this case) isn't being solved through programming.

So that's where we could help each other: Just telling him/her what step to do next, and helping to implement it if possible.

You can review the code and its documentation, built piece by piece as the compiler takes shape, and you will see that it's very usable for any new compiler project.

I've come to the conclusion that the right thing to do would be to start by writing a C/C++ compiler, and from there write any other language.

It will allow us to immediately use any code written in C/C++, which is what we often want in order to launch a project quickly by using existing libraries.

I know that when somebody tries to write a C derivative and asks for help, it's because they don't know the C languages very well, so they try to make their programming life easier and happier with a simpler new language derived from them.

But I see that, in the long run, the best choice is to write the C/C++ compiler first and then any other language. It's much easier too, because we will then be able to use existing code as easily as we want.

Schol-R-LEA · **Posted:** Mon Nov 20, 2017 12:03 pm

Solar wrote:

You do realize that the contents of that archive you've posted will, basically, scare away everybody actually competent enough to help you with that not-very-well-defined endeavor of yours?

That hardly matters, since archefire.org is down more often than it is up - most people wouldn't even be able to access the file in the first place, and would conclude that ~ is either incompetent or a troll.

However, you have piqued my curiosity, so let me unblock these two posts from ~ and see what... TF?

~ wrote:

Because as compilers are right now, if we don't learn to build our own compilers we will be in the same situation as if we didn't know how to speak any language. We really need to know how to parse source code by ourselves.

I am a Lisp dev, from a world where codewalkers and HOFs are as natural as breathing, and writing interpreters and compilers for the language itself, and designing and implementing domain-specific languages based on it, are primary idioms - and I still don't see where you are going with this. Are you actually claiming that you can't truly understand programming without writing your own compiler? Because that's a pretty bizarre assertion if you are.

~ wrote:

The code is there, we just need to learn the algorithms to interpret and convert it to assembly/machine code, just like when we interpret a ZIP file or an image file format. It's really just a file format just like XML.

It's like metadata that tells us how to create the binary, nothing more.

TMNFSYFI (don't ask me to expand this acronym in a family-friendly group. Suffice it to say, I have no idea what point ~ is trying to make.)

And the assertion that C is the only suitable starting point for language development? Permit me to not-so-humbly differ.

~ · **Joined:** Tue Mar 06, 2007 11:17 am **Posts:** 1225

I will put the program on SourceForge.net when it's ready.

As it is now, my server is just a distribution machine for my house anyway, but I don't mind it being accessible from the Internet, so I just make things freely available. They are public domain and non-private data, so it's OK.

Solar · **Posted:** Tue Nov 21, 2017 3:42 am

Don't expect us to hold our breath.

Your unwillingness to accept and react to input from others doesn't bode well for your endeavors.

Wajideus · **Joined:** Thu Nov 16, 2017 3:01 pm **Posts:** 47

Solar wrote:

One of the things I always liked in C++, and very much disliked in a great many other languages, is this whole "getting into a mindset" thing.

The mindset I'm referring to specifically is optimizing things that aren't bottlenecks. It creates unnecessary complexity and restricts the ability to make future changes.

Solar wrote:

I understand why people would want an easy-to-learn language that makes it difficult to make silly mistakes.

The problems arise when you're past the "silly" stage and find that the language ties one hand behind your back. Or, worse, you get past the "silly" stage and don't realize one hand is tied behind your back.

As Bjarne Stroustrup put it, "C++ is not aspiring to be a beginner-friendly language, but an expert-friendly language".

I'm not saying your language is bad if it takes some architectural decisions away from the developer. I am just saying that you should be open about it, and realize that everything has two sides.

I'll be very straightforward here, this language is *not* being designed for beginners, or to enforce any specific programming paradigm. It's intended to fulfill the same niche as C/C++, but provide richer ways of handling memory, concurrency, code generation, maintenance, and project management. In addition, it also tries to provide a wide set of creature comforts that reduce friction and improve the legibility of the code.

Much like C++, there are values, references, and pointers in the language. Values have pass-by-copy semantics, and are usually primitives like integers and character strings. References are pointers to a single instance of an object; by default, all arrays and structures are passed by reference because the vast majority of the time, that's what you want. Pointers represent unbounded arrays of 0 or more instances of an object. You can override the default passing semantics by adding a sigil of $, &, or * for values, references, and pointers respectively.

Unlike C++, there are no classes. Types represent data, functions operate on data. Uniform Function Call Syntax (UFCS) means that the first operand of a function is used to determine the data type that the function is associated with. This means that you don't have to extend a class to add more behavior to it. Functions are categorized into modules based on what they do, not what type of objects they operate on. The distinction I was making between physical and virtual types is that a majority of the time, you want a function to operate on an actual data structure. But, when you're designing a system that has front-end and back-end subsystems (like a kernel/user interface), you might only want to operate on the interface of that type. This way, if the architecture of the back-end subsystem changes, it won't affect the front-end. Mixing virtual and physical members (by which I mean properties and fields) in a class defeats the entire purpose of encapsulation.

The template syntax of C++ is a nightmare to read, write, and debug. Its uses can primarily be divided into 3 categories: type substitution (generics), code generation, and compile-time execution. I'm tackling these problems directly in different ways: type substitution via pattern matching, code generation via the language's ability to modify its own syntax tree, and compile-time execution either by using compiler directives or by using the compiler as an API (undecided at the moment).

I'm very much interested in the notion of first-class memory ownership, but as stated, it's a WIP. Concurrency and interprocess communication are another major concern, because C++ completely missed the memo on these things, and they're vital to the scalability of software. I believe it was Joe Armstrong who showed that Erlang could become orders of magnitude faster than C++ when scaled, and part of that is simply due to the fact that it's extremely hard to write concurrent/parallel code in C++.
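
To show what a Go-style CSP channel involves when built by hand, here is a minimal bounded, blocking channel of `int` in C with POSIX threads. Compile it with `-pthread`. It is a sketch, not the proposed language's design. The fixed capacity `CHAN_CAP` and all names are hypothetical. Unlike Go, sending on a closed channel is not detected.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define CHAN_CAP 4  /* hypothetical fixed buffer size */

/* send blocks while full, recv blocks while empty (like a buffered Go channel). */
typedef struct {
    int buf[CHAN_CAP];
    int head, count;
    bool closed;
    pthread_mutex_t mu;
    pthread_cond_t not_empty, not_full;
} chan;

static void chan_init(chan *c)
{
    c->head = c->count = 0;
    c->closed = false;
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->not_empty, NULL);
    pthread_cond_init(&c->not_full, NULL);
}

static void chan_send(chan *c, int v)
{
    pthread_mutex_lock(&c->mu);
    while (c->count == CHAN_CAP)
        pthread_cond_wait(&c->not_full, &c->mu);
    c->buf[(c->head + c->count) % CHAN_CAP] = v;
    c->count++;
    pthread_cond_signal(&c->not_empty);
    pthread_mutex_unlock(&c->mu);
}

/* Returns false once the channel is closed and fully drained. */
static bool chan_recv(chan *c, int *out)
{
    pthread_mutex_lock(&c->mu);
    while (c->count == 0 && !c->closed)
        pthread_cond_wait(&c->not_empty, &c->mu);
    if (c->count == 0) {
        pthread_mutex_unlock(&c->mu);
        return false;
    }
    *out = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    pthread_cond_signal(&c->not_full);
    pthread_mutex_unlock(&c->mu);
    return true;
}

static void chan_close(chan *c)
{
    pthread_mutex_lock(&c->mu);
    c->closed = true;
    pthread_cond_broadcast(&c->not_empty);
    pthread_mutex_unlock(&c->mu);
}

/* Example producer: sends 1..10, then closes the channel. */
static void *producer(void *arg)
{
    chan *c = arg;
    for (int i = 1; i <= 10; i++)
        chan_send(c, i);
    chan_close(c);
    return NULL;
}
```

Even this small version needs a mutex, two condition variables, and careful wakeup logic. That is the kind of boilerplate a first-class channel type would hide.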

There are a lot of design decisions weighing heavily on my mind. I have no intention of restricting how people can solve problems. I'm just focused on isolating what the actual problems are and finding the simplest and most elegant solutions to them in a familiar syntax.

EDIT:
Also (this is directed at '~'), I'm already familiar with how compilers work. Front-end stuff is really easy; in just the past 3 months, I've written over a dozen parsers from scratch in C for various syntaxes. Once you've constructed a syntax tree, you can evaluate it directly, and this is where an interpreter and a compiler diverge. For a compiler, you generate a symbol table for the global scope, and for each function, you simplify each instruction down into three-address form, e.g. `%0 = %1 + %2`. Typically, static single assignment (SSA) form is used for all temporaries, so that each one is defined exactly once. At this point, you have what is essentially bytecode (like LLVM's IR). The hard part is doing optimizations on that bytecode and things like register allocation, because they're often NP-hard problems. If you skip optimization and register allocation (e.g. by keeping values on the stack), code generation is as simple as "substitute this sequence of bytecode instructions with this sequence of machine code instructions".
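
A small C sketch of the lowering step described above. An expression tree is flattened into three-address instructions, and each temporary `%N` is assigned exactly once, which is the SSA property for temporaries. The `expr` and `lowering` types are hypothetical. A real compiler would emit structured instructions rather than text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Leaves are named variables (op == 0); inner nodes are binary operators. */
typedef struct expr {
    char op;                        /* '+', '-', '*', or 0 for a leaf */
    const char *name;               /* variable name for leaves */
    const struct expr *lhs, *rhs;
} expr;

typedef struct {
    char text[256];  /* emitted code, one instruction per line */
    int next_temp;   /* next fresh temporary number */
} lowering;

/* Stores in `out` the operand holding e's value. Inner nodes first lower
 * their children, then emit "%N = a op b" into a fresh temporary %N. */
static void lower(lowering *l, const expr *e, char *out, size_t outsz)
{
    if (e->op == 0) {
        snprintf(out, outsz, "%s", e->name);
        return;
    }
    char a[16], b[16];
    lower(l, e->lhs, a, sizeof a);
    lower(l, e->rhs, b, sizeof b);
    int t = l->next_temp++;
    size_t len = strlen(l->text);
    snprintf(l->text + len, sizeof l->text - len, "%%%d = %s %c %s\n", t, a, e->op, b);
    snprintf(out, outsz, "%%%d", t);
}
```

For `(x + y) * z`, this emits `%0 = x + y` followed by `%1 = %0 * z`. Naive code generation would then map each such line to a fixed machine-code template.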

~ · **Joined:** Tue Mar 06, 2007 11:17 am **Posts:** 1225

I know that normal high level languages are too high level in many aspects, so they end up getting crippled when it comes to manipulating the machine directly. The native assembly language for a machine is always the clearest choice. It's a matter of making that assembly language easier to understand, simpler, more portable, like NASM syntax over plain Intel syntax over AT&T syntax.

I've managed the low level problem at least for x86 by making it possible to use the very same NASM assembly code portably to generate 16, 32 or 64-bit code with a header file called x86 Portable, which adds things like wideword, wideax, widebx, widecx, widedx, widesi, widedi, widesp, widebp, a16/a32/awide, o16/o32/owide.

I intend to use x86 Portable always for the code generated by my compiler.

That way, I can generate assembly code that is as portable as C. The same compiler engine produces a single core of generated assembly that I can assemble to 16, 32 or 64-bit code, so targeting another mode would only require re-packaging that code in the intended executable file format.

http://sourceforge.net/projects/x86-portable/

http://www.archefire.org/_PROJECTS_/x86_Portable/

x86 Portable is just a NASM include header file that adds automatically-sized instructions and registers for word/dword/qword/wideword according to the target x86 CPU, you just need to specify the word size through _x86_Portable__PLATFORMBITS_ (1632 for 386 Real Mode, 16 for 8086/8088 Real Mode, 32 for 32-bit mode and 64 for 64-bit mode).

With that you can think about how to create code that will use the full registers automatically for all operation modes instead of trying to generate or write dirty and highly non-portable assembly sources.

Wajideus · **Joined:** Thu Nov 16, 2017 3:01 pm **Posts:** 47

~ wrote:

With that you can think about how to create code that will use the full registers automatically for all operation modes instead of trying to generate or write dirty and highly non-portable assembly sources.

That might work, but what you're doing doesn't carry over to other instruction sets. The primary way of handling register allocation is a technique known as "graph coloring".

You start by generating a set of instructions that read and write temporary variables, and then you map (allocate) registers to those temporaries. If your bytecode is in static single assignment (SSA) form, it's easy to keep track of which temporaries are live or dead, so you can collect (free) their registers for reuse. When you run out of registers, you spill (allocate from the stack).

That said, please create your own topic. I have no intention of writing a C/C++ compiler at the moment. The entire reason this topic was created was to discuss a new programming language that I intend to use in place of them for a new non-Unix, non-Windows operating system. Why? Because my philosophy is that the language you write in heavily influences the architecture of the OS. If you base your OS on C, it's probably going to be a lot like Unix. If you write your OS in BASIC, Delphi, and Pascal, it's probably going to be a lot like Windows.
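
Here is a deliberately simplified C sketch of register allocation by graph coloring. It assumes the interference graph (which temporaries are live at the same time) has already been computed. It colors greedily in temporary order rather than using Chaitin/Briggs-style simplify-and-select, so it may spill more than necessary. Names like `interference_graph` and `color` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEMPS 8   /* sketch limit on temporaries (and registers) */
#define SPILLED  -1   /* temporary is given a stack slot instead */

/* interferes[a][b] is true when a and b are live at the same time,
 * so they cannot share a register. */
typedef struct {
    int n;
    bool interferes[MAX_TEMPS][MAX_TEMPS];
} interference_graph;

static void add_edge(interference_graph *g, int a, int b)
{
    g->interferes[a][b] = g->interferes[b][a] = true;
}

/* Gives each temporary the lowest register not taken by an already
 * colored neighbour; if none is free, the temporary is spilled. */
static void color(const interference_graph *g, int num_regs, int reg[])
{
    assert(num_regs <= MAX_TEMPS);
    for (int t = 0; t < g->n; t++) {
        bool used[MAX_TEMPS] = { false };
        for (int u = 0; u < t; u++)
            if (g->interferes[t][u] && reg[u] != SPILLED)
                used[reg[u]] = true;
        reg[t] = SPILLED;
        for (int r = 0; r < num_regs; r++)
            if (!used[r]) { reg[t] = r; break; }
    }
}
```

With two registers, three mutually interfering temporaries force one spill. A fourth temporary that interferes with only one of them can still reuse a register.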

~ · **Joined:** Tue Mar 06, 2007 11:17 am **Posts:** 1225

x86 registers and words can select their size automatically. They can also have a default size while still letting you select the full width of the register portably (for example, default 16-bit words but a real usable size of 32 bits). It's a very particular feature of the x86 CPU that assemblers haven't exposed yet, so it's a worthwhile optimization to apply to a language: automatic selection of register and variable sizes at the assembly level, not only at the C level.

I think that I will create my own topic to announce when I have an interesting progress in my hands. I just wanted to let you know whatever I know about the topic in case it was of use for you.

Wajideus · **Joined:** Thu Nov 16, 2017 3:01 pm **Posts:** 47

Back on topic, there are a few new syntactical features I decided to implement.

First of all, assignments are allowed in conditions and iterations, but require a different syntax:

Code:

if fopen("test.txt", "r") -> f {
    defer fclose f;
    ...
}

for args -> arg {
    puts arg;
}

You can imagine these as being syntactically equivalent to:

Code:

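// C code (args_count is a stand-in for the array's built-in length)
{
    FILE *f;
    if ((f = fopen("test.txt", "r"))) {
        ...
        fclose(f);  // the deferred call runs when the block exits
    }
}

for (size_t i = 0; i < args_count; i++) {
    const char *arg = args[i];  // a fresh binding on every iteration
    puts(arg);
}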

There are 2 main reasons for this new syntax. First, it improves readability by making the condition itself the subject, as opposed to the assignment. Second, it eliminates an error caused by a typo, because using '=' for assignment in a condition is now illegal.

The second feature is that, like VB, parentheses around function arguments are only required if you want to use the return value of the function in an expression. So it's perfectly okay to do the following:

Code:

// var is just a declarator keyword for a variable. we're inferring the type
var name = readline "what's your name? ";
printf "hello %s!\n", name;

EDIT:
I also feel like something I said earlier about pointers may have been overlooked, which is very important:

Wajideus wrote:

Pointers represent unbounded arrays of 0 or more instances of an object.

There is a distinction between arrays and pointers in my language that doesn't exist in C/C++. Arrays have a maximum capacity that's always bounds-checked (more or less an internal size_t-typed field). Pointers don't have a maximum capacity and thus skip bounds checking. Arrays can be cast to pointers, though, by taking the address of one of their elements, and pointers can be cast to arrays by creating a slice with an upper bound.
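
One way to model this array/pointer distinction in C. The assumption is that an array is represented as a data pointer plus a `size_t` capacity, standing in for the internal size_t field mentioned above. `int_slice` and its functions are hypothetical names for illustration, not part of the proposed language.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bounded array: every access is checked against `cap`. */
typedef struct {
    int *data;
    size_t cap;
} int_slice;

/* Bounds-checked read: reports failure instead of reading out of range. */
static bool slice_get(int_slice s, size_t i, int *out)
{
    if (i >= s.cap)
        return false;
    *out = s.data[i];
    return true;
}

/* Pointer -> array: slice an unbounded pointer with an upper bound. */
static int_slice slice_from_ptr(int *p, size_t upper)
{
    int_slice s = { p, upper };
    return s;
}

/* Array -> pointer: taking an element's address drops the bound. */
static int *slice_elem_ptr(int_slice s, size_t i)
{
    assert(i < s.cap);
    return &s.data[i];
}
```

The extra `cap` field is exactly what a bounded-array parameter would add to a function's C-level signature.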

Code:

func main(argc: int, argv: **char) -> int;

would be the equivalent of the traditional C/C++ style main function.

Code:

func main(argc: int, argv: [*char]) -> int;

would actually be equivalent to

Code:

func main(argc: int, argvsz: size_t, argv: **char) -> int;

That wouldn't make sense, because the capacity of the array is already passed in argc. This is one reason why, in my prior examples, I used

Code:

func main(args: [cstr]) -> int;

where the `cstr` type is equivalent to `*const char`.

OSDev.org
