Using word as the only data type.

sandras · **Joined:** Thu Nov 03, 2011 9:30 am **Posts:** 146

Hi. I am toying around with what I guess you could call a stack-based virtual machine. I'm mainly taking ideas from the Forth language. My intention is to use a single data type: the "word" in Forth terminology. A word is as wide as the CPU data path. What I like about it is that it seems "natural" to work with register-sized data. All the items on the stack are of word size, and all memory accesses should be word-aligned. Essentially, I want to make a word-oriented virtual machine on top of a byte-oriented physical machine. What I find difficult with this is, of course, byte manipulation. I wanted to take input from the user and it requires byte access. I could use bitwise operations to manipulate individual bytes in the word, but that takes away from the simplicity I am chasing. My question is: what other drawbacks are there to using word as the only data type? Is it feasible to use word as the only data type? Am I making my life more complicated?
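To make the byte-manipulation cost concrete, here is a minimal sketch of the shift-and-mask approach mentioned above. It assumes a 64-bit word and uses Python as a stand-in for whatever the VM is written in; `get_byte` and `set_byte` are made-up names for illustration, not part of any existing system.

```python
WORD_BITS = 64                       # assumed word width
WORD_MASK = (1 << WORD_BITS) - 1

def get_byte(word, index):
    """Return byte `index` of `word` (0 = least significant byte)."""
    return (word >> (8 * index)) & 0xFF

def set_byte(word, index, value):
    """Return a copy of `word` with byte `index` replaced by `value`."""
    shift = 8 * index
    cleared = word & ~(0xFF << shift) & WORD_MASK
    return cleared | ((value & 0xFF) << shift)

# Pack two input characters into one word, then read them back.
w = set_byte(0, 0, ord("H"))
w = set_byte(w, 1, ord("i"))
print(chr(get_byte(w, 0)) + chr(get_byte(w, 1)))
```

Every byte read costs a shift and a mask, and every byte write costs a read-modify-write of the whole word, which is the loss of simplicity the post is worried about.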

Schol-R-LEA · **Posted:** Mon Oct 10, 2016 2:58 pm

BLISS did this too, and the experience with that highlights one of the problems with the idea: it makes the language far less portable. DEC had several completely different languages, all called 'BLISS', in part because having the sole data type be the system word caused problems when moving between 8-bit, 12-bit, 16-bit, 18-bit, 32-bit, and 36-bit systems (eventually they standardized on 8/16/32-bit, just as IBM had earlier, but that meant a lot of their older product lines got abandoned). The languages were not consistent in other ways, but that was the biggest drawback.

More to the point, if you have only one data type, you lose a lot of the error checking and debugging information that typing provides. Typing doesn't need to be explicit to be strong (though most strongly typed languages are explicitly typed), and explicit typing certainly doesn't imply strength, nor is strong typing necessarily desirable for system programming; still, having some kind of typing information is far too valuable to avoid.

That having been said, if you design a solid set of type definition constructs, and either avoid infix operators (as in Forth and the various Lisps) or allow operators to be overloaded (as in C++ or Python), and have a convenient means of mixing HLL code with assembly code, you can in theory use a raw data type as the base of all the others, and define the types you need explicitly. This is (in principle at least) how Ada works, and is how I intend to write Thelema (though in that case I am also allowing extensions to the compiler itself, as well as lexical macros of the kinds typical of other Lisps). You still need to be careful about word size issues for portability, but it becomes a library implementation issue rather than a showstopper.

sandras · **Joined:** Thu Nov 03, 2011 9:30 am **Posts:** 146

Not that type-error checking is bad but I do intend to leave it up to the programmer for simplicity's sake. Also, my operators will most likely never be infix. From what I gather, prefix and postfix operators are much easier to implement, and I like that. And yes, operators could be overloaded if I go for a Forth-like system, which is mostly what I have right now. Though I do try to keep an open mind and not implement just another copy of an existing language. Perhaps my taste for simpler things is due to my lack of knowledge. :roll:

Thank you for your answer.

Schol-R-LEA · **Posted:** Tue Oct 11, 2016 11:19 am

Fair enough; as long as you are aware of the issues, you can make an intelligent choice about them.

As for preferring simplicity, as I say, I favor Lisp, especially Scheme, which is a pretty minimalist language itself. However, I have found that some complexity is necessary for a practical language; the question becomes, where do you put that complexity, and what trade-offs can you make on it?

For Thelema, I have chosen to put a lot of it in the support for extensibility (the language specializers, the core macro system, the module system, the scoping mechanisms, and eventually the deployment system for AOT partially-compiled modules) so that almost everything else can be shunted into the libraries.

jeaye · **Posted:** Tue Oct 11, 2016 11:21 am

Quote:

Not that type-error checking is bad but I do intend to leave it up to the programmer for simplicity's sake.

Nothing is simple about that, aside from the novelty.

Quote:

Is it feasible to use word as the only data type?

It's doable. What do you think you'll actually gain from it, besides "I'm using only words?" There's nothing wrong with wanting only that, but don't conjure the illusion that this will actually gain you anything.

Quote:

Am I making my life more complicated?

Is this not apparent? You're ruling out half a century of research and development into systems programming languages. By this question, it seems pretty clear that you're actually thinking this language will help you make your OS, in some way. It won't.

My advice is to continue IFF you understand that you're throwing away ADTs, strong, static typing, and any reasonable chance at non-procedural paradigms, likely for no gain. Developing an esoteric language for your esoteric operating system is an excellent goal; just don't think what you're bringing to the table, in language features, is making _anything_ easier.

If you want further suggestions for your proposed language, consider how you might represent functions. Defining and calling functions will get you most of the way, but even C has function pointers, which carry on to be first class functions. When setting up, for example, your IDT, function pointers will come in handy.
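One way a word-only design could still get something like function pointers is to treat a function reference as just another word, in the spirit of a Forth execution token. The sketch below is only an illustration under that assumption: the code table, the handler names, and the vector numbers are all hypothetical, and a Python list stands in for code memory.

```python
def handle_divide_error(log):
    log.append("divide error")

def handle_keyboard(log):
    log.append("key pressed")

# Hypothetical code table: a function's "address" is its index here,
# so a reference to a function fits in an ordinary word.
CODE = [handle_divide_error, handle_keyboard]

# An IDT-like vector table that stores nothing but words.
vector_table = {0: 0, 33: 1}

def dispatch(vector, log):
    """Look up the word stored for `vector` and call the code it names."""
    token = vector_table[vector]     # a plain integer word
    CODE[token](log)

events = []
dispatch(33, events)
dispatch(0, events)
print(events)
```

Nothing stops a wrong word from being used as a token, which is the missing type checking discussed earlier in the thread.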

Boris · **Joined:** Sat Nov 07, 2015 3:12 pm **Posts:** 145

Try rewriting the meaty skeleton using only intmax_t for integer/pointer storage.
If you feel good about that, pursue your dream.

Brendan · **Posted:** Wed Oct 12, 2016 1:07 am

Hi,

sandras wrote:

I wanted to take input from the user and it requires byte access.

No; characters would be stored as "one or more words" and you won't need byte access (except for in the keyboard driver itself). What you do need is far more RAM, because (assuming 64-bit words) UTF-8 wastes 87.5% of each word, UTF-16 wastes 75% of each word and UTF-32 wastes 50% of each word.
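The percentages follow from storing one code unit per word. A quick check of the arithmetic, assuming 64-bit words as in the post (`wasted_fraction` is a name made up for this sketch):

```python
WORD_BITS = 64  # word width assumed in the post

def wasted_fraction(code_unit_bits, word_bits=WORD_BITS):
    """Fraction of a word left unused when it holds one code unit."""
    return 1 - code_unit_bits / word_bits

for name, bits in (("UTF-8", 8), ("UTF-16", 16), ("UTF-32", 32)):
    print(f"{name}: {wasted_fraction(bits):.1%} of each word unused")
```

With a narrower word the waste shrinks accordingly; for example, a 32-bit word holding one UTF-8 code unit leaves 75% unused.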

For every kind of device driver you can imagine (including the keyboard driver) the language would be completely unusable because you can't handle memory mapped IO or IO ports properly. You'd have to use a different language (that's far less crippled and significantly faster and much better at catching bugs, that doesn't waste most of your RAM for no reason) instead.

Cheers,

Brendan

sandras · **Joined:** Thu Nov 03, 2011 9:30 am **Posts:** 146

I get what all of you are saying. Thank you for the input.

sandras · **Joined:** Thu Nov 03, 2011 9:30 am **Posts:** 146

I think to further the discussion for my own good I should talk about what I have in mind right now for the language.

There's a stack. Each element on the stack is the same size. You can read 8-, 16-, 32-, or 64-bit values from memory and push them onto the top of the stack. If the value read from memory is smaller than a stack element, the remaining bits are set to zero. For writing, you can write the low 8, 16, 32, or 64 bits of the top stack element to memory. So dealing with elements on the stack (duplicating, swapping) is easy. I kind of dislike that there has to be a separate command for every size of read/write. But that's the way it is for now in my head.
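A rough sketch of that design, assuming little-endian byte order, byte-addressed memory, and the Forth-style operand order `( value addr -- )` for stores. The class and method names are invented for illustration. Passing the size as a parameter shows one way the per-size commands could share a single implementation, even if the instruction set still exposes one opcode per size.

```python
class VM:
    def __init__(self, mem_size=64):
        self.mem = bytearray(mem_size)   # byte-addressed backing memory
        self.stack = []                  # every element is one word

    def fetch(self, size):
        """Pop an address; push the `size`-byte value there, zero-extended."""
        addr = self.stack.pop()
        self.stack.append(int.from_bytes(self.mem[addr:addr + size], "little"))

    def store(self, size):
        """Pop an address, then a value; write the value's low `size` bytes."""
        addr = self.stack.pop()
        value = self.stack.pop()
        mask = (1 << (8 * size)) - 1
        self.mem[addr:addr + size] = (value & mask).to_bytes(size, "little")

vm = VM()
vm.stack += [0x1122334455667788, 0]   # value, then address 0
vm.store(2)                           # writes only 0x7788
vm.stack.append(0)
vm.fetch(8)                           # reads a full word back from address 0
print(hex(vm.stack[-1]))
```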

Anyway. That's what I got for now.

MichaelFarthing · **Posted:** Thu Oct 13, 2016 12:31 pm

Stacks often use extra storage in this fashion, so don't think your idea is off the wall.
You did not say that the language was a stack-orientated one. I think that changes the perspective considerably.
You should consider whether it might be helpful for padding to be by sign extension rather than zeros.
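The difference between the two kinds of padding matters as soon as narrow values can be negative. A small sketch, assuming a 64-bit stack word and two's-complement values (function names are made up for this example):

```python
WORD_MASK = (1 << 64) - 1   # assumed 64-bit stack word

def zero_extend(value, bits):
    """Keep the low `bits` bits; the upper bits of the word become zero."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Interpret the low `bits` bits as a two's-complement signed number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

byte = 0xFF                                  # the 8-bit pattern for -1
print(zero_extend(byte, 8))                  # stays a small positive number
print(sign_extend(byte, 8))                  # becomes a negative number
print(hex(sign_extend(byte, 8) & WORD_MASK)) # the resulting word's bit pattern
```

With zero padding, a stored signed byte comes back as a positive word, so signed arithmetic on it needs an explicit fix-up; with sign extension, unsigned data such as text bytes above 0x7F needs masking instead. Offering both kinds of load avoids choosing.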

EDIT Looking back I see you do start by talking about Forth, so effectively you did say a stack-orientated language. Apologies.

onlyonemac · **Joined:** Sat Mar 01, 2014 2:59 pm **Posts:** 1146

I once thought of designing a CPU that used only 16-bit values. I concluded that, since so much is already based around 8-bit values (e.g. disk I/O, files, ASCII text, etc.), it would be necessary to at least provide a way to access the low and high bytes of a 16-bit value separately. I ended up designing the CPU with a flag on move instructions that would allow one to switch the high and low bytes of a move operation, designed to be used in conjunction with two 8-bit registers designated for working with 8-bit values in arithmetic operations. Ultimately though I don't think it would be very efficient to load even a small text file through two 8-bit registers.

Ycep · **Joined:** Mon Dec 28, 2015 11:11 am **Posts:** 401

Why that?

Sik · **Joined:** Wed Aug 17, 2016 4:55 am **Posts:** 251

onlyonemac wrote:

I once thought of designing a CPU that used only 16-bit values. I concluded that, since so much is already based around 8-bit values (e.g. disk I/O, files, ASCII text, etc.), it would be necessary to at least provide a way to access the low and high bytes of a 16-bit value separately. I ended up designing the CPU with a flag on move instructions that would allow one to switch the high and low bytes of a move operation, designed to be used in conjunction with two 8-bit registers designated for working with 8-bit values in arithmetic operations. Ultimately though I don't think it would be very efficient to load even a small text file through two 8-bit registers.

RISC processors normally only understand one word size and the only place where they allow other sizes (e.g. bytes) is in the load/store instructions (and when loading into a register, they will zero-extend the value to the native word size). So your idea is not really that far off from reality.

Also DSP processors usually only understand their native word size and nothing else =P

onlyonemac · **Joined:** Sat Mar 01, 2014 2:59 pm **Posts:** 1146

Sik wrote:

RISC processors normally only understand one word size and the only place where they allow other sizes (e.g. bytes) is in the load/store instructions (and when loading into a register, they will zero-extend the value to the native word size). So your idea is not really that far off from reality.

Interesting. This was intended to be an ultra-RISC system (in the order of around four instructions, with the ALU accessed merely as a collection of special registers), so maybe it wouldn't be too inefficient after all.
