Page 1 of 2
Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 3:19 am
by Antti
I am planning to design and implement a simple bytecode specification. Here is the initial plan. It should tell you the main idea.
Code: Select all
Instructions:
ADD 0x?0, 0x??
AND 0x?1, 0x??
CALL 0x?2, 0x??
CMP 0x?3, 0x??
DIV 0x?4, 0x??
JMP 0x?5, 0x??
MOV 0x?6, 0x??
MUL 0x?7, 0x??
OR 0x?8, 0x??
POP 0x?9, 0x??
PUSH 0x?A, 0x??
RET 0x?B, 0x??
SHL 0x?C, 0x??
SHR 0x?D, 0x??
SUB 0x?E, 0x??
XOR 0x?F, 0x??
First octet Second octet
1 A A A X X X X C C C C B B B B
1 = first octet must be [0x80-0xFF]
X = opcode
A = subtype
B = first operand / first part of 8-bit operand
C = second operand / second part of 8-bit operand
MOV - Move
X = 6, A = 0x00 -> MOV R0, imm8
X = 6, A = 0x01 -> MOV R0, imm8 << 8
X = 6, A = 0x02 -> MOV R0, imm8 << 16
X = 6, A = 0x03 -> MOV R0, imm8 << 24
X = 6, A = 0x04 -> MOV reg, reg
X = 6, A = 0x05 -> MOV mem, reg
X = 6, A = 0x06 -> MOV reg, mem
X = 6, A = 0x07 -> MOV mem, mem
Example:
MOV R0, 0x20 -> 0x86, 0x20
MOV R0, (0x21 << 8) -> 0x96, 0x21
MOV R0, (0x22 << 16) -> 0xA6, 0x22
MOV R0, (0x23 << 24) -> 0xB6, 0x23
MOV R15, R1 -> 0xC6, 0x1F
MOV [R15], R1 -> 0xD6, 0x1F
MOV R15, [R1] -> 0xE6, 0x1F
MOV [R15], [R1] -> 0xF6, 0x1F
ADD - Add
X = 0, A = 0x00 -> ADD R0, imm8
X = 0, A = 0x01 -> ADD R0, imm8 << 8
X = 0, A = 0x02 -> ADD R0, imm8 << 16
X = 0, A = 0x03 -> ADD R0, imm8 << 24
ADD R15, R1 -> 0xC0, 0x1F
ADD [R15], R1 -> 0xD0, 0x1F
ADD R15, [R1] -> 0xE0, 0x1F
ADD [R15], [R1] -> 0xF0, 0x1F
Instructions are always 16-bit wide and always naturally aligned. Everything is defined and there would be no "reserved" instructions. This is just an initial plan but I think all the above would be final unless I discard this design. Do you have any comments?
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 3:33 am
by Combuster
Antti wrote:Everything is defined and there would be no "reserved" instructions.
1 = first octet must be [0x80-0xFF]
Do you have any comments?
I couldn't find:
- Conditional jumps
- Stack pointer operations
- Arithmetic shifts and rotates
- Floats
- How encoding actually works for unary instructions (like push/pop/call/jmp)
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 6:54 am
by alexfru
From the above I gather that the machine is 32-bit, yet loading into a register or otherwise using an arbitrary 32-bit constant is going to be painful and require a bit too much code.
I'd suggest looking at how it's done in other CPUs with small subsets of general-purpose instructions, e.g. MIPS (MIPS16e encoding and any newer one), ARM (thumb(2?) encoding and any newer one), TI MSP430 (16-bit CPU, though), etc.
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 7:46 am
by Antti
First octet range [0x00-0x7F] defines an invalid instruction. It is and will be so. It is not a reserved range.
Combuster wrote:Conditional jumps
Instruction JMP has plenty of bits and I have to come up how I am going to use them. Conditional jumps are included.
Combuster wrote:Stack pointer operations
Same as above.
Combuster wrote:Arithmetic shifts and rotates
Rotations are not directly supported. Shifts will have a carry-flag-like flag.
Combuster wrote:Floats
Not supported natively.
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 8:06 am
by Antti
alexfru wrote:From the above I gather that the machine is 32-bit, yet loading into a register or otherwise using an arbitrary 32-bit constant is going to be painful and require a bit too much code.
I know. Immediate constants are a bit painful. This is something that I have thought a lot. I try to find a good way to avoid them as much as possible. I like the 16-bit fixed length and I am not going to change that.
Every instruction will have "A, B, C" bits used. That means that, for example, I thought that "RET" could be conditionally and could also have a return value (imm8).
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 8:31 am
by Antti
I made a notation error:
Code: Select all
X = 6, A = 0x00 -> MOV R0, imm8
X = 6, A = 0x01 -> MOV R0, imm8 << 8
X = 6, A = 0x02 -> MOV R0, imm8 << 16
X = 6, A = 0x03 -> MOV R0, imm8 << 24
This should mean:
Code: Select all
X = 6, A = 0x00 -> MOV R0_0, imm8
X = 6, A = 0x01 -> MOV R0_1, imm8 << 8
X = 6, A = 0x02 -> MOV R0_2, imm8 << 16
X = 6, A = 0x03 -> MOV R0_3, imm8 << 24
R0_0 is range [7:0] of R0
R0_1 is range [15:8] of R0
R0_2 is range [23:16] of R0
R0_3 is range [31:24] of R0
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 8:35 am
by AndrewAPrice
I love it when people come up with bytecodes and instruction sets.
What is your intention for your bytecode? I see you're using registers - is it for a virtual CPU that you want to implement an emulator for?
If it's for a virtual machine that is easy to compile high level languages to, or easy to translate bytecode to machine code, popular alternatives are SSA and stack based bytecodes. Fixed register machines, while not impossible, tend to add some extra effort because you have to perform register allocation in both the compiler emitting your bytecode, and again when you extract a SSA form representation from it and map it onto real registers.
If it's trying to come up with your own realistic CPU-like instruction set, then can do whatever you'd like!
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 10:01 am
by Antti
MessiahAndrw wrote:What is your intention for your bytecode?
That is a good question but I do not know the answer. This design is not "extensible". It is not a good thing in general but in this case I want to try it. When the specification is ready, it would really be ready. Frozen. Locked.
It will be easy to write an assembler and interpreter for this. If it works well, I start using it.
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 10:52 am
by Antti
Code: Select all
JMP - Jump
X = 5, A = 0x00 -> JMP reg+reg
X = 5, A = 0x01 -> JMP imm8 (relative)
X = 5, A = 0x02 -> JMPt imm8 (relative) Jump if true
X = 5, A = 0x03 -> JMPf imm8 (relative) Jump if false
X = 5, A = 0x04 -> JMPt reg+reg Jump if true
X = 5, A = 0x05 -> JMPf reg+reg Jump if false
X = 5, A = 0x06 -> JMPt [reg+reg] Jump if true
X = 5, A = 0x07 -> JMPf [reg+reg] Jump if false
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 10:56 am
by onlyonemac
I once came up with the simplest CPU design I've ever seen. It only has two instructions: move, and move indirect. (Move just copies one register to another, move indirect copies a register to or from the memory location specified by another register, and hi/low word operations are encoded in the register operands.) All other operations are implemented by grouping these instructions, so for example adding two numbers would involve moving each number into the ALU source register and then reading the ALU result register. I know it's terriably inefficient but I liked the idea of an archiutechture that I could implement without needing a PROM microcode - essentially the "microcode" is normal code. With careful programming one could also obtain some efficiency: for example, repeatedly incrementing a value only involves loading one of two ALU registers, and a loop counter requires only two instructions: one to copy the counter from the ALU result register into the ALU source register and another to check the value in the ALU result register (it's not nessecary to copy the value out of the result register). Now even the x86 archietechture requires two instructions to do that (or sometimes even three), and they're much slower instructions to execute!
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 12:16 pm
by AndrewAPrice
I find
one instruction set computers interesting.
The Turing complete single instructions are cool (like subleq), but I think they will tedious and inefficient to implement everyday algorithms with (much like trying to implement division in brainf*ck is). A more practical single instruction set computer, in my mind, is a move machine like you describe. You could memory map your ALU, your logic unit (e.g. 0x1234 equals 0x1233 if 0x1232 is true else it equals 0x1231), your instruction pointer, your memory indirection unit (reading/writing 0x87 reads and writes at the address stored in 0x88), etc. and have a fully functioning computer. I think you would have efficiency issues with dealing with functions that can be recursive, as everything would be based around absolute addresses.
For efficiency reasons, you could have CPU registers (memory mapped of course), including a stack register that pops/pushes on reads and writes.
Re: Yet another bytecode design (Antti's bytecode)
Posted: Sun Jul 13, 2014 12:21 pm
by AndrewAPrice
Also, with real CPUs you have different requirements to virtual machines. Many real CPUs typically have fixed instruction sizes (e.g. 2 or 4 bytes) so that the fetching the instruction can be done in one cycle, which simplifies the implementation (and all instructions can be memory aligned.) There are ways around this, for example, the CPU could fetch 5 bytes at a time (have some sort of FIFO cache buffer for passing instructions through) and simple don't consume additional bytes if the instruction is only one bye large.
I'm not a processor engineer, so my knowledge of how processors work is limited to wiring one up in Logisim.
Re: Yet another bytecode design (Antti's bytecode)
Posted: Mon Jul 14, 2014 12:06 am
by Antti
Code: Select all
; R0 = start address
; R1 = length (4-byte words)
ClearMemory:
XOR R2, R2
SUB R1, 1 ; If underflow, boolean is true
RETt 0 ; Return 0 if true (length was 0)
.L1:
MOV [R0], R2
ADD R0, 4 ; If overflow, boolean is true
RETt 1 ; Return 1 if true
SUB R1, 1
JMPf .L1 ; Jump if not underflow
RET 0 ; Return 0
Here is a demonstration of having a boolean flag. Some of the registers will have a special meaning, like the stack register. Perhaps the instruction pointer could be editable also.
There is still a lot to do. MessiahAndrw, thank you for your replies.
Re: Yet another bytecode design (Antti's bytecode)
Posted: Mon Jul 14, 2014 12:18 am
by Antti
A huge problem in the code snippet above. Of course I am not able to use "SUB R1, 1". Rethinking is needed. I need to have a "SUB reg, imm4".
Re: Yet another bytecode design (Antti's bytecode)
Posted: Mon Jul 14, 2014 12:41 am
by linguofreak
onlyonemac wrote:I once came up with the simplest CPU design I've ever seen. It only has two instructions: move, and move indirect. (Move just copies one register to another, move indirect copies a register to or from the memory location specified by another register, and hi/low word operations are encoded in the register operands.) All other operations are implemented by grouping these instructions, so for example adding two numbers would involve moving each number into the ALU source register and then reading the ALU result register. I know it's terriably inefficient but I liked the idea of an archiutechture that I could implement without needing a PROM microcode - essentially the "microcode" is normal code. With careful programming one could also obtain some efficiency: for example, repeatedly incrementing a value only involves loading one of two ALU registers, and a loop counter requires only two instructions: one to copy the counter from the ALU result register into the ALU source register and another to check the value in the ALU result register (it's not nessecary to copy the value out of the result register). Now even the x86 archietechture requires two instructions to do that (or sometimes even three), and they're much slower instructions to execute!
This is actually a concept that other people have come up with as well. It's called a transport triggered architecture.