OSDev's dream CPU

ACcurrent · **Posted:** Thu May 03, 2012 4:40 am

We all have something we hate about our current CPU (for the most of us x86) we are developing our OSes on. Some of us wth did intel/arm/mips/sun/etc. do that so I want to collect some info about what everybody hates and what we would like in a cpu. I do (and interestingly) find IDTs a very nice feature of x86.

sandras · **Joined:** Thu Nov 03, 2011 9:30 am **Posts:** 146

Like in every other field, I like simplicity.

And having a silent pc would be nice, meaning, make it slower, but maybe multicore, but cooler.

Yoda · **Posted:** Thu May 03, 2012 6:34 am

I hate all architectures but one - VAX-11. That's the most regular and elegant architecture. The only noticeable drawback of that architecture is that it is 32 bit only. It would be great work if it could be extended to 64 bits without loosing regularity and elegance.

Yoda · **Posted:** Thu May 03, 2012 6:49 am

AFAIK Elbrus is not an extention of VAX-11. And it's architecture is changed many times. Don't know about first, but the second was superscalar RISC, third was VLIW, Elbrus-micro series has SPARC ISA, E2K was VLIW again. I hate VLIW also and don't think that Elbrus has a wide perspective.

JamesM · **Posted:** Thu May 03, 2012 6:56 am

Quote:

armv8 seems fairly good to me.

I agree! <totally-not-being-paid-to-say-that>

AndrewAPrice · **Posted:** Thu May 03, 2012 7:24 am

I'd like slower but may cores.

e.g.
3,000 1 Mhz cores than 1 3GHz core.

sandras · **Joined:** Thu Nov 03, 2011 9:30 am **Posts:** 146

MessiahAndrw wrote:

I'd like slower but may cores.

e.g.
3,000 1 Mhz cores than 1 3GHz core.

Ofc, that wont run as fast as one 3Ghz core. ; )

OSwhatever · **Joined:** Mon Jul 05, 2010 4:15 pm **Posts:** 595

berkus wrote:

armv8 seems fairly good to me.

as well as most superparallel processors like Cell and GPGPUs.

hardware quirks are inevitable, but at least it's not as braindead as x86.

Reading the ARMv8 specification it seems like the chose to head towards more complex ISA. It is not really a reduced instruction set and the number of instruction doing the same like x86 has increased. There is probably a good reason for this however from the programmer point of view it feels bloated.

I think in order to support instruction parallelism, that should be done by the compiler and not on the fly. The advantage is that you get a more generic ISA that optimizes itself but it adds a lot more complexity. Today I would take a design decision similar to Tensilica and Tilera in order to support OOE, that is compiler optimized OOE. This is important in order to reduce switching.

Also, one thing that is important for the OS programmer is having a convenient memory architecture.

A few things that I think a CPU should have:

Memory should be treated the same and you should not have to quirk with cache and uncached memory depending shared memory ability.
The CPU should handle cache operation in one instruction and the CPU itself should know about the cache configuration. No cache operation loops thank you. OS knowing about cache geometry isn't usually not necessary.
I want new synchronization primitives to become standard, examples are AMD advanced synchronization facility and Intel Transactional Synchronization Extensions.
The architecture shall conveniently support nested interrupts without the SW having to check and assign a stack every time an interrupt occurs.
Recursive page tables.
Memory mapped IO only (Intel is pretty alone support both ports and memory IO)
The ISA should handle any instruction pointer offset deviations due to pipelines, meaning a jump of PC + 16 really means a jump of 16 bytes forward and not 16 bytes +- whatever the pipeline think it should be. If the architecture reports an exception happened at PC = address, that means it happened at that address and not that address +- something.

iansjack · **Posted:** Thu May 03, 2012 8:22 am

The 68000s were nice, as were the Power PCs. Plenty of registers and sensibly consistent instruction sets. Why have most of the elegant architectures been driven out by ugly monsters like the x86s?

Before that the 6809 was my favourite.

Ready4Dis · **Joined:** Sat Nov 18, 2006 9:11 am **Posts:** 571

I like simplicity as well. All the quirks of x86 and backwards compatibility is pretty obnoxious. Just getting into long mode (64-bit) is a chore. I understand backward compatibility, but do we really need 16-bit support? If you need a 16-bit support, use a 16-bit cpu. Also, I'm curious as to how many x86 opcodes are left unused in 99% of software (for example, most compilers have there way of doing things, and will never output certain opcodes). Why not shrink the instruction set down a bit and save some space. Or, give us access to the microcode (aka, direct access to the RISC architecture that is behind the CISC interface), so we could create our own instruction sets that could be more efficient for what we're doing (yeah, like they would ever give us that much access).
Anyways, I like how most microcontrollers are setup, they just kind of startup and run, not as much to getting it working as there is in a full blown PC (a lot of that is because the lack of peripherals I know).

JuEeHa · **Joined:** Thu Mar 10, 2011 4:24 am **Posts:** 30

Haters gonna hate but I kinda like x86-16 althought I like some RISC architectures more. I think that x86-32 and x86-64 are horrible mess. My dream CPU would have following things:
-RISC architecture
-Flat memory model with logical to real address translation being realAddr=addrBase+logicalAddr and if addrBase+logicalAddr would be more than addrTop it would generate interrupt
*addrBase and addrTop are changeable in supervisor mode where logical to real address translation would be realAddr=logicalAddr
-Two protection modes (user, supervisor)
-Supervisor interrupt (interrupt 0 in interrupt vector) would be run in super visor mode, everything else in user mode
-Memory mapped IO

sandras · **Joined:** Thu Nov 03, 2011 9:30 am **Posts:** 146

Agree on memory mapped IO. Also I would not say that kind of machine would be my dream machine, but I do adore my old 486.

Combuster · **Posted:** Thu May 03, 2012 11:29 am

Ready4Dis wrote:

Also, I'm curious as to how many x86 opcodes are left unused in 99% of software

As far as userland software goes, the score is actually getting better over time:
8086: 116/164 unused.
186: 6/7 unused
286: only added privilege and protected mode stuff.
386: 21/31 unused
486: 6/6 unused (cmpxchg is not part of any compiler)
586: 6/6 unused on the integer unit. Knowing that Windows' software blitter had MMX optimisations, you'll actually find a good use for the majority of that extension.
686: 4/31 unused (conditional mov ftw, sysenter/sysexit are also actually used although always manual work)

and then came SSE, and any self-respecting compiler can choose to do all floating ops on the flat register file rather than x87 without resorting to vectorisation, so the actual waste is minimal. There are also enough pixel pumps and video decoders around to cover the better part of the integer vector instructions.

Rudster816 · **Joined:** Thu Jun 17, 2010 2:36 am **Posts:** 141

iansjack wrote:

The 68000s were nice, as were the Power PCs. Plenty of registers and sensibly consistent instruction sets. Why have most of the elegant architectures been driven out by ugly monsters like the x86s?

Before that the 6809 was my favourite.

x86 is still cream of the crop when it comes to overall performance, and it has a nearly impregnable lead in software support for the PC platform. The only reason it's so quirky\ugly is because it was originally designed like it was going to be completely replaced in a couple of years like most CPU architectures of the day. The original 8086\8088 were damn good all things considered, and IBM's decision to use the 8088 in the original PC pretty much brought death up on all other PC architectures.

MessiahAndrw wrote:

I'd like slower but may cores.

e.g.
3,000 1 Mhz cores than 1 3GHz core.

Would be many orders of magnitude slower than it's 1 3Ghz core counterpart. It would also never fit in a single chip, as frequency has nothing to do silicon usage.

http://en.wikipedia.org/wiki/Amdahl%27s_law

Sandras wrote:

Like in every other field, I like simplicity.

And having a silent pc would be nice, meaning, make it slower, but maybe multicore, but cooler.

I'd say your typical smart phone is silent, yet an extremely powerful computer. A PC based around a beefy ARM chip would easily provide enough juice, and be totally passive cooled. Even low end x86 chips can be passively cooled with large heatsinks.

OSwhatever wrote:

Reading the ARMv8 specification it seems like the chose to head towards more complex ISA. It is not really a reduced instruction set and the number of instruction doing the same like x86 has increased. There is probably a good reason for this however from the programmer point of view it feels bloated.

I think in order to support instruction parallelism, that should be done by the compiler and not on the fly. The advantage is that you get a more generic ISA that optimizes itself but it adds a lot more complexity. Today I would take a design decision similar to Tensilica and Tilera in order to support OOE, that is compiler optimized OOE. This is important in order to reduce switching.

Also, one thing that is important for the OS programmer is having a convenient memory architecture.

A few things that I think a CPU should have:

Memory should be treated the same and you should not have to quirk with cache and uncached memory depending shared memory ability.
The CPU should handle cache operation in one instruction and the CPU itself should know about the cache configuration. No cache operation loops thank you. OS knowing about cache geometry isn't usually not necessary.
I want new synchronization primitives to become standard, examples are AMD advanced synchronization facility and Intel Transactional Synchronization Extensions.
The architecture shall conveniently support nested interrupts without the SW having to check and assign a stack every time an interrupt occurs.
Recursive page tables.
Memory mapped IO only (Intel is pretty alone support both ports and memory IO)
The ISA should handle any instruction pointer offset deviations due to pipelines, meaning a jump of PC + 16 really means a jump of 16 bytes forward and not 16 bytes +- whatever the pipeline think it should be. If the architecture reports an exception happened at PC = address, that means it happened at that address and not that address +- something.

How exactly would accomplish an OS not having to worry about the CPU's caches, while using memory based IO? I think the simplest solution would be to have separate load\store instructions that don't use the cache, but this means the CPU cache hierarchy isn't 100% invisible. You also have to provide a way to update the TLB. It would be absolutely debilitating for the CPU to have to check every single TLB entry against every store operation to see if it needed to update something. You also have to worry about write through caches on CPU's that don't snoop other CPU's caches. Snooping can improve performance on systems with smaller amount of caches and very slow memory access, but once you put enough CPU's in the system, snooping becomes a significant bottleneck. There are also major engineering challenges in order to make snooping efficient.

Pipelines do not affect the semantics of the ISA in any way. ISA's are black box abstractions, meaning that what is actually happening at the hardware level doesn't matter. Whatever the ISA says will happen, will happen, unless there's a bug. Even in out of order superscalars instructions are atomically committed in program order. Exceptions\Interrupts\Branch mispredictions only cause the CPU to have to throw away some work it's already done.

I have no idea what you mean by supporting nested interrupts, as I know x86 already does. You don't have to switch stacks either. All you have to do is enable interrupts in the very beginning of your interrupt handler and make sure the entire thing is interrupt safe. This is usually not what's desired though because interrupt handlers are inherently interrupt unsafe, and can sometimes need tight timing constraints. A good handler will do anything that is interrupt unsafe in the very beginning of the ISR and then enable interrupts, and never have to disable them again.

A couple things that I plan on doing on my CPU are user mode exceptions (for certain ones), hardware bounds checking, dedicated program base for PIC, separate paging hierarchies for User\Kernel pages, and getting rid of the concept of software interrupts (opting for syscall like only).

AndrewAPrice · **Posted:** Thu May 03, 2012 11:47 am

Ready4Dis wrote:

Also, I'm curious as to how many x86 opcodes are left unused in 99% of software

I remember when I was studying compilers, I was reading how backends tended to limit themselves to a common subset found on all processors (mov, add, jmp, etc) while x86 unique stuff (majority of extensions like SSE, and instructions like loopew, movsw, cmov) were generally left untouched (sometimes exposed in the form of intrinsic functions).

I wonder if some of these higher level instructions (e.g. scasb) are actually faster than doing the code yourself (e.g. mov, cmp, jnz, etc).

Either, scansb is purely executed in microcode, so it runs many times faster than the handcoded method because the microcode must interpret it, or x86 instructions are JITed into Microcode, so scansb generates the exact same set of microcode instructions as the handcoded method.

Ready4Dis wrote:

Or, give us access to the microcode (aka, direct access to the RISC architecture that is behind the CISC interface), so we could create our own instruction sets that could be more efficient for what we're doing (yeah, like they would ever give us that much access).

Would be nice, but I'm sure there would be huge variations to the microcode ISA between different processors. Even higher models may have significant changes that break compatibility. You'd have to support individual processors.

OSDev.org

OSDev's dream CPU

Who is online