iansjack wrote:
The 68000s were nice, as were the Power PCs. Plenty of registers and sensibly consistent instruction sets. Why have most of the elegant architectures been driven out by ugly monsters like the x86s?
Before that the 6809 was my favourite.
x86 is still cream of the crop when it comes to overall performance, and it has a nearly impregnable lead in software support for the PC platform. The only reason it's so quirky\ugly is because it was originally designed like it was going to be completely replaced in a couple of years like most CPU architectures of the day. The original 8086\8088 were damn good all things considered, and IBM's decision to use the 8088 in the original PC pretty much brought death up on all other PC architectures.
MessiahAndrw wrote:
I'd like slower but may cores.
e.g.
3,000 1 Mhz cores than 1 3GHz core.
Would be many orders of magnitude slower than it's 1 3Ghz core counterpart. It would also never fit in a single chip, as frequency has nothing to do silicon usage.
http://en.wikipedia.org/wiki/Amdahl%27s_lawSandras wrote:
Like in every other field, I like simplicity.
And having a silent pc would be nice, meaning, make it slower, but maybe multicore, but cooler.
I'd say your typical smart phone is silent, yet an extremely powerful computer. A PC based around a beefy ARM chip would easily provide enough juice, and be totally passive cooled. Even low end x86 chips can be passively cooled with large heatsinks.
OSwhatever wrote:
Reading the ARMv8 specification it seems like the chose to head towards more complex ISA. It is not really a reduced instruction set and the number of instruction doing the same like x86 has increased. There is probably a good reason for this however from the programmer point of view it feels bloated.
I think in order to support instruction parallelism, that should be done by the compiler and not on the fly. The advantage is that you get a more generic ISA that optimizes itself but it adds a lot more complexity. Today I would take a design decision similar to Tensilica and Tilera in order to support OOE, that is compiler optimized OOE. This is important in order to reduce switching.
Also, one thing that is important for the OS programmer is having a convenient memory architecture.
A few things that I think a CPU should have:
Memory should be treated the same and you should not have to quirk with cache and uncached memory depending shared memory ability.
The CPU should handle cache operation in one instruction and the CPU itself should know about the cache configuration. No cache operation loops thank you. OS knowing about cache geometry isn't usually not necessary.
I want new synchronization primitives to become standard, examples are AMD advanced synchronization facility and Intel Transactional Synchronization Extensions.
The architecture shall conveniently support nested interrupts without the SW having to check and assign a stack every time an interrupt occurs.
Recursive page tables.
Memory mapped IO only (Intel is pretty alone support both ports and memory IO)
The ISA should handle any instruction pointer offset deviations due to pipelines, meaning a jump of PC + 16 really means a jump of 16 bytes forward and not 16 bytes +- whatever the pipeline think it should be. If the architecture reports an exception happened at PC = address, that means it happened at that address and not that address +- something.
How exactly would accomplish an OS not having to worry about the CPU's caches, while using memory based IO? I think the simplest solution would be to have separate load\store instructions that don't use the cache, but this means the CPU cache hierarchy isn't 100% invisible. You also have to provide a way to update the TLB. It would be absolutely debilitating for the CPU to have to check every single TLB entry against every store operation to see if it needed to update something. You also have to worry about write through caches on CPU's that don't snoop other CPU's caches. Snooping can improve performance on systems with smaller amount of caches and very slow memory access, but once you put enough CPU's in the system, snooping becomes a significant bottleneck. There are also major engineering challenges in order to make snooping efficient.
Pipelines do not affect the semantics of the ISA in any way. ISA's are black box abstractions, meaning that what is actually happening at the hardware level doesn't matter. Whatever the ISA says will happen, will happen, unless there's a bug. Even in out of order superscalars instructions are atomically committed in program order. Exceptions\Interrupts\Branch mispredictions only cause the CPU to have to throw away some work it's already done.
I have no idea what you mean by supporting nested interrupts, as I know x86 already does. You don't have to switch stacks either. All you have to do is enable interrupts in the very beginning of your interrupt handler and make sure the entire thing is interrupt safe. This is usually not what's desired though because interrupt handlers are inherently interrupt unsafe, and can sometimes need tight timing constraints. A good handler will do anything that is interrupt unsafe in the very beginning of the ISR and then enable interrupts, and never have to disable them again.
A couple things that I plan on doing on my CPU are user mode exceptions (for certain ones), hardware bounds checking, dedicated program base for PIC, separate paging hierarchies for User\Kernel pages, and getting rid of the concept of software interrupts (opting for syscall like only).