I agree that performance is a big problem in any situation where native code is not being executed by a physical machine. But I'm trying to compare a virtual machine, where the CPU and all hardware is emulated (or virtualized), versus only emulating the CPU, and accessing the physical hardware more-or-less directly.
You have to have some kind of device driver (possibly including "fake" devices, like "/dev/loopback" or whatever), you have to have "zero or more" layers on top of that (to allow multiple applications to share the same device if/where necessary), and you have to have some kind of software interface that applications can use (to access the device via its driver or the layers on top). If you want, you can pretend that the software interface is a "virtual emulated device", but this is just meaningless word games - it changes nothing, it's still some kind of software interface.
The virtual CPU could be a "lowest common denominator" type processor that can only do a few simple things that are guaranteed to be available natively on 99% of devices. Or, taken to the extreme, it could be a single instruction set processor that would completely give up all performance for the ability to run on any device, even the simplest battery powered toy.
Just to advocate for a second, the single instruction set CPU does have a few other advantages that I think are worth mentioning. First off, although emulating an OISC processor on a typical PC would have dramatic performance problems, a physical OISC processor could, in fact, run at an extremely high clock speed. I'm basing this entirely on the emergence and evolution of ASICs designed specifically for bitcoin mining, which are essentially single-purpose processors that calculate hash codes extremely fast -- up to 1000x faster than a modern CPU for roughly the same price.
It probably costs about $5,000,000+ (in design, prototypes, validation, fab setup costs, etc.) to produce a chip that is even slightly competitive. That cost has to be amortised - if you sell 5,000,000 chips you add $1 to the price of each chip to recover the cost, and if you sell 10 chips you add $500,000 to the price of each chip. For OISC, you will never find enough people willing to use it (even if you give the chips away for free) and will never get the price down to anything close to commercially viable.
Yes, you might (as a hypothetical fantasy) be able to achieve "4 instructions per cycle at 10 giga-cycles per second" (or 40 billion instructions per second). This sounds "nice" until you realise that to get any work done the software will need thousands of times more instructions, and that it's still slower than a 50 MHz ARM CPU you could've bought for $1.
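To put rough numbers on that (the 1000x blowup factor here is an assumption for illustration, not a measured figure):

```python
# Back-of-envelope comparison using assumed figures.
oisc_ips = 4 * 10_000_000_000      # 4 instructions/cycle at 10 GHz = 40 billion/sec
blowup = 1000                      # assumed OISC instructions per native-equivalent op
effective_ips = oisc_ips // blowup # "useful" native-equivalent operations per second

arm_ips = 50_000_000               # 50 MHz ARM, roughly one instruction per cycle

print(effective_ips)  # 40000000 - already below the $1 ARM part
```

Even granting the fantasy clock speed, the instruction blowup eats the entire advantage before you've paid for a single wafer.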
If you don't believe me, show me the "one instruction code" that does the equivalent of a normal CPU's floating point addition instruction (scalar, not SIMD). Before showing me this code, think about the number of branches you couldn't avoid and the performance problems that branch mis-predictions cause.
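To make the branchiness concrete, here's a rough sketch of scalar float addition done with integer operations (positive normalised IEEE-754 singles only - no rounding, no NaN/infinity/subnormal handling, which a real implementation would need even more branches for):

```python
import struct

def f32_bits(x):
    """View a Python float as IEEE-754 binary32 bits."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_f32(b):
    """View IEEE-754 binary32 bits as a Python float."""
    return struct.unpack('<f', struct.pack('<I', b))[0]

def soft_add(x, y):
    # Unpack: exponent and mantissa (with the implicit leading 1 restored).
    bx, by = f32_bits(x), f32_bits(y)
    ex, ey = (bx >> 23) & 0xFF, (by >> 23) & 0xFF
    mx, my = (bx & 0x7FFFFF) | 0x800000, (by & 0x7FFFFF) | 0x800000
    # Align: branch #1 - ensure x holds the larger exponent.
    if ex < ey:
        ex, ey, mx, my = ey, ex, my, mx
    my >>= (ex - ey)
    # Add mantissas.
    m = mx + my
    e = ex
    # Normalise: branch #2 - the sum may have carried into bit 24.
    if m & 0x1000000:
        m >>= 1
        e += 1
    return bits_f32((e << 23) | (m & 0x7FFFFF))
```

Two unavoidable data-dependent branches for the easy cases alone - and on an OISC, every one of these integer steps itself expands into subtract-and-branch sequences.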
The other big advantage would be that true compile-once, run-anywhere would finally be a reality. You could literally run the same OS and same applications on your server, your laptop, your router, your TV, your phone, your watch, and your toothbrush. The only difference would be the drivers that would be loaded into memory at run time.
Note that intermediate representations that aren't massive performance disasters (CIL, LLVM bitcode, Java byte-code, ...) are also capable of true "compile-once, run-anywhere".
And I think the most important aspect of any new technology is the ability to try it out before you buy it, which is certainly possible in this case: writing a virtual machine that executes only one instruction is something virtually everyone on this site could do in a few hours.
Maybe I'm wrong, but I definitely can see the potential for this type of "technology" to become the next "evolution" of computers in our lifetime.
A virtual machine using "pure interpretation" for OISC would be very easy to write; a virtual machine for OISC that is capable of getting performance better than "10000 times slower than Java" would take decades of work (assuming that's actually possible in practice).
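For what it's worth, the "very easy to write" half is demonstrable. A pure-interpreted SUBLEQ (subtract and branch if less than or equal to zero) VM - the usual textbook OISC - fits in a dozen lines, and even a single integer add already costs three SUBLEQ instructions:

```python
def subleq_run(mem, pc=0, max_steps=1000):
    """Pure-interpreted SUBLEQ OISC: mem[b] -= mem[a]; jump to c if result <= 0.
    A negative pc halts the machine. Returns the instruction count."""
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return steps

# Program: Y += X, using addresses 9 (X), 10 (Y), 11 (scratch, initially 0).
mem = [
    9, 11, 3,    # scratch -= X        -> scratch = -X, continue at 3
    11, 10, 6,   # Y -= scratch        -> Y = Y + X, continue at 6
    11, 11, -1,  # scratch -= scratch  -> 0, then halt (jump to -1)
    7, 5, 0,     # data: X = 7, Y = 5, scratch = 0
]
steps = subleq_run(mem)
print(mem[10], steps)  # 12 3
```

Interpreting each of those triples costs the host CPU dozens of instructions (loads, a subtract, a compare, an unpredictable branch), which is exactly where the "thousands of times slower" factor comes from - and why closing that gap with a clever JIT is the hard part.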