TightCoderEx wrote:
Inarguably, the fewer instructions, the faster the throughput.
Nope. That's not true at all; modern CPUs have very complex, non-obvious performance characteristics. Your hand-crafted assembly with the minimum possible number of instructions could very easily be slower than a longer, more complex set of instructions produced by a compiler.
For an incredibly simple example, consider that on most x86 CPUs the "LOOP" instruction is slower than a "DEC, JNZ" pair. For a more complex example, consider that the fastest way to copy a large block of memory on a modern x86 CPU is with a fairly complex SSE-based algorithm, which outperforms the shortest possible algorithm by an order of magnitude.
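To make the memory-copy case concrete, here is a rough sketch in C rather than assembly. The function names copy_bytewise and copy_library are made up for illustration. The first is the shortest obvious copy; the second just calls memcpy, whose implementation is free to use a much longer and more elaborate strategy (wide vector loads and stores, for instance). Both give the same result, and which one is faster depends on the CPU, the C library and the block size, so measure rather than assume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shortest obvious copy: one byte per iteration.
   (Hypothetical helper, named for this example only.) */
void copy_bytewise(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Library copy: memcpy may choose a wider, more complex strategy
   internally. The buffers must not overlap. */
void copy_library(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}
```

Timing both over a large buffer on your own machine is the only reliable way to see the difference; the instruction count of the source tells you very little.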
These days, CPU performance is complex, and equally complex modern compilers are often better at producing fast code than all but the very best assembly-language programmers. Not only that, but if the compiler is doing the optimisation, the higher-level source code can be kept simpler and easier to maintain, unlike hand-written assembly. Also, if a new optimisation technique is discovered, updating a compiler to take advantage of it is far easier than rewriting assembly.
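As a small illustration of leaving the optimisation to the compiler, here is a minimal sketch of a plain counted loop in C (sum_u32 is a made-up name). The source says nothing about which instructions control the loop or whether it gets unrolled or vectorised; an optimising compiler makes those choices for the target CPU, and may make different ones in a later version without the source changing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain counted loop: no instruction selection is expressed here.
   (Hypothetical helper, named for this example only.) */
uint64_t sum_u32(const uint32_t *values, size_t count)
{
    assert(values != NULL || count == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

With GCC or Clang, compiling with -O2 -S writes out the generated assembly, so you can see what the compiler actually chose on your setup.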
While writing optimised assembly can be a fun and rewarding exercise, and a great way of learning about the low-level operation of the CPU, in terms of real-world performance it's often harmful.