mariuszp wrote:
...an increase in clock speed which is to be expected might remove the performance drop completely.
Well... not really.
You see, from a hardware point of view, the issue is this:
With ever-increasing clock speeds, the number of things you could do in one clock cycle went down. The answer was to design longer pipelines, with each step ("stage") in the pipeline doing less, but with the clock speed going up to the point where the overall processing speed increased.
This in turn brought the problem of pipeline stalls. When the front of the pipeline encounters a branch whose condition is still being computed in later stages, it has to wait until the condition is resolved before it knows which side of the branch to feed into the pipeline. Much like flushing a buffer, this waiting was costly in terms of performance.
The answer was branch prediction and speculative execution. You fed the "most likely" side of the branch into the pipeline, executing it speculatively. If the guess turned out to be correct, you avoided the stall; if it was wrong, you threw away the speculative work and were no worse off than if you had stalled.
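You can actually see the predictor at work from userspace. Below is a minimal sketch in plain C (nothing platform-specific; the exact timings and the size of the effect depend on your CPU, compiler, and flags, and with aggressive optimization the compiler may replace the branch with a conditional move or vectorize the loop away, so something like -O1 keeps the branch intact). It times the same data-dependent branch over random data and then over the same data sorted. Same instructions, same amount of work; the sorted run is typically much faster, because the branch becomes predictable:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static double time_loop(const int *data, long *out_sum)
{
    clock_t t0 = clock();
    long sum = 0;
    for (int i = 0; i < N; i++)
        if (data[i] >= 128)   /* the branch under test */
            sum += data[i];
    *out_sum = sum;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    int *data = malloc(N * sizeof *data);
    if (!data)
        return 1;
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;   /* random values: branch taken ~50% of the time, no pattern */

    long sum;
    double t = time_loop(data, &sum);
    printf("unsorted: %.3f s (sum=%ld)\n", t, sum);

    qsort(data, N, sizeof *data, cmp_int);   /* same values, but now the branch pattern is learnable */
    t = time_loop(data, &sum);
    printf("sorted:   %.3f s (sum=%ld)\n", t, sum);

    free(data);
    return 0;
}

On typical hardware the unsorted run takes several times longer, even though both loops execute exactly the same instructions on exactly the same values.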
If you get rid of branch prediction, and increase the clock speed (which more or less requires lengthening the pipeline, among other things), your pipeline stalls will just get worse.
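Which is also why performance-sensitive code sometimes goes branchless instead: a conditional move has no branch to mispredict, so it costs the same whether the data has a pattern or not. Here's the inner loop of the sketch above rewritten that way (whether the compiler actually emits a conditional move for the ternary is up to it, but it commonly does):

/* Branchless variant of time_loop's inner loop: the ternary is often
   lowered to a conditional-move instruction, so there is no branch to
   mispredict and the timing no longer depends on the data's order. */
static double time_loop_branchless(const int *data, long *out_sum)
{
    clock_t t0 = clock();
    long sum = 0;
    for (int i = 0; i < N; i++)
        sum += (data[i] >= 128) ? data[i] : 0;
    *out_sum = sum;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}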
So it's not as easy as that. And we're pretty much at the point of clock speed saturation. Have you noticed that clock speeds long ago stopped increasing the way they used to? Recent performance increases have mostly come from architectural tweaks, not clock speed increases. And many of those tweaks were, you guessed it, better branch prediction...