OSDev.org

The Place to Start for Operating System Developers

Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Thu Jan 22, 2015 6:44 pm
Joined: Fri Jun 13, 2008 3:21 pm | Posts: 1700 | Location: Cambridge, United Kingdom

Brendan wrote:
Except, it's highly misleading. All these CPUs use different clock speeds, have different cache sizes and speeds, different RAM speeds, etc.

Here are your results, with the original score divided by the clock speed:

  • 2006 Core 2 (Q9650): 3810/3 = 1270
  • 2008 Nehalem (860): 6230/2.8 = 2225, difference from previous = 75%
  • 2011 Sandy Bridge (i7 2600): 7970/3.4 = 2344, difference from previous = 5%
  • 2013 Haswell (i7 4790): 9540/3.5 = 2725, difference from previous = 16%


And what relevance does that normalization have? If you took Core 2 and built it on Haswell's process it wouldn't miraculously run at 5GHz all the time. Clock speed is largely irrelevant - it doesn't really matter whether you're running at 2.5GHz or 5GHz if you're achieving the same number of instructions per unit of time.
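
For anyone who wants to check the arithmetic, here's a quick sketch (Python, using only the scores and clock speeds quoted above) that prints both the raw generation-to-generation gains and the per-GHz gains being argued about:

Code:
# Quick check of the figures quoted above: raw benchmark score vs. score
# normalised by clock speed (score / GHz), with generation-to-generation gains.
chips = [
    ("Core 2 (Q9650)",         3810, 3.0),
    ("Nehalem (860)",          6230, 2.8),
    ("Sandy Bridge (i7 2600)", 7970, 3.4),
    ("Haswell (i7 4790)",      9540, 3.5),
]

prev_raw = prev_norm = None
for name, score, ghz in chips:
    norm = score / ghz
    if prev_raw is not None:
        raw_gain = 100.0 * (score / prev_raw - 1.0)
        norm_gain = 100.0 * (norm / prev_norm - 1.0)
        print(f"{name}: raw +{raw_gain:.0f}%, per-GHz +{norm_gain:.0f}%")
    prev_raw, prev_norm = score, norm

The raw column is the "time to finish the benchmark on the chip you can actually buy" argument; the per-GHz column is the "work done per cycle" argument.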

Brendan wrote:
Note that the biggest change/improvement Intel made for Nehalem was bringing the memory controller onto the same chip as the CPU, which caused a major improvement in RAM access times. I very much doubt that the jump in benchmark score for Nehalem is a coincidence, and that implies their benchmark is heavily affected by memory bandwidth and/or latency. Nehalem and Sandy Bridge both used DDR3-1066/1333, and the (normalised) performance difference between them is minor. Haswell uses DDR3-1333/1600, and with a 20% increase in RAM speed (over Sandy Bridge) we get a 16% increase in (normalised) benchmark score. What a surprise!


OK, let's consult a different benchmark: SUPERCOP. Specifically (because I don't have all day), I'll pick the Salsa20 stream cipher (which is one of the newly picked TLS 1.2 ciphers, because it's very amenable to software implementation) on a 4096 byte message. One will note that a 4096 byte message will quite easily fit inside L1 cache.

Sandy Bridge manages 3.76 cycles per byte. Haswell manages 3.14 cycles per byte. A 20% improvement.
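
To put those cycles-per-byte figures in more familiar units, a small sketch using the base clock speeds from the earlier list (ignoring turbo):

Code:
# Convert the SUPERCOP Salsa20 figures quoted above (cycles per byte) into
# throughput, using the base clock speeds from the earlier list.
results = {
    "Sandy Bridge (3.4 GHz)": (3.76, 3.4e9),
    "Haswell (3.5 GHz)":      (3.14, 3.5e9),
}

for name, (cycles_per_byte, hz) in results.items():
    gib_per_sec = hz / cycles_per_byte / 2**30
    print(f"{name}: {1/cycles_per_byte:.3f} bytes/cycle, ~{gib_per_sec:.2f} GiB/s")

# Per-cycle improvement quoted above: 3.76 / 3.14 is roughly 1.20, i.e. ~20%.

The point of picking a 4096 byte message is that the whole working set stays in L1, so the ~20% is a per-cycle core improvement rather than a memory-system effect.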


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Thu Jan 22, 2015 10:23 pm
Joined: Sat Jan 15, 2005 12:00 am | Posts: 8561 | Location: At his keyboard!

Hi,

Owen wrote:
Brendan wrote:
Except, it's highly misleading. All these CPUs use different clock speeds, have different cache sizes and speeds, different RAM speeds, etc.

Here are your results, with the original score divided by the clock speed:

  • 2006 Core 2 (Q9650): 3810/3 = 1270
  • 2008 Nehalem (860): 6230/2.8 = 2225, difference from previous = 75%
  • 2011 Sandy Bridge (i7 2600): 7970/3.4 = 2344, difference from previous = 5%
  • 2013 Haswell (i7 4790): 9540/3.5 = 2725, difference from previous = 16%


And what relevance does that normalization have? If you took Core 2 and built it on Haswell's process it wouldn't miraculously run at 5GHz all the time. Clock speed is largely irrelevant - it doesn't really matter whether you're running at 2.5GHz or 5GHz if you're achieving the same number of instructions per unit of time.


With the new process Intel can (and have) reduced power consumption; but the newer process doesn't change the performance per cycle of the design and only really affects the "watts" part of performance per watt. If you want to pretend that comparing a 2.8 GHz CPU to a 3.5 GHz CPU is "fair", then why not compare a low-end 2.0 GHz Nehalem to an overclocked Haswell running at 4.0 GHz?

Owen wrote:
Brendan wrote:
Note that the biggest change/improvement Intel made for Nehalem was bringing the memory controller onto the same chip as the CPU, which caused a major improvement in RAM access times. I very much doubt that the jump in benchmark score for Nehalem is a coincidence, and that implies their benchmark is heavily affected by memory bandwidth and/or latency. Nehalem and Sandy Bridge both used DDR3-1066/1333, and the (normalised) performance difference between them is minor. Haswell uses DDR3-1333/1600, and with a 20% increase in RAM speed (over Sandy Bridge) we get a 16% increase in (normalised) benchmark score. What a surprise!


OK, let's consult a different benchmark: SUPERCOP. Specifically (because I don't have all day), I'll pick the Salsa20 stream cipher (which is one of the newly picked TLS 1.2 ciphers, because it's very amenable to software implementation) on a 4096 byte message. One will note that a 4096 byte message will quite easily fit inside L1 cache.

Sandy Bridge manages 3.76 cycles per byte. Haswell manages 3.14 cycles per byte. A 20% improvement.


Sigh.

For Haswell, the decoders weren't noticeably improved, branch prediction wasn't noticeably improved, integer ops weren't noticeably improved, floating point ops weren't noticeably improved, MMX/SSE/AVX wasn't noticeably improved (other than the fused multiply-add extensions). There were minor improvements everywhere, but nothing noticeable.

The only noticeable improvements were in the memory hierarchy and not in the core itself. This includes doubling the bandwidth of the L1 cache, adding transactional memory extensions, and then removing transactional memory extensions.


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Fri Jan 23, 2015 12:05 am
Joined: Wed Jan 06, 2010 7:07 pm | Posts: 792

Comparing a 2.8GHz CPU to a 3.5GHz CPU is perfectly fair in this context. Your original claim:
Brendan wrote:
For single-thread performance, a lot of people (including me) think Intel can't make cores go much faster. In the last 10 years clock speeds have gone from ~3.5 GHz Pentium 4 all the way up to ~3.5 GHz Haswell; and while Intel have found ways of optimising the CPU's "instructions per cycle" they've been getting diminishing returns for a long while now
has been contradicted by legitimate benchmarks. Clock speed went up steadily for the top non-overclocked processors of each generation. Instructions per cycle went up by the same amount every 2-3 years for non-memory-bound benchmarks. Considering Haswell also lowered power usage, you don't have a leg to stand on here (without moving the goalposts, of course).

It doesn't matter if nothing "noticeably improved" according to you if the numbers go up like that. Of course this doesn't mean anyone should just give up on parallel processing and ride the free single-threaded performance increase train. What it does mean is that claims we've run out of single-thread performance are wrong.

_________________
[www.abubalay.com]


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Fri Jan 23, 2015 4:57 am
Joined: Sat Jan 15, 2005 12:00 am | Posts: 8561 | Location: At his keyboard!

Hi,

Rusky wrote:
Comparing a 2.8GHz CPU to a 3.5GHz CPU is perfectly fair in this context. Your original claim:
Brendan wrote:
For single-thread performance, a lot of people (including me) think Intel can't make cores go much faster. In the last 10 years clock speeds have gone from ~3.5 GHz Pentium 4 all the way up to ~3.5 GHz Haswell; and while Intel have found ways of optimising the CPU's "instructions per cycle" they've been getting diminishing returns for a long while now
has been contradicted by legitimate benchmarks. Clock speed went up steadily for the top non-overclocked processors of each generation. Instructions per cycle went up by the same amount every 2-3 years for non-memory-bound benchmarks. Considering Haswell also lowered power usage, you don't have a leg to stand on here (without moving the goalposts, of course).


Clock speed didn't go up. The only thing that happened is that (for Sandy Bridge) Owen chose benchmark results for a 2.8 GHz chip and not for a 3.5 GHz chip. I suspect Owen deliberately tried to choose CPUs with the same TDP, which would've made sense if we were talking about performance/watt and not raw single-thread performance.

If you really want to look at frequency alone: Netburst reached 3.8 GHz, Core 2 reached 3.33 GHz (Q9000), Nehalem reached 3.33 GHz (975), Sandy Bridge reached 3.5 GHz (3970X), and Haswell has reached 3.5 GHz (5930K). I guess this means that I was wrong - clock speeds didn't remain at 3.5 GHz from Pentium 4 to Haswell, they actually got slower (3.8 GHz down to 3.5 GHz). They didn't go up steadily; they "stayed about the same, lumpishly".

Rusky wrote:
It doesn't matter if nothing "noticeably improved" according to you if the numbers go up like that. Of course this doesn't mean anyone should just give up on parallel processing and ride the free single-threaded performance increase train. What it does mean is that claims we've run out of single-thread performance are wrong.


Mostly, all I'm saying is that for single-threaded performance we're well into "diminishing returns". From 80386 to Pentium M we saw jumps of 100% to 400% faster as Intel picked all the low-hanging fruit (clock speed, out-of-order execution, SIMD, branch prediction, prefetching, etc); now all the low-hanging fruit is gone and we're lucky to see pathetic 20% jumps. If you plotted "single-thread performance vs. time" on a graph it wouldn't be a perfectly smooth curve and there would be "bumps", but the curve is still clear and points towards a ceiling/asymptote for single-threaded performance that isn't far from what we've got now. Squabbling like children over the exact numbers doesn't change the long-term trend.
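
To make the "ceiling/asymptote" point concrete, here's a toy sketch; the 20% starting gain and the rate at which it shrinks are assumptions for illustration, not measurements:

Code:
# Toy model of diminishing returns: if each generation's single-thread gain
# is a fixed fraction of the previous generation's gain, the cumulative
# speedup converges to a ceiling. The starting gain (20%) and decay factor
# (0.75) are assumptions for illustration only.
gain = 0.20
decay = 0.75
perf = 1.0

for generation in range(1, 21):
    perf *= 1.0 + gain
    gain *= decay
    if generation in (1, 5, 10, 20):
        print(f"after {generation:2d} generations: {perf:.2f}x")

# Prints 1.20x, 1.77x, 2.04x, 2.13x - with these assumed numbers the curve
# flattens towards roughly 2.1x no matter how many more generations you add.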


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Fri Jan 23, 2015 2:18 pm
Joined: Fri Jun 13, 2008 3:21 pm | Posts: 1700 | Location: Cambridge, United Kingdom

Brendan wrote:
Hi,

Owen wrote:
Brendan wrote:
Except, it's highly misleading. All these CPUs use different clock speeds, have different cache sizes and speeds, different RAM speeds, etc.

Here are your results, with the original score divided by the clock speed:

  • 2006 Core 2 (Q9650): 3810/3 = 1270
  • 2008 Nehalem (860): 6230/2.8 = 2225, difference from previous = 75%
  • 2011 Sandy Bridge (i7 2600): 7970/3.4 = 2344, difference from previous = 5%
  • 2013 Haswell (i7 4790): 9540/3.5 = 2725, difference from previous = 16%


And what relevance does that normalization have? If you took Core 2 and built it on Haswell's process it wouldn't miraculously run at 5GHz all the time. Clock speed is largely irrelevant - it doesn't really matter whether you're running at 2.5GHz or 5GHz if you're achieving the same number of instructions per unit of time.


With the new process Intel can (and have) reduced power consumption; but the newer process doesn't change the performance per cycle of the design and only really affects the "watts" part of performance per watt. If you want to pretend that comparing a 2.8 GHz CPU to a 3.5 GHz CPU is "fair", then why not compare a low-end 2.0 GHz Nehalem to an overclocked Haswell running at 4.0 GHz?


So it would be unfair to compare a processor which ran at 5 GHz and accomplished 1 instruction per cycle against a processor which ran at 1 GHz and accomplished 5 instructions per cycle, because the former has a clock speed advantage?
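
(For the record, the arithmetic behind that question, with two purely hypothetical chips:)

Code:
# Single-thread throughput is clock speed multiplied by instructions per
# cycle; these two hypothetical processors are equally fast despite the 5x
# difference in clock speed.
def instructions_per_second(clock_hz, ipc):
    return clock_hz * ipc

print(instructions_per_second(5e9, 1.0))  # 5 GHz x 1 IPC = 5e9 instructions/s
print(instructions_per_second(1e9, 5.0))  # 1 GHz x 5 IPC = 5e9 instructions/s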

Have you characterized the properties of the process that the two processors were built on? Do you know what design parameters Intel used for the two cores?

Brendan wrote:

Owen wrote:
Brendan wrote:
Note that the biggest change/improvement Intel made for Nehalem was bringing the memory controller onto the same chip as the CPU, which caused a major improvement in RAM access times. I very much doubt that the jump in benchmark score for Nehalem is a coincidence, and that implies their benchmark is heavily affected by memory bandwidth and/or latency. Nehalem and Sandy Bridge both used DDR3-1066/1333, and the (normalised) performance difference between them is minor. Haswell uses DDR3-1333/1600, and with a 20% increase in RAM speed (over Sandy Bridge) we get a 16% increase in (normalised) benchmark score. What a surprise!


OK, let's consult a different benchmark: SUPERCOP. Specifically (because I don't have all day), I'll pick the Salsa20 stream cipher (which is one of the newly picked TLS 1.2 ciphers, because it's very amenable to software implementation) on a 4096 byte message. One will note that a 4096 byte message will quite easily fit inside L1 cache.

Sandy Bridge manages 3.76 cycles per byte. Haswell manages 3.14 cycles per byte. A 20% improvement.


Sigh.

For Haswell, the decoders weren't noticeably improved, branch prediction wasn't noticeably improved, integer ops weren't noticeably improved, floating point ops weren't noticeably improved, MMX/SSE/AVX wasn't noticeably improved (other than the fused multiply-add extensions). There were minor improvements everywhere, but nothing noticeable.

The only noticeable improvements were in the memory hierarchy and not in the core itself. This includes doubling the bandwidth of the L1 cache, adding transactional memory extensions, and then removing transactional memory extensions.


You have a weird definition of "core" if it doesn't include "L1 cache". Next you'll be saying things like TLBs and branch predictors are uncore.

A doubling of L1 cache bandwidth isn't a notable improvement in the performance of a core? Or is it somehow completely irrelevant that Intel managed to make a very important portion of their core twice as fast?


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Fri Jan 23, 2015 9:18 pm
Joined: Sat Jan 15, 2005 12:00 am | Posts: 8561 | Location: At his keyboard!

Hi,

Owen wrote:
Brendan wrote:
With the new process Intel can (and have) reduced power consumption; but the newer process doesn't change the performance per cycle of the design and only really affects the "watts" part of performance per watt. If you want to pretend that comparing a 2.8 GHz CPU to a 3.5 GHz CPU is "fair", then why not compare a low-end 2.0 GHz Nehalem to an overclocked Haswell running at 4.0 GHz?


So it would be unfair to compare a processor which ran at 5 GHz and accomplished 1 instruction per cycle against a processor which ran at 1 GHz and accomplished 5 instructions per cycle, because the former has a clock speed advantage?


If I took a Haswell CPU, underclocked it to 1 GHz and benchmarked it, then overclocked it to 5 GHz and benchmarked it again, would it be fair to pretend that IPC suddenly jumped based on the benchmark results? This wouldn't just be unfair, it'd be blatantly dishonest.

What if I benchmarked a 2.8 GHz Sandy Bridge and a 3.5 GHz Haswell and tried to pretend IPC jumped significantly? This wouldn't just be unfair, it'd be blatantly dishonest.

Owen wrote:
A doubling of L1 cache bandwidth isn't a notable improvement in the performance of a core? Or is it somehow completely irrelevant that Intel managed to make a very important portion of their core twice as fast?


They didn't make it twice as fast - it has the same latency as Sandy Bridge (4 cycles for both). There was no notable engineering breakthrough, no miraculous new cache management approach, no shift to a radically different circuit design. It's the same old stuff they've been doing for 10+ years; all they did was double the width (256 bits wide instead of 128 bits wide).
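
A small sketch of the latency-versus-bandwidth distinction being made here; the 4-cycle latency and the 128-bit vs 256-bit widths are the figures from this thread, while the two-loads-per-cycle figure is an assumption for the sketch:

Code:
# L1 data cache: load-to-use latency stays at 4 cycles, but doubling the
# width of each load doubles the bytes moved per cycle. The two-loads-per-
# cycle figure is an assumption for this sketch.
LOAD_PORTS = 2
LATENCY_CYCLES = 4  # same on Sandy Bridge and Haswell

for name, load_width_bits in (("Sandy Bridge", 128), ("Haswell", 256)):
    bytes_per_cycle = LOAD_PORTS * load_width_bits // 8
    print(f"{name}: latency {LATENCY_CYCLES} cycles, "
          f"up to {bytes_per_cycle} load bytes/cycle")

# A pointer chase is latency-bound and sees no gain; wide streaming SIMD
# code is bandwidth-bound and can see up to twice the L1 throughput.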


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Fri Jan 23, 2015 10:11 pm
Joined: Wed Jan 06, 2010 7:07 pm | Posts: 792

Microarchitecture is the biggest part of what determines clock speed, so it's actually "blatantly dishonest" to try to correct for differences in clock speed. You can't just bump a 2.8GHz part up to 3.5GHz without causing all sorts of problems, unless it was deliberately underclocked, in which case the comparison still makes sense because you're also comparing power usage.

_________________
[www.abubalay.com]


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Sat Jan 24, 2015 4:43 am

Rusky wrote:
Microarchitecture is the biggest part of what determines clock speed

That would be true if the microarchitecture had changed significantly. But if the change is something like an L1 size extension, then your statement is wrong.

But if, as the lithography feature size decreases, the transistors' switching current also goes down, then it is possible to increase the clock rate. And that was true for some time. Today it seems that this approach has hit its limit.
Rusky wrote:
You can't just bump a 2.8GHz part up to 3.5GHz without causing all sorts of problems

But the "sorts of problems" is a consequence of additional transistors being thrown on the same silicon area. Additional transistors consume additional current (power) and clock rate should be decreased to compensate for the increased power consumption. Here a "new" microarchitecture (just some extra transistors for L1) introduction leads to clock rate limit decrease.

As long as the additional power consumption is a direct consequence of the "new" microarchitecture, it is unfair to decouple clock rate from the new design. But since new lithography still improves power consumption a bit (much more slowly than before), we should again extend the "unsplittable" part of the design and include the lithography in our equation. Next we can find even more parts that are inseparable from the design. All this leads to the need for some basic assessment technique that can be employed to compare processors. It seems reasonable to use an operations-per-second metric for a particular algorithm (task/goal/load pattern and so on), divided by power consumption and by the price of the processor. Then we can see that performance statistics are skewed when power consumption and price are left out, and all the discussion about GHz is skewed in the same manner.
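
A minimal sketch of the proposed figure of merit; every number below is a placeholder, purely to show the shape of the comparison:

Code:
# Proposed figure of merit: operations per second for a given workload,
# divided by power consumption and by price. All values are placeholders.
def figure_of_merit(ops_per_second, watts, price_usd):
    # higher is better: throughput per watt per dollar
    return ops_per_second / (watts * price_usd)

cpu_a = figure_of_merit(ops_per_second=9.5e9, watts=85, price_usd=300)
cpu_b = figure_of_merit(ops_per_second=8.0e9, watts=45, price_usd=200)
print(f"A: {cpu_a:.0f}  B: {cpu_b:.0f}")  # B wins despite the lower raw score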


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Sat Jan 24, 2015 1:15 pm
Joined: Sat Jan 15, 2005 12:00 am | Posts: 8561 | Location: At his keyboard!

Hi,

Rusky wrote:
Microarchitecture is the biggest part of what determines clock speed, so it's actually "blatantly dishonest" to try to correct for differences in clock speed. You can't just bump a 2.8GHz part up to 3.5GHz without causing all sorts of problems, unless it was deliberately underclocked, in which case the comparison still makes sense because you're also comparing power usage.


The 2.8 GHz Sandy Bridge was deliberately under-clocked (product binning).


Originally the Sandy Bridge micro-architecture was built on the older 32 nm manufacturing process; then Intel copied the same micro-architecture over to the newer 22 nm manufacturing process and called it "Ivy Bridge".

By comparing a 3.5 GHz Sandy Bridge to a 3.5 GHz Ivy Bridge you can see the difference that the change in manufacturing process made, in a very fair way (i.e. the same micro-architecture at the same clock speed): almost no difference in performance, but power consumption dropped from 95 W down to 77 W.

By comparing a 3.5 GHz Ivy Bridge to a 3.5 GHz Haswell you can see the difference that the changes to the micro-architecture made, in a very fair way (i.e. the same clock speed on the same manufacturing process). The difference? Power increased to 85 W (though I'd assume this is mostly due to GPU changes and not CPU changes) and speed increased a little (mostly due to the L1 cache's "width but not speed/latency" bandwidth improvement).
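
Put as a rough sketch using the TDP figures above (the relative-performance numbers are placeholders: roughly flat for the process shrink, a small assumed gain for the micro-architecture change):

Code:
# The two controlled comparisons described above, using the TDP figures from
# this post. Relative performance is a rough placeholder: ~unchanged for the
# 32 nm -> 22 nm shrink, an assumed ~5% gain for Ivy Bridge -> Haswell.
chips = [
    ("Sandy Bridge 3.5 GHz, 32 nm", 95, 1.00),
    ("Ivy Bridge 3.5 GHz, 22 nm",   77, 1.00),
    ("Haswell 3.5 GHz, 22 nm",      85, 1.05),  # assumed small gain
]

for name, tdp_w, rel_perf in chips:
    print(f"{name}: perf {rel_perf:.2f}x, "
          f"perf/W {rel_perf / tdp_w * 100:.2f} (arbitrary units)")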


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Sat Jan 24, 2015 2:35 pm
Joined: Wed Jan 06, 2010 7:07 pm | Posts: 792

Good thing he didn't use the 2.8GHz Sandy Bridge then, right?

Of course you want to control those variables when you're comparing the effects of one particular thing (microarchitecture, manufacturing process, etc.), but the claim under investigation here is, again,
Brendan wrote:
For single-thread performance, a lot of people (including me) think Intel can't make cores go much faster. ...while Intel have found ways of optimising the CPU's "instructions per cycle" they've been getting diminishing returns for a long while now
Any combination of features is a valid way to improve single-threaded performance, even including memory bandwidth (depending on the context). And single-threaded performance of, as Owen put it, the "highest listed variant of given family," (not the underclocked ones) has improved steadily with less diminishing returns than you claim (both including and not including memory bandwidth).

The real "diminishing returns" is that we don't get as massive of a free clock speedup when we shrink the manufacturing process anymore. Somehow Intel has continued to speed things up regularly without that though.

_________________
[www.abubalay.com]


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Sat Jan 24, 2015 8:00 pm
Joined: Sat Jan 15, 2005 12:00 am | Posts: 8561 | Location: At his keyboard!

Hi,

Rusky wrote:
Good thing he didn't use the 2.8GHz Sandy Bridge then, right?


My mistake - it was the "under-clocked" 2.8 GHz Nehalem that was used to artificially inflate the single-threaded performance difference between Nehalem and Sandy Bridge.

Rusky wrote:
Of course you want to control those variables when you're comparing the effects of one particular thing (microarchitecture, manufacturing process, etc.), but the claim under investigation here is, again,
Brendan wrote:
For single-thread performance, a lot of people (including me) think Intel can't make cores go much faster. ...while Intel have found ways of optimising the CPU's "instructions per cycle" they've been getting diminishing returns for a long while now
Any combination of features is a valid way to improve single-threaded performance, even including memory bandwidth (depending on the context). And single-threaded performance of, as Owen put it, the "highest listed variant of given family," (not the underclocked ones) has improved steadily with less diminishing returns than you claim (both including and not including memory bandwidth).


Why stop there? Why not say that, because SSDs have much lower seek times than old hard disks, single-threaded performance improved dramatically for certain benchmarks?

The real question is whether Intel can keep delivering these small ("less than twice as fast") single-threaded performance increases forever. They can't. Things like shifting the memory controller onto the same chip as the CPU can only happen once; and things like increasing the L1 data cache width (or SIMD register size, or the number of execution units, or...) have diminishing returns.

In the next ~8 years we'll reach 5 nm and won't be able to go smaller (due to both economics and physics); and we'll never see anything with twice the single-threaded performance of Haswell due to diminishing returns. We're already close to the limit of instruction level parallelism (out-of-order, speculative execution), already close to the limit of branch prediction and pre-fetching, already close to the limit of the memory hierarchy, etc. The only real trick Intel has left is bringing the RAM onto the same chip (which will cause another "one time only" bump in performance).


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Sat Jan 24, 2015 11:21 pm
Joined: Wed Jan 06, 2010 7:07 pm | Posts: 792

Brendan wrote:
Why stop there? Why not say that, because SSDs have much lower seek times than old hard disks, single-threaded performance improved dramatically for certain benchmarks?
If you were benchmarking an I/O-bound program, that would in fact make sense. But, as I've repeated twice, we're talking specifically about single-threaded performance, and Owen gave benchmarks that are CPU-bound, including one that specifically avoided memory effects.

Brendan wrote:
we'll never see anything with twice the single-threaded performance of Haswell due to diminishing returns. We're already close to the limit of instruction level parallelism (out-of-order, speculative execution), already close to the limit of branch prediction and pre-fetching, already close to the limit of the memory hierarchy, etc.
Maybe, maybe not. Although (as I've already agreed) the rate of increase has slowed down and it's not really a free lunch anymore, there are still plenty of things that haven't really been explored in mainstream chips before we hit absolute rock bottom (top? :P).

For example, temporal register addressing can enable a lot more software pipelining and reduce the need for register renaming and large register files, CPU-managed call stacks can reduce the need for explicit state saving and restoring (and do it in parallel), different instruction encodings can be way faster and more power-efficient than x86, static scheduling with code-gen-at-install-time can make instruction issue way simpler, branch prediction can be replaced with transfer prediction out of basic blocks, caches can handle partial reads and cold writes without waiting for the whole cache line, etc...

Just because x86 can't get much faster doesn't mean we can't get lots more CPU performance and/or power efficiency.

_________________
[www.abubalay.com]


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Sat Jan 24, 2015 11:58 pm
Joined: Sat Jan 15, 2005 12:00 am | Posts: 8561 | Location: At his keyboard!

Hi,

Rusky wrote:
Brendan wrote:
we'll never see anything with twice the single-threaded performance of Haswell due to diminishing returns. We're already close to the limit of instruction level parallelism (out-of-order, speculative execution), already close to the limit of branch prediction and pre-fetching, already close to the limit of the memory hierarchy, etc.
Maybe, maybe not. Although (as I've already agreed) the rate of increase has slowed down and it's not really a free lunch anymore, there are still plenty of things that haven't really been explored in mainstream chips before we hit absolute rock bottom (top? :P).

For example, temporal register addressing can enable a lot more software pipelining and reduce the need for register renaming and large register files, CPU-managed call stacks can reduce the need for explicit state saving and restoring (and do it in parallel), different instruction encodings can be way faster and more power-efficient than x86, static scheduling with code-gen-at-install-time can make instruction issue way simpler, branch prediction can be replaced with transfer prediction out of basic blocks, caches can handle partial reads and cold writes without waiting for the whole cache line, etc...


Sure, everything can be a whole lot faster, as long as you never actually produce a chip and find out how badly it performs on real software. I'm sure Itanium looked good on paper too.

Of course there's also the other problem - as soon as you break compatibility you're stuck in "no man's land" where nobody wants to write software for it because there's no market share, and there's no market share because there's no software for it; and where the price is high because the volume is low and you can't afford the latest manufacturing process so "performance per watt" suffers, making the whole "catch 22" worse; and then you realise your budget for R&D only just covers the cost of "one hairy volunteer doodling away in the back room" and both the technical gap and the marketing gap between your original attempt and the competition's established architectures just keep getting wider and wider; until you're forced to do the only "economically viable" thing and become a patent troll.

There's a reason that every single architecture that's ever gone "head to head" against Intel/80x86 has died.


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Sun Jan 25, 2015 7:59 am

Brendan wrote:
The only real trick Intel has left is bringing the RAM onto the same chip (which will cause another "one time only" bump in performance).

Are there signs of such plans, or is it just speculation about future possibilities? Just curious.
Brendan wrote:
Of course there's also the other problem - as soon as you break compatibility you're stuck in "no man's land" where nobody wants to write software for it because there's no market share, and there's no market share because there's no software for it;

That's why we need a vendor-independent managed solution (like a Java OS), which can provide us with a lot of ready-to-use software and a market that is ready to accept new technology. And the investment in this case is almost negligible compared to creating a new environment from the ground up.
Brendan wrote:
There's a reason that every single architecture that's ever gone "head to head" against Intel/80x86 has died.

ARM+Android/iOS is going to kick Intel's @$$. Desktop PCs are losing market share and mobile processors are gaining performance (like the true 8-core 64-bit Snapdragon 820, for example).


Post subject: Re: Do you agree with linus regarding parallel computing?
Posted: Sun Jan 25, 2015 9:28 am
Joined: Wed Jan 06, 2010 7:07 pm | Posts: 792

Brendan wrote:
Sure, everything can be a whole lot faster, as long as you never actually produce a chip and find out how badly it performs on real software. I'm sure Itanium looked good on paper too.

Of course there's also the other problem - as soon as you break compatibility you're stuck in "no man's land" where nobody wants to write software for it because there's no market share, and there's no market share because there's no software for it; and where the price is high because the volume is low and you can't afford the latest manufacturing process so "performance per watt" suffers, making the whole "catch 22" worse;

Yes, this is definitely possible and happens, but we're talking about the future end of Moore's law here. While it does look like a huge challenge to compete with x86, it's even less likely that everyone will just permanently give up when the limits of x86's design are reached, if there turns out to be anything better at all.

Further, I honestly don't think Itanium is all that relevant in this context. It made a lot of obvious-in-retrospect mistakes that have nothing fundamentally to do with the techniques it tried to champion, and because of its failure there hasn't been a lot of exploration of alternative implementations since. As I'm sure you remember, a lot of us got excited about the Mill CPU when it was announced precisely because it presented some concrete possibilities for those alternative solutions. Who knows if it'll work, but it's not a sure-fire failure either.

Brendan wrote:
There's a reason that every single architecture that's ever gone "head to head" against Intel/80x86 has died.

The way to do this (before my far-future prediction) is not to go head-to-head with x86. Like embryo mentioned, ARM is getting pretty big by powering mobile devices, something x86 is awful at. Another example of what could be done is GPUs, which don't even tend to have a defined macroarchitecture. Also, every few years we get a new generation of game consoles, where it's pretty much irrelevant what architecture you use and the designers just want lower cost (and for mobile ones, power usage). If you can find a niche where it's easy to switch out CPUs and offer some tangible benefit for that niche over x86 in performance/watt/$, there's a lot more precedent for success.

_________________
[www.abubalay.com]

