AMD Ryzen 7 5800X Linux Performance

  • artivision
    replied
    Originally posted by bridgman View Post

    Sorry, I didn't understand your post. Are you talking about building a chip with 16-32 separate small cores (executing 16-32 threads) or a single core with 16-32 execution units working off a single thread ? You mentioned "in order units" which implies 16-32 separate cores and threads, but earlier in the post you were talking about a single thread of a game.

    The closest interpretation I can come up with is something like the early superscalar microprocessors (dual pipeline but in-order) where two adjacent instructions could sometimes be executed in a single clock, but it seems really unlikely that approach could scale to the level you are describing.

    I believe the ARM programming model still requires sequential consistency within each core so all the out-of-order circuitry is still required - the relaxed memory model only affects the timing of when the results of one core become visible to other cores.

    AFAIK the only way to use 16-32 in order units and keep them busy would be a VLIW approach like PA-RISC/Itanium, where the compiler identifies groups of instructions that can be executed together and packs them into instruction bundles at ISA level. I don't think that is what you are suggesting though, is it ?
    You remembered Itanium well, but Denver and the others are not compiler-based. Today's Arm cores are out of order, but not as heavily as x86. They don't use such deep scalar and deep prediction logic, so when they need an extra integer unit they can just add it with little effort. You can have four, and they can have six or eight of those plus the FP path, using fewer transistors. That's because today's problems have visible boundaries on how much of a single thread can be executed independently. The root of the problem is that you add transistors and algorithms trying to execute one more instruction in parallel than the thread normally allows; you end up inventing new maths. Just stop: let the thread that can execute three integer instructions per cycle execute three and not four, and let the thread that can execute 32 do 32. A big simple core with four threads is the best.
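The point about a thread's natural parallelism, and the VLIW-style bundling mentioned in the quote, can be illustrated with a minimal sketch. It assumes a toy dependency graph; the instruction names, the `pack_bundles` helper and the widths are hypothetical and do not describe any real compiler or ISA.

```python
from typing import Dict, List, Set

def pack_bundles(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack instructions into bundles of at most `width`.

    An instruction may enter a bundle only if everything it depends on
    was issued in an earlier bundle (in-order units, no speculation).
    """
    done: Set[str] = set()
    pending = list(deps)  # program order
    bundles: List[List[str]] = []
    while pending:
        bundle = [i for i in pending if deps[i] <= done][:width]
        if not bundle:
            raise ValueError("dependency cycle or unknown dependency")
        bundles.append(bundle)
        done.update(bundle)
        pending = [i for i in pending if i not in bundle]
    return bundles

# Hypothetical thread: a, b, c are independent; d needs a and b;
# e needs c; f needs d and e.
deps = {"a": set(), "b": set(), "c": set(),
        "d": {"a", "b"}, "e": {"c"}, "f": {"d", "e"}}

narrow = pack_bundles(deps, width=2)   # 4 bundles
wide = pack_bundles(deps, width=32)    # 3 bundles, never more than 3 wide
```

With this graph, going from 2 to 32 units only saves one bundle, because the dependency chain limits how many instructions are ever ready at once.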



  • duby229
    replied
    Originally posted by atomsymbol

    Just a note: I have deleted the post to which you were replying, because the meaning of the term "super-pipelined" is very confusing (at least to me).
    https://www.google.com/imgres?imgurl...MygAegUIARCZAQ

    here's an image which describes it well.

    EDIT: What it shows is exactly what bridgman said, both phases of the clock cycle perform work on superpipelined architectures instead of only one phase on pipelined architectures. And what you said is correct that superpipeline does not imply superscalar as I had wrongly thought.
    Last edited by duby229; 13 November 2020, 11:16 PM.



  • duby229
    replied
    Originally posted by atomsymbol

    It is impossible to combine the terms "pipelining" and "super-scalar" into "superpipelining" because pipelining and super-scalar are orthogonal terms and therefore the term "superpipelining" would have indeterminate meaning. In other words, super-scalar does not necessarily imply that the CPU is also pipelined - it is possible to design a super-scalar non-pipelined CPU.
    That was clarified already, and yeah that was the conclusion made as well. Thank you for clarifying further.



  • duby229
    replied
    Originally posted by oiaohm View Post

    Overview Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently. Equally applicable to RISC & CISC. Whereas the gestation period between the beginning of RISC research and the arrival of the first commercial RISC machines was about 7-8 years, the first superscalar machines were available within a year or two of the word having first been coined [1987].


    Sorry to say, superpipeline and superscalar are different things. It's shown on page 10. Superpipeline has half cycles and superscalar uses whole cycles only; superscalar starts 2 instructions per cycle where superpipeline starts an instruction every half cycle, which results in basically the same IPC uplift, but they are done differently electrically. The performance and behaviour of superscalar and superpipeline are very alike from a software developer's point of view. There are some performance advantages to superpipeline over superscalar in particular corner cases due to using half cycles, but there are also disadvantages in circuit complexity due to having half cycles: this affects the power consumption of a superpipeline, so it is worse than superscalar all the time, and superpipeline also requires more silicon area to pull off. It is valid to say that superscalar is superpipeline refined with its design defects removed, but superscalar is not superpipeline.
    Yeah, I had Google-searched all that after bridgman's post. Thank you for clarifying; that's all good information.



  • oiaohm
    replied
    Originally posted by duby229 View Post

    Ok, thank you, I will keep that in mind.

    EDIT: Bolded by me, that is also how I define Superpipeline and the reason to superpipeline is to take advantage of ILP, hence Superscalar. I've never heard of a superscalar architecture that wasn't superpipelined. But that just may well be an oversimplification that I've never been corrected on till now. So yeah, thank you for pointing that out.

    EDIT: I've been googling trying to get a clear definition of superpipeline and it was confusing until I looked at Google's image search...
    Overview Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently. Equally applicable to RISC & CISC. Whereas the gestation period between the beginning of RISC research and the arrival of the first commercial RISC machines was about 7-8 years, the first superscalar machines were available within a year or two of the word having first been coined [1987].


    Sorry to say, superpipeline and superscalar are different things. It's shown on page 10. Superpipeline has half cycles and superscalar uses whole cycles only; superscalar starts 2 instructions per cycle where superpipeline starts an instruction every half cycle, which results in basically the same IPC uplift, but they are done differently electrically. The performance and behaviour of superscalar and superpipeline are very alike from a software developer's point of view. There are some performance advantages to superpipeline over superscalar in particular corner cases due to using half cycles, but there are also disadvantages in circuit complexity due to having half cycles: this affects the power consumption of a superpipeline, so it is worse than superscalar all the time, and superpipeline also requires more silicon area to pull off. It is valid to say that superscalar is superpipeline refined with its design defects removed, but superscalar is not superpipeline.
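As a rough illustration of the half-cycle versus whole-cycle description in this post, here is a minimal sketch that lists when each of four independent instructions would start under the two schemes. The `issue_times` and `cycles_to_issue` helpers are hypothetical and ignore dependencies, stalls and everything electrical.

```python
from fractions import Fraction

def issue_times(n, per_slot, slots_per_cycle):
    """Start time, in clock cycles, of each of n independent instructions.

    per_slot: instructions started together in one issue slot.
    slots_per_cycle: issue slots per clock cycle (2 means half cycles).
    """
    step = Fraction(1, slots_per_cycle)
    return [step * (i // per_slot) for i in range(n)]

def cycles_to_issue(times, slots_per_cycle):
    """Cycles elapsed once the last issue slot has finished."""
    return times[-1] + Fraction(1, slots_per_cycle)

# Superscalar as described: two instructions per whole cycle.
superscalar = issue_times(4, per_slot=2, slots_per_cycle=1)
# Superpipelined as described: one instruction every half cycle.
superpipelined = issue_times(4, per_slot=1, slots_per_cycle=2)

same_rate = (cycles_to_issue(superscalar, 1)
             == cycles_to_issue(superpipelined, 2))
```

Both schedules start four instructions in two cycles; they differ only in whether the starts are paired on whole cycles or staggered on half cycles.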



  • bridgman
    replied
    Originally posted by duby229 View Post
    No, actually those ALUs ran at literally twice the core clock.
    Yes, after a bit more digging around I think you are correct.

    I had always discounted the Wikipedia description because it included the word "effectively" which could mean either a faster clock or using both rising and falling clock edges, but I found independent references to a "fast clock" running at 2x the clock used for the rest of the chip.



  • duby229
    replied
    Originally posted by bridgman View Post

    I agree with most of your post but I believe you are using "superpipelined" where you should be saying "superscalar". Superscalar + pipelined does not equal superpipelined as far as I know.

    EDIT - I have now found two completely different definitions for "superpipelined" - one being use of both clock edges to double-pump the pipeline, and the other being either "more than 4 pipeline stages" or "breaking each of the usual CPU pipeline stages into smaller stages".

    Neither of them has anything to do with superscalar, however, although I guess you could argue that if you implemented the entire pipeline using both clock edges then you could execute 2 instructions per clock, but that is no different from using 1 clock edge and doubling the clock speed.
    Ok, thank you, I will keep that in mind.

    EDIT: Bolded by me, that is also how I define Superpipeline and the reason to superpipeline is to take advantage of ILP, hence Superscalar. I've never heard of a superscalar architecture that wasn't superpipelined. But that just may well be an oversimplification that I've never been corrected on till now. So yeah, thank you for pointing that out.

    EDIT: I've been googling trying to get a clear definition of superpipeline and it was confusing until I looked at Google's image search...
    https://www.google.com/imgres?imgurl...MygAegUIARCZAQ

    Your initial definition is the correct one.
    Last edited by duby229; 13 November 2020, 06:14 PM.



  • duby229
    replied
    Originally posted by bridgman View Post

    They were the first OoO superscalar x86 CPUs and were pipelined, but I don't believe they were superpipelined.

    I normally think of pipelining as breaking logic down into smaller pieces separated by registers in order to allow higher clock speeds, and superpipelining as extending that to use both clock phases so that you can fit two pipeline stages into a single clock if the logic is sufficiently simple. I haven't heard the term superpipelining used with Intel's Netburst processors but I believe the double-pumped ALUs fit that description.

    https://en.wikipedia.org/wiki/NetBur...ecution_Engine
    Rapid Execution Engine

    With this technology, the two arithmetic logic units (ALUs) in the core of the CPU are double-pumped, meaning that they actually operate at twice the core clock frequency. For example, in a 3.8 GHz processor, the ALUs will effectively be operating at 7.6 GHz. The reason behind this is to generally make up for the low IPC count; additionally this considerably enhances the integer performance of the CPU. Intel also replaced the high-speed barrel shifter with a shift/rotate execution unit that operates at the same frequency as the CPU core. The downside is that certain instructions are now much slower (relatively and absolutely) than before, making optimization for multiple target CPUs difficult. An example is shift and rotate operations, which suffer from the lack of a barrel shifter which was present on every x86 CPU beginning with the i386, including the main competitor processor
    No, actually those ALUs ran at literally twice the core clock.



  • bridgman
    replied
    Originally posted by duby229 View Post
    486 was pipelined CISC (Scalar architecture)
    Pentium and i586 were superpipelined CISC (Superscalar architecture)
    K6 and Pentium Pro were OoO and superpipelined RISC (OoO Superscalar architecture)

    Pentium and Pentium Pro are two very different things

    (what I mean by pipeline is a Scalar architecture and superpipeline is a Superscalar architecture. I just grew up with using the term pipeline in place of scalar because that's exactly what it means)
    I agree with most of your post but I believe you are using "superpipelined" where you should be saying "superscalar". Superscalar + pipelined does not equal superpipelined as far as I know.

    EDIT - I have now found two completely different definitions for "superpipelined" - one being use of both clock edges to double-pump the pipeline, and the other being either "more than 4 pipeline stages" or "breaking each of the usual CPU pipeline stages into smaller stages".

    Neither of them has anything to do with superscalar, however, although I guess you could argue that if you implemented the entire pipeline using both clock edges then you could execute 2 instructions per clock, but that is no different from using 1 clock edge and doubling the clock speed.
    Last edited by bridgman; 13 November 2020, 05:30 PM.
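The equivalence argued in this post can be checked with simple arithmetic. The sketch below assumes an idealised unit that completes one operation per active clock edge; the `peak_ops_per_second` helper is hypothetical, and the 3.8 GHz figure is the example from the Wikipedia passage quoted elsewhere in this thread.

```python
def peak_ops_per_second(clock_hz, edges_used, ops_per_edge=1):
    """Idealised peak rate: operations completed per second.

    edges_used: 1 for rising edge only, 2 for a double-pumped unit.
    """
    return clock_hz * edges_used * ops_per_edge

core_clock_hz = 3_800_000_000  # 3.8 GHz core clock

# Both clock edges at the core clock...
double_pumped = peak_ops_per_second(core_clock_hz, edges_used=2)
# ...versus one clock edge at twice the core clock.
doubled_clock = peak_ops_per_second(2 * core_clock_hz, edges_used=1)
```

Under these assumptions both come out to 7.6 billion operations per second, so the two designs differ in how the clock is distributed, not in peak rate.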



  • bridgman
    replied
    Originally posted by duby229 View Post
    https://en.wikipedia.org/wiki/Pentium_Pro

    It was the NexGen K6 and then the Pentium Pro that were the first OoO superpipelines, and they were the first to decode x86 instructions into RISC-like uops.
    They were the first OoO superscalar x86 CPUs and were pipelined, but I don't believe they were superpipelined.

    I normally think of pipelining as breaking logic down into smaller pieces separated by registers in order to allow higher clock speeds, and superpipelining as extending that to use both clock phases so that you can fit two pipeline stages into a single clock if the logic is sufficiently simple. I haven't heard the term superpipelining used with Intel's Netburst processors but I believe the double-pumped ALUs fit that description.

