AMD Ryzen 7 5800X Linux Performance


  • arQon
    replied
    s/3400/5400/ - sorry about that.

    As for the rest, well, I guess we'll see how things turn out over the next year or so. I haven't "missed" any of the things you imagine, and it's clear you still don't understand how this business actually works, and are still hung up on the *absolutely incorrect* notion that a 5400 part can only come from a stockpile of dies "with like 5 working cores or less".

    Obviously it would be stupid to make "5400" parts NOW, as you say, but "now" is just a few weeks after the Zen3 launch. Once that initial demand is over, the situation becomes very different. Zen4 may show up before the 5400, or it may not. I strongly doubt it, but at this stage it's all guesswork. Like I say, we'll see.



  • oiaohm
    replied
    Originally posted by arQon View Post
    The 4-core/etc parts MAY be defective CCXs, but they may also be fully-functional CCXs with cores lasered off (or disabled by other means). It's called product segmentation, and you don't understand it at all. (Though it's counter-intuitive, so I don't blame you for that). I'll see if i can remedy that...
    This is where you have missed something: how the AMD market is segmented, and the demands on it.

    Originally posted by arQon View Post
    Think of it like this: even if AMD was getting perfect yields, as soon as they have ANY "surplus" dies - not just defective ones - it would STILL be worth castrating some 5600s to "3400"s, because it's still a double win: they make less profit than they would if the part was sold AS a 5600, but it ISN'T being sold as that, because that's what the definition of "surplus" is. But it's still *a* profit (and technically, a larger one since the profit on an unsold unit is 0), and it *denies Intel that sale*.
    The question is where AMD is going to have surplus dies, and the answer does not fit your idea. You have looked down the stack but failed to look up it. The released EPYC CPUs are all Zen 2 at this stage. The EPYC Zen 3 release will be needing dies, specifically 8- and 6-core dies.

    This is a simple oversight: the Ryzen 5 5600, the AMD Ryzen 9 5900X, and the not-yet-released EPYC Zen 3 CPUs all need 6-core chiplets. The 8-core chiplets will also be going out through at least three different doors; Threadripper solutions are another door. So 8- and 6-core chiplets in the AMD model have 3 to 4 outlets, and AMD can sell as many of them as they can make. You have missed that a Ryzen 5 5600 is a reduced profit on the same chiplet sold in an AMD Ryzen 9 5900X, and will be a reduced profit on the same chiplet sold in a Threadripper or EPYC part.

    4-core chiplets have at best 2 outlets in the 3400 series, but in the Zen3 series only 1 at best. Remember you still have server customers buying Zen 1 and Zen 2 based chips, so you are going to have the failed yield from them to put into the 4-core chip market.

    Let's say AMD does release a Ryzen 3 5xxx: is this going to hinder AMD selling the failed Zen2 parts that become the "3400"s? Yes it is.

    There is very little benefit for AMD in ruining good chiplets when they need as many good Zen3 chiplets as they can get.

    You have missed that AMD has delayed the Zen3-based EPYC and Threadripper releases because there are not enough Zen3 6- and 8-core chiplets to go round, and then you suggest the idea of intentionally ruining some.

    This is why I suspect we might be waiting until Zen 4 before we see a new Ryzen 3: it just does not make sense, given AMD's demands, to make a Ryzen 3 for Zen3 at the moment, unless there happen to be enough failed chips with 5 working cores or fewer. With yields increasing when you stay at a given nm level, there may not be the failed production, or at least not enough of it, to justify disrupting the "3400"s sales.
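    The outlet argument above can be sketched as a toy greedy allocation, with completely made-up bin counts, demands, and product names (none of these are real AMD figures): when every higher-margin outlet is hungry, almost no good dies trickle down to a 4-core bin.

```python
# Toy greedy allocation: all numbers below are invented for illustration,
# not real AMD yields or demand.
def allocate(bins, outlets):
    """bins: {working_cores: die_count}.
    outlets: [(product, cores_needed, demand)] in priority order.
    Uses the worst dies that still qualify first, saving better dies
    for lower-priority outlets."""
    remaining = dict(bins)
    plan = {}
    for name, need, demand in outlets:
        got = 0
        for cores in sorted(c for c in remaining if c >= need):
            take = min(remaining[cores], demand - got)
            remaining[cores] -= take
            got += take
            if got == demand:
                break
        plan[name] = got
    return plan, remaining

plan, left = allocate(
    {8: 100, 6: 40, 4: 10},  # hypothetical chiplet bins from one wafer run
    [("EPYC/Threadripper (needs 8 cores)", 8, 60),
     ("Ryzen 9 5900X (needs 6)",           6, 30),
     ("Ryzen 5 5600X (needs 6)",           6, 40),
     ("hypothetical Ryzen 3 (needs 4)",    4, 50)])
# The 4-core part ends up with only the genuinely defective dies plus
# whatever is left over; every good 6/8-core die is absorbed by a
# higher-margin outlet first.
```

    With these invented numbers the 4-core outlet gets its 10 defective dies plus 10 leftovers against a demand of 50, which is the point: demand upstream determines whether a low-end SKU is even worth launching.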



  • arQon
    replied
    Originally posted by oiaohm View Post
    Thing you have missed in the Ryzen/Zen model the lower-margin parts are the defective parts not suitable to make higher end models.
    And the thing YOU have missed is that that lie remains untrue no matter how often it's repeated.

    The 4-core/etc parts MAY be defective CCXs, but they may also be fully-functional CCXs with cores lasered off (or disabled by other means). It's called product segmentation, and you don't understand it at all. (Though it's counter-intuitive, so I don't blame you for that). I'll see if i can remedy that...

    Think of it like this: even if AMD was getting perfect yields, as soon as they have ANY "surplus" dies - not just defective ones - it would STILL be worth castrating some 5600s to "3400"s, because it's still a double win: they make less profit than they would if the part was sold AS a 5600, but it ISN'T being sold as that, because that's what the definition of "surplus" is. But it's still *a* profit (and technically, a larger one since the profit on an unsold unit is 0), and it *denies Intel that sale*.

    The concept doesn't sit well with us technical folk, because it's "bad" use of resources, when we spend all our time trying to optimise that. But from a business perspective, if you had sufficient cash on hand it would even make sense to sell those dies BELOW cost, as long as that deficit was less than the profit your competitors would have made if the sale had gone to them instead.
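    A back-of-the-envelope version of that argument, with invented prices (not real figures): selling a surplus die beats shelving it whenever its own margin plus the profit the sale denies a competitor is positive.

```python
# All prices here are made up for illustration only.
def better_to_sell(sale_price, die_cost, rival_profit_if_lost):
    """Selling a surplus die beats shelving it (profit 0) whenever its own
    margin plus the profit the sale denies a competitor is positive."""
    return (sale_price - die_cost) + rival_profit_if_lost > 0

# Surplus 5600-class die cut down and sold as a lower part:
assert better_to_sell(sale_price=90, die_cost=80, rival_profit_if_lost=30)
# Even below cost it can still be worth it, if the rival's denied profit
# outweighs the loss:
assert better_to_sell(sale_price=70, die_cost=80, rival_profit_if_lost=30)
# ...but not at any price:
assert not better_to_sell(sale_price=40, die_cost=80, rival_profit_if_lost=30)
```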

    HTH
    Last edited by arQon; 16 November 2020, 01:27 AM.



  • oiaohm
    replied
    Originally posted by artivision View Post
    That proves my point. The bigger the instruction buffer is, the less prediction is required.
    It's not that simple, I wish it was. A bigger instruction buffer can mean more prediction cost, not less, because you have more space to read future predicted instructions into, and those can turn out to be wrong. Think of the old story of the snail going two bricks up the wall, then sliding down one brick, before going back up. Horrible as it sounds, that snail is a rough representation of final out-of-order execution performance. Giving more space to predictions means that when they are wrong, there can be a large cost.
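    A crude model of that cost, under the simplifying assumption that a mispredicted branch squashes the entire speculated window behind it: efficiency falls as the window grows, for any non-zero mispredict rate.

```python
def speculation_efficiency(work_per_branch, window, mispredict_rate):
    """Fraction of fetched work that actually commits, in a crude model where
    a mispredicted branch squashes the whole speculated window behind it."""
    wasted = mispredict_rate * window
    return work_per_branch / (work_per_branch + wasted)

# 10 useful instructions between branches, 5% mispredict rate:
small_window = speculation_efficiency(10, 32, 0.05)   # ~0.86
big_window = speculation_efficiency(10, 256, 0.05)    # ~0.44
```

    With perfect prediction the efficiency is 1.0 at any window size, which is why the right window size depends on how predictable the workload is.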

    Originally posted by bridgman View Post
    BTW something interesting just occurred to me about the ROB's on ARM cores being as big or bigger than on x86 these days. Since AFAIK the ARM cores do not support SMT and the ROB is split between threads when SMT is active, the ARM cores arguably have even more OoO HW relative to x86 on a per-thread basis.
    This is always the hard problem with optimising a CPU design. Out-of-order is a double-edged sword: not enough and you don't get the best utilisation of the silicon; too much and you waste time doing work you then throw away. x86 designs have had a long time to optimise this for the instruction set. ARM is fairly new to the out-of-order game.

    This stuff is not a case of more or less being better. Out-of-order is the three bears' porridge problem: there is a Goldilocks value, and that value changes with the instruction set and CPU design, so comparing raw values between designs is an invalid thing to do. If you are not at the Goldilocks value, you are either not using the silicon as well as you could, or wasting too much by throwing away never-used work.



  • bridgman
    replied
    Originally posted by artivision View Post
    That proves my point. The bigger the instruction buffer is, the less prediction is required.
    I'm not sure what you mean by "prediction" - I don't think you are referring to a branch predictor, are you ? That's the only "prediction" I can think of in a modern processor other than cache and memory prefetchers.

    Originally posted by artivision View Post
    They connect seven integer units with scalar that doesn't seem entirely hardware. The problem with your architecture is that when you add an extra integer unit, you want to fill it on every algorithm. Just accept that some apps will use 3 and some 7, that's all.
    I don't think I understand your point. When you say "my architecture" are you talking about the use of hardware dependency analysis at runtime, ie every OoO processor on the market except for Denver, Itanium and other VLIW designs ?

    All OoO processors keep their in-flight instructions in a reorder buffer (I think ARM sometimes calls it a commit queue) in program order then instructions execute out of order as soon as their dependencies are available.

    What you describe about "some apps will use 3 and some 7" is exactly how all OoO processors operate today... the number of instructions executed in parallel is a function of the ILP that can be extracted from the code within the instruction window. On average a modern CPU executes 2-3 instructions per clock but can peak at 6-8.
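    That "some apps will use 3 and some 7" behaviour can be illustrated with a toy scheduler, assuming single-cycle latency and no structural hazards beyond the issue width: a fully dependent chain commits one instruction per cycle no matter how wide the machine is, while independent work fills it.

```python
def ooo_ipc(deps, width=8):
    """Toy OoO scheduler: each cycle, complete up to `width` instructions
    whose producers have all finished. Assumes single-cycle latency and no
    structural hazards beyond the issue width. deps[i] lists the indices
    instruction i depends on (must be acyclic)."""
    n = len(deps)
    done = [False] * n
    cycles = 0
    while not all(done):
        ready = [i for i in range(n)
                 if not done[i] and all(done[d] for d in deps[i])][:width]
        for i in ready:
            done[i] = True
        cycles += 1
    return n / cycles

chain = ooo_ipc([[], [0], [1], [2]])          # fully dependent: IPC 1.0
parallel = ooo_ipc([[], [], [], [], [], []])  # independent: IPC 6.0
```

    The achieved IPC is purely a property of the dependencies in the code within the window, which is the point about ILP above.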

    BTW something interesting just occurred to me about the ROB's on ARM cores being as big or bigger than on x86 these days. Since AFAIK the ARM cores do not support SMT and the ROB is split between threads when SMT is active, the ARM cores arguably have even more OoO HW relative to x86 on a per-thread basis.
    Last edited by bridgman; 14 November 2020, 05:24 PM.



  • artivision
    replied
    Originally posted by bridgman View Post

    AFAIK Denver is effectively compiler based VLIW, although the compiler (dynamic translator) runs on one of the cores in parallel with execution rather than at build time. Denver uses VLIW bundles at the micro-op level and AFAIK operates in two modes:

    - ARM native decode generates bundles during execution with up to 2 instructions if dependencies allow
    - dynamic translation generates bundles during offline optimization with up to 7 instructions if dependencies allow

    It's not clear if the Carmel core in Xavier is an outgrowth of Denver or a HW-based OoO design. I have seen statements both ways.

    For what it's worth, modern ARM designs appear to have at least as much out-of-order hardware as x86. The Cortex X1 has a 224 entry ROB, while Apple's M1 is reported to be up in the ~600 entry range. Latest x86 designs are in the 224-352 range.
    That proves my point. The bigger the instruction buffer is, the less prediction is required. They connect seven integer units with scalar logic that doesn't seem entirely hardware. The problem with your architecture is that when you add an extra integer unit, you want to fill it on every algorithm. Just accept that some apps will use 3 and some 7, that's all.



  • bridgman
    replied
    Originally posted by artivision View Post
    You remember Itanium well, but Denver and others are not compiler based. Today's ARM is out of order, but not as heavily as x86. They don't use that deep scalar logic and deep prediction when they need to add an extra integer unit, they just add it with little effort. You can have four and they can have six or eight of those plus the FP path, using fewer transistors. That's because today's problems have visible boundaries as to what from a single thread can be executed independently. The root of the problem is that you add transistors trying to execute one more parallel instruction than the algorithm normally allows, inventing new maths. Just stop: let the thread that can execute three integer ops per cycle execute three and not four, and the thread that can execute 32 do 32. A big simple core with four threads is the best.
    AFAIK Denver is effectively compiler based VLIW, although the compiler (dynamic translator) runs on one of the cores in parallel with execution rather than at build time. Denver uses VLIW bundles at the micro-op level and AFAIK operates in two modes:

    - ARM native decode generates bundles during execution with up to 2 instructions if dependencies allow
    - dynamic translation generates bundles during offline optimization with up to 7 instructions if dependencies allow

    It's not clear if the Carmel core in Xavier is an outgrowth of Denver or a HW-based OoO design. I have seen statements both ways.

    For what it's worth, modern ARM designs appear to have at least as much out-of-order hardware as x86. The Cortex X1 has a 224 entry ROB, while Apple's M1 is reported to be up in the ~600 entry range. Latest x86 designs are in the 224-352 range.



  • uid313
    replied
    Originally posted by tildearrow View Post

    Why are you using a browser benchmark? That's comparing apples to oranges.... I mean, Safari on A14 vs. Chrome on Zen?

    If you want to see the truth you should compare Chrome vs Chrome, and I mean like REAL Chrome (not the App Store one as that is a Safari skin).
    Geekbench 5 is not a browser benchmark. The results are on the web, viewable in a web browser, but the benchmark itself is an app you download from the Apple App Store and other stores. So it has nothing to do with the web browser, Safari, or Chrome.



  • tildearrow
    replied
    Originally posted by AdrianBc View Post

    What is this blank space supposed to be? ;p



  • tildearrow
    replied
    Originally posted by uid313 View Post

    We already know how well the Apple A14 Bionic performs, and the M1 performs even better.

    and
    Benchmark results for a MacBookAir10,1 with an Apple Silicon processor.




    A single-core score of 1719 means it is faster than the fastest thing both Intel and AMD offer, while using much, much less power.
    Why are you using a browser benchmark? That's comparing apples to oranges.... I mean, Safari on A14 vs. Chrome on Zen?

    If you want to see the truth you should compare Chrome vs Chrome, and I mean like REAL Chrome (not the App Store one as that is a Safari skin).
