Intel & AMD Form An x86 Ecosystem Advisory Group


  • coder
    replied
    Originally posted by Uiop View Post
    Of course, code density does matter, in the sense that if all our programs got 10 times bigger, no one would like it.
    My argument is: the reason why code density does not matter in our discussion is that code density is similar in the duel of RISC vs x86.
    Apparently, you have just attempted to take my claims out of the context.
    Well, let's review what you actually claimed:

    Originally posted by Uiop View Post
    ​​"Code density" is an argument that was relevant in the 1980's, no one cares about it today. If someone did care about code density, then more Linux applications would be compiled with gcc -Os.


    Of course, you can say that you were only talking about CISC vs. RISC, but you sure didn't characterize it as such. In fact, if you really meant it the way you're now claiming, that's a funny way to put it.

    Anyway, your qualification seems to put this point to rest, so let's move on.

    Originally posted by Uiop View Post
    The problem is that the data is highly variable, although in 99% of cases the gains in speed will be similar to the losses in code size, by percentages.
    This would be a "typical" result:
    Code:
           speed  size
    -O3    106%   107%
    -O2    100%   100%
    -Os     70%    85%
    Simply looking at how effective an unspecified compiler version (presumably GCC?) is at producing smaller executables, and at the size vs. speed tradeoff of its options, on an unspecified executable built with an unspecified value of -march, really does nothing to answer the question of how bad the problem is.

    To pursue the matter further, what I'd personally like to see is stats, for server apps, on stalls due to instruction cache misses. I don't have this data, nor do I have time to go digging for it just now. However, that's what I think it'd take to move the discussion forward in a meaningful way.
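    For concreteness, here's a rough sketch of how one might collect that kind of data with Linux perf (assuming the generic event aliases below are supported on the CPU in question; names and support vary by CPU and kernel):
    Code:
    #!/usr/bin/env python3
    # Rough sketch: attach `perf stat` to a running server process and report
    # i-cache misses and front-end stalls per 1000 retired instructions.
    # Event names are generic perf aliases; support varies by CPU and kernel.
    import subprocess
    import sys

    EVENTS = "instructions,L1-icache-load-misses,stalled-cycles-frontend"

    def sample(pid: int, seconds: int = 10) -> dict:
        # perf writes its statistics to stderr; "-x ," selects CSV output.
        result = subprocess.run(
            ["perf", "stat", "-x", ",", "-e", EVENTS, "-p", str(pid),
             "--", "sleep", str(seconds)],
            capture_output=True, text=True, check=False)
        counts = {}
        for line in result.stderr.splitlines():
            fields = line.split(",")
            # CSV fields: value, unit, event name, ...
            if len(fields) >= 3 and fields[0].isdigit():
                counts[fields[2]] = int(fields[0])
        return counts

    if __name__ == "__main__":
        counts = sample(int(sys.argv[1]))          # argument: PID of the server
        insns = counts.get("instructions", 0) or 1
        for name, value in counts.items():
            print(f"{name}: {value}  ({1000 * value / insns:.2f} per 1k instructions)")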
    Last edited by coder; 21 October 2024, 03:09 PM.



  • coder
    replied
    Originally posted by Uiop View Post
    LOL, the very intro of that article contradicts you. It's examining the relative density of CISC vs. RISC, which wouldn't even be a worthy question if the author agreed with your assertion that density is a non-issue!

    Originally posted by Uiop View Post
    ​Irrelevant.
    Agree to disagree.

    Originally posted by Uiop View Post
    ​​Yes, I can.
    It is a reasonable argument. You are being unreasonable here.
    You can believe whatever you like. I don't accept it as such. You clearly have a winning personality.

    Originally posted by Uiop View Post
    ​​​Just because you refused to do a few quick searches on Google doesn't mean that there is no data available.
    That's your job. If you make the claim, you get to back it up.

    Originally posted by Uiop View Post
    ​​​​But for the purposes of our discussion, this is mostly irrelevant, because the end-effects are small.
    "Small" is a very squirrelly word. It really should be quantified, and then people can decide its relevance.



  • coder
    replied
    Originally posted by Uiop View Post
    True, but you are nit-picking. It is clear that I made a slight error when typing that sentence.
    I didn't see it as such, but I accept your explanation and withdraw my criticism.

    Originally posted by Uiop View Post
    ​I have provided two pieces of evidence:
    (1) it is very uncommon to see: gcc -Os
    (2) code density is similar across today's popular ISAs

    Now you say that you "need more evidence".
    However, any of the two pieces of evidence I provided swings the conclusion highly in my favor, unless you can somehow prove both points to be false.
    You made assertions about #2 without citing any sources. Furthermore, your point doesn't apply to AArch64, which has no support for compressed instruction streams. The fact that RISC-V saw a need to introduce such support argues in favor of it being an issue.
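    For what it's worth, the size impact of the compressed extension is easy to measure directly if you have a RISC-V cross toolchain handy. A rough sketch (the toolchain prefix and source file name are placeholders, assuming riscv64-linux-gnu GCC/binutils):
    Code:
    #!/usr/bin/env python3
    # Rough sketch: compare .text size of the same C file built for RISC-V with
    # and without the compressed ("C") extension. Toolchain prefix and source
    # file name are placeholders; adjust for your setup.
    import subprocess

    CC = "riscv64-linux-gnu-gcc"
    SIZE = "riscv64-linux-gnu-size"
    SRC = "some_hot_code.c"                    # placeholder source file

    def text_size(march: str) -> int:
        obj = f"out_{march}.o"
        subprocess.run([CC, f"-march={march}", "-mabi=lp64d", "-O2",
                        "-c", SRC, "-o", obj], check=True)
        out = subprocess.run([SIZE, obj], capture_output=True, text=True,
                             check=True).stdout
        # `size` prints a header line, then: text data bss dec hex filename
        return int(out.splitlines()[1].split()[0])

    if __name__ == "__main__":
        with_c = text_size("rv64gc")           # includes compressed instructions
        without_c = text_size("rv64g")         # no compressed instructions
        saved = 100 * (without_c - with_c) / without_c
        print(f"rv64gc: {with_c} bytes, rv64g: {without_c} bytes "
              f"({saved:.1f}% .text saved by the C extension)")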

    Originally posted by Uiop View Post
    ​So, you are now arguing against point (1) only.
    Both the "O2" and "Os" will optimize for both size and speed, but O2 will mostly optimize towards speed, and Os will optimize mostly towards code size. It says so in the docs.
    Again, we don't know how many sub-options of -O2 were rejected on the basis of code density. The GCC developers might not even be aware of why they yielded regressions. You really can't use the prevalence of -O2 as proof that code density is a non-issue.

    Originally posted by Uiop View Post
    ​​You are making a very convoluted argument, in such a way that it is not in any way clear that it defeats my point(1).
    I didn't make the original assertion. My goal was simply to show that your claim is not firmly rooted in data or facts.

    I'm not so confident, one way or the other, but it does seem rather extreme to declare it a non-issue. Code competes with data for space in most of the cache hierarchy, and for memory bandwidth. In the sorts of highly virtualized environments embraced by the cloud and most server operators, where many different VM images are executing, I expect this issue to be more prominent. That's as far as I go. I'm not saying it's a first-order concern, but I also think we shouldn't dismiss it. If you had actual data to support your assertions, I'd love to see it.

    BTW, Intel even acknowledged the issue in the course of their discussion of APX, by pointing out that they compensated for the overhead of adding an extra prefix byte through the effectiveness of 3-operand instructions at eliminating register-to-register copies (i.e. mov instructions) from the instruction stream. No, it's not proof, but it does show they're at least sensitive to the potential criticism that APX could hurt code density.
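    For concreteness, here's a rough, purely illustrative sketch of how one could put a number on those copies: disassemble a binary and count register-to-register movs, which is roughly the population of instructions that 3-operand forms could fold away. The regex is a crude heuristic and the binary path is just a placeholder:
    Code:
    #!/usr/bin/env python3
    # Rough sketch: estimate what fraction of instructions in an x86-64 binary
    # are plain register-to-register movs, i.e. the copies that 3-operand
    # (APX-style) forms could potentially eliminate. Heuristic only; assumes
    # objdump's default AT&T syntax.
    import re
    import subprocess
    import sys

    # Matches e.g. "mov %rbx,%rax" but not loads/stores like "mov (%rdi),%rax".
    REG_MOV = re.compile(r"\bmov[lq]?\s+%\w+,%\w+\s*$")

    def reg_mov_ratio(binary: str) -> float:
        disasm = subprocess.run(["objdump", "-d", binary],
                                capture_output=True, text=True, check=True).stdout
        total = reg_movs = 0
        for line in disasm.splitlines():
            parts = line.split("\t")          # address, hex bytes, instruction
            if len(parts) < 3:
                continue
            total += 1
            if REG_MOV.search(parts[2]):
                reg_movs += 1
        return reg_movs / total if total else 0.0

    if __name__ == "__main__":
        # Usage: python3 mov_ratio.py /path/to/binary   (path is a placeholder)
        print(f"reg-to-reg movs: {100 * reg_mov_ratio(sys.argv[1]):.1f}% of instructions")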



  • coder
    replied
    Originally posted by Uiop View Post
    So, he can't say "all CPUs are RISC internally", because that is a red herring,
    He didn't.

    Originally posted by Uiop View Post
    ​since we are discussing how the front-end ISA affects the final performance. We are not discussing whether both CISC and RISC can be implemented in similar ways
    Seems to me that his post was primarily concerned with the historical definition and evolution of RISC. If that's what he wanted to comment on, then it's fair for him to focus on that. Heck, this is a comment thread on an article about the new X86 advisory group. So, you really have no claim to be more on-topic than his post.

    Originally posted by Uiop View Post
    ​​"Code density" is an argument that was relevant in the 1980's, no one cares about it today. If someone did care about code density, then more Linux applications would be compiled with gcc -Os.
    To support such a claim, I think you need more evidence than that! I think concerns about code density haven't gone away, especially because aggregate core performance (i.e. number of cores * per-core performance) is scaling faster than cache sizes or memory bandwidth. I have little doubt that it's one of the issues faced by current cloud workloads. We've certainly seen gains posted by BOLT, as an example that this stuff matters.

    It's also a flawed test to say that people would use -Os if code size mattered. If we leave aside that most packages have build configurations aimed at general usage, and just focus on the presumption that this statement applies to packages built for servers & cloud, essentially what you're claiming is that, for the issue of code density to be a real concern, the benefits of compiling with -Os must be so great as to overcome everything else done by -O2. Even if this is true for some packages, it cannot be true universally.

    Finally, it presumes that the very definition of -O2 doesn't already take into account some size vs. speed tradeoffs. Even if the GCC developers aren't consciously doing this, they do base many such decisions on real-world performance data.
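    As a sanity check on how much the two levels actually differ, GCC can be asked which optimizer flags each level enables (gcc -Q --help=optimizers). A quick sketch along these lines will diff the two sets; the exact output format varies between GCC versions:
    Code:
    #!/usr/bin/env python3
    # Rough sketch: diff the optimizer flags GCC enables at -O2 vs. -Os.
    # Relies on `gcc -Q --help=optimizers`; output format can vary between
    # GCC versions, so treat the parsing as approximate.
    import subprocess

    def enabled_flags(level: str) -> set:
        out = subprocess.run(["gcc", "-Q", "--help=optimizers", level],
                             capture_output=True, text=True, check=True).stdout
        flags = set()
        for line in out.splitlines():
            parts = line.split()
            # Lines look like: "  -fcrossjumping              [enabled]"
            if len(parts) >= 2 and parts[0].startswith("-f") and parts[-1] == "[enabled]":
                flags.add(parts[0])
        return flags

    if __name__ == "__main__":
        o2, osz = enabled_flags("-O2"), enabled_flags("-Os")
        print("enabled at -O2 but not at -Os:", " ".join(sorted(o2 - osz)))
        print("enabled at -Os but not at -O2:", " ".join(sorted(osz - o2)))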


    As an aside: if you're going to presume control over the subject of a discussion thread and put words into people's mouths, you're going to waste a lot of time & energy getting into pointless fights on here.



  • r_a_trip
    replied
    Originally posted by coder View Post
    You mention "custom or POSIX" software, but then go on to mention Windows and MacOS? So, it would be "Windows, MacOS, POSIX, or custom". Don't forget Android, as well.

    Also, where do you get the idea that Snapdragon X is a souped up phone SoC? Apple is closer to that than Qualcomm, because the cores in Apple's M-series are an outgrowth of their custom cores designed for their iPhone and iPad. The cores used in Snapdragon X came from Nuvia, who were designing ARM cores for servers, before Qualcomm bought them. The whole reason Qualcomm spent all that money, on the acquisition, was to get high-performance cores to help them enter the laptop market. They're roughly in the same league as Apple's M-series, which isn't surprising since a lot of Nuvia's key designers came from Apple.

    How do you discern a software ecosystem? Just curious.
    Yes, MacOS runs on ARM these days and it also directly ties you to Apple. Yes, we have Asahi Linux (running more or less), but if that project throws in the towel, back to MacOS. Apple is anathema to me. If Apple ran the computing world, we would be wishing for the Microsoft of the 90's, when it comes to computing freedom. So them having a functional ARM desktop is moot. I don't touch poison.

    Yes, Windows runs on ARM and that is about it. Discernible software ecosystem? Simple: how many ISVs have ported their software to run on Windows for ARM? Precious few. With Windows on ARM you get a very pale and anemic echo of what Windows is on x86.

    Android? Phone OS. Plain and simple. Could be more, but Alphabet isn't interested in it. Again more limitations than helpful solutions.

    As for ARM running custom (specific problem-solving) or POSIX software: if you want something that is usable and has a software selection, those are the options you are confined to. Did I mention Apple is off the table because of their business model?

    For someone looking to have their personal computing needs comfortably met, ARM is a hodgepodge of non-standard custom SoCs, all dependent on non-standardised OS images. Choosing ARM mostly ties you to one vendor for the board, the processor and the GPU, and to their own tailored OS images. You are fully dependent on that party for support, and for the duration thereof. So yes, ARM is well supported, just awfully fragmented, and it ties you to one party or another for support. Even if Linux distributions provide an image for a certain board, you can't be certain they will provide one for another board from a different vendor. If that is a-ok for you, then yes, ARM is perfectly fine.

    For people who like to choose their mainboards based on what they provide and have the option of choosing from multiple vendors and then getting to choose from multiple AIBs, RAM vendors, Storage vendors, OS vendors, etc. ARM just doesn't cut it. (Even in x86 laptops there are multiple options to choose from in a lot of tiered segments.) x86 just holds the better cards. x86 offers the genuine personal computer experience. ARM gives mostly options to solve specific problems by offering tailored solutions. Neither is bad, but ARM simply doesn't cater to mainstream personal computing like x86 does. I suspect that will remain the case for at least multiple decades.​



  • coder
    replied
    Originally posted by Uiop View Post
    Well, I'm honored by a reply from an AMD representative.
    Please note that I'm usually quite critical, and by default I have no respect for any authorities. So, you won't be spared.
    Also note, since you are affiliated with AMD (a major CISC CPU manufacturer), I have an immediate right to a presumption that you are biased.
    First, he actually retired earlier this year. Second, he's from the GPU group.

    Finally, look at his registration date & post count. He's been around the block enough times to have no illusions about what to expect from this forum. Some people pester him with complaints and support requests, others with product suggestions, etc. There's often some respect and deference shown, but this forum has all types.

    It's pretty amazing he's put up with it all, especially through all the turmoil that AMD's drivers and userspace have undergone over the past decade or so. If that hasn't sent him packing, I doubt anything you say will do it.



  • coder
    replied
    Originally posted by Dukenukemx View Post
    The evidence is the Pentium Pro, because that's why people hated the Pentium Pro. A lot of 32-bit code had some 16-bit code in it, which would slow down on Pentium Pro's. I'm not sure if that's the case with X86S, but I wouldn't be shocked if it was.
    You can't reason this stuff out by analogy, though. So far, we have no indication that X86S will disadvantage 32-bit userspace, and it's certainly not going to affect 32-bit arithmetic performance, since 32-bit arithmetic is still a mainstay of 64-bit apps.

    Originally posted by Dukenukemx View Post
    ARM has no standards, and it shows. Just look at Apple's silicon and Qualcomm's Snapdragon X chips, as none run Linux out the gate.
    I doubt you can boot an old Linux distro on new x86 hardware, either. Even if you can get it to run, it won't run well. This stuff doesn't just work because of "standards"; rather, it takes time, effort, and devotion on the part of Intel and AMD. Even after all that, it's often the case that Michael has to apply patches to a bleeding-edge kernel for new hardware to work well enough for him to benchmark.



  • drakonas777
    replied
    Originally posted by coder View Post
    I have a different take on this. I think the main scenario where you form a consortium, like this, is when you regard something as no longer the primary strategic vehicle for your company. It's basically like putting a legacy product into maintenance mode. They'll keep delivering incremental refinements of the ISA, but I think its biggest days are behind it. I don't expect to see any new announcements as big as things like APX, X86S, and AVX10.

    Sure, Intel and AMD will keep turning the crank and trying to produce competitive products, as long as there's a market, but I think their energy is probably going to focus on ARM or RISC-V and AI, simply because that's where their big customers and partners are going. It might've been their partners who pushed the two together, but the reason they relented is only because they decided they have more to gain than to lose by it.
    Fundamentally I agree. However, we lack official confirmation about the effort allegedly being put into upcoming ARM-based consumer products. Aside from those rumors about something ARM-based from NVIDIA and AMD in 2025, and AMD's past work on K12, we really don't know anything concrete. Personally I do believe that x86 is effectively going to die, but it will take longer than a lot of people expect.

    I'd say the most likely scenario is that in the upcoming ~10 years or so, most consumer ARM PC products will be targeted at the portable market, and the big three (AMD, NVIDIA and Intel, though Intel seems to be doubling down on x86, with no rumors of anything ARM-based) will use this portable segment, i.e. Windows ultrabooks/transformers/2-in-1s, to boost the Windows ARM ecosystem and make/force devs to become more familiar with the platform, since this segment has the most volume in units, I think. Then, eventually, they will switch to ARM in all HPC areas: consoles, desktops, workstations, and eventually servers.

    But it will take time, especially in WS/mobile WS/HPC, because of general "inertia". Initially there was quite a lot of hesitation to jump from Xeons to EPYCs, even though EPYCs had a far better value proposition and we are basically talking about the same ISA. Also, to this date it's really hard to find "brand" workstations and mobile workstations (things like ThinkPad P1s, Dell Precisions, etc.) on a Ryzen/EPYC platform; the majority of them are Intel-based only. Perhaps all of this is not a big part of the whole market, considering the AI fuss and whatnot, but it's a pretty conservative part of the market.

    If Intel and AMD are going to continue developing x86 in the form of extra IPC, advanced packaging, etc. (and according to rumors they intend to, especially considering Intel's "royal core" and the innovations there, which allegedly should result in a dynamically changing core-count/IPC combo at CPU runtime), I don't really see how ARM is going to completely kill x86 in the next 15 years. Perhaps I am wrong - there is always that possibility.



  • AdrianBc
    replied
    Originally posted by Uiop View Post
    I'm loving your analysis.
    What you state above actually confirms what I have said. I said "three times" less power, and it appears that I was too conservative, as you got "three to five times".


    That is not what is being discussed.
    You have to compare a program with complex arithmetic instructions vs. a program that replaces all the arithmetic with register-register moves.


    Also, I was talking mostly about the integer unit, which is 64-bit, for my "three times less power" estimate.
    For SIMD units, I would expect that, as the operand width grows over 128-bits, increasingly higher percentage of power would be used by the execution stages.


    Yeah, but you didn't test it correctly. You have to replace arithmetic instructions with moves.


    I'm not surprised at all by your result for 512-bit vectors, it is what I would expect.

    Well, if your estimate is only for programs that are not doing any intensive computation, whether scalar integer (e.g. arithmetic on big integers, as in cryptography) or vector FP/integer, then I agree with you, i.e. that for such programs the power consumed in the control part of the CPU is several times greater than the power consumed in the execution units.

    Moreover, I agree that for such programs the power consumed by the instruction decoder, which is the weak point of the CPUs implementing the x86 ISA, represents a proportionally greater part of the power consumption.


    Nevertheless, for such programs the CPUs do not reach the thermal power limits for which a normal computer is designed, so the lower efficiency for such programs is much less important than when the CPUs consume the maximum power and their performance is constrained by this.

    Even for such programs, the difference in power consumption between CPUs with a similar microarchitecture but implementing different ISAs, e.g. x86-64 and AArch64, might be several tens of percent, but there is no chance of one of them consuming double or more than the other. On the other hand, differences in the microarchitecture, e.g. completely different branch predictors, or differences in the manufacturing process used for them, can easily make one CPU consume many times more power than the other.

    The first RISC CPUs were dramatically more efficient than the alternatives, but that is because, at that time, CPUs did not include out-of-order and speculative execution, which have since greatly increased the complexity and power consumption of the control parts of the CPU, leaving the instruction decoder as only a small part of it.

    The most important disadvantage of the x86 ISA is the difficulty of determining the length of each instruction in a sequence simultaneously, which for a long time limited simultaneous decoding to 4 instructions; only recently has this limit been raised to 6 and then 8 instructions, values that had been reached much earlier for ISAs with a fixed-width encoding, like AArch64. However, that has been successfully mitigated for more than a decade now by the use of a cache of already-decoded instructions, which enables the simultaneous execution of more instructions than can be decoded in a clock cycle, and which also reduces the power spent on decoding by avoiding repeated decoding of the same instructions.
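    As an aside, on recent Intel cores one can actually watch this mitigation at work: perf exposes counters for uops delivered from the decoded-uop cache (DSB) versus the legacy decoders (MITE). A rough sketch, assuming those Intel-specific event names are available (they differ across microarchitectures, and other vendors use different counters):
    Code:
    #!/usr/bin/env python3
    # Rough sketch: see what fraction of a workload's uops come from the
    # decoded-uop cache (DSB) vs. the legacy decode pipeline (MITE).
    # Event names are Intel-specific and vary by microarchitecture.
    import subprocess
    import sys

    EVENTS = "idq.dsb_uops,idq.mite_uops"

    def uop_sources(cmd: list) -> dict:
        result = subprocess.run(
            ["perf", "stat", "-x", ",", "-e", EVENTS, "--"] + cmd,
            capture_output=True, text=True, check=False)
        counts = {}
        for line in result.stderr.splitlines():
            fields = line.split(",")          # CSV: value, unit, event name, ...
            if len(fields) >= 3 and fields[0].isdigit():
                counts[fields[2]] = int(fields[0])
        return counts

    if __name__ == "__main__":
        counts = uop_sources(sys.argv[1:])    # e.g. ./my_benchmark --its-args
        total = sum(counts.values()) or 1
        for name, value in counts.items():
            print(f"{name}: {value}  ({100 * value / total:.1f}% of delivered uops)")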


    Nowadays, most of the principles on which the RISC designs were based have become obsolete, with two exceptions: the fixed-width instruction encoding, which still provides an important simplification, and the separation between register-register arithmetic/logic instructions and load/store instructions, without which it would be much more difficult to have an efficient fixed-width encoding. (Originally this separation was useful not only for the instruction encoding, but also for allowing high performance in an in-order CPU with high-latency memory; now that all high-performance CPUs use OoO execution, this is no longer important, except in cheap microcontrollers.)
    Last edited by AdrianBc; 20 October 2024, 07:15 AM.



  • coder
    replied
    Originally posted by r_a_trip View Post
    Software selection basically custom or POSIX. Of course we have Apple peddling their souped up iPhone SOC, tied to MacOS, which they can do as the odd duck in computing. Qualcom trying to emulate it with their souped up phone SOC and Windows,
    You mention "custom or POSIX" software, but then go on to mention Windows and MacOS? So, it would be "Windows, MacOS, POSIX, or custom". Don't forget Android, as well.

    Also, where do you get the idea that Snapdragon X is a souped up phone SoC? Apple is closer to that than Qualcomm, because the cores in Apple's M-series are an outgrowth of their custom cores designed for their iPhone and iPad. The cores used in Snapdragon X came from Nuvia, who were designing ARM cores for servers, before Qualcomm bought them. The whole reason Qualcomm spent all that money, on the acquisition, was to get high-performance cores to help them enter the laptop market. They're roughly in the same league as Apple's M-series, which isn't surprising since a lot of Nuvia's key designers came from Apple.

    Originally posted by r_a_trip View Post
    but having no discernable software ecosystem around Windows for ARM.
    How do you discern a software ecosystem? Just curious.

