SiFive U8-Series To Offer Much Greater RISC-V Performance


  • #31
    Originally posted by blargh4
    that battle has already been fought, and x86 won, because the ISA just isn't that important for high-performance microarchitectures
    I'm not sure that battle has been won by x86. It appears that Apple's mobile A13 is reaching the SPECint levels of Intel's Skylake and surpassing those of Ryzen 3 (both desktop CPUs). Either Apple's engineers are that good, or the ISA makes it easier to extract performance.

    https://www.anandtech.com/show/14892...d-max-review/4

    Thanks for the interesting insights.



    • #32
      Originally posted by pkese

      I'm not sure that battle has been won by x86. It appears that Apple's mobile A13 is reaching the SPECint levels of Intel's Skylake and surpassing those of Ryzen 3 (both desktop CPUs). Either Apple's engineers are that good, or the ISA makes it easier to extract performance.
      I'm only referring to the period in the 90s when there was a lot of competition for high-performance CPUs from various RISC designs (IBM POWER, MIPS, SPARC, HP PA-RISC, Alpha). They were faster here and there, but in the end Intel had the money and people to throw at optimizing every inch of their x86 designs, and demonstrated decisively that RISC's clear advantage for simpler CPUs didn't scale to sophisticated OoO designs, and that a bad ISA could be worked around. I'm sure that x86 is not the final word on CPU performance, and that a modern, mostly-clean-sheet ISA like ARMv8 makes it easier for Apple to get that performance in a smaller power budget, but they're also generations ahead of every other ARM64 design, so I suspect some Apple microarchitectural secret sauce has more to do with it than the instruction set.



      • #33
        Originally posted by pkese
        There's a footnote quoting "using iso-process & iso-frequency methodology" -- but I may have read it carelessly myself, because there were two footnotes, and unlike ARM's, this one applied when comparing to their own U7. Their methodology for ARM was not explicitly revealed, so you may be right, or you may not be.
        Oh gee, I missed the "1" (it really looks like a suffix). Sorry.



        • #34
          Originally posted by blargh4

          I'm only referring to the period in the 90s when there was a lot of competition for high-performance CPUs from various RISC designs (IBM POWER, MIPS, SPARC, HP PA-RISC, Alpha). They were faster here and there, but in the end Intel had the money and people to throw at optimizing every inch of their x86 designs, and demonstrated decisively that RISC's clear advantage for simpler CPUs didn't scale to sophisticated OoO designs, and that a bad ISA could be worked around. I'm sure that x86 is not the final word on CPU performance, and that a modern, mostly-clean-sheet ISA like ARMv8 makes it easier for Apple to get that performance in a smaller power budget, but they're also generations ahead of every other ARM64 design, so I suspect some Apple microarchitectural secret sauce has more to do with it than the instruction set.
          Although I'm not a big fan of RISC -- they will eventually add more and more instructions to their ISAs to handle new tasks -- I have to say this story is not complete.

          During the late 90s DEC had the fastest CPU in the world (Alpha), but that architecture was later phased out in favor of Intel's Itanium plan.
          A similar thing happened to a lot of other architectures, because for some reason everyone thought Itanium was the future.
          So the RISC side was really never given a chance to show how far it could be pushed.



          • #35
            One can make a credible argument that RISC won the "RISC vs CISC war", in the sense that just about every architecture since has been, if not straight RISC, at least heavily influenced by it.

            Now, x86 is certainly a testament to the fact that with sufficient thrust, pigs can fly.

            RISC-V is a nice, clean RISC ISA, and with the C extension its code density is comparable to or better than x86-64 and ARM64, but the "innovation" here is the licensing model rather than any particular detail of the ISA itself. I hope it succeeds, but we'll see.



            • #36
              To complement what blargh4 wrote, we were discussing higher-end chips, not the low-end.

              At the higher end of the performance spectrum the ISA matters less and less. For instance, the shortcoming of RISC-V I previously mentioned, the lack of a reg + reg addressing mode, will be masked by instruction fusion. That's why I think high-end RISC-V will need the same level of micro-architectural wonders we see in every high-end chip, no matter the ISA. There's no magic bullet.
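              The fusion idea above can be sketched as a toy decoder pass: an `add` that only computes an address gets merged with the following load, recovering the effect of a reg + reg addressing mode. The instruction encoding, register names, and the single fusion pattern below are all made up for illustration; real fusion rules are microarchitecture-specific and far more restrictive.

```python
# Toy model of macro-op fusion in a RISC-V front end (purely illustrative).

def fuse(instrs):
    """Fuse an `add` that computes an address with the `lw` that uses it,
    emulating the reg+reg addressing mode the RISC-V ISA lacks."""
    out, i = [], 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        # Pattern: add t, a, b ; lw d, 0(t)  ->  one fused uop: lw d, (a+b)
        if (nxt is not None
                and cur[0] == "add" and nxt[0] == "lw"
                and nxt[2] == ("0", cur[1])):    # lw offset 0, base = add's dest
            out.append(("lw.fused", nxt[1], (cur[2], cur[3])))
            i += 2                               # both instructions consumed
        else:
            out.append(cur)
            i += 1
    return out

prog = [
    ("add", "t0", "a0", "a1"),     # t0 = a0 + a1 (address computation)
    ("lw",  "t1", ("0", "t0")),    # t1 = mem[t0 + 0]  -> fusible pair
    ("add", "a2", "a2", "a3"),     # ordinary add, not fusible
]
print(fuse(prog))
```

              The cost model matches the discussion: the pair check runs only in the decoder, and only fusible pairs pay for it.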

              I just read the SiFive announcement, and it sounds like it was written by a newcomer to marketing BS: overuse of the word "incredible" (it made me think of Jobs' "revolution"), and missing the fact that 1.4x and 2.3x do not make 3.1x but 3.2x (marketing droids should know about rounding).
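              The rounding nit is easy to check: the two speedup factors multiply, and 1.4 × 2.3 = 3.22, which rounds to 3.2, not 3.1.

```python
# Combined speedup: a 1.4x gain on top of a 2.3x gain multiplies out.
combined = 1.4 * 2.3
print(f"{combined:.2f}")   # prints 3.22
print(f"{combined:.1f}")   # prints 3.2, not 3.1
```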



              • #37
                Originally posted by ldesnogu
                At the higher end of the performance spectrum the ISA matters less and less.
                Complex decoders in x86, with their additional pipeline stages, matter in area, power, and performance (each stage added to the front of the pipeline incurs a cost on mispredicted branches). Then there are uOP caches that also take area and power (it is hard to put all those transistors close to the CPU core without incurring extra latency).

                The cost of fusing a reg+reg addressing instruction pair in RISC-V is paid at each occurrence of such a pair. The cost of extra pipeline stages in x86 is paid on every single instruction.

                It used to be the case that any CPU performance problem could be solved by adding more transistors. However, times have changed: nowadays the bottleneck is power consumption. You can have one CPU running at 5 GHz, but can you have 16? And at what frequency? In this era, each transistor that you can spare is a transistor that doesn't produce heat. Extra pipeline stages and uOP caches mean more transistors, more heat, longer signal paths, longer waits, and thus lower frequency and overall performance.
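                The cost of those extra front-end stages can be put into a back-of-the-envelope CPI model: a mispredicted branch flushes the front end, so its penalty grows with every stage sitting before execute. All the numbers below (pipeline depth, branch frequency, misprediction rate) are made up for illustration, not figures for any real core.

```python
# Back-of-the-envelope CPI impact of extra front-end (decode) stages.
# All numbers here are illustrative, not measurements of any real core.

def cpi(base_cpi, pipeline_depth, extra_decode_stages,
        branch_freq, mispredict_rate):
    # A misprediction flushes the front end, so its penalty grows
    # with every stage added before execute.
    flush_penalty = pipeline_depth + extra_decode_stages
    return base_cpi + branch_freq * mispredict_rate * flush_penalty

lean = cpi(base_cpi=0.5, pipeline_depth=12, extra_decode_stages=0,
           branch_freq=0.2, mispredict_rate=0.05)
fat  = cpi(base_cpi=0.5, pipeline_depth=12, extra_decode_stages=3,
           branch_freq=0.2, mispredict_rate=0.05)
print(f"lean decoder CPI: {lean:.3f}")
print(f"fat  decoder CPI: {fat:.3f}")
```

                With these toy numbers the fatter front end loses a few percent of throughput on branchy code, and the gap widens as misprediction rate or decode depth grows.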

                We'll only find out when RISC CPUs reach engineering parity with x86. It used to be the case that Intel was able to invest more money into engineering and production than everyone else; now that the competition is catching up, things will start to reveal themselves. I don't think that Apple's CPU engineering department is better than the ones at Intel or AMD (both definitely have more experience), yet they appear to be able to produce outstanding results.

                I think that Apple just has an easier job designing a performant CPU. Apple is now at performance parity, and it is unlikely that they will stop anytime soon. I think they may well outcompete the x86 architectures. Time will tell.



                • #38
                  Some of your points might be accurate, but I really think you are underplaying the importance of ARM64 as a step forward for the new age of mega chips. Backward compatibility is important for now, which is why many ARM 64-bit chips can execute ARM 32-bit instructions. However, companies looking forward, Apple for one, are driving developers to the 64-bit world whether they like it or not. Apple's aggressiveness here is likely due to a long-term plan to remove all that cruft from their 64-bit ARM hardware.

                  Interestingly, x86 these days is largely cruft that Intel seemingly can't do anything about. They don't have a clean way to transition to 64-bit; in fact, I find the Intel x86 solution to be a bit of a joke these days. Intel could lean out the 64-bit variants but for whatever reason never has, and the fact that they haven't done so is likely to haunt the company moving forward.

                  With ARM (Apple) 64-bit chips you have a much cleaner architecture moving forward. I put Apple in parentheses here because I've often wondered how much of ARM64 is ARM tech and how much is Apple tech. In the end it doesn't matter, as the industry is getting a significantly better processor vs x86. It is good enough that I see RISC-V having a very hard time getting significant design wins.

                  Perhaps the most important part of the discussion here is operating system and compiler quality. These days the need for emulation is massively reduced thanks to well-done SDKs that abstract away any need to think about hardware for the overwhelming majority of software. Stable cross-compiling helps too. The point here is that the effort required to move software to a different processor architecture is greatly reduced or eliminated by modern operating systems. It is easy to sit here and dream about a performant ARM-based laptop because most of Linux already runs on ARM -- not just the core operating system but entire distros.

                  We really have moved to a time when a processor can be valued for things other than its instruction set. Low power (long run time) is a big feature; I can actually imagine a day when putting solar cells on a laptop, to supplement the batteries, might make sense! More importantly, mobile has demonstrated clearly that special function units are far more useful moving forward than the CPU's ALUs. We live in an interesting time here.

                  Originally posted by pkese

                  There's a footnote quoting "using iso-process & iso-frequency methodology" -- but I may have read it carelessly myself, because there were two footnotes, and unlike ARM's, this one applied when comparing to their own U7. Their methodology for ARM was not explicitly revealed, so you may be right, or you may not be.

                  Regarding me being the salesman - I'll leave the guessing to you.

                  The reason I get excited about RISC-V is that all other instruction sets were designed in an era when the number of transistors per chip was counted in thousands, and the intent of those instruction sets was to improve performance relative to that era's technology (x86 was born in the 'microcoded' era, ARM in the 'single-issue pipeline, no cache' era).

                  In the meantime technology has advanced: we're counting transistors in millions and billions, and modern CPUs have superscalar pipelines, out-of-order speculative execution, register renaming, and more. For 90% of the ISA this doesn't matter a lot, but in edge cases there's a lot of extra cruft in the implementation of a CPU simply due to the need to emulate quirks of archaic ISAs. And these are often things that hinder the whole architecture of the CPU (think of ARM32 conditional instructions -- maybe a good idea in 1985 to save a few transistors, but definitely a great pain for implementations in 2019; yes, they dropped them in the ARM64 ISA, but all modern ARM CPUs still implement them for backward compatibility reasons).
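                  The ARM32 predication pain point above can be illustrated with a tiny model: almost every instruction carries a condition that must be checked against the flags before it executes, so even a simple core pays for the check on every instruction. This is a made-up mini-interpreter, not real ARM semantics.

```python
# Minimal model of ARM32-style predication: every instruction carries a
# condition evaluated against the flags. (Illustrative only, not real ARM.)

regs = {"r0": 5, "r1": 7, "r2": 0}
flags = {"eq": False}

def execute(instrs):
    for cond, op, dst, a, b in instrs:
        # The predication check happens for *every* instruction.
        if cond == "eq" and not flags["eq"]:
            continue                      # squashed: acts as a NOP
        if op == "cmp":
            flags["eq"] = regs[a] == regs[b]
        elif op == "add":
            regs[dst] = regs[a] + regs[b]

# r2 = r0 + r1, but only if r0 == r1 (here they differ, so it is squashed)
execute([
    ("al", "cmp", None, "r0", "r1"),      # "al" = always execute
    ("eq", "add", "r2", "r0", "r1"),      # ADDEQ: conditional add
])
print(regs["r2"])   # still 0: the ADDEQ was squashed
```

                  In an out-of-order implementation the condition becomes an extra data dependency on the flags for nearly every instruction, which is exactly the kind of cruft the paragraph above is complaining about.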

                  RISC-V was designed to match the state-of-the-art hardware of the 2010s rather than the 1970s-1980s. This is why RISC-V cores don't need the extra pipeline stages that x86 needs to decode its instructions into internal uOPs (and consequently they have shorter pipeline stalls on branch mispredictions). They don't need uOP caches. Instruction fusion is simpler, and so on.

                  RISC-V processors have been shown to match the performance levels of CPUs with other ISAs while having considerably fewer transistors (and consuming less power). A lot of that is due to not having to deal with backward compatibility (x86-32 and ARM-32 are both extremely complex, and most 64-bit CPUs still implement them), but a large part is simply that a modern ISA is easier to implement efficiently.



                  • #39
                    Originally posted by cipri

                    Yes, some people have strange dreams.
                    There are already completely open-source RISC-V CPUs. Do you think RPi devs are interested??? No, they are Broadcom employees, they want to sell the Broadcom chips!
                    This will happen when Broadcom starts to produce and sell RISC-V chips! And very likely, this will happen in the next few years.
                    People also have strange postings.

                    Companies like Qualcomm will go RISC-V when there is real demand to go in that direction. Right now, wishful thinking is not demand. The problem is magnified by the reality that RISC-V may never catch up performance-wise. It would be tough marketing to tell your customers that Qualcomm is switching to a lower-performance processor to save on expenses. ARM already has quality low-performance chips, and Qualcomm is free to chase Apple with high-performance ones.

                    As for RPi, there is one good reason for them to go RISC-V, and that would be to pursue their educational mission. A truly open piece of hardware that is well documented would be a great place to teach about hardware design. Honestly, though, RPi seems to be more interested in software and peripherals than in processor design. Even with software they are not well managed; with the availability of 64-bit hardware they should have moved quickly to a 64-bit-only distro. This is where RISC-V might have a real opening in the education market: bypass the legacy RPi decisions and build a better mousetrap. The problem is getting the backing of a large company to drive a low-cost solution to market.



                    • #40
                      Originally posted by pkese

                      Complex decoders in x86, with their additional pipeline stages, matter in area, power, and performance (each stage added to the front of the pipeline incurs a cost on mispredicted branches). Then there are uOP caches that also take area and power (it is hard to put all those transistors close to the CPU core without incurring extra latency).

                      The cost of fusing a reg+reg addressing instruction pair in RISC-V is paid at each occurrence of such a pair. The cost of extra pipeline stages in x86 is paid on every single instruction.

                      It used to be the case that any CPU performance problem could be solved by adding more transistors. However, times have changed: nowadays the bottleneck is power consumption. You can have one CPU running at 5 GHz, but can you have 16? And at what frequency? In this era, each transistor that you can spare is a transistor that doesn't produce heat. Extra pipeline stages and uOP caches mean more transistors, more heat, longer signal paths, longer waits, and thus lower frequency and overall performance.

                      We'll only find out when RISC CPUs reach engineering parity with x86. It used to be the case that Intel was able to invest more money into engineering and production than everyone else; now that the competition is catching up, things will start to reveal themselves. I don't think that Apple's CPU engineering department is better than the ones at Intel or AMD (both definitely have more experience), yet they appear to be able to produce outstanding results.
                      Actually, I do think that Apple's engineering team is better than Intel's, maybe even AMD's. They have purchased an incredible array of engineering companies to get to where they are now; PA Semi is perhaps the one that got the most news, but there have been many others.
                      I think that Apple just has an easier job designing a performant CPU. Apple is now at performance parity, and it is unlikely that they will stop anytime soon. I think they may well outcompete the x86 architectures. Time will tell.
                      Apple's chips are rather amazing if you ask me, as only a small portion of the chip area is dedicated to supporting the ARM instruction set. Large swaths of die space go to video and other special processing chores. Apple's A-series chips are one of the reasons I often mention that the old ALU-centric design is a thing of the past.

                      Interestingly, Apple's ARM cores are the best out there, which might argue against my statements above, but I think it is obvious that Apple has a huge engineering team to support the rest of the SoC. I think it is safe to say that the rest of the die space is more complex than the ARM cores.

                      So from the standpoint of RISC-V I don't see a lot of importance in the RISC-V core performance numbers. Bringing a competitive chip to market these days is far more involved than just crapping out good RISC cores. The entire chip must be competitive in supporting what customers want. For example, there is no sign of a matrix accelerator for AI/ML acceleration, no sign of camera processing hardware, no I/O processors for sensors -- lots of "no"s, really, that make the processor less than inviting. By the way, in today's systems all of those missing pieces, once implemented, must run at extremely low power. The RISC-V community is a long way from having a high-performance solution for smartphones. Right now it is more of a feature-phone processor.

