SiFive U8-Series To Offer Much Greater RISC-V Performance


  • #41
    For those who are complaining about documentation: they are moving too fast to stop and look back. The RISC-V ISA is new. RISC-V chips are even newer.

    Look how quickly they got to a 7nm process. Look at how much product development they have done in the short time they have been around. Look at how they were able to do that despite the pressure from x86 and ARM.

    They don't have a "Raspberry Pi" yet, so until they do, expect documentation to be light.



    • #42
      Originally posted by wizard69
      companies like Qualcomm will go RISC-V when there is real demand to go in that direction. Right now, wishful thinking is not demand. The problem is magnified by the reality that RISC-V may never catch up performance-wise. It would be tough marketing to tell your customers that Qualcomm is switching to a low-performance processor to save on expenses. ARM already has quality low-performance chips, and Qualcomm is free to chase Apple with high-performance ones.
      There's no particular reason why a RISC-V core would have worse performance than the equivalent ARM64 one. That being said, Qualcomm has paid an arm and five legs for an ARM architectural license, so for them I don't see any upside in suddenly switching to RISC-V. It's just a fantasy, almost as far out there as whoever it was earlier in this thread fantasizing about Intel ditching x86 for RISC-V.

      For example there is no sign of a matrix accelerator for AI/ML acceleration.
      The U87, part of the U-series that was just announced, will have the RISC-V V extension, which last I checked does have matrix instructions in addition to the vector ones.

      Now, in general I think we're far away from being able to do an apples-to-apples comparison of a RISC-V chip and a high-end ARM64 chip, not to mention an x86 one. Where RISC-V is seeing some initial success is in deeply embedded systems where the licensing overhead from ARM is seen as too much, and/or where the customization opportunities afforded by the free licensing make for an advantage. Things like the Western Digital HD/SSD(?) controllers, Nvidia using it for some control functions in their GPUs, etc.



      • #43
        Originally posted by pkese
        Complex decoders in X86 with additional pipeline stages matter in area, power and performance (each stage added to the front of the pipeline incurs a cost at mispredicted branches). Then there are uOP caches that also take area and power (it is hard to put all these transistors close to the CPU core without incurring extra latency).
        A uop cache can reduce the number of pipe stages compared to the fetch-decode path. And it's close to the CPU core.

        The cost of fusing a reg+reg addressing instruction pair in RISC-V is paid at each occurrence of such a pair. The cost of extra pipeline stages in X86 is paid at every single instruction.
        And detecting the pairs is zero cost? Don't you think you already have an adder to compute reg + imm in R-V? Do you think reg + reg would cost more than reg + imm? Is this added cost more than detecting pairs? Also, do you have a solution for code that is already scheduled to get around the latency inherent in having to use an add instruction before doing the memory access? In this case the pairs won't be adjacent; there will be code in between them. Good luck detecting such pairs at low cost.

        The only problem of reg + reg is that this increases bandwidth of physical to virtual reg mapping, and that it adds one read port, in particular for store instructions.
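
The point above about already-scheduled code can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Insn` record and the `find_fusible_pairs` helper are hypothetical names made up for this example, and the rule used (an `add` immediately followed by a zero-offset load through, and overwriting, the same register) is one common way to describe a fusible pair, not a description of any real decoder.

```python
from typing import List, NamedTuple


class Insn(NamedTuple):
    """Hypothetical, simplified instruction record (not a real encoding)."""
    op: str        # mnemonic, e.g. "add", "ld", "mul"
    rd: str        # destination register
    rs1: str       # first source (base register for loads)
    rs2: str = ""  # second source (unused for loads)
    imm: int = 0   # load offset


def find_fusible_pairs(code: List[Insn]) -> List[int]:
    """Return each index i where code[i] and code[i + 1] look like
    `add rd, rs1, rs2` directly followed by `ld rd, 0(rd)`.

    Only adjacent instructions are examined, which is the cheap case.
    """
    pairs = []
    for i in range(len(code) - 1):
        first, second = code[i], code[i + 1]
        if (first.op == "add" and second.op == "ld"
                and second.rs1 == first.rd   # load uses the computed address
                and second.rd == first.rd    # and overwrites the temporary
                and second.imm == 0):
            pairs.append(i)
    return pairs


# Back-to-back pair: the adjacent-only check finds it.
adjacent = [Insn("add", "t0", "a0", "a1"), Insn("ld", "t0", "t0")]

# Same pair after a compiler scheduled an unrelated mul in between to hide
# the add latency: the adjacent-only check no longer sees it.
scheduled = [
    Insn("add", "t0", "a0", "a1"),
    Insn("mul", "t2", "a2", "a3"),
    Insn("ld", "t0", "t0"),
]

print(find_fusible_pairs(adjacent))
print(find_fusible_pairs(scheduled))
```

Catching the scheduled case would mean looking across a window of instructions and checking that nothing in between touches the registers involved, which is where the detection cost starts to grow.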

        It used to be the case that any CPU performance problem could be solved by adding more transistors. However times have changed. Nowadays the bottleneck is power consumption. You can have one CPU running at 5 GHz, but can you have 16? And at what frequency? In this era, each transistor that you can spare is a transistor that doesn't produce heat. Extra pipeline stages and uOP caches mean more transistors, more heat, longer signal paths, longer waits and thus lower frequency and overall performance.
        More transistors doesn't necessarily mean more heat; if all transistors are toggling yes there's more heat. But if these transistors are used to improve work/cycle you'll get better power efficiency for doing a given task. Caches are an example of that. Do you really want to remove them? Branch prediction is another one; throwing more transistors at it is a way to reduce wasted speculation, which means less energy to complete the task. Do you really want to make it too simple thus wasting cycles on the wrong code path?

        Also, if the uop cache allows you to feed 6 instructions from it while you only have 4 decoders in the front-end, which one will consume less power? I have no answer to this one; it's certainly not obvious, and surely not as simple as saying that since you have fewer transistors it must be better.

        If you read Anandtech you'll see that the Apple core is very large, but it's more power efficient than many other much smaller CPUs.

        We'll only find this out when RISC CPUs reach engineering parity with X86. It used to be the case that Intel was able to invest more money into engineering and production than the rest. Now that the competition is catching up, things will start to reveal themselves.
        RISC CPUs? You mean RISC-V CPUs? Equating RISC and RISC-V is stupid. RISC-V didn't invent anything except for an over hyped "distribution" model.

        I don't think that Apple's CPU engineering department is better than the one at Intel or AMD (definitely both Intel and AMD have more experience), yet they appear to be able to produce outstanding results.
        Companies per se don't have experience. The engineers working for these companies have experience, and I have big news for you: people change companies :-)

        I think that Apple just has an easier job designing a performant CPU. Apple is now at performance parity, but it is unlikely that they will stop anytime soon. And I think they may as well outcompete X86 architectures. Time will tell.
        Apple CPUs already are better than most x86 chips. I consider them mostly at parity. And you know why? Because the ISA in the end doesn't matter for high-end CPUs so as long as you have good engineers and the willingness to push performance you'll end up at very close points.

        Anyway I'm not trying to compare with the monstrosity that x86(-64) is; it has to die a painful death. I'm comparing AArch64 to RV64 and I think AArch64 is better from the ISA point of view. What interests me is when RV64 will have a competitive high-end CPU, which is far from being the case now.

        EDIT: some interesting read showing some of the R-V shortcomings: https://gist.github.com/erincandesce...d9982f7618ef68
        Last edited by ldesnogu; 10-28-2019, 08:49 AM.



        • #44
          Originally posted by uid313
          Apple has a huge amount of resources and already makes their own ARM-based Apple A12 "Bionic" processors; they really could switch to RISC-V and reduce their costs by cutting out ARM.
          I've been quietly speculating this too. Apple usually wants to design as much as possible by themselves. Moving to RISC-V sounds like an obvious choice.



          • #45
            Originally posted by Zucca
            I've been quietly speculating this too. Apple usually wants to design as much as possible by themselves. Moving to RISC-V sounds like an obvious choice.
            Considering ARM was a spinoff from Acorn and (drumroll...) Apple, and that Apple has an architectural license, I'd guess Apple to be one of the last ARM vendors to switch to something else.



            • #46
              Originally posted by zxy_thf
              Both area efficiency and performance/watt are ambiguous metrics -- we don't know how much they benefit from the 7nm manufacturing process.
              If we assume all chips are developed at TSMC (and that the designs would simply be ported to the newer nodes):
              16nm ~ 1.73x perf over 28nm
              7nm ~ 1.38x perf over 16nm
              7nm ~ 2.38x perf over 28nm

              They've squeezed out something like 3x additional performance over what the node shrink from 28nm to 7nm should yield in theory (if we compare the U54 to the U84), so the older design must have had massive bottlenecks that have since been optimized away.
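
The arithmetic above can be checked with a few lines. This is just a sketch that uses the scaling figures quoted in this post; the variable names are made up for the example, and the "3x" factor is the rough estimate from the post, not a measured number.

```python
# Per-step node scaling figures as quoted in the post (assumed TSMC ports).
perf_16_over_28 = 1.73
perf_7_over_16 = 1.38

# Gains from successive shrinks multiply rather than add.
node_gain = perf_16_over_28 * perf_7_over_16

# The post's rough estimate of the improvement beyond the node shrink.
design_gain = 3.0

# Total U54 -> U84 speedup implied by those two factors together.
implied_total = design_gain * node_gain

print(f"28nm -> 7nm from the node alone: {node_gain:.4f}x")
print(f"implied total speedup: {implied_total:.2f}x")
```

The product of the two steps comes out at roughly 2.39x, in line with the 2.38x listed above.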

              Directly comparing against the A72 chip is more or less impossible due to the incomprehensible language and broken formatting used on SiFive's website.

              I'd be really excited if the massive node shrink actually brought down the cost of these. The $1000 price tag is just way too steep for me to ever afford it; a lower price should be possible considering how many more chips per wafer they'll pump out.



              • #47
                Originally posted by ldesnogu
                EDIT: some interesting read showing some of the R-V shortcomings: https://gist.github.com/erincandesce...d9982f7618ef68
                Thanks for the link, this kind of stuff is great to read.



                • #48
                  Originally posted by jabl
                  Considering ARM was a spinoff from Acorn and (drumroll...) Apple, and that Apple has an architectural license, I'd guess Apple to be one of the last ARM vendors to switch to something else.
                  Hard to say. Apple joined the group in the 80s and developed a CPU for their Newton.

                  We can only guess (unless a reliable source is found) how much Apple pays for the license nowadays.

                  Anyway, earlier Apple wanted to make their own modems, and now they do. This is why I guess Apple would want to have their own CPU. Although they kinda have it already with ARM. Not completely, but almost.

