How A Raspberry Pi 4 Performs Against Intel's Latest Celeron, Pentium CPUs


  • #11
    Originally posted by vladpetric View Post
    There are fundamental limitations to getting higher IPC
    What fundamental limitations preventing higher IPC do you have in mind?

    Hint: Future [20+ years] x86 CPUs will be able to perform memory renaming.
    Last edited by atomsymbol; 07 August 2020, 03:48 PM. Reason: Change 10+ years to 20+ years



    • #12
      Originally posted by chuckula View Post

      A very trite cliche that's disproven by two major facts:
      1. Alpha got all of its performance from deep pipelines and high clocks. Cool back in the mid-90s, but how did that turn out for the Pentium 4 again? Hell, there wasn't even any Alpha chip that was ever produced that had a vector unit in it and it was only on the roadmap for the generation AFTER the EV8 that never made it to market.
      2. We already did see what Alpha would have turned into in the original Athlon, since it was basically made by Digital designers that AMD hired. So by any useful definition, the processors that we have now *are* the distant descendants of Alpha, and there's no way some magical extrapolation of 1990s-era high-clock deep-pipeline processing was ever going to work with the laws of physics to be 4x faster than a 10nm Atom, much less a high-end modern x86 device.
      Alpha also had really well executed out-of-order execution ... It had a better OoO design than the Pentium Pro, which was released a year earlier.

      Yes, a lot of modern designs are greatly influenced by the Alpha. Sadly, not as much the modern ARMs (this does not include Apple's ARM line, which only licensed the instruction set, and whose original chief architect is Jim Keller ... wink wink about the Digital designers, then Athlon, then later Ryzen).

      If I had a hard choice between a well-executed OoO pipeline and an in-order pipeline with vector units, I think I would pick the OoO one almost all the time. The utility of short vectors is limited: they typically require either hand-written code or Fortran-style loops over contiguous memory (linear algebra).



      • #13
        Originally posted by herman View Post
        A Pi 4 is not a serious desktop replacement, but for a budget SOC, it's actually coming close enough to Intel that we benchmark it. The real test will be Apple's silicon. If that can get competitive performance, then it will only be a matter of time before other manufacturers switch to ARM-based products.
        Apple processors only use the instruction set - they designed their CPUs from scratch, and didn't use the toy reference designs from ARM.

        Microarchitectures (pipeline, caches, predictors, etc) matter a lot more than the instruction set.
        Last edited by vladpetric; 07 August 2020, 02:37 PM.



        • #14
          Originally posted by atomsymbol View Post

          What fundamental limitations preventing higher IPC do you have in mind?

          Hint: Future [10+ years] x86 CPUs will be able to perform memory renaming.
          1. Data dependencies in the program. Maybe you can break some of them through carefully-directed value prediction.

          2. Control dependencies in the program. The likelihood of being on the right path goes down with each predicted, speculative branch ... it's one of the reasons why we don't have larger instruction windows.

          There are other practical considerations as well (e.g., larger structures need more power), but these two are the ones I consider fundamental.
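To put rough numbers on the two limits above, here is a minimal sketch. It assumes every branch is predicted independently with the same accuracy (real predictors are not that uniform) and it models a program as a small dependency graph with unit-latency instructions. The function names and the six-instruction example are made up for illustration.

```python
from functools import lru_cache
from typing import Dict, List


def correct_path_probability(accuracy: float, branches_in_flight: int) -> float:
    """Chance that every speculated branch in the window was predicted correctly.

    Assumes independent branches with identical prediction accuracy.
    """
    return accuracy ** branches_in_flight


def ilp_upper_bound(deps: Dict[str, List[str]]) -> float:
    """Instruction count divided by the longest dependency chain (unit latencies).

    deps maps each instruction to the instructions whose results it needs.
    """
    @lru_cache(maxsize=None)
    def depth(instr: str) -> int:
        # Length of the longest chain ending at this instruction.
        return 1 + max((depth(d) for d in deps[instr]), default=0)

    critical_path = max(depth(i) for i in deps)
    return len(deps) / critical_path


# Hypothetical example: two independent loads feed a single serial chain.
example = {
    "load_a": [],
    "load_b": [],
    "add": ["load_a", "load_b"],
    "mul": ["add"],
    "sub": ["mul"],
    "store": ["sub"],
}

if __name__ == "__main__":
    for n in (8, 32, 128):
        print(n, round(correct_path_probability(0.98, n), 3))
    print(ilp_upper_bound(example))
```

Under these assumptions, a 98% accurate predictor leaves only about a 0.52 chance of still being on the right path after 32 speculated branches, and the example graph cannot sustain more than 6 / 5 = 1.2 instructions per cycle no matter how wide the machine is.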

          I really doubt that memory renaming is going to happen. I'm willing to make a bet on it as well (though for that, we should have a clear understanding as to what memory renaming means - e.g., link to a specific paper describing the mechanism).



          • #15
            Originally posted by vladpetric View Post

            Apple processors only use the instruction set - they designed their CPUs from scratch, and didn't use the toy reference designs from ARM.

            Microarchitecture (pipeline, caches, predictors, etc.) matters a lot more than the instruction set.
            I agree with you that the microarchitecture matters more, but aren't Apple's custom-designed processors just like the custom x86 processors that AMD and Intel design? The important part is that code written for x86 works across all (or most) x86 processors. If the same is true for Apple silicon and all other ARM processors, then we'll see a custom ARM processor revolution soon.



            • #16
              Originally posted by herman View Post

              I agree with you that the microarchitecture matters more, but aren't Apple's custom-designed processors just like the custom x86 processors that AMD and Intel design? The important part is that code written for x86 works across all (or most) x86 processors. If the same is true for Apple silicon and all other ARM processors, then we'll see a custom ARM processor revolution soon.
              No general disagreement with you, but I for one have been really disappointed with the RPi4's low performance (including on my own benchmarks).

              Given that Apple's a closed shop, I think that for an ARM revolution to take place, we need a great chip design that is not Apple's. I'm hoping that ARM themselves do that, but with so many ownership changes ... I'm not too hopeful. Of course, this is just me speculating.



              • #17
                Originally posted by herman View Post
                then we'll see a custom ARM processor revolution soon.
                Only a very few companies (estimated to be around six) have an ARM 64-bit architectural license. While Cortex-X holds promise for some flexible design points, and the (rumored) Whitechapel design of Samsung/Google looks interesting, Apple is still considered a leader, in no small part because they invested in a lot of very good chip designers (and they even poached one of ARM's lead chip designers not all that long ago). Custom designs that are performant (and not just another generic look-alike) require very large investments, and many of those are targeting in-house products.



                • #18
                  Originally posted by vladpetric View Post
                  Unfortunately the microarchitecture of the RPi4 is still a bit of a joke.

                  I've had a lengthy discussion (peppered with the occasional insult, of course) earlier on the Phoronix forums.

                  It's not just the clock speed. The IPC of the RPi4 is mediocre.
                  Your comment is a bit of a joke:
                  ARM isn't built to be performant. It's built to be efficient. Compare performance-per-watt (for both idle and load) and suddenly, those Intel chips don't look so great. The G6400 is marketed as 58W, and we all know Intel under-estimates their TDP. Worst-case scenario, the entire RPi4 uses 15W. There's a reason Intel gave up in the mobile market - they weren't able to compete with ARM's power draw without being slower.
                  Also, clock speed is a big part of it. That's 1.5GHz against 4GHz (when looking at the G6400). That's more than twice the clock speed at 4x the power consumption.
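For reference, the ratios behind those figures can be worked out as below. This is only arithmetic on the numbers quoted in this post (58 W and 4 GHz for the G6400, a 15 W worst case and 1.5 GHz for the RPi4). GHz per watt is used as a crude stand-in, not a measured performance-per-watt result, and the constant names are made up for illustration.

```python
# Figures as quoted in the post above; none of these are measurements.
G6400_CLOCK_GHZ = 4.0
G6400_TDP_W = 58.0
RPI4_CLOCK_GHZ = 1.5
RPI4_WORST_CASE_W = 15.0


def ratio(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


clock_ratio = ratio(G6400_CLOCK_GHZ, RPI4_CLOCK_GHZ)
power_ratio = ratio(G6400_TDP_W, RPI4_WORST_CASE_W)

# Clock per watt: a crude proxy that ignores IPC entirely.
g6400_ghz_per_watt = G6400_CLOCK_GHZ / G6400_TDP_W
rpi4_ghz_per_watt = RPI4_CLOCK_GHZ / RPI4_WORST_CASE_W

if __name__ == "__main__":
    print(f"clock ratio: {clock_ratio:.2f}x")
    print(f"power ratio: {power_ratio:.2f}x")
    print(f"GHz/W  G6400: {g6400_ghz_per_watt:.3f}  RPi4: {rpi4_ghz_per_watt:.3f}")
```

On these quoted numbers the clock ratio is roughly 2.67x and the power ratio roughly 3.87x. The calculation says nothing about work done per clock, so it does not by itself settle the IPC argument.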

                  What you're doing is like mocking an economy car because it isn't fast like a sports car or powerful like a truck, failing to understand that it wasn't built to do either.



                  • #19
                    Originally posted by tildearrow View Post
                    Had the Alpha architecture not died, then we would have processors 4x faster than x86.
                    It certainly didn't die a natural death. If Alpha were a human, we would call it premeditated murder. I remember in 1996 my employer had me benchmarking hardware to run their heavy application. The fastest from Intel was the Pentium Pro, and from DEC the 21164A. It was no comparison at all: 32-bit at 200 MHz from Intel vs. 64-bit at 500 MHz from DEC. Intel's finest never stood a chance.



                    • #20
                      Why run 32-bit on the Pi 4 instead of 64-bit? 64-bit is faster for a lot of workloads due to having more registers.
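For anyone unsure which mode their own Pi is running in, a quick check along these lines should work. It is a minimal sketch using only the Python standard library; the helper name is made up, and it reports on the Python interpreter's own build, which is assumed to match the rest of the userland.

```python
import platform
import struct


def pointer_width_bits() -> int:
    """Width of a native pointer in the running Python interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # platform.machine() reports the kernel's architecture string, e.g.
    # "aarch64" for a 64-bit ARM kernel or "armv7l" for 32-bit ARM.
    print("machine:", platform.machine())
    print("userland pointer width:", pointer_width_bits())
```

Note that the two values can disagree: a 64-bit kernel can run a 32-bit userland, in which case the machine string says aarch64 while the pointer width is still 32.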
