How A Raspberry Pi 4 Performs Against Intel's Latest Celeron, Pentium CPUs


  • #21
    Originally posted by CommunityMember View Post

    Only a very few (estimated to be around six) companies have an ARM 64-bit architectural license. While Cortex-X holds promise for some flexible design points, and the (rumored) Whitechapel design of Samsung/Google looks interesting, Apple is still considered a leader, in no small part because they invested in a lot of very good chip designers (and they even poached one of ARM's lead chip designers not all that long ago). Custom designs that are performant (and not just another generic look-alike) require very large investments, and many of those are targeting in-house products.
    I agree. In my own personal (and very subjective) assessment, Apple is 3-4 years ahead of the competition, much like it was with the iPhone. That's a long time for the competition to be behind, but it's not that long for consumers when you realize the de facto architectural standard of the last 40 years is about to be dethroned. I look forward to all-day battery life on ARM-based laptops.

    My own personal wish is that this will finally spur greater Linux support and development for ARM, so that pure Linux phones can finally be created. Having only two OSes in the mobile industry is stifling.

    Comment


    • #22
      Originally posted by vladpetric View Post

      No general disagreements with you - but I for one have been really disappointed with rpi4's low performance (including for my own benchmarks).

      Given that Apple's a closed shop, I think that for an ARM revolution to take place, we need a great chip design that is not Apple's. I'm hoping that ARM themselves do that, but with so many ownership changes ... I'm not too hopeful. Of course, this is just me speculating.
      Apple's most likely competitors will be Microsoft, Samsung, Qualcomm, Nvidia (if they buy ARM). Maybe even Intel will grow desperate and finally get back into the ARM business, haha. If Apple's laptops have the performance they are promising, then everyone will want to get in on the new ARM-based designs.

      Comment


      • #23
        Originally posted by vladpetric View Post
        1. Data dependencies in the program. Maybe you can break some of them through carefully-directed value prediction.
        I believe more in the potential usefulness of JIT compilation (in a CPU) than in value prediction.

        Originally posted by vladpetric View Post
        2. Control dependencies in the program. The likelihood of being on the right path goes down with each predicted, speculative branch ... it's one of the reasons why we don't have larger instruction windows.
        Predicting multiple branches per cycle isn't fundamentally different from predicting a single branch per cycle. A branch predictor can output 2+ bits per cycle instead of just 1 bit per cycle. Current CPUs (Zen, Skylake-derived) can predict two branches per cycle, although there are some weird irregularities to the implementation so it isn't actually a generic/universal two-branches-per-cycle predictor (at least on Zen, I didn't test Skylake).

        Another reason why we don't have larger instruction windows, in addition to the reason you mentioned, is that a larger instruction window increases the probability of there being multiple memory access instructions in the window ("the memory wall"). Intel Sunny Cove has 4 AGUs and might be able (?) to sustain 2 memory reads and 2 memory writes per cycle. I am probably going to wait until Intel releases SunnyCove-derived desktop CPUs (maybe 2020-sep-02) and AMD releases Zen 3 before buying a new CPU.

        Originally posted by vladpetric View Post
        There are other practical considerations as well (e.g., larger structures need more power), but these I'm considering to be fundamental.
        I think it is probable that if memory renaming (MR) gets implemented in x86 CPUs somewhere in the future then a CPU will have to have just a couple of MR-capable very-large cores plus many smaller cores that are not MR-capable - and/or CPU cores will have to be stacked on top of each other.

        Originally posted by vladpetric View Post
        I really doubt that memory renaming is going to happen. I'm willing to make a bet on it as well (though for that, we should have a clear understanding as to what memory renaming means - e.g., link to a specific paper describing the mechanism).
        Multiple writes to the same register R per cycle rename the register R - multiple writes to the same memory location L per cycle rename the memory location L (if L is in L1D cache).

        I don't know an article about memory renaming.

        Memory renaming does not make sense today, for example because a single instruction fetch window is too small to contain multiple iterations of the same loop.

        Comment


        • #24
          Originally posted by schmidtbag View Post
          Your comment is a bit of a joke:
          ARM isn't built to be performant. It's built to be efficient. Compare performance-per-watt (for both idle and load) and suddenly, those Intel chips don't look so great. The G6400 is marketed as 58W, and we all know Intel under-estimates their TDP. Worst-case scenario, the entire RPi4 uses 15W. There's a reason Intel gave up in the mobile market - they weren't able to compete with ARM's power draw without being slower.
          Also, clock speed is a big part of it. That's 1.5GHz against 4GHz (when looking at the G6400). That's more than twice the clock speed at 4x the power consumption.

          What you're doing is like mocking an economy car because it isn't fast like a sports car or powerful like a truck, failing to understand that it wasn't built to do either.
          Yet there is more than 2.66x performance difference between the G6400 and the RPi4.

          Also, the biggest part of power consumption is the dynamic part (resulting from switching). In a simplified form, that power is proportional to C * V^2 * f, where C is the dynamic power dissipation capacitance, V is supply voltage, and f is frequency. See page 4 of https://www.ti.com/lit/an/scaa035b/scaa035b.pdf

          Increase the frequency and voltage of the RPi4 and things might not look so good. Though that's a hypothetical - there are hard limits as to how much you can overclock the RPi4.
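To put rough numbers on the C * V^2 * f relation quoted above, here is a minimal Python sketch. The clock ratio (4 GHz vs 1.5 GHz) comes from this thread; the voltage ratios are made-up illustrative values, not measurements, and `dynamic_power_ratio` is just a name picked for this example.

```python
def dynamic_power_ratio(v_ratio: float, f_ratio: float, c_ratio: float = 1.0) -> float:
    """Relative dynamic power under the simplified model P ~ C * V^2 * f."""
    return c_ratio * v_ratio ** 2 * f_ratio


# Clock ratio taken from the thread: 4 GHz (G6400) vs 1.5 GHz (RPi4).
f_ratio = 4.0 / 1.5

# Hypothetical voltage increases needed to reach the higher clock (assumed values).
for v_ratio in (1.0, 1.2, 1.5):
    p_ratio = dynamic_power_ratio(v_ratio, f_ratio)
    print(f"V x{v_ratio:.1f}, f x{f_ratio:.2f} -> dynamic power x{p_ratio:.2f}")
```

The point is only that frequency enters linearly while voltage enters squared, so any voltage bump needed to hit a higher clock costs disproportionately more power. It says nothing about static (leakage) power or about what the RPi4 silicon could actually reach.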

          A good processor design can scale down quite easily. This is where your analogy breaks - a sports car won't be economical, and an economy car won't get you that acceleration and top speed. But with good processors you can have both. You can have a superfast processor that runs really fast at peak demand, but slows down when idle and doesn't kill your battery. Best example, IMO? Apple's ARM procs.

          Finally, for performance per watt you need to have a third party measure both the performance and the Watts, and then publish the numbers. It's not something that one can do a back-of-the-envelope calculation for, with the TDP (a max value). Feel free to quote such numbers though, I'm actually quite curious.

          Comment


          • #25
            Originally posted by atomsymbol View Post

            I believe more in the potential usefulness of JIT compilation (in a CPU) than in value prediction.



            Predicting multiple branches per cycle isn't fundamentally different from predicting a single branch per cycle. A branch predictor can output 2+ bits per cycle instead of just 1 bit per cycle. Current CPUs (Zen, Skylake-derived) can predict two branches per cycle, although there are some weird irregularities to the implementation so it isn't actually a generic/universal two-branches-per-cycle predictor (at least on Zen, I didn't test Skylake).

            Another reason why we don't have larger instruction windows, in addition to the reason you mentioned, is that a larger instruction window increases the probability of there being multiple memory access instructions in the window ("the memory wall"). Intel Sunny Cove has 4 AGUs and might be able (?) to sustain 2 memory reads and 2 memory writes per cycle. I am probably going to wait until Intel releases SunnyCove-derived desktop CPUs (maybe 2020-sep-02) and AMD releases Zen 3 before buying a new CPU.



            I think it is probable that if memory renaming (MR) gets implemented in x86 CPUs somewhere in the future then a CPU will have to have just a couple of MR-capable very-large cores plus many smaller cores that are not MR-capable - and/or CPU cores will have to be stacked on top of each other.



            Multiple writes to the same register R per cycle rename the register R - multiple writes to the same memory location L per cycle rename the memory location L (if L is in L1D cache).

            I don't know an article about memory renaming.

            Memory renaming does not make sense today, for example because a single instruction fetch window is too small to contain multiple iterations of the same loop.
            About JITs - I attended a talk in 1997 in which Java people claimed that they were going to beat C++ speed by doing clever things with their JITs. Well, that didn't work at all ...

            In order for JITs to take over the world, I think we'd need a paradigm shift of sorts (an Einstein-like person to figure out how to make JIT compilation much much better than it is today).

            Predicting multiple branches per cycle is not the issue here. The issue is that when you have an instruction window of ~256 micro ops, what's the likelihood that the tail is on the right path? Keep in mind that the average number of instructions per basic block, in SPEC cpu benchmarks, is single digits (so there's plenty of branches).
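A back-of-the-envelope way to see the problem, as a Python sketch. It assumes every branch is predicted independently with the same accuracy and treats micro-ops and instructions as roughly the same thing; both are simplifications. The block size of 8 and the accuracy values are assumed for illustration ("single digits" is all that is claimed above).

```python
def prob_on_correct_path(window_uops: int, uops_per_branch: float, accuracy: float) -> float:
    """Chance that the youngest op in the window is on the correct path.

    Assumes independent branch predictions with identical accuracy.
    """
    branches_in_flight = window_uops / uops_per_branch
    return accuracy ** branches_in_flight


WINDOW = 256          # ~256 micro-ops, as in the post
UOPS_PER_BRANCH = 8   # assumed: one branch per basic block of 8 ops

for accuracy in (0.90, 0.95, 0.99):
    p = prob_on_correct_path(WINDOW, UOPS_PER_BRANCH, accuracy)
    print(f"accuracy {accuracy:.2f}: tail on correct path with probability {p:.3f}")
```

Under these assumptions, 32 branches are in flight at once, so even a predictor that looks very good per branch leaves a sizeable chance that the tail of the window is wasted work.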

            The memory wall is also about the latency of memory versus the CPU core: roughly on the order of 100 ns these days, whereas a processor can easily do 4 GHz, which means about 400 CPU cycles to get to RAM. And you can have multiple memory accesses in the window - the lower-level caches (and load/store queues) can handle those well.
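The cycle count is just latency times clock frequency. A tiny Python sketch, assuming the same ~100 ns DRAM latency for both clocks (the 1.5 GHz line is only there for comparison with the RPi4's clock; its real memory latency is not given in this thread):

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Cycles spent waiting: nanoseconds times cycles-per-nanosecond (GHz)."""
    return latency_ns * clock_ghz


DRAM_LATENCY_NS = 100.0  # rough figure from the post

for clock_ghz in (1.5, 4.0):
    cycles = latency_in_cycles(DRAM_LATENCY_NS, clock_ghz)
    print(f"{clock_ghz} GHz core: about {cycles:.0f} cycles per trip to RAM")
```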

            Finally, for renaming - we rename registers because there are few of them and they get reused (by necessity). As such, there are many WAW and WAR false dependencies (unlike the real RAW dependencies). I really don't think there are enough WAWs and WARs in memory for memory renaming to be worth it.
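For anyone unfamiliar with the terms, here is a toy Python sketch of register renaming. It is a simplification for illustration, not how any particular core does it: physical registers are never freed, and the names `rename`, `program`, `r1`, `p0` and so on are made up for the example.

```python
from itertools import count


def rename(instrs):
    """Rename (dest, [srcs]) tuples from architectural to physical registers.

    Every write gets a fresh physical register, which removes WAW and WAR
    hazards; sources read the most recent mapping, which preserves RAW.
    """
    fresh = count()
    mapping = {}  # architectural register -> current physical register
    renamed = []
    for dest, srcs in instrs:
        # Registers never written in this snippet keep their original name.
        new_srcs = [mapping.get(s, s) for s in srcs]
        phys = f"p{next(fresh)}"
        mapping[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed


program = [
    ("r1", ["r2", "r3"]),  # r1 = r2 + r3
    ("r4", ["r1"]),        # r4 = r1   (RAW on r1: real dependency)
    ("r1", ["r5"]),        # r1 = r5   (WAW with the first write, WAR with the read above)
    ("r6", ["r1"]),        # r6 = r1   (RAW on the second r1)
]

for before, after in zip(program, rename(program)):
    print(before, "->", after)
```

After renaming, the third instruction writes to its own physical register, so it no longer has to wait for the first two; only the true RAW chains remain. The argument above is that memory locations are not reused under that kind of pressure, so the equivalent trick would have far fewer false dependencies to remove.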

            There is actually one situation where you might want to rename memory - the stack. Well, that can be done purely at register level though (and yes, I honestly hope it gets done, for reasons which should be obvious if you skim through the following):

            https://iscaconf.org/isca2005/papers/02B-03.PDF


            Comment


            • #26
              Originally posted by herman View Post
              Maybe even Intel will grow desperate and finally get back into the ARM business, haha.
              Intel is known to have an ARM (32-bit) architectural license. The rumor is that they do not have a 64-bit license, but we don't actually know that (all we really know for sure is that the last announced 64-bit architectural license was not Intel, which says nothing about some previous 64-bit license, or some un-announced license).

              Comment


              • #27
                Originally posted by Baguy View Post
                How about pitting the pi up against a 6W Pentium N5000 (basically a slightly better atom) or Cherry trail chip
                I would like to see RPi tested against any of the quad core processors from the Intel "Gemini Lake" series.

                You mentioned N5000, which could be hard to find, but I am thinking a J5005, J4105, and just for fun, a J4005.

                Yes, they are "refreshed" parts with slightly better clock speeds compared to N5000, N4100, and N4000. Still, I think the newer J-series parts are more likely available as SoCs on motherboards in the marketplace.

                ASRock builds a few models of boards with the J-series Gemini Lake SoCs. I own a few and use one daily for Kodi at 1920x1080, but with an add-on video card, because the Intel graphics have always disappointed me when playing back full-motion 1920x1080x30 and 1920x1080x60 video - YMMV. I would not play games using an SoC like these, but I think they would be quite adequate for typical office tasks.

                Comment


                • #28
                  I assume the SD card did affect the performance. With the latest firmware updates, the Raspberry Pi can boot from USB 3.0, so you can store the OS on an SSD, which is better on all metrics (and also not prone to random corruption the way SD cards used as OS drives are).

                  Comment


                  • #29
                    Originally posted by hotaru View Post
                    why run 32-bit on the Pi 4 instead of 64-bit? 64-bit is faster for a lot of workloads due to having more registers.
                    I was going to ask the same thing. A 64-bit ARM OS is faster on the Pi 4 for certain operations with more than 1 GB of RAM.

                    Comment


                    • #30
                      Ok. Intel needs to get benchmarked against Raspi to keep on winning.... : )

                      Comment
