How A Raspberry Pi 4 Performs Against Intel's Latest Celeron, Pentium CPUs
-
- Likes 2
-
Originally posted by schmidtbag:
Your comment is a bit of a joke:
ARM isn't built to be performant. It's built to be efficient. Compare performance-per-watt (for both idle and load) and suddenly those Intel chips don't look so great. The G6400 is marketed as 58W, and we all know Intel underestimates their TDPs. Worst-case scenario, the entire RPi4 uses 15W. There's a reason Intel gave up on the mobile market - they weren't able to compete with ARM's power draw without being slower.
Also, clock speed is a big part of it. That's 1.5GHz against 4GHz (when looking at the G6400). That's more than twice the clock speed at 4x the power consumption.
What you're doing is like mocking an economy car because it isn't fast like a sports car or powerful like a truck, failing to understand that it wasn't built to do either.
Also, the biggest part of power consumption is the dynamic part (resulting from transistor switching). In simplified form, that power is proportional to C * V^2 * f, where C is the dynamic power dissipation capacitance, V is the supply voltage, and f is the clock frequency. See page 4 of https://www.ti.com/lit/an/scaa035b/scaa035b.pdf
Increase the frequency and voltage of the RPi4 and things might not look so good. Though that's hypothetical - there are hard limits on how far you can overclock the RPi4.
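To make the C * V^2 * f point concrete, here is a minimal back-of-the-envelope sketch (all numbers are hypothetical, chosen only for illustration) of why raising frequency, which usually requires raising voltage too, increases power superlinearly:

```python
# Back-of-the-envelope dynamic power scaling, P ~ C * V^2 * f.
# Illustrative numbers only; C is assumed constant, and real chips
# also have static/leakage power that this simple model ignores.

def dynamic_power(c_farads, v_volts, f_hz):
    """Dynamic (switching) power in watts: P = C * V^2 * f."""
    return c_farads * v_volts**2 * f_hz

# Hypothetical baseline: 1.5 GHz at 0.9 V
base = dynamic_power(1e-9, 0.9, 1.5e9)

# Overclock to 2.0 GHz, which typically also needs a voltage bump, say 1.1 V
oc = dynamic_power(1e-9, 1.1, 2.0e9)

# Frequency rose ~1.33x, but power roughly doubles because V enters squared
print(f"power ratio: {oc / base:.2f}x")
```

Because voltage appears squared, a ~33% frequency bump with a modest voltage increase already costs about 2x the dynamic power.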
A good processor design can scale down quite easily. This is where your analogy breaks - a sports car won't be economical, and an economy car won't get you that acceleration and top speed. But with good processors you can have both. You can have a superfast processor that runs really fast at peak demand, but slows down when idle and doesn't kill your battery. Best example, IMO? Apple's ARM procs.
Finally, for performance per watt you need a third party to measure both the performance and the watts, and then publish the numbers. It's not something one can do a back-of-the-envelope calculation for with the TDP (a max value). Feel free to quote such numbers though, I'm actually quite curious.
Comment
-
Originally posted by atomsymbol
I believe more in the potential usefulness of JIT compilation (in a CPU) than in value prediction.
Predicting multiple branches per cycle isn't fundamentally different from predicting a single branch per cycle. A branch predictor can output 2+ bits per cycle instead of just 1 bit per cycle. Current CPUs (Zen, Skylake-derived) can predict two branches per cycle, although there are some weird irregularities in the implementation, so it isn't actually a generic/universal two-branches-per-cycle predictor (at least on Zen; I didn't test Skylake).
Another reason why we don't have larger instruction windows, in addition to the reason you mentioned, is that a larger instruction window increases the probability of there being multiple memory access instructions in the window ("the memory wall"). Intel Sunny Cove has 4 AGUs and might be able (?) to sustain 2 memory reads and 2 memory writes per cycle. I am probably going to wait until Intel releases SunnyCove-derived desktop CPUs (maybe 2020-sep-02) and AMD releases Zen 3 before buying a new CPU.
I think it is probable that if memory renaming (MR) gets implemented in x86 CPUs somewhere in the future then a CPU will have to have just a couple of MR-capable very-large cores plus many smaller cores that are not MR-capable - and/or CPU cores will have to be stacked on top of each other.
Multiple writes to the same register R per cycle rename the register R - multiple writes to the same memory location L per cycle rename the memory location L (if L is in L1D cache).
I don't know of an article about memory renaming.
Memory renaming does not make sense today, for example because a single instruction fetch window is too small to contain multiple iterations of the same loop.
In order for JITs to take over the world, I think we'd need a paradigm shift of sorts (an Einstein-like person to figure out how to make JIT compilation much much better than it is today).
Predicting multiple branches per cycle is not the issue here. The issue is that when you have an instruction window of ~256 micro-ops, what's the likelihood that the tail is on the right path? Keep in mind that the average number of instructions per basic block in the SPEC CPU benchmarks is in the single digits (so there are plenty of branches).
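This argument is easy to quantify with a quick sketch (the 98% per-branch accuracy is an assumed, illustrative figure, and predictions are treated as independent, which is a simplification): with ~5 instructions per basic block, a 256-micro-op window spans around 50 branches, so even a very good predictor rarely gets the whole window right.

```python
# Probability that the *entire* instruction window is on the correct path,
# assuming independent branch predictions and an illustrative 98%
# per-branch prediction accuracy.

window_uops = 256
avg_basic_block = 5      # single-digit instructions per basic block (SPEC)
accuracy = 0.98          # assumed per-branch prediction accuracy

branches_in_window = window_uops // avg_basic_block   # ~51 branches
p_tail_correct = accuracy ** branches_in_window

print(f"branches in window: {branches_in_window}")
print(f"P(tail on right path): {p_tail_correct:.2f}")   # ~0.36
```

In other words, roughly two thirds of the time the tail of such a window is down a wrong path, which caps the usefulness of simply growing the window.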
The memory wall is also about the latency of memory relative to the CPU core (on the order of 100 ns these days, whereas a processor can easily run at 4 GHz, which means ~400 CPU cycles to get to RAM). And you can have multiple memory accesses in the window - the lower-level caches (and load/store queues) can handle those well.
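As a sanity check on the arithmetic above (a trivial sketch using the round figures from the text):

```python
# RAM round-trip latency expressed in CPU cycles,
# using the illustrative figures from the discussion.
dram_latency_s = 100e-9   # ~100 ns to DRAM
clock_hz = 4e9            # 4 GHz core

cycles_to_ram = dram_latency_s * clock_hz
print(f"cycles to RAM: {cycles_to_ram:.0f}")   # 400
```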
Finally, for renaming - we rename registers because there are few of them and they get reused (by necessity). As such, there are many WAW and WAR false dependencies (unlike the RAW true dependencies). I really don't think there are enough WAWs and WARs in memory for memory renaming to be worth it.
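For readers unfamiliar with the dependence taxonomy, here is a small sketch (the instruction sequence is hypothetical) that classifies the hazards the paragraph refers to:

```python
# Classify data hazards between two instructions, each given as
# (destination_register, [source_registers]).
# RAW = true dependence; WAR and WAW = false dependences that
# register renaming removes.

def hazards(first, second):
    d1, s1 = first
    d2, s2 = second
    found = []
    if d1 in s2:
        found.append("RAW")   # second reads what first wrote
    if d2 in s1:
        found.append("WAR")   # second overwrites what first read
    if d1 == d2:
        found.append("WAW")   # both write the same register
    return found

# r1 = r2 + r3; then r1 = r4 * r1
# -> the second reads r1 (RAW) and also rewrites r1 (WAW)
i1 = ("r1", ["r2", "r3"])
i2 = ("r1", ["r4", "r1"])
print(hazards(i1, i2))   # ['RAW', 'WAW']
```

With few architectural registers, WAW/WAR pairs like this are common, which is why renaming pays off for registers; memory locations are far more numerous, so such reuse is rarer there.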
There is actually one situation where you might want to rename memory - the stack. Well, that can be done purely at register level though (and yes, I honestly hope it gets done, for reasons which should be obvious if you skim through the following):
- Likes 1
Comment
-
Originally posted by herman:
Maybe even Intel will grow desperate and finally get back into the ARM business, haha.
- Likes 1
Comment
-
Originally posted by Baguy:
How about pitting the Pi against a 6W Pentium Silver N5000 (basically a slightly better Atom) or a Cherry Trail chip?
You mentioned N5000, which could be hard to find, but I am thinking a J5005, J4105, and just for fun, a J4005.
Yes, they are "refreshed" parts with slightly better clock speeds compared to N5000, N4100, and N4000. Still, I think the newer J-series parts are more likely available as SoCs on motherboards in the marketplace.
ASRock builds a few models of boards with the J-series Gemini Lake SoCs. I own a few and use one daily for Kodi at 1920x1080, but with an add-on video card, as the Intel integrated graphics have always disappointed me when playing back full-motion 1080p30 and 1080p60 video - YMMV. I would not play games on a SoC like these, but I think they would be quite adequate for typical office tasks.
- Likes 1
Comment
-
Originally posted by schmidtbag:
ARM isn't built to be performant. It's built to be efficient. Compare performance-per-watt (for both idle and load) and suddenly those Intel chips don't look so great. The G6400 is marketed as 58W, and we all know Intel underestimates their TDPs. Worst-case scenario, the entire RPi4 uses 15W. There's a reason Intel gave up on the mobile market - they weren't able to compete with ARM's power draw without being slower.
Also, clock speed is a big part of it. That's 1.5GHz against 4GHz (when looking at the G6400). That's more than twice the clock speed at 4x the power consumption.
What you're doing is like mocking an economy car because it isn't fast like a sports car or powerful like a truck, failing to understand that it wasn't built to do either.
- Likes 2
Comment
-
Originally posted by vladpetric:
Yet there is more than a 2.66x performance difference between the G6400 and the RPi4.
Originally posted by vladpetric:
Increase the frequency and voltage of the RPi4 and things might not look so good. Though that's hypothetical - there are hard limits on how far you can overclock the RPi4.
Nobody in their right mind would buy an ARM chip in the hopes of having competitive processing power with a laptop or desktop x86 CPU. You buy ARM because it sips power with reasonable performance. I don't get why you're so focused on performance when that's not the point of going for ARM.
Originally posted by vladpetric:
A good processor design can scale down quite easily. This is where your analogy breaks - a sports car won't be economical, and an economy car won't get you that acceleration and top speed. But with good processors you can have both. You can have a superfast processor that runs really fast at peak demand, but slows down when idle and doesn't kill your battery. Best example, IMO? Apple's ARM procs.
The reason Apple's architecture works well is because of their own added instructions and a LOT of OS-level optimizations. Since they control the platform, they don't have to do generic builds of anything; they can fine-tune things with mediocre hardware. I'm sure Apple will be adding on more cores to their desktop ARM CPUs instead of more MHz.
Originally posted by vladpetric:
Finally, for performance per watt you need a third party to measure both the performance and the watts, and then publish the numbers. It's not something one can do a back-of-the-envelope calculation for with the TDP (a max value). Feel free to quote such numbers though, I'm actually quite curious.
- Likes 2
Comment