12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion

  • 12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion

    Phoronix: 12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion

    Last week I shared my plans to build a low-cost, 12-core, 30-watt ARMv7 cluster running Ubuntu Linux. The ARM cluster that is built around the PandaBoard ES development boards is now online and producing results... Quite surprising results actually for a low-power Cortex-A9 compute cluster. Results include performance-per-Watt comparisons to Intel Atom and Ivy Bridge processors along with AMD's Fusion APU.

    http://www.phoronix.com/vr.php?view=17473

  • #2
    You should be able to get lower power usage by using a single power converter rather than one for each board.


    • #3
      "other planned optimizations include: investigating performance differences if using a high-speed NAS with NFS mount for the cluster rather than SDHC cards (e.g. using something like the Excito B3) or a USB-based SSD "

      Why not just grab a FreeNAS ISO and set up NFS on any old PC you have around that boots from a USB stick? Or even try iSCSI, which for me gives the fastest block transfers over Ethernet. Plus, of course, FreeNAS has MANY more file options you might also want to test this ARM cluster against...
      http://www.freenas.org/features/item...ategory_id=108
      Last edited by popper; 06-14-2012, 06:38 PM.


      • #4
        Originally posted by AJenbo
        You should be able to get lower power usage by using a single power converter rather than one for each board.
        Exactly what I was thinking. Six transformers are going to perform a lot worse than one (assuming the boards can cope with the voltage droop). I'd actually be surprised if there wasn't a good multi-rail switching PSU designed for this very purpose.


        • #5
          I think the power usage of both the Atom 330 and the E-350 is quite awful too, though (the idle power, that is, but most of the power drawn is apparently idle power). It should be possible to do quite a bit better with the right boards.
          That said, if there are major performance gains with a newer distro on the PandaBoard, that should boost efficiency too. In any case, IVB will still blow it away :-).


          • #6
            6 PandaBoards mean:
            - 6 switch-mode power bricks (typically 85% efficient)
            - 6 onboard switch-mode power ICs (90%)
            - 6 linear voltage regulators (60%)
            - 6 WiFi and Bluetooth modules (did you turn these off?)
            - 6 Ethernet controllers (at least 200mW per port)
            - 6 HDMI controllers
            - 6 DVI controllers
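
To put rough numbers on the list above, here is a minimal Python sketch. It assumes, purely for illustration, that the three converter stages sit strictly in series for the whole load; on a real board only some rails pass through a linear regulator, so treat the result as a pessimistic bound. The function names are hypothetical, and the efficiencies and the 200mW figure are the ones quoted in the post.

```python
# Efficiencies quoted in the post. Treating them as one series chain
# is an assumption made only for this illustration.
STAGE_EFFICIENCY = {
    "switch-mode power brick": 0.85,
    "onboard switch-mode IC": 0.90,
    "linear voltage regulator": 0.60,
}

BOARDS = 6
ETHERNET_W_PER_PORT = 0.2  # "at least 200mW per port"


def chain_efficiency(stages):
    """Overall efficiency of converter stages connected in series."""
    total = 1.0
    for efficiency in stages:
        total *= efficiency
    return total


def wall_power(load_watts, stages):
    """Power drawn at the wall to deliver load_watts through the chain."""
    return load_watts / chain_efficiency(stages)


overall = chain_efficiency(STAGE_EFFICIENCY.values())
min_ethernet_w = BOARDS * ETHERNET_W_PER_PORT

print(f"series-chain efficiency: {overall:.1%}")
print(f"Ethernet controllers alone: at least {min_ethernet_w:.1f} W")
```

Under that series assumption, less than half of the wall power would reach the load, which is why consolidating the power supplies looks attractive.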


            • #7
              Ivy Bridge is this year's tech; Cortex-A9 is two-year-old tech. This year's technology for ARM is Cortex-A15.


              • #8
                Now we need this with 64-bit ARM chips and 1,000 or, better, 10,000 Mbps RJ45 copper-based network adapters, and then this shit will rock da house.


                • #9
                  Wrong...

                  Originally posted by Popolon
                  Ivy Bridge is this year's tech; Cortex-A9 is two-year-old tech. This year's technology for ARM is Cortex-A15.
                  The PandaBoards that are used in this demo are quite recent and represent the state of the art of what is available. A15 chips are great except for the fact that you can't actually buy them and they won't be out in volume until 2013... just in time to compete with Haswell. Oh, and A15 chips don't even support 64-bit execution modes. Those ARM chips won't be out until 2014 at the earliest.

                  These benchmarks should be a massive wakeup call to anyone who thinks that ARM will just destroy Intel by slapping their chips into a notebook or a server. These benchmarks were a best-case scenario for ARM too... try running some FP intensive code or code that isn't perfectly parallel and the results would be even more skewed toward Ivy Bridge.


                  • #10
                    Originally posted by Popolon
                    Ivy Bridge is this year's tech; Cortex-A9 is two-year-old tech. This year's technology for ARM is Cortex-A15.

                    While that's true, of course, it seems you will have to wait until around the third quarter for these Exynos 5 developer boards to appear in bulk. Hell, it's not even easy to find evaluation boards yet for the Samsung Exynos 4 quad-core 4212 (previously called 4412, and then finally renamed the Exynos Quad), a 1.4GHz A9 with ARM Midgard Mali T-604 graphics.

                    Even dual-core Exynos 4 boards are hard to find right now at any price, never mind reasonably priced, full-spec dev boards for the masses. You might finally get a reasonably priced Freescale i.MX 6 quad-core sometime this year from Genesi and their EFIKA MX range for ordinary devs, but again that's A9, and with no mainstream graphics, of course, it would seem.
                    Last edited by popper; 06-15-2012, 10:51 AM.


                    • #11
                      Here are my 2 cents:

                      First, Michael, could you please also create normalized versions of the graphs on page 4 and 5 of how the panda board cluster scales? This would be most helpful, since the cluster's parallelization would be more apparent in this format.
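
One way to produce the normalized view requested above is to divide each result by the smallest configuration's result and compare it against ideal linear scaling. A minimal Python sketch, assuming higher-is-better throughput scores; the function name is hypothetical and the numbers in `placeholder` are made-up values, not results from the article.

```python
def normalize_scaling(throughput_by_cores):
    """Return (cores, speedup, parallel_efficiency) rows.

    Speedup is relative to the smallest core count present, and
    efficiency is speedup divided by the ideal linear speedup.
    """
    base_cores = min(throughput_by_cores)
    base = throughput_by_cores[base_cores]
    rows = []
    for cores in sorted(throughput_by_cores):
        speedup = throughput_by_cores[cores] / base
        ideal = cores / base_cores
        rows.append((cores, speedup, speedup / ideal))
    return rows


# Placeholder values only, chosen to show the output format.
placeholder = {2: 100.0, 4: 190.0, 8: 340.0, 12: 450.0}

for cores, speedup, efficiency in normalize_scaling(placeholder):
    print(f"{cores:2d} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Plotting the efficiency column against core count makes it obvious where the cluster stops scaling well.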

                      Second, looking at the Fusion benchmarks (this is just the CPU I happened to choose for a quick analysis), while the PandaBoard cluster is indeed ~3x faster, I think there is more to the story. What if we had a similar cluster composed of 4 AMD Fusion systems?

                      Looking at the costs on eBay, we could build a bare-bones cluster from systems like this:
                      - AMD Fusion E-350 + Asus E35M1 PRO motherboard: $120 (there's a $20 rebate, but I'm leaving this out, so it could potentially be $100)
                      - 4 GB RAM: $50
                      - 64 GB OCZ SSD: $60

                      TOTAL: $230 per system. Four of these would cost $920 and put the cluster's throughput above the PandaBoard cluster. However, let's suppose that the parallelization is quite poor and scales to the exact same throughput as the PandaBoard cluster (this makes the following calculations easier).

                      The difference in cost between the two systems is $280, and for the NAS Parallel EP.C benchmark the AMD systems would be at 180W while the PandaBoard cluster was at 30W, a difference of 150W. How long would you have to run these systems continuously before it makes sense to buy the PandaBoard cluster (supposing 20c/kWh)?

                      280 * 100 * 1000 / (20 * 150) = 9333 hrs ~ 389 days.
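
The same break-even arithmetic, written out as a small Python sketch so the units are explicit. The function name is hypothetical; the inputs are the $280 cost difference, the 150W power difference, and the 20c/kWh price from the post.

```python
def break_even_hours(extra_cost_usd, power_saving_w, price_cents_per_kwh):
    """Hours of continuous running before the energy savings repay the extra cost."""
    saving_kw = power_saving_w / 1000.0                        # W -> kW
    saving_usd_per_hour = saving_kw * price_cents_per_kwh / 100.0  # cents -> dollars
    return extra_cost_usd / saving_usd_per_hour


hours = break_even_hours(extra_cost_usd=280, power_saving_w=150, price_cents_per_kwh=20)
days = hours / 24

print(f"{hours:.0f} hours, about {days:.0f} days")
```

Changing the electricity price or the power gap shows how sensitive the payback time is: halving the price doubles the break-even period.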

                      I'm not trying to say that the PandaBoard is better or worse than the other systems. I'm really only trying to show that in some cases the cost of becoming efficient outweighs the gains from the efficiency, and this, for me, is also a very important quantity.

                      I don't know how the PowerVR graphics compare to the Fusion graphics, but if you were doing OpenCL/GPU computations, this would add another factor in deciding which system to go for.


                      • #12
                        Originally posted by FourDMusic
                        I don't know how the PowerVR graphics compare to the Fusion graphics, but if you were doing OpenCL/GPU computations, this would add another factor in deciding which system to go for.
                        If you were looking to compare OpenCL/GPU computation, then you would need to source and use at least the current 4212 (previously called 4412, and then finally renamed the Exynos Quad) 1.4GHz A9 evaluation boards with the ARM Midgard Mali T-604 (with 4 cores, not 8 yet), as it is only the current ARM Midgard architecture that covers the full OpenCL and other GPU compute specs.

                        Come the third quarter of 2012, I believe the Exynos Quad (and quads from other vendors) will also come in 1.6+GHz versions (perhaps even with 8+ Midgard graphics cores) alongside the current 1.4GHz, so of course there will also be clock parity with the lower-power x86 offerings by then.
                        Last edited by popper; 06-15-2012, 12:20 PM.


                        • #13
                          Originally posted by AJenbo
                          You should be able to get lower power usage by using a single power converter rather than one for each board.
                          Absolutely, one power supply powering the whole shebang vs. a separate power converter for each will certainly increase efficiency.


                          • #14
                            > while that's true OC you will have to wait until around 3rd quarter for these Exynos 5 developer boards to appear in bulk it seems

                            Q3 is only 2 weeks away, and those chips (at least the Samsung version) have already been available for motherboard development for 6 months. There are also rumors of a Galaxy Note 2 in October with a Cortex-A15. But the most important point is that a Cortex-A15 @ 2GHz is twice as fast as a Cortex-A9 @ 1.2GHz and consumes less power. Some have shown that in the latest tests, a few months ago, Samsung's Cortex-A9 was already far faster than the PandaBoard and every available Atom chip. The Phoronix Test Suite was not optimized at all for ARM, and Samsung is far more active in Linux development for its products.


                            • #15
                              How many ARM processors would it take to equal the throughput of a 3770K at 4.2 GHz turbo mode with 1666 MHz DDR3?

                              Answer: More than there are for sale in your local Verizon store.

                              Now try comparing ARM processors to the Xeon E5-2687W... ho boy, look out. It may be Sandy Bridge, but it's a BEAST.
