12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion

  • 12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion

    Phoronix: 12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion

    Last week I shared my plans to build a low-cost, 12-core, 30-watt ARMv7 cluster running Ubuntu Linux. The ARM cluster that is built around the PandaBoard ES development boards is now online and producing results... Quite surprising results actually for a low-power Cortex-A9 compute cluster. Results include performance-per-Watt comparisons to Intel Atom and Ivy Bridge processors along with AMD's Fusion APU.

  • #2
    You should be able to get lower power usage by using a single power converter rather than one for each board.

    • #3
      "other planned optimizations include: investigating performance differences if using a high-speed NAS with NFS mount for the cluster rather than SDHC cards (e.g. using something like the Excito B3) or a USB-based SSD "

      Why not just grab a FreeNAS ISO and set up NFS on any old PC you have around that boots from a USB stick? Or even try iSCSI, which gives me the fastest Ethernet block transfer. Plus, of course, FreeNAS has MANY more file options you might also want to test this ARM cluster over...
      Last edited by popper; 14 June 2012, 06:38 PM.

      • #4
        Originally posted by AJenbo
        You should be able to get lower power usage by using a single power converter rather than one for each board.
        Exactly what I was thinking. Six transformers are going to perform a lot worse than one (assuming the boards can cope with the voltage droop). I'd actually be surprised if there wasn't a good multi-rail switching PSU designed for this very purpose.

        • #5
          I think, though, that the power usage of both the Atom 330 and the E-350 is quite awful too (well, the idle power, but since most of the power is apparently idle power...). It should be possible to do quite a bit better with the right boards.
          That said, if there are major performance gains with a newer distro on the PandaBoard, that should boost efficiency too. In any case, IVB will still blow it away :-).

          • #6
            6 PandaBoards mean:
            - 6 switchmode power bricks (typically 85% efficient)
            - 6 onboard switchmode power ICs (90%)
            - 6 linear voltage regulators (60%)
            - 6 Wi-Fi and Bluetooth modules (did you turn these off?)
            - 6 Ethernet controllers (at least 200mW per port)
            - 6 HDMI controllers
            - 6 DVI controllers
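
A rough way to see how the conversion losses in the list above compound, as a minimal sketch. It assumes, purely for illustration, that the three regulator stages sit in series and carry the whole load, which is a worst case (on a real board only some rails go through the linear regulators). The function and variable names are made up for this example.

```python
# Stage efficiencies as quoted in the post above.
# ASSUMPTION: all three stages are in series and carry the full load.
STAGE_EFFICIENCY = {
    "switchmode power brick": 0.85,
    "onboard switchmode power IC": 0.90,
    "linear voltage regulator": 0.60,
}


def chain_efficiency(stage_efficiencies):
    """Multiply per-stage efficiencies to get the end-to-end efficiency."""
    total = 1.0
    for eff in stage_efficiencies:
        total *= eff
    return total


def ethernet_floor_watts(boards, watts_per_port=0.2):
    """Lower bound on Ethernet controller draw, using the 200 mW per port figure."""
    return boards * watts_per_port


if __name__ == "__main__":
    eff = chain_efficiency(STAGE_EFFICIENCY.values())
    print(f"worst-case end-to-end efficiency: {eff:.1%}")
    print(f"Ethernet floor for 6 boards: {ethernet_floor_watts(6):.1f} W")
```

Under that series assumption, well under half of the wall power would reach the silicon, which is why a single efficient supply for all six boards looks attractive.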

            • #7
              Ivy Bridge is this year's tech; Cortex-A9 is two-year-old tech. This year's technology for ARM is Cortex-A15.

              • #8
                Wrong...

                Originally posted by Popolon
                Ivy Bridge is this year's tech; Cortex-A9 is two-year-old tech. This year's technology for ARM is Cortex-A15.
                The PandaBoards used in this demo are quite recent and represent the state of the art of what is available. A15 chips are great except for the fact that you can't actually buy them, and they won't be out in volume until 2013... just in time to compete with Haswell. Oh, and A15 chips don't even support 64-bit execution modes. The 64-bit ARM chips won't be out until 2014 at the earliest.

                These benchmarks should be a massive wake-up call to anyone who thinks that ARM will just destroy Intel by slapping their chips into a notebook or a server. These benchmarks were a best-case scenario for ARM too... try running some FP-intensive code or code that isn't perfectly parallel and the results would be even more skewed toward Ivy Bridge.

                • #9
                  Originally posted by Popolon
                  Ivy Bridge is this year's tech; Cortex-A9 is two-year-old tech. This year's technology for ARM is Cortex-A15.

                  While that's true, of course, it seems you will have to wait until around the 3rd quarter for these Exynos 5 developer boards to appear in bulk. Hell, it's not even easy to find evaluation boards yet for the Samsung Exynos 4 quad-core 4212 (previously called 4412, and then finally renamed the Exynos Quad), a 1.4GHz A9 with ARM Midgard Mali T-604.

                  Even dual-core Exynos 4 boards are hard to find right now no matter the price, never mind reasonably priced, full-spec dev boards for the masses. You might finally get a reasonably priced Freescale i.MX 6 quad-core sometime this year from Genesi and their EFIKA MX range for ordinary devs, but again that's A9 and, of course, no mainstream graphics, it would seem.
                  Last edited by popper; 15 June 2012, 10:51 AM.

                  • #10
                    Here are my 2 cents:

                    First, Michael, could you please also create normalized versions of the graphs on pages 4 and 5 showing how the PandaBoard cluster scales? This would be most helpful, since the cluster's parallelization would be more apparent in this format.

                    Second, looking at the Fusion benchmarks (this is just the CPU I happened to choose for a quick analysis), while the PandaBoard cluster is indeed ~3x faster, I think there is more to the story. What if we had a similar cluster composed of 4 AMD Fusion systems?

                    Looking at the costs on eBay, we could build a bare-bones cluster:
                    - AMD Fusion E-350 + Asus E35M1 PRO motherboard: $120 (there's a $20 rebate, but I'm leaving this out, so it could potentially be $100)
                    - 4 GB RAM: $50
                    - 64 GB OCZ SSD: $60

                    TOTAL: $230. Four of these would cost $920 and put the cluster's throughput above the PandaBoard cluster's. However, let's suppose that the parallelization is quite poor and scales to the exact same throughput as the PandaBoard cluster (this makes the following calculations easier).

                    The difference in cost between the two systems is $280, and for the NAS Parallel Benchmarks EP.C test the AMD systems would be at 180W while the PandaBoard cluster was at 30W, a difference of 150W. How long would you have to run these systems continuously before it makes sense to buy the PandaBoard cluster (supposing 20c/kWh)?

                    280 * 100 * 1000 / (20 * 150) = 9333 hrs ~ 389 days.
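
The break-even arithmetic above can be sketched in a few lines, in case anyone wants to plug in their own electricity price or hardware costs. This is only a rough sketch: `break_even_hours` and its parameter names are made up for illustration, and it ignores everything except the purchase price difference and the power draw difference under load.

```python
def break_even_hours(extra_cost_usd, power_saving_watts, cents_per_kwh):
    """Hours of continuous running before the power savings repay the extra cost.

    extra_cost_usd:     how much more the efficient system costs, in dollars
    power_saving_watts: how much less power it draws, in watts
    cents_per_kwh:      electricity price, in cents per kWh
    """
    extra_cost_cents = extra_cost_usd * 100
    # Convert watts to kilowatts, then price one hour of the saved energy.
    saving_cents_per_hour = cents_per_kwh * power_saving_watts / 1000
    return extra_cost_cents / saving_cents_per_hour


if __name__ == "__main__":
    # Figures from the post: $280 price gap, 150 W power gap, 20c/kWh.
    hours = break_even_hours(280, 150, 20)
    print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
```

With a cheaper tariff the break-even point moves out proportionally, e.g. halving the electricity price doubles the number of hours.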

                    I'm not trying to say that the PandaBoard is better or worse than the other systems. I'm really only trying to show that in some cases the cost of becoming efficient outweighs the gains from the efficiency, and this, for me, is also a very important quantity.

                    I don't know how the PowerVR graphics compare to the Fusion graphics, but if you were doing OpenCL/GPU computations, this would add another factor in deciding which system to go for.
