
12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion


  • #16
    Originally posted by FourDMusic
    Looking at the costs on eBay, we could build a bare-bones cluster:
    - AMD Fusion E-350 + Asus E35M1 PRO motherboard: $120 (there's a $20 rebate, but I'm leaving this out, so it could potentially be $100)
    - 4 GB RAM: $50
    - 64 GB OCZ SSD: $60

    TOTAL: $230. Four of these would cost $920 and put the cluster's throughput above the PandaBoard cluster. However, let's suppose that the parallelization is quite poor and scales to exactly the same throughput as the PandaBoard cluster (this makes the following calculations easier).

    The difference in cost between the two systems is $280, and for the NAS Parallel Benchmarks EP.C test the AMD systems would be at 180 W while the PandaBoard cluster was at 30 W, a difference of 150 W. How long would you have to run these systems continuously before it makes sense to buy the PandaBoard cluster (supposing 20c/kWh)?

    280 * 100 * 1000 / (20 * 150) = 9333 hrs ~ 389 days.

    I'm not trying to say that the PandaBoard is better/worse than the other systems. I'm really only trying to show that in some cases the cost to become efficient outweighs the gains from the efficiency, and this, for me, is also a very important quantity.

    This is a cost comparison, and a relevant one, except that I don't think anybody would seriously use 2-core, fully equipped boards. The ARM chips themselves are cheap (I heard prices of $5-7 somewhere; not certain about it). Stripping a lot of the other stuff off reduces cost and power consumption per CPU. So I think the ARM server boards, when they arrive, will be a lot better in terms of power consumption and price than this cluster. Ivy Bridge may not look so pretty then, even without A15.
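For anyone who wants to redo the break-even estimate from the quoted post with their own prices, here is a minimal Python sketch. The function name and parameter names are made up for illustration; the inputs ($280 price difference, 150 W power difference, 20c/kWh) are the figures from the quote.

```python
def break_even_hours(extra_cost_usd, power_saving_w, price_cents_per_kwh):
    """Hours of continuous running before the power savings repay the extra cost.

    Hypothetical helper: mirrors the arithmetic in the quoted post.
    """
    extra_cost_cents = extra_cost_usd * 100          # dollars -> cents
    saving_kw = power_saving_w / 1000                # watts -> kilowatts
    cents_saved_per_hour = saving_kw * price_cents_per_kwh
    return extra_cost_cents / cents_saved_per_hour


# Figures taken from the quoted post.
hours = break_even_hours(extra_cost_usd=280, power_saving_w=150, price_cents_per_kwh=20)
print(f"{hours:.0f} h ~ {hours / 24:.0f} days")
```

With cheaper electricity or a smaller power gap the break-even point moves out further, which is the point the quoted post is making.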


    • #17
      Michael, you should try to include the High-Performance Linpack benchmark (the one used for the Top500) for the MPI-related runs. Unfortunately this is not trivial. The ATLAS library must be recompiled so that it is tuned for the hardware (otherwise you can see a very big performance loss, around 50%), or else a hardware-specific BLAS implementation must be used (for Intel hardware there is MKL, but for ARM I don't think there is one). The Ethernet connection and the amount of RAM might be bottlenecks: the more RAM is available, the more efficient HPL is. It is not the best benchmark for FLOPS, but it is, sadly, the standard.

      Anyway, it would be very nice to have HPL included in PTS if possible. The ATLAS compilation might just be added to the beginning of the test. It takes hours, but it is the only way to get decent performance.
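As a rough illustration of why the amount of RAM matters for HPL: a common rule of thumb (not something stated in this thread) is to pick the problem size N so that the N x N double-precision matrix fills roughly 80% of the cluster's total memory. The sketch below assumes that rule; the helper name, the 80% fraction, the block size of 128 and the 6 GiB total are all illustrative assumptions, not measured values from the article.

```python
import math


def hpl_problem_size(total_mem_bytes, mem_fraction=0.8, block_size=128):
    """Suggest an HPL problem size N (hypothetical helper, rule-of-thumb only).

    The N x N matrix of 8-byte doubles should fit in mem_fraction of total RAM,
    and N is rounded down to a multiple of the block size NB.
    """
    usable_bytes = total_mem_bytes * mem_fraction
    max_n = int(math.sqrt(usable_bytes / 8))        # 8 bytes per double
    return (max_n // block_size) * block_size       # round down to a multiple of NB


# Assumed example: 6 GiB of RAM summed over all nodes.
total_mem = 6 * 1024**3
n = hpl_problem_size(total_mem)
print(f"N = {n}, matrix uses {n * n * 8 / total_mem:.0%} of total memory")
```

More memory allows a larger N, and larger problems generally spend proportionally less time on communication, which is why a slow Ethernet link hurts small-memory clusters the most.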


      • #18
        Pandas as a NAS is nonsense

        Honestly speaking, you need SATA/SSD for a NAS and not just slow SDHC cards. Pity you don't have the option to try a cluster of, for example, Freescale i.MX3 Quick Start Boards or the new i.MX6. That may be better than Pandas, IMHO.


        • #19
          On ARM's side, a special multicore chip could be developed, say a hypothetical 12-core A9, which might outperform this cluster. The most specialized version I know of is the Calxeda EnergyCore: 4-core A9 chips with 4 MB of L2 cache, 4 chips per board (16 cores per board).

          On the cost and x86 side, a 3770K + Z77 board + 4 GB RAM + PSU costs around $470, so you can buy two 3770K systems. These can be undervolted to around 0.9 V and still be overclocked to 3.9 GHz at that voltage. I guess the ARM system would need a lot of time for its low power usage to compensate for the gap in compute capacity, as FourDMusic states.

          On the other hand, you could have used a Celeron G530 system on a cheap H61 board, where an underclocked G530 draws as little as 34 W under full load (with Linpack). That could have been more interesting.

          The only place where it is reasonable to use ARM cores is at high core counts (for which you need a special hard macro). Special 16-32 core ARM chips would have been more efficient and powerful.


          • #20
            how to build

            Does anyone know what cluster implementation Michael was running?
            Kerrighed? OpenSSI?
            I bought myself a mini cluster based on the Parallella boards on Kickstarter...
            Not sure what I'm going to run on them cluster-wise.


            • #21
              Originally posted by kgardas
              Honestly speaking, you need SATA/SSD for a NAS and not just slow SDHC cards. Pity you don't have the option to try a cluster of, for example, Freescale i.MX3 Quick Start Boards or the new i.MX6. That may be better than Pandas, IMHO.

              The Cubieboard is probably the best solution for a NAS: cheap ($50), open hardware, open software, and a SATA connector, though still with some closed parts in the firmware, as on every SoC. It uses the Allwinner A10, which is not the most powerful processor (1x Cortex-A8 @ 1 GHz) but is good enough to manage a filesystem, and it even has a single-core Mali-400 for 3D and CedarX for really powerful hardware video decoding (quad HD). You can also find this SoC in the Mele A2000, which has 1 GB of RAM too and is a dedicated video box.


              Otherwise, if you need power, the best solution is probably the first Cortex-A15 out, the Samsung Exynos 5250, which is on a more expensive board (~$250). This board also has a SATA port, and Samsung is involved in Linux kernel development for their hardware support, which is another good point.