12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion


  • 12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion

    Phoronix: 12-Core ARM Cluster Benchmarked Against Atom, Ivy Bridge, Fusion

    Last week I shared my plans to build a low-cost, 12-core, 30-watt ARMv7 cluster running Ubuntu Linux. The ARM cluster that is built around the PandaBoard ES development boards is now online and producing results... Quite surprising results actually for a low-power Cortex-A9 compute cluster. Results include performance-per-Watt comparisons to Intel Atom and Ivy Bridge processors along with AMD's Fusion APU.

    http://www.phoronix.com/vr.php?view=17473

  • Popolon
    replied
    Originally posted by kgardas View Post
    Hello,
    Honestly speaking, you need SATA/SSD for a NAS, not just slow SDHC cards. Pity you don't have the option to try a cluster of, for example, Freescale i.MX3 Quick Start Boards or the new i.MX6. That may be better than Pandas IMHO.
    Karel
    The Cubieboard is probably the best solution for a NAS: cheap ($50), open hardware, open software, and a SATA connector, though some parts of the firmware are still closed, as on every SoC. It uses the Allwinner A10, which is not the most powerful processor (1x Cortex-A8 @ 1 GHz) but is good enough to manage a filesystem, and it even has a single-core Mali-400 for 3D and CedarX for really powerful hardware video decoding (quad HD). You can also find this SoC in the Mele A2000, which has 1 GB of RAM too and is a dedicated video box.

    http://cubieboard.org

    Otherwise, if you need power, the best solution is probably the first Cortex-A15 out, the Samsung Exynos 5250, which comes on a more expensive board (~$250). This board also has a SATA port, and Samsung is involved in Linux kernel development for its hardware support, which is another good point.

    http://www.arndaleboard.org/wiki/index.php/Main_Page


  • piwi3910
    replied
    how to build

    Does anyone know what cluster implementation Michael was running?
    Kerrighed? OpenSSI?
    I bought myself a mini cluster based on the Parallella boards on Kickstarter...
    Not sure what I'm going to run on them cluster-wise.


  • kukreknecmi
    replied
    On ARM's side, a special multicore chip could be developed, say a hypothetical 12-core A9, which might outperform this cluster. The most specialized version I know of is the Calxeda EnergyCore: 4-core A9 chips with 4 MB of L2 cache, 4 chips per board (16 cores per board).

    On the cost and x86 side, a 3770K + Z77 board + 4 GB RAM + PSU costs around $470, so you could buy two 3770K systems. These can be undervolted to around 0.9 V and still be overclocked to 3.9 GHz at that voltage. I guess the ARM system will need a lot of time to compensate for the compute capacity with its low power usage, as FourDMusic states.

    On the other hand, you could have used a Celeron G530 system on a cheap H61 board, where an underclocked G530 consumes as little as 34 W under full load (with Linpack). That could have been more interesting.

    The only place where it is reasonable to use ARM cores is with a high number of cores (for which you need a special hard macro). Special 16-32 core ARM chips would have been more efficient and powerful.


  • kgardas
    replied
    Pandas as a NAS is nonsense

    Hello,
    Honestly speaking, you need SATA/SSD for a NAS, not just slow SDHC cards. Pity you don't have the option to try a cluster of, for example, Freescale i.MX3 Quick Start Boards or the new i.MX6. That may be better than Pandas IMHO.
    Karel


  • enrico.tagliavini
    replied
    Michael, you should try to include the High Performance Linpack benchmark (the one used for the Top500) for MPI-related runs. Unfortunately this is not trivial. The ATLAS library must be recompiled so it is tuned for the hardware (otherwise you can see a very big performance loss, like 50%), or else a hardware-specific BLAS implementation must be used (for Intel hardware there is MKL, but for ARM I don't think there is one). The Ethernet connection and the amount of RAM might be bottlenecks. The more RAM is available, the more efficient HPL is. It is not the best benchmark for FLOPS, but it is the sad standard.

    Anyway, it would be very nice to have HPL included in PTS if possible. The ATLAS compilation might just be added to the beginning of the test. It takes hours, but it is the only way to get decent performance.
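
To make the RAM point a bit more concrete, here is a rough sketch of how the HPL problem size N is commonly picked from the available memory. The 80% memory fraction and the block size of 128 are common rules of thumb, not values from this thread, and the figure of six boards with 1 GB each is an assumption about the PandaBoard ES cluster (12 cores on dual-core boards).

```python
import math

def hpl_problem_size(total_ram_bytes, mem_fraction=0.8, block_size=128):
    """Largest N (a multiple of block_size) such that an N x N matrix of
    8-byte doubles fits in mem_fraction of the total RAM.

    mem_fraction and block_size are rule-of-thumb assumptions, not tuned values.
    """
    n = int(math.sqrt(mem_fraction * total_ram_bytes / 8))
    return n - n % block_size  # round down to a whole number of blocks

# Assumed cluster: 6 boards x 1 GiB each (hypothetical figures for illustration).
cluster_ram = 6 * 1024**3
n = hpl_problem_size(cluster_ram)
print(f"Suggested HPL N for {cluster_ram / 1024**3:.0f} GiB total: {n}")
```

A larger N means more floating-point work per byte sent over Ethernet, which is why more RAM tends to make HPL runs more efficient on a cluster like this.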


  • Cyborg16
    replied
    Originally posted by FourDMusic View Post
    Looking at the costs on eBay, we could build a bare-bones cluster:
    - AMD Fusion E-350 + Asus E35M1 PRO motherboard: $120 (there's a $20 rebate, but I'm leaving this out, so it could potentially be $100)
    - 4 GB RAM: $50
    - 64 GB OCZ SSD: $60

    TOTAL: $230. Four of these would cost $920 and put the cluster's throughput above that of the PandaBoard cluster. However, let's suppose that the parallelization is quite poor and scales to exactly the same throughput as the PandaBoard cluster (this makes the following calculations easier).

    The difference in cost between the two systems is $280, and for the NAS Parallel EP.C benchmark the AMD systems would be at 180 W while the PandaBoard cluster was at 30 W, a difference of 150 W. How long would you have to run these systems continuously before it makes sense to buy the PandaBoard cluster (supposing 20c/kWh)?

    280 * 100 * 1000 / (20 * 150) = 9333 hrs ~ 389 days.

    I'm not trying to say that the PandaBoard is better or worse than the other systems. I'm really only trying to show that in some cases the cost of becoming efficient outweighs the gains from the efficiency, and this, for me, is also a very important quantity.
    This cost comparison is relevant, except that I don't think anybody would seriously use 2-core, fully equipped boards. The ARM chips themselves are cheap (I heard prices of $5-7 somewhere; not certain about it). Stripping a lot of the other stuff off reduces cost and power consumption per CPU. So I think the ARM server boards, when they arrive, will be a lot better in terms of power consumption and price than this cluster. Ivy Bridge may not look so pretty then, even without A15.
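
For anyone who wants to redo the break-even arithmetic from the quoted post with their own numbers, a minimal sketch follows. The function name and parameter names are made up for illustration; the inputs ($280 price difference, 150 W power difference, 20c/kWh) are the ones given in the quote.

```python
def break_even_hours(extra_cost_usd, power_saving_watts, price_cents_per_kwh):
    """Hours of continuous running before the lower-power system
    has paid back its higher purchase price through electricity savings."""
    # Savings per hour in cents: kW saved times price per kWh.
    saving_cents_per_hour = (power_saving_watts / 1000.0) * price_cents_per_kwh
    return extra_cost_usd * 100.0 / saving_cents_per_hour

# Figures from the quoted post: $280 extra cost, 150 W saved, 20c/kWh.
hours = break_even_hours(280, 150, 20)
print(f"{hours:.0f} h, about {hours / 24:.0f} days")  # 9333 h, about 389 days
```

With cheaper electricity the break-even point moves out proportionally, e.g. halving the price to 10c/kWh doubles the time.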


  • allquixotic
    replied
    How many ARM processors would it take to equal the throughput of a 3770K at 4.2 GHz turbo mode with 1666 MHz DDR3?

    Answer: More than there are for sale in your local Verizon store.

    Now try comparing ARM processors to the Xeon E5-2687W... ho boy, look out. It may be Sandy Bridge, but it's a BEAST.


  • Popolon
    replied
    > while that's true OC you will have to wait until around 3rd quarter for these Exynos 5 developer boards to appear in bulk it seems

    Q3 is only 2 weeks away, and those chips (at least the Samsung version) have already been available for board development for 6 months. There are also rumors of a Galaxy Note 2 in October with a Cortex-A15. But the most important point is that a Cortex-A15 @ 2 GHz is twice as fast as a Cortex-A9 @ 1.2 GHz and consumes less power. Some showed that in the last test on A9, a few months ago, the Samsung Cortex-A9 was already far faster than the PandaBoard and every Atom chip available. The Phoronix Test Suite was not optimized at all for ARM, and Samsung is far more active in Linux development for its products.


  • gururise
    replied
    Originally posted by AJenbo View Post
    You should be able to get a lower power usage by using a single power converter rather than one for each.
    Absolutely, one power supply powering the whole shebang vs. a separate power converter for each will certainly increase efficiency.
