New SMP build

  • New SMP build

    Hi all

    I'm in the process of replacing my Dual SMP AMD MPX system at the moment. I plan on staying with SMP, but with multi-core now, of course!

    My plan is to build a dual Quad Xeon system with either 4 or 8GB of RAM. I run 50/50 Win XP Pro and a custom Linux distro based loosely on LFS. As the days go by I spend more time with Linux than I do with Windows.

    I have the following on my shopping list so far:

    Tyan S5396A2NRF i5400, S771 x 2, PCI-E (x16), DDR2 ECC 533/667 MHz, SATA II, SATA RAID, E-ATX/SSI
    Intel Xeon E5420A Quad Core, S771, Harpertown Core, 2.5GHz, FSB 1333MHz, 12MB Cache
    Crucial CT2KIT25672AF667 4GB kit (2GBx2), 240-pin FB-DIMM DDR2 PC2-5300
    Gainward Bliss 8800 GT 1024MB GS NVIDIA 8800 GT, 1024MB TV, DVI-DVI GS PCI-E 2.0, Mem 1900MHz GDDR3, GPU 650MHz
    Enermax EG1000EWLDXX 1000W V2 Galaxy Modular PSU 85% Efficiency EPS12v Triple Quad +24 Rails Silent x2 Fan
    Silverstone SST-TJ10S Temjin Aluminium Tower Chassis in silver, RoHS


    I'd be pleased to read your comments, good and bad. I'm particularly interested in anything glaringly obvious that I've missed. For example, it was only through reading the Phoronix reviews that I realised I had to get an SSI-compatible case to support the weighty Xeon heatsinks (thank you Phoronix!).

  • #2
    Don't Barcelona-class AMD processors scale better, as far as SMP goes? They have more memory bandwidth, too. My Phenom's pbzip2 results are much better than those of Intel CPUs. The latter fare better in other areas, notably where L2 cache size matters, but since you're building a dual-quad rig…
    Last edited by apaige; 16 May 2008, 05:54 PM.



    • #3
      Originally posted by apaige
      Don't Barcelona-class AMD processors scale better, as far as SMP goes? They have more memory bandwidth, too. My Phenom's pbzip2 results are much better than those of Intel CPUs. The latter fare better in other areas, notably where L2 cache size matters, but since you're building a dual-quad rig…
      Those are some rather interesting results...




      • #4
        Same test (pts 0.5.0), Phenom 9600 (2.3GHz), 4 cores, gcc 4.3.0:
        Code:
        Parallel BZIP2 v1.0.2 - by: Jeff Gilchrist [http://compression.ca]
        [July 25, 2007]             (uses libbzip2 by Julian Seward)
        
                 # CPUs: 4
         BWT Block Size: 500k
        File Block Size: 900k
        -------------------------------------------
                 File #: 1 of 1
             Input Name: bigfile
            Output Name: bigfile.bz2
        
             Input Size: 691505952 bytes
        Compressing data...
            Output Size: 425971060 bytes
        -------------------------------------------
        
             Wall Clock: 38.688440 seconds
        38.7s versus 57s for kte's Phenom 9850 @2.7GHz and 69.8s for khurios' Phenom 9500 @2.2GHz. Maybe his hard drive was the bottleneck, maybe his RAM was set in ganged mode, maybe the TLB bug patch was enabled, I don't know. GCC 4.3.x also provides performance gains with recent processors such as the Phenom and the Core 2 Duo/Quad CPUs.
        But while 20.6s for your C2Q @3.2GHz is nothing to sneeze at, I've seen very different results for more comparable CPUs such as the Q6600. It's hard to find comparable data because the benchmark file changed so often - it'll be easier once pts 1.0 comes out with a definitive file.
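To put the wall-clock figures above on a common footing, here is a small Python sketch that turns the numbers from the pbzip2 output into throughput and compression ratio. It is not part of pts or pbzip2, and the function names are made up for illustration. The `relative_time` helper only means something if both runs used the same input file, which, as noted, was not always the case.

```python
# Figures copied from the pbzip2 output in this post.
INPUT_BYTES = 691_505_952
OUTPUT_BYTES = 425_971_060
WALL_CLOCK_S = 38.688440


def throughput_mb_s(input_bytes: int, seconds: float) -> float:
    """Input data processed per second, in MB/s (1 MB = 10**6 bytes)."""
    return input_bytes / seconds / 1e6


def compression_ratio(input_bytes: int, output_bytes: int) -> float:
    """Compressed size as a fraction of the original size."""
    return output_bytes / input_bytes


def relative_time(slower_s: float, faster_s: float) -> float:
    """How many times longer the slower run took.

    Assumption: both runs compressed the same file with the same settings.
    """
    return slower_s / faster_s


if __name__ == "__main__":
    print(f"throughput: {throughput_mb_s(INPUT_BYTES, WALL_CLOCK_S):.2f} MB/s")
    print(f"ratio:      {compression_ratio(INPUT_BYTES, OUTPUT_BYTES):.3f}")
    # 57 s is the Phenom 9850 time quoted in this post.
    print(f"57 s run vs. this run: {relative_time(57.0, WALL_CLOCK_S):.2f}x")
```

Comparing throughput rather than raw seconds at least makes it obvious when two results cannot have come from the same input file.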

        Anyway, I don't have any experience with SMP systems with more than 4 cores. I just read that while AMD has had very little success on the consumer PC front, it currently has an edge in the server and HPC markets. http://www.anandtech.com/weblog/showpost.aspx?i=443
        I guess what's important is to clearly identify the target usage, and determine which offering would perform better in the relevant scenarios. The OP hasn't stated what those would be.



        • #5
          AMD does have more bandwidth available, yes.



          • #6
            Once more, there are very odd results in the PTS database.
            PTS 0.7.0 (latest), multicore benchmarks: Phenom 9850 @2.50GHz vs. Core 2 Quad Q6600 @3.38GHz. The much higher-clocked Intel CPU is slower than the Phenom in ALL benchmarks. That's got to be wrong. And while the p7zip bench result is only mildly lower, the other ones are MUCH lower. There's gotta be a bottleneck somewhere (the hard drive perhaps?). EDIT: then again, the p7zip benchmark doesn't involve the hard drive at all.

            Even the Phenom results are somewhat surprising. My lower-clocked Phenom (200MHz per core slower) scores about 6000 (vs. 4653) with the benchmark settings (i.e. p7zip compiled without optimizations), while the OpenSSL result somewhat matches my expectations (162 on mine vs. 167). With optimizations (gcc 4.3.0, -O3 -march=amdfam10 and the special amd64 makefile present in the package), p7zip scores 6891. Which brings me to another issue I'd like to mention: the lack of optimization in pts builds (but I guess that's for another thread).
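For anyone who wants the gaps as percentages, a quick Python sketch of the arithmetic follows. The scores are the ones quoted in this post ("about 6000" is taken as exactly 6000, which is an assumption), and `pct_change` is just an illustrative helper name.

```python
def pct_change(baseline: float, value: float) -> float:
    """Percentage difference of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


# p7zip scores quoted in this post.
P7ZIP_DATABASE_9850 = 4653   # Phenom 9850 result in the PTS database
P7ZIP_MINE_DEFAULT = 6000    # "about 6000", treated as exact here (assumption)
P7ZIP_MINE_OPTIMIZED = 6891  # gcc 4.3.0, -O3 -march=amdfam10, amd64 makefile

# OpenSSL scores quoted in this post.
OPENSSL_MINE = 162
OPENSSL_DATABASE_9850 = 167

if __name__ == "__main__":
    print(f"mine vs. database (p7zip):   {pct_change(P7ZIP_DATABASE_9850, P7ZIP_MINE_DEFAULT):+.1f}%")
    print(f"optimized vs. default build: {pct_change(P7ZIP_MINE_DEFAULT, P7ZIP_MINE_OPTIMIZED):+.1f}%")
    print(f"mine vs. database (OpenSSL): {pct_change(OPENSSL_DATABASE_9850, OPENSSL_MINE):+.1f}%")
```

The point of laying it out this way is that the OpenSSL gap is a few percent, as expected for a 200MHz clock difference, while the p7zip gap goes the opposite way and is far larger.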
            Last edited by apaige; 17 May 2008, 09:53 AM.



            • #7
              Many benchmarks use tempfiles. That's one of the biggest drawbacks when you compare CPU speed.



              • #8
                Many of those could be modified to output to /dev/null (that's the case now for the audio encoding profiles). Still, how do you explain the p7zip discrepancies? That benchmark doesn't use tempfiles, as far as I can tell.
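As a rough illustration of the /dev/null idea, here is a Python sketch that times a compression run with the input already in memory and the output discarded, so the hard drive stays out of the measurement. It uses Python's standard `bz2` module, which is single-threaded, so it is neither pbzip2 nor how the pts profiles are actually written; the function name is hypothetical.

```python
import bz2
import os
import time


def time_bz2_to_devnull(data: bytes, level: int = 9) -> tuple[float, int]:
    """Compress `data` in memory, discard the result, and time the whole thing.

    Returns (elapsed seconds, compressed size in bytes).
    """
    start = time.perf_counter()
    compressed = bz2.compress(data, level)
    # os.devnull is "/dev/null" on Linux; writing there never touches the disk.
    with open(os.devnull, "wb") as sink:
        sink.write(compressed)
    elapsed = time.perf_counter() - start
    return elapsed, len(compressed)


if __name__ == "__main__":
    # Synthetic, highly compressible input held in RAM (an assumption;
    # a real benchmark would use a fixed reference file read beforehand).
    payload = b"phoronix test suite " * 500_000
    seconds, size = time_bz2_to_devnull(payload)
    print(f"{len(payload)} bytes -> {size} bytes in {seconds:.3f} s")
```

Reading the input before starting the timer matters as much as discarding the output; otherwise the first run still measures the disk.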



                • #9
                  p7zip is not a good multicore benchmark; a dual core [email protected] GHz already reaches 3800 MIPS. Maybe the C2Q got too hot and throttled down.



                  • #10
                    There are a few things I've noticed about the p7zip benchmark that can dramatically affect its performance.

                    -scheduler choice
                    -optimization builds
                    -tlb workarounds for phenoms
                    -assembly or non assembly compiles
                    -motherboard chipsets

                    Also, the Phenom system listed by apaige in the link is using an RT kernel.
                    Last edited by deanjo; 17 May 2008, 12:45 PM.

