AMD Opteron 2356 Dual Quad-Core

  • AMD Opteron 2356 Dual Quad-Core

    Phoronix: AMD Opteron 2356 Dual Quad-Core

    When we looked at the AMD Phenom 9500 under Linux, we found that the processor posed a number of issues, from kernel panics to other troubles, when running Ubuntu 7.10 with the Linux 2.6.22 kernel. After upgrading to Ubuntu 8.04 with the Linux 2.6.24 kernel, however, these problems vanished and we were pleased by this native quad-core desktop processor from AMD. The quad-core Opteron 2300 "Barcelona" processors were released a month prior to the first Phenom desktop CPUs. We hadn't looked at any AMD Barcelona processors at that time, but today we finally have our hands on two of the new AMD Opteron 2356 server/workstation processors. The Opteron 2356 is clocked at 2.30GHz and is a revision B3 Opteron, meaning that it has a proper fix for the TLB erratum; this model was introduced only earlier this month. We have benchmarked the new Opteron 2356 in both single and dual CPU configurations and have compared the results, under Linux, to two of Intel's quad-core Xeon processors.


  • #2
    Hmm, seems like AMD should go more for pure CPU power now, not tech, because they already BASH Intel at tech.



    • #3
      As for the lower Nexuiz score with two CPUs: I think that's mostly thanks to NUMA (Non-Uniform Memory Access). In theory the system should allocate memory in the memory areas attached to the memory controller built into the CPU the thread/process is supposed to run on (AMD CPUs have an integrated memory controller, just as a reminder). However, if the memory gets allocated on CPU A but the thread is moved to a core on CPU B, all memory accesses have to pass through the HyperTransport connection to CPU A, which adds latency and lowers bandwidth (you can see the effect in the memory benchmarks, too).

      The scheduler should (if possible) take care to not move threads to a CPU with only remote memory access. Is the Ubuntu 8.04 standard kernel NUMA-aware?
      Last edited by SavageX; 15 April 2008, 09:28 AM.
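
One rough way to answer the NUMA-awareness question on a running system is to look at what the kernel exposes under /sys/devices/system/node. The following is a minimal Python sketch, not something from the article: the function names `parse_cpulist` and `numa_nodes` are made up for illustration, and it assumes the kernel provides a per-node `cpulist` file, which may not be present on older kernels.

```python
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [logical cpu, ...]}, or {} if the directory is missing."""
    nodes = {}
    if not os.path.isdir(sysfs_root):
        return nodes
    for name in os.listdir(sysfs_root):
        match = re.fullmatch(r"node(\d+)", name)
        if not match:
            continue  # skip files such as "online" or "possible"
        try:
            with open(os.path.join(sysfs_root, name, "cpulist")) as f:
                nodes[int(match.group(1))] = parse_cpulist(f.read())
        except OSError:
            continue  # assumption: no cpulist file means we cannot tell
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print("node", node, "cpus", cpus)
```

If this reports two nodes with four CPUs each on the dual Opteron 2356 box, the kernel at least knows about the topology. A single node (or none) would suggest that NUMA support is not enabled in the kernel or that node interleaving is switched on in the BIOS.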



      • #4
        sad, that we do not see a

        cat /proc/cpuinfo

        and a

        cat /proc/interrupts

        of this baby!



        • #5
          processor : 0
          vendor_id : AuthenticAMD
          cpu family : 16
          model : 2
          model name : Quad-Core AMD Opteron(tm) Processor 2356
          stepping : 3
          cpu MHz : 2300.093
          cache size : 512 KB
          physical id : 0
          siblings : 4
          core id : 0
          cpu cores : 4
          fpu : yes
          fpu_exception : yes
          cpuid level : 5
          wp : yes
          flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
          bogomips : 4609.12
          TLB size : 1024 4K pages
          clflush size : 64
          cache_alignment : 64
          address sizes : 48 bits physical, 48 bits virtual
          power management: ts ttp tm stc 100mhzsteps hwpstate

          processor : 1
          vendor_id : AuthenticAMD
          cpu family : 16
          model : 2
          model name : Quad-Core AMD Opteron(tm) Processor 2356
          stepping : 3
          cpu MHz : 2300.093
          cache size : 512 KB
          physical id : 0
          siblings : 4
          core id : 1
          cpu cores : 4
          fpu : yes
          fpu_exception : yes
          cpuid level : 5
          wp : yes
          flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
          bogomips : 4605.75
          TLB size : 1024 4K pages
          clflush size : 64
          cache_alignment : 64
          address sizes : 48 bits physical, 48 bits virtual
          power management: ts ttp tm stc 100mhzsteps hwpstate

          processor : 2
          vendor_id : AuthenticAMD
          cpu family : 16
          model : 2
          model name : Quad-Core AMD Opteron(tm) Processor 2356
          stepping : 3
          cpu MHz : 2300.093
          cache size : 512 KB
          physical id : 0
          siblings : 4
          core id : 2
          cpu cores : 4
          fpu : yes
          fpu_exception : yes
          cpuid level : 5
          wp : yes
          flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
          bogomips : 4600.42
          TLB size : 1024 4K pages
          clflush size : 64
          cache_alignment : 64
          address sizes : 48 bits physical, 48 bits virtual
          power management: ts ttp tm stc 100mhzsteps hwpstate

          processor : 3
          vendor_id : AuthenticAMD
          cpu family : 16
          model : 2
          model name : Quad-Core AMD Opteron(tm) Processor 2356
          stepping : 3
          cpu MHz : 2300.093
          cache size : 512 KB
          physical id : 0
          siblings : 4
          core id : 3
          cpu cores : 4
          fpu : yes
          fpu_exception : yes
          cpuid level : 5
          wp : yes
          flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
          bogomips : 4600.44
          TLB size : 1024 4K pages
          clflush size : 64
          cache_alignment : 64
          address sizes : 48 bits physical, 48 bits virtual
          power management: ts ttp tm stc 100mhzsteps hwpstate

          processor : 4
          vendor_id : AuthenticAMD
          cpu family : 16
          model : 2
          model name : Quad-Core AMD Opteron(tm) Processor 2356
          stepping : 3
          cpu MHz : 2300.093
          cache size : 512 KB
          physical id : 1
          siblings : 4
          core id : 0
          cpu cores : 4
          fpu : yes
          fpu_exception : yes
          cpuid level : 5
          wp : yes
          flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
          bogomips : 4600.31
          TLB size : 1024 4K pages
          clflush size : 64
          cache_alignment : 64
          address sizes : 48 bits physical, 48 bits virtual
          power management: ts ttp tm stc 100mhzsteps hwpstate

          processor : 5
          vendor_id : AuthenticAMD
          cpu family : 16
          model : 2
          model name : Quad-Core AMD Opteron(tm) Processor 2356
          stepping : 3
          cpu MHz : 2300.093
          cache size : 512 KB
          physical id : 1
          siblings : 4
          core id : 1
          cpu cores : 4
          fpu : yes
          fpu_exception : yes
          cpuid level : 5
          wp : yes
          flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
          bogomips : 4600.32
          TLB size : 1024 4K pages
          clflush size : 64
          cache_alignment : 64
          address sizes : 48 bits physical, 48 bits virtual
          power management: ts ttp tm stc 100mhzsteps hwpstate

          processor : 6
          vendor_id : AuthenticAMD
          cpu family : 16
          model : 2
          model name : Quad-Core AMD Opteron(tm) Processor 2356
          stepping : 3
          cpu MHz : 2300.093
          cache size : 512 KB
          physical id : 1
          siblings : 4
          core id : 2
          cpu cores : 4
          fpu : yes
          fpu_exception : yes
          cpuid level : 5
          wp : yes
          flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
          bogomips : 4600.33
          TLB size : 1024 4K pages
          clflush size : 64
          cache_alignment : 64
          address sizes : 48 bits physical, 48 bits virtual
          power management: ts ttp tm stc 100mhzsteps hwpstate

          processor : 7
          vendor_id : AuthenticAMD
          cpu family : 16
          model : 2
          model name : Quad-Core AMD Opteron(tm) Processor 2356
          stepping : 3
          cpu MHz : 2300.093
          cache size : 512 KB
          physical id : 1
          siblings : 4
          core id : 3
          cpu cores : 4
          fpu : yes
          fpu_exception : yes
          cpuid level : 5
          wp : yes
          flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
          bogomips : 4600.32
          TLB size : 1024 4K pages
          clflush size : 64
          cache_alignment : 64
          address sizes : 48 bits physical, 48 bits virtual
          power management: ts ttp tm stc 100mhzsteps hwpstate
          Michael Larabel
          https://www.michaellarabel.com/
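
For readers who would rather not scan eight stanzas by eye, the dump above can be condensed into a package-to-CPU map. This is only a sketch under the assumption that stanzas are separated by blank lines and carry `processor` and `physical id` fields, as they do in this output; `packages_from_cpuinfo` is a hypothetical helper name.

```python
import os
from collections import defaultdict


def packages_from_cpuinfo(text):
    """Map each 'physical id' to the sorted logical processor numbers in it."""
    packages = defaultdict(list)
    current = {}
    # The extra "" flushes the last stanza even without a trailing blank line.
    for line in text.splitlines() + [""]:
        if not line.strip():
            if "processor" in current and "physical id" in current:
                pkg = int(current["physical id"])
                packages[pkg].append(int(current["processor"]))
            current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    return {pkg: sorted(cpus) for pkg, cpus in packages.items()}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(packages_from_cpuinfo(f.read()))
```

Fed the output posted above, this should group processors 0-3 under physical id 0 and processors 4-7 under physical id 1, i.e. two sockets with four cores each.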



          • #6
            Originally posted by SavageX
            As for the lower Nexuiz score with two CPUs: I think that's mostly thanks to NUMA (Non-Uniform Memory Access). In theory the system should allocate memory in the memory areas attached to the memory controller built into the CPU the thread/process is supposed to run on (AMD CPUs have an integrated memory controller, just as a reminder). However, if the memory gets allocated on CPU A but the thread is moved to a core on CPU B, all memory accesses have to pass through the HyperTransport connection to CPU A, which adds latency and lowers bandwidth (you can see the effect in the memory benchmarks, too).

            The scheduler should (if possible) take care to not move threads to a CPU with only remote memory access. Is the Ubuntu 8.04 standard kernel NUMA-aware?
            I'm wondering how many threads Nexuiz has. Does it spawn as many threads as the system has processing cores?

            Michael, you can probably verify it by disabling 2 cores in each CPU and then comparing the result with 1 CPU and 4 cores. If 2+2 cores is still slower than 4+0, then I think it might be NUMA. (You can probably check the kernel's make menuconfig, too.)
            Last edited by davidletterboyz; 16 April 2008, 01:32 PM.
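
The 2+2 versus 4+0 comparison suggested here could be planned with a small helper. The sketch below only computes which logical CPUs would need to go offline; `cpus_to_offline` and the `OPTERON_2356_X2` map are illustrative names, with the map taken from the /proc/cpuinfo output earlier in the thread. Actually taking a core offline is normally done as root by writing `0` to /sys/devices/system/cpu/cpuN/online, which assumes a kernel built with CPU hotplug support.

```python
def cpus_to_offline(packages, keep):
    """Return the logical CPUs to take offline.

    packages: {physical_id: [logical cpu, ...]}
    keep:     {physical_id: number of cores to leave online}
    """
    offline = []
    for pkg in sorted(packages):
        cpus = sorted(packages[pkg])
        offline.extend(cpus[keep.get(pkg, len(cpus)):])
    return offline


# Topology as reported by /proc/cpuinfo in this thread.
OPTERON_2356_X2 = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}

if __name__ == "__main__":
    for label, keep in (("2+2", {0: 2, 1: 2}), ("4+0", {0: 4, 1: 0})):
        for cpu in cpus_to_offline(OPTERON_2356_X2, keep):
            # Printed only; nothing is written to sysfs by this sketch.
            print(label, "/sys/devices/system/cpu/cpu%d/online" % cpu)
```

Note that taking cores offline does not move memory: pages already allocated on the second socket stay there, so the 4+0 run is only a clean comparison if the benchmark's memory is also local to the first socket.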



            • #7
              Originally posted by davidletterboyz
              I'm wondering how many threads Nexuiz has. Does it spawn as many threads as the system has processing cores?
              It spawns exactly 1 (one) thread. There may be a few tasks during rendering which could potentially be moved into parallel threads, but as of now nothing of that sort has materialized.

              Upside of that: on an eight-core system, power management can put seven cores to sleep. Green open-source gaming fun!



              • #8
                Originally posted by SavageX
                It spawns exactly 1 (one) thread. There may be a few tasks during rendering which could potentially be moved into parallel threads, but as of now nothing of that sort has materialized.

                Upside of that: on an eight-core system, power management can put seven cores to sleep. Green open-source gaming fun!
                Oh, I see. But then it could still be the load-balancing penalty that caused the 8-core system to slow down a bit.
                Last edited by davidletterboyz; 17 April 2008, 12:31 PM.
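
Since the game runs a single thread, one way to rule scheduler migration in or out would be to pin the benchmark to one core and compare scores. A minimal Linux-only sketch, assuming Python 3 and using the hypothetical helper name `run_pinned`; the `taskset` utility does roughly the same from the shell.

```python
import os
import subprocess
import sys


def run_pinned(argv, cpu):
    """Run argv with its CPU affinity restricted to one logical CPU (Linux only)."""
    def pin():
        # Runs in the child between fork and exec; pid 0 means "this process".
        os.sched_setaffinity(0, {cpu})
    return subprocess.run(argv, preexec_fn=pin, check=False)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python pin.py <cpu> <command> [args...]
    result = run_pinned(sys.argv[2:], int(sys.argv[1]))
    sys.exit(result.returncode)
```

Pinning only fixes where the thread runs. It says nothing about which socket's memory the process ends up using, so a pinned run that is still slow would point back toward the NUMA explanation rather than load balancing.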



                • #9
                  fair???

                  I don't think this is a fair test. Ubuntu is binary-based; for a truly fair test, a source-based distro should be used to get optimized performance (e.g. Gentoo/LFS).



                  • #10
                    Originally posted by some-guy
                    I don't think this is a fair test. Ubuntu is binary-based; for a truly fair test, a source-based distro should be used to get optimized performance (e.g. Gentoo/LFS).
                    Depending on your environment, your CPU may never see "optimized" code during its whole lifetime. Thus testing plain-vanilla code is relevant for many users.
