Quick, overall system performance suite?


  • timtaw
    replied
    Okay, here's another stab at this topic.

    In the last couple of days I tried to gain some more insight into the various tests. Which tests are widely adopted and, ideally, scientifically recognized? Which tests allow long-term comparison and scale well? I've also sampled the tests phoronix.com has run over the past five years to see which ones are used most often.

    Weighing all these considerations, I came up with this proposal (I've created corresponding test suites, but I'm not allowed to create an attachment here).

    Again, feedback is welcome!

    Superfast screening
    A very fast, minimal test to get a first impression of a system's performance: Processor (single-threaded and multi-threaded), Disk and Memory.

    Processor (single-threaded):
    • SciMark (Computational Test: Fast Fourier Transform): 3 minutes



    Processor (multi-threaded):
    • Himeno Benchmark: 3 minutes



    Disk:
    • Flexible IO Tester (Type: Random Read; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 6 minutes
    • Flexible IO Tester (Type: Random Write; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 6 minutes



    Memory:
    • RAMspeed SMP (Type: Add; Benchmark: Integer): 6 minutes



    Estimated total runtime: 24 minutes
    Approx. Download size: 0.53 MB
    Approx. installed size: 5.87 MB
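
    As a rough sketch, this set could be packaged as a local suite along the following lines. This assumes the usual suite-definition.xml schema and guesses at current OpenBenchmarking.org profile names, so both should be double-checked before use:

    Code:
    # Sketch only: package the 'Superfast screening' set as a local suite.
    # The path and the pts/* profile names are assumptions; verify them with
    # 'phoronix-test-suite list-available-tests'.
    mkdir -p ~/.phoronix-test-suite/test-suites/local/superfast-screening
    cat > ~/.phoronix-test-suite/test-suites/local/superfast-screening/suite-definition.xml <<'EOF'
    <?xml version="1.0"?>
    <PhoronixTestSuite>
      <SuiteInformation>
        <Title>Superfast Screening</Title>
        <Version>1.0.0</Version>
        <TestType>System</TestType>
        <Description>Very fast first impression of CPU, disk and memory.</Description>
      </SuiteInformation>
      <Execute><Test>pts/scimark2</Test></Execute>
      <Execute><Test>pts/himeno</Test></Execute>
      <Execute><Test>pts/fio</Test></Execute>
      <Execute><Test>pts/ramspeed</Test></Execute>
    </PhoronixTestSuite>
    EOF
    # The 'local/' prefix is how locally defined suites are typically referenced.
    phoronix-test-suite benchmark local/superfast-screening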


    Fast screening
    Fast and representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network.

    Processor (single-threaded):
    • SciMark (Computational Test: Test All Options): 16 minutes



    Processor (multi-threaded):
    • Himeno Benchmark: 3 minutes



    Disk:
    • Flexible IO Tester (Type: Test All Options; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 15 minutes



    Memory:
    • RAMspeed SMP (Type: Test All Options; Benchmark: Integer): 26 minutes



    Network:
    • Loopback TCP Network Performance: 3 minutes



    Estimated total runtime: 1:03 hours
    Approx. Download size: 0.53 MB
    Approx. installed size: 5.87 MB


    Live screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. This suite can be run from a live medium on systems with a limited amount of RAM, which means it requires a small download, a small on-disk test size and a not-too-long runtime.

    [Contains all tests from 'Fast screening' plus:]

    Processor (single-threaded):
    • FLAC Audio Encoding: 2 minutes



    Processor (multi-threaded):
    • FFTE: 4 minutes
    • ebizzy: 2 minutes
    • BLAKE2: 1 minute
    • John The Ripper (Test: Test All Options): 9 minutes
    • C-Ray: 6 minutes
    • LAME MP3 Encoding: 2 minutes
    • Gzip Compression: 4 minutes
    • Smallpt: 4 minutes
    • Stockfish: 4 minutes



    Disk:
    • PostMark: 17 minutes



    System:
    • Hierarchical INTegration (Test: Test All Options): 51 minutes



    Memory:
    • CacheBench (Test: Test All Options): 20 minutes
    • Stream (Type: Test All Options): 41 minutes



    Estimated total runtime: 3:50 hours
    Approx. Download size: 84 MB
    Approx. installed size: 30 MB
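
    Since the whole point of this suite is live-medium use, here is a hedged sketch of how the downloads could be staged ahead of time. make-download-cache is a standard PTS subcommand; the local suite name, the PTS_DOWNLOAD_CACHE_OVERRIDE variable and the USB mount path should be treated as assumptions to verify against the PTS documentation:

    Code:
    # On a normally installed machine: fetch everything the suite needs
    # into the PTS download cache, then copy that cache to removable media.
    phoronix-test-suite make-download-cache local/live-screening
    # On the live system: point PTS at the pre-seeded cache so no network
    # access is needed during testing.
    PTS_DOWNLOAD_CACHE_OVERRIDE=/media/usb/download-cache/ \
        phoronix-test-suite benchmark local/live-screening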


    Standard screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. Because of its on-disk size, this suite is intended to be installed on a target system where download size and on-disk test size do not matter.

    [Contains all tests from 'Live screening' plus:]

    Processor (multi-threaded):
    • OpenSSL: 2 minutes
    • 7-Zip Compression: 4 minutes
    • x264: 3 minutes
    • GraphicsMagick (Operation: Test All Options): 16 minutes
    • Gcrypt Library: 4 minutes
    • GnuPG: 2 minutes
    • Primesieve: 15 minutes



    Disk:
    • Dbench (Client Count: 6): 37 minutes
    • SQLite (Test Target: Default Test Directory): 24 minutes



    System:
    • Apache Benchmark: 5 minutes
    • NGINX Benchmark: 4 minutes
    • PostgreSQL pgbench: 1:35 hours



    Estimated total runtime: 7:21 hours
    Approx. Download size: 490 MB
    Approx. installed size: 2000 MB


    Long screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. Because of its on-disk size and runtime, this suite is intended to be installed on a target system where download size, on-disk test size and runtime do not matter.

    [Contains all tests from 'Standard screening' plus:]

    Processor (multi-threaded):
    • HPC Challenge (Test: G-HPL): 52 minutes
    • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes
    • FFTW (Build: Test All Options; Site: 2D FFT Size 32): 54 minutes
    • High Performance Conjugate Gradient: unknown



    Disk:
    • BlogBench (Test: Test All Options): 1:04 hours
    • Iozone (Record Size: Test All Options; File Size: 4GB; Disk Test: Test All Options): 2:13 hours
    • FS-Mark (Test: Test All Options): 1:03 hours



    Estimated total runtime: 14:06 hours
    Approx. Download size: 497 MB
    Approx. installed size: 2050 MB
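
    All the runtime and size figures above are estimates; they can be sanity-checked per profile before committing to a run, for example:

    Code:
    # Prints the estimated run-time, download size and environment size
    # for a single test profile (fio shown here as an example).
    phoronix-test-suite info pts/fio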

    Leave a comment:


  • timtaw
    replied
    Well, I'm currently waiting for some feedback from the more experienced users here. I'm still unsure about what the exact options for each test should be (e.g. take 'fio': buffered or unbuffered? What block size is most representative?), whether some of the tests are redundant (i.e. produce very similar results, so some of them could be removed), and whether vital tests that I haven't taken into consideration yet are missing.
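
    One low-effort way to A/B those fio settings would be PTS's batch mode with preset options. The sketch below assumes the PRESET_OPTIONS environment variable and makes up the option identifier, so the real name should be looked up in the fio profile's test-definition.xml:

    Code:
    # One-time batch-mode configuration (answers are remembered).
    phoronix-test-suite batch-setup
    # Pin the test options per run so results are directly comparable.
    # The 'fio.type' identifier is an assumption; check test-definition.xml.
    PRESET_OPTIONS='fio.type=Random Read' phoronix-test-suite batch-benchmark fio
    PRESET_OPTIONS='fio.type=Random Write' phoronix-test-suite batch-benchmark fio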

    I will build and provide the final profiles, but first I'd like to open a discussion on what these profiles should look like and what makes sense and what doesn't.

    Hopefully users with more experience than me can chime in!

    Leave a comment:


  • suberimakuri
    replied
    Hi timtaw, how did you get on with this? Meindata?
    Quick screening looks good. Did you ever build a test profile?

    Leave a comment:


  • timtaw
    replied
    Great thread!

    I propose creating four consecutive standardized test sets which build upon each other, meaning that test set 2 should include all tests from test set 1 and so on:
    1. Quick screening: Quick and representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network.
    2. Small screening: Should be able to run from a live medium on systems with a limited amount of RAM, which means a small download, a small on-disk test size and a not-too-long runtime
    3. Extensive screening: Adds useful tests where download and on-disk test size do not matter
    4. Long screening: Adds a broad testing range where test runtime does not matter
    5. (Graphics: Maybe some graphics-related tests, but I'll leave that to the experts out there.)

    Goals for these test sets should be:
    1. The tests should be representative for each subsystem
    2. They should be as timeless as possible in order to be comparable to past and future systems (which rules out compiling tests, as comparing compilation performance on Linux 2.6 vs. Linux 4.11 would produce very different results)
    3. They should be quite popular among the community, e.g. on OpenBenchmarking.org, so it is easier to compare to other systems
    4. They should be a healthy mix of theoretical and real-world-usage benchmarks
    5. The tests should be as self-contained as possible, minimizing dependencies on other packages upon installation.

    The question is, which tests are most representative for all four (five) test sets?

    Some ideas, based on this thread and the list of the most downloaded benchmarks (test options in parentheses):

    Quick screening:
    Processor (single-threaded):
    • SciMark (Computational Test: Test All Options): 16 minutes

    Processor (multi-threaded):
    • C-Ray: 6 minutes
    • 7-Zip Compression: 4 minutes
    • Gzip Compression: 4 minutes

    Disk:
    • Flexible IO Tester (Type: Test All Options; IO Engine: POSIX AIO; Buffered: Yes; Direct: No; Block Size: 512 KB): 15 minutes

    Memory:
    • RAMspeed SMP (Type: Test All Options; Benchmark: Integer): 26 minutes

    Network:
    • Loopback TCP Network Performance: 3 minutes



    Estimated total runtime: 1:14 hours
    Approx. Download size: 5 MB
    Approx. installed size: 22 MB
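
    Until a proper suite profile exists, this set could be tried ad hoc by chaining the profiles on one command line. The pts/* names below are my best guess at the current OpenBenchmarking.org naming:

    Code:
    # Runs the proposed 'Quick screening' tests back to back into a single
    # result file. Verify the profile names first with
    # 'phoronix-test-suite list-available-tests'.
    phoronix-test-suite benchmark pts/scimark2 pts/c-ray pts/compress-7zip \
        pts/compress-gzip pts/fio pts/ramspeed pts/network-loopback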



    Small screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • FFTE: 4 minutes
    • BLAKE2: 1 minute
    • CacheBench (Test: Test All Options): 20 minutes
    • Gcrypt Library: 4 minutes
    • GnuPG: 2 minutes
    • GraphicsMagick (Operation: Test All Options): 16 minutes
    • John The Ripper (Test: Test All Options): 9 minutes
    • ebizzy: 2 minutes
    • FLAC Audio Encoding: 2 minutes
    • Himeno Benchmark: 3 minutes

    Disk:
    • PostMark: 17 minutes
    • SQLite (Test Target: Default Test Directory): 24 minutes

    Memory:
    • Stream (Type: Test All Options): 41 minutes

    Network:


    Estimated total runtime: 3:39 hours
    Approx. Download size: 27 MB
    Approx. installed size: 101 MB



    Extensive screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • LAME MP3 Encoding: 2 minutes
    • OpenSSL: 2 minutes
    • x264: 3 minutes

    Disk:
    • BlogBench (Test: Test All Options): 1:04 hours

    Memory:

    Network:

    System:
    • Apache Benchmark: 5 minutes
    • NGINX Benchmark: 4 minutes
    • PostgreSQL pgbench: 1:35 hours


    Estimated total runtime: 6:34 hours
    Approx. Download size: 469 MB
    Approx. installed size: 1950 MB



    Long screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • BYTE Unix Benchmark (Computational Test: Test All Options): 4:15 hours
    • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes
    • Primesieve: 15 minutes

    Disk:
    • FS-Mark (Test: Test All Options): 1:03 hours
    • Iozone (Record Size: Test All Options; File Size: 512MB; Disk Test: Test All Options): 2:13 hours
    • Dbench (Client Count: 6): 37 minutes

    Memory:

    Network:

    System:
    • Hierarchical INTegration (Test: Test All Options): 51 minutes


    Estimated total runtime: 16:27 hours
    Approx. Download size: 493 MB
    Approx. installed size: 2010 MB



    What do you think? Which tests would you add and which tests would you remove? Are the various test options sane? Are the tests representative?

    Leave a comment:


  • mendieta
    replied
    Originally posted by Tijok View Post
    I work at a small school district, and have access to a very large range of machines from Pentium 3's to dual Xeon workstations. This test is exactly what I am looking for to have a quick and standardized way for myself and my students to assess the quality of donations, older hardware, and new low power options under consideration.

    How can I help, Mendieta?
    Hi Tijok

    Sorry for the delay, crazy week. I'd love to help you with the testing at school. I don't believe this thread ever resulted in a quick and generic performance suite for CPU/Disk/Graphics, but I believe we dug deep enough for you to achieve your goals.

    Could you give a bit more detail?
    • Do you care about all three major components (disk/graphics/CPU), or just some of them?
    • Do you install a standard Linux distribution on each machine? Or do you plan to test with a Linux Live USB image, etc?
    • If you care about graphics, do you care about both 2D and 3D?


    This is what we can do:
    • I can create a test for you up on OpenBenchmarking and help you run against it.
    • You create an OpenBenchmarking account for the tests, and select a password that your students can reuse with the same account.
    • Each time you have a new machine, someone goes onto OpenBenchmarking, locates the latest result in the stream of tests, and runs against it using the account information, for example:
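
    Code:
    # Running against an existing OpenBenchmarking.org result ID merges the
    # new machine's numbers into the same comparison. The ID below is the
    # one quoted elsewhere in this thread; a real class setup would use its
    # own latest result ID.
    phoronix-test-suite benchmark 1306113-MEND-QUICKBE80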


    The results, if you use the Analyze tab on OpenBenchmarking.org, would look like this (note that all these results look very similar because I am using the same machine with different software/BIOS settings):

    [OpenBenchmarking.org results link]


    You would get disk speed, single-threaded CPU performance, multi-threaded CPU performance and graphics speed, plus the average at the end, all normalized to 1. (I can run initially with a low-end laptop and you can take over from there.)

    Sounds like a plan? Cheers!

    Leave a comment:


  • Michael
    replied
    Originally posted by Tijok View Post
    I work at a small school district, and have access to a very large range of machines from Pentium 3's to dual Xeon workstations. This test is exactly what I am looking for to have a quick and standardized way for myself and my students to assess the quality of donations, older hardware, and new low power options under consideration.

    How can I help, Mendieta?

    Side note: The
    Code:
    phoronix-test-suite benchmark mendieta-4549-6954-342
    gave me an invalid argument, but
    Code:
    phoronix-test-suite benchmark 1306113-MEND-QUICKBE80
    after
    Code:
    aptitude install libxrender-dev
    is looking good so far!
    The former format was for Phoronix Global (deprecated several years ago and not supported by modern versions of PTS), which has been replaced by http://OpenBenchmarking.org since Phoronix Test Suite 3.0.

    Is there anything else you're looking for out of the testing experience?

    Leave a comment:


  • Tijok
    replied
    Not to resurrect a dead horse, but...

    I work at a small school district, and have access to a very large range of machines from Pentium 3's to dual Xeon workstations. This test is exactly what I am looking for to have a quick and standardized way for myself and my students to assess the quality of donations, older hardware, and new low power options under consideration.

    How can I help, Mendieta?

    Side note: The
    Code:
    phoronix-test-suite benchmark mendieta-4549-6954-342
    gave me an invalid argument, but
    Code:
    phoronix-test-suite benchmark 1306113-MEND-QUICKBE80
    after
    Code:
    aptitude install libxrender-dev
    is looking good so far!

    Leave a comment:


  • teotwawki
    replied
    PCGA pass / fail

    A long time ago, the PC Gaming Alliance was formed to unify the PC industry in its defense against consoles, for example by providing the consumer with some form of assurance that any given PC was up to the job of playing any given game, i.e. a benchmark much like WEI.

    Unfortunately, they hide behind NDAs and to date have never published any recommended hardware specs nor any compliance test.

    I wonder if any Phoronix readers might also be PCGA members. If so, would it be possible to create a PTS profile that shows PCGA pass/fail compliance without breaching the terms of the NDA?

    I suspect the whole PCGA may become moot once Steam boxes start showing up, as Valve seem pretty keen on keeping things fairly open, so their "Good / Better / Best" grading scheme should be easy to define in a PTS profile.

    Leave a comment:


  • mendieta
    replied
    Thanks for the suggestion, but I actually don't think you can trust Microsoft to set a standard and cooperate with others when their whole business model relies on breaking compatibility to force you to upgrade. In any case, WEI has already been dropped. It was never a good idea, because it had a hardcoded range, so a supercomputer would get the same score as a decent desktop computer. Which is insane.

    I think we are trying to get something more like Geekbench, but based on a geometric mean (to remove scaling issues) of real-world tests, and also including non-trivial disk and graphics components. For instance, I just upgraded my computer, and the Geekbench score got 4 times higher. My quickbench tests for CPU are similar (about 4 times faster single-threaded, and 4.5 times faster multi-threaded). But I can also look at the disk speed-up: besides a faster CPU, faster RAM, and moving from a SATA II to a SATA III controller with a SATA III SSD, the disk scored 2.5 times higher.
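
    To make that concrete, here is a minimal sketch of the composite score using the speed-up figures just quoted (4x single-threaded, 4.5x multi-threaded, 2.5x disk); the geometric mean keeps any single subsystem from dominating the overall number:

    Code:
    # Geometric mean of per-subsystem speed-ups vs. a reference machine.
    awk 'BEGIN {
        cpu_st = 4.0; cpu_mt = 4.5; disk = 2.5
        printf "Composite speed-up: %.2fx\n", (cpu_st * cpu_mt * disk) ^ (1/3)
    }'
    # -> Composite speed-up: 3.56x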

    Thanks, again.

    Leave a comment:


  • teotwawki
    replied
    Hi all, I'm new to this thread so you may have already thought of this and rejected it for some reason, but why not make this comparable to the Windows Experience Index score returned by the Windows System Assessment Tool? http://en.wikipedia.org/wiki/Windows...ssessment_Tool

    Its aims were exactly what we're looking for: a test of each subsystem with an overall score that's quick to understand and compare.
    I'm sure Microsoft put a lot of research into the weights required to yield a simple number that reasonably reflects the actual "feel" or "user experience" of diversely different machines. It's also pretty well documented (Google WSAT) and includes a command-line tool that lets you run individual tests, which would help in calibrating a PTS version to match its scores in each subsystem.

    E.g. for the 3D subsystem, use the new Unvanquished tests, then apply a calibrated weighting so it produces the same score that WEI does.

    Microsoft introduced WEI with Vista but seems to have dropped it from Windows 8, and it was never ported to XP. But if they have abandoned it, then they may be willing to open up more details.

    How does the Phoronix Test Suite - User Experience Index sound?

    P.S. Sorry if I double posted, but first try didn't seem to work & I had to re-type it all - Grr!!!

    Leave a comment:
