Quick, overall system performance suite?

  • mendieta
    started a topic

    Hi

    I am getting a new computer soon, and of course I am drooling about overclocking and stuff ;-)

    Long story short, I installed PTS on my old machine to start playing. I think the easy access to global and how you can make comparisons with online results is incredibly cool and useful.

    What I find lacking is usability for someone who wants a quick test. Has anyone used Geekbench? It is a pleasure: click, download, run, and you get a score for your machine, within a minute or two even on a slow machine. Granted, it lacks disk and graphics tests, but I think we could use something like this. I have some ideas, and I am planning to post them in the sticky thread for this forum (PTS). But I wonder whether such a quick, to-the-point suite already exists. I saw a bunch of "sys performance" suites, but they involve downloads of 100 MB or more and many dozens of minutes of runtime. Am I missing something?

    Thanks so much Michael and all for the great work!

  • timtaw
    replied
    Just for your information: I just updated the 'Superfast screening' suite, which now includes C-Ray as a test for massively threaded processor speed. I also created suites for live systems, which are intended to be run from a live medium (e.g. a thumb drive). They match the corresponding 'normal' screening suites but exclude tests that produce invalid results on live systems:

    Superfast screening (live) [timtaw/screening-live-superfast]
    Fast screening (live) [timtaw/screening-live-fast]
    Light screening (live) [timtaw/screening-live-light]
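
    For anyone who wants to script runs of the live suites, here is a minimal Python sketch. It assumes the `phoronix-test-suite` client is on the PATH and that the suites listed above can be fetched by their identifiers. `build_command` and `run_suite` are hypothetical helper names used only for illustration; they are not part of PTS.

```python
import shutil
import subprocess

# Suite identifiers exactly as listed in the post above.
LIVE_SUITES = [
    "timtaw/screening-live-superfast",
    "timtaw/screening-live-fast",
    "timtaw/screening-live-light",
]


def build_command(suite, batch=False):
    """Return the argv list for benchmarking one suite with PTS.

    batch=True selects the non-interactive 'batch-benchmark' sub-command.
    """
    action = "batch-benchmark" if batch else "benchmark"
    return ["phoronix-test-suite", action, suite]


def run_suite(suite, batch=False):
    """Run the suite and return the exit code, or None if PTS is not installed."""
    if shutil.which("phoronix-test-suite") is None:
        return None
    return subprocess.run(build_command(suite, batch)).returncode


if __name__ == "__main__":
    # Only print the commands here; call run_suite() to actually start a run.
    for suite in LIVE_SUITES:
        print(" ".join(build_command(suite)))
```

    Batch mode normally has to be configured once beforehand, so the interactive `benchmark` form is probably the safer default for a first try.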



  • timtaw
    replied
    I did some testing over the past few weeks using these profiles. Things are looking good so far; I made only minor modifications:
    • Gzip Compression has been moved to the 'screening-standard' suite, because during the test it writes a large 2 GB file, which is not feasible on live systems
    • FS-Mark has been moved to the 'screening-standard' suite because of its popularity and usefulness
    • Dbench has been moved to the 'screening-long' suite because of its decreasing popularity
    • Fixed wrong assignment of tests to the filesystem category in the 'screening-standard' suite
    • Moved RAMspeed SMP to 'standard' test suite
    • Flexible IO Tester: Changed options to Buffered: Yes - Direct: No.
    • Removed FFTW as we already use a similar test with FFTE.
    • Removed IOzone.
    • PostgreSQL pgbench got an additional test with heavy contention.
    • Added CLOMP.
    • SciMark now uses the COMPOSITE test in the 'superfast' suite.

    I had trouble running some of the tests, but I assume that's not a general problem.

    Accumulated test results of all tests can be found at: [1707218-TIMT-RESULTS99]. Links to results of the various suites are added below.

    Superfast screening [timtaw/screening-superfast]
    Very fast and trivial test in order to get a first impression of a system's performance: Processor (single-threaded and multi-threaded), Disk and Memory. These tests feature a small download and small on-disk test size with very short runtime, which makes them suitable to be run from a live medium on systems with a limited amount of RAM (at least 4 GB). However, note that many tests, especially disk-related tests, do not produce valid results when run from a live medium.

    Example results: [1707210-TIMT-RESULTS90]

    Processor (single-threaded):
    • SciMark (Computational Test: Fast Fourier Transform): 3 minutes

    Processor (multi-threaded):
    • Himeno Benchmark: 3 minutes

    Memory:
    • Stream (Type: Copy): 10 minutes

    Filesystem:
    • Flexible IO Tester (Type: Random Read - IO Engine: POSIX AIO - Buffered: Yes - Direct: No - Block Size: 4 KB): 6 minutes
    • Flexible IO Tester (Type: Random Write - IO Engine: POSIX AIO - Buffered: Yes - Direct: No - Block Size: 4 KB): 6 minutes

    Estimated total runtime: 28 minutes
    Approx. Download size: 0,5 MB
    Approx. installed size: 4,25 MB


    Fast screening [timtaw/screening-fast]
    Fast and representative test of all essential subsystems: Processor (single-threaded, multi-threaded and massively threaded), Disk, Memory and Network. These tests feature a small download and small on-disk test size with short runtime, which makes them suitable to be run from a live medium on systems with a limited amount of RAM (at least 4 GB). However, note that many tests, especially disk-related tests, do not produce valid results when run from a live medium.

    Example results: [1707217-TIMT-RESULTS18]

    Processor (single-threaded):
    • SciMark (Computational Test: Test All Options): 16 minutes

    Processor (multi-threaded):
    • Himeno Benchmark: 3 minutes

    Processor (massively threaded):
    • C-Ray: 6 minutes

    Memory:
    • Stream (Type: Test All Options): 41 minutes

    Network:
    • Loopback TCP Network Performance: 3 minutes

    Filesystem:
    • Flexible IO Tester (Type: Test All Options - IO Engine: POSIX AIO - Buffered: Yes - Direct: No - Block Size: 4 KB): 15 minutes

    Estimated total runtime: 1:24 hours
    Approx. Download size: 0,7 MB
    Approx. installed size: 11,25 MB


    Light screening [timtaw/screening-light]
    Representative test of all essential subsystems: Processor (single-threaded, multi-threaded and massively threaded), Disk, Memory and Network. These tests feature a small download and small on-disk test size with acceptable runtime, which makes them suitable to be run from a live medium on systems with a limited amount of RAM (at least 4 GB). However, note that many tests, especially disk-related tests, do not produce valid results when run from a live medium.

    Example results: [1707217-TIMT-RESULTS52]

    Contains all the tests from the 'Fast screening' plus:

    Processor (single-threaded):
    • LAME MP3 Encoding: 2 minutes
    • FLAC Audio Encoding: 2 minutes

    Processor (multi-threaded):
    • FFTE: 4 minutes
    • ebizzy: 2 minutes
    • BLAKE2: 1 minute
    • Stockfish: 4 minutes

    Processor (massively threaded):
    • John The Ripper (Test: Test All Options): 9 minutes
    • Smallpt: 4 minutes
    • CLOMP: 5 minutes

    Memory:
    • CacheBench (Test: Test All Options): 20 minutes

    System:
    • Hierarchical INTegration (Test: FLOAT): 18 minutes

    Filesystem:
    • PostMark: 17 minutes

    Estimated total runtime: 2:52 hours
    Approx. Download size: 84 MB
    Approx. installed size: 31 MB


    Standard screening [timtaw/screening-standard]
    Representative test of all essential subsystems: Processor (single-threaded, multi-threaded and massively threaded), Disk, Memory and Network with a healthy mix of theoretical and practical benchmarks. Because of the required on-disk-size this suite is intended to be installed on a target system where download size and on-disk test size do not matter.

    Example results: [1707219-TIMT-RESULTS57]

    Contains all the tests from the 'Light screening' plus:

    Processor (multi-threaded):
    • OpenSSL: 2 minutes
    • GraphicsMagick (Operation: Test All Options): 16 minutes
    • Gcrypt Library: 4 minutes
    • GnuPG: 2 minutes
    • Gzip Compression: 4 minutes

    Processor (massively threaded):
    • 7-Zip Compression: 4 minutes
    • x264: 3 minutes
    • Primesieve: 15 minutes

    Memory:
    • RAMspeed SMP (Type: Test All Options - Benchmark: Test All Options): 50 minutes

    System:
    • Apache Benchmark: 5 minutes
    • NGINX Benchmark: 4 minutes

    Filesystem:
    • FS-Mark (Test: Test All Options): 1:03 hours
    • SQLite (Test Target: Default Test Directory): 24 minutes

    Estimated total runtime: 6:08 hours
    Approx. Download size: 470 MB
    Approx. installed size: 14,1 GB


    Long screening [timtaw/screening-long]
    Extensive test of all essential subsystems: Processor (single-threaded, multi-threaded and massively threaded), Disk, Memory and Network. Because of the required on-disk-size this suite is intended to be installed on a target system where download size, on-disk test size and runtime do not matter.

    Example results: [1707213-TIMT-RESULTS21]

    Contains all the tests from the 'Standard screening' plus:

    Processor (multi-threaded):
    • HPC Challenge (Test / Class: G-HPL): 52 minutes
    • High Performance Conjugate Gradient: unknown
    • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes

    System:
    • PostgreSQL pgbench (Scaling: Test All Options - Test: Normal Load - Mode: Test All Options): 2:10 hours
    • PostgreSQL pgbench (Scaling: Test All Options - Test: Heavy Contention - Mode: Test All Options): 2:10 hours

    Filesystem:
    • BlogBench (Test: Test All Options): 1:04 hours
    • Dbench (Client Count: 6): 37 minutes
    • Dbench (Client Count: 256): 37 minutes

    Estimated total runtime: 14:17 hours
    Approx. Download size: 513 MB
    Approx. installed size: 16,2 GB
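
    The estimated totals above are simply the per-test estimates added up, with each suite carrying over everything from the suite it builds on. Here is a small Python sketch of that arithmetic for the Fast, Light, Standard and Long suites. The minutes are copied from the lists above; the test with an "unknown" runtime is left out, and the names `ADDED_MINUTES`, `format_hmm` and `cumulative_runtimes` are made up for this example.

```python
# Minutes each suite adds on top of the suite it builds on (from the lists above).
# High Performance Conjugate Gradient is listed as "unknown" and is not counted.
ADDED_MINUTES = {
    "fast": [16, 3, 6, 41, 3, 15],
    "light": [2, 2, 4, 2, 1, 4, 9, 4, 5, 20, 18, 17],
    "standard": [2, 16, 4, 2, 4, 4, 3, 15, 50, 5, 4, 63, 24],
    "long": [52, 39, 130, 130, 64, 37, 37],
}
ORDER = ["fast", "light", "standard", "long"]


def format_hmm(minutes):
    """Format a number of minutes as H:MM, the notation used in the post."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}:{rest:02d}"


def cumulative_runtimes():
    """Return total minutes per suite, including all suites it builds on."""
    totals = {}
    running = 0
    for name in ORDER:
        running += sum(ADDED_MINUTES[name])
        totals[name] = running
    return totals


for name, minutes in cumulative_runtimes().items():
    print(f"{name}: {format_hmm(minutes)} hours")
# fast: 1:24 hours
# light: 2:52 hours
# standard: 6:08 hours
# long: 14:17 hours
```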



  • timtaw
    replied
    Just to let you know, I'm in the last round of extensive testing which will take approx. 2 weeks. I'll report back here.



  • timtaw
    replied
    Originally posted by fatal-man
    Or is any of this already easily available?
    Not that I know of. There is a plethora of benchmarks out there, each with a very different degree of reliability, scalability and trustworthiness.

    According to my research over the past few weeks, the following tests are the most (1) widely recognized, (2) scalable and (3) future-proof tests available on PTS. By future-proof I mean that a test should not depend on something that ages badly: a certain file format may become obsolete, and compressing a 2 GB test file may take 30 seconds today but only milliseconds a few years from now, so a test should not be based on such simple calculations.

    Processor (multi-threaded):
    • Himeno Benchmark
    • HPC Challenge (G-HPL - this is the LINPACK that is the base for the TOP500 supercomputer list)
    • High Performance Conjugate Gradient (aims to complement the LINPACK with a new, more practical, metric)
    • FFTE
    • NAS Parallel Benchmarks (seems to be quite old; is it still relevant?)



    Processor (single-threaded):
    • SciMark



    Memory:
    • Stream
    • RAMspeed SMP
    • CacheBench



    Filesystem:
    • Flexible IO Tester
    • Iozone
    • Dbench



    System:
    • Hierarchical INTegration (Practical test that ranks a computer system as a whole, including processors, memory and buses. While almost ancient it is scalable from small serial systems to supercomputers. Almost immune to artificial optimization.)
    • ebizzy



    I totally agree with your remarks. I'm excited about the large result base that Michael created with Phoronix and OpenBenchmarking.org, and I believe that standardized test sets would perfectly complement this existing infrastructure and benefit everyone. When I initially came to Phoronix, my first search was for representative standard test sets, as I lacked a deeper understanding of each available test. Which tests are important? Which tests have a large user base, so that my results are as comparable as possible?
    I assume that most people want a quick way to benchmark their systems so they can compare their machines with other systems. Benchmarking is a science of its own that most users neither want nor need to be bothered with.

    Standardized test sets would not only lend a helping hand to newcomers, they would also increase the number of valid test results that everybody can compare against. There are no drawbacks to that: when somebody wants to tackle a certain aspect of his system that is not covered by the standard test sets, he is free to do so anyway!

    I'm currently finishing the test runs of the test sets I posted earlier. The results look promising, although a few modifications seem practical. I'll report my findings in this thread in the next few days.
    Last edited by timtaw; 06-19-2017, 09:27 AM. Reason: Corrected spelling error



  • fatal-man
    replied
    I am happy to have found this thread. Since 13.04 I have been running 1306245-SO-CALCULATE89 before and after Ubuntu release upgrades to make sure the performance has improved or at least not decreased. The problem is that some tests in this test suite can no longer be installed, the run takes quite some time, and I'm not sure how effective it is.

    So I see the benefit of having a short, quick standardized test suite to verify that changes lead to improvements, or at least not to regressions. If regressions are found, I guess a longer or more detailed standardized test suite or subset may help to pinpoint the root cause.
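
    As a rough illustration of such a before/after check, here is a minimal Python sketch. It assumes the results of two runs have already been collected into plain dictionaries. The numbers are invented, and `find_regressions` and the 5 % tolerance are hypothetical choices, not anything PTS provides.

```python
def find_regressions(before, after, higher_is_better, tolerance=0.05):
    """Return tests whose result got worse by more than `tolerance` (a fraction).

    The returned value per test is the relative change, negative meaning worse.
    """
    regressions = {}
    for test, old in before.items():
        if test not in after or old == 0:
            continue  # test missing in the second run, or no usable baseline
        change = (after[test] - old) / old
        if not higher_is_better[test]:
            change = -change  # for time-based tests, a larger value is worse
        if change < -tolerance:
            regressions[test] = round(change, 3)
    return regressions


# Invented example numbers, for illustration only.
before = {"C-Ray (seconds)": 40.0, "Stream Copy (MB/s)": 12000.0}
after = {"C-Ray (seconds)": 46.0, "Stream Copy (MB/s)": 11900.0}
higher = {"C-Ray (seconds)": False, "Stream Copy (MB/s)": True}

print(find_regressions(before, after, higher))
```

    With these made-up numbers only C-Ray is flagged, because the Stream result changed by less than the tolerance.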
    I guess a short/quick standardized test suite would attract more users and then one could use openbenchmarking.org to do several interesting comparisons:
    • how do similar configurations perform?
    • how would performance change with other hardware or with a different computer?

    I guess when the short/quick standardized test suite is run for the first time on a certain configuration, the user may be requested to run the more detailed standardized test.
    Maybe with much more data it would be possible to find out which tests correlate and then select the most compatible/quick test to replace in the standardized test suite.
    Maybe then even the distros would include the standardized short/quick test as part of distribution upgrades or just periodically to find and fix regressions.
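
    To make the correlation idea a bit more concrete, here is a small Python sketch. It assumes one score per system for each test, in the same system order. The test names and scores are invented, and `pearson`, `redundant_pairs` and the 0.95 threshold are hypothetical choices for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(results, threshold=0.95):
    """Return (test_a, test_b, r) for every pair with |r| >= threshold."""
    names = sorted(results)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(results[a], results[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Invented scores of three hypothetical tests on four hypothetical systems.
results = {
    "test_a": [10, 20, 30, 40],
    "test_b": [11, 19, 31, 41],
    "test_c": [5, 1, 4, 2],
}

for a, b, r in redundant_pairs(results):
    print(f"{a} and {b} correlate strongly (r = {r})")
```

    A strongly correlated pair would be a candidate for dropping the slower of the two tests from the quick suite, although with real data one would want many more systems than in this toy example.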

    Or is any of this already easily available?



  • timtaw
    replied
    Okay, here's another stab at this topic.

    In the last couple of days I tried to gain some more insight into the various tests. Which tests are widely recognized and, ideally, scientifically recognized? Which tests allow long-term comparison and scale well? I've also sampled some of the tests performed by phoronix.com during the past five years in order to see which ones are widely used.

    Taking into account and weighting all these considerations, I came up with this proposal (I've created corresponding test suites but I'm not allowed to create an attachment).

    Again, feedback is welcome!

    Superfast screening
    Very fast and trivial test in order to get a first impression of a system's performance: Processor (single-threaded and multi-threaded), Disk and Memory.

    Processor (single-threaded):
    • SciMark (Computational Test: Fast Fourier Transform): 3 minutes



    Processor (multi-threaded):
    • Himeno Benchmark: 3 minutes



    Disk:
    • Flexible IO Tester (Type: Random Read; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 6 minutes
    • Flexible IO Tester (Type: Random Write; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 6 minutes



    Memory:
    • RAMspeed SMP (Type: Add; Benchmark: Integer): 6 minutes



    Estimated total runtime: 24 minutes
    Approx. Download size: 0,53 MB
    Approx. installed size: 5,87 MB


    Fast screening
    Fast and representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network.

    Processor (single-threaded):
    • SciMark (Computational Test: Test All Options): 16 minutes



    Processor (multi-threaded):
    • Himeno Benchmark: 3 minutes



    Disk:
    • Flexible IO Tester (Type: Test All Options; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 15 minutes



    Memory:
    • RAMspeed SMP (Type: Test All Options; Benchmark: Integer): 26 minutes



    Network:
    • Loopback TCP Network Performance: 3 minutes



    Estimated total runtime: 1:03 hours
    Approx. Download size: 0,53 MB
    Approx. installed size: 5,87 MB


    Live screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. This suite can be run from a live medium on systems with a limited amount of RAM, which means it requires a small download, a small on-disk test size and a not-too-long runtime.

    [Contains all tests from 'Fast screening' plus:]

    Processor (single-threaded):
    • FLAC Audio Encoding: 2 minutes



    Processor (multi-threaded):
    • FFTE: 4 minutes
    • ebizzy: 2 minutes
    • BLAKE2: 1 minute
    • John The Ripper (Test: Test All Options): 9 minutes
    • C-Ray: 6 minutes
    • LAME MP3 Encoding: 2 minutes
    • Gzip Compression: 4 minutes
    • Smallpt: 4 minutes
    • Stockfish: 4 minutes



    Disk:
    • PostMark: 17 minutes



    System:
    • Hierarchical INTegration (Test: Test All Options): 51 minutes



    Memory:
    • CacheBench (Test: Test All Options): 20 minutes
    • Stream (Type: Test All Options): 41 minutes



    Estimated total runtime: 3:50 hours
    Approx. Download size: 84 MB
    Approx. installed size: 30 MB


    Standard screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. Because of the required on-disk-size this suite is intended to be installed on a target system where download size and on-disk test size do not matter.

    [Contains all tests from 'Live screening' plus:]

    Processor (multi-threaded):
    • Dbench (Client Count: 6): 37 minutes
    • SQLite (Test Target: Default Test Directory): 24 minutes



    Disk:
    • OpenSSL: 2 minutes
    • 7-Zip Compression: 4 minutes
    • x264: 3 minutes
    • GraphicsMagick (Operation: Test All Options): 16 minutes
    • Gcrypt Library: 4 minutes
    • GnuPG: 2 minutes
    • Primesieve: 15 minutes



    System:
    • Apache Benchmark: 5 minutes
    • NGINX Benchmark: 4 minutes
    • PostgreSQL pgbench: 1:35 hours



    Estimated total runtime: 7:21 hours
    Approx. Download size: 490 MB
    Approx. installed size: 2000 MB


    Long screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. Because of the required on-disk-size this suite is intended to be installed on a target system where download size, on-disk test size and runtime do not matter.

    [Contains all tests from 'Standard screening' plus:]

    Processor (multi-threaded):
    • HPC Challenge (Test: G-HPL): 52 minutes
    • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes
    • FFTW (Build: Test All Options; Site: 2D FFT Size 32): 54 minutes
    • High Performance Conjugate Gradient: unknown



    Disk:
    • BlogBench (Test: Test All Options): 1:04 hours
    • Iozone (Record Size: Test All Options; File Size: 4GB; Disk Test: Test All Options): 2:13 hours
    • FS-Mark (Test: Test All Options): 1:03 hours



    Estimated total runtime: 14:06 hours
    Approx. Download size: 497 MB
    Approx. installed size: 2050 MB



  • timtaw
    replied
    Well, I'm currently waiting for some feedback from the more experienced users here. I'm still unsure what the exact options for each test should look like (e.g. take 'fio': buffered or unbuffered? What block size is most representative?), whether some of the tests are redundant (i.e. produce very similar results, so that some of them can be removed), and whether vital tests that I haven't considered yet are missing.

    I will build and provide the final profiles, but first I'd like to start a discussion on what these profiles should look like, what makes sense and what doesn't.

    Hopefully users with more experience than me can chime in!



  • suberimakuri
    replied
    Hi timtaw , how did you get on with this? Meindata?
    Quick screening looks good, did you ever build a test profile?



  • timtaw
    replied
    Great thread!

    I propose creating four consecutive standardized test sets which build upon each other, meaning that test set 2 should include all tests from test set 1 and so on:
    1. Quick screening: Quick and representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network.
    2. Small screening: Should be able to run from a live medium on systems with a limited amount of RAM, which means the requirements are a small download, a small on-disk test size and a not-too-long runtime
    3. Extensive screening: Adds useful tests where download and on-disk test size do not matter
    4. Long screening: Adds a broad testing range where test runtime does not matter
    5. (Graphics: Maybe some graphics-related tests, but I'll leave that to the experts out there.)
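
    The "each set includes the previous one" idea could be modelled roughly as follows. This is only a Python sketch: the set names come from the list above, only a few tests per set are shown (picked from the ideas further down in this post), and `TEST_SETS` and `resolve` are hypothetical names, not PTS features.

```python
# Each set names the set it extends and the tests it adds on top.
# Only a handful of tests per set are shown to keep the example short.
TEST_SETS = {
    "quick": {"extends": None, "adds": ["SciMark", "C-Ray", "Flexible IO Tester"]},
    "small": {"extends": "quick", "adds": ["FFTE", "PostMark", "Stream"]},
    "extensive": {"extends": "small", "adds": ["OpenSSL", "BlogBench"]},
    "long": {"extends": "extensive", "adds": ["Primesieve", "FS-Mark"]},
}


def resolve(name, sets=TEST_SETS):
    """Return the full test list of a set, inherited tests first."""
    chain = []
    while name is not None:
        chain.append(name)
        name = sets[name]["extends"]
    tests = []
    for set_name in reversed(chain):  # start with the smallest set
        for test in sets[set_name]["adds"]:
            if test not in tests:  # avoid listing a test twice
                tests.append(test)
    return tests


print(resolve("small"))
# ['SciMark', 'C-Ray', 'Flexible IO Tester', 'FFTE', 'PostMark', 'Stream']
```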

    Goals for these test sets should be:
    1. The tests should be representative for each subsystem
    2. They should be as timeless as possible in order to be comparable to past and future systems (which rules out compiling tests, as comparing compilation performance on Linux 2.6 vs. Linux 4.11 would produce very different results)
    3. They should be quite popular among the community, e.g. on OpenBenchmarking.org, so it is easier to compare to other systems
    4. They should be a healthy mix of theoretical and real-world-usage benchmarks
    5. The tests should be as self-contained as possible, minimizing dependencies on other packages upon installation.

    Question is, which tests are most representative for all four (five) test sets?

    Some ideas, based on this thread and the list of the most downloaded benchmarks (test options in parentheses):

    Quick screening:
    Processor (single-threaded):
    • SciMark (Computational Test: Test All Options): 16 minutes

    Processor (multi-threaded):
    • C-Ray: 6 minutes
    • 7-Zip Compression: 4 minutes
    • Gzip Compression: 4 minutes

    Disk:
    • Flexible IO Tester (Type: Test All Options; IO Engine: POSIX AIO; Buffered: Yes; Direct: No; Block Size: 512 KB): 15 minutes

    Memory:
    • RAMspeed SMP (Type: Test All Options; Benchmark: Integer): 26 minutes

    Network:
    • Loopback TCP Network Performance: 3 minutes



    Estimated total runtime: 1:14 hours
    Approx. Download size: 5 MB
    Approx. installed size: 22 MB



    Small screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • FFTE: 4 minutes
    • BLAKE2: 1 minute
    • CacheBench (Test: Test All Options): 20 minutes
    • Gcrypt Library: 4 minutes
    • GnuPG: 2 minutes
    • GraphicsMagick (Operation: Test All Options): 16 minutes
    • John The Ripper (Test: Test All Options): 9 minutes
    • ebizzy: 2 minutes
    • FLAC Audio Encoding: 2 minutes
    • Himeno Benchmark: 3 minutes

    Disk:
    • PostMark: 17 minutes
    • SQLite (Test Target: Default Test Directory): 24 minutes

    Memory:
    • Stream (Type: Test All Options): 41 minutes

    Network:


    Estimated total runtime: 3:39 hours
    Approx. Download size: 27 MB
    Approx. installed size: 101 MB



    Extensive screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • LAME MP3 Encoding: 2 minutes
    • OpenSSL: 2 minutes
    • x264: 3 minutes

    Disk:
    • BlogBench (Test: Test All Options): 1:04 hours

    Memory:

    Network:

    System:
    • Apache Benchmark: 5 minutes
    • NGINX Benchmark: 4 minutes
    • PostgreSQL pgbench: 1:35 hours


    Estimated total runtime: 6:34 hours
    Approx. Download size: 469 MB
    Approx. installed size: 1950 MB



    Long screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • BYTE Unix Benchmark (Computational Test: Test All Options): 4:15 hours
    • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes
    • Primesieve: 15 minutes

    Disk:
    • FS-Mark (Test: Test All Options): 1:03 hours
    • Iozone (Record Size: Test All Options; File Size: 512MB; Disk Test: Test All Options): 2:13 hours
    • Dbench (Client Count: 6): 37 minutes

    Memory:

    Network:

    System:
    • Hierarchical INTegration (Test: Test All Options): 51 minutes


    Estimated total runtime: 16:27 hours
    Approx. Download size: 493 MB
    Approx. installed size: 2010 MB



    What do you think? Which tests would you add and which tests would you remove? Are the various test options sane? Are the tests representative?

