Quick, overall system performance suite?


  • #61
    Hi timtaw, how did you get on with this? Meindata?
    Quick screening looks good; did you ever build a test profile?



    • #62
      Well, I'm currently waiting for some feedback from the more experienced users here. I'm still unsure what the exact options for each test should look like (e.g. take 'fio': buffered or unbuffered? Which block size is most representative?), whether some of the tests are redundant (i.e. produce very similar results and could therefore be removed), and whether vital tests that I haven't considered yet are missing.

      I will build and provide the final profiles, but first I'd like to start a discussion on what these profiles should look like and what does and doesn't make sense.

      Hopefully users with more experience than me can chime in!



      • #63
        Okay, here's another stab at this topic.

        In the last couple of days I tried to gain some more insight into the various tests. Which tests are widely recognized and, ideally, scientifically recognized? Which tests allow long-term comparison and scale well? I've also sampled some of the tests performed by phoronix.com during the past five years in order to see which ones are widely used.

        Taking into account and weighting all these considerations, I came up with this proposal (I've created corresponding test suites but I'm not allowed to create an attachment).

        Again, feedback is welcome!

        Superfast screening
        Very fast and trivial test in order to get a first impression of a system's performance: Processor (single-threaded and multi-threaded), Disk and Memory.

        Processor (single-threaded):
        • SciMark (Computational Test: Fast Fourier Transform): 3 minutes



        Processor (multi-threaded):
        • Himeno Benchmark: 3 minutes



        Disk:
        • Flexible IO Tester (Type: Random Read; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 6 minutes
        • Flexible IO Tester (Type: Random Write; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 6 minutes



        Memory:
        • RAMspeed SMP (Type: Add; Benchmark: Integer): 6 minutes



        Estimated total runtime: 24 minutes
        Approx. Download size: 0.53 MB
        Approx. installed size: 5.87 MB
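
The disk options listed above can be spelled out as a plain fio command line. This is only a minimal sketch: it assumes the option labels map onto fio's `--rw`, `--ioengine`, `--direct` and `--bs` parameters, which is my reading and is not taken from the actual test profile. The job name, file size and runtime are made-up placeholders.

```python
def fio_args(rw, direct=True, block_size="4k"):
    """Build a fio argument list for one random-I/O job (nothing is executed)."""
    return [
        "fio",
        "--name=screening",                # hypothetical job name
        f"--rw={rw}",                      # "randread" or "randwrite"
        "--ioengine=posixaio",             # IO Engine: POSIX AIO
        f"--direct={1 if direct else 0}",  # Direct: Yes -> bypass the page cache
        f"--bs={block_size}",              # Block Size: 4 KB
        "--size=1g",                       # placeholder file size
        "--runtime=60",                    # placeholder runtime in seconds
        "--time_based",
    ]


if __name__ == "__main__":
    for rw in ("randread", "randwrite"):
        print(" ".join(fio_args(rw)))
```

Passing `direct=False` gives the buffered variant, which is exactly the choice that is still open for discussion.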


        Fast screening
        Fast and representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network.

        Processor (single-threaded):
        • SciMark (Computational Test: Test All Options): 16 minutes



        Processor (multi-threaded):
        • Himeno Benchmark: 3 minutes



        Disk:
        • Flexible IO Tester (Type: Test All Options; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 15 minutes



        Memory:
        • RAMspeed SMP (Type: Test All Options; Benchmark: Integer): 26 minutes



        Network:
        • Loopback TCP Network Performance: 3 minutes



        Estimated total runtime: 1:03 hours
        Approx. Download size: 0.53 MB
        Approx. installed size: 5.87 MB


        Live screening
        Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. This suite can be run from a live medium on systems with a limited amount of RAM, which requires a small download, a small on-disk test size and a not-too-long runtime.

        [Contains all tests from 'Fast screening' plus:]

        Processor (single-threaded):
        • FLAC Audio Encoding: 2 minutes



        Processor (multi-threaded):
        • FFTE: 4 minutes
        • ebizzy: 2 minutes
        • BLAKE2: 1 minute
        • John The Ripper (Test: Test All Options): 9 minutes
        • C-Ray: 6 minutes
        • LAME MP3 Encoding: 2 minutes
        • Gzip Compression: 4 minutes
        • Smallpt: 4 minutes
        • Stockfish: 4 minutes



        Disk:
        • PostMark: 17 minutes



        System:
        • Hierarchical INTegration (Test: Test All Options): 51 minutes



        Memory:
        • CacheBench (Test: Test All Options): 20 minutes
        • Stream (Type: Test All Options): 41 minutes



        Estimated total runtime: 3:50 hours
        Approx. Download size: 84 MB
        Approx. installed size: 30 MB


        Standard screening
        Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. Because of the required on-disk-size this suite is intended to be installed on a target system where download size and on-disk test size do not matter.

        [Contains all tests from 'Live screening' plus:]

        Processor (multi-threaded):
        • Dbench (Client Count: 6): 37 minutes
        • SQLite (Test Target: Default Test Directory): 24 minutes



        Disk:
        • OpenSSL: 2 minutes
        • 7-Zip Compression: 4 minutes
        • x264: 3 minutes
        • GraphicsMagick (Operation: Test All Options): 16 minutes
        • Gcrypt Library: 4 minutes
        • GnuPG: 2 minutes
        • Primesieve: 15 minutes



        System:
        • Apache Benchmark: 5 minutes
        • NGINX Benchmark: 4 minutes
        • PostgreSQL pgbench: 1:35 hours



        Estimated total runtime: 7:21 hours
        Approx. Download size: 490 MB
        Approx. installed size: 2000 MB


        Long screening
        Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. Because of the required on-disk-size this suite is intended to be installed on a target system where download size, on-disk test size and runtime do not matter.

        [Contains all tests from 'Standard screening' plus:]

        Processor (multi-threaded):
        • HPC Challenge (Test: G-HPL): 52 minutes
        • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes
        • FFTW (Build: Test All Options; Size: 2D FFT Size 32): 54 minutes
        • High Performance Conjugate Gradient: unknown



        Disk:
        • BlogBench (Test: Test All Options): 1:04 hours
        • Iozone (Record Size: Test All Options; File Size: 4GB; Disk Test: Test All Options): 2:13 hours
        • FS-Mark (Test: Test All Options): 1:03 hours



        Estimated total runtime: 14:06 hours
        Approx. Download size: 497 MB
        Approx. installed size: 2050 MB
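
Since each suite contains all tests of the one before it, the estimated totals can be re-derived from the per-test runtimes. The following is a small sketch that uses only the numbers from the lists above; High Performance Conjugate Gradient is listed as "unknown" and is counted as 0 minutes here, which is an assumption.

```python
# (suite, parent suite, minutes of the tests this suite adds on top of its parent)
ADDED_MINUTES = [
    ("superfast", None, [3, 3, 6, 6, 6]),
    ("fast", None, [16, 3, 15, 26, 3]),
    ("live", "fast", [2, 4, 2, 1, 9, 6, 2, 4, 4, 4, 17, 51, 20, 41]),
    ("standard", "live", [37, 24, 2, 4, 3, 16, 4, 2, 15, 5, 4, 95]),
    ("long", "standard", [52, 39, 54, 0, 64, 133, 63]),  # 0 = "unknown" (HPCG)
]


def total_minutes():
    """Return {suite: total runtime in minutes}, including inherited tests."""
    totals = {}
    for name, parent, minutes in ADDED_MINUTES:
        totals[name] = totals.get(parent, 0) + sum(minutes)
    return totals


def as_hours(minutes):
    """Format minutes as H:MM."""
    return f"{minutes // 60}:{minutes % 60:02d}"


if __name__ == "__main__":
    for suite, minutes in total_minutes().items():
        print(f"{suite}: {as_hours(minutes)} hours")
```

The sums come out at 24 minutes, 1:03, 3:50, 7:21 and 14:06 hours, matching the estimates given for each suite.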



        • #64
          I am happy to have found this thread. Since 13.04 I have been running 1306245-SO-CALCULATE89 before and after Ubuntu release upgrades to make sure that performance has improved or at least not decreased. The problem is that some tests in this suite can no longer be installed, the run takes quite some time, and I'm not sure how effective it is.

          So I see the benefit of having a short/quick standardized test suite to verify that changes lead to improvements or at least to no drawbacks. If drawbacks are found, I guess a longer or more detailed standardized test suite or subset may help to pinpoint the root cause.
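
Such a before/after check could look roughly like the sketch below. It is not tied to any real result format: the test names, the scores and the 3 % tolerance are hypothetical and only serve as illustration.

```python
def find_regressions(before, after, higher_is_better, tolerance=0.03):
    """Return {test: relative change} for tests that got worse by more than tolerance."""
    regressions = {}
    for test, old in before.items():
        if test not in after:
            continue  # test could not be run after the upgrade
        change = (after[test] - old) / old
        if not higher_is_better[test]:
            change = -change  # for timings, a smaller value is an improvement
        if change < -tolerance:
            regressions[test] = change
    return regressions


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    before = {"scimark": 500.0, "c-ray": 40.0}
    after = {"scimark": 505.0, "c-ray": 46.0}
    higher = {"scimark": True, "c-ray": False}
    print(find_regressions(before, after, higher))
```

Only the tests flagged here would then need a closer look with a longer suite.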
          I guess a short/quick standardized test suite would attract more users and then one could use openbenchmarking.org to do several interesting comparisons:
          • how do similar configurations perform?
          • how would performance change with other hardware or with a different computer?

          I guess when the short/quick standardized test suite is run for the first time on a certain configuration, the user may be requested to run the more detailed standardized test.
          Maybe with much more data it would be possible to find out which tests correlate and then select the most compatible/quick test to replace in the standardized test suite.
          Maybe then even the distros would include the standardized short/quick test as part of distribution upgrades or just periodically to find and fix regressions.
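
Finding out which tests correlate could start with something as simple as a Pearson correlation over the scores that several systems achieved in each test. This is a rough sketch under that assumption; the 0.95 threshold and the shape of the `scores` dictionary (test name mapped to one score per system, same system order everywhere) are my own choices.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(scores, threshold=0.95):
    """Return pairs of tests whose scores correlate at least as strongly as threshold."""
    names = sorted(scores)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(scores[a], scores[b])) >= threshold
    ]
```

For each pair that comes back, the quicker of the two tests would be the candidate to keep.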

          Or is any of this already easily available?



          • #65
            Originally posted by fatal-man
            Or is any of this already easily available?
            Not that I know of. There is a plethora of benchmarks out there, each with a very different degree of reliability, scalability and trustworthiness.

            According to my research over the past few weeks, the following tests available on PTS are the most (1) widely recognized, (2) scalable and (3) future-proof. Regarding the last point: a certain file format may become obsolete in the future, or compressing a 2 GB test file may take 30 seconds today but only milliseconds a few years from now, so a test should not be based on such simple calculations.

            Processor (multi-threaded):
            • Himeno Benchmark
            • HPC Challenge (G-HPL - this is the LINPACK that is the base for the TOP500 supercomputer list)
            • High Performance Conjugate Gradient (aims to complement the LINPACK with a new, more practical, metric)
            • FFTE
            • NAS Parallel Benchmarks (seems to be quite old; is it still relevant?)



            Processor (single-threaded):
            • SciMark



            Memory:
            • Stream
            • RAMspeed SMP
            • CacheBench



            Filesystem:
            • Flexible IO Tester
            • Iozone
            • Dbench



            System:
            • Hierarchical INTegration (Practical test that ranks a computer system as a whole, including processors, memory and buses. While almost ancient it is scalable from small serial systems to supercomputers. Almost immune to artificial optimization.)
            • ebizzy



            I totally agree with your remarks. I'm excited about the large result base that Michael created with phoronix and openbenchmarking.org. And I believe that standardized test sets would perfectly complement this existing infrastructure and benefit all. When I initially came to phoronix, my first search was for standard test sets that are representative, as I lacked a deeper understanding of each available test. Which tests are important? Which tests have a large user base so that my results are as comparable as possible?
            I assume that most people want a quick way to benchmark their systems so they can compare their machines with other systems. Benchmarking is a scientific field of its own that most users neither want nor need to be bothered with.

            Standardized test sets would not only lend a helping hand to newcomers, they would also increase the number of valid test results that everybody can compare against. There are no drawbacks to that: when somebody wants to tackle a certain aspect of his system that is not covered by standard test sets, he is free to do so anyway!

            I'm currently finishing test runs of the test sets I posted earlier. The results look promising, although some modifications seem practical. I'll report my findings in the next few days on this thread.
            Last edited by timtaw; 19 June 2017, 09:27 AM. Reason: Corrected spelling error



            • #66
              Just to let you know, I'm in the last round of extensive testing which will take approx. 2 weeks. I'll report back here.



              • #67
                I did some testing over the past few weeks using these profiles. Things are looking good so far; I only made minor modifications:
                • Gzip Compression has been moved to the 'screening-standard' suite, because during the test it writes a large 2 GB file, which is not feasible on live systems
                • FS-Mark has been moved to the 'screening-standard' suite because of its popularity and usefulness
                • Dbench has been moved to the 'screening-long' suite because of its decreasing popularity
                • Fixed wrong assignment of tests to the filesystem category in the 'screening-standard' suite
                • Moved RAMspeed SMP to 'standard' test suite
                • Flexible IO Tester: Changed options to Buffered: Yes - Direct: No.
                • Removed FFTW as we already use a similar test with FFTE.
                • Removed IOzone.
                • PostgreSQL pgbench got an additional test with heavy contention.
                • Added CLOMP.
                • SciMark now uses the COMPOSITE test in the 'superfast' suite.

                I had trouble running some of the tests, but I assume that's not a general problem.

                Accumulated test results of all tests can be found at: [1707218-TIMT-RESULTS99]. Links to results of the various suites are added below.

                Superfast screening [timtaw/screening-superfast]
                Very fast and trivial test in order to get a first impression of a system's performance: Processor (single-threaded and multi-threaded), Disk and Memory. These tests feature a small download and small on-disk test size with a very short runtime, which makes them suitable to be run from a live medium on systems with a limited amount of RAM (at least 4 GB). However, note that many tests, especially disk-related tests, do not produce valid results when run from a live medium.

                Example results: [1707210-TIMT-RESULTS90]

                Processor (single-threaded):
                • SciMark (Computational Test: Fast Fourier Transform): 3 minutes

                Processor (multi-threaded):
                • Himeno Benchmark: 3 minutes

                Memory:
                • Stream (Type: Copy): 10 minutes

                Filesystem:
                • Flexible IO Tester (Type: Random Read - IO Engine: POSIX AIO - Buffered: Yes - Direct: No - Block Size: 4 KB): 6 minutes
                • Flexible IO Tester (Type: Random Write - IO Engine: POSIX AIO - Buffered: Yes - Direct: No - Block Size: 4 KB): 6 minutes

                Estimated total runtime: 28 minutes
                Approx. Download size: 0.5 MB
                Approx. installed size: 4.25 MB


                Fast screening [timtaw/screening-fast]
                Fast and representative test of all essential subsystems: Processor (single-threaded, multi-threaded and massively threaded), Disk, Memory and Network. These tests feature a small download and small on-disk test size with a short runtime, which makes them suitable to be run from a live medium on systems with a limited amount of RAM (at least 4 GB). However, note that many tests, especially disk-related tests, do not produce valid results when run from a live medium.

                Example results: [1707217-TIMT-RESULTS18]

                Processor (single-threaded):
                • SciMark (Computational Test: Test All Options): 16 minutes

                Processor (multi-threaded):
                • Himeno Benchmark: 3 minutes

                Processor (massively threaded):
                • C-Ray: 6 minutes

                Memory:
                • Stream (Type: Test All Options): 41 minutes

                Network:
                • Loopback TCP Network Performance: 3 minutes

                Filesystem:
                • Flexible IO Tester (Type: Test All Options - IO Engine: POSIX AIO - Buffered: Yes - Direct: No - Block Size: 4 KB): 15 minutes

                Estimated total runtime: 1:24 hours
                Approx. Download size: 0.7 MB
                Approx. installed size: 11.25 MB


                Light screening [timtaw/screening-light]
                Representative test of all essential subsystems: Processor (single-threaded, multi-threaded and massively threaded), Disk, Memory and Network. These tests feature a small download and small on-disk test size with an acceptable runtime, which makes them suitable to be run from a live medium on systems with a limited amount of RAM (at least 4 GB). However, note that many tests, especially disk-related tests, do not produce valid results when run from a live medium.

                Example results: [1707217-TIMT-RESULTS52]

                Contains all the tests from the 'Fast screening' plus:

                Processor (single-threaded):
                • LAME MP3 Encoding: 2 minutes
                • FLAC Audio Encoding: 2 minutes

                Processor (multi-threaded):
                • FFTE: 4 minutes
                • ebizzy: 2 minutes
                • BLAKE2: 1 minute
                • Stockfish: 4 minutes

                Processor (massively threaded):
                • John The Ripper (Test: Test All Options): 9 minutes
                • Smallpt: 4 minutes
                • CLOMP: 5 minutes

                Memory:
                • CacheBench (Test: Test All Options): 20 minutes

                System:
                • Hierarchical INTegration (Test: FLOAT): 18 minutes

                Filesystem:
                • PostMark: 17 minutes

                Estimated total runtime: 2:52 hours
                Approx. Download size: 84 MB
                Approx. installed size: 31 MB


                Standard screening [timtaw/screening-standard]
                Representative test of all essential subsystems: Processor (single-threaded, multi-threaded and massively threaded), Disk, Memory and Network with a healthy mix of theoretical and practical benchmarks. Because of the required on-disk-size this suite is intended to be installed on a target system where download size and on-disk test size do not matter.

                Example results: [1707219-TIMT-RESULTS57]

                Contains all the tests from the 'Light screening' plus:

                Processor (multi-threaded):
                • OpenSSL: 2 minutes
                • GraphicsMagick (Operation: Test All Options): 16 minutes
                • Gcrypt Library: 4 minutes
                • GnuPG: 2 minutes
                • Gzip Compression: 4 minutes

                Processor (massively threaded):
                • 7-Zip Compression: 4 minutes
                • x264: 3 minutes
                • Primesieve: 15 minutes

                Memory:
                • RAMspeed SMP (Type: Test All Options - Benchmark: Test All Options): 50 minutes

                System:
                • Apache Benchmark: 5 minutes
                • NGINX Benchmark: 4 minutes

                Filesystem:
                • FS-Mark (Test: Test All Options): 1:03 hours
                • SQLite (Test Target: Default Test Directory): 24 minutes

                Estimated total runtime: 6:08 hours
                Approx. Download size: 470 MB
                Approx. installed size: 14.1 GB


                Long screening [timtaw/screening-long]
                Extensive test of all essential subsystems: Processor (single-threaded, multi-threaded and massively threaded), Disk, Memory and Network. Because of the required on-disk-size this suite is intended to be installed on a target system where download size, on-disk test size and runtime do not matter.

                Example results: [1707213-TIMT-RESULTS21]

                Contains all the tests from the 'Standard screening' plus:

                Processor (multi-threaded):
                • HPC Challenge (Test / Class: G-HPL): 52 minutes
                • High Performance Conjugate Gradient: unknown
                • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes

                System:
                • PostgreSQL pgbench (Scaling: Test All Options - Test: Normal Load - Mode: Test All Options): 2:10 hours
                • PostgreSQL pgbench (Scaling: Test All Options - Test: Heavy Contention - Mode: Test All Options): 2:10 hours

                Filesystem:
                • BlogBench (Test: Test All Options): 1:04 hours
                • Dbench (Client Count: 6): 37 minutes
                • Dbench (Client Count: 256): 37 minutes

                Estimated total runtime: 14:17 hours
                Approx. Download size: 513 MB
                Approx. installed size: 16.2 GB
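
To pick one of these suites for a given machine, the estimated runtimes and installed sizes can be put into a small table. This is a sketch only: the numbers are the estimates from the lists above, 1 GB is taken as 1000 MB, and the function name `largest_fitting` is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    name: str
    minutes: int         # estimated total runtime
    installed_mb: float  # approximate installed size


# Ordered from smallest to largest suite.
SUITES = [
    Suite("timtaw/screening-superfast", 28, 4.25),
    Suite("timtaw/screening-fast", 84, 11.25),
    Suite("timtaw/screening-light", 172, 31),
    Suite("timtaw/screening-standard", 368, 14100),
    Suite("timtaw/screening-long", 857, 16200),
]


def largest_fitting(max_minutes, free_mb):
    """Return the most extensive suite within the time and disk budget, or None."""
    fitting = [
        s for s in SUITES
        if s.minutes <= max_minutes and s.installed_mb <= free_mb
    ]
    return fitting[-1] if fitting else None
```

With three hours and 500 MB to spare, for example, this selects `timtaw/screening-light`.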



                • #68
                  Just for your information, I just updated the 'Superfast screening' suite which now includes C-Ray as a test for massively threaded processor speed. I also created tests for live systems which are intended to be run from a live medium (e.g. a thumb drive) and match the corresponding 'normal' screening suites, but exclude tests which produce invalid results on live systems:

                  Superfast screening (live) [timtaw/screening-live-superfast]
                  Fast screening (live) [timtaw/screening-live-fast]
                  Light screening (live) [timtaw/screening-live-light]
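
For scripted runs, the suites could be started through the `phoronix-test-suite benchmark` command. The sketch below assumes that the identifiers in brackets can be passed to that command as they are and that the suites are still retrievable under these names; neither is verified here.

```python
import shutil
import subprocess

LIVE_SUITES = (
    "timtaw/screening-live-superfast",
    "timtaw/screening-live-fast",
    "timtaw/screening-live-light",
)


def benchmark_command(suite):
    """Return the command line that would install and run one suite."""
    return ["phoronix-test-suite", "benchmark", suite]


def run_suite(suite):
    """Run one suite and return the exit code of phoronix-test-suite."""
    if shutil.which("phoronix-test-suite") is None:
        raise RuntimeError("phoronix-test-suite is not installed or not on PATH")
    return subprocess.run(benchmark_command(suite), check=False).returncode


if __name__ == "__main__":
    for suite in LIVE_SUITES:
        print(" ".join(benchmark_command(suite)))
```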



                  • #69
                    I think kernel compilation is one of the best benchmarks, an allmodconfig or defconfig. It's RAM speed/latency sensitive and storage speed/latency sensitive, and it will stress more parts of the CPU than I would say most other benchmarks. It's also relatively consistent over time. For GPU/Math, foldingathome runs on AVX1/2/3 and most GPUs, you know it's optimized, but it's not I/O sensitive. Maybe a Civ game for that. Just my .02 on simple benchmarking.
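
Timing a build like this needs little more than a wall-clock wrapper around the build command. A minimal sketch, with the helper name `time_command` and the number of runs chosen arbitrarily:

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            cwd=cwd,
            check=True,  # abort if the command fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace with the real build.
    print(time_command([sys.executable, "-c", "pass"], runs=2))
```

For a kernel build one would run `make defconfig` once in the source tree and then time something like `["make", "-j8"]` with `cwd` pointing at that tree (the job count is a placeholder). A `make clean` between runs is needed, otherwise the later runs only measure an incremental build.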



                    • #70
                      Originally posted by audi100quattro
                      It's also relatively consistent over time.
                      Are you sure about that? Would the compilation of Kernel 2.6.28 and 4.16 lead to the same result on the same system?
