Quick, overall system performance suite?


  • timtaw
    replied
    Okay, here's another stab at this topic.

    In the last couple of days I tried to gain some more insight into the various tests. Which tests are widely adopted and, ideally, scientifically recognized? Which tests allow long-term comparison and scale well? I've also sampled the tests phoronix.com has run over the past five years to see which ones are used most often.

    Weighing all these considerations, I came up with this proposal (I've created corresponding test suites, but I'm not allowed to create an attachment here).

    Again, feedback is welcome!

    Superfast screening
    A very fast, minimal test to get a first impression of a system's performance: Processor (single-threaded and multi-threaded), Disk and Memory.

    Processor (single-threaded):
    • SciMark (Computational Test: Fast Fourier Transform): 3 minutes



    Processor (multi-threaded):
    • Himeno Benchmark: 3 minutes



    Disk:
    • Flexible IO Tester (Type: Random Read; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 6 minutes
    • Flexible IO Tester (Type: Random Write; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 6 minutes



    Memory:
    • RAMspeed SMP (Type: Add; Benchmark: Integer): 6 minutes



    Estimated total runtime: 24 minutes
    Approx. Download size: 0.53 MB
    Approx. installed size: 5.87 MB
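
    As a rough sketch, this set could be packaged as a local suite along the following lines. This assumes the usual suite-definition.xml schema and guesses at current OpenBenchmarking.org profile names, so both should be double-checked before use:

    Code:
    # Sketch only: package the 'Superfast screening' set as a local suite.
    # The path and the pts/* profile names are assumptions; verify them with
    # 'phoronix-test-suite list-available-tests'.
    mkdir -p ~/.phoronix-test-suite/test-suites/local/superfast-screening
    cat > ~/.phoronix-test-suite/test-suites/local/superfast-screening/suite-definition.xml <<'EOF'
    <?xml version="1.0"?>
    <PhoronixTestSuite>
      <SuiteInformation>
        <Title>Superfast Screening</Title>
        <Version>1.0.0</Version>
        <TestType>System</TestType>
        <Description>Very fast first impression of CPU, disk and memory.</Description>
      </SuiteInformation>
      <Execute><Test>pts/scimark2</Test></Execute>
      <Execute><Test>pts/himeno</Test></Execute>
      <Execute><Test>pts/fio</Test></Execute>
      <Execute><Test>pts/ramspeed</Test></Execute>
    </PhoronixTestSuite>
    EOF
    # The 'local/' prefix is how locally defined suites are typically referenced.
    phoronix-test-suite benchmark local/superfast-screening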


    Fast screening
    Fast and representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network.

    Processor (single-threaded):
    • SciMark (Computational Test: Test All Options): 16 minutes



    Processor (multi-threaded):
    • Himeno Benchmark: 3 minutes



    Disk:
    • Flexible IO Tester (Type: Test All Options; IO Engine: POSIX AIO; Buffered: No; Direct: Yes; Block Size: 4 KB): 15 minutes



    Memory:
    • RAMspeed SMP (Type: Test All Options; Benchmark: Integer): 26 minutes



    Network:
    • Loopback TCP Network Performance: 3 minutes



    Estimated total runtime: 1:03 hours
    Approx. Download size: 0.53 MB
    Approx. installed size: 5.87 MB


    Live screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. This suite can be run from a live medium on systems with a limited amount of RAM, which means it requires a small download, a small on-disk test size and a not-too-long runtime.

    [Contains all tests from 'Fast screening' plus:]

    Processor (single-threaded):
    • FLAC Audio Encoding: 2 minutes



    Processor (multi-threaded):
    • FFTE: 4 minutes
    • ebizzy: 2 minutes
    • BLAKE2: 1 minute
    • John The Ripper (Test: Test All Options): 9 minutes
    • C-Ray: 6 minutes
    • LAME MP3 Encoding: 2 minutes
    • Gzip Compression: 4 minutes
    • Smallpt: 4 minutes
    • Stockfish: 4 minutes



    Disk:
    • PostMark: 17 minutes



    System:
    • Hierarchical INTegration (Test: Test All Options): 51 minutes



    Memory:
    • CacheBench (Test: Test All Options): 20 minutes
    • Stream (Type: Test All Options): 41 minutes



    Estimated total runtime: 3:50 hours
    Approx. Download size: 84 MB
    Approx. installed size: 30 MB
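
    Since the whole point of this suite is live-medium use, here is a hedged sketch of how the downloads could be staged ahead of time. make-download-cache is a standard PTS subcommand; the local suite name, the PTS_DOWNLOAD_CACHE_OVERRIDE variable and the USB mount path should be treated as assumptions to verify against the PTS documentation:

    Code:
    # On a normally installed machine: fetch everything the suite needs
    # into the PTS download cache, then copy that cache to removable media.
    phoronix-test-suite make-download-cache local/live-screening
    # On the live system: point PTS at the pre-seeded cache so no network
    # access is needed during testing.
    PTS_DOWNLOAD_CACHE_OVERRIDE=/media/usb/download-cache/ \
        phoronix-test-suite benchmark local/live-screening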


    Standard screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. Because of its on-disk size, this suite is intended to be installed on a target system where download size and on-disk test size do not matter.

    [Contains all tests from 'Live screening' plus:]

    Processor (multi-threaded):
    • OpenSSL: 2 minutes
    • 7-Zip Compression: 4 minutes
    • x264: 3 minutes
    • GraphicsMagick (Operation: Test All Options): 16 minutes
    • Gcrypt Library: 4 minutes
    • GnuPG: 2 minutes
    • Primesieve: 15 minutes



    Disk:
    • Dbench (Client Count: 6): 37 minutes
    • SQLite (Test Target: Default Test Directory): 24 minutes



    System:
    • Apache Benchmark: 5 minutes
    • NGINX Benchmark: 4 minutes
    • PostgreSQL pgbench: 1:35 hours



    Estimated total runtime: 7:21 hours
    Approx. Download size: 490 MB
    Approx. installed size: 2000 MB


    Long screening
    Representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network. Because of its on-disk size and runtime, this suite is intended to be installed on a target system where download size, on-disk test size and runtime do not matter.

    [Contains all tests from 'Standard screening' plus:]

    Processor (multi-threaded):
    • HPC Challenge (Test: G-HPL): 52 minutes
    • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes
    • FFTW (Build: Test All Options; Site: 2D FFT Size 32): 54 minutes
    • High Performance Conjugate Gradient: unknown



    Disk:
    • BlogBench (Test: Test All Options): 1:04 hours
    • Iozone (Record Size: Test All Options; File Size: 4GB; Disk Test: Test All Options): 2:13 hours
    • FS-Mark (Test: Test All Options): 1:03 hours



    Estimated total runtime: 14:06 hours
    Approx. Download size: 497 MB
    Approx. installed size: 2050 MB
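
    All the runtime and size figures above are estimates; they can be sanity-checked per profile before committing to a run, for example:

    Code:
    # Prints the estimated run-time, download size and environment size
    # for a single test profile (fio shown here as an example).
    phoronix-test-suite info pts/fio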

    Leave a comment:


  • timtaw
    replied
    Well, I'm currently waiting for some feedback from the more experienced users here. I'm still unsure about what the exact options for each test should be (e.g. take 'fio': buffered or unbuffered? What block size is most representative?), whether some of the tests are redundant (i.e. produce very similar results, so some of them could be removed), and whether vital tests that I haven't taken into consideration yet are missing.
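
    One low-effort way to A/B those fio settings would be PTS's batch mode with preset options. The sketch below assumes the PRESET_OPTIONS environment variable and makes up the option identifier, so the real name should be looked up in the fio profile's test-definition.xml:

    Code:
    # One-time batch-mode configuration (answers are remembered).
    phoronix-test-suite batch-setup
    # Pin the test options per run so results are directly comparable.
    # The 'fio.type' identifier is an assumption; check test-definition.xml.
    PRESET_OPTIONS='fio.type=Random Read' phoronix-test-suite batch-benchmark fio
    PRESET_OPTIONS='fio.type=Random Write' phoronix-test-suite batch-benchmark fio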

    I will build and provide the final profiles, but first I'd like to open a discussion on what these profiles should look like and what makes sense and what doesn't.

    Hopefully users with more experience than me can chime in!

    Leave a comment:


  • suberimakuri
    replied
    Hi timtaw, how did you get on with this? Meindata?
    Quick screening looks good. Did you ever build a test profile?

    Leave a comment:


  • timtaw
    replied
    Great thread!

    I propose creating four consecutive standardized test sets which build upon each other, meaning that test set 2 should include all tests from test set 1 and so on:
    1. Quick screening: Quick and representative test of all essential subsystems: Processor (single-threaded and multi-threaded), Disk, Memory and Network.
    2. Small screening: Should be able to run from a live medium on systems with a limited amount of RAM, which means a small download, a small on-disk test size and a not-too-long runtime
    3. Extensive screening: Adds useful tests where download and on-disk test size do not matter
    4. Long screening: Adds a broad testing range where test runtime does not matter
    5. (Graphics: Maybe some graphics-related tests, but I'll leave that to the experts out there.)

    Goals for these test sets should be:
    1. The tests should be representative for each subsystem
    2. They should be as timeless as possible in order to be comparable to past and future systems (which rules out compiling tests, as comparing compilation performance on Linux 2.6 vs. Linux 4.11 would produce very different results)
    3. They should be quite popular among the community, e.g. on OpenBenchmarking.org, so it is easier to compare to other systems
    4. They should be a healthy mix of theoretical and real-world-usage benchmarks
    5. The tests should be as self-contained as possible, minimizing dependencies on other packages upon installation.

    The question is, which tests are most representative for all four (five) test sets?

    Some ideas, based on this thread and the list of the most downloaded benchmarks (test options in parentheses):

    Quick screening:
    Processor (single-threaded):
    • SciMark (Computational Test: Test All Options): 16 minutes

    Processor (multi-threaded):
    • C-Ray: 6 minutes
    • 7-Zip Compression: 4 minutes
    • Gzip Compression: 4 minutes

    Disk:
    • Flexible IO Tester (Type: Test All Options; IO Engine: POSIX AIO; Buffered: Yes; Direct: No; Block Size: 512 KB): 15 minutes

    Memory:
    • RAMspeed SMP (Type: Test All Options; Benchmark: Integer): 26 minutes

    Network:
    • Loopback TCP Network Performance: 3 minutes



    Estimated total runtime: 1:14 hours
    Approx. Download size: 5 MB
    Approx. installed size: 22 MB
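
    Until a proper suite profile exists, this set could be tried ad hoc by chaining the profiles on one command line. The pts/* names below are my best guess at the current OpenBenchmarking.org naming:

    Code:
    # Runs the proposed 'Quick screening' tests back to back into a single
    # result file. Verify the profile names first with
    # 'phoronix-test-suite list-available-tests'.
    phoronix-test-suite benchmark pts/scimark2 pts/c-ray pts/compress-7zip \
        pts/compress-gzip pts/fio pts/ramspeed pts/network-loopback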



    Small screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • FFTE: 4 minutes
    • BLAKE2: 1 minute
    • CacheBench (Test: Test All Options): 20 minutes
    • Gcrypt Library: 4 minutes
    • GnuPG: 2 minutes
    • GraphicsMagick (Operation: Test All Options): 16 minutes
    • John The Ripper (Test: Test All Options): 9 minutes
    • ebizzy: 2 minutes
    • FLAC Audio Encoding: 2 minutes
    • Himeno Benchmark: 3 minutes

    Disk:
    • PostMark: 17 minutes
    • SQLite (Test Target: Default Test Directory): 24 minutes

    Memory:
    • Stream (Type: Test All Options): 41 minutes

    Network:


    Estimated total runtime: 3:39 hours
    Approx. Download size: 27 MB
    Approx. installed size: 101 MB



    Extensive screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • LAME MP3 Encoding: 2 minutes
    • OpenSSL: 2 minutes
    • x264: 3 minutes

    Disk:
    • BlogBench (Test: Test All Options): 1:04 hours

    Memory:

    Network:

    System:
    • Apache Benchmark: 5 minutes
    • NGINX Benchmark: 4 minutes
    • PostgreSQL pgbench: 1:35 hours


    Estimated total runtime: 6:34 hours
    Approx. Download size: 469 MB
    Approx. installed size: 1950 MB



    Long screening:
    Processor (single-threaded):

    Processor (multi-threaded):
    • BYTE Unix Benchmark (Computational Test: Test All Options): 4:15 hours
    • NAS Parallel Benchmarks (Test / Class: Test All Options): 39 minutes
    • Primesieve: 15 minutes

    Disk:
    • FS-Mark (Test: Test All Options): 1:03 hours
    • Iozone (Record Size: Test All Options; File Size: 512MB; Disk Test: Test All Options): 2:13 hours
    • Dbench (Client Count: 6): 37 minutes

    Memory:

    Network:

    System:
    • Hierarchical INTegration (Test: Test All Options): 51 minutes


    Estimated total runtime: 16:27 hours
    Approx. Download size: 493 MB
    Approx. installed size: 2010 MB



    What do you think? Which tests would you add and which tests would you remove? Are the various test options sane? Are the tests representative?

    Leave a comment:


  • mendieta
    replied
    Originally posted by Tijok View Post
    I work at a small school district, and have access to a very large range of machines from Pentium 3's to dual Xeon workstations. This test is exactly what I am looking for to have a quick and standardized way for myself and my students to assess the quality of donations, older hardware, and new low power options under consideration.

    How can I help, Mendieta?
    Hi Tijok

    Sorry for the delay, crazy week. I'd love to help you with the testing at school. I don't believe this thread ever resulted in a quick and generic performance suite for CPU/Disk/Graphics, but I believe we dug deep enough for you to achieve your goals.

    Could you give a bit more detail?
    • Do you care about all three major components (disk/graphics/CPU), or just some of them?
    • Do you install a standard Linux distribution on each machine? Or do you plan to test with a Linux Live USB image, etc?
    • If you care about graphics, do you care about both 2D and 3D?


    This is what we can do:
    • I can create a test for you up on OpenBenchmarking and help you run against it.
    • You create an OpenBenchmarking account for the tests, and select a password that your students can reuse with the same account.
    • Each time you have a new machine, someone goes onto OpenBenchmarking, locates the latest result in the stream of tests, and runs against it using the account information, for example:
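
    Code:
    # Running against an existing OpenBenchmarking.org result ID merges the
    # new machine's numbers into the same comparison. The ID below is the
    # one quoted elsewhere in this thread; a real class setup would use its
    # own latest result ID.
    phoronix-test-suite benchmark 1306113-MEND-QUICKBE80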


    The results, if you use the Analyze tab on OpenBenchmarking.org, would look like this (note that all these results look very similar because I am using the same machine with different software/BIOS settings):

    [OpenBenchmarking.org results link]


    You would get disk speed, single-threaded CPU performance, multi-threaded CPU performance and graphics speed, plus the average at the end, all normalized to 1. (I can run initially with a low-end laptop and you can take over from there.)

    Sounds like a plan? Cheers!

    Leave a comment:


  • Michael
    replied
    Originally posted by Tijok View Post
    I work at a small school district, and have access to a very large range of machines from Pentium 3's to dual Xeon workstations. This test is exactly what I am looking for to have a quick and standardized way for myself and my students to assess the quality of donations, older hardware, and new low power options under consideration.

    How can I help, Mendieta?

    Side note: The
    Code:
    phoronix-test-suite benchmark mendieta-4549-6954-342
    gave me an invalid argument, but
    Code:
    phoronix-test-suite benchmark 1306113-MEND-QUICKBE80
    after
    Code:
    aptitude install libxrender-dev
    is looking good so far!
    The former format was for Phoronix Global (deprecated several years ago and not supported by modern versions of PTS), which has been replaced by http://OpenBenchmarking.org since Phoronix Test Suite 3.0.

    Is there anything else you're looking for out of the testing experience?

    Leave a comment:


  • Tijok
    replied
    Not to resurrect a dead horse, but...

    I work at a small school district, and have access to a very large range of machines from Pentium 3's to dual Xeon workstations. This test is exactly what I am looking for to have a quick and standardized way for myself and my students to assess the quality of donations, older hardware, and new low power options under consideration.

    How can I help, Mendieta?

    Side note: The
    Code:
    phoronix-test-suite benchmark mendieta-4549-6954-342
    gave me an invalid argument, but
    Code:
    phoronix-test-suite benchmark 1306113-MEND-QUICKBE80
    after
    Code:
    aptitude install libxrender-dev
    is looking good so far!

    Leave a comment:


  • teotwawki
    replied
    PCGA pass / fail

    A long time ago, the PC Gaming Alliance was formed to unify the PC industry in its defense against consoles, for example by providing the consumer with some form of assurance that any given PC was up to the job of playing any given game, i.e. a benchmark much like WEI.

    Unfortunately, they hide behind NDAs and to date have never published any recommended hardware specs nor any compliance test.

    I wonder if any Phoronix readers might also be PCGA members. If so, would it be possible to create a PTS profile that shows PCGA pass/fail compliance without breaching the terms of the NDA?

    I suspect the whole PCGA may become moot once Steam boxes start showing up, as Valve seem pretty keen on keeping things fairly open, so their "Good / Better / Best" grading scheme should be easy to define in a PTS profile.

    Leave a comment:


  • mendieta
    replied
    Thanks for the suggestion, but I actually don't think you can trust Microsoft to set a standard and cooperate with others when their whole business model relies on breaking compatibility to force you to upgrade. In any case, WEI has already been dropped. It was never a good idea, because it had a hardcoded range, so a supercomputer would get the same score as a decent desktop computer. Which is insane.

    I think we are trying to get something more like Geekbench, but based on a geometric mean (to remove scaling issues) of real-world tests, and also including non-trivial disk and graphics components. For instance, I just upgraded my computer, and the Geekbench score got 4 times higher. My quickbench tests for CPU are similar (about 4 times faster single-threaded, and 4.5 times faster multi-threaded). But I can also look at the disk speed-up: besides a faster CPU, faster RAM, and moving from a SATA II to a SATA III controller with a SATA III SSD, the disk scored 2.5 times higher.
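
    To make that concrete, here is a minimal sketch of the composite score using the speed-up figures just quoted (4x single-threaded, 4.5x multi-threaded, 2.5x disk); the geometric mean keeps any single subsystem from dominating the overall number:

    Code:
    # Geometric mean of per-subsystem speed-ups vs. a reference machine.
    awk 'BEGIN {
        cpu_st = 4.0; cpu_mt = 4.5; disk = 2.5
        printf "Composite speed-up: %.2fx\n", (cpu_st * cpu_mt * disk) ^ (1/3)
    }'
    # -> Composite speed-up: 3.56x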

    Thanks, again.

    Leave a comment:


  • teotwawki
    replied
    Hi all, I'm new to this thread so you may have already thought of this and rejected it for some reason, but why not make this comparable to the Windows Experience Index score returned by the Windows System Assessment Tool? http://en.wikipedia.org/wiki/Windows...ssessment_Tool

    Its aims were exactly what we're looking for: a test of each subsystem with an overall score that's quick to understand and compare.
    I'm sure Microsoft put a lot of research into the weights required to yield a simple number that reasonably reflects the actual "feel" or "user experience" of diversely different machines. It's also pretty well documented (Google WSAT) and includes a command-line tool that lets you run individual tests, which would help in calibrating a PTS version to match its scores in each subsystem.

    E.g. for the 3D subsystem, use the new Unvanquished tests, then apply a calibrated weighting so it produces the same score that WEI does.

    Microsoft introduced WEI with Vista but seems to have dropped it from Windows 8, and it was never ported to XP. But if they have abandoned it, then they may be willing to open up more details.

    How does the Phoronix Test Suite - User Experience Index sound?

    P.S. Sorry if I double posted, but first try didn't seem to work & I had to re-type it all - Grr!!!

    Leave a comment:
