
**bridgman** · 11 September 2016, 04:06 PM

Alternatively, add 1 to the requested number of runs and throw away the results from the first one?

That would also work for games, etc., where it's not so obvious which portions of which files will be accessed.

**dsmythies** · 11 September 2016, 05:07 PM

Originally posted by bridgman

Alternatively, add 1 to the requested number of runs and throw away the results from the first one?

That would also work for games, etc., where it's not so obvious which portions of which files will be accessed.

Good idea. My original thinking was that the copy-the-file-to-null method might waste less time than fully discarding the first run (~4 seconds versus ~14 seconds for my example case).

Here is an example using the default "user-config.xml" file.

**Michael** · 11 September 2016, 05:17 PM

Originally posted by dsmythies

I have noticed that some tests have a poorer result for the first test only, apparently due to some file becoming cached in memory after the first run of the test. As a result, subsequent runs of the test execute faster. This can lead to an incorrect bias in the results, and perhaps incorrect conclusions.

Example:

Code:

Parallel BZIP2 Compression 1.1.12:
pts/compress-pbzip2-1.5.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 1 Minute (11:16 PDT)
Started Run 1 @ 11:16:18
Started Run 2 @ 11:16:33
Started Run 3 @ 11:16:44 [Std. Dev: 19.00%]
Test Results:
13.989195108414
10.206043958664
10.22295999527
Average: 11.47 Seconds

And again:

Code:

Parallel BZIP2 Compression 1.1.12:
pts/compress-pbzip2-1.5.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 1 Minute (11:29 PDT)
Started Run 1 @ 11:29:15
Started Run 2 @ 11:29:26
Started Run 3 @ 11:29:37 [Std. Dev: 0.14%]
Test Results:
10.145808935165
10.117418050766
10.133827924728
Average: 10.13 Seconds

And here is an example test run, where I might conclude that performance mode has some problem. Disclaimer: for increased dramatic effect, the number of runs was set to 1.

While it is true that I have backed off the "StandardDeviationThreshold" in my "user-config.xml" file, there is still a result bias, albeit significantly reduced, using the default threshold. The test also takes longer to complete (twice as long in this case, as it needs 3 more runs on top of the default 3). Example:

Code:

Parallel BZIP2 Compression 1.1.12:
pts/compress-pbzip2-1.5.0
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 1 Minute (11:23 PDT)
Started Run 1 @ 11:23:04
Started Run 2 @ 11:23:18
Started Run 3 @ 11:23:29 [Std. Dev: 17.75%]
Started Run 4 @ 11:23:41 [Std. Dev: 15.92%]
Started Run 5 @ 11:23:52 [Std. Dev: 14.58%]
Started Run 6 @ 11:24:03 [Std. Dev: 13.48%]
Test Results:
13.718590974808
10.133217811584
10.308124065399
10.142753839493
10.111504793167
10.14910197258
Average: 10.76 Seconds
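
As a rough cross-check of the "Std. Dev" percentages in these logs, here is a minimal sketch. It assumes the figure is the sample standard deviation divided by the mean, which the thread does not state. The function name `rel_stddev` is made up for illustration. Fed the three results from the first example, it gives the same 19.00% that the log shows.

```shell
# rel_stddev VALUES...: sample standard deviation as a percentage of the mean.
# Assumption: this matches how the "Std. Dev" figure in the logs is derived.
rel_stddev() {
    printf '%s\n' "$@" | awk '
        { x[NR] = $1; sum += $1 }
        END {
            if (NR < 2) exit 1                  # need at least two samples
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
            printf "%.2f\n", 100 * sqrt(ss / (NR - 1)) / mean
        }'
}

# The three run times from the first example above.
rel_stddev 13.989195108414 10.206043958664 10.22295999527   # prints 19.00
```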

Would it make sense to either flush the memory before each test run or to do a file copy to null (or something) before the first run so as to ensure the same starting conditions run to run?
I have used the bzip2 test as my example, but have also observed the same thing with ffmpeg, unpack-linux, x264, encode-flac, and encode-mp3.

Have you encountered this with any tests outside of compress BZIP2?

**Michael** · 11 September 2016, 05:18 PM

Originally posted by bridgman

Alternatively, add 1 to the requested number of runs and throw away the results from the first one?

That would also work for games, etc., where it's not so obvious which portions of which files will be accessed.

Yep, PTS has long had such support for throwing out specific runs, e.g. first or last, etc, among other safeguards as options inside the test profile XML meta-data. It's just a matter of looking into this bzip2 compression issue, as I haven't seen such behavior before, but I can easily increase the run count and/or drop the first run once I've looked at it.
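
To put a number on what dropping the first run would do, here is a small sketch using the six results from the longer pbzip2 example quoted earlier in the thread. It only redoes the arithmetic and is not how PTS itself computes anything. `mean_skipping` is a made-up helper name.

```shell
# Run times (seconds) copied from the six-run pbzip2 example earlier in the thread.
times="13.718590974808 10.133217811584 10.308124065399 10.142753839493 10.111504793167 10.14910197258"

# mean_skipping N VALUES...: average of VALUES after discarding the first N.
mean_skipping() {
    skip=$1
    shift
    printf '%s\n' "$@" | awk -v skip="$skip" '
        NR > skip { sum += $1; n++ }
        END { if (n > 0) printf "%.2f\n", sum / n; else exit 1 }'
}

mean_skipping 0 $times   # all six runs: prints 10.76
mean_skipping 1 $times   # first run discarded: prints 10.17
```

With the first (uncached) run left out, the average falls from 10.76 to about 10.17 seconds, in line with the 10.13 seconds of the example where the file was already cached.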

**dsmythies** · 11 September 2016, 05:27 PM

Originally posted by Michael

Have you encountered this with any tests outside of compress BZIP2?

Yes, from my original post:

I have used the bzip2 test as my example, but have also observed the same thing with ffmpeg, unpack-linux, x264, encode-flac, and encode-mp3.

And actually, I have been doing a dummy run of ffmpeg for a couple of years now.

Yep, PTS has long had such support for throwing out specific runs, e.g. first or last, etc, among other safeguards as options inside the test profile XML meta-data.

I did not know that. Forgive my ignorance. I'll learn how and try it.

**dsmythies** · 17 September 2016, 12:04 PM

I have searched and searched, and have not been able to figure out how to get my test profile to throw out the first sample. Can someone help with that?

To minimize wasted time, I still think that for tests where file caching is possible a dummy copy to null as a pre-test would be best. Example 1 (file not cached yet):

Code:

doug@s15:~/.phoronix-test-suite/installed-tests/pts/compress-pbzip2-1.5.0$ time cp linux-4.3.tar /dev/null

real    0m5.463s
user    0m0.008s
sys     0m0.336s

Example 2 (file already cached):

Code:

doug@s15:~/.phoronix-test-suite/installed-tests/pts/compress-pbzip2-1.5.0$ time cp linux-4.3.tar /dev/null

real    0m0.121s
user    0m0.000s
sys     0m0.124s
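
The dummy copy could be wrapped up along these lines. This is only a sketch: `warm_cache` is a made-up name, and where it would be called from (a wrapper script, or some pre-run hook in the test profile) is left open.

```shell
# warm_cache FILE...: read each file once so that its pages end up in the
# page cache. The data itself is thrown away; only the read matters.
warm_cache() {
    status=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            cat -- "$f" > /dev/null || status=1
        else
            printf 'warm_cache: cannot read %s\n' "$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `warm_cache linux-4.3.tar` from the test's install directory before the first timed run would play the same role as the `cp ... /dev/null` above.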

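By the way, here is the script I use to flush memory, for testing (run it with sudo):

Code:

#!/bin/bash
# Show memory use, flush dirty pages to disk, drop the caches, then show memory use again.
free
sync
# 3 = drop the page cache plus dentries and inodes
echo 3 > /proc/sys/vm/drop_caches
free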
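
A slightly more defensive variant of the same idea, as a sketch only. The `DROP_CACHES_FILE` override is a made-up variable added so the function can be tried against a scratch file, and `flush_page_cache` is likewise an illustrative name. Note that the whole function has to run as root: `sudo echo 3 > /proc/sys/vm/drop_caches` typed at a normal prompt fails, because the redirection is done by the unprivileged shell.

```shell
# Target file; overridable (hypothetical variable) for a dry run on a scratch file.
DROP_CACHES_FILE=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}

# flush_page_cache: sync, then ask the kernel to drop clean caches.
flush_page_cache() {
    if [ ! -w "$DROP_CACHES_FILE" ]; then
        echo "flush_page_cache: cannot write $DROP_CACHES_FILE (run as root)" >&2
        return 1
    fi
    sync                            # write dirty pages out first
    echo 3 > "$DROP_CACHES_FILE"    # 3 = page cache plus dentries and inodes
}
```

Running this before every timed run would give each run the same cold-cache starting point, at the cost of making every run as slow as the first one.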

**Michael** · 17 September 2016, 01:06 PM

Originally posted by dsmythies

I have searched and searched, and have not been able to figure out how to get my test profile to throw out the first sample. Can someone help with that?

To minimize wasted time, I still think that for tests where file caching is possible a dummy copy to null as a pre-test would be best. Example 1 (file not cached yet):

Code:

doug@s15:~/.phoronix-test-suite/installed-tests/pts/compress-pbzip2-1.5.0$ time cp linux-4.3.tar /dev/null

real 0m5.463s
user 0m0.008s
sys 0m0.336s

Example 2 (file already cached):

Code:

doug@s15:~/.phoronix-test-suite/installed-tests/pts/compress-pbzip2-1.5.0$ time cp linux-4.3.tar /dev/null

real 0m0.121s
user 0m0.000s
sys 0m0.124s

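By the way, here is the script I use to flush memory, for testing (run it with sudo):

Code:

#!/bin/bash
# Show memory use, flush dirty pages to disk, drop the caches, then show memory use again.
free
sync
# 3 = drop the page cache plus dentries and inodes
echo 3 > /proc/sys/vm/drop_caches
free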

Adding <IgnoreRuns>1</IgnoreRuns> under <PhoronixTestSuite><TestInformation> in the test-definition.xml file for that test should do the trick.

So basically, in XML terms, it's PhoronixTestSuite/TestInformation/IgnoreRuns.

If that works, I'm happy to add it upstream to the relevant test profiles.

**dsmythies** · 18 September 2016, 02:29 AM

Originally posted by Michael

Adding <IgnoreRuns>1</IgnoreRuns> under <PhoronixTestSuite><TestInformation> in the test-definition.xml file for that test should do the trick.
So basically, in XML terms, it's PhoronixTestSuite/TestInformation/IgnoreRuns.
If that works, I'm happy to add it upstream to the relevant test profiles.

O.K. thank you.
I did that to 4 definitions:

/home/doug/.phoronix-test-suite/test-profiles/pts/compress-pbzip2-1.5.0/test-definition.xml
/home/doug/.phoronix-test-suite/test-profiles/pts/ffmpeg-2.5.0/test-definition.xml
/home/doug/.phoronix-test-suite/test-profiles/pts/unpack-linux-1.0.0/test-definition.xml
/home/doug/.phoronix-test-suite/test-profiles/pts/x264-2.0.0/test-definition.xml

And it worked fine.
The other two tests I mentioned in my first post, encode-flac and encode-mp3, were inconclusive, not always showing this first-run bias, so I didn't modify their test-definition.xml files. I assume there are other tests that would benefit from this change, but these are all I have come across so far.


Memory caching biasing test results
