I have noticed that some tests show a poorer result on the first run only, apparently because some files become cached in memory after that first run. As a result, subsequent runs of the test execute faster. This can bias the results and perhaps lead to incorrect conclusions.
And here is an example test run, where I might conclude that performance mode has some problem. Disclaimer: For increased dramatic effect, the number of times to run was set to 1.
Would it make sense to either flush the memory before each test run or to do a file copy to null (or something) before the first run so as to ensure the same starting conditions run to run?
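To make the two ideas concrete, here is a rough Python sketch of what I mean. It is only an illustration under my own assumptions, not something the test suite does as far as I know: `warm_file` and `drop_page_cache` are hypothetical helper names, and the cache drop relies on the Linux `/proc/sys/vm/drop_caches` interface, which needs root.

```python
import os
import tempfile


def warm_file(path, chunk_size=1 << 20):
    """Read a file end to end and discard the data (like `cat file > /dev/null`).

    After this, the file's contents are likely to be in the page cache,
    so the first timed run starts from the same state as later runs.
    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def drop_page_cache():
    """Ask the kernel to drop clean caches (Linux only, requires root).

    This is the opposite approach: every run starts cold instead of warm.
    """
    os.sync()  # flush dirty pages first so they can be dropped
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")  # 3 = page cache plus dentries and inodes


if __name__ == "__main__":
    # Small demo on a throwaway file; a real test would warm its input file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    print("bytes read while warming:", warm_file(tmp.name))
    os.remove(tmp.name)
```

Either approach would give every run the same starting conditions. Warming is cheaper and needs no special privileges, while dropping the cache measures the cold case consistently.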
I have used the bzip2 test as my example, but have also observed the same thing with ffmpeg, unpack-linux, x264, encode-flac, and encode-mp3.
Example:
Code:
Parallel BZIP2 Compression 1.1.12:
    pts/compress-pbzip2-1.5.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 1 Minute (11:16 PDT)
        Started Run 1 @ 11:16:18
        Started Run 2 @ 11:16:33
        Started Run 3 @ 11:16:44  [Std. Dev: 19.00%]

    Test Results:
        13.989195108414
        10.206043958664
        10.22295999527

    Average: 11.47 Seconds
Code:
Parallel BZIP2 Compression 1.1.12:
    pts/compress-pbzip2-1.5.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 1 Minute (11:29 PDT)
        Started Run 1 @ 11:29:15
        Started Run 2 @ 11:29:26
        Started Run 3 @ 11:29:37  [Std. Dev: 0.14%]

    Test Results:
        10.145808935165
        10.117418050766
        10.133827924728

    Average: 10.13 Seconds
It is true that I have backed off the "StandardDeviationThreshold" in my "user-config.xml" file, but even with the default threshold the result is still biased, albeit much less so. The test also takes longer to complete: twice as long in this case, since it needs 3 more runs on top of the default 3. Example:
Code:
Parallel BZIP2 Compression 1.1.12:
    pts/compress-pbzip2-1.5.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 1 Minute (11:23 PDT)
        Started Run 1 @ 11:23:04
        Started Run 2 @ 11:23:18
        Started Run 3 @ 11:23:29  [Std. Dev: 17.75%]
        Started Run 4 @ 11:23:41  [Std. Dev: 15.92%]
        Started Run 5 @ 11:23:52  [Std. Dev: 14.58%]
        Started Run 6 @ 11:24:03  [Std. Dev: 13.48%]

    Test Results:
        13.718590974808
        10.133217811584
        10.308124065399
        10.142753839493
        10.111504793167
        10.14910197258

    Average: 10.76 Seconds
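To show how much of that spread comes from the first run alone, here is a small Python check on the six results above. I am assuming the reported "Std. Dev" is the sample standard deviation divided by the mean; that assumption reproduces the 13.48% figure from the log, but I have not confirmed it against the test suite's code. The helper name `relative_stdev_percent` is my own.

```python
from statistics import mean, stdev


def relative_stdev_percent(samples):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(samples) / mean(samples)


# The six run times (seconds) from the log above.
runs = [
    13.718590974808,
    10.133217811584,
    10.308124065399,
    10.142753839493,
    10.111504793167,
    10.14910197258,
]

all_runs = relative_stdev_percent(runs)       # includes the cold first run
warm_only = relative_stdev_percent(runs[1:])  # runs 2 to 6 only

print(f"all six runs: mean {mean(runs):.2f} s, std dev {all_runs:.2f}%")
print(f"runs 2 to 6:  mean {mean(runs[1:]):.2f} s, std dev {warm_only:.2f}%")
```

With the first run excluded, the deviation falls below 1%, so the extra three runs are spent averaging out a single outlier rather than measuring real run-to-run noise.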