Memory caching biasing test results
  • Memory caching biasing test results

    I have noticed that some tests produce a poorer result on the first run only, apparently because some file becomes cached in memory after the first run of the test. As a result, subsequent runs of the test execute faster. This can bias the results, and perhaps lead to incorrect conclusions.

    Example:

    Code:
    Parallel BZIP2 Compression 1.1.12:
        pts/compress-pbzip2-1.5.0
        Test 1 of 1
        Estimated Trial Run Count:    3
        Estimated Time To Completion: 1 Minute (11:16 PDT)
            Started Run 1 @ 11:16:18
            Started Run 2 @ 11:16:33
            Started Run 3 @ 11:16:44  [Std. Dev: 19.00%]
        Test Results:
            13.989195108414
            10.206043958664
            10.22295999527
        Average: 11.47 Seconds
    And again:

    Code:
    Parallel BZIP2 Compression 1.1.12:
        pts/compress-pbzip2-1.5.0
        Test 1 of 1
        Estimated Trial Run Count:    3
        Estimated Time To Completion: 1 Minute (11:29 PDT)
            Started Run 1 @ 11:29:15
            Started Run 2 @ 11:29:26
            Started Run 3 @ 11:29:37  [Std. Dev: 0.14%]
        Test Results:
            10.145808935165
            10.117418050766
            10.133827924728
        Average: 10.13 Seconds
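    To quantify the first-run bias, here is a quick sketch (not part of PTS) that computes the average with and without the first sample, using the numbers copied from the first result block above:

```python
# Run times (seconds) from the first (uncached) pbzip2 pass above.
runs = [13.989195108414, 10.206043958664, 10.22295999527]

avg_all = sum(runs) / len(runs)                    # average over all runs
avg_drop_first = sum(runs[1:]) / (len(runs) - 1)   # first run discarded

print(f"all runs: {avg_all:.2f} s")                 # 11.47 s, as reported
print(f"first run dropped: {avg_drop_first:.2f} s")  # ~10.21 s
```

    Discarding the first sample brings the average close to the 10.13 seconds seen on the fully cached second pass.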
    And here is an example test run where I might conclude that performance mode has some problem. Disclaimer: for increased dramatic effect, the number of times to run was set to 1.

    While it is true that I have backed off the "StandardDeviationThreshold" in my "user-config.xml" file, there is still a result bias, albeit significantly reduced, using the default threshold. The test also takes longer to complete (twice as long in this case, since it takes three more runs beyond the default three). Example:
    Code:
    Parallel BZIP2 Compression 1.1.12:
        pts/compress-pbzip2-1.5.0
        Test 1 of 1
        Estimated Trial Run Count:    3
        Estimated Time To Completion: 1 Minute (11:23 PDT)
            Started Run 1 @ 11:23:04
            Started Run 2 @ 11:23:18
            Started Run 3 @ 11:23:29  [Std. Dev: 17.75%]
            Started Run 4 @ 11:23:41  [Std. Dev: 15.92%]
            Started Run 5 @ 11:23:52  [Std. Dev: 14.58%]
            Started Run 6 @ 11:24:03  [Std. Dev: 13.48%]
        Test Results:
            13.718590974808
            10.133217811584
            10.308124065399
            10.142753839493
            10.111504793167
            10.14910197258
        Average: 10.76 Seconds
    Would it make sense to either flush the memory before each test run or to do a file copy to null (or something) before the first run so as to ensure the same starting conditions run to run?
    I have used the bzip2 test as my example, but I have also observed the same thing with ffmpeg, unpack-linux, x264, encode-flac, and encode-mp3.
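    As a rough sketch of what such a copy-to-null pre-test could look like (hypothetical, not an existing PTS feature; the path is the pbzip2 example from this post and would vary per test):

```shell
#!/bin/sh
# Hypothetical warm-up step: read the test's input file once before the
# first timed run, so every run starts with the file already in the page cache.
TESTFILE="${TESTFILE:-$HOME/.phoronix-test-suite/installed-tests/pts/compress-pbzip2-1.5.0/linux-4.3.tar}"
if [ -f "$TESTFILE" ]; then
    cat "$TESTFILE" > /dev/null
    echo "warmed page cache for $TESTFILE"
else
    echo "no such file: $TESTFILE (nothing to warm)"
fi
```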

  • #2
    Alternatively add 1 to the requested # of runs and throw away the results from the first one ?

    That would also work for games etc... where it's not so obvious which portions of which files will be accessed.
    Last edited by bridgman; 11 September 2016, 04:26 PM.



    • #3
      Originally posted by bridgman View Post
      Alternatively add 1 to the requested # of runs and throw away the results from the first one ?

      That would also work for games etc... where it's not so obvious which portions of which files will be accessed.
      Good idea. My original thinking was that the copy-the-file-to-null method might waste less time than discarding a full first run (~4 seconds versus ~14 seconds for my example case).

      Here is an example using the default "user-config.xml" file.



      • #4
        Originally posted by dsmythies View Post
        I have noticed that some tests have a poorer result for the first test only, apparently due to some file becoming cached in memory after the first run of the test. As a result, subsequent runs of the test execute faster. This can lead to an incorrect bias in the results, and perhaps incorrect conclusions. [...]
        Have you encountered this with any tests outside of compress BZIP2?
        Michael Larabel
        https://www.michaellarabel.com/



        • #5
          Originally posted by bridgman View Post
          Alternatively add 1 to the requested # of runs and throw away the results from the first one ?

          That would also work for games etc... where it's not so obvious which portions of which files will be accessed.
          Yep, PTS has long had support for throwing out specific runs (e.g. first or last), among other safeguards, as options inside the test profile XML meta-data. It's just a matter of looking into this bzip2 compress issue, as I haven't seen such behavior before, but I can easily increase the run count and/or drop the first run once I look at it.
          Michael Larabel
          https://www.michaellarabel.com/



          • #6
            Originally posted by Michael View Post

            Have you encountered this with any tests outside of compress BZIP2?
            Yes, from my original post:
            I have used the bzip2 test as my example, but have also observed the same thing with ffmpeg, unpack-linux, x264, encode-flac, encode-mp3.
            And actually, I have been doing a dummy run of ffmpeg for a couple of years now.

            Yep, PTS has long had such support for throwing out specific runs, e.g. first or last, etc, among other safeguards as options inside the test profile XML meta-data.
            I did not know that. Forgive my ignorance. I'll learn how and try it.







            • #7
              I have searched and searched, and have not been able to figure out how to get my test profile to throw out the first sample. Can someone help with that?

              To minimize wasted time, I still think that, for tests where file caching is possible, a dummy copy to null as a pre-test would be best. Example 1 (file not cached yet):
              Code:
              doug@s15:~/.phoronix-test-suite/installed-tests/pts/compress-pbzip2-1.5.0$ time cp linux-4.3.tar /dev/null
              
              real    0m5.463s
              user    0m0.008s
              sys     0m0.336s
              Example 2 (file already cached):
              Code:
              doug@s15:~/.phoronix-test-suite/installed-tests/pts/compress-pbzip2-1.5.0$ time cp linux-4.3.tar /dev/null
              
              real    0m0.121s
              user    0m0.000s
              sys     0m0.124s
              By the way, here is the script I use to flush memory for testing (run as sudo):
              Code:
              #!/bin/bash
              free
              sync
              echo 3 > /proc/sys/vm/drop_caches
              free



              • #8
                Originally posted by dsmythies View Post
                I have searched and searched, and not been able to figure out how get my test profile to throw out the first sample. [...]
                If you add <IgnoreRuns>1</IgnoreRuns> to <PhoronixTestSuite><TestInformation> in the test-definition.xml file for that test, that should do the trick.

                So basically from XML it's PhoronixTestSuite/TestInformation/IgnoreRuns.

                If that works, I'm happy to add that upstream to the relevant test profiles.
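                For reference, a minimal sketch of where that element sits in a test-definition.xml (other elements elided; only IgnoreRuns is taken from the description above, the comment is a placeholder):

```xml
<PhoronixTestSuite>
  <TestInformation>
    <!-- existing TestInformation elements stay as-is -->
    <IgnoreRuns>1</IgnoreRuns> <!-- discard the first run's result -->
  </TestInformation>
</PhoronixTestSuite>
```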
                Michael Larabel
                https://www.michaellarabel.com/



                • #9
                  Originally posted by Michael View Post
                  If to <PhoronixTestSuite><TestInformation> you add <IgnoreRuns>1</IgnoreRuns> that should do the trick. In the test-definition.xml file for that test.
                  So basically from XML it's PhoronixTestSuite/TestInformation/IgnoreRuns.
                  If that works happy to add that upstream to the relevant test profiles.
                  O.K., thank you.
                  I added it to four test definitions:

                  /home/doug/.phoronix-test-suite/test-profiles/pts/compress-pbzip2-1.5.0/test-definition.xml
                  /home/doug/.phoronix-test-suite/test-profiles/pts/ffmpeg-2.5.0/test-definition.xml
                  /home/doug/.phoronix-test-suite/test-profiles/pts/unpack-linux-1.0.0/test-definition.xml
                  /home/doug/.phoronix-test-suite/test-profiles/pts/x264-2.0.0/test-definition.xml

                  And it worked fine.
                  The other two tests I mentioned in my first post, encode-flac and encode-mp3, were inconclusive, not always demonstrating this first-test bias effect, so I didn't modify those test-definition.xml files. I assume other tests would benefit from this change, but these are all I have come across so far.

