stress run TOTAL_LOOP_TIME doesn't complete full loop

  • stress run TOTAL_LOOP_TIME doesn't complete full loop

    I am attempting an 8-hour stress run, but it runs for only part of the time set with TOTAL_LOOP_TIME and then stops (it ran from 16:26 to 19:23). I have tried other durations as well and see the same issue. Here is example output from a 6-hour test.

    2020-08-24 20:52:16:
    ###### STRESS RUN INTERIM REPORT ####
    AUGUST 24 21:22 EDT
    START TIME: August 24 20:52 EDT
    ELAPSED TIME: 30 Minutes
    TIME REMAINING: 5 Hours, 30 Minutes

    The timestamp at the end of the report, shown here, indicates it ran for only about 3 hours:

    This file was automatically generated via the Phoronix Test Suite benchmarking software on Monday, 24 August 2020 23:51.


    So it's obvious that pts is aware of the TOTAL_LOOP_TIME=360 from the beginning of the report, but then it stops early. Does anyone have any idea why this might happen? Thanks.
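For reference, the shortfall can be worked out from the two timestamps quoted in this post. This is only a small sketch; it assumes the report footer time (23:51) is in the same time zone as the START TIME line, which the report does not state explicitly.

```python
from datetime import datetime

# Timestamps copied from the report excerpts in this post.
# Assumption: both are local time in the same zone (EDT).
start = datetime(2020, 8, 24, 20, 52)     # START TIME: August 24 20:52 EDT
finished = datetime(2020, 8, 24, 23, 51)  # report footer: 24 August 2020 23:51
requested_minutes = 360                   # TOTAL_LOOP_TIME=360

# Minutes the run actually lasted, and how far short of the request it fell.
actual_minutes = int((finished - start).total_seconds() // 60)
shortfall = requested_minutes - actual_minutes

print(f"ran {actual_minutes} of {requested_minutes} minutes, {shortfall} short")
```

So the 6-hour run stopped just under the 3-hour mark.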

  • #2
    Are you running PTS 9.8? Otherwise make sure you are on the latest.

    What else does it show in the report? For example, any indication that it only ran one test each time, or any other pattern that may hint at what is happening? If you try TOTAL_LOOP_TIME=10, does it run for 10 minutes?
    Michael Larabel
    https://www.michaellarabel.com/


    • #3
      Michael, yes this is with 9.8.0. Nothing stands out to me in the report that would point to the issue. I tried attaching a copy of the report, but I don't have permissions. Using the default loop time of 10 minutes works just fine. Only recently when I increased it to hours instead of minutes did I see this issue. Here is some of the output

      TESTS EXECUTED TIMES CALLED
      pts/iozone-1.9.5: 54
      pts/iperf-1.1.0: 21
      pts/stress-ng-1.3.0: 10
      pts/stressapptest-1.0.1: 27

      .....
      .....
      ###### STRESS RUN INTERIM REPORT ####
      AUGUST 24 21:52 EDT
      START TIME: August 24 20:52 EDT
      ELAPSED TIME: 1 Hour, 1 Second
      TIME REMAINING: 4 Hours, 59 Minutes, 59 Seconds
      .....
      ......
      TESTS EXECUTED TIMES CALLED
      pts/iozone-1.9.5: 102
      pts/iperf-1.1.0: 40
      pts/stress-ng-1.3.0: 24
      pts/stressapptest-1.0.1: 54

      .......
      ......
      TESTS EXECUTED TIMES CALLED
      pts/iozone-1.9.5: 265
      pts/iperf-1.1.0: 118
      pts/stress-ng-1.3.0: 76
      pts/stressapptest-1.0.1: 160



      • #4
        Originally posted by pheider
        Michael, yes this is with 9.8.0. Nothing stands out to me in the report that would point to the issue. I tried attaching a copy of the report, but I don't have permissions. Using the default loop time of 10 minutes works just fine. Only recently when I increased it to hours instead of minutes did I see this issue. Here is some of the output

        Hmmm by chance have you tried (or can try) a value of like 59 minutes and then 61 minutes to see if the hour threshold is what's breaking it? Though I do know users who use stress-run on 9.8 for 40+ hours with no issue, so this may not be it.
        Michael Larabel
        https://www.michaellarabel.com/


        • #5
          Sorry, it took me a while to test the various scenarios. These were the outcomes.

          61 minutes = ran for 61 minutes
          120 minutes = ran for 120 minutes
          180 minutes = ran for 180 minutes
          240 minutes = ran for ~180 minutes
          300 minutes = ran for ~180 minutes

          So there seems to be something going on around the 180 minute threshold. I can try some scenarios between 180 and 240 minutes if that helps.
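For trying values between 180 and 240 minutes, a small wrapper can record how long each run really lasts. This is a rough sketch, not a tested procedure: `timed_run` is a helper name made up here, and it only assumes that the command being launched reads TOTAL_LOOP_TIME (in minutes) from its environment, as the stress run in this thread does.

```python
import os
import subprocess
import time


def timed_run(command, loop_minutes):
    """Run `command` with TOTAL_LOOP_TIME set; return (exit code, elapsed minutes)."""
    # Copy the current environment and add the requested loop time.
    env = dict(os.environ, TOTAL_LOOP_TIME=str(loop_minutes))
    started = time.monotonic()  # monotonic clock, unaffected by clock changes
    completed = subprocess.run(command, env=env)
    elapsed_minutes = (time.monotonic() - started) / 60
    return completed.returncode, elapsed_minutes
```

A call might look like `timed_run(["phoronix-test-suite", "stress-run", "my-suite"], 200)`, where `my-suite` is a placeholder for the actual suite name. If the stress run asks for input interactively, it would need to be answered in the same terminal.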


          • #6
            Maybe try 200 minutes? Though there is really no logic in the stress code besides calculating the minutes to run, so it's not clear why there would be issues at ~180 minutes versus any other value. Can you check your dmesg after one of the "~180" minute runs to see if any errors are indicated? I will do a run on my end, as I'm thinking you may be running into something more system-specific.
            Michael Larabel
            https://www.michaellarabel.com/


            • #7
              Sure I'll give it a shot. You may be right on it being system specific. I have run my tests on both a VM and bare metal, but both are imaged with the same kickstart process, so essentially the same operating environment.


              • #8
                I configured the loop time for 200 minutes, and it ended after 177 minutes. This was in my system log at the time.

                2020 Aug 27 00:21:47 mgi user info stress-ng: invoked with './stress-n' by user 0
                2020 Aug 27 00:21:47 localhost user info stress-ng: system: 'mgi' Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64
                2020 Aug 27 00:21:47 localhost user info stress-ng: memory (MB): total 3865.87, free 3569.15, shared 0.00, buffer 17.24, swap 3969.00, free swap 3967.46
                2020 Aug 27 00:23:11 localhost user info stress-ng: invoked with './stress-n' by user 0
                2020 Aug 27 00:23:11 localhost user info stress-ng: system: 'mgi' Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64
                2020 Aug 27 00:23:11 localhost user info stress-ng: memory (MB): total 3865.87, free 3511.62, shared 0.00, buffer 16.87, swap 3969.00, free swap 3967.46
                I can try to reproduce the issue on a vanilla CentOS installation, to make sure my custom image isn't causing the loop time issue.
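To compare the moment the run stopped against the last test launch, the stress-ng "invoked with" lines can be pulled out of the log. A minimal sketch, assuming every line starts with a timestamp in the same `YYYY Mon DD HH:MM:SS` layout as the excerpt; `invocation_times` is a helper name made up for this example, and the two sample lines are copied from the log above.

```python
from datetime import datetime

# Two lines copied from the log excerpt in this post.
log_lines = [
    "2020 Aug 27 00:21:47 mgi user info stress-ng: invoked with './stress-n' by user 0",
    "2020 Aug 27 00:23:11 localhost user info stress-ng: invoked with './stress-n' by user 0",
]


def invocation_times(lines):
    """Return the timestamp of every stress-ng 'invoked with' line."""
    times = []
    for line in lines:
        if "stress-ng: invoked with" in line:
            # The first four fields form the timestamp, e.g. "2020 Aug 27 00:21:47".
            stamp = " ".join(line.split()[:4])
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


times = invocation_times(log_lines)
gap_seconds = (times[-1] - times[0]).total_seconds()
print(f"last stress-ng launch: {times[-1]}, {gap_seconds:.0f} s after the previous one")
```

The same function could be fed the full log file, line by line, to see whether launches stop abruptly around the 180-minute mark.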


                • #9
                  I re-ran the 200 minute test on a plain CentOS7 VM. It completed after only 180 minutes. These are the tests in my test suite.

                  - stress-ng
                  - stressapptest
                  - iozone
                  - iperf


                  • #10
                    Hey Michael, I just wanted to check back to see if you had a chance to do a similar test run on your end. Thanks.
