**Michael** · 25 August 2020, 10:03 AM

Are you running PTS 9.8? If not, make sure you are on the latest release.

What else does the report show? For example, is there any indication that it only ran one test each time, or any other pattern that might hint at what is happening? If you set TOTAL_LOOP_TIME=10, does it run for 10 minutes?
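One way to answer the "does it run for the requested time" question reproducibly is to time the run from outside. The following is a minimal sketch, not part of PTS: `timed_run` is a hypothetical helper that sets `TOTAL_LOOP_TIME` in the child environment, runs whatever command it is given, and reports the wall-clock minutes that actually elapsed.

```python
import os
import subprocess
import time


def timed_run(command, loop_minutes):
    """Run `command` with TOTAL_LOOP_TIME set; return (requested, actual, returncode).

    `command` is an argument list. `actual` is wall-clock minutes measured
    with a monotonic clock, so system clock adjustments do not skew it.
    """
    env = dict(os.environ, TOTAL_LOOP_TIME=str(loop_minutes))
    start = time.monotonic()
    completed = subprocess.run(command, env=env, check=False)
    actual_minutes = (time.monotonic() - start) / 60
    return loop_minutes, actual_minutes, completed.returncode


# Example invocation (not executed here). The test names are placeholders
# for whatever tests or suite the stress run normally uses:
# timed_run(["phoronix-test-suite", "stress-run", "pts/iozone", "pts/iperf"], 10)
```

Comparing the requested and actual values from several such runs gives a direct record of where the run starts ending early.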

**pheider** · 25 August 2020, 10:40 AM

Michael, yes, this is with 9.8.0. Nothing stands out to me in the report that would point to the issue. I tried attaching a copy of the report, but I don't have permissions. Using the default loop time of 10 minutes works just fine. I only started seeing this issue recently, when I increased the loop time from minutes to hours. Here is some of the output:

TESTS EXECUTED TIMES CALLED
pts/iozone-1.9.5: 54
pts/iperf-1.1.0: 21
pts/stress-ng-1.3.0: 10
pts/stressapptest-1.0.1: 27

.....
.....
###### STRESS RUN INTERIM REPORT ####
AUGUST 24 21:52 EDT
START TIME: August 24 20:52 EDT
ELAPSED TIME: 1 Hour, 1 Second
TIME REMAINING: 4 Hours, 59 Minutes, 59 Seconds
.....
......
TESTS EXECUTED TIMES CALLED
pts/iozone-1.9.5: 102
pts/iperf-1.1.0: 40
pts/stress-ng-1.3.0: 24
pts/stressapptest-1.0.1: 54

.......
......
TESTS EXECUTED TIMES CALLED
pts/iozone-1.9.5: 265
pts/iperf-1.1.0: 118
pts/stress-ng-1.3.0: 76
pts/stressapptest-1.0.1: 160
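The interim report gives both an elapsed and a remaining duration, so the two can be added to see what total the run is working toward. This is a small sketch under the assumption that the two figures are meant to sum to the configured loop time; `duration_to_seconds` is a hypothetical helper, not a PTS function.

```python
import re

UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}


def duration_to_seconds(text):
    """Convert text such as '4 Hours, 59 Minutes, 59 Seconds' to seconds."""
    total = 0
    pattern = r"(\d+)\s+(hour|minute|second)s?"
    for amount, unit in re.findall(pattern, text, re.IGNORECASE):
        total += int(amount) * UNIT_SECONDS[unit.lower()]
    return total


# Values copied from the interim report above.
elapsed = duration_to_seconds("1 Hour, 1 Second")
remaining = duration_to_seconds("4 Hours, 59 Minutes, 59 Seconds")
total_minutes = (elapsed + remaining) / 60
print(total_minutes)  # 360.0
```

For the report quoted above, the two durations add up to 360 minutes, which can be compared against the value that was actually set for TOTAL_LOOP_TIME.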

**Michael** · 25 August 2020, 10:42 AM

Hmm, by chance have you tried (or can you try) values such as 59 minutes and then 61 minutes, to see whether the hour threshold is what's breaking it? That said, I know of users who run stress-run on 9.8 for 40+ hours with no issue, so this may not be it.

**pheider** · 26 August 2020, 07:55 PM

Sorry, it took me a while to test the various scenarios. These were the outcomes:

61 minutes = ran for 61 minutes
120 minutes = ran for 120 minutes
180 minutes = ran for 180 minutes
240 minutes = ran for ~180 minutes
300 minutes = ran for ~180 minutes

So there seems to be something going on around the 180-minute threshold. I can try some scenarios between 180 and 240 minutes if that helps.
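The outcomes listed above can be tabulated to make the cutoff explicit. This is only an illustrative sketch: the pairs are the requested and observed minutes from the post, with "~180" recorded as 180, and the helper names and the one-minute tolerance are assumptions.

```python
# (requested minutes, observed minutes), taken from the results above.
RESULTS = [(61, 61), (120, 120), (180, 180), (240, 180), (300, 180)]


def short_runs(results, tolerance_minutes=1):
    """Return the runs that ended more than `tolerance_minutes` early."""
    return [(req, act) for req, act in results if req - act > tolerance_minutes]


def longest_full_run(results, tolerance_minutes=1):
    """Return the largest requested time that still ran to completion."""
    full = [req for req, act in results if req - act <= tolerance_minutes]
    return max(full) if full else None


print(short_runs(RESULTS))        # [(240, 180), (300, 180)]
print(longest_full_run(RESULTS))  # 180
```

Adding further data points between 180 and 240 minutes to `RESULTS` would show whether the cutoff is a fixed ceiling near 180 minutes or depends on the requested value.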

**Michael** · 26 August 2020, 08:23 PM

Maybe try something like 200 minutes? That said, there is really no logic in the stress code besides calculating the minutes to run, so it's not clear why there would be issues at ~180 minutes versus any other value. Can you check your dmesg after one of the "~180" minute runs to see if any errors are indicated? I will do a run on my end, as I'm thinking you may be running into something more system-specific.

**pheider** · 26 August 2020, 09:33 PM

Sure, I'll give it a shot. You may be right about it being system-specific. I have run my tests on both a VM and bare metal, but both are imaged with the same kickstart process, so they are essentially the same operating environment.

**pheider** · 27 August 2020, 11:10 AM

I configured the loop time for 200 minutes, and it ended after 177 minutes. This was in my system log at the time.

2020 Aug 27 00:21:47 mgi user info stress-ng: invoked with './stress-n' by user 0
2020 Aug 27 00:21:47 localhost user info stress-ng: system: 'mgi' Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64
2020 Aug 27 00:21:47 localhost user info stress-ng: memory (MB): total 3865.87, free 3569.15, shared 0.00, buffer 17.24, swap 3969.00, free swap 3967.46
2020 Aug 27 00:23:11 localhost user info stress-ng: invoked with './stress-n' by user 0
2020 Aug 27 00:23:11 localhost user info stress-ng: system: 'mgi' Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64
2020 Aug 27 00:23:11 localhost user info stress-ng: memory (MB): total 3865.87, free 3511.62, shared 0.00, buffer 16.87, swap 3969.00, free swap 3967.46

I can try to reproduce the issue on a vanilla CentOS installation, to make sure my custom image isn't causing the loop time issue.

**pheider** · 27 August 2020, 06:09 PM

I re-ran the 200-minute test on a plain CentOS 7 VM. It completed after only 180 minutes. These are the tests in my test suite:

- stress-ng
- stressapptest
- iozone
- iperf

**pheider** · 28 August 2020, 10:19 AM

Hey Michael, I just wanted to check back to see if you had a chance to do a similar test run on your end. Thanks.

stress run TOTAL_LOOP_TIME doesn't complete full loop
