Summary: Linux sees 3 temperature fields, and one of them (from an unknown source) is 86C, and I suspect that it's been causing my system to reboot. Is there any way to find out what component is that hot?
Longer version:
I am debugging a new system consisting of an AMD Phenom 9500, Gigabyte MA790FX-DS5 motherboard, and Corsair XMS2 DDR2-800 2GB RAM, and the system is unstable.
Unstable here means that when I run a compute-intensive 4-thread job (and I intend to do this a lot for work), the machine will work away on it for 10-25 minutes and then spontaneously restart.
The curious thing is that Linux reports three temperature readings from my motherboard, and I don't know for certain what they represent. Idling with stock BIOS settings, they are 43C, 45C, 86C. After running a 4-thread load for a few minutes, they are 45C, 57C, 86C. At the moment the machine crashes, they are 45C, 61C, 86C. When I look in the BIOS under System Monitor, I see two reported temperatures: case temp (which reads ~43C) and CPU temp (which reads ~47C). So I suspect that Linux is reporting case temp, CPU temp, and a third mystery temperature.
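In case it helps anyone compare numbers: a minimal sketch of dumping the raw readings straight from the kernel's hwmon sysfs interface, so each value is shown next to the sensor chip name and any label the driver provides. This assumes the sensor driver exposes its values under /sys/class/hwmon; on older kernels the temp files may sit under a device/ subdirectory instead. The function name read_hwmon_temps and the output layout are just my illustration, not part of lm-sensors.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return (chip, label, degrees_c) tuples for every temperature input found.

    Assumes the standard hwmon sysfs layout: hwmonN/name, hwmonN/tempM_input
    (millidegrees Celsius) and an optional hwmonN/tempM_label.
    """
    readings = []
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            # Fall back to the bare "tempM" name when the driver gives no label.
            label_file = input_file.with_name(input_file.name.replace("_input", "_label"))
            if label_file.exists():
                label = label_file.read_text().strip()
            else:
                label = input_file.name[: -len("_input")]
            try:
                millideg = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but unreadable; skip it
            readings.append((chip, label, millideg / 1000.0))
    return readings


if __name__ == "__main__":
    for chip, label, temp in read_hwmon_temps():
        print(f"{chip:12s} {label:10s} {temp:5.1f} C")
```

An unlabelled input that stays pinned at one value regardless of load would at least show which chip and which input number the mystery reading comes from.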
When I pointed a desktop (human) fan at the open case, the temperatures dropped to 37C, 39C, 84C while idling, and read 32C, 55C, 82C after running a 4-thread job for a few minutes. But the middle temp slowly climbed to 62-63C just before the machine cut out and restarted.
If you have a Phenom, what temperatures are you seeing idle and under load? If you have a Gigabyte MA790FX-DS5, what temperatures are you seeing?
Extra information:
0) The GPU temperature is also reported, and is specifically labelled as such. It runs at 51 or 52C most of the time. I did not run intense graphics tests.
1) With stock BIOS settings/speeds, I can run a single-thread job for hours; the CPU temp never goes over 55C, and the machine never crashes.
2) With the whole machine de-rated to 1400 MHz, it can run the full-load 4-thread job and not crash. But, of course, my performance sucks. I didn't pay for a 2.2 GHz machine to run it at 1.4 GHz.
3) With the TLB workaround disabled, the machine runs slightly hotter (a few degrees C), and thus crashes a little earlier. It crashes very reliably (about 85% of the times I run the 4-thread job).
4) I ran MemTest86+ for a half-hour, and no errors showed up. I then ran it all night, and when I woke up, it had rebooted and was showing me a screen that said "PCI Parity Error!"
5) All parts were from Newegg, and I have only a few more days to return the processor and three weeks to return the motherboard.
Thank you for reading.