Which temperature sensors do mobos have? (Gigabyte MA790FX-DS5)


  • Which temperature sensors do mobos have? (Gigabyte MA790FX-DS5)

    Summary: Linux sees 3 temperature fields, and one of them (from an unknown source) is 86C, and I suspect that it's been causing my system to reboot. Is there any way to find out what component is that hot?

    Longer version:

    I am debugging a new system consisting of an AMD Phenom 9500, Gigabyte MA790FX-DS5 motherboard, and Corsair XMS2 DDR2-800 2GB RAM, and the system is unstable.

    Unstable here means that when I run a compute-intensive 4-thread job (and I intend to do this a lot for work), the machine will work away on it for 10-25 minutes and then spontaneously restart.

    The curious thing is that Linux reports three temperature readings from my motherboard, and I don't know for certain what they represent. Idling with stock BIOS settings they are 43C, 45C, 86C. Running a 4-thread load for a few minutes they are 45, 57, 86. At the moment the machine crashes they are 45, 61, 86. When I look in the BIOS under System Monitor, I see two reported temperatures: case temp (which reads ~43C) and CPU temp (which reads ~47). So, I suspect that Linux is reporting case temp, CPU temp, and a third mystery temperature.
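To see which monitoring chip each reading comes from, one option is to read the kernel's hwmon files directly. The sketch below assumes a kernel that exposes `temp*_input` (in millidegrees C) and optional `temp*_label` files under `/sys/class/hwmon/hwmon*/`; older kernels may place them under a `device/` subdirectory instead, and the function name `read_hwmon_temps` is only illustrative.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return a list of (chip, label, degrees_c) for every temp*_input found."""
    readings = []
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        # Fall back to the directory name if the chip does not report one.
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            channel = input_file.name[: -len("_input")]  # e.g. "temp3"
            label_file = chip_dir / f"{channel}_label"
            label = label_file.read_text().strip() if label_file.exists() else channel
            try:
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # skip channels that cannot be read or parsed
            readings.append((chip, label, millidegrees / 1000.0))
    return readings
```

Calling `read_hwmon_temps()` and printing each tuple shows the chip name next to every channel, which at least tells you whether the mystery value comes from the same chip as the other two.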

    When I pointed a desktop (human) fan at the open case, the temperatures dropped to 37, 39, 84C idling, and then to 32, 55, 82C after running a 4-thread job for a few minutes. But the middle temp slowly climbed to 62/63C just before the machine cut out and restarted.
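Since the machine restarts without warning, it can help to log the readings to disk so the last values before the reboot survive. This is a minimal sketch, not a finished tool: `append_reading` and the log path are hypothetical names, and the temperatures are assumed to come from whatever reader you already use.

```python
import os
import time


def append_reading(log_path, temps_c):
    """Append one timestamped line of temperatures and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = stamp + " " + " ".join(f"{t:.1f}" for t in temps_c) + "\n"
    with open(log_path, "a") as log:
        log.write(line)
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write it out before returning
    return line
```

Calling this every few seconds during the 4-thread job, for example `append_reading("temps.log", [45, 61, 86])`, leaves a trail that ends at (or just before) the reboot.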

    If you have a Phenom, what temperatures are you seeing idle and under load? If you have a Gigabyte MA790FX-DS5, what temperatures are you seeing?

    Extra information:

    0) The GPU temperature is also reported, and is specifically labelled as such. It runs at 51 or 52C most of the time. I did not run intense graphics tests.

    1) With stock BIOS settings/speeds, I can run a single-thread job for hours and it never heats up over 55C, and never crashes.

    2) With the whole machine de-rated to 1400 MHz, it can run the full-load 4-thread job and not crash. But, of course, my performance sucks. I didn't pay for a 2.2 GHz machine to run it at 1.4 GHz.

    3) With the TLB workaround disabled, the machine runs slightly hotter (a few degrees C), and thus crashes a little earlier. It crashes very reliably (85% of the time I do this).

    4) I ran MemTest86+ for a half-hour, and no errors showed up. I then ran it all night, and when I woke up, it had rebooted and was showing me a screen that said "PCI Parity Error!"

    5) All parts were from Newegg, and I have only a few more days to return the processor and three weeks to return the motherboard.

    Thank you for reading.

  • #2
    I haven't got one of those processors (I moved over to the 'dark side' from AMD a few weeks ago) but somewhere in the BIOS there may be a setting that tells the system to slow down/reboot when the CPU temperature gets to a certain level. Try resetting this value to be slightly higher and see what happens.

    Also, I would check your cooling solution. In particular, your heat sink and case cooling fans. Are they working/pushing air in the right direction?

    Andy

    P.S. How many fans and what cooler are you using?



    • #3
      Originally posted by Shielder
      I haven't got one of those processors (I moved over to the 'dark side' from AMD a few weeks ago) but somewhere in the BIOS there may be a setting that tells the system to slow down/reboot when the CPU temperature gets to a certain level. Try resetting this value to be slightly higher and see what happens.

      Also, I would check your cooling solution. In particular, your heat sink and case cooling fans. Are they working/pushing air in the right direction?

      Andy

      P.S. How many fans and what cooler are you using?
      Hi, thanks for the quick reply!

      The BIOS has an entry for AMD's "Cool and Quiet", which throttles CPUs when they are not being used as heavily. But since I intend to use the CPUs heavily, it doesn't do much.

      There's also an entry that will sound an alarm when the CPU temperature hits a defined level. I have it set at 80C and it doesn't trigger, so apparently the 86C reading is not coming from the CPU.

      The heat sink is the stock sink and the fan is blowing air into the heat sink. With the BIOS "CPU Smart fan control" setting off, the CPU fan runs at 3000 RPM. With the setting on, it'll start at 2600, but then increase to 3500 as the 4-thread job progresses. It'll then spontaneously reboot.

      I've been running with an open case. My case fan blows out of the case from the back, and the PSU fan is on the inside of the case, and blows into the PSU.

      I presumed that a regular CPU running at stock CPU speeds with the stock CPU cooler should not overheat enough to spontaneously restart the computer. If it has to throttle down to run a 4-thread job, then it's not worth having.

      Are these temperature sensors reliable? Accurate?



      • #4
        Are you sure that the PSU fan is blowing *into* the case? Usually, the PSU fan blows air out of the case.

        Have you run memtest (either 86 or 86+) to check your memory?

        What sort of errors (if any) are you getting in the kernel?
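If you're not sure where to look, the kernel log from before the reboot is the usual place, assuming your distribution saves it to a file (the location varies, so the path is left to you). A rough sketch that filters such a log for machine-check or thermal messages; the keyword list is a guess, not an exhaustive set of what the kernel can print:

```python
import re

# Hypothetical keyword list; extend it to match whatever your kernel prints.
SUSPICIOUS = re.compile(
    r"machine check|\bmce\b|thermal|temperature|hardware error",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Return the log lines that mention machine checks or thermal events."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]
```

Pass it an open log file, e.g. `suspicious_lines(open(path, errors="replace"))`. An empty result doesn't prove much, though: a hard reset often happens before the kernel gets a chance to write anything.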

        Just a thought, but what speed is the mobo running at? I can't remember the exact terminology (HTT I think) but is it running at 200MHz? If not, and your memory is running at 800MHz (effective) then the memory multiplier may be your problem.

        As a quick and dirty test of the temperature of the heatsinks, carefully touch each one in turn when you are running your jobs and see whether any of them are hot to touch or not. Have you checked whether the CPU heatsink is correctly seated?

        Sometimes the stock heatsinks are not good enough, but I would be surprised in this case.

        Andy

        Edit: Just thought, what make and model PSU have you got? I'm just wondering if there are any connections to the PSU on the mobo (did you build the system yourself?) This could be the other temperature being reported. Is the PSU fan working correctly and if so, what does the air being exhausted out of the back of the PSU feel like?
        Last edited by Shielder; 07 April 2008, 09:04 AM. Reason: PSU questions



        • #5
          Originally posted by technolope
          Summary: Linux sees 3 temperature fields, and one of them (from an unknown source) is 86C, and I suspect that it's been causing my system to reboot. Is there any way to find out what component is that hot?
          Since it doesn't appear in the BIOS and the value is obviously wrong, this temperature corresponds to an unused input on the monitor chip (i.e. the chip can monitor 3 temps, and lm-sensors detects that, but only 2 are actually used). Edit your sensors.conf and add the relevant "ignore tempX" line and forget about it.
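One rough way to sanity-check the "unused input" idea is to compare how much each channel moves between idle and load. The numbers below are the ones quoted in the first post; the 5-degree threshold is an arbitrary assumption for illustration only.

```python
# Readings quoted in the first post (degrees C), stock BIOS settings.
idle = [43, 45, 86]
crash = [45, 61, 86]  # values at the moment of the reboot


def load_response(idle_temps, load_temps):
    """Return the per-channel temperature rise from idle to load."""
    return [load - base for base, load in zip(idle_temps, load_temps)]


rise = load_response(idle, crash)
for channel, delta in enumerate(rise, start=1):
    # The 5 C cut-off is a hypothetical threshold, not a calibrated one.
    verdict = "tracks load" if delta >= 5 else "barely moves"
    print(f"temp{channel}: +{delta} C ({verdict})")
```

Only the middle channel follows the load, which fits it being the CPU. A flat third channel is consistent with an unconnected input but doesn't prove it, especially since the same reading did fall to 82C with a desk fan pointed at the case.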

          Your description does sound like overheating, although it might also be a MB/memory problem.

          4) I ran MemTest86+ for a half-hour, and no errors showed up. I then ran it all night, and when I woke up, it had rebooted and was showing me a screen that said "PCI Parity Error!"
          This could point to a memory issue. You should re-run the test while keeping an eye on it, and see if any errors show up at all. Remember that memory problems don't necessarily mean that there is something wrong with the DIMMs; many times it's just an incompatibility between the MB and specific chips.

          In your place, I'd try the following:
          1. Different memory brand (quite cheap these days), try first with one DIMM, then with two and see if the behaviour remains. If not, keep the new DIMMs and sell the other ones, otherwise goto 2.
          2. The memory is fine, but most likely the processor is overheating. Keep in mind that the Phenom B2 line has been rushed into production and is mostly overclocked (sorry, I run 100% AMD here but that's the truth). Try either underclocking slightly (not all the way to 1.4 GHz, but more like 2.0 GHz) or buy a heavy-duty cooler that can keep the temps down.
          3. Sell everything and buy a different brand (I know, choice is quite limited).

          As per your other questions -- no, BIOS temperature readings are not that reliable and can be several degrees off. The best readings come from the on-chip sensor, the one you get with the k8temp module (they will show up separately in the sensors output, likely with a different temperature from the one reported by the BIOS).

          Anyway, you're on the cutting edge so... Good luck!
          Mihnea



          • #6
            Originally posted by mgc8
            Since it doesn't appear in the BIOS and the value is obviously wrong, this temperature corresponds to an unused input on the monitor chip (i.e. the chip can monitor 3 temps, and lm-sensors detects that, but only 2 are actually used). Edit your sensors.conf and add the relevant "ignore tempX" line and forget about it.
            So that high temp wouldn't be the Northbridge? I have read recently that that chip can withstand higher temperatures than a CPU. In that case, 86C doesn't sound too wrong.

            Originally posted by mgc8
            Your description does sound like overheating, although it might also be a MB/memory problem.

            This could point to a memory issue. You should re-run the test while keeping an eye on it, and see if any errors show up at all. Remember that memory problems don't necessarily mean that there is something wrong with the DIMMs; many times it's just an incompatibility between the MB and specific chips.
            This is excellent advice. I wanted to upgrade to 8 GB anyways, so I'll get a second pair of 2GB sticks and test those as well. I'll probably go with Crucial---that's what I've used in all of my other computers, and have never had any problems with.

            Originally posted by mgc8
            Anyway, you're on the cutting edge so... Good luck!
            Mihnea
            Thanks!



            • #7
              Wow, you folks on this forum are not only generous, you are also extremely helpful. Let me see if I can answer your questions.

              Originally posted by Shielder
              Are you sure that the PSU fan is blowing *into* the case? Usually, the PSU fan blows air out of the case.
              The PSU, a Rosewill 950W (package deal with the mobo and CPU), has a large fan on the bottom of the PSU (tower case set-up) that sucks case air and pushes it through the PSU and out the back of the case. It's a slightly unusual setup, and I could have been more specific.

              Originally posted by Shielder
              Have you run memtest (either 86 or 86+) to check your memory?
              Yes. I've run it several times. The times that I ran it for 1/2 hour and 3 hours, there were no errors. Then I ran it all night and woke up to a boot-up error message (not a Memtest86 error message).

              Originally posted by Shielder
              What sort of errors (if any) are you getting in the kernel?
              I don't quite know how to check this. The computer seems to run well right up until it spontaneously reboots. Where do I check for kernel errors?

              Originally posted by Shielder
              Just a thought, but what speed is the mobo running at? I can't remember the exact terminology (HTT I think) but is it running at 200MHz? If not, and your memory is running at 800MHz (effective) then the memory multiplier may be your problem.
              I am pretty sure that was set properly. Yes, 200 MHz and 800 MHz, and the multiplier was 4x. (Or was it 100 and 400?)

              Originally posted by Shielder
              As a quick and dirty test of the temperature of the heatsinks, carefully touch each one in turn when you are running your jobs and see whether any of them are hot to touch or not. Have you checked whether the CPU heatsink is correctly seated?
              It sure looks and feels properly seated. When I took it apart to return it, it took a lot of force to separate the sink from the CPU, and the thermal compound was evenly spread over the chip. The CPU was pretty warm when under load, but not as hot as the northbridge (with attached heat tubes and fins).

              Originally posted by Shielder
              Sometimes the stock heatsinks are not good enough, but I would be surprised in this case.
              I have been going on the assumption that AMD wouldn't sell a CPU combo if the heat sink was unable to cool the CPU, but maybe it's a brave new world?

              Originally posted by Shielder
              Edit: Just thought, what make and model PSU have you got? I'm just wondering if there are any connections to the PSU on the mobo (did you build the system yourself?) This could be the other temperature being reported. Is the PSU fan working correctly and if so, what does the air being exhausted out of the back of the PSU feel like?
              I think I eliminated the PSU from fault---there are no temp sensors on the PSU (a Rosewill 950W monster), and the machine behaved identically (including the odd 85/86C temp reading) when I used an older ThermalTake 430W PSU.

              Thanks for all of your help, Andy! I'll see how the new mobo and CPU behave once I get them.



              • #8
                Originally posted by technolope
                So that high temp wouldn't be the Northbridge? I have read recently that that chip can withstand higher temperatures than a CPU. In that case, 86C doesn't sound too wrong.
                It is very unlikely. If it were, the BIOS would have reported it. 86C is a high temperature for any chip, especially recent ones (I've seen Prescotts running that hot, but they were throttling anyway). It is not unusual to have unused inputs on monitoring chips, if you scan through the sensors3.conf you'll find plenty of examples.

                This is excellent advice. I wanted to upgrade to 8 GB anyways, so I'll get a second pair of 2GB sticks and test those as well. I'll probably go with Crucial---that's what I've used in all of my other computers, and have never had any problems with.
                Crucial is certainly a recommended brand, of course for best results you should try to find one of the chips mentioned in the MB support list here:
                http://www.gigabyte.com.tw/FileList/...a790fx-ds5.pdf

                Although after reading your last post where you say you ran Memtest86 for 3 hours without errors, I would be more inclined to suspect processor overheating. Another thing to try -- disable all "overheating protection" options in the BIOS, and maybe update the BIOS to the latest version (if not already), maybe it's a bug in the BIOS that automatically reboots the system before it reaches the threshold...

                So I would first go with a big honking cooler, something with a large copper heatsink and a large quiet fan -- don't trust the boxed one, as I said the Phenom processors were rushed to the market, it's possible the coolers aren't quite up to running 4 cores at 100% -- plus, they usually only test in windoz while Linux has a habit of pushing the hardware a little further, triggering bugs that don't appear otherwise.

                Regards,
                Mihnea



                • #9
                  Originally posted by mgc8
                  It is very unlikely. If it were, the BIOS would have reported it. 86C is a high temperature for any chip, especially recent ones (I've seen Prescotts running that hot, but they were throttling anyway). It is not unusual to have unused inputs on monitoring chips, if you scan through the sensors3.conf you'll find plenty of examples.

                  Crucial is certainly a recommended brand, of course for best results you should try to find one of the chips mentioned in the MB support list here:
                  http://www.gigabyte.com.tw/FileList/...a790fx-ds5.pdf

                  Although after reading your last post where you say you ran Memtest86 for 3 hours without errors, I would be more inclined to suspect processor overheating. Another thing to try -- disable all "overheating protection" options in the BIOS, and maybe update the BIOS to the latest version (if not already), maybe it's a bug in the BIOS that automatically reboots the system before it reaches the threshold...

                  So I would first go with a big honking cooler, something with a large copper heatsink and a large quiet fan -- don't trust the boxed one, as I said the Phenom processors were rushed to the market, it's possible the coolers aren't quite up to running 4 cores at 100% -- plus, they usually only test in windoz while Linux has a habit of pushing the hardware a little further, triggering bugs that don't appear otherwise.

                  Regards,
                  Mihnea
                  Rock. You are wise beyond your "Junior Member" status.

                  I just bought a big fatty heat sink (and I'm a mechanical engineer, so I'm pretty sure I know what I'm looking for), and I will do the BIOS update if the problem shows up in the new hardware.

