SB700 high temperature problem


  • #46
    Originally posted by chefkoch
    Ok.. I got curious so I taped a thermistor from an old mobo to the heatsink and attached a multimeter. I can't actually tell the temperature that way, but it responds to smallish temperature differences pretty well.

    Anyway .. according to page 17 in the rrg, bits 0 to 5 of SATA controller register 0x42 determine which ports are enabled. If I disable the 3 unused ports I gain about 200 ohm on my "thermometer" .. I think that's about 3-5C. On a laptop, unused ports may already be disabled though.

    I used this command: setpci -s 00:11.0 0x42.b=0x38

    I tried some of the other things on page 17
    "Enable Dynamic Interface Clock Power Saving" didn't make a difference
    "Enable Dynamic Sata Core Power Saving" seems to shut down the link completely until I disable it again.
    Maybe these need to be set during boot or they need driver support.. dunno
    That looks kind of like what the revamped powertop does nowadays. See these screenshots to get an idea of the improvements since version 1.13:

    http://dl.dropbox.com/u/6565679/scre...p_overview.jpg
    http://dl.dropbox.com/u/6565679/scre...p_tunables.jpg



    • #47
      Umm... I've got some more questions.
      1. Where did you get the numbers 00 for the bus and 11 for the slot?
      2. Going by the setpci manual, .0 must mean all functions; how did you get that?
      3. And what about 0x38? You must have tried to disable the devices numbered 3, 4 and 5 on this chip. How did you find that out?
      I need the answers to the questions above so that I can work out the right numbers for my laptop.
      Thanks again



      • #48
        Originally posted by ario
        Lazy? SB is somewhere accessible in my laptop, I can touch it, I can touch my HDD too, HDD is 62C, and SB is as hot as HDD. I also can use a thermometer, but... anyway!
        If you can touch your HDD, it is not at 62C. At 62C you can fry eggs. If you use hddtemp, are you sure that those temps are real? So use a thermometer. And look at the specs. It might be that your HDD and your SB are well inside the thermal envelope, once you finally have real numbers. Because 62C.. are you using a 15000rpm uscsi drive?



        • #49
          Well .. even if it is within tolerances it might still be interesting to see if the temperature can be lowered. Especially since it's supposed to run way cooler under Windows. (It's also an excuse for me to finally take a closer look at the documentation dumps :P)

          I compared the thermistor values with the ones from the CPU sensors. My 3-5C estimate was apparently a bit on the high side; it's probably closer to 2C when the ports are disabled.

          You can get the device numbers from lspci or the register guide (it shows them in decimal though).

          # lspci | grep SATA
          00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
          Or select the device by its vendor and device id, which you can get from lspci or the register guide as well.

          # lspci -n | grep 00:11.0
          00:11.0 0106: 1002:4391
          0x1002 is AMD's vendor id

          The device id depends on the mode set by the BIOS. From register guide page 12:
          IDE 0x4390
          AHCI 0x4391
          RAID without RAID 5 option 0x4392
          RAID with RAID 5 option SATA + FC enabled 0x4393 or 0x4394
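A rough sketch of how those pieces fit together, in case it helps with the earlier questions. In the `00:11.0` address the part before the colon is the bus, the part after it is the slot, and the digit after the dot is the function (so `.0` is function 0). The function names `parse_lspci_n` and `sb700_sata_mode` are made up for this example, and the mode table is just the one quoted from the register guide above.

```shell
#!/bin/sh
# Map an SB700 SATA device id (hex, no 0x prefix) to the BIOS mode,
# using the table quoted from register guide page 12.
sb700_sata_mode() {
    case "$1" in
        4390) echo "IDE" ;;
        4391) echo "AHCI" ;;
        4392) echo "RAID without RAID 5" ;;
        4393|4394) echo "RAID with RAID 5" ;;
        *) echo "unknown" ;;
    esac
}

# Split one line of `lspci -n` output, e.g. "00:11.0 0106: 1002:4391",
# into bus, slot, function, vendor id and device id.
parse_lspci_n() {
    set -- $1                 # word-split: address, class, vendor:device
    addr=$1                   # "00:11.0"
    ids=$3                    # "1002:4391"
    bus=${addr%%:*}
    rest=${addr#*:}           # "11.0"
    slot=${rest%.*}
    func=${rest#*.}
    vendor=${ids%:*}
    device=${ids#*:}
    echo "bus=$bus slot=$slot func=$func vendor=$vendor device=$device"
}

parse_lspci_n "00:11.0 0106: 1002:4391"
sb700_sata_mode 4391
```

On a live system you could feed it the real line with something like `parse_lspci_n "$(lspci -n | grep 00:11.0)"`.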

          silly example:

          setpci -d 1002:4391 0x0.l

          About the meaning of the bits. 0x38 in hex is 00111000 in binary.

          Google can do the translation for you:
          http://www.google.com/search?q=0x38+in+binary
          http://www.google.com/search?q=0b00111000+in+hex
          It leaves out leading zeros though.

          The ports count up from the right. Every bit that's set to one disables a port. The two leftmost bits are reserved, so leave them alone.
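If you'd rather not do the bit fiddling by hand, here is a small sketch in plain sh. `to_bin8` and `sata_disable_mask` are names I made up; the only thing taken from the register guide is what's described above (ports 0 to 5 counted from the right, a set bit disables the port, bits 6 and 7 reserved).

```shell
#!/bin/sh
# Print a byte value as 8 binary digits, keeping the leading zeros.
to_bin8() {
    value=$(( $1 ))
    bits=""
    i=7
    while [ "$i" -ge 0 ]; do
        bits="$bits$(( (value >> i) & 1 ))"
        i=$(( i - 1 ))
    done
    echo "$bits"
}

# Build the register 0x42 value that disables the given SATA ports
# (ports numbered 0-5, counted from the right).
sata_disable_mask() {
    mask=0
    for port in "$@"; do
        if [ "$port" -lt 0 ] || [ "$port" -gt 5 ]; then
            echo "port $port out of range (bits 6-7 are reserved)" >&2
            return 1
        fi
        mask=$(( mask | (1 << port) ))
    done
    printf '0x%02x\n' "$mask"
}

to_bin8 0x38              # 00111000
sata_disable_mask 3 4 5   # 0x38
```

The result of `sata_disable_mask` is what would go after the `=` in the setpci command, so double-check which ports your drives are actually on before writing it.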


          To figure out which device is attached to which port you can look through the dmesg output or into sysfs.

          dmesg is probably easier:

          # dmesg | grep ATA
          ...
          ata2.00: ATA-7: WDC WD1600JS-00NCB1, 10.02E02, max UDMA/133
          ata3.00: ATAPI: HL-DT-STDVD-RAM GH22NS30, 1.01, max UDMA/100
          ata1.00: ATA-7: SAMSUNG HD252HJ, 1AC01118, max UDMA7
          ...
          with sysfs it'd be something like this:

          # cat /sys/class/scsi_host/host0/device/target0\:0\:0/0\:0\:0\:0/model
          SAMSUNG HD252HJ

          or

          # ls /sys/class/scsi_device/
          0:0:0:0 1:0:0:0 2:0:0:0
          # cat /sys/class/scsi_device/0\:0\:0\:0/device/model
          SAMSUNG HD252HJ
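To list all of them in one go, something like this should work. It's only a sketch: `list_scsi_models` is a made-up name, and the optional argument is just there so the loop can be pointed at a different directory for testing.

```shell
#!/bin/sh
# Print "<scsi address> <model>" for every SCSI device that exposes a model
# file. The optional argument overrides the sysfs directory (for testing).
list_scsi_models() {
    base=${1:-/sys/class/scsi_device}
    for dev in "$base"/*; do
        model_file="$dev/device/model"
        [ -r "$model_file" ] || continue
        # the model string may be padded with trailing spaces; strip them
        model=$(sed 's/[[:space:]]*$//' "$model_file")
        echo "${dev##*/} $model"
    done
}

list_scsi_models
```

Note that the SCSI address is not the same thing as the port bit in register 0x42, so the dmesg ataN numbers are still worth cross-checking.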

          I looked into a few more things...
          This is something that powertop suggests and it really cools the chip down slightly.
          echo "min_power" > /sys/class/scsi_host/host0/link_power_management_policy
          If you set it from powertop it only sets the first host to min_power. You can also set that for the other hosts. But doing so didn't make a difference here. Either it really only needs to be set for the first one or it's because the other devices are a dvd drive and a fairly old hdd.

          This sets all hosts to min_power:

          for i in /sys/class/scsi_host/*; do if [ -e "$i/link_power_management_policy" ]; then echo min_power > "$i/link_power_management_policy"; fi; done
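Before and after changing anything it's handy to see what each host is currently set to. A read-only sketch (the function name `show_lpm_policies` is made up, and the optional argument only exists so it can be tried against a fake directory):

```shell
#!/bin/sh
# Print the current link power management policy of every SCSI host that
# has one. The optional argument overrides the sysfs directory for testing.
show_lpm_policies() {
    base=${1:-/sys/class/scsi_host}
    for host in "$base"/*; do
        policy_file="$host/link_power_management_policy"
        [ -r "$policy_file" ] || continue
        echo "${host##*/}: $(cat "$policy_file")"
    done
}

show_lpm_policies
```

Hosts without the file (non-AHCI controllers, for example) are simply skipped.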
          Disabling some unused USB ports slightly cooled the chip too.. but it wasn't a whole lot.
          This disables all but the first 6 ports:
          setpci -s 0:14.0 0x68.b=0x0f
          There are three USB controllers with 14 ports: bits zero to two control the first six ports, bits four to six another six, and bit seven the last two. Set any group to zero to disable that range. It's a bit confusing.. for the exact meaning of the bits check the register programming requirements page 43 and the register guide pages 62 and 133. Not sure if it's worth fooling around with these. The difference is probably less than one degree.
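A quick way to sanity-check a value before writing it, going only by the grouping described above. Treat this as a guess at the semantics: I'm assuming a group is off only when all of its bits are zero, bit 3 isn't covered by that description so it is ignored, and the port numbering 1-14 and the name `usb_port_groups` are mine, not the register guide's.

```shell
#!/bin/sh
# Decode a byte for register 0x68, following the grouping in the post:
# bits 0-2 -> first six ports, bits 4-6 -> next six, bit 7 -> last two.
# Assumption: a group counts as "off" only when all of its bits are zero.
# Bit 3 is not described in the post and is ignored here.
usb_port_groups() {
    value=$(( $1 ))
    g1=off; g2=off; g3=off
    [ $(( value & 0x07 )) -ne 0 ] && g1=on
    [ $(( value & 0x70 )) -ne 0 ] && g2=on
    [ $(( value & 0x80 )) -ne 0 ] && g3=on
    echo "ports1-6=$g1 ports7-12=$g2 ports13-14=$g3"
}

usb_port_groups 0x0f   # ports1-6=on ports7-12=off ports13-14=off
```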

          It's all a bit crude anyway. If anything here makes a noticeable difference it might be worth patching the drivers directly. Right now they get pretty unhappy when you try to access the disabled ports.



          • #50
            Originally posted by energyman
            If you can touch your HDD, it is not at 62C. At 62C you can fry eggs. If you use hddtemp, are you sure that those temps are real? So use a thermometer. And look at the specs. It might be that your HDD and your SB are well inside the thermal envelope, once you finally have real numbers. Because 62C.. are you using a 15000rpm uscsi drive?
            Man! Man! Man! believe me! I can touch it but for no more than 3 seconds because it's hot like my room radiators. It can burn my fingers. Can you believe me now?
            My harddrive is Western Digital Scorpio Blue WDC WD 3200BEVT 22ZCT0 (S1).



            • #51
              Thank you again. I ran lspci -vvvxxxx to maximize verbosity and saved the output in a text file. I opened it and searched for the AHCI part, and this is it:
              Code:
              00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (prog-if 01 [AHCI 1.0])
              	Subsystem: Acer Incorporated [ALI] Device 0206
              	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
              	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
              	Latency: 64
              	Interrupt: pin A routed to IRQ 22
              	Region 0: I/O ports at 8420 [size=8]
              	Region 1: I/O ports at 8414 [size=4]
              	Region 2: I/O ports at 8418 [size=8]
              	Region 3: I/O ports at 8410 [size=4]
              	Region 4: I/O ports at 8400 [size=16]
              	Region 5: Memory at f0208000 (32-bit, non-prefetchable) [size=1K]
              	Capabilities: [60] Power Management version 2
              		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
              		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
              	Capabilities: [70] SATA HBA v1.0 InCfgSpace
              	Kernel driver in use: ahci
              	Kernel modules: ahci
              00: 02 10 91 43 07 00 30 02 00 01 06 01 00 40 00 00
              10: 21 84 00 00 15 84 00 00 19 84 00 00 11 84 00 00
              20: 01 84 00 00 00 80 20 f0 00 00 00 00 25 10 06 02
              30: 00 00 00 00 60 00 00 00 00 00 00 00 0b 01 00 00
              40: 10 00 3c 20 01 00 10 00 00 00 20 01 00 00 00 00
              50: 05 70 84 00 00 00 00 00 00 00 00 00 00 00 00 00
              60: 01 70 22 00 00 00 00 00 00 00 00 00 00 00 00 00
              70: 12 00 10 00 0f 00 00 00 00 00 00 00 00 00 00 00
              80: 00 00 00 00 06 00 00 2c 14 80 b4 01 14 80 b4 01
              90: 16 80 b4 01 16 80 b4 01 16 80 b4 01 16 80 b4 01
              a0: 7a a0 7a a0 7a a0 7a a0 7a a0 7a a0 00 00 00 00
              b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
              c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
              d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00
              e0: 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
              f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
              As you can see in the hex dump, the byte at offset 0x42 (row 40, third byte) is 0x3c, which means the unused ports are disabled by default. Converting 0x3c with the GNOME calculator gives 111100, so ports 2, 3, 4 and 5 are disabled. Let's assume the first two bits are for the DVD-ROM and HDD on my laptop. I tried other registers too. I remember that somewhere, maybe in this topic, people suggested I echo "min_power" to those files; I did, and there was no change. But I will repeat that soon.
              But for now, I want to reverse engineer some driver files. I want to know what the hell the SB7xx driver from the Acer site, originally written by AMD for my laptop, actually does. It is the only thing I have that makes my HDD and SB700 chip cool down, and only on Windows.
              Can you recommend a disassembler for Windows .sys driver files?
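For anyone who wants to pull a single byte out of a saved dump like this without counting columns, here is a sketch. `dump_byte` is a made-up helper; it assumes the 16-bytes-per-row layout with a "40:" style row label that lspci prints.

```shell
#!/bin/sh
# Read an `lspci -x` style hex dump on stdin and print the byte at the given
# config-space offset (e.g. 0x42), as two hex digits.
dump_byte() {
    offset=$(( $1 ))
    low=$(( offset & 0x0f ))
    row=$(printf '%02x:' $(( offset - low )))   # row label, e.g. "40:"
    col=$(( low + 2 ))                          # awk field; the label is $1
    awk -v row="$row" -v col="$col" '$1 == row { print $col; exit }'
}

# Row 40 from the dump in this post:
echo "40: 10 00 3c 20 01 00 10 00 00 00 20 01 00 00 00 00" | dump_byte 0x42
```

On the live system, `setpci -s 00:11.0 0x42.b` (without an `=value`) reads the same register directly.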
              To make it more comfortable here is the address of driver files on my downloads archive:
              ACER 5536 Windows 7 32Bit drivers/VGA ATI(M86 M82 M96 M92)_v8.632.0.0_Win7x86x64/Packages/Drivers/SBDrv/SB7xx/AHCI/W7
              As you can see, the driver is for the ACER 5536 on Windows 7 32-bit, distributed as part of the VGA ATI driver under the name SB7xx AHCI, and the content of the folder is:
              amdsata.inf
              amdsata.msi
              amdsata.sys
              Amdsata.cat
              amdxata.sys
              Now, please tell me which file you think I should start from, and which application to open it with. I.e. if you think those magic register-setting commands may be in amdsata.sys, how can I open that file, and with which disassembler/hex editor? I have tried almost everything for more than a year, and I'm ready to get my hands, and my whole body, dirty!
              Thanks again



              • #52
                With many thanks for all your help, just as a reminder, please note the facts below:
                1. On Windows my HDD is cool, both to the touch and according to the HDD Temperature software (38C); on Linux it is hot, both to the touch and according to hddtemp (62C). So it can run cool without any change to the fan or heatsink.
                2. I installed the proprietary AMD graphics driver for Linux. It cooled down my GPU, but there was no change in HDD temperature. So it's not because of GPU heat.
                3. My HDD was hot on Windows too, but only before I installed the SB7xx driver files there. So the SB7xx driver does some black magic that cools down the HDD and the SB700 chip.
                4. My HDD is cool on Linux when I remove it from my laptop and connect it to my desktop using normal SATA and SATA power cables. So this is not because of the Linux kernel itself, or my hard drive. It must be the lack of some special driver commands.



                • #53
                  In another attempt, I ran hdparm -I /dev/sda on my laptop, then ran the same command with my laptop hard drive installed in my desktop. All features of my 320GB hard drive were the same on both my laptop and my desktop, except one thing:
                  The DMA Setup Auto-Activate optimization was enabled when the HDD was installed in the laptop (hot) and disabled when it was installed in the desktop (cool).
                  The south bridge chip in my desktop is an NVidia MCP65, and as you know it's an AMD SB700 in my laptop.
                  So the MCP65 disables the DMA Setup Auto-Activate optimization. Disabling this feature may be the key to cooling down my hard drive, because neither of the two hard drives already installed in my desktop has this feature and neither of them is hot.
                  The question is, what is DMA Setup Auto-Activate optimization?
                  Thanks again.
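For comparing the two machines, a small filter over the hdparm output can save some eyeballing. This is a sketch only: `auto_activate_state` is a made-up name, and it assumes that enabled features are marked with a "*" in front of the feature name in the `hdparm -I` feature list.

```shell
#!/bin/sh
# Read `hdparm -I` output on stdin and report the state of the
# "DMA Setup Auto-Activate optimization" feature.
# Assumption: enabled features carry a "*" before the feature name.
auto_activate_state() {
    line=$(grep 'DMA Setup Auto-Activate' | head -n 1)
    if [ -z "$line" ]; then
        echo "not listed"
    else
        case "$line" in
            *\**) echo "enabled" ;;
            *)    echo "disabled" ;;
        esac
    fi
}

# Sample line in the assumed format:
printf '\t   *\tDMA Setup Auto-Activate optimization\n' | auto_activate_state
```

On a real system it would be run as root with `hdparm -I /dev/sda | auto_activate_state`, once in the laptop and once in the desktop.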



                  • #54
                    And a burning sensation starts at 45C, which is high but not so exceptional for hard disks. So get a REAL thermometer and measure the temperature. It might be cooking, but it also might not be such a big deal.



                    • #55
                      Originally posted by energyman
                      And a burning sensation starts at 45C, which is high but not so exceptional for hard disks. So get a REAL thermometer and measure the temperature. It might be cooking, but it also might not be such a big deal.
                      Ok. I will. Thanks for the tip.

