My Intel Linux NICs Have Developed A Nasty Habit Of Becoming Hung

  • My Intel Linux NICs Have Developed A Nasty Habit Of Becoming Hung

    Phoronix: My Intel Linux NICs Have Developed A Nasty Habit Of Becoming Hung

    My NICs appear to enjoy sleeping in with the mornings being particularly brutal on the network hardware...

    http://www.phoronix.com/scan.php?pag...0E-Crazy-Hangs

  • #2
    I always have these problems with an Intel I218-V when I boot Windows and then boot into Linux at a later point. The only thing that works (but works reliably) is to

    modprobe -r e1000e
    then do a suspend (just through Unity/suspend system...)
    wake it up again and modprobe e1000e

    If you never booted Windows between those benchmark sessions it's probably another issue though.
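
The steps in this post can be sketched as a small shell function. This is only a sketch under assumptions: it assumes a systemd-based system where `systemctl suspend` is available (the post suspends through the Unity menu instead), it needs root, and the `run` helper, the `DRY_RUN` switch, and the 10-second pause are illustrative additions that are not part of the original workaround.

```shell
#!/bin/sh
# Sketch of the workaround: unload e1000e, suspend, reload the driver on wake.
# DRY_RUN is a hypothetical switch added for illustration: with DRY_RUN=1
# the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reset_nic_via_suspend() {
    driver="${1:-e1000e}"

    # Unload the NIC driver before suspending.
    run modprobe -r "$driver" || return 1

    # Assumption: systemd is in use. The original post suspends from the
    # desktop menu; any suspend method should serve the same purpose.
    run systemctl suspend || return 1

    # Assumption: a short pause so the machine is really asleep before we
    # continue. The script is frozen during suspend and resumes on wake.
    run sleep 10

    # Load the driver again after wake-up.
    run modprobe "$driver"
}
```

To preview the commands without touching the system, set `DRY_RUN=1` and call `reset_nic_via_suspend e1000e`; run it as root without `DRY_RUN` to actually perform the reset.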


    • #3
      Originally posted by d2kx
      I always have these problems with an Intel I218-V when I boot Windows and then boot into Linux at a later point. The only thing that works (but works reliably) is to

      modprobe -r e1000e
      then do a suspend (just through Unity/suspend system...)
      wake it up again and modprobe e1000e

      If you never booted Windows between those benchmark sessions it's probably another issue though.
      Yeah, none of these systems have Windows installs.
      Michael Larabel
      http://www.michaellarabel.com/


      • #4
        RTL8139. Just works. No troubles.
        I know this is just a 10/100 one. But I haven't experienced trouble with any of the many Realtek chips here. Even the VIA Rhine stuff works.
        Maybe sophisticated tech is more vulnerable to hiccups.
        Any dmesg logs hinting at trouble?

        And how do you like these "predictable" network interface names? I found them anything but predictable. I understand the basic problem behind them, but those names are... oh, well, not really handy or human-readable. I prefer my old-style eth0.

        My little 10/100 8-port switch froze once or twice per year until I reset it. I've got a new one by now; it draws less power and is 8x 10/100/1000. We'll see how that works over the year. But of all my NICs, the only ones that messed up were some Edimax card with a strange chip (years ago) and an Atheros L1E soldered on a mainboard, and that one went crazy in Windows too, and even without an OS driver (NIC not present after cold boot, warm reboot -> suddenly present; downloads okay up to 50 MB in size, corrupted above 50 MB on both Gentoo and Windows).
        Stop TCPA, stupid software patents and corrupt politicians!


        • #5
          I've had a similar problem with one of my servers at work.
          It turned out (in my case) to be a bug in the driver's TSO code that causes the driver to hang above a certain load.
          To work around the problem, I run this command at boot:
          Code:
          ethtool -K IFACE tso off
          where IFACE is the failing network interface.
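
A minimal sketch of how that command might be wrapped for a boot script, together with a check of the resulting state. The function names, the `run` helper, and the `DRY_RUN` switch are hypothetical additions for illustration; only `ethtool -K IFACE tso off` comes from the post. How the script is hooked into boot is left out, since the post does not say.

```shell
#!/bin/sh
# Sketch: disable TCP segmentation offload (TSO) on one interface and
# report its state. DRY_RUN=1 (hypothetical switch) prints the command
# instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Turn TSO off for the given interface (requires root when not a dry run).
disable_tso() {
    iface="${1:-}"
    if [ -z "$iface" ]; then
        echo "usage: disable_tso IFACE" >&2
        return 2
    fi
    run ethtool -K "$iface" tso off
}

# Read the output of `ethtool -k IFACE` on stdin and print the value of
# the top-level tcp-segmentation-offload line (for example "on" or "off").
tso_state() {
    awk -F': ' '/^tcp-segmentation-offload:/ { print $2 }'
}
```

After calling `disable_tso eth0` (with `eth0` standing in for the failing interface), `ethtool -k eth0 | tso_state` should report the new setting.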


          • #6
            I used to have a laptop with an rtl8192 wifi adapter. If I booted to Windows and then rebooted to Linux, the kernel module would spit errors about hardware state. I had to boot to Windows, disable the adapter in Device Manager, and then reboot to Linux for it to work properly.

            EDIT: Now I'm using an rtl8812 wifi adapter. It works perfectly in Linux, but I have to boot Windows once and then reboot Windows for it to work there.
            duby229
            Senior Member
            Last edited by duby229; 22 April 2016, 08:38 AM.


            • #7
              Same thing with my ThinkPad T450s / Intel I218-V ethernet / Ubuntu 15.10.
              It randomly doesn't wake on boot.

              insmod / modprobe / rmmod don't help.


              • #8
                OT: nice background in 2nd screenshot


                • #9
                  Which motherboards and NIC models, specifically?

                  Last time I saw something fun like this across multiple motherboard models and Intel NIC models (back on e1000), it was because the affected motherboards lied about DMAs succeeding. The NIC thought a given Tx/Rx ring was free and that it had told the OS, while the driver thought it was occupied, leading to eventual exhaustion.

                  Are there any e1000e machines you've got in service that aren't experiencing this bug?

                  Given that it's happening on multiple machines (particularly if it's happening at approximately the same time on all of them), I wonder if it's a bug being triggered by a particular packet or packet sequence. Can you plug a non-e1000e NIC into the same device, doing nothing else, and save all the traffic it sees over 24 hours with timestamps? You could then compare the traffic in that interval, possibly trying to replay the traffic to see if the problem reproduces.
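
The 24-hour capture suggested here could look roughly like the sketch below. It assumes `tcpdump` and the coreutils `timeout` command are installed and that it is run as root. The function name, the default file name, the `run` helper, and the `DRY_RUN` switch are hypothetical. The pcap format written by `tcpdump -w` stores a timestamp with every packet, which covers the "with timestamps" requirement.

```shell
#!/bin/sh
# Sketch: record everything a second (non-e1000e) NIC sees for 24 hours.
# DRY_RUN=1 (hypothetical switch) prints the command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

capture_day() {
    iface="${1:-}"
    outfile="${2:-nic-capture.pcap}"   # hypothetical default file name
    if [ -z "$iface" ]; then
        echo "usage: capture_day IFACE [OUTFILE]" >&2
        return 2
    fi
    # 86400 seconds = 24 hours; timeout stops tcpdump with SIGTERM,
    # which lets it close the capture file cleanly.
    run timeout 86400 tcpdump -i "$iface" -w "$outfile"
}
```

For example, `capture_day eth1 hang-day1.pcap` would write one capture file per run, where `eth1` stands in for whatever name the extra NIC gets. Captures from days with and without a hang can then be compared around the time the e1000e interface stopped responding.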

                  You could also play the git bisect dance on one of the machines, but that's likely to be time-consuming if it only happens once per day, and presumes there's a version in the past where this didn't happen, and not that this is a new piece of traffic on the network munging the data.

                  Probably also worth reporting a bug on kernel.org since I presume at least one of these machines is running something close enough to latest git that it's not already fixed.

                  e: One final note - in particular, the last time I saw NICs mysteriously hanging after an interval, it turned out to be that the power input was sagging; are all these machines on the same circuit/PDU, and do you have anything monitoring the line for stability?
                  rincebrain
                  Junior Member
                  Last edited by rincebrain; 22 April 2016, 09:02 AM.


                  • #10
                    Originally posted by d2kx
                    I always have these problems with an Intel I218-V when I boot Windows and then boot into Linux at a later point. The only thing that works (but works reliably) is to

                    modprobe -r e1000e
                    then do a suspend (just through Unity/suspend system...)
                    wake it up again and modprobe e1000e

                    If you never booted Windows between those benchmark sessions it's probably another issue though.
                    I also had this problem when dual booting with MS Windows. The Intel NIC version is slightly different but I'm using the same 'e1000e' driver:

                    Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
                    DeviceName: Onboard LAN
                    Subsystem: Dell Device 05cc
                    Flags: bus master, fast devsel, latency 0, IRQ 28
                    Memory at f7100000 (32-bit, non-prefetchable) [size=128K]
                    Memory at f7139000 (32-bit, non-prefetchable) [size=4K]
                    I/O ports at f040 [disabled] [size=32]
                    Capabilities: <access denied>
                    Kernel driver in use: e1000e

                    In my case, disabling the WoL functionality of the Intel NIC in Windows resolved the issue with the NIC not working in Linux after a reboot. Original post which helped me.
                    bylie
                    Junior Member
                    Last edited by bylie; 22 April 2016, 09:30 AM.
