Linux 6.1 Will Try To Print The CPU Core Where A Seg Fault Occurs


  • #21
    Originally posted by marlock View Post
    In your case, does processor Y stay the same across reboots, or can the ultimately segfaulting process get reassigned to a new core from time to time?

    In case of an actual defective core, it will always be the same core across reboots, different failing processes, etc.
    This is where corner cases get so horrible. Think of real-time setups, where a process starts up assigned to a particular core and never changes cores, even across reboots. So a failure reported on core X could come from something like that, for example a process creating a buffer that another process fills in and returns.

    Processes don't always have dynamic assignment to cores; it depends on what users have configured.

    Comment


    • #22
      If you're aware of this config, you could move the task from core Y to core Z, then see whether the segfault reports core Z or whether core Y keeps segfaulting other tasks.

      Also you can always just replace core Y and see what happens... if the new core segfaults you know the issue wasn't on core Y but on another core or the software itself (and put the core back to use)... if it fixes things, hurray, the feature helped prevent a blind hunt.
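A minimal sketch of the "move the task from core Y to core Z" experiment, assuming a Linux system and Python 3.9+. The helper name `pin_to_core` is made up for illustration; `os.sched_getaffinity` and `os.sched_setaffinity` are the standard-library wrappers around the Linux affinity syscalls.

```python
import os


def pin_to_core(pid: int, core: int) -> set[int]:
    """Restrict `pid` (0 means the calling process) to one logical CPU.

    Returns the previous affinity mask so the caller can restore it
    once the experiment is over.
    """
    previous = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, {core})
    return previous
```

From a shell, `taskset -cp Z <pid>` should have the same effect. Note that the numbers here are logical CPU numbers (the "CPU N" part of the kernel message), not the "core M" part.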

      Comment


      • #23
        Originally posted by marlock View Post
        If you're aware of this config, you could move the task from core Y to core Z, then see whether the segfault reports core Z or whether core Y keeps segfaulting other tasks.

        Also you can always just replace core Y and see what happens... if the new core segfaults you know the issue wasn't on core Y but on another core or the software itself (and put the core back to use)... if it fixes things, hurray, the feature helped prevent a blind hunt.
        I think the idea is that the thread <-> core mapping will naturally tend to get jumbled, from one run to the next. Over time, a picture should emerge from this stochastic process, when one core has a markedly higher failure rate than the rest.

        If disabling it causes the total failure rate to drop down to baseline, then you have your culprit. If not, then perhaps the entire CPU needs to be replaced (unless there's another core with disproportionately high errors).

        The key is simply to aggregate enough data.
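As a rough sketch of that aggregation, assuming the kernel log uses the `likely on CPU N (core M, socket S)` suffix quoted in the article, the faults can be tallied per physical core. The function name `count_faults_per_core` is hypothetical.

```python
import re
from collections import Counter

# Suffix appended to x86 segfault lines by kernels that include this feature.
CPU_RE = re.compile(r"likely on CPU (\d+) \(core (\d+), socket (\d+)\)")


def count_faults_per_core(log_lines):
    """Return a Counter keyed by (socket, core) over all matching lines."""
    counts = Counter()
    for line in log_lines:
        match = CPU_RE.search(line)
        if match:
            _cpu, core, socket = map(int, match.groups())
            counts[(socket, core)] += 1
    return counts
```

Passing it an open file containing saved `dmesg` output and calling `.most_common()` on the result lists the cores with the most reported faults first. Lines without the suffix (such as `traps:` lines) are simply skipped.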

        Comment


        • #24
          Originally posted by marlock View Post
          If you're aware of this config, you could move the task from core Y to core Z, then see whether the segfault reports core Z or whether core Y keeps segfaulting other tasks.

          Also you can always just replace core Y and see what happens... if the new core segfaults you know the issue wasn't on core Y but on another core or the software itself (and put the core back to use)... if it fixes things, hurray, the feature helped prevent a blind hunt.
          That is also the catch: you have to be aware of it. You can have cases where, by pure luck, a thread or process is always assigned to the same core, and several threads can end up like this. Remember that the Linux kernel's NUMA scheduler tries to avoid moving threads between cores, because migrating a thread has a cost. The order in which everything starts on the system can effectively result in fixed core placement of threads by luck. With equal luck, every time you boot the system with a particular workload everything lands on the same cores, and nothing in the workload gives the kernel a reason to move anything between cores, even though nothing is configured to tell the kernel to do this.
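One way to check whether threads really do stay put is to sample which CPU each thread last ran on. This is only a sketch, assuming Linux's `/proc/<pid>/task/<tid>/stat` layout, where field 39 is the CPU number the task last executed on; the function names are made up for illustration.

```python
import os


def last_cpu_from_stat(stat_text: str) -> int:
    """Extract field 39 ("processor") from the text of a /proc stat file."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or
    # parentheses, so split on the LAST ")" before tokenising the rest.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N sits at index N - 3.
    return int(rest[39 - 3])


def thread_cpus(pid: int) -> dict[int, int]:
    """Map each thread ID of `pid` to the CPU it last ran on."""
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                result[int(tid)] = last_cpu_from_stat(f.read())
        except FileNotFoundError:
            # The thread exited between listing and reading; skip it.
            continue
    return result
```

Calling `thread_cpus` repeatedly over the lifetime of a workload and comparing the results would show whether placement is effectively fixed even without any affinity configuration.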

          By setting configurations you can force thread placement, but the reality is that the right workload can produce the same result with the Linux kernel without any settings. Truly, do you feel unlucky, punk?

          Corner cases are a true pain. At their base is Sod's law, "if something can go wrong, it will", or Murphy's law, "Anything that can go wrong will go wrong".

          Basically, you need to be aware that the CPU core reported in the segfault may not be the cause, and that a deeper look is required because of some of the wacky corner cases that are possible.
          Last edited by oiaohm; 10 October 2022, 09:46 AM.

          Comment


          • #25
            This can help with debugging hardware/compilers/interpreters. The most pain I have experienced in my professional life (excluding politics and mismanagement) was debugging unexpected float overflows in PHP 5 on Core2-based Xeons.

            Re: Strange segfaults....

            Fast forward to 2020. I spent a long time debugging crashes when my system was under pressure. Turned out there was nothing wrong with my system.

            It was my UPS relay (no issues with UPS on battery or plugging directly into the wall plug).

            Comment


            • #26
              Web browsers have been restarting constantly since the kernel 6.x range.

              RAM tests fine: 128 GB, 50~80% used.
              Ryzen 9 3900X that pretty much idles around 2 GHz in eco mode at ~0.8 V, with turbo boost etc. disabled.

              Code:
              [1254695.932862] opera[4070372]: segfault at 0 ip 00007febc919895e sp 00007ffed64e41f8 error 4 in libc.so.6[7febc9028000+195000] likely on CPU 7 (core 9, socket 0)
              [1254695.932878] Code: 00 00 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
              [1254878.469478] opera[2138833]: segfault at 0 ip 00007f1d8619895e sp 00007fffba76fe78 error 4 in libc.so.6[7f1d86028000+195000] likely on CPU 4 (core 5, socket 0)
              [1254878.469496] Code: 00 00 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
              [1257013.938345] opera[2144996]: segfault at 0 ip 00007f6cd019895e sp 00007ffe19420658 error 4 in libc.so.6[7f6cd0028000+195000] likely on CPU 10 (core 13, socket 0)
              [1257013.938367] Code: 00 00 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
              [1261449.307547] traps: opera[2213514] trap invalid opcode ip:560300d9338b sp:7fffddd39e10 error:0 in opera[5602fe273000+663c000]
              [1300215.866684] qtdemux0:sink[3391830]: segfault at 0 ip 0000000000000000 sp 00007fe98fbf4cc8 error 14 in totem-video-thumbnailer[5594b5476000+3000] likely on CPU 2 (core 2, socket 0)
              [1300215.866700] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
              [1300215.986977] qtdemux0:sink[3391851]: segfault at 0 ip 0000000000000000 sp 00007f1ad5ff4cc8 error 14 in totem-video-thumbnailer[562589748000+3000] likely on CPU 11 (core 14, socket 0)
              [1300215.986996] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
              [1300216.287257] qtdemux0:sink[3391907]: segfault at 0 ip 00007ff0996907e7 sp 00007ff094eedc70 error 6 in libde265.so.0.1.1[7ff099660000+71000] likely on CPU 7 (core 9, socket 0)
              [1300216.287276] Code: 44 89 c9 41 89 d2 41 89 d4 41 89 c5 8b 96 e8 00 00 00 41 d3 e4 41 d3 e5 89 c1 0f af ca 45 89 e7 44 01 d1 48 63 c9 48 8d 0c 49 <66> 45 89 34 cb 8b 0f 45 89 ee 89 4c 24 0c 8b 8e e4 00 00 00 41 d3
              [1300216.651834] traps: multiqueue0:src[3391935] trap int3 ip:7f4562d49cef sp:7f451b9f3420 error:0 in libglib-2.0.so.0.7200.4[7f4562d0a000+8f000]
              [1300216.972089] traps: multiqueue0:src[3391956] trap int3 ip:7fcc7dd49cef sp:7fcc727f95c0 error:0 in libglib-2.0.so.0.7200.4[7fcc7dd0a000+8f000]
              [1334019.352750] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[4134895] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3889042]
              [1348419.441901] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[271230] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3889042]
              [1373696.868819] opera[2211957]: segfault at 11fd5493314c ip 0000556da0173fbc sp 00007ffd9b4658f0 error 4 in opera[556d99b43000+663c000] likely on CPU 1 (core 1, socket 0)
              [1373696.868840] Code: 8d 0c 1c 48 83 c1 0c 89 01 4c 01 e3 4c 8d 2d cb 33 c0 f9 49 c1 ed 0d 4d 8b 37 4d 31 ee 49 83 fe ff 49 f7 d6 0f 84 8e 00 00 00 <49> 8b 46 08 4c 31 e8 48 f7 d0 49 39 c7 74 56 c7 45 a0 04 00 00 00
              [1377219.681589] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[1178597] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3889042]
              [1377982.832820] traps: opera[1080230] trap invalid opcode ip:55fb1c2e038b sp:7fffb035a380 error:0 in opera[55fb197c0000+663c000]
              [1391619.768226] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[1491400] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3889042]
              [1594037.411321] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[2235306] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3165333]
              [1622837.507574] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[3149774] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3165333]
              [1658837.666474] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[4005885] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3165333]
              [1673237.723846] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[152830] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3165333]
              [1678900.810414] opera[1078244]: segfault at 0 ip 00007f7c9299895e sp 00007ffff282aa28 error 4 in libc.so.6[7f7c92828000+195000] likely on CPU 7 (core 9, socket 0)
              [1678900.810436] Code: 00 00 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
              [1680437.729055] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[338112] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3165333]
              [1684055.710263] traps: opera[286419] trap invalid opcode ip:556b452c838b sp:7ffccf4dc980 error:0 in opera[556b427a8000+663c000]
              [1687637.730104] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[511060] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[3165333]
              [1767606.535019] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[2645131] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[1461933]
              [1813844.299776] opera[1452755]: segfault at 0 ip 00007fd4ad39895e sp 00007ffe570a9418 error 4 in libc.so.6[7fd4ad228000+195000] likely on CPU 8 (core 10, socket 0)
              [1813844.299795] Code: 00 00 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
              [1819311.968200] traps: opera[3975582] trap invalid opcode ip:55e978eca38b sp:7ffdd0ad72d0 error:0 in opera[55e9763aa000+663c000]
              [1832406.765903] ptrace attach of "/opt/google/chrome-beta/chrome --enable-crashpad"[220525] was attempted by "/opt/google/chrome-beta/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/home/aio/.config/google-chrome-beta/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel=beta --annotation=lsb-release=Ubuntu 22.04.3 LTS --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=99.0.4844.27 --initial-client-fd=5 --shared-client-connection"[1461933]
              [1894820.265623] opera[3973998]: segfault at 0 ip 00007fbce539895e sp 00007ffd1f56e628 error 4 in libc.so.6[7fbce5228000+195000] likely on CPU 3 (core 4, socket 0)
              [1894820.265641] Code: 00 00 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
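For reading the `error N` values on `segfault at` lines like the ones above: on x86 this is the page-fault error code, and as far as I know the kernel prints it in hexadecimal. The sketch below decodes only the most common bits (present, write, user, instruction fetch); the function name `decode_pf_error` is hypothetical, and the `error:0` on `traps:` lines is a different kind of code that this does not cover.

```python
def decode_pf_error(code_hex: str) -> str:
    """Describe an x86 page-fault error code given as a hex string."""
    code = int(code_hex, 16)
    mode = "user-mode" if code & 0x04 else "kernel-mode"
    if code & 0x10:
        access = "instruction fetch"
    elif code & 0x02:
        access = "write"
    else:
        access = "read"
    cause = "protection violation" if code & 0x01 else "page not present"
    return f"{mode} {access}, {cause}"


for value in ("4", "6", "14"):
    print(f"error {value}: {decode_pf_error(value)}")
```

Under that reading, `error 4` at address 0 is a user-mode read through a null pointer, and `error 14` with `ip 0000000000000000` is an attempt to execute code at address 0, such as a call through a null function pointer.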

              Comment


              • #27
                is it always the same core when this happens?

                if yes, then you should look into a possible cpu core reliability issue (but it may be a false positive if a core-locked thread crashes because another core-independent thread is generating bad data for it, etc.)

                if not, then it may still be a false negative: a core failure might not crash directly but instead induce crashes in another core-independent thread

                crashes that keep happening on the same core across varied workloads over extended periods are a stronger sign of an actual cpu core hardware issue


                that's what this news was about...

                ...

                ...in 2022!

                (ps: in your log it clearly isn't always the same core... "likely on CPU 1 (core 1, socket 0)", etc.)
                Last edited by marlock; 23 February 2024, 06:04 PM.

                Comment
