Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads


  • #81
    Originally posted by tholin View Post
    I've been following this problem on Gentoo's forum and it's almost impossible to know what works and what doesn't. The crashes are nondeterministic, so users might change a BIOS setting and then get lucky for a while, and assume the BIOS change solved the problem. There could also be several separate problems with the same symptoms but requiring separate fixes.


    Can someone clarify this? Is it saying that returning from an interrupt to a "high user %rip address near the end of the user address space (top of user stack)" sometimes crashes? Does DragonFly run programs that execute code from the stack? Or is it saying that a crash might happen when the return pointer is read from the stack when the stack is near the end of the user address space?

    If there is a hardware bug that depends on a specific address in the user address space, it would make sense that compile jobs trigger it. Linux uses address space layout randomization to put memory segments at different addresses on each run. A compile job forks a lot of processes, each with its own layout. Try compiling without ASLR: "echo 0 > /proc/sys/kernel/randomize_va_space".
    Thank you very much. This workaround seems to work for me. Before changing this setting, I could reproduce the problem at least once per ten kernel builds with make -j16.
    After changing it, the same build ran fine 100 times in a row.
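A comparison like "once per ten builds" versus "100 clean runs" is easier to repeat with a small harness. The following is only a minimal sketch, not something posted in the thread: the helper names `read_aslr_mode` and `count_failures` are made up for illustration, and the command you pass in is assumed to be whatever reproduces the problem on your machine (for example a wrapper that cleans the tree and runs `make -j16`).

```python
import subprocess
from pathlib import Path

# Sysctl file mentioned in the quoted post; 0 means ASLR is off.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")


def read_aslr_mode(path=ASLR_SYSCTL):
    """Return the ASLR mode as an int, or None if the file cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def count_failures(cmd, runs, cwd=None):
    """Run cmd `runs` times and return how many runs exited non-zero.

    A process killed by a signal (e.g. SIGSEGV) has a negative return
    code in subprocess, so it is counted as a failure too.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

Hypothetical usage: print `read_aslr_mode()` first so the result is recorded together with the setting, then call `count_failures(["./rebuild.sh"], 100)`, where `rebuild.sh` is a placeholder for your own clean-and-build script. Since the crashes are nondeterministic, a zero count only makes the workaround more plausible; it does not prove it.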

    Comment


    • #82
      Originally posted by dillon View Post
      Hi, Matt Dillon here. Yes, I did find what I believe to be a hardware issue with Ryzen related to concurrent operations.
      Thank you for the extensive clarification on this.

      Originally posted by dillon View Post
      The problem occurs more often with high %rip addresses such as near the top of the user stack, which is where DragonFly's signal trampoline traditionally resides.
      Is the trigger OS specific, as there doesn't seem to be any Visual Studio devs using Windows reporting this (yet)?

      Originally posted by dillon View Post
      Only IRETQ seems to trigger it in the manner described above, which means that AMD can probably fix it with a microcode update.
      I certainly hope so, as I'm planning on buying a Ryzen system and this made me postpone my planned upgrade until it's resolved.

      Comment


      • #83
        I think it can be disabled in the kernel config, but at the cost of some security:
        CONFIG_RANDOMIZE_MEMORY:

        Randomizes the base virtual address of kernel memory sections
        (physical memory mapping, vmalloc & vmemmap). This security feature
        makes exploits relying on predictable memory locations less reliable.

        The order of allocations remains unchanged. Entropy is generated in
        the same way as RANDOMIZE_BASE. Current implementation in the optimal
        configuration have in average 30,000 different possible virtual
        addresses for each memory section.

        If unsure, say N.

        CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
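To see how these options are set on a given kernel before rebuilding anything, the config can be inspected directly. This is a rough sketch with hypothetical helper names; it assumes the config is available either as `/proc/config.gz` (only present when the kernel was built with `CONFIG_IKCONFIG_PROC`) or as `/boot/config-<release>` (a common distribution convention, not guaranteed). Note that, per the help text above, `CONFIG_RANDOMIZE_MEMORY` covers kernel memory sections, while the `randomize_va_space` sysctl mentioned earlier in the thread controls user-space layout, so the two are not interchangeable.

```python
import gzip
import platform
from pathlib import Path


def parse_randomize_options(config_text):
    """Return {option: value} for every CONFIG_RANDOMIZE_* entry in a .config.

    Options written as "# CONFIG_X is not set" are reported with value "n".
    """
    options = {}
    unset_suffix = " is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_RANDOMIZE_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_RANDOMIZE_") and line.endswith(unset_suffix):
            options[line[2:-len(unset_suffix)]] = "n"
    return options


def load_running_kernel_config():
    """Return the running kernel's config text, or None if neither file exists."""
    proc_config = Path("/proc/config.gz")
    if proc_config.exists():
        return gzip.decompress(proc_config.read_bytes()).decode()
    boot_config = Path("/boot") / f"config-{platform.release()}"
    if boot_config.exists():
        return boot_config.read_text()
    return None
```

Calling `parse_randomize_options(load_running_kernel_config() or "")` returns a plain dictionary, so it is easy to log next to crash statistics.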

        Comment


        • #84
          Originally posted by Beherit View Post
          Thank you for the extensive clarification on this.


          Is the trigger OS specific, as there doesn't seem to be any Visual Studio devs using Windows reporting this (yet)?

          I certainly hope so, as I'm planning on buying a Ryzen system and this made me postpone my planned upgrade until it's resolved.
          This problem has existed for a long time in AMD processors.
          I have an [AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G], and it happens at least 5-10 times a week.
          From what I see described here, the error seems to be the same.

          It's a mess, they never fixed it!
          The processor halts, and you have to press reset and boot again...
          But it's not at compile time; it's during application execution, like browsing or video, and sometimes, though less often, it happens even in a console session... a nightmare.

          Comment


          • #85
            Originally posted by tuxd3v View Post
            This problem has existed for a long time in AMD processors.
            I have an [AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G], and it happens at least 5-10 times a week.
            From what I see described here, the error seems to be the same.
            I really hope you're wrong about this. But right now, awaiting the new Coffee Lake series from Intel and coughing up ~€500 more for it, is a more appealing "better safe than sorry"-alternative for me than a Ryzen.

            Comment


            • #86
              Originally posted by Beherit View Post
              I really hope you're wrong about this. But right now, awaiting the new Coffee Lake series from Intel and coughing up ~€500 more for it, is a more appealing "better safe than sorry"-alternative for me than a Ryzen.
              Why so glum? The reason I pointed Michael toward this issue is to increase the chance that it'd get solved; I've already seen a few useful developments in that direction. Shit happens with new platforms -- hardly like this is unique to AMD either, even if it may seem this way because all we've been seeing for the past 5-6y is Intel, iterating away.

              Comment


              • #87
                But the problem hasn't totally disappeared (I mean with ASLR turned off); it is now very sporadic, and it always happens in gcc. It is not bash or anything else that segfaults, but once every 2-3 hours gcc itself segfaults.

                arg
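One way to check which programs are actually segfaulting is to tally the kernel log by process name. The sketch below assumes the usual Linux kernel message shape `name[pid]: segfault at ...`; the function name `tally_segfaults` is invented for this example. Keep in mind that the name in the log is the process name, so a gcc crash may show up under the compiler proper (for example `cc1`) rather than `gcc`.

```python
import re
from collections import Counter

# Matches kernel log lines of the form "name[pid]: segfault at ...".
SEGFAULT_RE = re.compile(r"(?P<name>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at ")


def tally_segfaults(log_lines):
    """Count segfault messages per process name in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts
```

The input can be any iterable of lines, for instance the saved output of `dmesg` read from a file, which keeps the function independent of how the log was collected.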

                Comment


                • #88
                  Originally posted by tuxd3v View Post

                  This problem has existed for a long time in AMD processors.
                  Can't be, as Matt Dillon is speaking about a problem with SMT and Ryzen is the first CPU from AMD that implements this.

                  Comment


                  • #89
                    Originally posted by tuxd3v View Post

                    This problem has existed for a long time in AMD processors.
                    I have an [AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G], and it happens at least 5-10 times a week.
                    From what I see described here, the error seems to be the same.

                    It's a mess, they never fixed it!
                    The processor halts, and you have to press reset and boot again...
                    But it's not at compile time; it's during application execution, like browsing or video, and sometimes, though less often, it happens even in a console session... a nightmare.
                    How is the problem supposed to be the same if you say the processor halts, whereas here we're getting segfaults, and in very different scenarios?

                    Comment


                    • #90
                      Just a note about overclocking voltages:

                      The MSI X370 SLI Plus BIOS contains a button that overclocks the CPU (Ryzen 5 1600) from 3.2GHz to 3.6GHz and changes the fan envelope. An unexpected issue is that turning the button off does not lower the core voltage (Vcore) back to its normal level, that is, from 1.464V back to 1.2V. The normal 1.2V is only restored by clearing the CMOS.

                      3.2GHz @ 1.464V is unstable (CPU hits 95℃ and gets automatically throttled from 3.2GHz to about 2.7GHz during stress testing in AIDA64), and 3.2GHz @ 1.2V is stable (max CPU temperature during AIDA64 stress test is 76℃). I didn't test 3.6GHz @ 1.464V, but I would expect the system to be unstable at this voltage as well.

                      Comment
