Fedora 37 Weighing Change To Improve Profiling/Debugging But With Possible Performance Cost


  • #11
    Originally posted by archkde

    No, the unwinder doesn't have to make a guess, it can unwind via DWARF (unless the program is playing weird games with the stack pointer or corrupting its stack, but in this case, all bets are off even with a frame pointer). And as said, the tool containing the unwinder needs to parse DWARF anyway for symbolization.

    The 5-10% comes from the cited benchmark and the fact that the cost for the frame pointer needs to be paid always, it's not something you can turn on when running the unwinder and turn off later.
    Unwinding is often performed when the program crashes, in order to debug it.
    So it's likely the stack is messed up.

    As for the 5-10% loss, it is often not visible except in microbenchmarks.
    In the real world, performance is probably limited by something else, like poorly written software.


    • #12
      Fedora is already slow out of the box, hence I wouldn't recommend it for casual gamers who don't want to mess around with a custom kernel at all. Making it a further 5-10% slower doesn't help here.


      • #13
        Originally posted by archsway
        The -fomit-frame-pointer option should never have been invented.
        Online access to you should have never been invented.


        • #14
          Just leaving symbols in the binaries has been a huge game-changer for me and doesn't even affect performance. Seeing exactly where CPU time is spent in perf top from user to kernel space is a revelation. But 5% sounds like quite the regression. Not sure which archs would be affected since a few (among them x86_64, IIRC?) don't omit frame pointers at any optimization level unless explicitly enabled.


          • #15
            Originally posted by NobodyXu

            Unwinding is often performed when the program crashes, in order to debug it.
            So it's likely the stack is messed up.
            Read the proposal again. Facebook doesn't even pretend to propose using this for debugging, but rather as a workaround for making their profiler faster. And the frame pointer doesn't really help for debugging anyway, as you can see when you compare how the two unwinding methods determine a stack frame:
            • If there is a frame pointer, the current stack frame is determined using the stack pointer and the frame pointer. Then the frame pointer of the caller is read from the stack, and unwinding continues.
            • In the absence of a frame pointer, the current stack frame is determined using the stack pointer and the instruction pointer (which allows calculating the stack depth via DWARF). Next the return address is read from the stack, and unwinding continues.
            As you see, both methods need to read exactly one pointer value from the stack per stack frame. Hence if the program corrupts its stack, both methods tend to break equally.
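
To make the comparison above concrete, here is a minimal sketch of both walks over a simulated stack. It is a toy model, not how perf or any real unwinder is implemented: the memory layout, the addresses, and the names `unwind_with_frame_pointer`, `unwind_with_cfa_table`, `build_example_stack`, and `frame_size_at` are all assumptions made up for illustration, and the `frame_size_at` table is only a stand-in for the stack-depth information that DWARF provides.

```python
# Toy model of stack unwinding. Memory is a dict mapping address -> word,
# and the stack grows downward. Assumed x86-64-like frame layout:
#   [fp]        saved frame pointer of the caller
#   [fp + WORD] return address into the caller
# All addresses and names below are hypothetical.

WORD = 8


def unwind_with_frame_pointer(memory, fp, ip):
    """Follow the chain of saved frame pointers until it ends with 0."""
    trace = [ip]
    while fp != 0:
        trace.append(memory[fp + WORD])  # return address into the caller
        fp = memory[fp]                  # saved frame pointer of the caller
    return trace


def unwind_with_cfa_table(memory, sp, ip, frame_size_at):
    """Unwind without a frame pointer.

    frame_size_at maps an instruction pointer to the distance from the
    stack pointer to the return-address slot. It stands in for the
    stack-depth information a real unwinder would compute from DWARF.
    """
    trace = [ip]
    while ip in frame_size_at:
        return_slot = sp + frame_size_at[ip]
        ip = memory[return_slot]   # return address into the caller
        sp = return_slot + WORD    # caller's stack pointer after the return
        trace.append(ip)
    return trace


def build_example_stack():
    """Build a stack for the call chain main -> foo -> bar, stopped in bar."""
    main_fp, foo_fp, bar_fp = 0x7000, 0x6FE0, 0x6FB0
    memory = {
        main_fp: 0,        main_fp + WORD: 0x400000,  # main returns to startup code
        foo_fp: main_fp,   foo_fp + WORD: 0x400100,   # foo returns into main
        bar_fp: foo_fp,    bar_fp + WORD: 0x400200,   # bar returns into foo
    }
    sp = 0x6FA8   # bar has 8 bytes of locals below its saved frame pointer
    ip = 0x400300  # currently executing inside bar
    # Distance from sp to the return-address slot, keyed by instruction pointer.
    frame_size_at = {0x400300: 16, 0x400200: 40, 0x400100: 24}
    return memory, bar_fp, sp, ip, frame_size_at


if __name__ == "__main__":
    memory, fp, sp, ip, table = build_example_stack()
    print([hex(a) for a in unwind_with_frame_pointer(memory, fp, ip)])
    print([hex(a) for a in unwind_with_cfa_table(memory, sp, ip, table)])
```

On this intact stack both functions return the same four addresses. Overwriting one of the stack slots in `memory` before unwinding is an easy way to experiment with how each walk behaves on a corrupted stack.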


            As for the 5-10% loss, it is often not visible except in microbenchmarks.
            In the real world, performance is probably limited by something else, like poorly written software.
            These 5-10% are on top of everything else. They come from having two or three extra instructions per function call, and from the reduction of usable registers by one. If anything, I would guess that the former is particularly noticeable in poorly written software, and the latter is particularly noticeable in software where every bit of performance matters (although admittedly, maintainers of software in the latter category would likely request that their packages be compiled without frame pointers).


            • #16
              Build better stack unwinders ffs?
