Fedora's FESCo Rejects The Idea Of "-fno-omit-frame-pointer" As Default Compiler Flag

  • #11
    They can have a slow distro if they like, and we can use other distros if we like. (Something Arch-based for convenience, or Gentoo-based if some extra performance is worth all the extra compiling.)

    A frame pointer makes some debugging "easier", but all that debugging is possible anyway. The "hard" parts of figuring out a stack layout without frame pointers can conveniently be offloaded onto a computer and software. Somehow, anyone doing debugging work has access to a computer! These days computers tend to have many cores, so real-time debugging is helped by running this extra work on some other core. And even in the single-core case, the extra work of real-time stack unwinding will only be a linear (constant-factor) slowdown.
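To make the trade-off concrete, here is a minimal sketch of what a frame pointer buys a profiler or debugger. It assumes the usual x86-64 layout produced by a `push rbp; mov rbp, rsp` prologue, where each frame holds the caller's saved RBP at `[rbp]` and the return address at `[rbp + 8]`. The stack is simulated as a Python dict, and the function name, addresses and example data are hypothetical. With frame pointers, unwinding is a plain linked-list walk; without them, the same walk needs per-function metadata (such as DWARF call frame information), which is the extra work the post suggests offloading to software.

```python
# Hypothetical sketch: unwind a call stack by following saved frame pointers.
# Assumes the x86-64 frame-pointer layout: [rbp] = caller's saved RBP,
# [rbp + 8] = return address. Memory is simulated as {address: value}.

def walk_frame_pointers(memory, rbp, rip, max_frames=64):
    """Return instruction pointers, innermost frame first."""
    pcs = [rip]
    while rbp != 0 and len(pcs) < max_frames:
        return_address = memory[rbp + 8]  # pushed by the caller's `call`
        caller_rbp = memory[rbp]          # pushed by the callee's prologue
        pcs.append(return_address)
        if caller_rbp != 0 and caller_rbp <= rbp:
            break  # older frames live at higher addresses; stop on a bad chain
        rbp = caller_rbp
    return pcs


# Example stack for main -> foo -> bar (all addresses are made up).
EXAMPLE_STACK = {
    0x7000: 0x7040, 0x7008: 0x401234,  # bar's frame: foo's RBP, return into foo
    0x7040: 0x7080, 0x7048: 0x401567,  # foo's frame: main's RBP, return into main
    0x7080: 0x0,    0x7088: 0x401890,  # main's frame: saved RBP 0 ends the chain
}

if __name__ == "__main__":
    for pc in walk_frame_pointers(EXAMPLE_STACK, rbp=0x7000, rip=0x401100):
        print(hex(pc))
```

The `max_frames` cap and the "addresses must increase" check are there because a profiler walking a corrupted chain must stop rather than loop or read garbage.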

    So instead of "why does this take 10 min on a highly optimized build?", you get "find out why this takes 20 min when also executing a ton of debugging hooks". And that is better than "this takes 15 min for everyone".

    Finally, lots of software is not such a complete mystery. Lots of debugging happens without debugging options, a few extra printf() and you figure out what is wrong.


    • #12
      Originally posted by RahulSundaram

      They have to rebuild the entire distro for their internal builds, which they can certainly continue to do, but they have been attempting to upstream their desired changes, which is generally a good thing to do. These internal changes otherwise tend to become tech debt over time. At the data center scale, the net performance and developer efficiency gains are worth it for them. For other folks, the situation is less clear. The desktop team apparently wants it because it saves them time running their tools, and several upstream projects are enabling the flags by default, but other teams, including the tooling team, think it's not worth the tradeoffs.
      Exactly that!
      For them and at datacenter scale, this is worth it.

      Now things get a bit murky beyond that, because Fedora right now isn't the Fedora that I used to know.
      The Fedora I used to know: a desktop distribution! Plain and simple.
      Fedora right now: desktop/workstation distribution, cloud, server, ...

      For Fedora Workstation (i.e. desktop users), forcing this change is senseless and only causes (slightly) bigger packages with degraded performance. Sure, it's all manageable and likely in the single-digit percentages, but it's a degradation nevertheless. On desktop releases they should focus on user efficiency, not developer efficiency.

      For their other spins, specifically server and cloud, this change would make total sense!

      Usually in these discussions the money side wins, which in this case would be server/cloud (desktop users don't pay). I'm glad that didn't turn out to be the case here.


      • #13
        Originally posted by markg85

        For Fedora Workstation (i.e. desktop users), forcing this change is senseless and only causes (slightly) bigger packages with degraded performance. Sure, it's all manageable and likely in the single-digit percentages, but it's a degradation nevertheless. On desktop releases they should focus on user efficiency, not developer efficiency.

        I assume you missed the GNOME developers saying they wanted this too? Or that the KDE folks would have benefited from this as well? Both big desktops have tooling that massively benefits from working real-time tracing and profiling.

        Also, who do you think makes the software that gets shipped? Developers. And they have to get something nice out of the platform to keep doing it too.


        • #14
          Originally posted by King InuYasha
          Also, who do you think makes the software that gets shipped? Developers. And they have to get something nice out of the platform to keep doing it too.
          I'm a developer and there's no way in hell I'd tolerate performance drops on stuff I don't debug.


          • #15
            Is this strictly an x86 thing? On AArch64, the performance impact of carrying a frame pointer should be negligible, since it has roughly twice as many general-purpose registers. I would like to see some benchmarks on that, but you'd obviously want to use a more recent core than something like the Pi's A72. Michael, do you have access to an Ampere Altra?


            • #16
              Originally posted by King InuYasha
              I assume you missed the GNOME developers saying they wanted this too? Or that the KDE folks would have benefited from this as well? Both big desktops have tooling that massively benefits from working real-time tracing and profiling.
              Oh, the GNOME devs are for this change? That settles the issue for me: it must be a bad idea.


              • #17
                Originally posted by coder
                Is this strictly an x86 thing? On AArch64, the performance impact of carrying a frame pointer should be negligible, since it has roughly twice as many general-purpose registers. I would like to see some benchmarks on that, but you'd obviously want to use a more recent core than something like the Pi's A72. Michael, do you have access to an Ampere Altra?
                It's enabled on AArch64 and POWER (ppc64le); it is not enabled on x86_64 and s390x.


                • #18
                  Separate debug build. Don't tank release performance.


                  • #19
                    Originally posted by Hafting
                    Finally, lots of software is not such a complete mystery. Lots of debugging happens without debugging options, a few extra printf() and you figure out what is wrong.
                    The point here is not general debugging, but profiling. This is (other than in the most extreme cases) not something you can do with printf.
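Why profiling needs stacks rather than printf: a sampling profiler such as perf records the whole call stack at each sample, and the useful picture only appears after aggregating many samples. Below is a minimal sketch of that aggregation step, assuming the stacks have already been unwound (by frame pointers or otherwise). The function names and sample data are hypothetical; the semicolon-joined, root-first strings are similar to the "folded" format consumed by flame graph tooling.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical stacks, joined root-first with ';' (folded format)."""
    folded = Counter()
    for stack in samples:  # each stack is listed innermost frame first
        folded[";".join(reversed(stack))] += 1
    return folded


def self_and_total(samples):
    """Self: the sample landed in the function itself.
    Total: the function appeared anywhere on the sampled stack."""
    self_counts, total_counts = Counter(), Counter()
    for stack in samples:
        self_counts[stack[0]] += 1
        for func in set(stack):  # count recursive frames once per sample
            total_counts[func] += 1
    return self_counts, total_counts


# Hypothetical samples, innermost frame first.
SAMPLES = [
    ["bar", "foo", "main"],
    ["bar", "foo", "main"],
    ["foo", "main"],
    ["baz", "main"],
]

if __name__ == "__main__":
    for line, count in fold_stacks(SAMPLES).items():
        print(line, count)
    print(self_and_total(SAMPLES))
```

The "total" counts depend entirely on being able to unwind every sample; if unwinding fails, only the innermost frame is trustworthy, which is why broken or missing unwind information hurts profiling in particular.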


                    • #20
                      Originally posted by archkde

                      Yes, this approach is very much possible. In fact nearly every tool supports this. The only downside is that when using perf, the stack needs to be copied on every sample because Linux doesn't contain a DWARF parser.
                      Ah, I did not think of this. It does complicate things, but I do not believe it is impossible to solve. Here are some ideas I came up with in a couple of minutes of thinking (so they likely won't work):
                      • Construct (in advance, or JIT and cache) a lookup table from instruction pointer address to frame address. This trades off memory against what could be quite slow processing of the DWARF debug info.
                      • This should solve finding the first stack frame in the chain. I have not examined how parent frames are found in the x86-64 ABI, but if that is also hard, you could store the relevant info on the stack. While this would still be more expensive than omitting it entirely, it should at least free up the register. I think it would have to be the caller who saves it, since the current stack frame is no longer in rbp, and only the caller can know the correct offset from rsp. (This is obviously an ABI break, as I believe RBP is currently callee-saved?)
                        • In fact, thinking about it, in this scheme rbp can be written to the stack once if the compiler only changes rsp on function entry (and uses moves relative to rsp rather than pushes and pops). The important thing is that rbp is saved at a fixed offset from the child functions' stack frames so that it can be found. In practice this means that it should be saved at the lower end of the stack frame (as, IIRC, the stack grows down).
                        • EDIT: If all functions always save RBP at the bottom end of the stack frame, you don't need an instruction-pointer-based lookup table either. This does, however, preclude the use of alloca and similar, and forces stack frames to be fixed size. I'm not sure this is a problem (except with alloca), as incrementing rsp is very cheap, and it should then be fine to just do everything relative to rsp. This effectively combines RSP/RBP into a single register.
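The first idea above, a table from instruction pointer to frame location, can be sketched roughly as follows. This is a simplified model, not any real unwinder: each row assumes that within an address range the canonical frame address (CFA, the caller's RSP just before its `call`) sits at a constant offset from the current RSP, and that the return address is stored at `CFA - 8` as on x86-64. It ignores restoring other callee-saved registers. Real DWARF call frame information (`.eh_frame`) encodes similar rules in a more general form. `UnwindTable`, `unwind_with_table` and all addresses are hypothetical.

```python
import bisect


class UnwindTable:
    """Hypothetical IP -> CFA-offset table, one row per address range."""

    def __init__(self, rows):
        # rows: (start_ip, end_ip, cfa_offset); end_ip exclusive, no overlaps
        self.rows = sorted(rows)
        self.starts = [row[0] for row in self.rows]

    def lookup(self, ip):
        """Return the CFA offset from RSP at `ip`, or None if not covered."""
        i = bisect.bisect_right(self.starts, ip) - 1
        if i >= 0:
            start, end, cfa_offset = self.rows[i]
            if start <= ip < end:
                return cfa_offset
        return None


def unwind_with_table(table, memory, rip, rsp, max_frames=64):
    """Unwind without frame pointers: only RIP and RSP are needed."""
    pcs = [rip]
    while len(pcs) < max_frames:
        cfa_offset = table.lookup(rip)
        if cfa_offset is None:
            break  # no unwind info for this code: stop rather than guess
        cfa = rsp + cfa_offset
        rip = memory[cfa - 8]  # return address pushed by the caller's `call`
        rsp = cfa              # caller's RSP as it was before the `call`
        pcs.append(rip)
    return pcs


# Example: main -> foo -> bar, each with a fixed-size frame (made-up values).
TABLE = UnwindTable([
    (0x401100, 0x401200, 0x10),  # bar
    (0x401500, 0x401600, 0x30),  # foo
    (0x401800, 0x401900, 0x20),  # main
])
STACK = {0x7008: 0x401567, 0x7038: 0x401890, 0x7058: 0x400100}

if __name__ == "__main__":
    print([hex(pc) for pc in unwind_with_table(TABLE, STACK, rip=0x401150, rsp=0x7000)])
```

The sketch deliberately ignores prologues and epilogues, where the offset changes instruction by instruction. It also shows why alloca is awkward: a variable-size allocation means no constant RSP-relative offset exists for that range.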
                      Caveats: This is not my area of expertise, I have not done low-level programming for several years, and I have never done ABI design.
                      Last edited by Vorpal; 01 December 2022, 05:17 PM. Reason: Clarify details.
