Red Hat Developing "eu-stacktrace" For Profiling Without Frame Pointers


  • #11
    So what is new about it? Getting stack traces from .eh_frame data is nothing new, and it never worked reliably for me.


    • #12
      Originally posted by Kjell

      Correct, my point remains the same:
      Debug flags don't belong in production binaries
      Until, of course, the application breaks and the problem is not easily reproducible, so you need the trace from that specific run.


      • #13
        Originally posted by illwieckz
        The Orbit profiler already knows how to profile without frame pointers, and this tool is truly awesome:

        C/C++ Performance Profiler. Contribute to google/orbit development by creating an account on GitHub.


        Though having more tools able to do this would be good.
        not-perf saved my life once when I had to debug a C application without frame pointers that was seemingly stuck in a loop (turns out it wasn't infinite).
        It approximates the perf CLI interface. Also, to get a flamegraph you don't need to clone yet another repo; you just type:
        Code:
        nperf record -l 60 -o perf.data -p <PID>
        nperf flamegraph perf.data -o perf.svg
        Great tool, also easy to build.
        A sampling CPU profiler for Linux. Contribute to koute/not-perf development by creating an account on GitHub.


        • #14
          Using frame pointers when not debugging looks like the pinnacle of stupidity to me. It is not only the few extra instructions per function call (which by itself is non-trivial), but the fact that it ties up a register. This can have catastrophic performance effects on architectures that are short on registers (like x86). It looks like Fedora, Ubuntu and Arch all screwed up.


          • #15
            Originally posted by marios
            Using frame pointers when not debugging looks like the pinnacle of stupidity to me. It is not only the few extra instructions per function call (which by itself is non-trivial), but the fact that it ties up a register. This can have catastrophic performance effects on architectures that are short on registers (like x86). It looks like Fedora, Ubuntu and Arch all screwed up.
            At least Fedora had some toolchain devs with enough brain cells who were vehemently against enabling frame pointers, which is why there were long discussions before the final decision to enable them. It seems Arch and Ubuntu just went ahead and enabled them without discussing or debating the decision. I'm currently on openSUSE and I'm so glad it has not enabled them (and hopefully never will). I also hope this eu-stacktrace thing succeeds, so that all the distros that currently enable frame pointers will disable them in the future.


            • #16
              Originally posted by marios
              Using frame pointers when not debugging looks like the pinnacle of stupidity to me. It is not only the few extra instructions per function call (which by itself is non-trivial), but the fact that it ties up a register. This can have catastrophic performance effects on architectures that are short on registers (like x86). It looks like Fedora, Ubuntu and Arch all screwed up.
              Why do you think frame pointers are only useful for debugging? They are useful for getting a reliable stack trace. They can even be useful for exceptions.


              • #17
                Originally posted by patrick1946

                Why do you think frame pointers are only useful for debugging? They are useful for getting a reliable stack trace. They can even be useful for exceptions.
                Stack traces are mostly used for debugging. In the rare cases where they are needed, frame pointers can be enabled only when compiling that specific code.


                • #18
                  Originally posted by marios
                  Using frame pointers when not debugging looks like the pinnacle of stupidity to me. It is not only the few extra instructions per function call (which by itself is non-trivial), but the fact that it ties up a register. This can have catastrophic performance effects on architectures that are short on registers (like x86). It looks like Fedora, Ubuntu and Arch all screwed up.
                  There is a reason.

                  The Capture Options dialog lets you toggle between two methods of callstack unwinding: DWARF, the default method, and Frame pointers. The Frame pointers method has much less overhead and produces the same results as the DWARF method.
                  Note the words "much less overhead".
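To illustrate why frame pointer unwinding is cheap, here is a toy sketch in Python. It is only a model under stated assumptions: the stack is a plain dictionary, the layout follows the usual x86-64 convention (saved caller frame pointer at `fp`, return address at `fp + 8`), and the addresses and the names `build_toy_stack` and `unwind_with_frame_pointers` are made up for this example. It is not how Orbit, Sysprof or eu-stacktrace are implemented.

```python
WORD = 8  # bytes per stack slot on a 64-bit target (assumption of this toy model)


def unwind_with_frame_pointers(memory, fp, max_depth=64):
    """Collect return addresses by following the chain of saved frame pointers.

    memory: dict mapping a stack address to the 64-bit value stored there.
    fp:     frame pointer of the innermost frame; 0 marks the end of the chain.
    """
    trace = []
    while fp != 0 and len(trace) < max_depth:
        trace.append(memory[fp + WORD])  # return address sits just above the saved fp
        fp = memory[fp]                  # hop to the caller's frame
    return trace


def build_toy_stack():
    """Three hypothetical frames; every address here is invented for illustration."""
    memory = {
        0x7000: 0x7100, 0x7008: 0x401234,  # innermost frame
        0x7100: 0x7200, 0x7108: 0x401500,  # its caller
        0x7200: 0x0,    0x7208: 0x401900,  # outermost frame, chain ends here
    }
    return memory, 0x7000


if __name__ == "__main__":
    memory, fp = build_toy_stack()
    for depth, addr in enumerate(unwind_with_frame_pointers(memory, fp)):
        print(f"#{depth} {addr:#x}")
```

Each frame costs two memory reads here. A DWARF-based unwinder instead has to find the call frame information that covers each return address and interpret it to recover the caller's registers, which is where the extra profiling overhead comes from.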

                  And from this Red Hat work:
                  It is important to note that, unlike with overhead due to profiling, slowdown due to frame pointers occurs regardless of whether a particular system is being profiled or will ever need to be profiled. Thus, approximately 1% overhead with eu-stacktrace only during profiling is a reasonable tradeoff over 0…2% overhead for frame pointer inclusion on every system, all of the time. The overhead could be further reduced by making eu-stacktrace accessible via a library API rather than a fifo, at the cost of requiring more complex modifications to the profiling tools that use it.
                  Do note that the frame pointer cost is between 0 and 2 percent all the time, whereas profiling without frame pointers ends up with a constant overhead only while profiling. There is a catch: profile-guided optimization can gain more performance than frame pointers cost.
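The tradeoff in the quoted passage can be put into numbers with a small Python sketch. The function names and the idea of a "profiled fraction" of runtime are assumptions made for this illustration; the only inputs taken from the thread are the 0 to 2 percent frame pointer range and the 1.18% figure from the quoted benchmark.

```python
def profiling_only_cost(unwinder_overhead_pct, profiled_fraction):
    """Average overhead of DWARF unwinding, which is only paid while a profiler runs."""
    return unwinder_overhead_pct * profiled_fraction


def break_even_fraction(fp_overhead_pct, unwinder_overhead_pct):
    """Share of runtime spent profiling at which both approaches cost the same.

    A result above 1.0 means frame pointers are cheaper even when profiling nonstop.
    """
    return fp_overhead_pct / unwinder_overhead_pct


if __name__ == "__main__":
    unwinder = 1.18  # % while profiling, from the quoted stress-ng benchmark
    for fp in (0.5, 1.0, 2.0):  # sample points inside the quoted 0 to 2% range
        frac = break_even_fraction(fp, unwinder)
        print(f"frame pointers at {fp:.1f}%: break-even at {frac:.0%} of runtime profiled")
```

This treats both overheads as simple averages over runtime, which is a simplification: it ignores any gain from profile-guided optimization and any difference between machines.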

                  There is another catch here: that 1% cost of eu-stacktrace is measured today, on a current CPU with more registers. Running the "stack unwinding algorithms" on DWARF data costs more than using frame pointers would to get the profile information that profile-guided optimization needs to build the final binaries.

                  When you wrote x86, I guess you meant 32-bit x86. Using frame pointers on 32-bit x86 with its limited registers in fact makes more sense than doing what Google Orbit or eu-stacktrace do to get the profile data that you need for profile-guided optimization.

                  Yes, you can dig a real benchmark out of the Red Hat work:

                  To give an initial idea of the CPU overhead of eu-stacktrace unwinding compared to Sysprof’s default mode of operation, I used Sysprof with and without eu-stacktrace to profile a system that was running the stress-ng "matrix" benchmark, invoked with stress-ng --matrix 0 -t 30s. On a system that was otherwise lightly loaded, using Sysprof with the default frame pointer profiling resulted in 0.09% of the samples coming from the sysprof-cli profiler process, while profiling with eu-stacktrace resulted in 1.18% of the samples coming from sysprof-cli and eu-stacktrace.

                  The overhead of the elfutils unwinder scales with the number of distinct processes for which eh_frame data needs to be processed, rather than with the number of samples. After launching several desktop applications and re-running the benchmark, the profiling overhead rose to 1.39% of the total samples.
                  Yes, frame pointer profiling has a 0.09% CPU load cost, whereas the DWARF method (the method the Red Hat developer is classing as new) goes up to 1.18%. This gets worse for the DWARF method as you go to lower-performing CPUs.

                  marios, basically today it makes sense to accept the CPU processing cost of the DWARF method without frame pointers for profiling. 20 years ago it did not make sense to use the DWARF method for profiling, because the cost in binary building was too high.

                  I remember times when profiling with frame pointers had a 5 to 10 percent CPU cost. With the 13 times worse overhead of the DWARF method, that was simply so much load that you could not run applications if you attempted to profile that way.

                  Frame pointer profiling makes sense on systems without much CPU power, where this form of profiling costs a few percentage points of overhead, and yes, you have to live with the constant 0 to 2 percent overhead of having frame pointers. DWARF-based profiling using .eh_frame information makes sense on modern systems with enough CPU power that profiling does not consume so much CPU time that it makes the profiling results incorrect.

                  marios, the trap here is that frame pointer profiling for profile-guided optimization makes sense on low-performance systems. Yes, you take the 0 to 2% all-the-time overhead on low-performance systems so that you can in fact profile and use profile-guided optimization, which can gain you between 0 and 15% performance. On those low-end systems, running the profiling application with frame pointers will consume 5 to 10 percent of your CPU time. If frame pointer profiling consumes 5 to 10 percent, DWARF profiling will consume 65% to over 100% of the CPU processing power, resulting in the application being unable to run correctly while being profiled.
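The "13 times" and "65% to over 100%" figures follow from simple arithmetic on the quoted benchmark, sketched below in Python. The constant and function names are invented for this example, and the linear extrapolation is the assumption made in this post rather than a measured result; the quoted article itself says the unwinder's overhead scales with the number of distinct processes, not with the number of samples.

```python
FP_SAMPLE_SHARE = 0.09     # % of samples in sysprof-cli with frame pointers (quoted benchmark)
DWARF_SAMPLE_SHARE = 1.18  # % of samples in sysprof-cli plus eu-stacktrace (quoted benchmark)


def overhead_ratio(fp_pct=FP_SAMPLE_SHARE, dwarf_pct=DWARF_SAMPLE_SHARE):
    """How many times more profiler overhead the DWARF path showed in the benchmark."""
    return dwarf_pct / fp_pct


def extrapolated_dwarf_cost(fp_cost_pct, ratio):
    """Assume the same ratio holds on a slower CPU (an assumption, not a measurement)."""
    return fp_cost_pct * ratio


if __name__ == "__main__":
    print(f"measured ratio: {overhead_ratio():.1f}x")
    for fp_cost in (5.0, 10.0):  # frame pointer profiling cost on an older machine
        cost = extrapolated_dwarf_cost(fp_cost, 13)
        print(f"{fp_cost:.0f}% with frame pointers -> {cost:.0f}% with DWARF at 13x")
```

Anything at or above 100% means the profiler alone would need more than one full CPU, which is the "cannot profile at all" case described above.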

                  Yes, on older 32-bit x86 CPUs the DWARF method of profiling that Red Hat just did is a non-workable option, because attempting to profile will not work. Yes, you need a profile to pass back into the compiler for profile-guided optimization, so that the compiler lays out the code paths close to ideal.

                  Yes, 13 times more CPU load using the eu-stacktrace/DWARF method when profiling, compared to frame pointer based profiling, is not a small difference. When profiling with frame pointers costs 0.09 percent CPU load this is not a problem, but when profiling with frame pointers costs 5 to 10% because you are on a low-performance CPU, this 13 times worse CPU load is a total disaster: you cannot profile the application at all.

                  DWARF .eh_frame unwinding, when it is used, always has a high CPU cost. Basically, DWARF .eh_frame profiling and frame pointer profiling will both have their places, and which one fits depends directly on how powerful a CPU you are working on and therefore what the profiling CPU cost percentage is. If you can get away with DWARF .eh_frame profiling, the resulting binary will gain between 0 and 2 percent in normal usage compared to having frame pointers and profiling with them. Yes, a 0% gain is a possible outcome for vastly higher CPU usage while profiling.


                  • #19
                    Originally posted by marios

                    Stack traces are mostly used for debugging. In the rare cases where they are needed, frame pointers can be enabled only when compiling that specific code.
                    Stack traces are really useful for telemetry and crash dumpers. If something went wrong, it is really useful to know the stack trace. And very often that only happens on the customer's side.

                    And you cannot enable it only for specific code, because you need it for all libraries.


                    • #20
                      Originally posted by oiaohm

                      Frame pointer profiling makes sense on systems without much CPU power, where this form of profiling costs a few percentage points of overhead, and yes, you have to live with the constant 0 to 2 percent overhead of having frame pointers. DWARF-based profiling using .eh_frame information makes sense on modern systems with enough CPU power that profiling does not consume so much CPU time that it makes the profiling results incorrect.

                      If you have a faster CPU, you very often need more traces. If you increase the resolution, you get a much better picture of what is going on. So I am not really sold on the .eh_frame method.
