OpenBenchmarking.org / PTS Adds Automated Per-Test Analysis Of CPU Instruction Set Usage

    Phoronix: OpenBenchmarking.org / PTS Adds Automated Per-Test Analysis Of CPU Instruction Set Usage

    For those wondering how AVX-heavy a particular program being benchmarked is, or whether a given program/benchmark can make use of new instruction set extensions such as Vector AES or the forthcoming AVX VNNI and AMX, the Phoronix Test Suite and OpenBenchmarking.org can now provide that insight on a per-test basis for common CPU instruction set extensions...

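    A rough sketch of the general technique for the curious (this is not the Phoronix Test Suite's actual implementation): disassemble a binary with objdump and tally mnemonics against an extension map. The mapping below is deliberately tiny and illustrative, not exhaustive.

        # Sketch only: statically tally instruction-set-extension usage in a binary.
        # Not PTS's implementation; the mnemonic map is illustrative, not exhaustive.
        import subprocess
        import sys
        from collections import Counter

        EXTENSIONS = {
            "addps": "SSE",
            "addpd": "SSE2",
            "vaddps": "AVX",
            "vaddpd": "AVX",
            "vfmadd231pd": "FMA",
            "aesenc": "AES-NI",
        }

        def tally(binary_path):
            disasm = subprocess.run(["objdump", "-d", binary_path],
                                    capture_output=True, text=True, check=True).stdout
            counts = Counter()
            for line in disasm.splitlines():
                # Disassembly lines look like: "  401000:<tab>hex bytes<tab>vaddps ..."
                parts = line.split("\t")
                if len(parts) >= 3:
                    mnemonic = parts[2].split()[0]
                    if mnemonic in EXTENSIONS:
                        counts[EXTENSIONS[mnemonic]] += 1
            return counts

        if __name__ == "__main__":
            for ext, n in tally(sys.argv[1]).most_common():
                print(f"{ext}: {n} occurrences")

    Note that a static tally only shows presence in the binary, not how often each instruction is actually executed; that limitation comes up in the comments below.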

  • #2
    Originally posted by atomsymbol

    In my opinion this is a very nice idea, but it needs to evolve further to improve the accuracy of relating the detected SSE/AVX instructions to the performance of a particular benchmark. For example, using ADDPD at runtime is most likely better than using ADDPS, but the c-ray data (https://openbenchmarking.org/test/pts/c-ray) is currently unable to show what percentage of the additions executed by c-ray are ADDPD.
    With the prior implementation from 9 years ago I was keeping track of instruction counts, but with the new code that data isn't exposed on OpenBenchmarking.org. If there is enough interest, or when I find the time to improve that aspect some more, I'll consider it; I just don't want information overload.
    Michael Larabel
    https://www.michaellarabel.com/
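
    For what it's worth, the scalar-versus-packed, single-versus-double split atomsymbol asks about can be measured at runtime on recent Intel CPUs with perf's FP retirement counters. A hedged sketch, assuming the fp_arith_inst_retired.* events (present on Skylake and newer Intel cores; other CPUs use different names or lack them):

        # Sketch: runtime mix of scalar vs. packed double-precision FP instructions
        # measured with Linux perf. The fp_arith_inst_retired.* events exist on
        # recent Intel cores (Skylake and newer); event names differ elsewhere.
        import subprocess
        import sys

        EVENTS = [
            "fp_arith_inst_retired.scalar_double",
            "fp_arith_inst_retired.128b_packed_double",
            "fp_arith_inst_retired.256b_packed_double",
        ]

        def fp_mix(cmd):
            # "perf stat -x ," writes CSV lines "value,unit,event,..." to stderr.
            res = subprocess.run(
                ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS)] + cmd,
                capture_output=True, text=True)
            counts = {}
            for line in res.stderr.splitlines():
                fields = line.split(",")
                if len(fields) > 2 and fields[2] in EVENTS:
                    counts[fields[2]] = int(fields[0]) if fields[0].isdigit() else 0
            total = sum(counts.values()) or 1
            for event, n in counts.items():
                print(f"{event}: {n} ({100.0 * n / total:.1f}% of counted FP ops)")

        if __name__ == "__main__":
            fp_mix(sys.argv[1:])  # e.g. pass the benchmark command to run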

    • #3
      Wow! I would expect video encoding/decoding to show up on such a metric, but I haven't seen a quantification of this before.

      • #4
        Originally posted by Michael View Post
        If there is enough interest, or when I find the time to improve that aspect some more, I'll consider it; I just don't want information overload.
        It's a nice addition, but sadly it doesn't give one a sense of how instrumental the instructions are to the overall performance picture. A mere support library that isn't used in any performance hotspot could be the only place certain instructions appear, leading to a false impression. It would be ideal to know how frequently certain instructions are actually executed.

        Anyway, it's better than what we had before, so thanks for that.

        Also, I find the CPU core-scaling metric a bit confusing. Is it aware of whether a test measures time-to-complete or ops/sec? Those should scale inversely. And is the Y-axis of that graph the raw score, or the score per core?
        Last edited by coder; 31 January 2021, 05:46 PM.
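
        To make the inverse relationship concrete, a tiny illustration with made-up numbers (not from any PTS result):

            # Illustrative numbers only: a lower-is-better and a higher-is-better
            # metric must be normalized differently when computing core scaling.
            t1, t4 = 100.0, 25.0      # seconds to complete on 1 core vs. 4 cores
            ops1, ops4 = 50.0, 200.0  # operations/second on 1 core vs. 4 cores

            scaling_time = t1 / t4    # lower-is-better: invert the ratio
            scaling_ops = ops4 / ops1 # higher-is-better: plain ratio

            print(scaling_time, scaling_ops)  # both 4.0, i.e. perfect scaling

        If both kinds were treated the same way, the time-based test would appear to scale at 0.25x.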

        • #5
          Originally posted by phoronix View Post
          For those wondering ... OpenBenchmarking.org can now provide that insight on a per-test basis for common CPU instruction set extensions...
          Equally interesting would be an analysis of which security-mitigation instructions are being used, and perhaps retpoline detection, to determine whether these benchmarks were compiled with or without mitigations. I.e., when we see openSUSE and Debian beating Clear Linux, sometimes this is due to a newer compiler's performance regression, but I suspect it is sometimes because Clear was compiled with mitigations where the others were not, and sometimes the regression may simply be because the newer compiler supports more mitigations. You could be the first to expose these details. ;-)
          Last edited by linuxgeex; 31 January 2021, 07:09 PM.

          • #6
            Originally posted by linuxgeex View Post
            Equally interesting would be an analysis of which security-mitigation instructions are being used, and perhaps retpoline detection, to determine whether these benchmarks were compiled with or without mitigations.
            It's a good point, but AFAIK most mitigations do not boil down to a single instruction. You'd at least need to search for certain instruction sequences, and possibly even do some graph analysis.

            • #7
              Michael, take a look at https://github.com/RRZE-HPC/likwid . I was doing this kind of AVX and FP instruction profiling back in 2012 or so. Very interesting stuff; it gives you a better hands-on understanding of the roofline performance model as applied to different applications.

              • #8
                phoronix any comment on this?
                Original post: https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1235232-kde-ends-out-january-with-a-lot-of-fixes-for-plasma-5-21?p=1235262#post1235262
                Source: https://www.jwz.org/blog/2021/01/i-told-you-so-2021-edition/
                Why hasn't this been covered?


                We are not about censorship here, are we?

                • #9
                  Originally posted by coder View Post
                  It's a good point, but AFAIK most mitigations do not boil down to a single instruction. You'd at least need to search for certain instruction sequences, and possibly even do some graph analysis.
                  Kernel-side you're right, but userspace-side, as far as I'm aware, only cache/speculation-control instructions and retpolines are in use. If you have evidence to the contrary, I'd be very interested to read about it.
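
                  A crude sketch of that userspace case: compilers that emit retpolines (GCC's -mindirect-branch=thunk, Clang's -mretpoline) generate __x86_indirect_thunk_* symbols, which an unstripped x86-64 ELF exposes in its symbol table. Stripped binaries would need the byte-sequence matching mentioned above.

                      # Crude sketch: detect retpolines via compiler-emitted thunk symbols.
                      # Works only on unstripped binaries; a stripped binary would need
                      # byte-pattern matching of the thunk sequence instead.
                      import subprocess
                      import sys

                      def has_retpoline_thunks(binary_path):
                          symbols = subprocess.run(["nm", "--defined-only", binary_path],
                                                   capture_output=True, text=True).stdout
                          return any("__x86_indirect_thunk_" in line
                                     for line in symbols.splitlines())

                      if __name__ == "__main__":
                          found = has_retpoline_thunks(sys.argv[1])
                          print("retpoline thunks", "found" if found else "not found")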

                  • #10
                    Originally posted by pegasus View Post
                    Michael, take a look at https://github.com/RRZE-HPC/likwid . I was doing this kind of avx and fp instruction profiling already back in 2012 or so. Very interesting stuff, gives you a better hands-on understanding of the roofline performance model applied to different applications.
                    I also wrote a static analysis tool for binaries and libraries, https://github.com/baryluk/elf-opcode-stats , which can give you a sense of what instructions and registers are used in general (not how many times each is actually executed). It is very crude, but I do find it quite useful when compiling complex projects. It probably doesn't work on ARM, but I hope to fix that soon.
