OpenBenchmarking.org / PTS Adds Automated Per-Test Analysis Of CPU Instruction Set Usage

  • OpenBenchmarking.org / PTS Adds Automated Per-Test Analysis Of CPU Instruction Set Usage

    Phoronix: OpenBenchmarking.org / PTS Adds Automated Per-Test Analysis Of CPU Instruction Set Usage

    For those wondering how, say, AVX-heavy a particular program being benchmarked is, or whether a given program/benchmark can make use of new instruction set extensions such as Vector AES or the forthcoming AVX VNNI or AMX, the Phoronix Test Suite and OpenBenchmarking.org can now provide that insight on a per-test basis for common CPU instruction set extensions...

    http://www.phoronix.com/scan.php?pag...truction-Usage

  • #2
    Originally posted by phoronix View Post
    Phoronix: OpenBenchmarking.org / PTS Adds Automated Per-Test Analysis Of CPU Instruction Set Usage

    For those wondering how, say, AVX-heavy a particular program being benchmarked is, or whether a given program/benchmark can make use of new instruction set extensions such as Vector AES or the forthcoming AVX VNNI or AMX, the Phoronix Test Suite and OpenBenchmarking.org can now provide that insight on a per-test basis for common CPU instruction set extensions...

    http://www.phoronix.com/scan.php?pag...truction-Usage
    In my opinion, this is a very nice idea, but it needs to evolve further to more accurately relate the detected SSE/AVX instructions to the performance of a particular benchmark. For example, using ADDPD at runtime is most likely better than using ADDPS, but the c-ray data (https://openbenchmarking.org/test/pts/c-ray) is currently unable to show what percentage of the additions executed by c-ray are ADDPD.
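    As a rough illustration of the kind of breakdown I mean (the mnemonic stream below is a made-up sample, not real c-ray data), one could count what share of the packed FP additions in a disassembly listing are double-precision:

    ```python
    from collections import Counter

    # Hypothetical mnemonic stream, e.g. as extracted from `objdump -d`
    # output; not actual c-ray data.
    SAMPLE_MNEMONICS = ["addpd", "addps", "addpd", "mulpd", "addpd", "addps", "vaddpd"]

    def addpd_share(mnemonics):
        """Return the fraction of packed FP additions that are double-precision."""
        counts = Counter(m.lstrip("v") for m in mnemonics)  # fold VEX forms (vaddpd -> addpd)
        adds = counts["addpd"] + counts["addps"]
        return counts["addpd"] / adds if adds else 0.0

    print(f"{addpd_share(SAMPLE_MNEMONICS):.0%} of packed additions are ADDPD")
    ```

    With a real per-test breakdown like that, the SSE/AVX data would say much more about expected performance.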



    • #3
      Originally posted by atomsymbol View Post

      In my opinion, this is a very nice idea, but it needs to evolve further to more accurately relate the detected SSE/AVX instructions to the performance of a particular benchmark. For example, using ADDPD at runtime is most likely better than using ADDPS, but the c-ray data (https://openbenchmarking.org/test/pts/c-ray) is currently unable to show what percentage of the additions executed by c-ray are ADDPD.
      With the prior implementation from nine years ago I was keeping track of instruction counts, but with the new code that isn't being exposed on OpenBenchmarking.org. If there is enough interest, or when I find the time to improve that aspect some more, I'll consider it; I just don't want information overload.
      Michael Larabel
      http://www.michaellarabel.com/



      • #4
        Wow! I would expect video encoding/decoding to rank highly on a metric like this, but I haven't seen it quantified this way before.



        • #5
          Originally posted by Michael View Post
          If there is enough interest, or when I find the time to improve that aspect some more, I'll consider it; I just don't want information overload.
          It's a nice addition, but sadly doesn't give one a sense of how instrumental the instructions are to the overall performance picture. I mean, a mere support library that's not used in any performance hotspots could be the only place using certain instructions, leading to a false impression. It would be ideal to know the frequency with which certain instructions are actually getting used.

          Anyway, it's better than we had before, so thanks for that.

          Also, I'm finding the CPU core-scaling metric a bit confusing. Is it aware of whether the test measures time-to-complete or ops/sec? Because they should scale inversely. Also, is the Y-axis of that graph the raw score, or the score per-core?
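          To sketch the false-impression scenario (all function names and counts below are made up), compare where the AVX instructions sit statically against where the runtime actually goes:

          ```python
          # Hypothetical data: static AVX instruction counts per function (from a
          # disassembly pass) and runtime sample counts per function (from a profiler).
          static_avx = {"render_hot_loop": 120, "helper_lib_init": 300}
          samples = {"render_hot_loop": 9800, "helper_lib_init": 5}

          def static_vs_runtime(static_counts, sample_counts):
              """Per function: (share of static AVX instructions, share of runtime samples)."""
              total_static = sum(static_counts.values())
              total_samples = sum(sample_counts.values())
              return {
                  f: (static_counts[f] / total_static, sample_counts.get(f, 0) / total_samples)
                  for f in static_counts
              }

          for f, (s, r) in static_vs_runtime(static_avx, samples).items():
              print(f"{f}: {s:.0%} of static AVX instructions, {r:.1%} of runtime samples")
          ```

          Here the support library holds most of the AVX instructions statically yet accounts for almost none of the runtime, which is exactly the misleading case.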
          Last edited by coder; 31 January 2021, 05:46 PM.



          • #6
            Originally posted by phoronix View Post
            For those wondering ... OpenBenchmarking.org can now provide that insight on a per-test basis with common CPU instruction set extensions...
            Equally interesting would be an analysis of which security-mitigation instructions are being used, and perhaps retpoline detection, to determine whether these benchmarks were compiled with or without mitigations. I.e., when we see openSUSE and Debian beating Clear Linux, sometimes that is due to a performance regression in a newer compiler, but I suspect it is sometimes because Clear Linux compiled with mitigations where the others did not, and sometimes the regression may simply be because the newer compiler supports more mitigations. You could be the first to expose these details. ;-)
            Last edited by linuxgeex; 31 January 2021, 07:09 PM.



            • #7
              Originally posted by linuxgeex View Post
              Equally interesting would be an analysis of which security-mitigation instructions are being used, and perhaps retpoline detection, to determine whether these benchmarks were compiled with or without mitigations.
              It's a good point, but AFAIK most mitigations do not simply boil down to a single instruction. You'd at least need to search for certain sequences and possibly even have to do some graph analysis.
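              That said, retpolines at least leave an easy signal: GCC and Clang emit them as named thunks (`__x86_indirect_thunk_*`), so a symbol-table scan can flag them. A minimal sketch, using a made-up symbol dump rather than a real binary:

              ```python
              import re

              # Hypothetical `nm` output; a real check would run `nm` on the benchmark binary.
              SYMBOLS = [
                  "0000000000401000 T main",
                  "0000000000401200 t __x86_indirect_thunk_rax",
                  "0000000000401210 t __x86_indirect_thunk_r11",
              ]

              THUNK_RE = re.compile(r"__x86_indirect_thunk_\w+$")

              def has_retpolines(symbol_lines):
                  """Heuristic: retpoline-compiled binaries carry indirect-branch thunk symbols."""
                  return any(THUNK_RE.search(line) for line in symbol_lines)

              print("retpoline thunks found" if has_retpolines(SYMBOLS) else "no retpoline thunks")
              ```

              Stripped binaries would defeat this, of course, which is where your sequence/graph analysis would come in.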



              • #8
                Michael, take a look at https://github.com/RRZE-HPC/likwid . I was doing this kind of AVX and FP instruction profiling back in 2012 or so. Very interesting stuff; it gives you a better hands-on understanding of the roofline performance model as applied to different applications.



                • #9
                  phoronix any comment on this?
                  https://www.phoronix.com/forums/foru...ative-projects

                  We are not about censorship here, are we?



                  • #10
                    Originally posted by coder View Post
                    It's a good point, but AFAIK most mitigations do not simply boil down to a single instruction. You'd at least need to search for certain sequences and possibly even have to do some graph analysis.
                    Kernel-side you're right, but userspace-side, as far as I'm aware, only cache/speculation-control instructions and retpolines are in use. If you have evidence to the contrary, I'd be very interested to read about it.

