Clear Linux Now Riding On Linux 4.8.1, Ships AVX2-Optimized Python


  • #21
    Originally posted by Serafean View Post
    How exactly are they enabling AVX instructions? Recompiling with -mavx(2)? or patching the code?
    We compile each Python module's .so files twice: once with the default (distro) flags, and once with -mavx2 (plus turning on the vectorizer, etc.).
    So there are two files for each shared library (a .so and a .so.avx2). At runtime the Python interpreter checks whether AVX2 is supported: if it is, it picks the .so.avx2 over the .so file; if not, the .so file is used. Both come from the same source files of the Python module, so they are identical in functionality and differ only in compiler flags.

    Most Python module .so files exist so that Python code can thunk into performance-sensitive code, usually math, which is where AVX2 can actually make a difference.
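The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Clear Linux patch itself: the function names `cpu_flags` and `pick_shared_object` are made up here, and reading `/proc/cpuinfo` is an assumed way of detecting AVX2 on Linux.

```python
import os


def cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported by the Linux kernel."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or /proc unavailable: assume no extra features
    return set()


def pick_shared_object(so_path, flags, exists=os.path.exists):
    """Prefer '<so_path>.avx2' when the CPU has AVX2 and that file exists.

    Hypothetical helper: falls back to the plain .so in every other case.
    """
    avx2_path = so_path + ".avx2"
    if "avx2" in flags and exists(avx2_path):
        return avx2_path
    return so_path
```

A caller would use something like `pick_shared_object(path, cpu_flags())`. Because the fallback is always the plain .so, a machine without AVX2 (or a module that ships no .avx2 variant) behaves exactly as before.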

    Comment


    • #22
      Originally posted by arjan_intel View Post

      We compile each Python module's .so files twice: once with the default (distro) flags, and once with -mavx2 (plus turning on the vectorizer, etc.).
      So there are two files for each shared library (a .so and a .so.avx2). At runtime the Python interpreter checks whether AVX2 is supported: if it is, it picks the .so.avx2 over the .so file; if not, the .so file is used. Both come from the same source files of the Python module, so they are identical in functionality and differ only in compiler flags.

      Most Python module .so files exist so that Python code can thunk into performance-sensitive code, usually math, which is where AVX2 can actually make a difference.
      Thank you!
      Pretty much what I expected. I guess I'm already covered with Gentoo using -march=native...

      Comment


      • #23
        Originally posted by Serafean View Post

        Thank you!
        Pretty much what I expected. I guess I'm already covered with Gentoo using -march=native...
        Make sure Python modules use your flags as well when they are built; Python modules are built... funky.
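One way to check this is to look at the flags the interpreter recorded at build time, since Python's extension build tooling generally starts from those and may add `CFLAGS` from the environment. The sketch below only uses the standard `sysconfig` module. The helper names are invented for illustration, and whether a given package's build actually honours these flags is an assumption you would still need to verify per package.

```python
import os
import sysconfig


def extension_build_flags():
    """Collect the compiler flags an extension build is likely to start from."""
    return {
        # Flags recorded when the interpreter itself was built.
        "sysconfig CFLAGS": sysconfig.get_config_var("CFLAGS") or "",
        "sysconfig OPT": sysconfig.get_config_var("OPT") or "",
        # Flags from the current environment (e.g. set by the package manager).
        "env CFLAGS": os.environ.get("CFLAGS", ""),
    }


def mentions_flag(flags, wanted="-march=native"):
    """True if any of the collected flag strings contains `wanted` as a word."""
    return any(wanted in value.split() for value in flags.values())


if __name__ == "__main__":
    collected = extension_build_flags()
    for source, value in collected.items():
        print(f"{source}: {value}")
    print("-march=native present:", mentions_flag(collected))
```

If `-march=native` shows up in none of these, extension modules may well be getting built with generic flags even on a system that otherwise uses it everywhere.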

        Comment


        • #24
          Originally posted by uid313 View Post
          Does Ubuntu 16.10 have Linux 4.8.0, 4.8.1 or 4.8.2?

          Its name is something like 4.8.0-22.24. Not sure exactly how to interpret that.
          That means a base of 4.8.0 plus backported patches. You'd have to look at the package sources directly to figure out what got backported and what didn't.
          All opinions are my own not those of my employer if you know who they are.
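For reading such a string mechanically, here is a small sketch. It assumes the usual Ubuntu convention that the part before the dash is the upstream base version and the two numbers after it are the ABI number and the upload number; the function name is made up, and the string tells you nothing about which individual patches were backported.

```python
import re


def parse_ubuntu_kernel_version(version):
    """Split e.g. '4.8.0-22.24' into upstream base, ABI and upload numbers.

    Assumes the '<base>-<abi>.<upload>' layout; raises ValueError otherwise.
    """
    m = re.fullmatch(r"(\d+\.\d+\.\d+)-(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unexpected kernel version format: {version!r}")
    base, abi, upload = m.groups()
    return {"upstream_base": base, "abi": int(abi), "upload": int(upload)}


if __name__ == "__main__":
    print(parse_ubuntu_kernel_version("4.8.0-22.24"))
```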

          Comment


          • #25
            Originally posted by Tomin View Post

            Oh, you mean Zen. At first I thought you were actually talking about Xen and I could not understand much.

            I hope AMD will use Zen instead of FX in their marketing for Zen processors. That would sound much better, because FX is associated with something slow nowadays in the minds of computer builders (well, that's what I think anyway).
            I fixed it. That was a silly mistake. I guess in my head I would pronounce Xen and Zen the same and I confused it with the hypervisor.

            Comment


            • #26
              But how does it pick up the right version? Where is the dispatching logic?

              Dispatching logic for features other than MMX and SSE2 has been missing from distros for some time. If we could get that, it could be used more widely by more projects and distros.

              Comment


              • #27
                Originally posted by carewolf View Post
                But how does it pick up the right version? Where is the dispatching logic?

                Dispatching logic for features other than MMX and SSE2 has been missing from distros for some time. If we could get that, it could be used more widely by more projects and distros.
                Ahh: https://github.com/clearlinux-pkgs/g...like-tls.patch

                Though for general practicality we should probably also have an SSE4.1 version, similar to the base optimization Clear Linux does.
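The linked patch is truncated in this thread and is not reproduced here. As a rough sketch of the general idea of tiered dispatch by directory, the code below builds a library search order from the CPU's feature flags, most specific tier first. The `avx2` subdirectory name follows the path mentioned later in the thread; the `sse4.1` tier and all function and variable names are hypothetical.

```python
import posixpath

# Ordered from most to least specific. Each entry is
# (subdirectory name, CPU flags required to use it).
# The "sse4.1" tier is a hypothetical extra level, not an existing layout.
TIERS = [
    ("avx2", {"avx2"}),
    ("sse4.1", {"sse4_1"}),
]


def candidate_dirs(lib_dir, flags):
    """Return directories to search, optimized tiers first, base dir last."""
    dirs = [
        posixpath.join(lib_dir, sub)
        for sub, needed in TIERS
        if needed <= flags  # every required flag is present
    ]
    dirs.append(lib_dir)  # the generic build is always a valid fallback
    return dirs


if __name__ == "__main__":
    print(candidate_dirs("/usr/lib64", {"sse2", "sse4_1", "avx2"}))
    print(candidate_dirs("/usr/lib64", {"sse2"}))
```

A loader following this order would take the first directory that actually contains the requested library, so a package only needs to ship optimized copies of the libraries where it pays off.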

                Comment


                • #28
                  Originally posted by zboson View Post

                  As long as Clear Linux compiles everything with GCC then AMD probably does not have to worry with regards to Zen. But the Intel C/C++ compiler is known for creating a CPU dispatcher which checks for a genuine Intel tag in CPUID and using crippled code otherwise. AMD also has used XOP and FMA4 in some of their libraries which Intel does not use. That's not nearly as bad as vetoing based on checking if the processor is Intel or AMD rather than checking if the instructions are supported.

                  Zen from what I hear won't support XOP and maybe not FMA4. XOP is actually a good instruction set and FMA4 makes more sense than FMA3 (which has to have several variants because it does not have a fourth operand). It's embarrassing that Intel's instruction set up to AVX2 still does not have an unsigned 64-bit SIMD compare operator (unlike XOP) though AVX512 (if we ever get it) will.

                  But even without XOP and FMA4 there are still reasons you would want to compile different binaries for Intel and AMD. The Bulldozer architecture has several issues with AVX which means you need to treat AVX code for AMD and Intel differently for optimization. So there is a good reason an AMD optimized Linux would be good for the Bulldozer set.

                  In short as long as AMD fixes its problems with AVX in Zen and Clear Linux uses GCC I think there would be no reason for an AMD version of Linux for Zen (assuming Zen does not introduce any newer and better x86 instructions).

                  BTW there are some that suspect that the reason we don't have AVX512 now is because Intel has done so much damage to AMD (and AMD made some stupid decisions with the Bulldozer micro-architecture) that they are laying low waiting for AMD to improve with Zen.
                  Per Wikipedia:

                  AMD explicitly revealed that Zen, its 3rd-generation x86-64 architecture in its first iteration (znver1 – Zen, version 1); would drop support for FMA4 in a patch to the GNU Binutils package.[13] There has been initial confusion regarding whether FMA4 was implemented or not due to errata in the initial patch that has since then been rectified.

                  In March 2015, AMD explicitly revealed in the description of the patch for the GNU Binutils package that Zen, its third-generation x86-64 architecture in its first iteration (znver1 – Zen, version 1), will not support TBM, FMA4, XOP and LWP instructions developed specifically for the "Bulldozer" family of micro-architectures


                  Why did they do this?

                  All SSE5 instructions that were equivalent or similar to instructions in the AVX and FMA4 instruction sets announced by Intel have been changed to use the coding proposed by Intel. Integer instructions without equivalents in AVX were classified as the XOP extension.[1] The XOP instructions have an opcode byte 8F (hexadecimal), but otherwise almost identical coding scheme as AVX with the 3-byte VEX prefix.

                  Intel initially proposed FMA4 in AVX/FMA specification version 3 to supersede the 3-operand FMA proposed by AMD in SSE5. After AMD adopted FMA4, Intel canceled FMA4 support and reverted to FMA3 in the AVX/FMA specification version 5


                  Geez, no wonder devs are scratching their heads and vendors have deer eyes.

                  Comment


                  • #29
                    Originally posted by starshipeleven View Post
                    Let's not confuse bdsm with satanism plz.
                    BSD, BDSM, Satanism, Ubuntu...

                    Aren't they the same in essence?

                    Comment


                    • #30
                      Originally posted by carewolf View Post
                      Ahh: https://github.com/clearlinux-pkgs/g...like-tls.patch

                      Though for general practicality we should probably also have an SSE4.1 version, similar to the base optimization Clear Linux does.
                      That's the generic glibc one for /usr/lib64/avx2; for Python we had to do something in Python itself:

                      Comment
