Python Can Run Up To ~27% Faster On Fedora 32 With Optimization

  • #11
    Originally posted by Weasel View Post
    This doesn't break only with LD_PRELOAD, lol. It breaks if the underlying implementation of the function ever changes, but not its interface (obviously).

    As an example, suppose the compiler decides to inline a specific implementation of 'free' but not 'malloc'. Later 'malloc' changes its implementation to use a more efficient data structure but keeping same interface, obviously. And you don't recompile your code.

    Your code will call the new malloc automatically since it called it before, but then it will use the old inlined 'free' with the old data structure and crash. It's a dynamic library, not a static library.

    It should only be safe with self-hosted libraries by a project or Python, which are always rebuilt only when the entire thing is rebuilt.
    This is incorrect. The optimization can only work if the compiler sees the implementation of free, obviously, which means free (and presumably malloc) must have been part of the code you compiled, not of an external library, so you can never get into the situation you describe.

    What this prevents is overriding functions in the compiled library via ELF interposition.
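
    To make that concrete, here's a minimal sketch of the kind of call the flag affects (hypothetical library and function names, not anything from CPython):

    ```c
    /* mylib.c -- hypothetical example, built as a shared library:
     *   gcc -shared -fPIC [-fno-semantic-interposition] mylib.c -o libmylib.so
     */
    #include <stdio.h>

    /* An exported function that another DSO (or LD_PRELOAD) could interpose. */
    void greet(void) {
        puts("libmylib greet");
    }

    /* Another exported function that calls greet() internally. */
    void run(void) {
        /* Default ELF semantics: this call must go through the PLT, because an
         * interposer may replace greet() at load time, so the compiler cannot
         * inline it. With -fno-semantic-interposition, the compiler is allowed
         * to bind the call to the local definition and inline it. */
        greet();
    }
    ```

    With the default semantics, an LD_PRELOADed greet() wins for every caller; with the flag, run() may keep calling (or may have inlined) the built-in copy. That lost override is exactly the interposition being traded away for speed.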



    • #12
      Originally posted by log0 View Post
      Python needs a jit compiler (even if just for popular archs like x64 and arm64). It would double the performance. Every friggin browser has one nowadays.
      Actually I’d rather see Python remain Python. There are better solutions coming online for people that need performance and a REPL. Think Swift. Even Rust has had various attempts at a REPL. The strength here is the idea of a language designed from the ground up to support being compiled.

      Python was never designed to support being compiled, and frankly was never designed for fast execution. That doesn’t imply anything bad about Python, I like using it, but I don’t want to see it contorted into something it isn’t. The idiocy around the transition to Python 3 was bad enough; just imagine far more difficult code breakages.



      • #13
        Originally posted by wizard69 View Post

        Actually I’d rather see Python remain Python. There are better solutions coming online for people that need performance and a REPL. Think Swift. Even Rust has had various attempts at a REPL. The strength here is the idea of a language designed from the ground up to support being compiled.

        Python was never designed to support being compiled, and frankly was never designed for fast execution. That doesn’t imply anything bad about Python, I like using it, but I don’t want to see it contorted into something it isn’t. The idiocy around the transition to Python 3 was bad enough; just imagine far more difficult code breakages.
        Then use PyPy. It's implemented in RPython, which is more or less Python. Regular Python is largely implemented in C. PyPy is in general a lot faster. It also supports embedding, so you could probably run PyPy in a browser process.
        Last edited by nanonyme; 12 January 2020, 03:12 AM.



        • #14
          I don't know about the performance increase stated in this article concerning that specific CFLAG, but I can say I definitely see a significant increase when compiling Gentoo's package manager with LTO optimizations! My only problem is that I wasn't able to successfully compile the Python interpreter package with LTO optimizations.

          According to a recent article here, Fedora is supposedly going to enable LTO optimizations by default very soon. I presume this article is touching on LTO optimizations.



          • #15
            Originally posted by nivedita View Post
            This is incorrect. The optimization can only work if the compiler sees the implementation of free, obviously, which means free (and presumably malloc) must have been part of the code you compiled, not of an external library, so you can never get into the situation you describe.
            That's right, but it was just an example most people are familiar with. I've no idea what APIs Python uses, so...

            Either way, my point is that it's not just LD_PRELOAD: you won't be able to swap out any .so files when you build with this, or update them separately, even if they keep the same ABI (major version).

            Another example is hotpatching functions without LD_PRELOAD, the way Windows does it. Wine, for example, probably won't work correctly. It builds tons of .so files (well, now also PE files if you have mingw installed), but the APIs themselves can be hot-patched by apps that "hook" them.

            So yeah, keep in mind it's not just LD_PRELOAD; it's any time only part of the project's functions change, get updated, or get hooked.
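
            For reference, the kind of hook being discussed is just another shared object defining the same symbol, something like this (reusing the hypothetical greet() from the sketch in #11):

            ```c
            /* hook.c -- hypothetical interposer, loaded ahead of libmylib:
             *   gcc -shared -fPIC hook.c -o libhook.so
             *   LD_PRELOAD=./libhook.so ./app
             */
            #include <stdio.h>

            /* Same name and signature as libmylib's greet(); the dynamic linker
             * resolves PLT calls to this copy first. Callers inside libmylib that
             * were compiled with -fno-semantic-interposition bypass the PLT and
             * never see this hook. */
            void greet(void) {
                puts("hooked greet");
            }
            ```
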
            Last edited by Weasel; 12 January 2020, 11:51 AM.



            • #16
              Could Python be compiled down to WebAssembly?
              Could Python be fed into LLVM to make it fast?



              • #17
                Originally posted by uid313 View Post
                Could Python be compiled down to WebAssembly?
                Could Python be fed into LLVM to make it fast?
                Experiments have been made with things like that (e.g. the Shed Skin Python-to-C++ compiler) and PyPy's RPython, but fundamentally Python doesn't give the computer enough information for ahead-of-time compilation. That's why the "R" in "RPython" stands for "Restricted".

                For these sorts of things to be decidable at compile time, you need to give the computer more information. (That's also why languages with static types can be faster than languages where every variable has to be able to hold any type and the correct code path is decided at runtime.)

                JIT compilers like PyPy for Python or V8 for JavaScript speed things up by generating optimized code at runtime as they notice that you're doing the same thing again and again (e.g. a function that only ever gets called with the same data types).
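
                As a rough illustration in C (my own sketch, not how PyPy or V8 are actually implemented), this is the runtime type dispatch a dynamic language pays for on every operation, and the statically typed shape a JIT can collapse it into once it has only ever seen one type:

                ```c
                #include <stdio.h>

                typedef enum { TAG_INT, TAG_DOUBLE } Tag;

                typedef struct {
                    Tag tag;
                    union { long i; double d; } v;
                } Value;

                /* Dynamic add: every call inspects the type tags at runtime, much
                 * like an interpreter for a dynamically typed language has to. */
                static Value dyn_add(Value a, Value b) {
                    Value r;
                    if (a.tag == TAG_INT && b.tag == TAG_INT) {
                        r.tag = TAG_INT;
                        r.v.i = a.v.i + b.v.i;
                    } else {
                        r.tag = TAG_DOUBLE;
                        r.v.d = (a.tag == TAG_INT ? (double)a.v.i : a.v.d)
                              + (b.tag == TAG_INT ? (double)b.v.i : b.v.d);
                    }
                    return r;
                }

                /* Static add: types known at compile time, a single machine add --
                 * the code a JIT can emit after observing only integers here. */
                static long static_add(long a, long b) { return a + b; }

                int main(void) {
                    Value a = { TAG_INT, { .i = 2 } };
                    Value b = { TAG_INT, { .i = 3 } };
                    printf("%ld %ld\n", dyn_add(a, b).v.i, static_add(2, 3));
                    return 0;
                }
                ```
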
                Last edited by ssokolow; 14 January 2020, 12:06 AM.



                • #18
                  Originally posted by ssokolow View Post

                  Experiments have been made with things like that (e.g. the Shed Skin Python-to-C++ compiler) and PyPy's RPython, but fundamentally Python doesn't give the computer enough information for ahead-of-time compilation. That's why the "R" in "RPython" stands for "Restricted".

                  For these sorts of things to be decidable at compile time, you need to give the computer more information. (That's also why languages with static types can be faster than languages where every variable has to be able to hold any type and the correct code path is decided at runtime.)

                  JIT compilers like PyPy for Python or V8 for JavaScript speed things up by generating optimized code at runtime as they notice that you're doing the same thing again and again (e.g. a function that only ever gets called with the same data types).
                  I had always thought of using PyPy, but after seeing that it is a Python-to-Python compiler, I was no longer sure whether it would be a drop-in replacement for CPython...



                  • #19
                    Originally posted by tildearrow View Post

                    I had always thought of using PyPy, but after seeing that it is a Python-to-Python compiler, I was no longer sure whether it would be a drop-in replacement for CPython...
                    PyPy is composed of two parts:
                    1. PyPy, a runtime for standard Python with a JIT compiler, written in a restricted subset of Python called RPython.
                    2. The RPython translator, which compiles RPython to native binaries.
                    They've worked hard to make it a drop-in replacement for CPython.

                    The main downside to PyPy is that it has trouble matching CPython performance for bindings implemented from the C side using libcpython, rather than from the Python side using ctypes or cffi, the superior successor the PyPy folks wrote (which also works on CPython). (And since rust-cpython is the only option for writing Rust-Python bindings on stable-channel Rust without having to trust my ability to write memory-safe C API code, I'm stuck writing libcpython-based bindings.)
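
                    For anyone who hasn't seen the difference, a binding "from the C side" means a libcpython extension module along these lines (a minimal sketch; the module name demo is made up). PyPy has to emulate every one of these Py* calls through its cpyext compatibility layer, which is where the overhead comes from:

                    ```c
                    /* demo.c -- a minimal CPython C-API extension module */
                    #define PY_SSIZE_T_CLEAN
                    #include <Python.h>

                    /* demo.add(a, b): parse two Python ints, add them, box the result. */
                    static PyObject *add(PyObject *self, PyObject *args) {
                        long a, b;
                        if (!PyArg_ParseTuple(args, "ll", &a, &b))
                            return NULL;
                        return PyLong_FromLong(a + b);
                    }

                    static PyMethodDef methods[] = {
                        {"add", add, METH_VARARGS, "Add two integers."},
                        {NULL, NULL, 0, NULL}
                    };

                    static struct PyModuleDef moduledef = {
                        PyModuleDef_HEAD_INIT, "demo", NULL, -1, methods
                    };

                    PyMODINIT_FUNC PyInit_demo(void) {
                        return PyModule_Create(&moduledef);
                    }
                    ```

                    A cffi or ctypes binding instead declares the C interface from the Python side, which PyPy's JIT can see through.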



                    • #20
                      Fun fact: Arch Linux decided to follow the same steps and has rebuilt Python with -fno-semantic-interposition. The update from 3.8.1-1 to 3.8.1-4 is 71 MB smaller, which I presume is due to this change.

