If what you want is to add some fast equivalent of eval just embedding LuaJIT in your native, static application will probably result in far better performance than any interpreter calling eval (even if you call compile first).
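To make the CPython side of that comparison concrete, here is a minimal, illustrative sketch (the expression and iteration count are arbitrary) of what calling compile first buys you: a precompiled code object skips reparsing on each call, but execution still goes through the bytecode interpreter either way.

```python
import timeit

# Illustrative sketch: eval() on a raw string must parse and compile the
# expression on every call, while a precompiled code object only has to
# be executed. Both paths still run inside the bytecode interpreter.
a = 1
expr = "a + 2"
code = compile(expr, "<generated-code>", "eval")

assert eval(expr) == 3   # parse + compile + interpret
assert eval(code) == 3   # interpret only

slow = timeit.timeit(lambda: eval(expr), number=10_000)
fast = timeit.timeit(lambda: eval(code), number=10_000)
print(f"string: {slow:.4f}s  precompiled: {fast:.4f}s")
```

The precompiled variant is typically several times faster, which is exactly the gap an embedded JIT like LuaJIT widens further by emitting machine code instead of bytecode.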
Python 3.11 Performance Benchmarks Show Huge Improvement
Originally posted by atomsymbol
There exists:
Code:
$ python
>>> help(compile)
Originally posted by atomsymbol
After reading the following example, I hope that it will be clear what I mean by generating (and then using) Python code at run-time:
Code:
$ python
>>> a=1
>>> code=compile('a+2', '<generated-code>', 'eval')
>>> eval(code)
3
>>> a=-10
>>> eval(code)
-8
>>> print(code)
<code object <module> at 0x7fa164ab1c60, file "<generated-code>", line 1>
>>> print(code.co_code)
b'e\x00d\x00\x17\x00S\x00'
>>> import dis
>>> dis.dis(code.co_code)
          0 LOAD_NAME                0 (0)
          2 LOAD_CONST               0 (0)
          4 BINARY_ADD
          6 RETURN_VALUE
>>> print(code.co_consts)
(2,)
Instead, it is an internal representation, Python bytecode, which in turn has to be interpreted by the CPython virtual machine.
And the biggest problem of CPython is not interpretation but the GIL, which makes it impossible to benefit from multithreading in Python except when performing I/O or calling external FFI functions that release the GIL.
That makes Python code hard to scale.
Even if you have 32 cores, your Python code executes on a single thread at a time.
Creating multiple threads would not only fail to speed it up, but would actually slow it down due to contention on the GIL...
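A minimal sketch of that limitation (the countdown workload and thread count are arbitrary): CPU-bound work split across threads runs correctly, but only one thread executes Python bytecode at any instant.

```python
import threading

# CPU-bound work split across 4 threads. Because of the GIL, only one
# thread runs Python bytecode at a time, so on CPython this gains nothing
# over running the loops sequentially, whatever the core count.
def count_down(n, results, i):
    while n > 0:
        n -= 1
    results[i] = "done"

results = [None] * 4
threads = [threading.Thread(target=count_down, args=(500_000, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert results == ["done"] * 4   # correct results, just no parallel speedup
```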
Originally posted by NobodyXu
And the biggest problem of CPython is not interpretation but the GIL, which makes it impossible to benefit from multithreading in Python except when performing I/O or calling external FFI functions that release the GIL.
There are many places to look at for causes because high level languages hide a lot of complexity below them (that's why we love them!). Lots of allocations, reference counting, dictionary accesses for most fields, in some cases implicit dictionary creation when passing arguments, etc...
Removing the GIL would (mostly) fix the parallelization problem, but the serial speed is bad and can only get worse with the GIL removal alone.
Originally posted by atomsymbol
Please re-read my 1st post in this forum thread. When I wrote "Dynamic programming languages", I meant dynamic programming languages. If I had meant JIT, I would have written "JIT". I mentioned Java because there exists software which generates Java bytecode at run-time (see for example https://asm.ow2.io/ and articles on Google Scholar).
About the original post: what you are talking about requires a JIT.
Without a JIT, there will be no performance benefit.
While Python can generate code at runtime and compile it to CPython's internal bytecode, there is currently no JIT, so it cannot run faster than an AOT-compiled language.
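A short sketch of that point (the generated function is invented for the example): code generated at run time still becomes CPython bytecode, not machine code, and `dis` shows the stack-machine instructions the interpreter will execute one by one.

```python
import dis

# Generate a function at run time, compile it, and inspect the result:
# it is bytecode for the CPython VM, not native code.
src = "def add3(x):\n    return x + 3\n"
namespace = {}
exec(compile(src, "<generated>", "exec"), namespace)
add3 = namespace["add3"]

assert add3(4) == 7
dis.dis(add3)   # prints e.g. LOAD_FAST / LOAD_CONST / BINARY_OP / RETURN_VALUE
```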
Originally posted by sinepgib
Note CPython's serial performance is also quite bad (compared to other languages, usage dictates whether it is good enough), so it is not just the GIL.
There are many places to look at for causes because high level languages hide a lot of complexity below them (that's why we love them!). Lots of allocations, reference counting, dictionary accesses for most fields, in some cases implicit dictionary creation when passing arguments, etc...
Removing the GIL would (mostly) fix the parallelization problem, but the serial speed is bad and can only get worse with the GIL removal alone.
Sometimes this might be good enough.
And yes, the GIL simplifies the Python interpreter and supports multithreading without hurting single-thread speed.
IMO removing it probably requires a JIT or some new language constructs.
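One concrete thing the GIL buys the interpreter, sketched below with an arbitrary producer workload: single bytecode-level operations such as `list.append` are effectively atomic, so code like this is correct without any explicit lock. A GIL-free CPython has to preserve that behaviour with finer-grained locking instead.

```python
import threading

# Four threads append to one shared list with no Lock. Under the GIL each
# list.append is a single, effectively atomic operation, so no updates
# are lost and the list is never corrupted.
items = []

def producer(start):
    for i in range(start, start + 1000):
        items.append(i)   # safe under the GIL, no explicit lock needed

threads = [threading.Thread(target=producer, args=(k * 1000,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(items) == 4000
assert sorted(items) == list(range(4000))
```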
Originally posted by atomsymbol
The bottleneck is that CPython isn't analysing/tracing the object graph while the Python program has multiple threads.
It means only one thread can interpret and run Python's internal bytecode at a time: effectively a global mutex that executes the code on a single CPU while switching between threads in turn to simulate multiple CPUs.
To make it even worse, this is built on top of OS-scheduled threads, which makes it less efficient still.
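The "run threads in turn" behaviour is driven by a timer that CPython exposes: every switch interval (5 ms by default), the running thread is asked to drop the GIL so the OS scheduler can pick another Python thread. A small sketch of inspecting and tuning it:

```python
import sys

# Inspect and adjust the GIL switch interval. Shorter intervals mean more
# responsive thread switching but more switching overhead.
original = sys.getswitchinterval()       # 0.005 s (5 ms) by default
sys.setswitchinterval(0.001)             # ask for more frequent switching
assert abs(sys.getswitchinterval() - 0.001) < 1e-9
sys.setswitchinterval(original)          # restore the previous setting
```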
Originally posted by atomsymbol
It is obvious that there do exist cases in which specialized interpreted code runs faster than AOT-compiled code. (I am only claiming that such cases do exist - not claiming how many such cases there are. Computing whether specialized interpreted code would run faster than AOT is impossible if the specialization is using domain-specific knowledge.)
After all, the handling of those specialized cases has to be present in AOT-compiled code at the time you write it.
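A hedged sketch of such a case (`make_poly` is invented for the example): the coefficients of a polynomial only become known at run time, so we generate source with the constants baked in and compile it once. An AOT compiler cannot inline these constants, because they do not exist at build time.

```python
# Run-time specialization: build a polynomial evaluator with the
# coefficients inlined as literals, instead of looping over a coefficient
# list on every call.
def make_poly(coeffs):
    # Horner's rule, unrolled with the coefficients as constants.
    expr = repr(coeffs[0])
    for c in coeffs[1:]:
        expr = f"({expr}) * x + {c!r}"
    namespace = {}
    exec(compile(f"def poly(x):\n    return {expr}\n", "<specialized>", "exec"),
         namespace)
    return namespace["poly"]

p = make_poly([2, 0, 1])   # represents 2*x**2 + 0*x + 1
assert p(3) == 19
assert p(0) == 1
```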
Originally posted by atomsymbol
GIL's bottleneck isn't in code/bytecode, because Python bytecode (similarly to binary code stored in CPU's L1I cache) is mostly immutable. GIL's bottleneck is in data (Python objects).
https://en.wikipedia.org/wiki/Component_(graph_theory)
External FFI code can certainly release the GIL and use as many threads as it likes without contention, but Python code itself can only run on a single thread at any given time.
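A sketch of that escape hatch using the standard library (the buffer size and thread count are arbitrary): CPython's hashlib releases the GIL while hashing buffers larger than about 2 KiB, so the native hashing below can proceed in parallel, while the surrounding Python code still takes turns on the GIL.

```python
import hashlib
import threading

# Each thread hashes a large buffer. The C hashing code drops the GIL for
# big inputs, so the native work can run on multiple cores; only the pure
# Python parts of each thread serialize on the GIL.
data = b"x" * (1 << 20)   # 1 MiB of input
digests = {}

def worker(i):
    digests[i] = hashlib.sha256(data).hexdigest()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(digests) == 4
assert len(set(digests.values())) == 1   # every thread hashed the same data
```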