
RustPython Is Implementing Python 3 Within Rust


  • #21
    Originally posted by Michael_S View Post

    I thought the default Python implementation uses reference counting? That's not really dynamic garbage collecting, but it is very slow.
    Correction: reference counting is not slow at all; that's precisely why it's often preferred to a full-blown GC. I don't know what CPython uses internally, but I somehow doubt it's pure reference counting: firstly, the well-known limitation of reference counting is that it can't handle reference cycles, and to my knowledge Python does not suffer from that problem. And also, a well-implemented reference-counting memory manager can use atomic instructions to increment/decrement the counters and shouldn't require thread locking, much less a GIL.

    Comment


    • #22
      Originally posted by jacob View Post

      Correction: reference counting is not slow at all; that's precisely why it's often preferred to a full-blown GC. I don't know what CPython uses internally, but I somehow doubt it's pure reference counting: firstly, the well-known limitation of reference counting is that it can't handle reference cycles, and to my knowledge Python does not suffer from that problem. And also, a well-implemented reference-counting memory manager can use atomic instructions to increment/decrement the counters and shouldn't require thread locking, much less a GIL.
      I'm not an expert on this stuff, so I may be mixing things up a lot. I'm enjoying the discussion and appreciate everyone's inputs. This is my understanding:

      As ssokolow states further up, regular Python actually uses a mix of reference counting plus a tracing GC to detect cycles.
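CPython's hybrid scheme is easy to observe from Python itself: refcounting frees acyclic garbage immediately, while the cycle detector in the standard-library `gc` module reclaims objects that keep each other alive. A minimal sketch (CPython semantics assumed):

```python
import gc

class Node:
    pass

a = Node()
b = Node()
a.other = b
b.other = a   # a reference cycle: neither refcount can ever drop to zero

gc.collect()          # flush any pre-existing garbage first
del a, b              # refcounting alone cannot reclaim the cycle...
found = gc.collect()  # ...but the tracing cycle detector can
print(found > 0)      # True: the collector found the unreachable cycle
```

`gc.collect()` returns the number of unreachable objects it found, which here includes the two `Node` instances the plain refcounter could never free.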

      As Guest also stated further up, "optimized reference counting isn't necessarily slow because an advanced compiler can sometimes elide refcount updates."

      But since Python is a scripting language that isn't precompiled into executables, all it can apply is simple reference counting, and that's inescapably slow.
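That per-operation cost is visible from Python itself: every new reference bumps a counter on the object. A small illustration using `sys.getrefcount` (the absolute numbers are CPython implementation details):

```python
import sys

x = object()
before = sys.getrefcount(x)   # includes the temporary reference made by the call itself
lst = [x, x]                  # storing x twice adds exactly two references
after = sys.getrefcount(x)
print(after - before)         # 2
```

Every one of those increments and decrements happens at runtime on every assignment, argument pass, and scope exit, which is the overhead being discussed here.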

      So maybe we have, very broadly, a three-level speed hierarchy? Simple reference counting as in Python is generally slower than tracing garbage collection as in the JVM or .NET, which in turn is generally slower than optimized reference counting as in Objective-C or Swift. Or I suppose you could call manual memory management in C or C++, or the compiler-enforced memory management in Rust, a form of "optimized reference counting".

      Comment


      • #23
        Originally posted by Michael_S View Post
        So maybe we have, very broadly, a three-level speed hierarchy? Simple reference counting as in Python is generally slower than tracing garbage collection as in the JVM or .NET, which in turn is generally slower than optimized reference counting as in Objective-C or Swift. Or I suppose you could call manual memory management in C or C++, or the compiler-enforced memory management in Rust, a form of "optimized reference counting".
        It's not so much the refcounting that makes Python slow; it's that the language is so dynamic that you're very limited in what optimizations you can apply (though that dynamism is useful for runtime metaprogramming and exploratory programming), which is made worse by CPython not having a JIT compiler.

        Languages like Java, C#, and Swift can be faster because they've got static type systems, JIT or ahead-of-time compilers, and they don't allow you to rewrite pretty much anything about the program on the fly as it runs. Python has to dispatch everything dynamically and needs the GIL to ensure that one thread doesn't pull the rug out from under another by doing something like redefining a function.
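The "rug-pulling" is ordinary, legal Python: even a method on an existing class can be rebound while instances of it are live, so every call site has to re-dispatch dynamically. A toy example:

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
print(g.greet())   # hello

# Rebind the method at runtime; the existing instance picks it up immediately,
# which is why the interpreter can't safely cache or inline the lookup.
Greeter.greet = lambda self: "goodbye"
print(g.greet())   # goodbye
```

Since this can happen at any point during execution, an optimizer can never assume `g.greet` still means what it meant a moment ago.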

        Refcounting vs. tracing GC actually isn't a clear-cut thing when all else is equal. Reference counts are individually much lighter than a GC pass, but a full-blown tracing GC is often much faster overall because it amortizes the cost of memory management. (i.e. the longer you go between GC collection passes, the better the performance but the worse the peak memory consumption.)
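CPython exposes that trade-off directly: the `gc` module's generation thresholds control how much allocation debt accumulates before a collection pass runs, so raising them trades peak memory for fewer passes. A sketch (the default threshold values are implementation details and vary between CPython versions):

```python
import gc

print(gc.get_threshold())         # e.g. (700, 10, 10) on many CPython builds
gc.set_threshold(70_000, 10, 10)  # collect far less often: fewer pauses,
                                  # but unreachable cycles linger longer
print(gc.get_threshold()[0])      # 70000
```

This is exactly the "longer between passes = better throughput, worse peak memory" knob described above.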
        Last edited by ssokolow; 07 February 2019, 12:16 AM.

        Comment


        • #24
          Originally posted by atomsymbol
          Just a note:

          In the case of emulator-like programs, such as the RustPython interpreter written in Rust, reachability analysis on the implementation of RustPython while Rust is compiling it is impossible because of the halting problem. Once a program written in Rust accepts general-purpose computation as input data (= bytecode fed to interpreters and emulators), compile-time analysis is impossible because it depends on the behavior of the unknown, universal input.
          Makes sense. That's why Rust's borrow checker imposes the limitations it does and, for certain types of systems, you have to fall back to runtime reference counting using the Rc<T> and Arc<T> types.

          If you can impose constraints on the programmers, you can prove some things in a restricted problem space.

          That's also why a borrow checker is the kind of thing that had to prove its worth in a new language, rather than by getting retrofitted onto an existing language. It requires code to be written with its limitations in mind, which means it gives a poor showing when you have a large body of preexisting code.

          Comment


          • #25
            Ooh, that's an interesting idea.
            Write an interpreter or VM in Rust and then you can cross-compile it to WebAssembly -> result: the language now runs natively and in the browser

            Comment


            • #26
              Ra ra ra reimplement everything in rust

              Comment


              • #27
                Originally posted by atomsymbol
                That seems too pessimistic to me about Python. In my opinion, in a loose sense of concurrency safety, the interpreter can perform tasks in parallel as long as it does not crash because of it. No pure Python code should, in theory, be able to crash the interpreter/VM. However, this allows Python code to behave non-deterministically or "unexpectedly" - and if programmers want the program to behave according to their intentions, they are responsible for inserting calls to concurrency primitives such as locking/unlocking mutexes, communicating via channels (if the language supports channels), performing atomic updates, or making per-thread copies of data structures.

                In my opinion, it is wrong to demand that concurrent Python code behave just like sequential code. As long as parallel execution does not crash the interpreter/VM, it should be considered a valid Python implementation.

                A concurrent Python implementation would need to choose a memory model, as Java and Go did.
                The biggest problem is that you can do things like reassigning methods and functions at runtime. Because of that, every attempt to make CPython concurrent without either being scarily unsafe or breaking compatibility with the existing library ecosystem has resulted in such a performance hit that it was abandoned.

                Comment


                • #28
                  Originally posted by jacob View Post

                  Correction: reference counting is not slow at all; that's precisely why it's often preferred to a full-blown GC. I don't know what CPython uses internally, but I somehow doubt it's pure reference counting: firstly, the well-known limitation of reference counting is that it can't handle reference cycles, and to my knowledge Python does not suffer from that problem. And also, a well-implemented reference-counting memory manager can use atomic instructions to increment/decrement the counters and shouldn't require thread locking, much less a GIL.
                  Reference counting is not slow at all: it is just a matter of increasing and decreasing a counter and releasing memory when the reference counter reaches zero, but it has a wide range of side effects you have to deal with.
                  Python uses pure reference counting and the "atomicity" is obtained through the GIL.

                  Also, Python suffers from reference cycles just like any other software that uses reference counting: that's why it has the garbage collector, which heuristically traverses the objects known as containers (i.e. objects that can contain other objects). Tuples, lists, dictionaries, etc. are containers; their implementations carry a flag declaring them as such, plus special methods to traverse the contained objects in search of reference cycles. Integers, strings, floats, etc. cannot contain other objects, so they don't need to be tracked by the garbage collector; they are handled by reference counting alone.

                  Comment


                  • #29
                    Originally posted by blackshard View Post
                    Reference counting is not slow at all: it is just a matter of increasing and decreasing a counter and releasing memory when the reference counter reaches zero
                    That's actually the point I was making when I said GC can be faster because it amortizes costs. All those little increments and decrements add up to the point that a well-tuned GC can be faster because the cost of one GC pass can be less than the cost of all the increments and decrements that it's an alternative to.

                    Comment


                    • #30
                      Originally posted by ssokolow View Post

                      That's actually the point I was making when I said GC can be faster because it amortizes costs. All those little increments and decrements add up to the point that a well-tuned GC can be faster because the cost of one GC pass can be less than the cost of all the increments and decrements that it's an alternative to.
                      I don't think I understood your reasoning: reference counting and garbage collection are not alternatives; rather, they must work together. Reference counting keeps track of the life of an object, but it introduces the reference-cycle problem. The garbage collector is a way to solve the reference-cycle problem.

                      Comment
