
Java 9 Tech Preview Planned For Fedora 27


  • #51
    Originally posted by DanLamb View Post

    ... reference counting seems awfully simplistic next to the more advanced strategies that are discussed.
    Nevertheless, it is an important first resort. It’s not the only strategy that Python uses: it can still fall back on GC if it has to.
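The fallback described here is easy to observe in CPython itself — a minimal sketch using the standard `sys` and `gc` modules, showing a reference cycle that reference counting alone can never free:

```python
import gc
import sys

# Reference counting frees most objects the instant their count hits zero...
a = []
print(sys.getrefcount(a))  # count includes the temporary argument reference

# ...but a reference cycle keeps the counts above zero forever,
# so CPython's cycle-detecting garbage collector has to step in.
a.append(a)                # the list now references itself
del a                      # the refcount never reaches zero on its own
collected = gc.collect()   # the backup collector reclaims the cycle
print("unreachable objects collected:", collected)
```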


    For Java 9, one of the big features is that you can build natively compiled, self-contained executables that embed the portions of the JDK you use inside the final binary and do not require a system JDK to run. This should help with a variety of scenarios.
    Only ones where you might want to hide the source code. I did say Python was good for scientific uses, didn’t I? That involves being able to publish details of what you did, including your algorithms, so others can verify that you haven’t made some stupid mistake.

    Comment


    • #52
      Originally posted by ldo17 View Post

      But it still has to put that memory on a free list.
      No, it doesn't. A GC doesn't keep a free list of small memory blocks. It uses a number of big heap areas, each with stack-like allocation (it just increments a free pointer). During the sweep phase, the GC copies the live objects to a different heap area and then marks the whole area as free. So when all the objects in an area are dead, the GC doesn't need to touch it until it reallocates and initializes that memory again.
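The bump-pointer-plus-evacuation scheme described above can be sketched in a few lines. This is a toy model, not any real collector; all the names are made up for illustration:

```python
class Semispace:
    """Toy model of a copying collector's heap area: allocation is just
    a pointer bump, and collection evacuates live objects elsewhere."""
    def __init__(self, size):
        self.memory = [None] * size
        self.free = 0                 # bump pointer

    def alloc(self, obj):
        addr = self.free
        self.memory[addr] = obj
        self.free += 1                # "just increments the free pointer"
        return addr

def collect(from_space, to_space, live_addrs):
    """Copy live objects into to_space, then declare from_space empty."""
    forwarding = {}
    for addr in live_addrs:
        forwarding[addr] = to_space.alloc(from_space.memory[addr])
    from_space.free = 0               # the whole area is free again;
    return forwarding                 # dead objects were never touched

heap_a, heap_b = Semispace(8), Semispace(8)
x = heap_a.alloc("live")
heap_a.alloc("garbage")
fwd = collect(heap_a, heap_b, [x])
print(heap_b.memory[fwd[x]])          # the live object survived evacuation
```

Note that the dead object's slot is never visited individually: resetting the bump pointer reclaims the whole area at once.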

      Originally posted by ldo17 View Post
      Modern OSes go to a lot of trouble to keep caches from becoming “useless”. If you really think it is “useless”, try running your CPU with caches turned off, and see what a performance difference it makes.
      Modern CPUs run at 2+ GHz, executing 2+ instructions per cycle on each core. Even for the shortest tick length (1 ms), that means more than 4 million instructions between thread switches. That is a far cry from running without caches.
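The arithmetic checks out — a back-of-the-envelope calculation taking the figures in the post at face value:

```python
clock_hz = 2e9        # 2 GHz core clock
ipc = 2               # instructions retired per cycle per core
tick_s = 1e-3         # 1 ms scheduler tick

instructions_per_tick = clock_hz * ipc * tick_s
print(int(instructions_per_tick))  # 4000000 instructions between switches
```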

      Originally posted by ldo17 View Post
      But when it does happen, you notice the entire system stuttering. To try to offset this, you do GC more often, but that causes its own problems.
      Yes, a long stop-the-world phase is a problem. That's why they introduced G1GC, which trades some throughput for shorter stop-the-world pauses. BTW, the Go runtime is working on shortening its GC pauses too.

      Originally posted by ldo17 View Post
      Still better than Java for high-performance scientific work, though.
      Far worse actually.

      Comment


      • #53
        Originally posted by Khrundel View Post
        No, it doesn't. A GC doesn't keep a free list of small memory blocks. It uses a number of big heap areas, each with stack-like allocation (it just increments a free pointer). During the sweep phase, the GC copies the live objects to a different heap area and then marks the whole area as free. So when all the objects in an area are dead, the GC doesn't need to touch it until it reallocates and initializes that memory again.
        But that is at odds with your earlier claim that GC only needs to touch “live” memory areas. If it has to relocate live objects, then it is moving them to unused memory areas, which means faulting that memory wholesale into the cache.

        So you see, everything you suggest to improve the performance of GC only makes things worse.

        Modern CPUs run at 2+ GHz, executing 2+ instructions per cycle on each core. Even for the shortest tick length (1 ms), that means more than 4 million instructions between thread switches. That is a far cry from running without caches.
        I notice you didn’t actually try what I suggested: run with the caches off and measure the performance difference. Remember, you were the one saying cached memory was “useless”.

        Far worse actually.
        Feel free to quote some performance results with the Java equivalent of NumPy/SciPy. Only it doesn’t exist, does it?

        Comment


        • #54
          Originally posted by ldo17 View Post
          But that is at odds with your earlier claim that GC only needs to touch “live” memory areas. If it has to relocate live objects, then it is moving them to unused memory areas, which means faulting that memory wholesale into the cache.
          facepalm.

          Originally posted by ldo17 View Post
          So you see, everything you suggest to improve the performance of GC only makes things worse.
          I'm suggesting nothing to improve GC performance. You are suggesting some stupid idea and I'm trying to explain why that idea won't work.


          Originally posted by ldo17 View Post
          I notice you didn’t actually try what I suggested: run with the caches off and measure the performance difference. Remember, you were the one saying cached memory was “useless”.
          You don't see the difference between "make all cached data useless every 1 (or 100) milliseconds" and "caches are useless"?
          As I've explained, the CPU has working caches for a span of millions of instructions, so it is possible to tolerate losing the caches when switching threads or collecting garbage.


          Originally posted by ldo17 View Post
          Feel free to quote some performance results with the Java equivalent of NumPy/SciPy. Only it doesn’t exist, does it?
          NumPy/SciPy is not Python, so I suggest you stop posting this shit. Current Python is slow. Way slower than even modern JavaScript engines, which have the same disadvantages of a dynamic scripting language but a far better VM. And modern JS engines use GC.

          Comment


          • #55
            Originally posted by ldo17 View Post
            Still better than Java for high-performance scientific work, though.
            This may be true, but it is an obnoxious flame-war style comment on a contentious issue.

            First, use the tool set that you want or what is wanted by your employer, or your school, or whatever.

            Personally, for a lot of more advanced science/stats, the best libs + tools are in Python, so I'd use that. For data engineering work, particularly any work where I'd want to use Spark or Kafka, I think the JVM ecosystem is a better tool for the job. I would prefer Scala or maybe Java in some narrow use cases. Or, I use what my employer wants, which in my current case is Python, and I'm happy with that.

            Comment


            • #56
              Originally posted by Khrundel View Post
              NumPy/SciPy is not Python, so I suggest you stop posting this shit. Current Python is slow.
              NumPy/SciPy are mostly not implemented in Python, but that's not a problem. They are designed for higher level use in Python. That's the Python way.

              If you are doing a project with NumPy/SciPy in Python, and your end performance is acceptable, it doesn't matter at all that internally much of those libs are written in C or Fortran.
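A small illustration of that division of labour, assuming NumPy is installed — the Python code only describes the computation, while the work runs in compiled kernels:

```python
import numpy as np

# The Python-level code just describes the computation; the heavy
# lifting happens inside NumPy's compiled C/Fortran kernels.
a = np.arange(100_000, dtype=np.float64)
total = (a * a).sum()          # vectorised: one call into native code

# The pure-Python equivalent touches every element in the interpreter loop,
# which is why idiomatic NumPy code avoids it.
assert total == sum(x * x for x in range(100_000))
```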

              The place where I see Python performance as a problem is in writing high performance high-concurrency REST services. There, it's not as easy to encapsulate the heavy-lifting in a high performance library written in some other toolset. I'm doing a high volume REST service in Python, and Java/Scala/Go all perform approx ~10x faster in both throughput and latency measurements. And there is little I can do within the Python ecosystem to close that gap.

              Comment


              • #57
                Originally posted by Khrundel View Post
                facepalm.
                Run out of coherent responses? Try it again, this time harder.

                I'm suggesting nothing to improve GC performance. You are suggesting some stupid idea and I'm trying to explain why that idea won't work.
                No, I was the one pointing out why your ideas to improve GC performance wouldn’t work. You were the one trying to claim GC was good enough, you didn’t need reference-counting as well.


                NumPy/SciPy is not a python...
                Still cannot offer a Java equivalent, can you? Why do you think it doesn’t exist?

                Because nobody would take Java seriously for high-performance scientific/numerical work.

                Comment


                • #58
                  Originally posted by DanLamb View Post
                  The place where I see Python performance as a problem is in writing high performance high-concurrency REST services. There, it's not as easy to encapsulate the heavy-lifting in a high performance library written in some other toolset. I'm doing a high volume REST service in Python, and Java/Scala/Go all perform approx ~10x faster in both throughput and latency measurements. And there is little I can do within the Python ecosystem to close that gap.
                  Multithreading with high CPU usage? This will run into Python’s Global Interpreter Lock.

                  Have you tried multiple processes instead of multiple threads?
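A minimal sketch of that suggestion, using only the standard library (the worker function here is made up for illustration):

```python
from multiprocessing import Pool

def cpu_bound(n):
    """Stand-in for per-request CPU work. Each worker runs in its own
    process, so it is not serialised by the parent's Global Interpreter Lock."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Each task runs under a separate interpreter with its own GIL.
        results = pool.map(cpu_bound, [100_000] * 4)
    print(results)
```

The trade-off is inter-process communication overhead: arguments and results are pickled between processes, so this pays off for CPU-bound work, not for fine-grained calls.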

                  Comment


                  • #59
                    Originally posted by DanLamb View Post

                    NumPy/SciPy are mostly not implemented in Python, but that's not a problem. They are designed for higher level use in Python. That's the Python way.
                    You've missed the context. NumPy/SciPy were given as an example that the Python interpreter can do something high-performance, and that therefore everybody should borrow CPython's memory "know-how". Being native, these libraries say nothing about CPython's performance.

                    Comment


                    • #60
                      Originally posted by ldo17 View Post
                      Run out of coherent responses? Try it again, this time harder.
                      No, I'm just tired of explaining a simple idea to you again and again. OK, I can spell it out one more time: it is nothing terrible to invalidate cached data once a second (which is what GC does), but it is very bad to lose part of the cache each time you have to copy a struct containing a couple of references. And it is a catastrophe to do it in a thread-safe way, because atomic updates mess with the caches of every CPU/core.
                      Originally posted by ldo17 View Post
                      No, I was the one pointing out why your ideas to improve GC performance wouldn’t work. You were the one trying to claim GC was good enough, you didn’t need reference-counting as well.
                      Actually, all high-performance runtimes use pure GC without RC, so I've invented nothing and am suggesting no improvement. I actually don't even suggest improving CPython by throwing out refcounts, because it has a bigger problem with the GIL.
                      Originally posted by ldo17 View Post
                      Still cannot offer a Java equivalent, can you? Why do you think it doesn’t exist?
                      Are you kidding? I've already given you an example of such a lib, MTJ (Matrix Toolkit Java). Actually, there are many of them.
                      Originally posted by ldo17 View Post
                      Because nobody would take Java seriously for high-performance scientific/numerical work.
                      No, because Python for scientific work is like a calculator or Excel for computation. You could do a simple computation much faster with a handwritten program, but to sum a few numbers you use a calculator, or a spreadsheet if there are more of them, simply because the performance benefit of a program will never pay for the coding time spent.

                      Comment
