
Java Benchmarks: OpenJDK 8 Through OpenJDK 19 EA, OpenJ9, GraalVM CE

  • #51
    Originally posted by gggeek View Post

    In my own very limited experience (running a heavily trafficked Solr app with around 6 to a dozen GB memory allocated), newer versions of OpenJDK soundly trounce version 8. Of course it all comes down to what is the bottleneck in the application, but it seems that for many real-world scenarios, java daemons are mostly hampered by the GC implementation, tuning of which is as close to black arts as one can get. Moving from JDK 8 to version 11 and 17 got us better perfs (higher throughput and lower response times) than all the manual tuning of VM startup parameters that we had come up with, most likely because of the improvements in the garbage collector...
    That's a different aspect: what was benchmarked here is out-of-the-box performance. And nobody goes to prod using default settings; the JVM is highly configurable. It can be a full-time job to properly tailor the runtime for each application.
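
    For illustration, here's a minimal sketch (the class name DefaultsProbe is just illustrative) that prints what a JVM was actually started with: out of the box the input-argument list is empty and the maximum heap is whatever the JVM's ergonomics picked for the machine.
    Code:
    import java.lang.management.ManagementFactory;

    public class DefaultsProbe {
        public static void main(String[] args) {
            // JVM flags the process was launched with; empty when running with pure defaults.
            System.out.println("JVM args: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
            // Effective heap ceiling chosen by ergonomics (or by -Xmx when it was set).
            System.out.println("max heap: " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        }
    }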

    Comment


    • #52
      Originally posted by bug77 View Post

      That's a different aspect: what was benchmarked here is out-of-the-box performance. And nobody goes to prod using default settings; the JVM is highly configurable. It can be a full-time job to properly tailor the runtime for each application.
      True. I was just giving a possible explanation for the fact that older JDKs seem to be more performant than recent ones in benchmarks.
      I am not a Java guru, but I wonder: aren't there any benchmarks that explicitly stress the GC? Or is that something that varies so much with each app's code and runtime behaviour that meaningful comparisons are virtually impossible?
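
      Just to make the question concrete, a naive GC-stressing loop might look something like this (purely an illustrative sketch, not a real benchmark suite); as the replies below point out, its numbers depend heavily on heap size and collector choice, which is what makes cross-JDK comparisons tricky.
      Code:
      import java.lang.management.GarbageCollectorMXBean;
      import java.lang.management.ManagementFactory;
      import java.util.ArrayDeque;
      import java.util.concurrent.ThreadLocalRandom;

      public class GcChurn {
          public static void main(String[] args) {
              ArrayDeque<byte[]> survivors = new ArrayDeque<>();
              long checksum = 0;
              for (int i = 0; i < 2_000_000; i++) {
                  // Mostly short-lived garbage...
                  byte[] b = new byte[ThreadLocalRandom.current().nextInt(128, 4096)];
                  checksum += b.length; // keep the allocation from being optimized away
                  // ...plus a bounded set of longer-lived survivors to exercise promotion.
                  if (i % 50 == 0) {
                      survivors.addLast(b);
                      if (survivors.size() > 10_000) survivors.removeFirst();
                  }
              }
              // Report how much work the collector(s) did during the run.
              for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                  System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                          + " collections, " + gc.getCollectionTime() + " ms");
              }
              System.out.println("checksum=" + checksum + " retained=" + survivors.size());
          }
      }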

      Comment


      • #53
        Scala dev here (so of course we're running every application on the JVM). Java memory management is pretty inflexible, so you need to predict what your application's memory usage limit should be and set it beforehand. Actually, if you're running multiple applications in a single VM, then setting a memory limit per app will prevent one app from killing the entire VM, so in some ways that's actually good for servers.

        You can safely assume that the out-of-the-box memory limits are wrong and that manually setting -Xmx (and -Xms) is the bare minimum of JVM tuning (i.e. otherwise the default settings will be rather poor). Some Java authors are contemplating making the memory limits flexible, but (a bit shockingly) that's not high-priority stuff right now, see e.g. https://mail.openjdk.org/pipermail/z...er/thread.html Instead of inventing flexible memory-limit management, there are new options for manual tuning, e.g. https://mail.openjdk.org/pipermail/z...ay/001014.html

        Another thing (already mentioned) is that the default GC changed when going from Java 8 to Java 9. The default in Java 8 (and earlier) is ParallelGC; the default in Java 9 (and later) is G1GC. If you want to exclude the impact of GC settings, explicitly set the GC (e.g. -XX:+UseG1GC or -XX:+UseParallelGC) and the memory limits (both -Xms and -Xmx, with -Xms in the same order of magnitude as -Xmx).
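
        As a concrete (and purely illustrative) way to do that, a tiny program like the one below can be launched with both collectors and a fixed heap; it confirms from inside the process which collector and heap limits actually took effect. The class name is made up.
        Code:
        // Compare, for example:
        //   java -Xms4g -Xmx4g -XX:+UseParallelGC HeapAndGcCheck
        //   java -Xms4g -Xmx4g -XX:+UseG1GC HeapAndGcCheck
        import java.lang.management.GarbageCollectorMXBean;
        import java.lang.management.ManagementFactory;

        public class HeapAndGcCheck {
            public static void main(String[] args) {
                Runtime rt = Runtime.getRuntime();
                // Reflects -Xmx and the currently committed heap (influenced by -Xms).
                System.out.println("max heap       = " + rt.maxMemory() / (1024 * 1024) + " MB");
                System.out.println("committed heap = " + rt.totalMemory() / (1024 * 1024) + " MB");
                // The collector actually selected; the bean names differ between ParallelGC and G1GC.
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.println("GC bean: " + gc.getName());
                }
            }
        }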

        If you want to see some apples-to-apples comparisons between Java versions then here are some:



        Interestingly, one article by a Java author, https://kstefanj.github.io/2020/04/1...rformance.html (and the subsequent changes to OpenJDK), was written in response to some earlier tests on Phoronix:
        A few weeks back a set of benchmark results comparing JDK 8 and JDK 14 were published by Phoronix. The SPECjbb® 2015 results presented in that report really caught our eyes. They don’t compare to what we have seen in our own testing and this needed some investigation.
        Java is tested using benchmarks similar to the ones Phoronix used, but the Java authors use a fixed heap size:
        SPECjbb®1 2015 is a good benchmark to measure the overall performance of Java and also the impact of different GC algorithms. We run it continuously in our testing. Most of the time we tune it to run with a fixed heap size, because this is a well known and good practice to get stable and reproducible results. Setting a fixed heap, using JVM options -Xmx4g -Xms4g, we get results like this.
        They claim they already pushed a fix to Java 15...
        We decided to address this problem right away and a change to improve the behavior has already been pushed to JDK 15 (JDK-8241670).
        (...)
        A really nice improvement to the G1 out-of-the-box performance.
        ...but somehow the out-of-the-box performance in Phoronix tests is still as bad as it was before.

        Comment


        • #54
          Originally posted by gggeek View Post

          True. I was just giving a possible explanation for the fact that older JDKs seem to be more performant than recent ones in benchmarks.
          I am not a Java guru, but I wonder: aren't there any benchmarks that explicitly stress the GC? Or is that something that varies so much with each app's code and runtime behaviour that meaningful comparisons are virtually impossible?
          Such benchmarks would be meaningless. The GC divides memory into several areas and provides default sizes that do not choke the JVM with typical apps. It's all too easy to write a benchmark that runs into the limits of one garbage collector implementation, but not the next.
          It is actually up to you to benchmark the GC: run stress tests on your app, see where it chokes, and adjust the GC params accordingly. Many people and companies don't do that, but it should be part of the standard delivery pipeline if you care about performance. In many cases, people blindly increase the heap size (-Xmx above), which has the side effect of increasing the sizes of all the memory zones. Needless to say, it's the most wasteful way of fixing a performance problem.
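
          To see those areas for yourself, a small sketch like this one (illustrative class name) lists the memory pools the selected collector manages - e.g. Eden/Survivor/Old regions on the heap plus non-heap pools like Metaspace and the code cache - together with their current and maximum sizes.
          Code:
          import java.lang.management.ManagementFactory;
          import java.lang.management.MemoryPoolMXBean;
          import java.lang.management.MemoryUsage;

          public class MemoryPools {
              public static void main(String[] args) {
                  for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                      MemoryUsage u = pool.getUsage();
                      // getMax() is -1 when the pool has no defined limit.
                      System.out.printf("%-35s %-15s used=%d KB committed=%d KB max=%d KB%n",
                              pool.getName(), pool.getType(),
                              u.getUsed() / 1024, u.getCommitted() / 1024,
                              u.getMax() < 0 ? -1 : u.getMax() / 1024);
                  }
              }
          }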

          Comment


          • #55
            Originally posted by bug77 View Post

            Such benchmarks would be meaningless. The GC divides memory into several areas and provides default sizes that do not choke the JVM with typical apps. It's all too easy to write a benchmark that runs into the limits of one garbage collector implementation, but not the next.
            It is actually up to you to benchmark the GC: run stress tests on your app, see where it chokes, and adjust the GC params accordingly. Many people and companies don't do that, but it should be part of the standard delivery pipeline if you care about performance. In many cases, people blindly increase the heap size (-Xmx above), which has the side effect of increasing the sizes of all the memory zones. Needless to say, it's the most wasteful way of fixing a performance problem.
            Also, blindly increasing -Xmx without understanding Java memory management can easily cause more out-of-memory errors, not fewer. The reason is that Java uses more memory than just the heap, and that extra usage is not as easy to predict or limit. A large -Xmx also doesn't account for the rest of the system: it effectively promises the JVM more memory than the machine will really have available under pressure.
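
            Even without NMT (shown in the next post), the JVM's own view already hints at this; a minimal sketch with an illustrative class name:
            Code:
            import java.lang.management.ManagementFactory;
            import java.lang.management.MemoryUsage;

            public class HeapVsNonHeap {
                public static void main(String[] args) {
                    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                    MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
                    // -Xmx only bounds the heap number; the non-heap pools (Metaspace, code cache, ...)
                    // come on top of it, and NMT shows even more categories (thread stacks, GC, ...).
                    System.out.println("heap committed     = " + heap.getCommitted() / 1024 + " KB");
                    System.out.println("non-heap committed = " + nonHeap.getCommitted() / 1024 + " KB");
                }
            }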
            Last edited by darkonix; 26 June 2022, 12:03 AM.

            Comment


            • #56
              Originally posted by darkonix View Post
              Also, blindly increasing -Xmx without understanding Java memory management can easily cause more out-of-memory errors, not fewer. The reason is that Java uses more memory than just the heap, and that extra usage is not as easy to predict or limit.
              Yep, we had such situations. My colleagues were increasing -Xmx while the real problem was that Java was trying to use more memory than the container was giving it. The solution is to use NMT (native memory tracking), log it continuously, pinpoint the moment when memory was about to be exhausted, and find what really caused the allocation failure.
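
              For completeness, NMT has to be enabled when the JVM starts and is then queried from outside with jcmd; a minimal sketch (the class name is made up, the flags are the standard ones):
              Code:
              // Start with NMT enabled, e.g.:
              //   java -XX:NativeMemoryTracking=summary NmtProbe
              // then, from another shell:
              //   jcmd <pid> VM.native_memory summary
              public class NmtProbe {
                  public static void main(String[] args) throws InterruptedException {
                      long pid = ProcessHandle.current().pid();
                      System.out.println("Query this JVM with: jcmd " + pid + " VM.native_memory summary");
                      Thread.sleep(600_000); // keep the process alive so it can be inspected
                  }
              }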

              Article about NMT: https://shipilev.net/jvm/anatomy-qua...mory-tracking/
              Example report:
              Code:
              Native Memory Tracking:
              
              Total: reserved=1373921KB, committed=74953KB
              - Java Heap (reserved=16384KB, committed=16384KB)
              (mmap: reserved=16384KB, committed=16384KB)
              
              - Class (reserved=1066093KB, committed=14189KB)
              (classes #391)
              (malloc=9325KB #148)
              (mmap: reserved=1056768KB, committed=4864KB)
              
              - Thread (reserved=19614KB, committed=19614KB)
              (thread #19)
              (stack: reserved=19532KB, committed=19532KB)
              (malloc=59KB #105)
              (arena=22KB #38)
              
              - Code (reserved=249632KB, committed=2568KB)
              (malloc=32KB #297)
              (mmap: reserved=249600KB, committed=2536KB)
              
              - GC (reserved=10991KB, committed=10991KB)
              (malloc=10383KB #129)
              (mmap: reserved=608KB, committed=608KB)
              
              - Compiler (reserved=132KB, committed=132KB)
              (malloc=2KB #23)
              (arena=131KB #3)
              
              - Internal (reserved=9444KB, committed=9444KB)
              (malloc=9412KB #1373)
              (mmap: reserved=32KB, committed=32KB)
              
              - Symbol (reserved=1356KB, committed=1356KB)
              (malloc=900KB #65)
              (arena=456KB #1)
              
              - Native Memory Tracking (reserved=38KB, committed=38KB)
              (malloc=3KB #41)
              (tracking overhead=35KB)
              
              - Arena Chunk (reserved=237KB, committed=237KB)
              (malloc=237KB)
              Note that "Java heap" corresponds to the memory pool controlled by -Xms and -Xmx, while other memory pools are separate and they use memory on top of what "Java heap" needs. "GC" is probably the GC overhead (this can be minimized by using slow SerialGC). "Class" is for loaded classes (i.e. much of the class metadata is not residing on "Java heap") - this can't be minimized by JVM switches, you need to reduce number of loaded classes (e.g. by trimming dependencies). "Thread" is for thread stacks - that can be controlled by -Xss and by minimizing number of created threads. And so on...


              Originally posted by darkonix View Post
              A large -Xmx also doesn't account for the rest of the system: it effectively promises the JVM more memory than the machine will really have available under pressure.
              You can use -XX:+AlwaysPreTouch (disabled by default) to pre-touch the managed heap (i.e. the memory where Java objects are allocated), so the heap pages get committed up front and you find out at startup, rather than under load, whether the memory is really there.

              Comment
