Java Benchmarks: OpenJDK 8 Through OpenJDK 19 EA, OpenJ9, GraalVM CE


  • Wibowit
    replied
    Originally posted by darkonix View Post
    Also, blindly increasing -Xmx without understanding Java memory management can easily cause more out-of-memory errors, not fewer. The reason is that Java uses more memory than just the heap, and that extra memory is not as easy to predict or limit.
    Yep, we had such situations. My colleagues kept increasing -Xmx while the actual problem was that Java was trying to use more memory than the container was giving it. The solution is to use NMT (native memory tracking), log it continuously, pinpoint the moment when memory was about to be exhausted, and find what really caused the allocation failure.

    Article about NMT: https://shipilev.net/jvm/anatomy-qua...mory-tracking/
    Example report:
    Code:
    Native Memory Tracking:
    
    Total: reserved=1373921KB, committed=74953KB
    -              Java Heap (reserved=16384KB, committed=16384KB)
                             (mmap: reserved=16384KB, committed=16384KB)
    
    -                  Class (reserved=1066093KB, committed=14189KB)
                             (classes #391)
                             (malloc=9325KB #148)
                             (mmap: reserved=1056768KB, committed=4864KB)
    
    -                 Thread (reserved=19614KB, committed=19614KB)
                             (thread #19)
                             (stack: reserved=19532KB, committed=19532KB)
                             (malloc=59KB #105)
                             (arena=22KB #38)
    
    -                   Code (reserved=249632KB, committed=2568KB)
                             (malloc=32KB #297)
                             (mmap: reserved=249600KB, committed=2536KB)
    
    -                     GC (reserved=10991KB, committed=10991KB)
                             (malloc=10383KB #129)
                             (mmap: reserved=608KB, committed=608KB)
    
    -               Compiler (reserved=132KB, committed=132KB)
                             (malloc=2KB #23)
                             (arena=131KB #3)
    
    -               Internal (reserved=9444KB, committed=9444KB)
                             (malloc=9412KB #1373)
                             (mmap: reserved=32KB, committed=32KB)
    
    -                 Symbol (reserved=1356KB, committed=1356KB)
                             (malloc=900KB #65)
                             (arena=456KB #1)
    
    - Native Memory Tracking (reserved=38KB, committed=38KB)
                             (malloc=3KB #41)
                             (tracking overhead=35KB)
    
    -            Arena Chunk (reserved=237KB, committed=237KB)
                             (malloc=237KB)
    Note that "Java Heap" corresponds to the memory pool controlled by -Xms and -Xmx, while the other pools are separate and use memory on top of what the Java heap needs. "GC" is the garbage collector's own overhead (it can be minimized by using the slower SerialGC). "Class" holds metadata for loaded classes (i.e. much of the class metadata does not reside on the Java heap); this can't be reduced by JVM switches alone, you need to reduce the number of loaded classes (e.g. by trimming dependencies). "Thread" is for thread stacks, which can be controlled with -Xss and by minimizing the number of created threads. And so on...
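    For reference, NMT is off by default and has to be enabled at JVM startup; it can then be sampled with jcmd. A minimal sketch (the jar name and <pid> are placeholders):

    ```shell
    # Enable NMT in summary mode ("detail" additionally records call sites)
    java -XX:NativeMemoryTracking=summary -Xmx512m -jar myapp.jar &

    # Sample the running JVM (find the pid with jcmd or jps)
    jcmd <pid> VM.native_memory summary

    # For continuous logging, record a baseline and diff against it periodically
    jcmd <pid> VM.native_memory baseline
    jcmd <pid> VM.native_memory summary.diff
    ```
    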


    Originally posted by darkonix View Post
    It also doesn't account for the rest of the system; in a way, it is promising more memory than the system will really have available under pressure.
    You can use -XX:+AlwaysPreTouch (disabled by default) to pre-touch the managed heap (i.e. the area where Java objects are allocated), so the pages are committed up front rather than on first use.
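    A sketch of a launch line using this flag (heap size and jar name illustrative):

    ```shell
    # Touch every heap page at startup so the memory is committed up front,
    # instead of being committed lazily on first use while under load
    java -Xms4g -Xmx4g -XX:+AlwaysPreTouch -jar myapp.jar
    ```
    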



  • darkonix
    replied
    Originally posted by bug77 View Post

    Such benchmarks would be meaningless. The GC divides memory into several areas and provides default sizes that do not choke the JVM with typical apps. It's all too easy to write a benchmark that runs into the limits of one garbage collector implementation, but not the next.
    It is actually up to you to benchmark the GC: run stress tests on your app, see where it chokes, and adjust the GC parameters accordingly. Many people and companies don't do that, but it should be part of the standard delivery pipeline if you care about performance. In many cases, people blindly increase the heap size (-Xmx, mentioned above), which has the side effect of increasing the sizes of all the memory zones. Needless to say, it's the most wasteful way of fixing a performance problem.
    Also, blindly increasing -Xmx without understanding Java memory management can easily cause more out-of-memory errors, not fewer. The reason is that Java uses more memory than just the heap, and that extra memory is not as easy to predict or limit. It also doesn't account for the rest of the system; in a way, it is promising more memory than the system will really have available under pressure.
    Last edited by darkonix; 26 June 2022, 12:03 AM.



  • bug77
    replied
    Originally posted by gggeek View Post

    True. I was just offering a possible explanation for the fact that older JDKs seem more performant than recent ones in benchmarks.
    I am not a Java guru, but I wonder: aren't there any benchmarks that explicitly stress the GC? Or is that something that varies too much with each app's code and runtime behavior, making meaningful comparisons virtually impossible?
    Such benchmarks would be meaningless. The GC divides memory into several areas and provides default sizes that do not choke the JVM with typical apps. It's all too easy to write a benchmark that runs into the limits of one garbage collector implementation, but not the next.
    It is actually up to you to benchmark the GC: run stress tests on your app, see where it chokes, and adjust the GC parameters accordingly. Many people and companies don't do that, but it should be part of the standard delivery pipeline if you care about performance. In many cases, people blindly increase the heap size (-Xmx, mentioned above), which has the side effect of increasing the sizes of all the memory zones. Needless to say, it's the most wasteful way of fixing a performance problem.
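    While such a stress test runs, GC logging shows where the collector chokes. A minimal sketch, with illustrative file and jar names (the -Xlog form is JDK 9+; the second line shows the JDK 8 equivalents):

    ```shell
    # JDK 9+: unified logging of all GC events, with timestamps, to a file
    java -Xlog:gc*:file=gc.log:time,uptime,level,tags -Xmx2g -jar myapp.jar

    # JDK 8: legacy GC logging flags
    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Xmx2g -jar myapp.jar
    ```
    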



  • Wibowit
    replied
    Scala dev here (so of course we run every application on the JVM). Java memory management is pretty inflexible, so you need to predict what your application's memory usage limit should be and set it beforehand. Actually, if you're running multiple applications on a single VM, then setting a memory limit per app will prevent one app from killing the entire VM, so in some ways that's actually good for servers.

    You can safely assume that the out-of-the-box memory limits are wrong and that manually setting -Xmx (and -Xms) is the bare minimum of JVM tuning (i.e. otherwise the default settings will be rather poor). Some Java authors are contemplating making the memory limits flexible, but (a bit shockingly) that's not high-priority work right now; see e.g.: https://mail.openjdk.org/pipermail/z...er/thread.html Instead of inventing flexible memory-limit management, new options for manual tuning keep being added, e.g. https://mail.openjdk.org/pipermail/z...ay/001014.html

    Another thing (already mentioned) is that the default GC changed when going from Java 8 to Java 9. The default in Java 8 (and earlier) is ParallelGC; the default in Java 9 (and later) is G1GC. If you want to exclude the impact of GC settings, explicitly set the GC (e.g. -XX:+UseG1GC or -XX:+UseParallelGC) and the memory limits (both -Xms and -Xmx, with -Xms in the same order of magnitude as -Xmx).
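    Put together, launch lines for an apples-to-apples comparison across JDK versions would look roughly like this (heap sizes and jar name illustrative):

    ```shell
    # Pin both the collector and the heap so version-specific defaults don't skew results
    java -XX:+UseParallelGC -Xms4g -Xmx4g -jar benchmark.jar   # collector that Java 8 defaults to
    java -XX:+UseG1GC       -Xms4g -Xmx4g -jar benchmark.jar   # collector that Java 9+ defaults to
    ```
    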

    If you want to see some apples-to-apples comparisons between Java versions then here are some:
    https://kstefanj.github.io/
    https://tschatzl.github.io/

    Interestingly, one article by a Java developer, https://kstefanj.github.io/2020/04/1...rformance.html (and the subsequent changes to OpenJDK), came after some earlier tests on Phoronix:
    A few weeks back a set of benchmark results comparing JDK 8 and JDK 14 were published by Phoronix. The SPECjbb® 2015 results presented in that report really caught our eyes. They don’t compare to what we have seen in our own testing and this needed some investigation.
    Java is tested using benchmarks similar to those Phoronix used, but the Java authors run with a fixed heap size:
    SPECjbb®1 2015 is a good benchmark to measure the overall performance of Java and also the impact of different GC algorithms. We run it continuously in our testing. Most of the time we tune it to run with a fixed heap size, because this is a well known and good practice to get stable and reproducible results. Setting a fixed heap, using JVM options -Xmx4g -Xms4g, we get results like this.
    They claim a fix was already pushed to Java 15...
    We decided to address this problem right away and a change to improve the behavior has already been pushed to JDK 15 (JDK-8241670).
    (...)
    A really nice improvement to the G1 out-of-the-box performance.
    ...but somehow the out-of-the-box performance in Phoronix tests is still as bad as it was before.



  • gggeek
    replied
    Originally posted by bug77 View Post

    That's a different aspect: what was benchmarked here is out-of-the-box performance. And nobody goes to prod using default settings; the JVM is highly configurable. It can be a full-time job to properly tailor the runtime for each application.
    True. I was just offering a possible explanation for the fact that older JDKs seem more performant than recent ones in benchmarks.
    I am not a Java guru, but I wonder: aren't there any benchmarks that explicitly stress the GC? Or is that something that varies too much with each app's code and runtime behavior, making meaningful comparisons virtually impossible?



  • bug77
    replied
    Originally posted by gggeek View Post

    In my own, very limited experience (running a heavily trafficked Solr app with around 6 to 12 GB of memory allocated), newer versions of OpenJDK soundly trounce version 8. Of course it all comes down to what the bottleneck in the application is, but it seems that for many real-world scenarios, Java daemons are mostly hampered by the GC implementation, the tuning of which is as close to a black art as one can get. Moving from JDK 8 to versions 11 and 17 got us better performance (higher throughput and lower response times) than all the manual tuning of VM startup parameters we had come up with, most likely because of the improvements in the garbage collector...
    That's a different aspect: what was benchmarked here is out-of-the-box performance. And nobody goes to prod using default settings; the JVM is highly configurable. It can be a full-time job to properly tailor the runtime for each application.



  • sinepgib
    replied
    Originally posted by zamroni111 View Post
    A Java coder can always nullify unused objects and call the garbage collector to reduce memory usage.
    At the bargain price of a stop-the-world pause and losing the main advantage of using a GC in the first place. I mean, if you're going to use a GC'd language, you need very good reasons to try to outsmart it. There are, of course, scenarios where this might be worth it, but if it's the general case, you just picked the wrong language.
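    For illustration, the pattern being discussed looks like this (class name hypothetical). Note that System.gc() is only a request; on most collectors, honoring it means a full stop-the-world cycle:

    ```java
    public class ManualGcSketch {
        public static void main(String[] args) {
            byte[] buffer = new byte[32 * 1024 * 1024]; // temporary 32 MiB working set
            // ... work with buffer ...
            buffer = null;   // drop the only reference so the array becomes unreachable
            System.gc();     // request a full collection; typically stop-the-world
            System.out.println("collection requested");
        }
    }
    ```

    Whether the collection actually runs is up to the JVM; -XX:+DisableExplicitGC turns the call into a no-op entirely.
    
    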

    Originally posted by cynic View Post
    How many "app developers" do you think would be able to write their apps in Rust?
    Most of them are not even actual programmers and don't have any CS education (not that it is a requirement for Rust programming, but it at least helps).

    If Rust were the only option for writing apps, the whole mobile world (including software, hardware and services) would be 1/100 of what it is today.

    (and the borrow checker would surely find something to bother me about every time I tried to make a call)
    I only said "allow". I can choose native applications on my desktop, or bs Electron ones (and yes, I'm 100% dogmatic about Electron; Java I can tolerate, Electron is an insult). But on phones I can pretty much only pick between web apps and Java apps. It's still in "will never happen" territory, though, also for the reasons you listed.



  • FireBurn
    replied
    I can't believe you wasted your time benchmarking such old point releases of Java, or at least without benchmarking them side by side with the latest ones.



  • gggeek
    replied
    Originally posted by archkde View Post
    Why is OpenJDK 8 so fast, counter to the general trend "newer is faster"?
    In my own, very limited experience (running a heavily trafficked Solr app with around 6 to 12 GB of memory allocated), newer versions of OpenJDK soundly trounce version 8. Of course it all comes down to what the bottleneck in the application is, but it seems that for many real-world scenarios, Java daemons are mostly hampered by the GC implementation, the tuning of which is as close to a black art as one can get. Moving from JDK 8 to versions 11 and 17 got us better performance (higher throughput and lower response times) than all the manual tuning of VM startup parameters we had come up with, most likely because of the improvements in the garbage collector...



  • bug77
    replied
    Originally posted by archkde View Post

    The only one of those things introduced in Java 9 is modules, and I don't see why they would cause this large a regression (or any, for that matter). Another Java 9 change that may impact performance is the new default GC, but that doesn't explain why the regression seems particularly pronounced in I/O-heavy benchmarks.
    There was another important change in Java 9: internal classes (com.sun.* and friends) were finally hidden from the outside world. This was done in conjunction with the work on modules, and it likely added some overhead internally as well.
    But you are right, I cannot explain the differences in detail. On the other hand, the difference is rather small. One other area where performance has regressed (not caught in these tests) is startup performance in resource-constrained environments (e.g. embedded).
    Last edited by bug77; 24 June 2022, 08:55 AM.

