Java 9 Tech Preview Planned For Fedora 27


  • #41
    Originally posted by DanLamb View Post

    Reference counting is a form of garbage collection.
    No, it is more memory-efficient, and possibly also less cache-hostile.

    I don't think that is the cutting edge or what people want.
    Languages like Python and Perl seem to do quite well with it, only falling back to garbage collection when they have no choice.

    Why can’t Java do the same?

    That is such a petty comment to make.
    Yet you are the one saying this new feature is not necessary.



    • #42
      Originally posted by ldo17 View Post
      No, it is more memory-efficient,
      The gain isn't that big. As you've seen, 3x at most.
      Originally posted by ldo17 View Post
      and possibly also less cache-hostile.
      Unlike Rust's Rc, Java can't guarantee that only one thread has access to a reference at a given time, so it would have to do atomic updates of the counters, which are not cache friendly. And unlike Rust, Java doesn't move references, so it can't skip redundant counter updates: every time you merely copy a reference, the CPU has to touch the referenced memory, and that pollutes the caches.
      So no, GC is less cache-hostile than reference counting. To make refcounting cache friendly, you have to modify the language. And if you're willing to change Java that much, you don't need to add refcounting; you can reduce the problem significantly by adding custom value types.
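
      For illustration, a minimal sketch of the per-copy cost being described here. The RefCounted wrapper is purely hypothetical (the JVM keeps no such counts); it just makes the atomic traffic visible:

      import java.util.concurrent.atomic.AtomicInteger;

      // Hypothetical wrapper: an explicit, thread-safe reference count on a
      // Java object. Not how the JVM actually works.
      final class RefCounted<T> {
          private final AtomicInteger count = new AtomicInteger(1);
          private final T value;

          RefCounted(T value) { this.value = value; }

          // Copying the reference forces an atomic read-modify-write on a
          // cache line shared by every thread holding the same reference.
          RefCounted<T> share() {
              count.incrementAndGet();
              return this;
          }

          // Dropping a copy does the same write again, plus a branch.
          void release() {
              if (count.decrementAndGet() == 0) {
                  // last owner: free / finalize 'value' here
              }
          }

          T get() { return value; }
      }

      Under the current tracing GC, copying a reference is just an ordinary word write; none of this per-copy traffic exists.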

      Originally posted by ldo17 View Post
      Languages like Python and Perl seem to do quite well with it, only falling back to garbage collection when they have no choice.

      Why can’t Java do the same?
      As I already mentioned, a hybrid scheme has the disadvantages of both. If you don't trust me, look at reality: no high-performance language uses that scheme. It's either refcounting or GC, not both.



      • #43
        Originally posted by Khrundel View Post
        The gain isn't that big. As you've seen, 3x at most.
        You don’t think that’s a worthwhile saving?!?

        Unlike Rust's Rc, Java can't guarantee that only one thread has access to a reference at a given time, so it would have to do atomic updates of the counters, which are not cache friendly.
        Still better than being actively cache-hostile, like garbage collection is.

        As I already mentioned, a hybrid scheme has the disadvantages of both. If you don't trust me, look at reality: no high-performance language uses that scheme. It's either refcounting or GC, not both.
        Python is used quite heavily for scientific work. If that’s not “high-performance”, I don’t know what is.



        • #44
          Originally posted by ldo17 View Post
          You don’t think that’s a worthwhile saving?!?
          No. Memory is cheap. If the choice is between buying additional memory or waiting 20x longer for my tasks to complete (in the case of Python), I'll buy the memory.
          Originally posted by ldo17 View Post
          Still better than being actively cache-hostile, like garbage collection is.
          I actually think GC is cache-friendlier than thread-aware refcounting. "Everything is an object" is what makes Java cache-hostile, not GC. Java simply stores data in a sparse way, and nothing can be done about that without changing the language itself.
          Yes, the VM pollutes the caches during the GC phase, but refcounting pollutes them during user code execution, when the caches are needed most. Imagine you have some global piece of data (a string, for example) that is referenced everywhere in the app, with multiple threads copying those references like crazy. That isn't unusual: the string might be a key in some maps, get passed as a parameter to many functions, and be stored in multiple hashmaps. Each time a copy is created or destroyed, the current CPU core issues a signal forcing every other core and CPU to invalidate the cache line containing that counter. Isn't that cache-hostile?
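
          As a rough illustration (not a measurement of any real refcounting JVM), here is what that contention boils down to: a handful of threads bumping one shared AtomicLong the way a count on a widely shared string would be bumped on every copy. Each incrementAndGet() invalidates that counter's cache line on every other core.

          import java.util.concurrent.atomic.AtomicLong;

          // Illustrative only: the shared AtomicLong stands in for the refcount
          // of one heavily shared object (e.g. a popular map key).
          public class SharedCounterContention {
              public static void main(String[] args) throws InterruptedException {
                  final AtomicLong refCount = new AtomicLong(1);
                  final int threads = Runtime.getRuntime().availableProcessors();
                  Thread[] workers = new Thread[threads];
                  long start = System.nanoTime();
                  for (int i = 0; i < threads; i++) {
                      workers[i] = new Thread(() -> {
                          for (int j = 0; j < 10_000_000; j++) {
                              refCount.incrementAndGet();   // "copy" the reference
                              refCount.decrementAndGet();   // "drop" the copy
                          }
                      });
                      workers[i].start();
                  }
                  for (Thread t : workers) t.join();
                  System.out.printf("%d threads: %.2f s%n",
                          threads, (System.nanoTime() - start) / 1e9);
              }
          }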
          Originally posted by ldo17 View Post
          Python is used quite heavily for scientific work. If that’s not “high-performance”, I don’t know what is.
          You must be kidding. Python, unable to use SIMD instructions or run threads in parallel (see the "global interpreter lock"), just can't do anything high-performance. Python is used in scientific work for low-throughput tasks and as glue between high-performance parts written in other languages. You can run a heavy calculation from a shell script, but that doesn't make bash a high-performance interpreter.



          • #45
            Originally posted by Khrundel View Post
            Memory is cheap.
            That’s the kind of thing that leads to the bloated apps that Java is (in)famous for.

            Memory is cheap, but accessing memory is expensive. That is the paradox of today’s cache-heavy high-performance CPUs.

            I actually think GC is cache-friendlier than thread-aware refcounting.
            Think about it: reference-counting hits memory that is already likely to be in the cache. Whereas garbage collection is all about whacking memory belonging to long-dead objects that would have long since fallen out of the cache. Not to mention a whole lot of pointer-chasing going on. That’s what I mean by “cache-hostile”.

            "Everything is object" makes Java cache-hostile, not GC.
            Java does primitive types in a stupid way--look at the whole boxing/unboxing thing. C++ does a better job of unifying PODs and object types while avoiding overhead. Or go the other way with Python, where everything really is an object. Trying to be in-between the two is not a happy place to be.
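
            For anyone who hasn't run into it, this is the boxing being complained about, in a small self-contained example: a List<int> isn't allowed, so every primitive gets wrapped in a heap-allocated Integer and read back through a pointer.

            import java.util.ArrayList;
            import java.util.List;

            public class BoxingExample {
                public static void main(String[] args) {
                    long primitiveSum = 0;
                    List<Integer> boxed = new ArrayList<>();

                    for (int i = 0; i < 1_000_000; i++) {
                        primitiveSum += i;   // plain int arithmetic, no allocation
                        boxed.add(i);        // autoboxes to an Integer object
                                             // (heap-allocated outside the small-value cache)
                    }

                    long boxedSum = 0;
                    for (Integer n : boxed) {
                        boxedSum += n;       // unboxes on every iteration, chasing a pointer
                    }
                    System.out.println(primitiveSum == boxedSum);
                }
            }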

            You must be kidding. Python, unable to use SIMD instructions or run threads in parallel (see the "global interpreter lock"), just can't do anything high-performance.
            Yes, it is used for high-performance stuff--look at NumPy/SciPy.

            Feel free to give any example of high-performance stuff done using Java.



            • #46
              Originally posted by ldo17 View Post
              Memory is cheap, but accessing memory is expensive.
              Sure, and that's why refcounting is evil. In the JVM you just pass references around as simple numbers and access the referenced memory only when you need to. With refcounting you have to do an atomic update of a counter every time you create or destroy a copy of a reference. Dunno why you keep ignoring this aspect.
              Originally posted by ldo17 View Post
              Think about it: reference-counting hits memory that is already likely to be in the cache.
              No, in most cases you're just doing a shallow copy of a pointer. Imagine a simple ArrayList of objects. Let's assume it contains 10000 references. You try to add one object, but the underlying array is full and the library has to reallocate it. The current JVM just copies 40000 bytes. A refcounting JVM would have to do atomic updates of 10000 counters located in different places in an unknown order, and then do the whole round of updates again when the old array is deallocated. Of course, the JVM could optimize these updates away, especially for a library type such as ArrayList, but I think you get the idea.
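
              Roughly, the contrast looks like this. The per-element counters are a stand-in for hypothetical refcounting state, not anything the JVM really keeps:

              import java.util.concurrent.atomic.AtomicInteger;

              final class GrowSketch {
                  // What today's JVM effectively does when the backing array grows:
                  // one contiguous bulk copy of the reference words.
                  static Object[] growToday(Object[] old) {
                      Object[] bigger = new Object[old.length * 2];
                      System.arraycopy(old, 0, bigger, 0, old.length);
                      return bigger;
                  }

                  // Hypothetical refcounting variant: every element's counter, scattered
                  // across the heap, gets bumped for the new array and dropped again
                  // when the old array goes away.
                  static Object[] growWithCounts(Object[] old, AtomicInteger[] counts) {
                      Object[] bigger = new Object[old.length * 2];
                      for (int i = 0; i < old.length; i++) {
                          counts[i].incrementAndGet();   // new array now also holds element i
                          bigger[i] = old[i];
                      }
                      for (int i = 0; i < old.length; i++) {
                          counts[i].decrementAndGet();   // old array releases element i
                      }
                      return bigger;
                  }
              }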
              Originally posted by ldo17 View Post
              Whereas garbage collection is all about whacking memory belonging to long-dead objects that would have long since fallen out of the cache. Not to mention a whole lot of pointer-chasing going on. That’s what I mean by “cache-hostile”.
              That's why generational GC was invented. Most of the time the GC does its dirty work within the young generation, where most dead objects are located. Long-dead objects lie in the old-generation heap, and nobody goes there for a long time.
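
              A toy way to watch this, assuming the default collector and heap settings (-Xlog:gc is Java 9's unified GC logging flag): almost all of the garbage below dies young, while the retained list gets promoted and is then left alone.

              import java.util.ArrayList;
              import java.util.List;

              // Run with:  java -Xlog:gc YoungGenDemo
              // The log is typically dominated by young-generation pauses.
              public class YoungGenDemo {
                  public static void main(String[] args) {
                      List<String> retained = new ArrayList<>();   // long-lived, ends up in the old gen
                      for (int i = 0; i < 5_000_000; i++) {
                          String tmp = new StringBuilder("item-").append(i).toString(); // dies young
                          if (i % 100_000 == 0) {
                              retained.add(tmp);                   // only a few survive
                          }
                      }
                      System.out.println(retained.size());
                  }
              }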
              Originally posted by ldo17 View Post
              Yes, it is used for high-performance stuff--look at NumPy/SciPy.
              And neither of them is implemented in Python.
              Originally posted by ldo17 View Post
              Feel free to give any example of high-performance stuff done using Java.
              There are a number of math libraries for Java. Some of them are relatively low-performance (still faster than anything written in Python); others are high-performance but, like numpy/scipy, are just bindings to something native. MTJ, for example.



              • #47
                Originally posted by Khrundel View Post
                Sure, and that's why refcounting is evil.
                And why garbage collection is even more evil.

                In the JVM you just pass references around as simple numbers and access the referenced memory only when you need to. With refcounting you have to do an atomic update of a counter every time you create or destroy a copy of a reference. Dunno why you keep ignoring this aspect.
                Strange, I keep pointing it out: the garbage collector needs to go back and hit the memory after it has become dead, while the reference counter is more likely to hit it while it’s still in the cache.

                Imagine a simple ArrayList of objects. Let's assume it contains 10000 references. You try to add one object, but the underlying array is full and the library has to reallocate it. The current JVM just copies 40000 bytes. A refcounting JVM would have to do atomic updates of 10000 counters located in different places in an unknown order, and then do the whole round of updates again when the old array is deallocated.
                Luckily, Python doesn’t need to do it that way. If you look at its C API, you will see things called “stolen references”. So you just need to copy the old array to the new one, free the old one, and you’re done.
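
                In Java terms the idea looks something like this (a sketch only, reusing the imaginary per-element counters from the grow example a post up): ownership moves from the old array to the new one, so no counter is touched at all.

                final class MoveSketch {
                    // Hypothetical "stolen reference" version of the grow step: the new
                    // array takes over the +1 the old array held for each element.
                    static Object[] growByMoving(Object[] old) {
                        Object[] bigger = new Object[old.length * 2];
                        for (int i = 0; i < old.length; i++) {
                            bigger[i] = old[i];   // ownership moves; the element's count is unchanged
                            old[i] = null;        // the old array no longer owns the element
                        }
                        return bigger;            // free 'old' without a single decrement
                    }
                }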

                That's why generational GC was invented. Most of the time the GC does its dirty work within the young generation, where most dead objects are located. Long-dead objects lie in the old-generation heap, and nobody goes there for a long time.
                But it doesn’t matter where the objects are located in memory, they still pollute the same CPU caches.

                And neither of them is implemented in Python.
                And Java is not implemented using Java. Your point being...?



                • #48
                  Originally posted by ldo17 View Post
                  Strange, I keep pointing it out: the garbage collector needs to go back and hit the memory after it has become dead, while the reference counter is more likely to hit it while it’s still in the cache.
                  No. The GC only touches live objects; it traces the graph of living objects. If some memory is occupied by a dead object, the JVM won't touch it until it allocates that memory again.

                  Look, as you know, all modern OSes use preemptive multitasking. That means that each tick (which lasts from 1 to 100 ms, depending on the platform) the CPU switches to another thread or even another process, making everything in the caches useless. Garbage collections occur less frequently and last longer than one tick, so the cache pollution from GC is negligible.
                  Refcounting, on the contrary, issues atomic write ops right in the middle of user code, when the caches are needed to perform well.
                  Originally posted by ldo17 View Post
                  Luckily, Python doesn’t need to do it that way. If you look at its C API, you will see things called “stolen references”. So you just need to copy the old array to the new one, free the old one, and you’re done.
                  For a library type you can do that. You could even try adding move detection to the VM, to make this work in user code too. But there are cases (branching, or virtual method calls) where the VM can't guarantee that the source reference won't be touched again, so the problem remains. Especially in Java, where all methods are virtual.

                  Originally posted by ldo17 View Post
                  But it doesn’t matter where the objects are located in memory, they still pollute the same CPU caches.
                  No, atomic write operations tamper with the caches of all CPUs. And you don't want to lose the caches even on the same core.


                  Originally posted by ldo17 View Post
                  And Java is not implemented using Java. Your point being...?
                  Looks like you've forgotten the previous step of this argument. You tried to prove that Python can do a high-performance job and gave scipy/numpy as an example. That argument doesn't work, because these libraries are just bindings to native code. So Python is slow. No, actually, it is SLooooooooooooooooooooow.



                  • #49
                    Originally posted by ldo17 View Post
                    Memory is cheap, but accessing memory is expensive. That is the paradox of today’s cache-heavy high-performance CPUs.
                    Memory isn't necessarily cheap. 256GB+ server configurations are really expensive. And it's more expensive to scale up.

                    Sure, lots of memory-optimization issues that used to matter a great deal are usually less important in 2017.

                    Regarding Garbage Collection, here is a good post:


                    There are different things your GC system can optimize for: throughput, pause times, compaction, warmup. I haven't had the interest to really deep dive into this, but reference counting seems awfully simplistic next to the more advanced strategies that are discussed. And I doubt such an obvious answer is some holy grail after researchers have done far more work into fancier solutions.
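
                    Those trade-offs show up directly as HotSpot tuning knobs. For example (the flags are real HotSpot options, but the values and app.jar are placeholders):

                    # favour short pauses: G1 with a pause-time goal
                    java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Xms2g -Xmx2g -jar app.jar

                    # favour raw throughput: the parallel collector
                    java -XX:+UseParallelGC -Xms2g -Xmx2g -jar app.jar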

                    Originally posted by ldo17 View Post
                    Python is used quite heavily for scientific work. If that’s not “high-performance”, I don’t know what is.
                    Python is awesome, and it's wildly successful in scientific computing, but that's because of its programmer-side elegance and simplicity, not its performance.

                    The Pythonic way is that your performance-sensitive pieces are written in C or high-performance Fortran, while all the higher-level logic and code can be done in Python. The Python code itself generally doesn't run fast, and that's usually OK.


                    For Java 9, one of the big features is that you can build self-contained application images that bundle the portions of the JDK you actually use with the final binary and do not require a system JDK to run. This should help with a variety of scenarios.
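
                    The tool behind this in Java 9 is jlink. A rough example of an invocation (the module and class names here are made up):

                    # assemble a trimmed-down runtime image containing only the modules the app needs
                    jlink --module-path $JAVA_HOME/jmods:mods \
                          --add-modules com.example.app \
                          --launcher app=com.example.app/com.example.Main \
                          --output appimage

                    # run it on a machine with no JDK installed
                    appimage/bin/app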



                    • #50
                      Originally posted by Khrundel View Post
                      No. The GC only touches live objects; it traces the graph of living objects. If some memory is occupied by a dead object, the JVM won't touch it until it allocates that memory again.
                      But it still has to put that memory on a free list. And touching every live object isn’t very helpful, since most of them would not be in the cache.

                      Whichever way you look at it, GC is cache-hostile.

                      Look, as you know, all modern OSes use preemptive multitasking. That means that each tick (which lasts from 1 to 100 ms, depending on the platform) the CPU switches to another thread or even another process, making everything in the caches useless.
                      Modern OSes go to a lot of trouble to keep the caches from becoming “useless”. If you really think they are “useless”, try running your CPU with the caches turned off, and see what a performance difference it makes.

                      Garbage collections occur less frequently and last longer than one tick, so the cache pollution from GC is negligible.
                      But when it does happen, you notice the entire system stuttering. To try to offset this, you do GC more often, but that causes its own problems.

                      You tried to prove that Python can do a high-performance job and gave scipy/numpy as an example. That argument doesn't work, because these libraries are just bindings to native code. So Python is slow. No, actually, it is SLooooooooooooooooooooow.
                      Still better than Java for high-performance scientific work, though.

