Glibc 2.39 Should Be Out On 1 February & Might Drop Itanium IA64 Linux Support

  • Glibc 2.39 Should Be Out On 1 February & Might Drop Itanium IA64 Linux Support

    Phoronix: Glibc 2.39 Should Be Out On 1 February & Might Drop Itanium IA64 Linux Support

    A release plan has been drafted for the upcoming GNU C Library "glibc" 2.39 release as well as some possible last minute changes...

  • #2
    Wave of "lol itanic, so shitty and pathetic" from people who have never touched an IPF system and don't understand what they were used for in 3... 2... 1...

    • #3
      Originally posted by Dawn View Post
      Wave of "lol itanic, so shitty and pathetic" from people who have never touched an IPF system and don't understand what they were used for in 3... 2... 1...
      The fact that Itanium systems had good uses doesn’t change the fact that the architecture was shitty.

      • #4
        Originally posted by dragon321 View Post

        The fact that Itanium systems had good uses doesn’t change the fact that the architecture was shitty.
        Seems reasonable. I'm sure you have specific criticisms in mind, not just "muh compilers" or "EPIC is bad."

        What is it? Advanced loads? RSE? Rotation for loop pipelining?

        • #5
          Originally posted by Dawn View Post

          Seems reasonable. I'm sure you have specific criticisms in mind, not just "muh compilers" or "EPIC is bad."

          What is it? Advanced loads? RSE? Rotation for loop pipelining?
          "I love my OpenVMS!" said at least one person, probably

          • #6
            Originally posted by AlanTuring69 View Post

            "I love my OpenVMS!" said at least one person, probably
            Presumably, someone in the tiny fraction of Itanium users who ran OpenVMS. (Hint: Very, very few.)

            • #7
              Originally posted by dragon321 View Post

              The fact that Itanium systems had good uses doesn’t change the fact that the architecture was shitty.
              The architecture itself was not "shitty". The compilers of the era were not up to producing good code for such an architecture. And the number of individuals capable of writing good assembler for such explicit parallelism was never in large supply(*). "A bridge too far".

              (*) FD: I wrote some assembly-level code for DSPs which, in some ways, share some of the explicit parallelism that Itanium offers. It is not for the faint of heart.
              Last edited by CommunityMember; 03 January 2024, 10:37 PM.

              • #8
                Originally posted by CommunityMember View Post

                The architecture itself was not "shitty". The compilers of the era were not up to producing good code for such an architecture. And the number of individuals capable of writing good assembler for such explicit parallelism was never in large supply(*). "A bridge too far".

                (*) FD: I wrote some assembly-level code for DSPs which, in some ways, share some of the explicit parallelism that Itanium offers. It is not for the faint of heart.
                Actually, it was far worse than that.

                The wished-for compilers simply couldn't be written (this is what Knuth said first, btw).

                It's not that Intel didn't try - they actually put together a pretty good team to get it done (I personally know 1, maybe 2 people, both top notch, who were on that team). But they failed miserably.

                Why? Simple really, if one is actually willing to consider mathematical evidence (in the political sphere, of course, everything that is not convenient can and will be summarily dismissed).

                The issue is Rice's theorem. Now everyone around here and their dog knows about the Halting Problem - no program can decide, for an arbitrary program and input, whether that program ever stops. Ok, so far so good. Rice's theorem is like the halting problem on steroids - any non-trivial property of a program's dynamic behavior is undecidable for programs in a Turing-complete language. To be 100% clear, the theorem states that the only semantic properties (i.e., properties of dynamic behavior) that can be decided for all programs are the trivial ones - those that hold for no program or for every program.
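
                For reference, a compact textbook statement of the theorem (standard notation, added for clarity and not part of the original post; \varphi_e denotes the partial computable function computed by program number e):

                \text{Let } \mathcal{P} \text{ be a set of partial computable functions with } \emptyset \neq \{\, e \in \mathbb{N} : \varphi_e \in \mathcal{P} \,\} \neq \mathbb{N}.
                \text{Then the index set } \{\, e \in \mathbb{N} : \varphi_e \in \mathcal{P} \,\} \text{ is undecidable.}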

                Unfortunately, optimizing compilers need to make such decisions all the time. E.g. - do two operations (program statements or sub-statements) depend on each other, or can they be executed in parallel? Ummm, that's in the general sense undecidable. Can two references/pointers overlap? Undecidable. Can two subsequent iterations of a loop be executed in parallel (to facilitate automatic vectorization)? Guess what, undecidable again.
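
                A minimal C sketch of the aliasing point (illustration only; the function names are made up): without outside information the compiler must assume the two pointers may overlap, so it cannot safely reorder or vectorize the loop, and C99 restrict exists precisely to hand that undecidable question back to the programmer.

                void scale(float *a, const float *b, int n)
                {
                    /* a and b might overlap (e.g. the caller passes b = a + 1), so the
                       compiler must assume iteration i can depend on iteration i - 1
                       and cannot safely vectorize or reorder the loop. */
                    for (int i = 0; i < n; i++)
                        a[i] = 2.0f * b[i];
                }

                /* With C99 'restrict' the programmer promises there is no overlap;
                   the undecidable question is answered by a human, not the compiler. */
                void scale_restrict(float *restrict a, const float *restrict b, int n)
                {
                    for (int i = 0; i < n; i++)
                        a[i] = 2.0f * b[i];
                }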

                Can the compiler figure out some "common cases" where it can prove that certain things are fine and optimizations can proceed? Sure, but those will necessarily be a subset of all cases, and in practice that also means benchmark speed results will likely be cherrypicked (i.e., they produce a compiler which handles the published, open-source benchmarks really well, but when you try your own code, it stops working well).

                Why does DSP work with VLIW in general? Simple really: (1) you get really large basic blocks, and control flow doesn't matter, and (2) dataflow is highly regular as well (the data just gets streamed in, you don't need a fancy cache hierarchy).
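
                A small sketch of the kind of kernel that fits that description (a hypothetical FIR filter, illustration only): a branch-free inner loop with a fixed trip count over sequentially streamed data, which is trivial to unroll and software-pipeline.

                #define NTAPS 16

                /* out[i] = sum over t of coeff[t] * in[i + t]; no data-dependent
                   branches, fixed trip count, strictly sequential data access. */
                void fir(const float *in, float *out, const float coeff[NTAPS], int n)
                {
                    for (int i = 0; i + NTAPS <= n; i++) {
                        float acc = 0.0f;
                        for (int t = 0; t < NTAPS; t++)
                            acc += coeff[t] * in[i + t];
                        out[i] = acc;
                    }
                }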

                Now is DSP representative of general purpose workloads, like SPECint/fp or server workloads? F no.

                BTW, to preempt certain "arguments" - people who summarily dismiss a math theorem are to me like the followers of Lysenko/Lysenkoism.
                Last edited by vladpetric; 04 January 2024, 12:29 AM.

                • #9
                  Originally posted by vladpetric View Post
                  Unfortunately, optimizing compilers need to make such decisions all the time. E.g. - do two operations (program statements or sub-statements) depend on each other, or can they be executed in parallel? Ummm, that's in the general sense undecidable. Can two references/pointers overlap? Undecidable. Can two subsequent iterations of a loop be executed in parallel (to facilitate automatic vectorization)? Guess what, undecidable again.

                  Can the compiler figure out some "common cases" where it can prove that certain things are fine and optimizations can proceed? Sure, but those will necessarily be a subset of all cases, and in practice that also means benchmark speed results will likely be cherrypicked (i.e., they produce a compiler which handles the published, open-source benchmarks really well, but when you try your own code, it stops working well).
                  TBH, modern systems mainly use threads to utilize multiple cores (now the default scenario). Threads are also nondeterministic and have similar kinds of data dependency issues. We're often just using very basic patterns for multi-threading like fork-join, work-stealing queues etc. We're not doing "real" multi-processing where there can be control/data dependencies between every single instruction. Often the algorithms aren't even trying to use an optimal number of threads. Also, SIMD instructions are difficult to use. Many compilers (like Oracle Java's JIT) still only emit very basic add/mul instructions, maybe up to four parallel values. They can't build a compiler that produces optimal H.266 encoding code from primitive adds, muls, and bit shifts.
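
                  To make the SIMD point concrete, a sketch (illustration only; sad16 is a made-up name) of the sum-of-absolute-differences kernel that video encoders rely on. Compilers can sometimes auto-vectorize the scalar form, but production codecs still ship hand-written SIMD for it (e.g. x86 PSADBW) because the compiler can't be relied on to find the transformation.

                  #include <stdlib.h>

                  /* 16-pixel sum of absolute differences, scalar form. */
                  unsigned sad16(const unsigned char *a, const unsigned char *b)
                  {
                      unsigned sum = 0;
                      for (int i = 0; i < 16; i++)
                          sum += (unsigned)abs(a[i] - b[i]);
                      return sum;
                  }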

                  Just like we have E and P cores in modern systems, there could be VLIW cores for some stuff. Similar to GPGPU offloading and other accelerators. VLIW is perfectly fine for those use cases where an efficient algorithm can be written. For general-purpose code the current OoOE cores might be better. But they waste a lot of chip area on speculation - guessing branches and reordering instructions. Anyway, our computers are often fast enough for ordinary mundane computing. VLIW accelerator cores could help a lot. We can prove they require less memory bandwidth and can have higher IPC.
                  Last edited by caligula; 04 January 2024, 12:52 AM.

                  • #10
                    Originally posted by caligula View Post

                    TBH, modern systems mainly use threads to utilize multiple cores (now the default scenario). Threads are also nondeterministic and have similar kinds of data dependency issues. We're often just using very basic patterns for multi-threading like fork-join, work-stealing queues etc. We're not doing "real" multi-processing where there can be control/data dependencies between every single instruction. Often the algorithms aren't even trying to use an optimal number of threads. Also, SIMD instructions are difficult to use. Many compilers (like Oracle Java's JIT) still only emit very basic add/mul instructions, maybe up to four parallel values. They can't build a compiler that produces optimal H.266 encoding code from primitive adds, muls, and bit shifts.

                    Just like we have E and P cores in modern systems, there could be VLIW cores for some stuff. Similar to GPGPU offloading and other accelerators. VLIW is perfectly fine for those use cases where an efficient algorithm can be written. For general-purpose code the current OoOE cores might be better. But they waste a lot of chip area on speculation - guessing branches and reordering instructions. Anyway, our computers are often fast enough for ordinary mundane computing. VLIW accelerator cores could help a lot. We can prove they require less memory bandwidth and can have higher IPC.
                    CPU threads are done manually, pretty much all the time (as opposed to being determined automatically by the compiler). There are many cases where the programming language/compiler gives you a nicer interface than just spinning up threads by hand (e.g., OpenMP, where you can say that a loop should be run on multiple threads), but at the end of the day it's still manual thread delineation.
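
                    A minimal OpenMP sketch of that kind of interface (illustration only; saxpy is a made-up example, build with -fopenmp): the programmer still marks the loop explicitly, the compiler does not discover the parallelism on its own.

                    /* Explicitly ask for the loop iterations to be split across threads. */
                    void saxpy(float *y, const float *x, float a, int n)
                    {
                        #pragma omp parallel for
                        for (int i = 0; i < n; i++)
                            y[i] += a * x[i];
                    }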

                    Same thing for threads in a GPU - yes, you have thousands of threads, but you have to bundle them up in warps/wavefronts of 32 or 64 threads, which execute in lockstep.

                    Over there, the main reason that NVidia is ahead is that they recognized that universal compilers can't be written, so they're better off customizing their drivers for each and every graphics-intensive AAA game (yes, NVidia actually does that ... ).

                    Sure, if we're to write/customize algorithms directly for the specific architecture, then that architecture will work reasonably well. But that, to a large degree, is cherrypicking, btw.
