OpenBLAS 0.3.20 Adds Support For Russia's Elbrus E2000, Arm Neoverse N2/V1 CPUs

  • #61
    Originally posted by jabl View Post

    I'm referring to your statements about M1 being somehow VLIW-like.
    Yes, I got that, I just didn't get what you think is the difference between M1 long instruction words that contain multiple RISC instructions that run in parallel and Elbrus VLIW that contain multiple RISC instructions that run in parallel.

    You keep mentioning out of order, but these are all parallel instructions, the whole point is there is no order.



    • #62
      FWIW, I just want to express my best wishes for the people of Ukraine and Russia.

      Let's consider that minds are not going to be changed on this situation, in these forums. So, I don't see any point in making statements someone is certain to take issue with. Also, I can tell you that when I hear people criticize my own country, it makes me a bit defensive, even when they're saying things similar to views I've expressed, myself!

      I'm interested in the tech that Russians are building. Whether you're for or against Russia, I think you should be curious, as well. So, I hope we can continue to have coverage of the Russian tech sector without so much heated debate.

      Let's hope for the best outcome for Ukraine, because hoping is all any of us can really do about it. And try to remember that we're all geeks, here.



      • #63
        Originally posted by mshigorin View Post
        First, let's have a look at the tech.
        Thanks for that info. The performance of more recent iterations sounds very promising!



        • #64
          Originally posted by Khrundel View Post
          Being a VLIW performance will scale worse with frequency than OoO,
          Why do you say that? Is it because there's more memory latency to hide, or are you referencing something about the micro-architecture, itself?

          Edit: based on your later post, it sounds like you're worried about having to hide greater memory latencies. However, mshigorin said it's got a sort of prefetcher:

          "APB (automatic prefetch buffer) programmable to deliver RAM contents at given patterns into L2 cache predictably"

          Prefetching is essential, even for modern out-of-order cores, because even they don't have big enough reorder buffers to hide the latency of a read that has to go all the way out to DRAM. And deep reorder buffers presume you can even find enough work to do that doesn't depend on the missing data.
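
A rough back-of-the-envelope sketch of that point, in Python. Every number here is an assumed, illustrative value (DRAM latency, clock, sustained IPC, reorder buffer size), not a measurement of Elbrus or any other CPU. The idea is only to show how many independent instructions a core would need in flight to stay busy across one DRAM miss.

```python
def instructions_to_cover(latency_ns: float, clock_ghz: float, ipc: float) -> int:
    """Instructions a core would have to keep in flight to stay busy for latency_ns."""
    cycles = latency_ns * clock_ghz  # ns * (cycles per ns)
    return round(cycles * ipc)


# Assumed, illustrative values only.
DRAM_LATENCY_NS = 100.0    # one full trip out to DRAM
CLOCK_GHZ = 3.0            # core clock
SUSTAINED_IPC = 4.0        # instructions retired per cycle when not stalled
ASSUMED_ROB_ENTRIES = 512  # hypothetical reorder buffer size

needed = instructions_to_cover(DRAM_LATENCY_NS, CLOCK_GHZ, SUSTAINED_IPC)
print(f"in-flight instructions needed: {needed}")
print(f"assumed reorder buffer entries: {ASSUMED_ROB_ENTRIES}")
```

With these assumptions the miss costs 300 cycles, which is more than twice what the assumed buffer can hold, so the data has to be requested ahead of time whether the core is in-order or out-of-order.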
          Last edited by coder; 23 February 2022, 10:03 PM.



          • #65
            Originally posted by coder View Post
            FWIW, I just want to express my best wishes for the people of Ukraine and Russia.
            Amen to that
            Originally posted by coder View Post
            FWIW, I just want to express my best wishes for the people of Ukraine and Russia.
            Let's consider that minds are not going to be changed on this situation, in these forums. So, I don't see any point in making statements someone is certain to take issue with. Also, I can tell you that when I hear people criticize my own country, it makes me a bit defensive, even when they're saying things similar to views I've expressed, myself!

            I'm interested in the tech that Russians are building. Whether you're for or against Russia, I think you should be curious, as well. So, I hope we can continue to have coverage of the Russian tech sector without so much heated debate.

            Let's hope for the best outcome for Ukraine, because hoping is all any of us can really do about it. And try to remember that we're all geeks, here.
            .
            Speaking out against the war propaganda anywhere online actually matters more than you think.
            Not just because it gives those who have fallen for it a small opportunity to rethink their world view.
            But also because Echelon, or whatever name/version number it goes under these days also reads these posts to report back on how well the propaganda is working in terms of sentiment towards Russia, and if people are accepting that CNN/FOX/BBC/ABC etc agenda that its OK for their puppet government in the Ukraine to shell its civilians because they didn't accept the military coup they sponsored.
            And the best thing we can do to promote peace is have that system overwhelmingly tell them the internet doesn't believe them, doesn't support them, and they can go f' themselves with a BBC (Big Black C...)



            • #66
              Originally posted by mSparks View Post
              and no, caches don't help, because cache misses can be literally life threatening in RT systems.
              I've seen a CPU that enabled you to lock specific cachelines. Granted, it was a single-core CPU with no multi-processor support. Cache-locking seems like it can get more problematic in multi-CPU scenarios, but if you know the data is truly private, then it's still potentially workable.
              Last edited by coder; 23 February 2022, 10:04 PM.



              • #67
                Originally posted by jabl View Post
                Encoding instruction dependencies in the ISA is the idea behind 'dataflow' architectures, which was a hot research topic a couple of decades ago. I think one reason they never really caught on is that encoding the dependencies bloats the code to the point that what you win in avoiding OoO (or only doing OoO per basic block instead of for every single instruction) you lose in spending that same power on bigger instruction caches and on instruction bandwidth.
                For sure there are pros and cons on both sides.
                The idea I got from the Itanium fallout in arguments against VLIW was that the compiler is very difficult to implement if you want good performance, because you don't have runtime input.

                Intel released its last Itanium CPU around 2018 (...just to test the waters).
                In my opinion, one of the key reasons it didn't work out is that it was a vendor lock-in ISA, and the market didn't want a monopoly.

                With amd64, you have Intel and AMD releasing CPUs to the market, and if one puts its prices up, you can go and buy from the other company.
                That big advantage for clients didn't exist for Itanium.

                In relation to Elbrus, from what I read, they have had a lot of time to create a compiler. They previously had lcc, I believe? They are now using a gcc version, I think. I would love to know what state they are in with it.
                Does the compiler already support Elbrus 16S, including SIMD instructions and such?

                Another CPU I'm interested in is the Multiclet S2, operating at 2.5 GHz on 16 nm, which seems to be a monster, but I don't know what state it is in. From what I read, they were adding support for the LLVM compiler.
                I believe no Linux port exists for it, or is one planned?



                • #68
                  Originally posted by jabl View Post
                  Encoding instruction dependencies in the ISA is the idea behind 'dataflow' architectures, which was a hot research topic a couple of decades ago. I think one reason they never really caught on is that encoding the dependencies bloats the code to the point that what you win in avoiding OoO (or only doing OoO per basic block instead of for every single instruction) you lose in spending that same power on bigger instruction caches and on instruction bandwidth.
                  Thanks for this.

                  Yes, the promise of EPIC was runtime scheduling that could potentially include OoO, branch-prediction, and speculative execution. Intel never pursued these, possibly because they had already given up on IA64 by the time they would have.

                  Also, IA64 encoded dependencies between instruction word triplets, not basic blocks. The dependency window was 64 of these packets, IIRC. And the fundamental reason they had to do this was for backwards compatibility with binaries compiled for older architectures. Each generation of CPU could have different widths, pipeline latencies, and other constraints (e.g. limits on # of concurrent register writes). So, you must do some amount of runtime scheduling.
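
For reference, a minimal sketch of how an IA-64 "triplet" (bundle) is laid out: 128 bits holding a 5-bit template plus three 41-bit instruction slots, where the template selects the execution-unit types and the stop positions that mark dependency boundaries. The helper names `pack_bundle` and `unpack_bundle` are made up for illustration, and the sketch only splits the fields; it does not decode the template table.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slots: tuple) -> int:
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    value = template  # template occupies the lowest 5 bits
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        value |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return value


def unpack_bundle(bundle: int) -> tuple:
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(SLOTS_PER_BUNDLE)
    )
    return template, slots


# Arbitrary example values, not real instruction encodings.
example = pack_bundle(0x10, (1, 2, 3))
print(unpack_bundle(example))
```

Because the stops live in the template rather than in fixed positions, the same binary can be issued by implementations of different widths, which is the compatibility point made above.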



                  • #69
                    Originally posted by tuxd3v View Post
                    in arguments against VLIW was that the compiler is very difficult to implement if you want good performance, because you don't have runtime input.
                    JIT + PGO could help with this. Basically, you can do like GPUs and dynamically recompile code if it's performing poorly. How to know if it's performing poorly? Periodically check some performance counters that indicate things like % of missed branches, memory stalls, etc.
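
A minimal sketch of that check, assuming a runtime that can sample hardware counters for a compiled region. `CounterSample`, `should_recompile`, and both thresholds are hypothetical names and values chosen for illustration, not the API of any real JIT.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Counter deltas collected over one sampling interval (hypothetical)."""
    branches: int
    branch_misses: int
    cycles: int
    stall_cycles: int  # cycles spent waiting on memory


def should_recompile(sample: CounterSample,
                     max_miss_rate: float = 0.05,
                     max_stall_fraction: float = 0.30) -> bool:
    """Flag a region for recompilation when it mispredicts or stalls too often."""
    miss_rate = sample.branch_misses / sample.branches if sample.branches else 0.0
    stall_fraction = sample.stall_cycles / sample.cycles if sample.cycles else 0.0
    return miss_rate > max_miss_rate or stall_fraction > max_stall_fraction


# Made-up samples: one well-behaved region, one dominated by memory stalls.
healthy = CounterSample(branches=1000, branch_misses=10, cycles=10_000, stall_cycles=1_000)
stalled = CounterSample(branches=1000, branch_misses=10, cycles=10_000, stall_cycles=6_000)
print(should_recompile(healthy), should_recompile(stalled))
```

A real system would also feed the collected profile back into the compiler, so the new schedule reflects the observed branch directions and miss patterns rather than just retrying with the same guesses.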

                    Originally posted by tuxd3v View Post
                    Intel released its last Itanium CPU around 2018 (...just to test the waters).
                    No, I'm pretty sure the last several iterations of IA64 CPUs were only by contractual obligation. I think they gave up on IA64 after about the second generation CPU.

                    Originally posted by tuxd3v View Post
                    In my opinion, one of the key reasons it didn't work out is that it was a vendor lock-in ISA, and the market didn't want a monopoly.

                    With amd64, you have Intel and AMD releasing CPUs to the market, and if one puts its prices up, you can go and buy from the other company.
                    That big advantage for clients didn't exist for Itanium.
                    Yes, I think this was a big factor working against it.



                    • #70
                      Originally posted by mSparks View Post
                      Speaking out against the war propaganda anywhere online actually matters more than you think.
                      Not just because it gives those who have fallen for it a small opportunity to rethink their world view.
                      But also because Echelon, or whatever name/version number it goes under these days also reads these posts to report back on how well the propaganda is working in terms of sentiment towards Russia,
                      Let's say you're right. If the point of your posts is primarily to feed their algorithms, it might be self-defeating. You could just end up informing them of which facts to refute.

                      I'm not trying to censor you. I'm just offering another point of view on this.

                      What I really want is for Michael to keep covering all tech, whether it's from China, Russia, or anywhere else. I don't think he's easily dissuaded by arguments in the forums, but it would be nice if we can also have intelligent discussions about this tech, and not get bogged down in politics.

                      Thanks to you and other Russians for being here and helping us understand your cool CPUs.

