Arm Neoverse N2 Support Added To The LLVM Clang 12 Compiler

  • Arm Neoverse N2 Support Added To The LLVM Clang 12 Compiler

    Phoronix: Arm Neoverse N2 Support Added To The LLVM Clang 12 Compiler

    In September, Arm began adding Neoverse N2 support to the open-source compilers, starting with GCC, and now the support has been merged into LLVM Clang 12 as well...

    http://www.phoronix.com/scan.php?pag...-LLVM-Clang-12

  • #2
    Yes!! Another step in the forward march of this decade ushering in the Age of ARM.

    Here is some info about why SVE, the Scalable Vector Extension, layered on top of ARM's existing NEON SIMD extension, is so important and innovative. There is nothing like it in the x86 world, and it is beginning to show.

    Here is some boilerplate from ARM:


    The Arm Scalable Vector Extension, or SVE, is an extension for the AArch64 instruction set of the Armv8 architecture. It is a key technology furthering the ability of Arm processors to efficiently address the computation requirements of HPC, Data Analytics, Machine Learning, and other applications. With the arrival of the first SVE-enabled hardware platform from Fujitsu, we are gaining experience with SVE. And we are finding it is also applicable in areas such as media processing, encryption/decryption and network processing such as the HPC-focused Message Passing Interface (MPI). We also see a role at the infrastructure edge supporting DSP & 5G L1 packet processing and enabling use cases like autonomous vehicles. This is a technology that truly spans from supercomputers to edge devices.

    SIMD and vector extensions are not new – GPUs use SIMD for machine learning training and inference operations – but Arm’s SVE implements these capabilities in a unique way. One of the main features of SVE is its flexibility. Unlike traditional SIMD extensions, vector width is not hard coded in SVE. Applications compiled for SVE on one platform will run on any valid SVE implementation, no matter the width. Valid implementations are 128b to 2048b in increments of 128b. With the announcement of the Arm Neoverse V1 platform (codenamed Zeus), we are disclosing the implementation of SVE 2x256 as a part of the Neoverse V1 IP core.

    Thus far, a number of partners are developing Neoverse V1-based solutions optimized for High Performance Computing applications such as simulation, data analytics, and deep learning. You can expect to see the results of this work hitting the market soon.

    One of these partners is SiPearl, which has previously disclosed a design using Zeus. "SiPearl has chosen Arm's Zeus core to power its first generation High Performance Server processor. The addition of SVE on the Neoverse roadmap brings a lot of potential to HPC, cloud, and machine learning workloads, and we look forward to contributing to the Arm ecosystem as it grows for technical computing and beyond," noted Craig Prunty, VP of Marketing and Business Development for SiPearl.

    The Arm ecosystem is indeed growing rapidly in HPC, on the edge, and in the cloud. An example of this is our partner NVIDIA®, which recently announced support for Arm platforms. Their software stack takes advantage of Arm's variety of implementations and brings Arm servers in line with alternatives using NVIDIA GPUs. With Neoverse V1, we are excited to see how NVIDIA software and GPUs will fully take advantage of this platform.

    With the recent unveiling of the world's fastest supercomputer, Fugaku from RIKEN, based on Fujitsu's A64FX processor, the first hardware implementation of SVE has hit the market. Fujitsu has done its own implementation of the Arm architecture and has implemented HBM (High Bandwidth Memory), providing 1 TB/s of memory bandwidth to help fill the SVE pipelines. As we gain experience with this amazing platform, we are learning of the incredible power available at the high end of supercomputing. Holding the #1 spot on Top500.org, the Fugaku supercomputer has broken performance records on 4 of 5 key HPC metrics. The system is delivering results earlier than expected and is impacting important issues such as COVID-19 research (ref: https://asia.nikkei.com/Business/Tec...ugaku-says-yes).

    “Fugaku developed by Riken and Fujitsu has achieved four supercomputer crowns, largely thanks to strong floating-point calculation engine with SVE and high throughput cache and memory structure in Fujitsu’s A64FX processor. We welcome the expansion of SVE technology and ecosystem in the future," said Takumi Maruyama, Principal Expert, Fujitsu.



    And this also from The Next Platform...
    Wombat's new A64FX processors, developed by Fujitsu, are the first to use the Arm Scalable Vector Extension (SVE) technology, a unique instruction set that offers flexibility for processing vectors—strings of numbers treated as a coherent unit in memory. SVE can process vectors as short as 128 bits and as long as 2,048 bits without forcing users to recompile their scientific codes for different vector lengths.

    For those following SVE questions around Arm in HPC more broadly, look no further than this presentation recorded for the Arm HPC User Group by John Stone (of NAMD fame, among other things). He walks through it in the slides, but in essence he took kernels from NAMD that had been hand-optimized for Intel AVX and other architectures and tuned them for SVE to see what kind of performance he might get. What stands out is what Arm did to architect the SVE instruction set to make it easy for compilers to vectorize automatically. He shows this with an example of an AVX-optimized kernel whose code doesn't even fit on the slide, even at a reduced font size. The SVE version was a nice, neat block, showcasing how the SVE instructions are architected overall, again in this example of a classic HPC code.

    https://www.nextplatform.com/2020/11...s-in-the-wild/





    x86 is looking increasingly primitive in this age of heterogeneity, AI, ML, performance per watt, and flexibility. The Age of ARM is here.
    Last edited by Jumbotron; 27 November 2020, 04:49 PM.



    • #3
      Well, "innovative" may be the wrong word; old Cray vector machines used to work this way. But I do agree that true vector extensions are far superior to packed SIMD: besides the main benefit of future-proof binaries, they make for a much cleaner instruction set.



      • #4
        Neoverse N2 has SVE and SVE2? That is sweet.



        • #5
          Originally posted by baryluk View Post
          Neoverse N2 has SVE and SVE2? That is sweet.
          Yes it is!!

          What we are about to see is a complete Battle Royale, Cage Match, Fight to the Death in the silicon world: Intel vs AMD vs Nvidia. Why Nvidia? Well... of course... they're a CPU company now with the purchase of ARM. And all THREE are GPU companies now, what with Intel buying out half of AMD's best engineers 3 years ago to produce their first REALLY performant GPUs.

          They are all going to heterogeneous computing architectures: ARM first, followed by AMD and now Intel. Intel and AMD both now own FPGA companies, Altera for Intel and Xilinx for AMD. ARM has no internal FPGA division as of yet, but it makes ARM cores used for prototyping IP on FPGAs, working with Xilinx and Gowin. But ARM has a metric crap ton of IP and designs for DSPs, neural-net processors, advanced matrix-math processors, visual cores, audio cores, sensor hubs, storage and network traffic controllers... pretty much every IP and architecture that's NOT FPGA. Also an interconnect fabric, CCIX, that is more capable and flexible than either Intel's CXL or AMD's Infinity Fabric.

          There is going to be SO much innovation for the rest of this decade that we can hardly imagine it now. But my bet is on ARM. x86 is just too old, too primitive, too costly, and WAY too slow to innovate at the speed that heterogeneous compute workloads now demand, with AI, ML, vision, VR, AR, IoT, and performance-per-watt cost structures, all the way from supercomputers and HPC, to banking and finance, down to your smart watch, smart thermostat, or even a drone. That's the reason Intel bought an FPGA company: after 50 years of making shitty GPUs, it finally decided it was not going to convince the market, and it could not successfully engineer a many-core CPU-based accelerator card, so it threw in the towel and paid a bunch of ex-AMD engineers to make decent GPGPUs like Nvidia and AMD. And AMD saw the writing on the wall and bought the other big FPGA player, Xilinx, to get a foothold in accelerated AI and ML workloads that need more specialized "oomph" than a GPU can give.

          And ARM's SVE and SVE2 vector extensions are really going to give them a leg up. They already have: Fugaku, for both the raw compute power win AND the power-per-watt win.

          Below is a snippet from a Next Platform article going over the recent deal between AMD and Xilinx. It's kind of humorous: they use the metaphor of three large armies about to do battle. Makes for interesting reading...


          Now, we have three big camps in the war over datacenter and edge compute. It is fall, the day is partly cloudy, and the economic weather is uncertain. But the troops are well fed and eager, their weapons glisten in the sun and the horses stamp their feet impatiently. Banners wave in the chilly wind that warns that winter will not wait. And so the war cannot, either. There will never be a better time to fight, or a better way to settle the architectural disputes.

          On one hillside, we have Nvidia with its Mellanox and Cumulus networking plus its hegemony in datacenter GPU compute for HPC and AI and the potential to add Arm server chips and the Arm Holdings licensing method and 500-plus licensees to augment the Nvidia army.

          Down in the lush valley is Intel, which has lots of provisions and plenty of money to fund the war, but it has nothing to gain from a fight except the loss of future fruitfulness and ease. Intel has added to its compute engine and networking arsenal and has given up on storage media to brace itself for the coming attack, and its engineers have struggled to build its engines of war and fortify its moat.

          And on the other hillside is the combination of AMD and Xilinx. AMD, unlike Nvidia, believes in FPGAs as well as CPUs and GPUs. Much as Intel did five years ago as it spent $16.7 billion to acquire Xilinx rival Altera and scrapped its “Knights” massively parallel processor dreams to create the Xe line of discrete GPU accelerators.



          https://www.nextplatform.com/2020/10...h-xilinx-deal/



          • #6
            Slightly off topic but still in the realm of ARM news comes word that a developer has patched QEMU in order to run 64-bit Windows for ARM in a hypervisor on Apple's ARM-based M1 chip in an Apple MacBook Air.

            Not only that, but once Windows for ARM was running on the ARM M1 Mac, he was able to run x86 Windows programs as well. And he says the performance is pretty zippy. Here's a snippet from the article...


            Developer Alexander Graf, however, took to Twitter today to share his achievement: successfully being able to virtualize ARM Windows on Apple Silicon.
            Who said Windows wouldn't run well on #AppleSilicon? It's pretty snappy here 😁. #QEMU patches for reference: https://t.co/qLQpZgBIqI pic.twitter.com/G1Usx4TcvL

            — Alexander Graf (@_AlexGraf) November 26, 2020

            Note that he was able to virtualize the ARM version of Windows and not the x86 version. Virtualizing an x86 version of Windows would have been much more difficult compared to the ARM version, as Apple's M1 chip has a 64-bit ARM architecture.

            Graf also mentions in one of his tweets, though, that "Windows ARM64 can run x86 applications really well. It's not as fast as Rosetta 2, but close."

            He was able to achieve this by running the Windows ARM64 Insider Preview by virtualizing it through the Hypervisor.framework. This framework allows users to interact with virtualization technologies in user space without having to write kernel extensions (KEXTs), according to Apple.

            Moreover, this wouldn’t have been possible without applying a custom patch to the QEMU virtualizer. QEMU is an open-source machine emulator and virtualizer, known for “achieving near-native performance” by executing guest code directly on the host CPU. So it goes without saying that only ARM guests can be virtualized this way on an ARM machine like the M1-based Macs.


            Below are links to the entire article on The 8-Bit, along with his patches to QEMU and his Twitter thread detailing more about the accomplishment.


            https://the8-bit.com/developer-succe...on-m1-macbook/

            https://lists.gnu.org/archive/html/q.../msg06499.html

            https://twitter.com/_AlexGraf/status...81983879569415
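            For reference, the kind of invocation involved looks roughly like the following. This is a hedged sketch, not taken from Graf's patches: `-accel hvf` (Hypervisor.framework, which needs no kernel extension) and the `virt` machine type are standard QEMU options, but at the time of the post the AArch64 hvf backend required his unmerged patch series, and the firmware path and disk-image name below are placeholders.

```shell
# Sketch of booting an AArch64 guest under QEMU on an M1 Mac using
# macOS's Hypervisor.framework ("hvf") backend.
# QEMU_EFI.fd and windows-arm64.img are placeholder paths.
qemu-system-aarch64 \
    -machine virt \
    -accel hvf \
    -cpu host \
    -smp 4 -m 4096 \
    -bios QEMU_EFI.fd \
    -device virtio-gpu-pci \
    -drive file=windows-arm64.img,if=virtio
```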



            • #7
              Originally posted by Jumbotron View Post

              I do somewhat agree with you, but man, your post is full of over-exaggerations...



              • #8
                Originally posted by Jumbotron View Post
                Not impressive; virtualizing is easy.

                I would like to see Windows running on bare-metal M1...



                • #9
                  Originally posted by tildearrow View Post

                  I would like to see Windows running on bare-metal M1...
                   That would have to be Microsoft's work; nobody can do it in their stead (since you're not allowing any code wrapping native Windows). Also, who would write the drivers, e.g. for graphics? Apple sells the hardware with macOS, and its users are happy running other OSes in a VM. Microsoft doesn't write drivers, and hardware makers won't write drivers here, since the hardware is Apple's own. The open-source community writes drivers only for open-source OSes, which could allow native Linux in the future, just as Linux ran on every previous Mac (even with the T2 chip blocking access to the internal storage, you could still boot from external storage or an SD card).
