ClearFog ARM Workstation Speed Even More Compelling But Now Called HoneyComb LX2K


  • #11
    Originally posted by Michael View Post

    Can't wait to see them, someone should send them over to Cavium... They won't send me hardware to review on the basis that the Phoronix Test Suite is somehow biased against ARM but I guess they don't understand PTS all that well with using actual upstream programs/sources for benchmarking.
    There are definitely some tests that skew heavily toward x86-based CPUs. That isn't a fault of the benchmark but of the state of the software ecosystem. SolidRun sees that as the problem, and instead of complaining that it "isn't fair" we want to put the hardware in the hands of the community to start addressing the issues. I have already asked for a few changes that you have happily integrated, and I will send pull requests for some more low-hanging fruit.

    In the spirit of OSS, the change isn't just going to happen; somebody still has to put in the time and do the work. That is the spirit of HoneyComb: it is a place where lots of worker bees come together to make something delicious.

    Comment


    • #12
      Originally posted by linux4kix View Post

      Those were quick snippets I pulled from my logs to illustrate a point: that we are testing the full capabilities of the board, and these are the results we are finding. Ultimately we will write this up as a white paper, but as an engineer who has been working on ARM SBC chips for many years, I was intrigued by the results and wanted to post them in the easiest manner possible. I admitted in the post that these results, while not isolated, were picked to show that ARM-based CPUs are not inherently inferior to modern x86. I did not even pick the biggest wins, nor the biggest losses, but some middling results that validate our classification of this as a workstation-class platform.

      These results also reflect an overclock, which production customers obviously will not use, but we are open to making it available to the developer community. This is partly the reason for the branding change. We will release the full benchmarks very soon, covering the base dev release (2.0 GHz), production (2.2 GHz), recommended OC (2.4 GHz) and max stable OC (2.5 GHz).

      While the results are exciting, we also can't guarantee that every chip will achieve this performance. These are lab results only. As the production cycle progresses we will do internal testing and be as transparent as possible about the results.
      As much as I like the 2160A (NXP was right to drop development on its T4240+ offerings), it tops out at 2.2 GHz. Overclocked results from a manufacturer look like a desperate move, much like Intel's Computex announcements.
      I long ago stopped comparing ISAs. To me, only three things determine general performance nowadays. Since most SoC manufacturers have excellent hardware teams that can balance the transistor budget against performance, the questions become:
      1. How large a power budget?
      2. How many transistors?
      3. Which process node?

      So if you take a 30-35W TDP CPU made on 14-22nm in 2014 (the LX2160A is a 2014-2015 design base, probably 30-35W on 16nm FinFET) with the same general base design, cores, caches, etc., and a reasonable amount spent on code optimization (which ARM has), then performance is going to be in the same ballpark.

      That being said, an x86 on 7nm spending 35W is going to rip the 2160A to shreds by fiat. There really is no magic to it anymore.
      That is why the churn rate, at least for the core complex in the 2160A, is too slow.
      The 2160A, being an ultra-top-end ARM offering from NXP, needs A75s or A76s to keep pace with recent x86 offerings.

      Comment


      • #13
        Originally posted by milkylainen View Post

        As much as I like the 2160A (NXP was right to drop development on its T4240+ offerings), it tops out at 2.2 GHz. Overclocked results from a manufacturer look like a desperate move, much like Intel's Computex announcements.
        I long ago stopped comparing ISAs. To me, only three things determine general performance nowadays. Since most SoC manufacturers have excellent hardware teams that can balance the transistor budget against performance, the questions become:
        1. How large a power budget?
        2. How many transistors?
        3. Which process node?

        So if you take a 30-35W TDP CPU made on 14-22nm in 2014 (the LX2160A is a 2014-2015 design base, probably 30-35W on 16nm FinFET) with the same general base design, cores, caches, etc., and a reasonable amount spent on code optimization (which ARM has), then performance is going to be in the same ballpark.

        That being said, an x86 on 7nm spending 35W is going to rip the 2160A to shreds by fiat. There really is no magic to it anymore.
        That is why the churn rate, at least for the core complex in the 2160A, is too slow.
        The 2160A, being an ultra-top-end ARM offering from NXP, needs A75s or A76s to keep pace with recent x86 offerings.
        The conservative clocking by NXP was to validate the SoC for their ultra-long-term support cycle (10 years). Our overclocking is a fun experiment and also something we thought the developer community would be interested in (note that the original benchmarks I posted were against the lower 2 GHz clock; nothing dishonest going on). It also helps illustrate that ARM on the desktop is a possibility for developers who want to work on the architecture they are coding for. A76s would be great to have, no doubt, but then you also need to integrate the PCIe IP, SerDes architecture, etc., and all of that takes time. If you just keep waiting around for the next chip because it is faster, you will never ship a product.

        This is a tool to help move the needle of ARM beyond the SBC market. If it doesn't fit your needs, it doesn't fit your needs *shrug*; there is no requirement for you to buy it. We have talked with many developers who are very excited about the product because it is a tool they would like to have.

        Comment


        • #14
          Originally posted by linux4kix View Post

          This is a tool to help move the needle of ARM beyond the SBC market. If it doesn't fit your needs, it doesn't fit your needs *shrug*; there is no requirement for you to buy it. We have talked with many developers who are very excited about the product because it is a tool they would like to have.

          I believe this will be a successful product, because it is much better than anything available until now. Except for the future Qualcomm laptops, which will probably be overpriced and might not allow replacing the operating system, there will be few if any other solutions for native ARM software development.


          GOOD:

          1. This will be the first ARM board where the compilation speed for a large software project will be high enough to allow comfortable native software development.

          There are people who are not experienced in setting up a cross-compiling environment and who waste their time compiling programs on a Raspberry Pi or other similar small computers; but even on a much faster MACCHIATObin board, with four Cortex-A72 cores, compilation times are long enough that it is far more convenient to compile any ARM program on a computer with a decent Intel or AMD CPU.


          2. NXP usually provides decent documentation for its processors; hopefully this will also be true for the LX2160A. Real documentation can make a huge difference for software development, especially when trying to take advantage of the special hardware included in this SoC, e.g. for fast networking.




          BAD:

          The Cortex-A72 (like the other five Cortex-A cores that support only the ARMv8.0-A architecture) is obsolete, especially if you want to develop a heavily multithreaded application that can benefit from having 16 cores.


          The cores supporting the ARMv8.2-A architecture, introduced during the last two years (Cortex-A55, Cortex-A65, Cortex-A75, Cortex-A76 & Cortex-A77), correct, among other useful additions, what in my opinion was a serious mistake in the initial 64-bit ARM architecture: they add atomic read-modify-write instructions, the most useful of which are fetch-and-add and atomic swap.

          On a Cortex-A72 you can implement in software atomic primitives that are mostly equivalent to those provided in hardware on a Cortex-A75 or newer, but the equivalence is only partial. On a Cortex-A72 those atomic primitives can be retried an unpredictable number of times, so in a program running on 16 cores it can be difficult to guarantee progress, bounded latency or fairness in resource utilization.


          NXP probably had valid cost reasons to keep using Cortex-A72, but I believe that developing a chip with so many cores, using a core not well designed for many-core chips, is rather dumb.

          However, for applications where the 16 cores do mostly independent tasks, e.g. program compilation or the many networking tasks where the cores process independent client requests or independent data streams, the NXP LX2160A should work fine.

          Comment


          • #15
            As ARM boards go:

            SBSA = killer feature (if it works out)
            Socketed DDR4 = killer feature
            open-ended PCIe = killer feature
            >4 big cores = killer feature
            100G Ethernet = killer feature
            4×SATA = killer feature
            M.2 = still somewhat killer feature, but getting common
            Last edited by andreano; 06-05-2019, 11:27 AM.

            Comment


            • #16
              Originally posted by Michael View Post
              PTS is somehow biased against ARM
              To elaborate on linux4kix's answer: the tests aren't to blame, but some tests are probably best avoided if the objective is to compare hardware across architectures as fairly as possible:

              1. Codebases that are full of SIMD, but aren't equally optimized for all architectures (as roughly measured by the number of vectorized code paths). Video codecs are problematic for this reason – SIMD makes them several times faster.
              2. Tests that do different things on different platforms (such as compilation, especially with platform specific code like the kernel).

              Of course, if the objective is to find the best hardware for say, video encoding, then it makes sense to include x264 and so on…
              Last edited by andreano; 06-05-2019, 01:05 PM.

              Comment


              • #17
                Originally posted by AdrianBc View Post


                I believe this will be a successful product, because it is much better than anything available until now. Except for the future Qualcomm laptops, which will probably be overpriced and might not allow replacing the operating system, there will be few if any other solutions for native ARM software development.


                GOOD:

                1. This will be the first ARM board where the compilation speed for a large software project will be high enough to allow comfortable native software development.

                There are people who are not experienced in setting up a cross-compiling environment and who waste their time compiling programs on a Raspberry Pi or other similar small computers; but even on a much faster MACCHIATObin board, with four Cortex-A72 cores, compilation times are long enough that it is far more convenient to compile any ARM program on a computer with a decent Intel or AMD CPU.


                2. NXP usually provides decent documentation for its processors; hopefully this will also be true for the LX2160A. Real documentation can make a huge difference for software development, especially when trying to take advantage of the special hardware included in this SoC, e.g. for fast networking.




                BAD:

                The Cortex-A72 (like the other five Cortex-A cores that support only the ARMv8.0-A architecture) is obsolete, especially if you want to develop a heavily multithreaded application that can benefit from having 16 cores.


                The cores supporting the ARMv8.2-A architecture, introduced during the last two years (Cortex-A55, Cortex-A65, Cortex-A75, Cortex-A76 & Cortex-A77), correct, among other useful additions, what in my opinion was a serious mistake in the initial 64-bit ARM architecture: they add atomic read-modify-write instructions, the most useful of which are fetch-and-add and atomic swap.

                On a Cortex-A72 you can implement in software atomic primitives that are mostly equivalent to those provided in hardware on a Cortex-A75 or newer, but the equivalence is only partial. On a Cortex-A72 those atomic primitives can be retried an unpredictable number of times, so in a program running on 16 cores it can be difficult to guarantee progress, bounded latency or fairness in resource utilization.


                NXP probably had valid cost reasons to keep using Cortex-A72, but I believe that developing a chip with so many cores, using a core not well designed for many-core chips, is rather dumb.

                However, for applications where the 16 cores do mostly independent tasks, e.g. program compilation or the many networking tasks where the cores process independent client requests or independent data streams, the NXP LX2160A should work fine.

                Some of the newer extensions were allowed to be implemented on the older Cortex-A72 cores; for instance, the LX2160A includes the crc32 and pmull extensions. Overall, if you look at the OpenMP multi-threaded benchmarks, you see that this is an area where the LX2160A actually does quite well performance-wise. It generally outperforms the Ampere or ThunderX2 based systems, sometimes outright and other times well above where they should align by GHz × core count alone. One thing that helps quite a bit is how NXP has designed the L2 and L3 caches; this seems to be well optimized and a big performance boost for multi-core workloads.

                If you have a specific test you would like to see results from, drop me an email or hit me up on Twitter.

                Comment


                • #18
                  Originally posted by linux4kix View Post
                  Not all SoC implementations are the same. Please review my benchmarks at 2 GHz against the Odroid N2, which has A73s. It is more about the power limit the cores are designed for.
                  I found your post here, thank you for the benchmarks!

                  Where do you plan to sell it besides your website? For me it would be easier to order from Amazon (if you allow shipping to Russia) or from some European distributor.

                  Comment


                  • #19
                    Originally posted by RussianNeuroMancer View Post
                    I found your post here, thank you for the benchmarks!

                    Where do you plan to sell it besides your website? For me it would be easier to order from Amazon (if you allow shipping to Russia) or from some European distributor.
                    The pre-order will only be available through our website. Once the boards are in full production they will be supplied through our normal distributors. Please email [email protected] for more detailed information. Thanks for the interest.

                    Comment


                    • #20
                      Originally posted by linux4kix View Post
                      Please email [email protected] for more detailed information.
                      OK, will do once the hardware is in production.

                      Comment
