The ClearFog ARM ITX Workstation Performance Is Looking Very Good

  • #21
    Originally posted by uid313
    Yeah, but how does it compare against Intel Core i5 and i7?
    Or even an AMD Epyc or Intel Xeon?
    It is very hard to get an apples-to-apples comparison here; the closest machine is probably an Intel Xeon Silver. A lot depends on the workload you are comparing as well as the codebase you are using. There are some multi-threaded benchmarks where the LX2160A beats many of the larger Intel and AMD chips. Anything single-threaded obviously has an advantage on CPUs that run at a higher clock frequency. A lot also depends on the maturity of the codebase. As Linus Torvalds pointed out earlier this year, ARM servers are a non-starter because ARM isn't being used in day-to-day workstations by developers. If developers aren't developing directly on the architecture, they are less likely to spend a lot of time hyper-optimizing the code. We are trying hard to address this problem.

    A simple case in point: the vp9enc benchmark in the phoronix-test-suite was missing an option that was added to help improve performance on many-threaded machines. Nobody noticed this or questioned the results until I started profiling the bottleneck. Personally, I never would have looked at this if I hadn't been trying to figure out why my 16-core machine wasn't using all the cores. Linus is right: you get optimizations from developers based on what they are using. Currently most ARM development is hyper-focused on the hardware developers have. Unfortunately, what many developers have had for the longest time is a Raspberry Pi. Think of how many developer hours were spent hyper-optimizing code for an almost defunct ARMv6 architecture.

    In the benchmarks I posted against the Odroid-N2 you can see that the GraphicsMagick benchmarks, which are single-threaded, are actually pretty competitive with Intel, but the single-threaded GIMP benchmarks are nowhere close. This is most likely because GraphicsMagick has been optimized for small headless ARM boards, and GIMP hasn't been looked at for ARM architectures because nobody would run it there. You can take the GraphicsMagick benchmarks, compare them to other x86 results, and then scale them against clock frequency. Remember these are early production SoCs as well. We are at 2 GHz now, production will be 2.2 GHz, so a 10% boost, and we have a stable overclock at 2.4 GHz, which gives a per-core 15% boost.
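To make the clock-scaling comparison above concrete, here is a minimal Python sketch. It assumes performance scales linearly with core clock, which is an idealization (memory-bound workloads usually gain less), so treat the scaled numbers as a rough upper bound. The function names are hypothetical and not part of any benchmark tool.

```python
def clock_uplift_percent(base_ghz: float, target_ghz: float) -> float:
    """Percentage increase in clock frequency going from base to target."""
    return (target_ghz / base_ghz - 1.0) * 100.0


def scale_score_to_clock(score: float, measured_ghz: float, target_ghz: float,
                         higher_is_better: bool = True) -> float:
    """Project a benchmark result to another clock speed.

    Assumption: the result scales linearly with clock frequency.
    For "higher is better" scores (e.g. operations per minute) the score is
    multiplied by the clock ratio; for "lower is better" results (e.g. seconds)
    it is divided by the ratio.
    """
    ratio = target_ghz / measured_ghz
    return score * ratio if higher_is_better else score / ratio
```

For example, `clock_uplift_percent(2.0, 2.2)` gives roughly 10, matching the 2 GHz to 2.2 GHz step mentioned above. The same helper can be used to bring an x86 result down to the ARM part's clock before comparing per-clock performance.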



    • #22
      Originally posted by BNieuwenhuizen
      So comparing the numbers here to some Ryzen 1700 on the OpenBenchmarking site, it seems this board+SoC has 50-75% of the performance.

      Given that both perf is lower and price is higher (than combined ryzen + motherboard), what is the target market of this?

      Main thing I'm seeing is that power might be lower and that requires less cooling (The board is shown without fan, which might point to that direction)
      The target market is simple: developers who want to develop natively on their target architecture. Please see my other post regarding comparison benchmarks, workloads and optimizations. x86 has basically the entire history of Linux full of optimizations. ARM is catching up fast, but still has a long way to go. A good example is what Google has done with ChromeOS. That sort of progress won't happen elsewhere until developers are using ARM full time.

      If you want a Ryzen-based system, go right ahead; they are great chips. We aren't trying to overtake the general consumer market. We are building a tool for customers who need it. Our target is to provide a better ARM workstation at a far lower price than the existing options out there.

      Also note this is aimed at the workstation market, not the consumer desktop. Really, a better comparison would be against Epyc or Xeon based CPUs, as they are intended to be highly reliable, long-term CPUs. Clock for clock, x86 is the winner, no doubt, but then you are comparing against a CPU with 2x to 8x the TDP of the LX2160.



      • #23
        Originally posted by linux4kix
        The vp9enc benchmark in the phoronix-test-suite was missing an option that was added to help improve performance on many-threaded machines. Nobody noticed this or questioned the results until I started profiling the bottleneck. Personally, I never would have looked at this if I hadn't been trying to figure out why my 16-core machine wasn't using all the cores.
        I've also questioned the configuration of libvpx and libaom (in pretty much every encoder benchmark on phoronix) because it's no secret that everyone is bound to misconfigure these at first (because of their bad defaults plus bad documentation) – what you get by default is in every way the slowest encoding! I guess you're talking about the threading model, which is tile based, so no matter how many threads you tell them to use, they are unable to parallelize beyond the number of tiles, which defaults to 1… The solution is to either set a sensible number of tiles or enable the automatic `--row-mt` option (which really should have been the default).
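As a rough illustration of the configuration described above, the following Python sketch assembles a `vpxenc` command line with row-based multi-threading enabled. The flags `--codec`, `--threads`, `--row-mt`, and `--tile-columns` are vpxenc options (tile columns are given as a log2 value); the helper name and the file names are hypothetical, and the exact arguments used by the phoronix-test-suite profile are not shown in this thread.

```python
def vp9_encode_command(src, dst, threads, row_mt=True, tile_columns_log2=None):
    """Build a vpxenc argument list for a multi-threaded VP9 encode.

    src / dst          : input and output file names (hypothetical examples)
    threads            : maximum number of encoder threads to request
    row_mt             : enable row-based multi-threading (--row-mt=1)
    tile_columns_log2  : optional log2 of the number of tile columns;
                         left unset, the encoder's own default applies
    """
    cmd = ["vpxenc", "--codec=vp9", f"--threads={threads}"]
    if row_mt:
        # Lets threads work on rows within a tile, so parallelism is no
        # longer capped by the tile count alone.
        cmd.append("--row-mt=1")
    if tile_columns_log2 is not None:
        # e.g. 2 means 2**2 = 4 tile columns
        cmd.append(f"--tile-columns={tile_columns_log2}")
    cmd += ["-o", dst, src]
    return cmd
```

The returned list can be passed to `subprocess.run` as is. Building the command as a list rather than a shell string avoids quoting problems with file names.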

        But as you say, a lot depends on the codebase, and I think a codebase full of SIMD probably isn't a fair comparison across architectures. Which benchmarks are written in straight C? C-Ray?



        • #24
          Originally posted by andreano

          I've also questioned the configuration of libvpx and libaom (in pretty much every encoder benchmark on phoronix) because it's no secret that everyone is bound to misconfigure these at first (because of their bad defaults plus bad documentation) – what you get by default is in every way the slowest encoding! I guess you're talking about the threading model, which is tile based, so no matter how many threads you tell them to use, they are unable to parallelize beyond the number of tiles, which defaults to 1… The solution is to either set a sensible number of tiles or enable the automatic `--row-mt` option (which really should have been the default).

          But as you say, a lot depends on the codebase, and I think a codebase full of SIMD probably isn't a fair comparison across architectures. Which benchmarks are written in straight C? C-Ray?
          Yes, I opened a request to add `--row-mt`, which is now included in the test.

          That is a good question and one I am trying to figure out. I think it is fair to compare widely used applications that use SIMD for acceleration like libvpx. This is a big part of accelerating certain computations on the chip but very few projects have the maturity where both instruction sets are similarly optimized. I think a lot of libraries used in ChromeOS are a good choice because the OS needs to be optimized for both x86 and ARM.

          Even certain workloads are not comparable, for instance the kernel compile benchmark. The test compiles the default kernel config for the native architecture. While at face value this seems like a good comparison, the reality is that, because of how ARM and ARM64 are designed, there is a lot more architecture and driver code to be built. Even on my build machine using a cross compiler, an ARM64 defconfig build takes over twice as long as an x86_64 defconfig build.
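One way to check this kind of difference on your own machine is to time a defconfig build for each architecture from the same kernel tree. The sketch below assembles the `make` invocations and offers an optional timing helper. The `aarch64-linux-gnu-` toolchain prefix, the `build-<arch>` output directories, and the function names are assumptions for illustration, not the setup used in the post above.

```python
import os
import subprocess
import time


def defconfig_build_commands(arch, cross_compile=None, jobs=None):
    """Return the two make invocations for a defconfig kernel build.

    arch          : kernel ARCH value, e.g. "arm64" or "x86_64"
    cross_compile : optional toolchain prefix, e.g. "aarch64-linux-gnu-"
                    (assumed name; depends on the installed cross compiler)
    jobs          : parallel job count; defaults to the number of CPUs
    """
    jobs = jobs or os.cpu_count() or 1
    base = ["make", f"ARCH={arch}", f"O=build-{arch}"]
    if cross_compile:
        base.append(f"CROSS_COMPILE={cross_compile}")
    # First generate the default config, then build with it.
    return [base + ["defconfig"], base + [f"-j{jobs}"]]


def time_defconfig_build(kernel_tree, arch, cross_compile=None, jobs=None):
    """Run both steps inside kernel_tree and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for cmd in defconfig_build_commands(arch, cross_compile, jobs):
        subprocess.run(cmd, cwd=kernel_tree, check=True)
    return time.perf_counter() - start
```

Running `time_defconfig_build` once with `"arm64"` and once with `"x86_64"` on the same host and source tree keeps the compiler host constant, so the difference mostly reflects how much code each defconfig builds.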

          Part of what I am doing right now is trying to evaluate what is a good representative set of workload benchmarks for this comparison.



          • #25
            This is for sure a very nice board, and a must-have (if the price is competitive).

            The closest thing we saw was the AMD Opteron A1xx processor on a board, but there was no testing with it.
            It's like a ghost.
            Compared to the AMD Opteron, another chip I am following that makes me curious is the
            Baikal-M.

            It has 10 MB of cache and 8 Cortex-A57 cores (it should be similar to the AMD Opteron A1xx or so, though the AMD one had higher clock frequencies but half the core count, I think).
            It also has a big advantage, from my point of view:
            an 8-core Mali-T628 (which is nice for driving a display without consuming 50 W of power for graphics).
            No word on the power consumption yet, as the processors are still to ship as samples in the 2nd half of 2019...



            • #26
              A very interesting board, indeed. I've been wondering for years why no company would bring an ARM-based workstation motherboard to the market. This finally seems like a suitable and affordable candidate to build a daily driver around.

              linux4kix Can you perhaps tell us a bit about your experiences in plugging a discrete graphics card into this board? One with an AMD or NVIDIA GPU? Are there drivers available to make such mainstream cards work on an ARM64 architecture yet, with 3D hardware acceleration and all? Perhaps the open source AMD GPU drivers could be made to work with some minimal patches? Thanks for taking the time to answer our questions here, by the way.



              • #27
                Really like that this board targets developers.



                • #28
                  And if I could suggest a comparison, I would evaluate it against a ZCU102, not only for performance but also for the quality of the specs, the support, and the various coding materials.



                  • #29
                    Originally posted by cyring
                    And if I could suggest a comparison, I would evaluate it against a ZCU102.
                    Those boards are around $2500... you can't compare them.
                    They are for a different purpose.



                    • #30
                      This board is expected to have multiple 10GbE SFP+ connections, Gigabit Ethernet, mPCIe, SATA ports, and socketed DDR4 memory support
                      Now THAT is sexxy. Especially those 10GbE SFP+ ports. Request: Linux and FreeBSD network performance on 10GbE.

                      Wanna see sustained bandwidth and packets per second for TCP, UDP, and mixed traffic. And then set up some complex routing/firewall/IDS rules and run again. Both Linux and FreeBSD.

                      This might be some serious networking hardware we have here. It would have been the icing on the cake if it had 2.5/5G Ethernet.
