ARM On Ubuntu 12.04 LTS Battling Intel x86?


  • #31
    The situation with Android is ridiculous. I have an HTC Desire, and it is my first and probably last Android phone (unless the situation changes). It's rooted and running the Oxygen ROM. I like it now, but I'm not going to buy a new one unless the manufacturer guarantees that they will support it for at least 2 years. These phones are so full of proprietary crap (graphics, radio, camera) that the ROM community has a very hard time dealing with these things.



    • #32
      Just found it. Atom & Ubuntu x64
      http://openbenchmarking.org/result/1...AR-A1082013582



      • #33
        Originally posted by Milli
        Very interesting. This basically nullifies this review.
        Does it? I re-read the review and it doesn't say NEON was used for the PandaBoard (I doubt Ubuntu for ARM enables it by default, since not every chip implements NEON in hardware, e.g. Tegra 2), so things are pretty much equal, and we should expect equal speed-ups for everyone if SIMD were used.



        • #34
          Originally posted by WillyThePimp
          Does it? I re-read the review and it doesn't say NEON was used for the PandaBoard (I doubt Ubuntu for ARM enables it by default, since not every chip implements NEON in hardware, e.g. Tegra 2), so things are pretty much equal, and we should expect equal speed-ups for everyone if SIMD were used.
          SSEx is the default FP instruction set for x64 operating systems. Also, scalar SSE is much faster than the outdated x87 and should be used over x87 when possible. ARM NEON cannot be compared to SSE at this point since it does not support double precision. Also, I'm not sure that NEON was not used in the benchmark.



          • #35
            Originally posted by WillyThePimp
            Does it? I re-read the review and it doesn't say NEON was used for the PandaBoard (I doubt Ubuntu for ARM enables it by default, since not every chip implements NEON in hardware, e.g. Tegra 2), so things are pretty much equal, and we should expect equal speed-ups for everyone if SIMD were used.
            Why do you say that these two vendors' different SIMD implementations result in the same performance gains? They don't, and that's why it's important that SIMD is used on both platforms. Intel's implementation is more powerful than ARM's. It's just in a different league. The Atom supports SSE, SSE2, SSE3 and SSSE3, so it's not just SSE.



            • #36
              Originally posted by atom01
              I didn't expect such a big difference on x64. The Atom N450 is one generation newer but basically the same as the N270.
              I saw a gain of 2.5x(!) on one test, a 2x gain on another one but generally around 40% faster.



              • #37
                Originally posted by Milli
                I didn't expect such a big difference on x64. The Atom N450 is one generation newer but basically the same as the N270.
                I saw a gain of 2.5x(!) on one test, a 2x gain on another one but generally around 40% faster.
                Not sure it makes sense to compare against a 64-bit Atom, given that the low-power versions (Medfield included) don't have 64-bit support. Also, SSE does not seem to be that much faster than x87 according to Agner Fog's tables, though I agree it should be used.



                • #38
                  Originally posted by ldesnogu
                  Not sure it makes sense to compare against a 64-bit Atom, given that the low-power versions (Medfield included) don't have 64-bit support. Also, SSE does not seem to be that much faster than x87 according to Agner Fog's tables, though I agree it should be used.
                  Instruction latency is only part of the story. Even if we put aside potential vectorization, using SSE over x87 should produce denser code because of the better register availability, which should put less pressure on the Atom's narrow decoder. But the real gain should come from using XMM registers for memory move/copy. And again, I'm not sure that NEON was not used in the PandaBoard benchmarks.



                  • #39
                    Originally posted by Milli
                    Why do you say that these two vendors' different SIMD implementations result in the same performance gains? They don't, and that's why it's important that SIMD is used on both platforms. Intel's implementation is more powerful than ARM's. It's just in a different league. The Atom supports SSE, SSE2, SSE3 and SSSE3, so it's not just SSE.
                    Let's go with the antithesis. Let's say we should expect no (significant) gains on either platform from compiling with non-vectorized SIMD instructions. Neither platform is using a 64-bit userspace, nor SIMD, anyway. This is for compatibility's sake, of course. If Canonical wants its OS on ARM devices, it has to support the most basic feature, an FP unit, because, as I said, even a leading SoC like Tegra 2 hasn't got NEON. That's why I'm sure no SIMD instructions were used on the ARM machine.

                    Also, VFP and NEON are two very elegant SIMD instruction sets. While we cannot claim that NEON's implementation is superior to SSE(x), claiming the opposite is equally wrong; it's a lie. Anyway, the default SSE2 in x86_64 is the first SSE that introduced, as far as I know, double-precision formats for integer and FP operations, but VFP, on the other hand, is the baseline for every modern ARM core and supports it. Shall we compare SSE vs NEON on a 32-bit kernel? I'm pretty sure Atom is going to keep losing. And this is with much more mature support for its architecture at the compiler level, better overall system specs and higher power consumption.



                    • #40
                      Even Nvidia acknowledged that lacking NEON was a big issue; it's there in Tegra 3. So the easy solution would be to just set hardfp + NEON as the minimum baseline and completely ignore inferior hardware such as Tegra 2.



                      • #41
                        Originally posted by WillyThePimp
                        Let's go with the antithesis. Let's say we should expect no (significant) gains on either platform from compiling with non-vectorized SIMD instructions. Neither platform is using a 64-bit userspace, nor SIMD, anyway. This is for compatibility's sake, of course. If Canonical wants its OS on ARM devices, it has to support the most basic feature, an FP unit, because, as I said, even a leading SoC like Tegra 2 hasn't got NEON. That's why I'm sure no SIMD instructions were used on the ARM machine.

                        Also, VFP and NEON are two very elegant SIMD instruction sets. While we cannot claim that NEON's implementation is superior to SSE(x), claiming the opposite is equally wrong; it's a lie. Anyway, the default SSE2 in x86_64 is the first SSE that introduced, as far as I know, double-precision formats for integer and FP operations, but VFP, on the other hand, is the baseline for every modern ARM core and supports it. Shall we compare SSE vs NEON on a 32-bit kernel? I'm pretty sure Atom is going to keep losing. And this is with much more mature support for its architecture at the compiler level, better overall system specs and higher power consumption.
                        While SSE2 is a double-precision, IEEE 754 compliant SIMD unit, VFP isn't. Actually, VFP isn't even a SIMD unit. VFP does vector ops by sequencing scalar ones.
                        On the other hand, the NEON instruction set doesn't have double-precision instructions, and its single precision is not fully IEEE 754 compliant. Other disadvantages of NEON that I can think of are a register file shared with VFP, while SSE has its own registers (XMM), and that moving a value from a NEON/VFP register to an ARM register is very slow, causing a 20-cycle pipeline stall.
                        So VFP is nowhere near as fast as SSE2, and NEON has much more limited use compared to SSE2.
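
To make the IEEE 754 point concrete: ARMv7 NEON flushes subnormal (denormal) single-precision values to zero, whereas an IEEE 754 compliant unit keeps them through gradual underflow. The C sketch below only models that difference on whatever machine it is compiled for; it does not run any NEON code. The names `flush_to_zero`, `half_of_min_normal` and `demo_underflow` are hypothetical, and the checks assume the host uses default IEEE settings (no `-ffast-math`).

```c
#include <assert.h>
#include <float.h>

/* Hypothetical model of a flush-to-zero FP unit: any subnormal value
   (nonzero, magnitude below FLT_MIN) is replaced by zero. */
float flush_to_zero(float x)
{
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return 0.0f;
    return x;
}

/* Halve the smallest normal float. With IEEE 754 gradual underflow the
   result is a nonzero subnormal. volatile keeps the compiler from
   folding the multiplication at compile time. */
float half_of_min_normal(void)
{
    volatile float m = FLT_MIN;
    return m * 0.5f;
}

/* Self-check showing the two behaviours side by side. */
void demo_underflow(void)
{
    float tiny = half_of_min_normal();
    assert(tiny > 0.0f);                  /* IEEE 754: value survives   */
    assert(flush_to_zero(tiny) == 0.0f);  /* flush-to-zero model: lost  */
    assert(flush_to_zero(1.5f) == 1.5f);  /* normal values are untouched */
}
```

Calling `demo_underflow()` from a test program is enough to exercise it. Whether the lost subnormals matter for a given benchmark is a separate question; the sketch only illustrates why a compiler cannot silently substitute such a unit for IEEE-conforming scalar math.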



                        • #42
                          Originally posted by ldesnogu
                          Not sure it makes sense to compare against a 64-bit Atom, given that the low-power versions (Medfield included) don't have 64-bit support. Also, SSE does not seem to be that much faster than x87 according to Agner Fog's tables, though I agree it should be used.
                          As atom01 pointed out, the gain doesn't come from x64 itself but from SSE2 being the default FP instruction set for the x64 Ubuntu kernel. The same gains can be expected on the Atom Medfield when using the x86 kernel with SSE2 flags set.
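
Much of this argument turns on which FP path a given build actually targets. One way to check, sketched here in C, is to look at the macros GCC and Clang predefine: `__SSE2_MATH__` and `__SSE_MATH__` indicate SSE scalar math (the x86-64 default, or 32-bit with `-msse2 -mfpmath=sse`), `__ARM_PCS_VFP` indicates the ARM hard-float ABI, `__SOFTFP__` indicates ARM software floating point, and `__ARM_NEON` / `__ARM_NEON__` indicate that NEON code generation is enabled. The function names below are made up for illustration, and the labels they return are arbitrary.

```c
#include <assert.h>
#include <stdio.h>

/* Returns a short label for the scalar FP path this translation unit
   was compiled for, based on compiler-predefined macros. */
const char *fp_math_target(void)
{
#if defined(__SSE2_MATH__)
    return "sse2";           /* x86-64 default, or -msse2 -mfpmath=sse */
#elif defined(__SSE_MATH__)
    return "sse";            /* single precision only in SSE registers */
#elif defined(__i386__)
    return "x87";            /* usual 32-bit x86 default */
#elif defined(__ARM_PCS_VFP)
    return "vfp-hardfloat";  /* -mfloat-abi=hard */
#elif defined(__arm__) && defined(__SOFTFP__)
    return "arm-softfloat";  /* FP emulated in software */
#elif defined(__arm__)
    return "vfp-softfp";     /* VFP instructions, soft-float calling ABI */
#else
    return "unknown";
#endif
}

/* 1 if the compiler may emit 128-bit SIMD (SSE2 or NEON), else 0. */
int simd_enabled(void)
{
#if defined(__SSE2__) || defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line summary such as "fp=sse2 simd=1" into buf. */
void describe_build(char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    snprintf(buf, n, "fp=%s simd=%d", fp_math_target(), simd_enabled());
}
```

This only reports what the compiler was told to target for this one file; it says nothing about how the distribution's libraries or the benchmark binaries in the review were built.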



                          • #43
                            Originally posted by WillyThePimp
                            Let's go with the antithesis. Let's say we should expect no (significant) gains on either platform from compiling with non-vectorized SIMD instructions. Neither platform is using a 64-bit userspace, nor SIMD, anyway. This is for compatibility's sake, of course. If Canonical wants its OS on ARM devices, it has to support the most basic feature, an FP unit, because, as I said, even a leading SoC like Tegra 2 hasn't got NEON. That's why I'm sure no SIMD instructions were used on the ARM machine.

                            Also, VFP and NEON are two very elegant SIMD instruction sets. While we cannot claim that NEON's implementation is superior to SSE(x), claiming the opposite is equally wrong; it's a lie. Anyway, the default SSE2 in x86_64 is the first SSE that introduced, as far as I know, double-precision formats for integer and FP operations, but VFP, on the other hand, is the baseline for every modern ARM core and supports it. Shall we compare SSE vs NEON on a 32-bit kernel? I'm pretty sure Atom is going to keep losing. And this is with much more mature support for its architecture at the compiler level, better overall system specs and higher power consumption.
                            I think you are missing the point somewhere. SSEx is now a full replacement for the old and outdated x87 in all respects, and this is why it should also be used for testing on 32-bit systems. x87 is still used in 32-bit Ubuntu to maintain compatibility with older CPUs, but this does not contradict the fact that Intel is going to use SSEx in the upcoming Android x86. My guess is that Intel keeps x87 in Atom just for compatibility with legacy software and that it is not optimized performance-wise. As for NEON vs. SSEx tests, I would not be so sure about the potential winner (especially when you consider that NEON is not a replacement for VFP).



                            • #44
                              Originally posted by atom01
                              And again, I'm not sure that NEON was not used in the PandaBoard benchmarks.
                              And again, NEON can't be used for floating-point math; even in single precision it does not conform to IEEE 754, so I doubt it was used.



                              • #45
                                Originally posted by ldesnogu
                                And again, NEON can't be used for floating-point math; even in single precision it does not conform to IEEE 754, so I doubt it was used.
                                Not for FP math, but it might be used to speed up memory move/copy (by using 128-bit registers).
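
As a rough illustration of copying through 128-bit registers, here is a minimal C sketch. It is not how glibc or any real `memcpy` is implemented; `copy128` is a hypothetical helper that moves 16 bytes at a time through an XMM register (SSE2 intrinsics from `<emmintrin.h>`) or a NEON Q register (`<arm_neon.h>`), and falls back to plain `memcpy` when neither is enabled at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copies n bytes from src to dst in 16-byte chunks
   through one 128-bit register. Buffers must not overlap. */
void copy128(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    assert(dst != NULL && src != NULL);

#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* Unaligned 128-bit load and store via an XMM register. */
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + i), v);
    }
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 16 <= n; i += 16) {
        /* 128-bit load and store via a NEON Q register. */
        uint8x16_t v = vld1q_u8(s + i);
        vst1q_u8(d + i, v);
    }
#endif
    /* Remaining tail bytes, or the whole buffer if no SIMD is enabled. */
    memcpy(d + i, s + i, n - i);
}
```

Note that this path involves no floating-point arithmetic at all, so IEEE conformance is irrelevant to it; a build could use NEON or SSE this way while still doing all FP math on VFP or x87.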

