Raspberry Pi OS 32-bit vs. 64-bit Performance


  • #61
    It's worth pointing out that there are a few critical cases where software has arm32-specific code for inner loops etc. but still uses a generic C fallback everywhere else, including on arm64. For ffmpeg, that can easily mean a 10x slowdown in some inner loops and an overall net slowdown of 2x-5x.

    On the flipside, now that there's an "official RPF" arm64 build, rather than just Ubuntu/Arch, hopefully such code will get updated sooner now.



    • #62
      Yeah, nice results. I would also love to see memory consumption for the benchmarks, but I guess that's hard to do because memory use varies a lot over a single benchmark run. Maybe just measure peak memory?
      Comparing the official Raspbian images in 32-bit vs. 64-bit, the 64-bit ones are smaller, so somehow the newer instruction set must be more compact. Also, pointers are just a small fraction of a program and shouldn't make a relevant difference. On x86 the difference in binary size was much more pronounced, and in favor of 32-bit.



      • #63
        Originally posted by coder:
        "Nah, it's just a branch by another name. There's really no reason you couldn't handle predicated instructions in the same way, if you wanted to."
        There is: if every instruction can branch, the branch predictor would have to cache the path of every single instruction (this is not doable). Typical branch predictors only need to cache the path taken by branch instructions, so it is a lot more efficient.
        Last edited by paulpach; 08 February 2022, 08:44 AM.



        • #64
          Originally posted by paulpach:
          "There is: if every instruction can branch, the branch predictor would have to cache the path of every single instruction (this is not doable)."
          There must be some value to stuff in that field that tells it an instruction isn't predicated.

          Anyway, the point is moot. AArch64 doesn't give us predicated instructions (except branches), so that's that.



          • #65
            Originally posted by Anux:
            "somehow the newer instruction set must be more compact."
            They're still 32 bits per instruction. If code compiled for the old ABI did lots of register spilling, then the larger register file would reduce loads/stores, which means fewer instructions. There are probably other reasons, too.

            Originally posted by Anux:
            "Also pointers are just a small fraction of a program and shouldn't make a relevant difference."
            That depends quite a lot on the program in question. Something like a compiler will have lots and lots of pointers. Every string variable and every tree node will have pointers. Pointers in your hash tables. Pointers everywhere!

            In contrast, an image editing program will probably have most of its memory taken up by pixel data.



            • #66
              Time to compare the Raspberry Pi 4 to the Odroid N2 again... Encryption and thermals (of the non-400) likely still differ, but other aspects, I dunno.



              • #67
                Well, it was about damn time they moved to an AArch64 build. They should have done that years ago, but as with x86, people always get hung up on how limited the performance jump was with early compilers for a new, improved ISA. Thankfully, on the x64 side these people have long since shut up, but for years and years people kept referring to results from early compilers that did stupid things like use only two general-purpose registers* when the ISA had actually gone from 8 to 16.

                *At my uni we noticed that a program took a significant performance hit when compiled for x86-64, and we found this when we dumped the assembly and started going through it. Not sure what compiler they were using; it could have just been ICC and Intel up to their usual shenanigans.
                Last edited by L_A_G; 08 February 2022, 12:39 PM.



                • #68
                  Originally posted by coder:
                  "There must be some value to stuff in that field that tells it an instruction isn't predicated"
                  Not exactly. A typical branch predictor is implemented as a cache. It looks like a hash table in hardware. Given the address of an instruction, it returns the address it jumped to last time (which may or may not be wrong).
                  In typical general-purpose code, 1 out of 5 instructions is a branch. So you only cache the prediction for 1 out of 5 instructions.
                  If every instruction can branch, you would have to cache 5x more predictions, which is a no-go. So predication is not handled by the branch predictor at all. I believe typical speculative execution just executes the instruction but does not commit the result until the predicate is available.



                  • #69
                    Originally posted by paulpach:
                    "If every instruction can branch, you would have to cache 5x more predictions, this is a no-go."
                    That's the maximum that can be predicated, not the typical amount that are. A good optimizing compiler shouldn't emit so many predicated instructions as to overflow the branch predictor in whatever core you're tuning for.



                    • #70
                      Originally posted by L_A_G:
                      "Well it was about damn time they moved to an AArch64 build. Should have done that years ago, but like with x86, people will always get hung up on how limited the performance jump was on early compilers for a new improved ISA."
                      I doubt that was what held them back. Maintaining both AArch32 and AArch64 is almost twice the work. Also, drivers had to be ported to AArch64, all while the AArch32 distro still had to be kept working and support for new hardware models added. I think the number of full-time employees at RPF was quite small for a long time.
