Further Investigating The Raspberry Pi 32-bit vs. 64-bit Performance

  • Originally posted by vladpetric
    Yes, you're right.

    What about Move Elimination though?
    Sorry, I was referring to the diagram which has it nestled in the register rename block.

    Originally posted by vladpetric
    most such move instructions wouldn't exist in an ARM binary.
    Good point. Still, moves would be used for copying values from one variable to another. So, it's not as if they don't happen. Just, maybe not enough to put it at the top of the list of optimizations to implement.
    Last edited by coder; 22 February 2022, 04:37 PM.


    • Originally posted by coder
      Sorry, I was referring to the diagram which has it nestled in the register rename block.


      Good point. Still, moves would be used for copying values from one variable to another. So, it's not as if they don't happen. Just, maybe not enough to put it at the top of the list of optimizations to implement.
      No worries - rename is the place to do such things.

      Essentially, when you process mov A -> B at rename (using arrow notation, as it's clearer than Intel/AT&T syntax), you already have a physical register name for A (say, p16).

      Without any optimization, mov A -> B first creates a new name for B based on a free physical register (say, p53), and then executes (in the OoO engine) the copy instruction p16 -> p53 when p16 becomes available (which could be immediately, but could also be later, who knows, maybe p16 is produced by a load that misses in the cache ...).

      With the optimization, you simply map B to p16 (set B's name to p16), and you don't care about this instruction at all anymore until retirement (back-end). There are multiple speed benefits here: higher IPC right away, but also more free capacity in the instruction window (aka reservation stations). I would say that the coolest thing is that you can optimize mov A -> B before the input (p16) is even available!
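
The two cases above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any real core implements rename: the class `Renamer`, its fields, and the old mapping of B to p7 are made up for illustration, and freeing B's previous physical register is ignored.

```python
class Renamer:
    """Toy register renamer (hypothetical names, illustration only)."""

    def __init__(self, initial_map, free, eliminate_moves):
        self.map = dict(initial_map)   # architectural name -> physical register
        self.free = list(free)         # free physical registers, in allocation order
        self.eliminate_moves = eliminate_moves
        self.dispatched = []           # uops handed to the out-of-order engine

    def rename_mov(self, src, dst):
        p_src = self.map[src]
        if self.eliminate_moves:
            # Optimized case: dst simply becomes another name for p_src.
            # Nothing is sent to the out-of-order engine.
            self.map[dst] = p_src
            return
        # Baseline case: allocate a fresh physical register for dst and
        # dispatch a real copy that waits until p_src is available.
        p_dst = self.free.pop(0)
        self.map[dst] = p_dst
        self.dispatched.append(("copy", p_src, p_dst))


# A is already named p16; p53 is the next free register (B's old p7 is assumed).
baseline = Renamer({"A": 16, "B": 7}, free=[53, 54], eliminate_moves=False)
baseline.rename_mov("A", "B")
print(baseline.map, baseline.dispatched)    # {'A': 16, 'B': 53} [('copy', 16, 53)]

optimized = Renamer({"A": 16, "B": 7}, free=[53, 54], eliminate_moves=True)
optimized.rename_mov("A", "B")
print(optimized.map, optimized.dispatched)  # {'A': 16, 'B': 16} []
```

Note that the optimized path never asks whether p16 holds a value yet, which is why the move can be eliminated before its input is ready.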

      There is a downside: you need to keep track of how many in-flight instructions point to the same register. Previously, you only had to track free versus used physical registers, but now you need a counter for each physical register. There's complexity there, but it's manageable/implementable (Intel does it ...).
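
A rough Python sketch of that bookkeeping, assuming a simple per-register reference count (the class `RefCountedRegs` and its method names are hypothetical, and real hardware may track sharing quite differently):

```python
class RefCountedRegs:
    """Toy physical register file with one counter per register."""

    def __init__(self, num_phys):
        self.refcount = [0] * num_phys     # how many names point at each register
        self.free = list(range(num_phys))  # registers whose count is zero

    def allocate(self):
        # Normal rename: take a free register; it now has one name.
        p = self.free.pop(0)
        self.refcount[p] = 1
        return p

    def share(self, p):
        # Move elimination: one more name now points at p.
        self.refcount[p] += 1
        return p

    def release(self, p):
        # A name for p went away (e.g. it was overwritten and retired).
        # The register is only reusable once nobody points at it.
        self.refcount[p] -= 1
        if self.refcount[p] == 0:
            self.free.append(p)


regs = RefCountedRegs(num_phys=4)
p = regs.allocate()        # A -> p
regs.share(p)              # mov A -> B eliminated: B -> p as well
regs.release(p)            # A is overwritten; B still needs p
print(p in regs.free)      # False
regs.release(p)            # B is overwritten too
print(p in regs.free)      # True
```

With a plain free/used bit, the first `release` would have handed the register back while B still pointed at it; the counter is what prevents that.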

      Note that I did my PhD on rename optimizations :-)
      Last edited by vladpetric; 22 February 2022, 05:32 PM.
