The problem is that data structures grow, which means less of the working set fits into the caches and memory traffic increases.
Processors are natively 64-bit, so they handle 64-bit addresses natively too. There is no reason for 64-bit addresses to be slower to access. I would even say the contrary...
Pixman has hand-written SSE intrinsics on x86 too, so this is not limited to 64-bit code.
Page rendering speed on 64-bit might benefit from SSE instructions in pixman, but a quick Google search turned up no meaningful comparison here. You'll have to benchmark that yourself.
This has nothing to do with "make Atom suck less" - it's just a matter of efficiency. x86 is an ugly instruction set with way too few registers (eight general-purpose ones), and x86_64 fixes that to some extent by doubling them to sixteen.
And please - whoever mentioned x32 - forget it. As fast as you can. It was created by Intel to make Atom suck a bit less. Breaking all and everything just to make a crap CPU look better is NOT a good thing to
However, x86_64 carries the burden of 64-bit pointers, which are quite useless for userland on mobile devices (you can still compile the kernel as 64-bit and run 32-bit code). Compared to x32, they just make software run slower and raise power consumption because of the increased memory traffic.
On phones and tablets binary compatibility is usually not an issue, and Intel has often stated that x32 is not intended for desktops.
There is usually a tendency that computation-bound code benefits from the extra and larger registers on 64-bit, while "heavy" business-logic code like large Java applications, Firefox and so on suffers, because it typically uses a lot of complex data structures with many levels of indirection.
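To put rough numbers on the data-structure growth described above, here is a minimal sketch in Python using ctypes. The node layout (two links plus a 32-bit payload) and the 32 KiB cache size are assumptions picked for illustration, not measurements of any real application. Pointers are modelled as fixed-width integers so the comparison does not depend on the pointer size of the machine running it.

```python
import ctypes

class Node32(ctypes.Structure):
    # Hypothetical list node as laid out with 32-bit pointers (32-bit x86 or x32).
    _fields_ = [("next", ctypes.c_uint32),
                ("prev", ctypes.c_uint32),
                ("value", ctypes.c_int32)]

class Node64(ctypes.Structure):
    # The same node with 64-bit pointers (x86_64).
    _fields_ = [("next", ctypes.c_uint64),
                ("prev", ctypes.c_uint64),
                ("value", ctypes.c_int32)]

CACHE_BYTES = 32 * 1024  # assumed L1 data cache size

def nodes_per_cache(node_type, cache_bytes=CACHE_BYTES):
    """Return how many nodes of this layout fit into a cache of the given size."""
    return cache_bytes // ctypes.sizeof(node_type)

if __name__ == "__main__":
    for name, node in (("32-bit pointers", Node32), ("64-bit pointers", Node64)):
        print(f"{name}: {ctypes.sizeof(node)} bytes per node, "
              f"{nodes_per_cache(node)} nodes per {CACHE_BYTES // 1024} KiB")
```

The 32-bit node takes 12 bytes. On a typical x86_64 host the 64-bit node is padded to 24 bytes because of 8-byte alignment, so only about half as many nodes fit into the same cache, even though the payload is identical.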
Java is a good example - it's a very pointer-heavy language. Because 64-bit pointers typically hurt Java apps a lot, modern Java runtimes on 64-bit machines run in a hybrid between 32- and 64-bit mode, where pointers are stored as 32 bits and expanded right before use - if memory conditions allow it.
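The "stored as 32 bits, expanded right before use" trick can be sketched roughly like this. It is a simplified model in Python, not actual JVM code: the heap base address, the 8-byte object alignment and the function names are all assumptions made for illustration. With 8-byte alignment the low three bits of every object address are zero, so a 32-bit reference can cover 2^32 * 8 bytes = 32 GiB of heap; beyond that the scheme no longer fits, which is one way to read "if memory conditions allow it".

```python
HEAP_BASE = 0x0000_7F00_0000_0000    # hypothetical start of the managed heap
ALIGN_SHIFT = 3                      # objects assumed to be 8-byte aligned (2**3)
MAX_HEAP = (1 << 32) << ALIGN_SHIFT  # 32 GiB reachable through a 32-bit reference

def compress(address):
    """Turn a full 64-bit object address into a 32-bit reference."""
    offset = address - HEAP_BASE
    if not 0 <= offset < MAX_HEAP:
        raise ValueError("address outside the compressible heap range")
    if offset & ((1 << ALIGN_SHIFT) - 1):
        raise ValueError("address is not 8-byte aligned")
    return offset >> ALIGN_SHIFT

def decompress(ref):
    """Expand a 32-bit reference back into a 64-bit address right before use."""
    return HEAP_BASE + (ref << ALIGN_SHIFT)

if __name__ == "__main__":
    addr = HEAP_BASE + 0x1_2345_6788  # offset above 4 GiB, still below 32 GiB
    ref = compress(addr)
    print(hex(ref), ref < 2**32, decompress(ref) == addr)
```

The cost is a shift and an add on every dereference, which is usually cheap compared to the cache misses saved by halving the size of each stored reference.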