You're making assumptions that show me you don't know the basic concepts required to truly understand x32 (and its implications).
Bottom line, you're mixing things that aren't x32 specific, that apply to both 32bit and x86_64.
I think you need a recap of what x32 is.
32bit runs in an older 32bit mode of x86 cpus.
x86_64 runs in 64bit mode, which expands the general purpose registers to 64 bits and doubles their number (from 8 to 16).
Regardless of the above, gcc / g++ even under 32bit have 64bit integer data types, with a full range of operations. Most operations will generate multiple instructions, but that's only a performance hit; functionality is unaffected (except for atomicity, but that's a direct consequence of multiple instructions).
x32 is x86_64 with only one difference: 32 bit pointers. Every other feature of x86_64 is there.
But at the C / C++ language level, you're able to do signed / unsigned 64bit integer math in all 3 modes.
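To make that concrete, here is a minimal sketch. The function names mul_wide and add64 are made up for illustration. The same source builds unchanged for 32bit, x32 and x86_64; only the instructions the compiler emits differ.

```c
#include <stdint.h>

/* Multiply two 32-bit values into a full 64-bit result.
 * A 32bit build needs a multi-instruction sequence for 64-bit math;
 * x32 and x86_64 builds can use the 64-bit registers directly. */
uint64_t mul_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* Plain 64-bit addition; the carry out of the low 32 bits is handled
 * by the compiler in every mode. */
uint64_t add64(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Calling mul_wide(0xFFFFFFFFu, 0xFFFFFFFFu) gives 0xFFFFFFFE00000001 in all 3 modes, which is the point: the language level result is identical, only the cost differs.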
Since x86_64 uses a 64bit timestamp to represent seconds since Jan 1st, 1970, Linus decided x32 would use that same format (avoiding the 2038 timestamp overflow).
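A rough way to check this on any build is sketched below. The helper names time_t_survives_2038 and utc_year are hypothetical, and the code only assumes the standard C gmtime call.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Returns 1 if this build's time_t is at least 64 bits wide, i.e. it can
 * hold timestamps past 2^31 - 1 seconds (2038-01-19 03:14:07 UTC). */
int time_t_survives_2038(void)
{
    return sizeof(time_t) >= 8;
}

/* UTC year for a count of seconds since the epoch.
 * Returns -1 if the value does not fit in time_t or cannot be converted. */
int utc_year(int64_t secs)
{
    time_t t = (time_t)secs;
    const struct tm *tm_utc;

    if ((int64_t)t != secs)
        return -1;              /* value was truncated: narrow time_t */

    tm_utc = gmtime(&t);
    if (tm_utc == NULL)
        return -1;

    return tm_utc->tm_year + 1900;
}
```

With a 64bit time_t, utc_year(2147483648) (one second past the signed 32 bit limit) should come back as 2038. With a 32 bit time_t the same call returns -1 because the value no longer fits.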
Bottom line: as per usual, there are a lot of people bashing x32 who just don't really understand how it works (especially under the hood).
No offense, but if you understood what's under the hood, you wouldn't be demanding benchmarks, because there are no setbacks. It has everything good about 32bit and everything good about x86_64, performance wise. The only hit is that system calls receive 32 bit pointers and must convert them to 64 bit (a trivial operation that would only be measurable in apps that do very little work of their own between system calls; the other benefits overwhelm this hit).
Like I said, C / C++ code that works properly in both 32bit and x86_64 modes and doesn't use assembly should run under x32 with almost no source changes, and only very tiny build environment changes.
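As a sketch of how small the build side can be: with gcc the three targets are selected with -m32, -mx32 and -m64 (assuming the matching libraries are installed), and the source can tell them apart with the compiler's predefined macros. The function names abi_name and pointer_bits below are made up for illustration.

```c
#include <limits.h>

/* Name of the ABI this file was compiled for, based on gcc/clang
 * predefined macros. x32 defines __x86_64__ together with __ILP32__. */
const char *abi_name(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

/* Pointer width in bits for this build: 32 for i386 and x32, 64 for x86_64. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

Code that never needs to ask these questions (no pointer-to-long casts, no hand-written assembly) is exactly the code that should rebuild for x32 without edits.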
Why I don't need no stinking x32 benchmarks to know it will be better:
1 - All pointers stored on the stack (automatic variables), in global (statically allocated) variables or in data structures benefit from x32 vs x86_64 (the only exception is register variables, but even then, since there are more registers than in 32bit, you can keep more variables in the cpu's registers). They take half the memory, which benefits the L1/L2/L3 caches. There's zero chance of this being worse than either 32bit or x86_64, except for system call overhead, which is negligible. There's a significant performance benefit for stack allocated pointers, stack allocated structs (with pointers) and pointers passed as function arguments.
2 - The doubling of the register count speeds up everything versus 32bit, with zero chance of any losses here, because pointers don't double in size (the downside of x86_64). Since the C int is still 32bit (in all 3 cases), 32bit variables don't become 64bit. One thing to watch: x32 is ILP32, so long is 32 bit like in 32bit mode (only x86_64 has a 64bit long); long long is 64 bit in all 3.
3 - All x86_64 special tricks are in x32. The system clock can be read without a system call. Fewer CPU instructions are used for function calls. Function calls with few parameters use a register-only calling convention.
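Point 1 is easy to see with a pointer-heavy struct. The sketch below is only an illustration: struct node and the helper names are hypothetical, and the 64 byte cache line is an assumption that fits common x86 cpus.

```c
#include <stddef.h>
#include <stdint.h>

/* A typical pointer-heavy node: two links plus a 32-bit payload. */
struct node {
    struct node *prev;
    struct node *next;
    int32_t      value;
};

/* Size of one node under the ABI this file is compiled for.
 * With 4-byte pointers (i386, x32):  4 + 4 + 4            = 12 bytes.
 * With 8-byte pointers (x86_64):     8 + 8 + 4 + 4 padding = 24 bytes. */
size_t node_size(void)
{
    return sizeof(struct node);
}

/* How many whole nodes fit in one cache line (64 bytes assumed). */
size_t nodes_per_cache_line(void)
{
    return 64 / sizeof(struct node);
}
```

Under these assumptions an x32 build packs 5 nodes into a cache line where an x86_64 build packs 2, while still having all 16 registers available.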
Like I said, even with 30 years of programming experience (half of it doing C level stuff), I've written very little assembly code. I know this stuff cold because I'm ultra curious and a performance nut.
If this isn't clear, you don't need benchmarks, you need to learn assembly language and cpu architectures properly.
It's a pity we don't have a magazine like BYTE was circa 1990, that explained this low level stuff extremely well.