Originally posted by xir_
Ubuntu Plans For Linux x32 ABI Support
-
Originally posted by Hirager
I am just curious: do -O3 optimizations make binaries "eat" more L2 cache than -O2 optimizations, assuming everything else stays the same?
The linked Ubuntu docs seem to be hidden behind a login. Is there a solution for the library redundancy? Having to load x32 kdelibs+Qt AND x86_64 kdelibs+Qt for the one KDE app that benefits from >4GB of memory would probably outweigh any memory savings to be had.
-
Originally posted by rohcQaH
IIRC Firefox is compiled with -Os by default because the smaller cache footprint outweighs all the other optimizations. But that's something you'll have to test for each project separately.
The linked Ubuntu docs seem to be hidden behind a login. Is there a solution for the library redundancy? Having to load x32 kdelibs+Qt AND x86_64 kdelibs+Qt for the one KDE app that benefits from >4GB of memory would probably outweigh any memory savings to be had.
As to your question: you forget just how big multimedia projects can be. It is not about memory savings for big programs; it is about savings in workflows that do not require 64-bit software. Here, 64-bit programs are treated as an addition and nothing more. So this is a back-to-the-past situation, because it turned out that the drawbacks of 64-bit software can be nullified.
-
Originally posted by Hirager
I am just curious: do -O3 optimizations make binaries "eat" more L2 cache than -O2 optimizations, assuming everything else stays the same?
In reality, though, the heuristics governing this are very difficult to get right, which is why the same code compiled with -O2 will sometimes beat -O3. I've never encountered this with PGO (profile-guided optimization), though, which means that the runtime data it uses when making optimization choices allows it to accurately weigh the impact that code size and cache misses will have on performance.
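The profile-guided optimization mentioned above is easy to try with GCC. The sketch below is hypothetical (the file name pgo_demo.c and the function count_rare are made up for illustration): a loop with a heavily skewed branch, with GCC's -fprofile-generate / -fprofile-use build cycle listed in the header comment. Whether PGO actually beats plain -O2 or -O3 on a given program still has to be measured.

```c
/*
 * pgo_demo.c (hypothetical): a workload with a heavily skewed branch,
 * the kind of case where profile data can guide block layout, inlining
 * and unrolling better than static -O2/-O3 heuristics.
 *
 * Typical GCC PGO cycle (add a main() that calls count_rare on
 * representative input first):
 *   gcc -O2 -fprofile-generate pgo_demo.c -o pgo_demo
 *   ./pgo_demo                  (writes .gcda profile data on exit)
 *   gcc -O2 -fprofile-use pgo_demo.c -o pgo_demo
 */
#include <stddef.h>

/* Counts elements divisible by 1000. On typical input this branch is
 * rarely taken, which is exactly what the profile run records. */
size_t count_rare(const unsigned *values, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] % 1000u == 0) /* cold path */
            hits++;
    }
    return hits;
}
```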
-
Originally posted by jakubo
Will there be a benefit for WINE?
-
Originally posted by rohcQaH
IIRC Firefox is compiled with -Os by default because the smaller cache footprint outweighs all the other optimizations. But that's something you'll have to test for each project separately.
x32 support is unlikely to decrease memory or disk size requirements. In fact, it will almost certainly increase them, because you are just adding new libraries that need to be duplicated in both architectures for compatibility. And the amount it will save in a particular executable is really very small: we're talking about reducing a 1024KB program to maybe 1000KB.
The benefit comes from reducing L1, L2, and L3 cache pressure, which can lead to significant speed boosts. It depends heavily on the application in question, though, and even on the hardware it's running on: x32 might bring a big boost on hardware with smaller caches while giving no boost at all on CPUs with large caches.
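The cache-pressure argument can be made concrete with a small C sketch. Assumptions: a 64-byte cache line (common on current x86 CPUs, but not universal), and a hypothetical struct node and nodes_per_line helper. Building the same file with -m64 and then with -mx32 changes only the pointer size, and with it how many nodes share one cache line.

```c
#include <stddef.h>

/* A typical pointer-heavy node: one link pointer plus a small payload. */
struct node {
    struct node *next;
    int value;
};

/* Assumed cache-line size; 64 bytes is common on current x86 CPUs. */
#define CACHE_LINE 64u

/* How many nodes fit in one cache line under the ABI this file is built for.
 * x86-64 (LP64): 8-byte pointer + 4-byte int, padded to 16 -> 4 per line.
 * x32 or i386:   4-byte pointer + 4-byte int = 8           -> 8 per line. */
size_t nodes_per_line(void)
{
    return CACHE_LINE / sizeof(struct node);
}
```

Touching half as many cache lines per list traversal is where any speedup would come from; whether it shows up depends on how the working set compares with the cache sizes of the CPU in question.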
-
Originally posted by smitty3268
x32 support is unlikely to decrease memory or disk size requirements. In fact, it will almost certainly increase them, because you are just adding new libraries that need to be duplicated in both architectures for compatibility.
Originally posted by smitty3268
And the amount it will save in a particular executable is really very small: we're talking about reducing a 1024KB program to maybe 1000KB.
-
Originally posted by XorEaxEax
That is assuming you will keep/need to run applications as x64. In particular, if you have a 64-bit CPU and 4GB of RAM or less, x32 ONLY would be the perfect fit.
I could be wrong about that, but I just don't see it happening. Every new architecture they have to support just means that much more work for their limited staff; it will be much easier to just combine the x32 and x64 architectures.
If you are talking about custom-building your own distro (on Gentoo? or LFS?), then maybe you have a point.
Originally posted by XorEaxEax
I believe you are wrong here. I believe a full 32-bit system typically uses ~20% less RAM than an equivalent 64-bit system, because libraries and applications are smaller (as binaries) and use less RAM when running (due to the pointer size). Also, x32 code could potentially be even smaller than 32-bit code: although both 32-bit and x32 have 32-bit pointers, 32-bit still suffers from having very few registers, which means it needs more code pushing and popping values on the stack in order to reuse the registers. x32 also has 32-bit pointers but TWICE as many registers, so it can keep much more data in registers and needs much less stack push/pop code, which makes the code smaller.
Instructions make up a really quite small part of the average executable. Most of it tends to be data: string values encoded in the program, for example. Even pointer-heavy apps are dominated in size by the data they use, not by the pointers themselves.
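The point that data rather than pointers dominates size can be checked with a small sketch. It assumes a hypothetical table of string constants (messages, pointer_bytes and string_bytes are made-up names): on x32 only the pointer array shrinks, while the string bytes are the same under every ABI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: an array of pointers to string literals. */
static const char *const messages[] = {
    "file not found",
    "permission denied",
    "out of memory",
};

#define N_MESSAGES (sizeof messages / sizeof messages[0])

/* Bytes spent on the pointer array: 8 per entry on x86-64, 4 on x32/i386. */
size_t pointer_bytes(void)
{
    return sizeof messages;
}

/* Bytes spent on the string data itself (including NUL terminators);
 * this part does not depend on the ABI. */
size_t string_bytes(void)
{
    size_t total = 0;
    for (size_t i = 0; i < N_MESSAGES; i++)
        total += strlen(messages[i]) + 1;
    return total;
}
```

Here the strings take 47 bytes regardless of ABI, while the pointer array drops from 24 to 12 bytes, so halving the pointer size saves well under half of the table.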
-
Originally posted by smitty3268
I could be wrong about that, but I just don't see it happening. Every new architecture they have to support just means that much more work for their limited staff; it will be much easier to just combine the x32 and x64 architectures.
Originally posted by smitty3268
And I believe I'm right. Do you have any proof?
After starting X/Openbox, this is what conky reported:
Now for x32 vs 32-bit code size: no, I had no proof, as it was just something that seemed logical (more registers = less pushing and popping = smaller code footprint). Anyway, thanks to your scepticism I figured I should check whether it was true.
As I'm running a pure 64-bit system and the GCC I'm using (Arch vanilla) wasn't configured with 32,x32 multilib, I could compile code as 32-bit and x32 but not build a final binary. That's not so bad, though, since I can generate assembly output, which shows us the actual code. I took meteor.c from the Language Shootout as the test subject, since it didn't need to link in any external functionality (with main/printf commented out), and compiled it as 32-bit and x32 into assembly output using:
gcc -Os -march=native -fomit-frame-pointer -m32 -S -c meteor.c
gcc -Os -march=native -fomit-frame-pointer -mx32 -S -c meteor.c
The resulting x32 assembly listing turned out to be quite a bit smaller than the 32-bit one (1505 vs 1691 lines respectively), but that could be because the 32-bit assembly contains more compiler directives rather than actually smaller code, so obviously I had to examine the listings. I can't say I did any thorough comparison of the larger functions, but from a quick scan I couldn't see any occurrence where the x32 code was larger, while I did see several places where the x32 code was smaller. I picked out some small (and thus easier to examine) examples from the generated assembly:
Code:
32-bit:
boardHasIslands:
.LFB19:
    pushl   %edi
    xorl    %eax, %eax
    pushl   %esi
    movb    12(%esp), %dl
    cmpb    $39, %dl
    jg      .L237
    movb    $5, %cl
    movsbw  %dl, %ax
    movl    board+4, %edi
    idivb   %cl
    movl    board, %esi
    movsbl  %al, %ecx
    leal    (%ecx,%ecx,4), %ecx
    shrdl   %edi, %esi
    shrl    %cl, %edi
    testb   $32, %cl
    cmovne  %edi, %esi
    andl    $32767, %esi
    testb   $1, %al
    je      .L238
    movl    bad_odd_triple(,%esi,4), %eax
    jmp     .L237
.L238:
    movl    bad_even_triple(,%esi,4), %eax
.L237:
    popl    %esi
    popl    %edi
    ret

x32:
boardHasIslands:
.LFB19:
    xorl    %eax, %eax
    cmpb    $39, %dil
    jg      .L231
    movb    $5, %dl
    movsbw  %dil, %ax
    idivb   %dl
    movq    board(%rip), %rdx
    movsbl  %al, %ecx
    leal    (%rcx,%rcx,4), %ecx
    shrq    %cl, %rdx
    andl    $32767, %edx
    sall    $2, %edx
    testb   $1, %al
    movslq  %edx, %rdx
    je      .L232
    movl    bad_odd_triple(%rdx), %eax
    ret
.L232:
    movl    bad_even_triple(%rdx), %eax
.L231:
    ret

32-bit:
record_piece:
.LFB11:
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    movl    16(%esp), %esi
    movl    20(%esp), %eax
    movl    32(%esp), %edx
    imull   $50, %esi, %ebx
    imull   $600, %esi, %esi
    addl    %eax, %ebx
    imull   $12, %eax, %eax
    movl    piece_counts(,%ebx,4), %ecx
    addl    %eax, %esi
    movl    28(%esp), %eax
    leal    (%esi,%ecx), %edi
    movl    %edx, pieces+4(,%edi,8)
    movl    %eax, pieces(,%edi,8)
    movl    24(%esp), %eax
    movb    %al, next_cell(%ecx,%esi)
    incl    %ecx
    movl    %ecx, piece_counts(,%ebx,4)
    popl    %ebx
    popl    %esi
    popl    %edi
    ret

x32:
record_piece:
.LFB11:
    imull   $50, %edi, %eax
    imull   $600, %edi, %edi
    addl    %esi, %eax
    imull   $12, %esi, %esi
    sall    $2, %eax
    cltq
    movl    piece_counts(%rax), %r8d
    addl    %edi, %esi
    addl    %r8d, %esi
    incl    %r8d
    leal    0(,%rsi,8), %edi
    movslq  %esi, %rsi
    movl    %r8d, piece_counts(%rax)
    movslq  %edi, %rdi
    movb    %dl, next_cell(%rsi)
    movq    %rcx, pieces(%rdi)
    ret
When kernel 3.4 is released and I can actually run and benchmark x32 code, I will recompile GCC with 32,x32 multilib so that I can build and compare proper binaries.
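To repeat the assembly comparison on a smaller scale, a hypothetical test function (mix8, made up for illustration) that keeps many values live at once can be fed to the same two gcc commands in place of meteor.c. i386 has only eight general-purpose registers, while x32 uses x86-64's sixteen, so the -m32 listing is more likely to show stack spills; x32 also receives its first arguments in registers, whereas 32-bit code loads them from the stack (compare 12(%esp) with %dil in the listings). This is a sketch for inspecting -S output, not a benchmark, and the exact assembly will vary with the GCC version.

```c
#include <stdint.h>

/* Hypothetical test function: eight running sums, plus data, n, i and x,
 * all stay live across the loop. That is more values than i386 has
 * registers for, so expect spills in the -m32 listing; x32 has enough
 * registers that it may avoid them. */
uint32_t mix8(const uint32_t *data, int n)
{
    uint32_t a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8;
    for (int i = 0; i < n; i++) {
        uint32_t x = data[i];
        a += x;     b ^= a + x; c += b;     d ^= c;
        e += d ^ x; f += e;     g ^= f + x; h += g;
    }
    return a ^ b ^ c ^ d ^ e ^ f ^ g ^ h;
}
```

Saved as, say, mix8.c, it can be compiled with both commands (substituting the file name) and the two .s files compared side by side.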
Originally posted by smitty3268
Instructions make up a really quite small part of the average executable. Most of it tends to be data: string values encoded in the program, for example. Even pointer-heavy apps are dominated in size by the data they use, not by the pointers themselves.