Ubuntu Plans For Linux x32 ABI Support


  • #16
    Originally posted by jakubo
    Will there be a benefit for WINE?
    I don't think so. Obviously the actual Windows programs won't be faster, but I also think the parts of Windows that Wine reimplements, which could potentially be faster, need to run as standard 32-bit code as well and thus won't benefit either. I'm not sure about this, though; I don't have much insight into how Wine works.



    • #17
      Originally posted by rohcQaH
      IIRC Firefox is by default compiled with -Os because the smaller cache footprint outweighs all the other optimizations. But that's something you'll have to test for each project separately.
      I believe that changed when they moved to a more recent GCC version and started supporting PGO. I believe they switched to -O3, along with an option that limits the amount of inlining -O3 normally enables.


      x32 support is unlikely to decrease memory or disk size requirements. In fact, it will almost certainly increase them, because you are just adding new libraries that need to be duplicated in both architectures for compatibility. And the amount of space it will save in a particular executable is really very small. We're talking about reducing a 1024KB program to 1000KB maybe.

      The benefit comes from reducing L1, L2, and L3 cache pressure, which can lead to significant speed boosts. It depends heavily on the application in question, though, and even on the hardware it's running on. x32 might bring a big boost on hardware with smaller caches, while giving no boost at all on CPUs with large caches.
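
A rough way to picture the cache argument, as a minimal sketch rather than a benchmark: it assumes a hypothetical pointer-heavy record that takes 24 bytes with 64-bit pointers and 12 bytes with 32-bit pointers, and it models a cache as nothing more than a byte budget (ignoring associativity, code, and everything else competing for the cache). The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes touched when walking `count` records of `record_bytes` each. */
size_t working_set_bytes(size_t count, size_t record_bytes) {
    assert(record_bytes > 0);
    return count * record_bytes;
}

/* Whole records that fit in one cache line of `line_bytes`. */
size_t records_per_line(size_t line_bytes, size_t record_bytes) {
    assert(record_bytes > 0);
    return line_bytes / record_bytes;
}

/* 1 if the working set fits entirely in a cache of `cache_bytes`, else 0.
   Deliberately simplistic: a real cache also holds code and other data. */
int fits_in_cache(size_t working_set, size_t cache_bytes) {
    return working_set <= cache_bytes;
}
```

With 100,000 such records, the 24-byte layout needs 2,400,000 bytes and overflows a 2 MiB cache, while the 12-byte layout needs 1,200,000 bytes and fits. Against an 8 MiB cache both fit, which is the "no boost on large caches" case.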



      • #18
        Originally posted by smitty3268
        x32 support is unlikely to decrease memory or disk size requirements. In fact, it will almost certainly increase them, because you are just adding new libraries that need to be duplicated in both architectures for compatibility.
        That assumes you will keep (or need) to run applications as x64. In particular, if you have a 64-bit CPU and 4 GB or less of RAM, an x32-ONLY system would be the perfect fit.

        Originally posted by smitty3268
        And the amount of space it will save in a particular executable is really very small. We're talking about reducing a 1024KB program to 1000KB maybe.
        I believe you are wrong here. I believe a full 32-bit system will typically use ~20% less RAM than an equivalent 64-bit system, because libraries and applications are smaller as binaries and use less RAM when running (due to pointer size). The x32 code could potentially be even smaller than 32-bit code: although both have 32-bit pointers, 32-bit x86 has very few registers (8 general-purpose ones), so it has to spend more code pushing to and popping from the stack in order to reuse them. x32 also has 32-bit pointers but TWICE as many registers, so it can keep much more data in registers and needs much less stack push/pop code, making the code smaller.
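
To make the pointer-size part of this concrete, here is a minimal sketch in C. The `struct node` record and the helper names are hypothetical, chosen only for illustration. It assumes a 4-byte `int` and pointers aligned to their own size, which holds for the 32-bit x86, x32, and x86-64 ABIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-heavy record, for illustration only. */
struct node {
    struct node *next;
    struct node *prev;
    int value;
};

/* Round n up to a multiple of align; align must be a power of two. */
size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Predicted sizeof(struct node) for a pointer width of 4 or 8 bytes,
   assuming a 4-byte int and pointers aligned to their own size. */
size_t predicted_node_bytes(size_t ptr_bytes) {
    assert(ptr_bytes == 4 || ptr_bytes == 8);
    size_t raw = 2 * ptr_bytes + 4;   /* two pointers plus the int */
    return round_up(raw, ptr_bytes);  /* tail padding to pointer alignment */
}

/* What the compiler actually uses for the ABI this file is built for. */
size_t actual_node_bytes(void) {
    return sizeof(struct node);
}
```

Under these assumptions the record is 24 bytes with 8-byte pointers and 12 bytes with 4-byte pointers, so this particular structure halves in size. How much a whole program shrinks depends on how much of its memory is pointers rather than other data.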



        • #19
          Originally posted by XorEaxEax
          That assumes you will keep (or need) to run applications as x64. In particular, if you have a 64-bit CPU and 4 GB or less of RAM, an x32-ONLY system would be the perfect fit.
          You're assuming distros are going to create pure x32 distros, which I find unlikely. They already have to use the x64 kernel, so I find it hard to believe they wouldn't include x64 userland libs as well.

          I could be wrong about that, but I just don't see it happening. Every new architecture they have to support just means that much more work for their limited staff - it will be much easier to just combine the x32 and x64 architectures together.

          If you are talking about custom building your own distro (on Gentoo? or LFS?) then maybe you have a point.

          Originally posted by XorEaxEax
          I believe you are wrong here. I believe a full 32-bit system will typically use ~20% less RAM than an equivalent 64-bit system, because libraries and applications are smaller as binaries and use less RAM when running (due to pointer size). The x32 code could potentially be even smaller than 32-bit code: although both have 32-bit pointers, 32-bit x86 has very few registers (8 general-purpose ones), so it has to spend more code pushing to and popping from the stack in order to reuse them. x32 also has 32-bit pointers but TWICE as many registers, so it can keep much more data in registers and needs much less stack push/pop code, making the code smaller.
          And I believe I'm right. Do you have any proof?

          The average size of an executable's instructions is really quite small. Most of an executable tends to be data - string values encoded in the program, for example. Even pointer-heavy apps are dominated in size by the data they use, not by the pointers themselves.



          • #20
            Originally posted by smitty3268
            I could be wrong about that, but I just don't see it happening. Every new architecture they have to support just means that much more work for their limited staff - it will be much easier to just combine the x32 and x64 architectures together.
            Yes, I'm doubtful of this as well. Ubuntu, as the article states, is looking into it, but that is a long way from fully supporting it. Gentoo is very much build-it-yourself from scratch, so I believe they will 'support' x32. I'm not sure what you mean by combining the x32 and x64 architectures, though: they will use the same kernel, but they will need different libraries.

            Originally posted by smitty3268
            And I believe I'm right. Do you have any proof?
            As for 32-bit using ~20% less RAM than an equivalent 64-bit system/code, that has been verified quite well (I've done it twice myself in the past, on both Windows and Ubuntu). Since it's quick to do in these days of VMs, I ran a test just now with two setups that are identical in terms of software: one Arch 32-bit and one Arch 64-bit. After the same base installation I installed X, Openbox and Conky on both. After starting X/Openbox, this is what Conky reported:

            http://img442.imageshack.us/img442/3500/32bit.png
            http://img232.imageshack.us/img232/8794/64bit.png

            As for the x32 vs 32-bit code size: no, I had no proof. It was just something that seemed logical (more registers = less pushing and popping = smaller code footprint). Anyway, thanks to your scepticism I figured I should see if it was true.

            I'm running a pure 64-bit system, and the GCC I'm using (Arch vanilla) wasn't configured with 32,x32 multilib, so I could compile code as 32-bit and x32 but not build a final binary. That's not so bad, though, since I can generate assembly output, which actually shows us the code. I took meteor.c from the Language Shootout as the test subject, as it didn't need to link in any external functionality (I commented out main/printf), and compiled it as 32-bit and x32 into assembly output using:

            gcc -Os -march=native -fomit-frame-pointer -m32 -S -c meteor.c
            gcc -Os -march=native -fomit-frame-pointer -mx32 -S -c meteor.c

            The resulting x32 assembly listing turned out to be quite a bit smaller than the 32-bit one (1505 vs 1691 lines respectively), but that could be the result of the 32-bit assembly containing more compiler directives rather than actually smaller code, so obviously I had to examine the listings. I can't say I did any thorough comparison of the larger functions, but from a quick scan I couldn't see any occurrence where the x32 code was larger, and I did see several places where it was smaller. I picked out some small (and thus easier to examine) examples from the generated assembly:

            Code:
            32-bit:
            boardHasIslands:
            .LFB19:
            	pushl	%edi
            	xorl	%eax, %eax
            	pushl	%esi
            	movb	12(%esp), %dl
            	cmpb	$39, %dl
            	jg	.L237
            	movb	$5, %cl
            	movsbw	%dl, %ax
            	movl	board+4, %edi
            	idivb	%cl
            	movl	board, %esi
            	movsbl	%al, %ecx
            	leal	(%ecx,%ecx,4), %ecx
            	shrdl	%edi, %esi
            	shrl	%cl, %edi
            	testb	$32, %cl
            	cmovne	%edi, %esi
            	andl	$32767, %esi
            	testb	$1, %al
            	je	.L238
            	movl	bad_odd_triple(,%esi,4), %eax
            	jmp	.L237
            .L238:
            	movl	bad_even_triple(,%esi,4), %eax
            .L237:
            	popl	%esi
            	popl	%edi
            	ret
            
            x32:
            boardHasIslands:
            .LFB19:
            	xorl	%eax, %eax
            	cmpb	$39, %dil
            	jg	.L231
            	movb	$5, %dl
            	movsbw	%dil, %ax
            	idivb	%dl
            	movq	board(%rip), %rdx
            	movsbl	%al, %ecx
            	leal	(%rcx,%rcx,4), %ecx
            	shrq	%cl, %rdx
            	andl	$32767, %edx
            	sall	$2, %edx
            	testb	$1, %al
            	movslq	%edx, %rdx
            	je	.L232
            	movl	bad_odd_triple(%rdx), %eax
            	ret
            .L232:
            	movl	bad_even_triple(%rdx), %eax
            .L231:
            	ret
            
            32-bit:
            record_piece:
            .LFB11:
            	pushl	%edi
            	pushl	%esi
            	pushl	%ebx
            	movl	16(%esp), %esi
            	movl	20(%esp), %eax
            	movl	32(%esp), %edx
            	imull	$50, %esi, %ebx
            	imull	$600, %esi, %esi
            	addl	%eax, %ebx
            	imull	$12, %eax, %eax
            	movl	piece_counts(,%ebx,4), %ecx
            	addl	%eax, %esi
            	movl	28(%esp), %eax
            	leal	(%esi,%ecx), %edi
            	movl	%edx, pieces+4(,%edi,8)
            	movl	%eax, pieces(,%edi,8)
            	movl	24(%esp), %eax
            	movb	%al, next_cell(%ecx,%esi)
            	incl	%ecx
            	movl	%ecx, piece_counts(,%ebx,4)
            	popl	%ebx
            	popl	%esi
            	popl	%edi
            	ret
            
            x32:
            record_piece:
            .LFB11:
            	imull	$50, %edi, %eax
            	imull	$600, %edi, %edi
            	addl	%esi, %eax
            	imull	$12, %esi, %esi
            	sall	$2, %eax
            	cltq
            	movl	piece_counts(%rax), %r8d
            	addl	%edi, %esi
            	addl	%r8d, %esi
            	incl	%r8d
            	leal	0(,%rsi,8), %edi
            	movslq	%esi, %rsi
            	movl	%r8d, piece_counts(%rax)
            	movslq	%edi, %rdi
            	movb	%dl, next_cell(%rsi)
            	movq	%rcx, pieces(%rdi)
            	ret
            Granted, this is not irrefutable proof. I can't swear that the x32 assembly here produces a smaller code footprint than the 32-bit version, as I'm only going by the assembly output, but it does seem likely. I also compiled with both -O2 and -O3, and in both cases the resulting x32 assembly was quite a bit smaller than the 32-bit one, though I didn't examine those listings.
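
One way to separate real instructions from compiler directives in a listing, as a hedged sketch: the C helpers below (hypothetical names) count only lines that look like instructions in GCC's AT&T-syntax output. They assume the usual layout, where labels start in column 0 and both directives and instructions are indented, with directives starting with a dot. Instruction count is still only a proxy for code size, since x86-64 encodings that need a REX prefix are longer than their 32-bit counterparts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Returns 1 if a line of GCC assembly output looks like an instruction:
   indented, and not a directive (leading '.') or a '#' comment.
   Labels start in column 0 and are rejected. Heuristic only. */
int is_instruction_line(const char *line) {
    assert(line != NULL);
    if (!isspace((unsigned char)line[0]))
        return 0;                      /* label or other column-0 text */
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '.' || *line == '#')
        return 0;                      /* blank, directive, or comment */
    return 1;
}

/* Counts instruction lines in an open assembly listing.
   Assumes no single line exceeds the buffer size. */
long count_instructions(FILE *fp) {
    char buf[1024];
    long n = 0;
    while (fgets(buf, sizeof buf, fp) != NULL)
        if (is_instruction_line(buf))
            n++;
    return n;
}
```

Applied by hand to the excerpts above, the same rule gives 26 instructions for the 32-bit boardHasIslands against 19 for x32, and 24 against 17 for record_piece. To run it over full listings, the two gcc commands would need distinct output names via -o (for example the hypothetical meteor32.s and meteorx32.s), since both otherwise write meteor.s.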

            Once kernel 3.4 is released and I can actually run and benchmark x32 code, I will recompile GCC with 32,x32 multilib so that I can build and compare proper binaries.

            Originally posted by smitty3268
            The average size of an executable's instructions is really quite small. Most of an executable tends to be data - string values encoded in the program, for example. Even pointer-heavy apps are dominated in size by the data they use, not by the pointers themselves.
            Again, the RAM usage difference of roughly 20% between equivalent 32-bit and 64-bit systems is pretty much confirmed. Code size also matters for performance, since the CPU cache isn't infinite.



            • #21
              Originally posted by smitty3268
              You're assuming distros are going to create pure x32 distros, which I find unlikely. They already have to use the x64 kernel, so I find it hard to believe they wouldn't include x64 userland libs as well.
              I think the main point behind Intel developing x32 is mobile (due to limited memory bandwidth). I would not be surprised if a future x32-enabled Tizen comes with a 64-bit kernel and toolchain and everything else 32-bit.



              • #22
                Originally posted by chithanh
                I think the main point behind Intel developing x32 is mobile (due to limited memory bandwidth). I would not be surprised if a future x32-enabled Tizen comes with a 64-bit kernel and toolchain and everything else 32-bit.
                Is Intel behind x32? I thought it was the server guys, who wanted to bump up their benchmark scores.

                Mobile is one possibility, but I'm not sure it's the reason x32 exists.



                • #23
                  See H. Peter Anvin's original x32 presentation (he works for Intel):
                  http://www.linuxplumbersconf.org/2011/ocw/sessions/531
                  Slide 10 explicitly mentions embedded devices as a use case, although better performance on other systems is also possible.
