
Thread: Intel Is Trying To Support The x32 ABI For LLVM/Clang

  1. #31

    Quote Originally Posted by schmidtbag View Post
    Why? Seriously, Intel, just drop your 32-bit platforms already. This is getting old.
    Seriously. RAM is dirt cheap these days; any box out there can be fitted with 8 GB DDR3 DIMMs. A small RAM saving is all that x32 gets you over straight 64-bit.

    x32 would have made sense in the DDR1 days of the first run of 64-bit CPUs, when you maxed out at 1 GB DIMMs for a total of 4 GB of RAM, but these days it's a solution looking for a problem.

  2. #32

    Quote Originally Posted by doom_Oo7 View Post
    Since you have 64 bits of bandwidth, you can transfer two 32-bit pointers at a time, which might make things faster.
    In what real-world scenarios would this be of benefit over straight 64-bit? If I'm doing something so taxing that it makes a noticeable difference, it probably gains more from full 64-bit anyway.

  3. #33
    Join Date
    Jul 2013
    Location
    Bordeaux, France
    Posts
    285

    Quote Originally Posted by Kivada View Post
    In what real-world scenarios would this be of benefit over straight 64-bit?

    It is Michael's job to work that out and run appropriate benchmarks on PTS.

  4. #34
    Join Date
    Jul 2013
    Location
    Bordeaux, France
    Posts
    285

    I tried making a VERY SYNTHETIC test just to check how it performs on pointers.

    Code:
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <vector>
    using namespace std;
    using namespace std::chrono;
    
    int main()
    {
    	// Every element of v1 (and of v2) is a copy of the same pointer, so the
    	// test mostly measures streaming the pointer arrays themselves.
    	vector<int32_t *> v1(50000000, new int32_t{1});
    	vector<int32_t *> v2(50000000, new int32_t{-1});
    	vector<int64_t> times;
    
    	int32_t* sum = new int32_t{0};
    	for(int count = 1000; count --> 0;)
    	{
    		// steady_clock is monotonic, unlike system_clock
    		auto t1 = steady_clock::now();
    		for(auto it1 = begin(v1), it2 = begin(v2); it1 != end(v1); ++it1, ++it2)
    		{
    			*sum = **it1 + **it2;
    		}
    		auto t2 = steady_clock::now();
    		times.push_back(duration_cast<milliseconds>(t2 - t1).count());
    		*sum = 0;
    	}
    
    	// Average time per pass, in milliseconds
    	double tsum{};
    	for(auto t : times)
    		tsum += t;
    
    	cout << "Time: " << tsum / times.size() << endl;
    
    	// Each vector holds copies of a single allocation, so free it once
    	delete sum;
    	delete v1[0];
    	delete v2[0];
    	return 0;
    }
    Code:
      % g++ -std=c++11 -O3 -march=native -mx32 test.cpp
      % ./a.out
    Time: 47.666
    
      % g++ -std=c++11 -O3 -march=native -m64 test.cpp 
      % ./a.out  
    Time: 65.757

  5. #35
    Join Date
    Sep 2012
    Posts
    650

    Quote Originally Posted by Kivada View Post
    Seriously. RAM is dirt cheap these days; any box out there can be fitted with 8 GB DDR3 DIMMs. A small RAM saving is all that x32 gets you over straight 64-bit.

    x32 would have made sense in the DDR1 days of the first run of 64-bit CPUs, when you maxed out at 1 GB DIMMs for a total of 4 GB of RAM, but these days it's a solution looking for a problem.
    It is much harder to buy processor cache, reduce RAM latency, or increase RAM bandwidth. There are applications where these matter more than the total RAM available.
    A CPU will spend 150 to 200 cycles doing nothing for your process if it is waiting for a value from main memory. It will only need 3-4 cycles if this value is in L1 cache, and 0 if it is in a register (whose number is fixed).
    It is very important to reduce cache usage, and x32, which uses 64 bit instructions, does exactly that by using smaller pointers.

    See numbers on the previous page to get real world measurements.
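
    To put a rough number on the cache argument above, here is a minimal sketch, not a benchmark. The `Node` type and the `nodes_per_line` helper are made up for illustration, an integer stands in for the pointer so both widths can be compared on any host, and the 64-byte cache line is an assumption (typical for current x86 CPUs, not guaranteed).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical list node. Link stands in for the "next" pointer:
// std::uint32_t models a 4-byte x32 pointer, std::uint64_t an 8-byte x86-64 one.
template <typename Link>
struct Node {
    // Align the link to its own size, as pointers are under both ABIs.
    alignas(sizeof(Link)) Link next;
    std::int32_t value;
};

// Assumed cache line size in bytes.
constexpr std::size_t kCacheLine = 64;

// How many whole nodes fit in one cache line.
template <typename Link>
constexpr std::size_t nodes_per_line() {
    return kCacheLine / sizeof(Node<Link>);
}

// 4-byte link + 4-byte value = 8 bytes; 8-byte link + 4-byte value + 4 bytes padding = 16 bytes.
static_assert(sizeof(Node<std::uint32_t>) == 8, "x32-style node should be 8 bytes");
static_assert(sizeof(Node<std::uint64_t>) == 16, "x86-64-style node should be 16 bytes");
```

    Under these assumptions `nodes_per_line<std::uint32_t>()` is 8 and `nodes_per_line<std::uint64_t>()` is 4, so a pointer-heavy structure like this one covers twice as many nodes per cache line with 4-byte pointers. Code that holds few pointers sees far less of a difference.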

  6. #36
    Join Date
    Oct 2012
    Posts
    179

    Will we ever have x32 libs and programs? It looks to me like x32 has been almost completely forgotten, hasn't it?
    For those who still don't understand... you can save memory AND gain performance with x32. x32 runs on x64 OSes, but NOT on 32-bit OSes...

  7. #37
    Join Date
    Jul 2013
    Location
    Bordeaux, France
    Posts
    285

    Quote Originally Posted by asdfblah View Post
    Will we ever have x32 libs and programs? It looks to me like x32 has been almost completely forgotten, hasn't it?
    For those who still don't understand... you can save memory AND gain performance with x32. x32 runs on x64 OSes, but NOT on 32-bit OSes...
    Well, on Debian there are x32 builds of libc and libstdc++, so you could always try to recompile stuff...

    Or maybe do something like what Sylvestre is doing for clang (http://clang.debian.net/), but with x32 instead? I don't know how much time it would take on a free EC2 instance, though... (it's 750h max)

  8. #38
    Join Date
    Jun 2011
    Location
    Scotland
    Posts
    101

    Quote Originally Posted by asdfblah View Post
    Will we ever have x32 libs and programs? It looks to me like x32 has been almost completely forgotten, hasn't it?
    For those who still don't understand... you can save memory AND gain performance with x32. x32 runs on x64 OSes, but NOT on 32-bit OSes...
    If you want to try x32 properly, I would suggest Gentoo. x32 Gentoo is fully supported: http://distfiles.gentoo.org/releases...nt-stage3-x32/

  9. #39
    Join Date
    Jul 2008
    Location
    Berlin, Germany
    Posts
    821

    Quote Originally Posted by name99 View Post
    To add to the (remarkably few) facts in this thread, x32 buys you about 10% in performance over x64.
    That is very optimistic. 10% is roughly the difference which you can see in select synthetic benchmarks, outside pathological cases.

    Quote Originally Posted by asdfblah View Post
    Will we ever have x32 libs and programs?
    This was already said in this thread. On devices where all native software is managed by a central instance (e.g. a mobile phone), we might see x32. As soon as you leave that realm (e.g. for a normal desktop distribution), the drawbacks of having an additional set of libraries start to outweigh the benefits of x32.

    Quote Originally Posted by erendorn View Post
    It will only need 3-4 cycles if this value is in L1 cache, and 0 if it is in a register (whose number is fixed).
    It is very important to reduce cache usage, and x32, which uses 64 bit instructions, does exactly that by using smaller pointers.
    This is discussed in the link which I posted before. Because it affects only the L1 data cache and not the L1 instruction cache, the difference is mostly limited to pointer-heavy code and is not as big as the 50% saving in pointer size might suggest.

    Quote Originally Posted by erendorn View Post
    See numbers on the previous page to get real world measurements.
    There exist precious few real-world benchmarks of x32 vs. x64. I did not see any posted to or linked from this thread.

    Quote Originally Posted by scottishduck View Post
    If you want to try x32 properly, I would suggest Gentoo. x32 Gentoo is fully supported: http://distfiles.gentoo.org/releases...nt-stage3-x32/
    And if you want to try Gentoo x32, read about the still unresolved problems: https://bugs.gentoo.org/showdependen...ide_resolved=1

    Quote Originally Posted by Rexilion View Post
    x32 is not 32bit. You just mentioned this yourself.
    x32 is 32-bit. In contrast to x86(-32), it requires a 64-bit CPU and a 64-bit kernel, but it uses the ILP32 programming model. Hence, calling it 32-bit is certainly justified.
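
    As a minimal sketch of what ILP32 means in practice, the helpers below report which ABI a translation unit was built for. The function names are made up for illustration. The sketch assumes GCC or Clang on x86, where `__x86_64__` is predefined for both -m64 and -mx32 and `__ILP32__` is additionally predefined for -mx32; other compilers may not define these macros.

```cpp
#include <climits>
#include <cstddef>

// Name of the ABI this code was compiled for, based on predefined macros.
// x32 is the one case where the CPU mode is 64-bit but the data model is ILP32.
constexpr const char* abi_name() {
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86";
#else
    return "other";
#endif
}

// Width of a pointer in bits: 32 under ILP32 (x86 and x32), 64 under LP64 (x86-64).
constexpr std::size_t pointer_bits() { return sizeof(void*) * CHAR_BIT; }

// Width of long in bits: it follows the pointer width under both ILP32 and LP64.
constexpr std::size_t long_bits() { return sizeof(long) * CHAR_BIT; }
```

    Built with -mx32, `abi_name()` should give "x32" and both widths should be 32, even though the generated code uses 64-bit registers and instructions. Built with -m64 on Linux, both widths should be 64.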

  10. #40
    Join Date
    Sep 2012
    Posts
    650

    Quote Originally Posted by chithanh View Post
    There exist precious few real-world benchmarks of x32 vs. x64. I did not see any posted to or linked from this thread.
    name99 posted an example on the previous page.
    It's a single data point, but it's quite real-world (compile times of a clang compiler built as 32-bit, x64 and x32).
