Intel Is Trying To Support The x32 ABI For LLVM/Clang

  • #31
    Originally posted by schmidtbag
    Why? Seriously, Intel, just drop your 32-bit platforms already. This is getting old.
    Seriously. RAM is stupid cheap these days; any box out there can be fitted with 8 GB DDR3 DIMMs, and a small RAM saving is all that x32 gets you over straight 64-bit.

    x32 would have made sense in the DDR1 days of the first run of 64-bit CPUs, when you maxed out at 1 GB DIMMs for a total of 4 GB of RAM, but these days it's a solution looking for a problem.


    • #32
      Originally posted by doom_Oo7
      Since you have 64 bits of bandwidth, you can transfer two 32-bit pointers at a time, which might make stuff faster.
      In what real-world scenarios would this be of benefit over straight 64-bit? If I'm doing something so taxing that it makes a noticeable difference, it probably gains more from full 64-bit anyway.


      • #33
        Originally posted by Kivada
        In what real-world scenarios would this be of benefit over straight 64-bit?

        It is up to Michael to devise such a test and run the appropriate benchmarks with PTS.


        • #34
          I tried making a VERY SYNTHETIC test just to check how it performs on pointers.

          Code:
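          #include <vector>
          #include <cinttypes>
          #include <iostream>
          #include <chrono>
          using namespace std;
          using namespace std::chrono;
          
          int main()
          {
          	// Every element of v1 holds the same pointer (likewise v2): the loop streams
          	// through two large pointer arrays but always dereferences the same two ints.
          	vector<int32_t *> v1(50000000, new int32_t{1});
          	vector<int32_t *> v2(50000000, new int32_t{-1});
          	vector<int64_t> times;	// per-pass wall-clock time in milliseconds
          
          	int32_t* sum = new int32_t;
          	// Repeat the pass over both vectors 1000 times.
          	for(int count = 1000; count-- > 0;)
          	{
          		auto t1 = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
          		for(auto it1 = begin(v1), it2 = begin(v2); it1 != end(v1); ++it1, ++it2)
          		{
          			// Overwrites *sum on every step; nothing is accumulated.
          			*sum = **it1 + **it2;
          		}
          		auto t2 = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
          		times.push_back((t2 - t1).count());
          		*sum = 0;
          	}
          
          	double tsum{};
          	for(auto t : times)
          		tsum += t;
          
          	// Mean time per pass in milliseconds (total divided by the 1000 passes).
          	cout << "Time: " << tsum / 1000 << endl;
          
          	return 0;
          }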
          Code:
            % g++ -std=c++11 -O3 -march=native -mx32 test.cpp
            % ./a.out
          Time: 47.666
          
            % g++ -std=c++11 -O3 -march=native -m64 test.cpp 
            % ./a.out  
          Time: 65.757


          • #35
            Originally posted by Kivada
            Seriously. RAM is stupid cheap these days; any box out there can be fitted with 8 GB DDR3 DIMMs, and a small RAM saving is all that x32 gets you over straight 64-bit.

            x32 would have made sense in the DDR1 days of the first run of 64-bit CPUs, when you maxed out at 1 GB DIMMs for a total of 4 GB of RAM, but these days it's a solution looking for a problem.
            It is much harder to buy more processor cache, reduce RAM latency, or increase memory bandwidth. There are applications where these matter more than the total RAM available.
            A CPU will spend 150 to 200 cycles doing nothing for your process while it waits for a value from main memory. It needs only 3-4 cycles if the value is in L1 cache, and none if it is in a register (of which there is a fixed number).
            It is very important to reduce cache usage, and x32, which uses the 64-bit instruction set, does exactly that by using smaller pointers.

            See the numbers on the previous page for real-world measurements.


            • #36
              Will we ever have x32 libs and programs? It looks to me like x32 has been almost completely forgotten, hasn't it?
              For those who still don't understand: you can save memory AND gain performance with x32. x32 runs on x64 OSes, but NOT on 32-bit OSes.


              • #37
                Originally posted by asdfblah
                Will we ever have x32 libs and programs? It looks to me like x32 has been almost completely forgotten, hasn't it?
                For those who still don't understand: you can save memory AND gain performance with x32. x32 runs on x64 OSes, but NOT on 32-bit OSes.
                Well, on Debian there are x32 builds of libc and libstdc++, so you could always try to recompile stuff...

                Or maybe do something like what Sylvestre is doing for clang (http://clang.debian.net/), but with x32 instead? I don't know how much time it would take on a free EC2 instance, though (it's 750 h max).


                • #38
                  Originally posted by asdfblah
                  Will we ever have x32 libs and programs? It looks to me like x32 has been almost completely forgotten, hasn't it?
                  For those who still don't understand: you can save memory AND gain performance with x32. x32 runs on x64 OSes, but NOT on 32-bit OSes.
                  If you want to try x32 properly, I would suggest Gentoo. x32 Gentoo is fully supported: http://distfiles.gentoo.org/releases...nt-stage3-x32/


                  • #39
                    Originally posted by name99
                    To add to the (remarkably few) facts in this thread, x32 buys you about 10% in performance over x64.
                    That is very optimistic. 10% is roughly the difference you can see in select synthetic benchmarks, outside pathological cases.

                    Originally posted by asdfblah
                    Will we ever have x32 libs and programs?
                    This was already said in this thread. On devices where all native software is managed by a central instance (e.g. a mobile phone), we might see x32. As soon as you leave that realm (e.g. for a normal desktop distribution), the drawbacks of maintaining an additional set of libraries start to outweigh the benefits of x32.

                    Originally posted by erendorn
                    It needs only 3-4 cycles if the value is in L1 cache, and none if it is in a register (of which there is a fixed number).
                    It is very important to reduce cache usage, and x32, which uses the 64-bit instruction set, does exactly that by using smaller pointers.
                    This is discussed in the link I posted before. Because it affects only the L1 data cache and not the L1 instruction cache, the difference is mostly limited to pointer-heavy code and is not as big as the 50% saving in pointer size might suggest.

                    Originally posted by erendorn
                    See the numbers on the previous page for real-world measurements.
                    There exist precious few real-world benchmarks of x32 vs. x64. I did not see any posted to or linked from this thread.

                    Originally posted by scottishduck
                    If you want to try x32 properly, I would suggest Gentoo. x32 Gentoo is fully supported: http://distfiles.gentoo.org/releases...nt-stage3-x32/
                    And if you want to try Gentoo x32, read about the still unresolved problems: https://bugs.gentoo.org/showdependen...ide_resolved=1

                    Originally posted by Rexilion
                    x32 is not 32-bit. You just mentioned this yourself.
                    x32 is 32-bit. In contrast to x86(-32), it requires a 64-bit CPU and a 64-bit kernel, but it uses the ILP32 programming model. Hence, calling it 32-bit is certainly justified.


                    • #40
                      Originally posted by chithanh
                      There exist precious few real-world benchmarks of x32 vs. x64. I did not see any posted to or linked from this thread.
                      name99 posted an example on the previous page.
                      It's a single data point, but it's quite real-world (compilation time of a clang compiler compiled as 32-bit, x64 and x32).
