MIPS Loongson 3A Benchmarks On Debian


  • Originally posted by maldorordiscord View Post
    Your reading skill is questionable because there is no comparison between native speed and x86 format speed. There is only a comparison between full emulated x86 format speed and accelerated x86 format speed. The only logical conclusion out of this is the speed up factor. Not the absolute speed.
    You seem to have reading skill issues too: figure 6 caption states:
    Experimental results for the nine benchmarks. 100 percent performance represents the execution of native MIPS code. Columns represent the execution of x86 emulation with and without hardware support. Higher is better.
    Highlighted just so you can re-read it again and hopefully understand.

    Comment


    • Originally posted by maldorordiscord View Post
      My effort to simplify it for you doesn't mean that your acting dumb pays you anything.
      I wrote: the Loongson need to translate the format from the x86 one into the Loongson one.
      So now you confirm me, of course, only to seemingly contradict me.
      Any reasonably intelligent reader will notice this rotten trick..
      There is no rotten trick. It is simple as that and even a self proclaimed super-genius like you should grasp that: If you have to translate it is not native. Isn't that hard, is it?

      You lack any focus, because an x86 core also cannot handle CISC code natively; this turns your argument into bullshit. A native x86 core also accelerates the CISC code only after translating it with microcode into the internal CPU architecture logic, for example VLIW-like (all modern CPUs are internally RISC or VLIW).
      Not long ago that was exactly your argument why these CPUs can't run Windows, despite the fact they can and are intended to do it by the developers. One thing you didn't guess would show up in the documentation, I think.
      So basically you say: When it comes to Windows it is not native, because it has to be emulated/translated, but when it comes to SSE it is native, although it has to be emulated/translated.
      Oh, wait, I am the one lacking the focus?

      I can not do anything for your limited intelligence to recognize the truth.
      But one is for sure you can put two 64bit vector data areas into one 128bit vector data area.
      And you can put four 64bit vector data areas into one 256bit vector data area.
      And you can put eight 64bit vector data areas into one 512bit vector data area.
      So the only thing you have done here is to show that you are able to handle basic math. My question was and still is: Do you have any proof that the Loongson 3 can handle 8 SSE instructions at a time? That there are not things that may cause the CPU to only be able to use 4 instructions (by the way, instructions is not really the right word here, should be more like data packets or something) at a time, maybe because of translation issues or something?
      Show me a clear statement that it can handle 8 SSE data-packets at a time.
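A minimal sketch of the arithmetic being argued over here, in Python. The function names are hypothetical, and the Loongson 3A figures (four cores, two 64-bit vector units per core) are the ones claimed in this thread, not taken from vendor documentation. It only computes an upper bound; whether the chip sustains that bound under x86 translation is exactly the open question.

```python
def lanes(register_bits: int, element_bits: int = 64) -> int:
    """Number of element_bits-wide elements that fit in one vector register."""
    if register_bits <= 0 or element_bits <= 0:
        raise ValueError("widths must be positive")
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of element width")
    return register_bits // element_bits


def peak_elements_per_cycle(cores: int, units_per_core: int,
                            register_bits: int, element_bits: int = 64) -> int:
    """Theoretical upper bound on elements processed per cycle.

    Assumes every unit on every core issues one full-width operation each
    cycle, which real code (and especially translated code) may not achieve.
    """
    return cores * units_per_core * lanes(register_bits, element_bits)


if __name__ == "__main__":
    for width in (128, 256, 512):
        print(width, "bit register holds", lanes(width), "x 64-bit elements")
    # Figures as claimed in the thread: 4 cores, 2 units per core, 64-bit units.
    print("claimed peak:", peak_elements_per_cycle(4, 2, 64))
```

The bound says nothing about sustained throughput, so it cannot by itself answer the request for a documented statement.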

      Your reading skill is questionable because there is no comparison between native speed and x86 format speed. There is only a comparison between full emulated x86 format speed and accelerated x86 format speed. The only logical conclusion out of this is the speed up factor. Not the absolute speed.
      EPIC FAIL.

      And about "technical forum": I have contributed 1000 times more technical information on this topic to the forum than you, because you have not contributed any information.
      So please improve your personal performance and be a useful member.
      You should first learn what information is. The only information you gave was an article about someone translating MMX to a different CPU with an MMX-like SIMD unit and an article that proves your claims only to 33%. You wildly guessing and estimating things is not information, it is bullshit.

      Comment


      • Originally posted by ldesnogu View Post
        You seem to have reading skill issues too: figure 6 caption states:

        Highlighted just so you can re-read it again and hopefully understand.
        This is an analysis error: they don't benchmark real hardware; they only test 1 core in a hardware emulator. This means the 100% is a fictional number for the emulated hardware driven with native MIPS code.
        This means the article does not tell us the REAL performance of a Loongson 3A CPU.
        The only logical conclusion from the data we have is the speed-up factor, not the absolute speed.

        Comment


        • I'm speechless...

          Comment


          • Originally posted by ldesnogu View Post
            I'm speechless...
            Just in case you don't get the point:

            "Before the chip returned from fabrication, we carried out the Godson-3 performance analysis on two platforms: a register-transfer-level (RTL) simulation platform and a field-programmable gate array (FPGA) prototyping platform. In the RTL simulation environment, we set the core clock frequency to 1 GHz, the DDR2/DDR3 clock frequency to 333 MHz, and the HyperTransport clock frequency to 800 MHz. To speed up the simulation, we used Cadence's Xtreme-313 simulation accelerator, which can achieve a speed of 200,000 to 400,000 cycles per second. Because of the difficulty of building a full-scale Godson-3 FPGA prototype system, we built a partial-scale prototype to evaluate a single processor core's performance. The prototype system includes one processor core, a 1-Mbyte L2 cache, one DDR2/DDR3 controller, and one HyperTransport controller. FPGA prototyping speed is 50 MHz,"
            Source: "Godson-3: A Scalable Multicore RISC Processor with x86 Emulation"

            The real one has 4 cores and 4 MB of cache.
            We are waiting for real code on real hardware.
            We are also waiting for SSE benchmarks.

            anyway I'm such a smartass sorry for that

            Comment


            • Originally posted by maldorordiscord View Post
              This is an analysis error: they don't benchmark real hardware; they only test 1 core in a hardware emulator. This means the 100% is a fictional number for the emulated hardware driven with native MIPS code.
              Absolute nonsense. 100% in that graph represents running native MIPS code in the simulator; the bars represent x86 execution speed running the same benchmarks (one bar emulated, one bar emulated with acceleration), but compiled to x86 code instead of MIPS code. No fiction there, but a qualitative statement about expected performance on the real hardware.
              Man, sometimes it is hard to tell if you are willfully ignorant, plain stupid or both.

              The only logical conclusion from the data we have is the speed-up factor, not the absolute speed.
              No one except you is talking about absolute speed. Everyone who is able to read the graph and the explanatory statements knows that this is not about absolute speed. That you aren't able to read them was successfully proven by, hm, who was it again, ah yes: you.
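A small Python sketch of how a chart normalised to native execution is read. The function names and all timing values are hypothetical illustrations, not data from the paper. It assumes all three runs (native MIPS, plain emulation, accelerated emulation) were measured on the same platform, so the ratios do not depend on what that platform's absolute speed is.

```python
def relative_performance(native_time: float, other_time: float) -> float:
    """Performance of a run as a percentage of the native run (100 = native)."""
    if native_time <= 0 or other_time <= 0:
        raise ValueError("times must be positive")
    return 100.0 * native_time / other_time


def speedup(unaccelerated_time: float, accelerated_time: float) -> float:
    """How many times faster the accelerated run is than the plain one."""
    if unaccelerated_time <= 0 or accelerated_time <= 0:
        raise ValueError("times must be positive")
    return unaccelerated_time / accelerated_time


if __name__ == "__main__":
    # Hypothetical run times in arbitrary units for one benchmark.
    native, emulated, accelerated = 2.0, 10.0, 5.0
    print(relative_performance(native, emulated))     # bar without hardware support
    print(relative_performance(native, accelerated))  # bar with hardware support
    print(speedup(emulated, accelerated))             # speed-up factor
```

Both readings come from the same three measurements: the speed-up factor is just the ratio of the two bars, and each bar additionally states how far the emulated run is from native.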

              Comment


              • I work in a CPU design team, so I know for sure that if their RTL env is properly set up the results are correct, no matter whether it's a real chip or not. And their benchmarks are small enough that going to a 4 MB cache size wouldn't change the result.

                Anyway, that doesn't change anything: you claim this has good x86 emulation speed, but following your argument, you can't say so, because this was not run on real hardware.

                Comment


                • Originally posted by TobiSGD View Post
                  There is no rotten trick. It is simple as that and even a self proclaimed super-genius like you should grasp that: If you have to translate it is not native. Isn't that hard, is it?
                  Do a reality check of your out-of-focus babbling: by your definition, an Intel Core 2 Duo E6600 cannot run x86 code without translating the CISC code with microcode into whatever internal RISC/VLIW hardware code it uses. This turns your definition into bullshit.
                  My definition fits reality, because I say: if the hardware can accelerate CISC code in hardware, then it is native!
                  My definition fits both an Intel Core 2 Duo and the Loongson 3A!

                  Originally posted by TobiSGD View Post
                  Not long ago that was exactly your argument why these CPUs can't run Windows, despite the fact they can and are intended to do it by the developers.
                  This hardware cannot run Windows natively, and it was always clear that you can emulate all kinds of stuff on a CPU. The second argument is: this hardware cannot emulate Windows without Linux.



                  Originally posted by TobiSGD View Post
                  One thing you didn't guess would show up in the documentation, I think.
                  So basically you say: When it comes to Windows it is not native, because it has to be emulated/translated, but when it comes to SSE it is native, although it has to be emulated/translated.
                  Oh, wait, I am the one lacking the focus?
                  It's all about the definition. Your definition of "native" does not fit reality, because a modern Intel CPU cannot run x86 CISC code natively. And "native", in the context of running Windows, only means running Windows without running Linux to emulate Windows.


                  Originally posted by TobiSGD View Post
                  So only thing you have done here is to show that you are able to handle basic math. My question was and still is: Do you have any proof that the Loongson 3 can handle 8 SSE instructions at a time?
                  Why in hell don't you use specific CPU names?
                  By your way of thinking it can handle that right now: it has 2 SSE units per core and the Loongson 3A is a quad-core, which means it can handle 8 SSE instructions at a time. The Loongson 3A does not need to pack many small 64-bit vectors into a bigger vector space, because it has no bigger vector space; it only has two 64-bit vector units per core.
                  The question of computing small vectors together with other small ones in a bigger vector space comes up with the Loongson 3B and 3C, because they widen the vector space to 256 bits, like AVX (Advanced Vector Extensions), and Intel uses the same mechanism to compute four 64-bit SSE instructions in the 256-bit unit.


                  Originally posted by TobiSGD View Post
                  That there are not things that may cause the CPU to only be able to use 4 instructions (by the way, instructions is not really the right word here, should be more like data packets or something) at a time, maybe because of translation issues or something?
                  Show me a clear statement that it can handle 8 SSE data-packets at a time.
                  Data packets, right. Here is a Wikipedia page on the topic of putting many small vector data packets into one big data packet. This is possible because you can cut a big vector space into smaller ones.
                  "Some compiler optimizations, particularly for vector processors, are able to perform this transformation automatically when arrays of structures are created in the program."
                  http://en.wikipedia.org/wiki/Parallel_array

                  Originally posted by TobiSGD View Post
                  EPIC FAIL.
                  Have you realized that this is at most 1/4 of the speed and not 100%?
                  The question about multi-core scaling is not answered.
                  Also, 4 MB cache vs. 1 MB cache.

                  Originally posted by TobiSGD View Post
                  You should first learn what information is. The only information you gave was an article about someone translating MMX to a different CPU with an MMX-like SIMD unit and an article that proves your claims only to 33%. You wildly guessing and estimating things is not information, it is bullshit.
                  By your numbers I talk 33% information and 66% bullshit, and you talk 100% bullshit, because no information comes from you.

                  Comment


                  • Originally posted by ldesnogu View Post
                    I work in a CPU design team, so I know for sure that if their RTL env is properly set up the results are correct, no matter whether it's a real chip or not. And their benchmarks are small enough that going to a 4 MB cache size wouldn't change the result.
                    You've realized that there is at maximum 1/4 (25%) the speed and not 100%?
                    Even if the RTL gives us a perfect result.
                    The question about the multi-core scaling is not answered.

                    Originally posted by ldesnogu View Post
                    Anyway, that doesn't change anything: you claim this has good x86 emulation speed, but following your argument, you can't say so, because this was not run on real hardware.
                    If an RTL env could give you all the answers, then benchmark websites like Phoronix would not exist at all.
                    The result of the RTL env is 25% and not 100%; the multi-core scaling is not tested, and the cache size impact is also not tested.

                    Comment


                    • You definitely have no clue. Ask yourself what simulation speed is.
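A short Python sketch of the distinction being pointed at: the speed at which a simulator or FPGA prototype runs is not the speed of the design it models. The function names and the one-billion-cycle benchmark are hypothetical; the 1 GHz core clock, the 200,000 cycles per second accelerator rate, and the 50 MHz FPGA rate are the figures from the paper excerpt quoted earlier in the thread. It assumes a cycle-accurate model, so the cycle count for a benchmark is the same on every platform.

```python
def modeled_seconds(cycles: int, target_hz: float) -> float:
    """Run time the modelled chip would need at its target clock."""
    return cycles / target_hz


def wall_clock_seconds(cycles: int, platform_cycles_per_second: float) -> float:
    """Time the simulation platform itself needs to step through the cycles."""
    return cycles / platform_cycles_per_second


if __name__ == "__main__":
    cycles = 1_000_000_000  # hypothetical benchmark length in core cycles

    print("modelled at 1 GHz:    ", modeled_seconds(cycles, 1e9), "s")
    print("RTL accelerator wait: ", wall_clock_seconds(cycles, 200_000), "s")
    print("50 MHz FPGA wait:     ", wall_clock_seconds(cycles, 50e6), "s")
```

Under that assumption only the waiting time changes between platforms; the cycle count, and with it any result normalised to native execution, stays the same.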

                      Comment
