
**ldesnogu** · 09 July 2012, 10:45 AM

Originally posted by maldorordiscord:

Your reading skill is questionable, because there is no comparison between native speed and x86-format speed. There is only a comparison between fully emulated x86-format speed and accelerated x86-format speed. The only logical conclusion to draw from this concerns the speed-up factor, not the absolute speed.

You seem to have reading skill issues too. The Figure 6 caption states:

Experimental results for the nine benchmarks. 100 percent performance represents the execution of native MIPS code. Columns represent the execution of x86 emulation with and without hardware support. Higher is better.

Highlighted just so you can reread it and hopefully understand.
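As a rough illustration of how a chart normalized this way is usually read, here is a minimal Python sketch. The run times below are invented placeholders, not figures from the paper, and `percent_of_native` is a hypothetical helper. Each bar is native run time divided by x86 run time on the same platform, and dividing one bar by the other gives the speed-up from hardware support.

```python
def percent_of_native(native_seconds: float, x86_seconds: float) -> float:
    """Performance relative to native MIPS; 100% means native speed (higher is better)."""
    return 100.0 * native_seconds / x86_seconds

# Hypothetical run times for one benchmark (NOT data from the paper).
native = 10.0         # native MIPS build
software_only = 50.0  # x86 build, emulation without hardware support
accelerated = 12.5    # x86 build, emulation with hardware support

sw = percent_of_native(native, software_only)  # bar without hardware support
hw = percent_of_native(native, accelerated)    # bar with hardware support

print(f"without hw support: {sw:.1f}% of native")
print(f"with hw support:    {hw:.1f}% of native")
print(f"speed-up from hw support: {hw / sw:.1f}x")
```

Both numbers are relative to native code on the same platform, so neither one is an absolute wall-clock figure for the final chip.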

**TobiSGD** · 09 July 2012, 11:12 AM

Originally posted by maldorordiscord:

My effort to simplify it for you doesn't mean that playing dumb gets you anywhere.
I wrote: the Loongson needs to translate the format from the x86 one into the Loongson one.
So now you confirm what I said, of course, only to seemingly contradict me.
Any reasonably intelligent reader will notice this rotten trick.

There is no rotten trick. It is as simple as that, and even a self-proclaimed super-genius like you should grasp it: if you have to translate, it is not native. That isn't hard, is it?

You lack any focus, because an x86 core cannot handle CISC code natively either, which turns your argument into bullshit. A native x86 core also accelerates CISC code after translating it, via microcode, into the internal logic of the CPU architecture, for example something VLIW-like (all modern CPUs are internally RISC or VLIW).

Not long ago that was exactly your argument for why these CPUs can't run Windows, despite the fact that they can, and the developers intend them to. That is one thing you didn't guess would show up in the documentation, I think.
So basically you are saying: when it comes to Windows, it is not native, because it has to be emulated/translated, but when it comes to SSE, it is native, although it has to be emulated/translated.
Oh, wait, I am the one lacking the focus?

I cannot help your limited intelligence recognize the truth.
But one thing is for sure: you can put two 64-bit vector data areas into one 128-bit vector data area.
And you can put four 64-bit vector data areas into one 256-bit vector data area.
And you can put eight 64-bit vector data areas into one 512-bit vector data area.
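The packing arithmetic above comes down to dividing the register width by the lane width. A minimal Python sketch, with `lanes` as a hypothetical helper name. Note that it only counts how many 64-bit elements fit in one register; it says nothing about how many such operations a particular core can issue per cycle or how translation affects that.

```python
def lanes(register_bits: int, lane_bits: int = 64) -> int:
    """Number of lane_bits-wide elements that fit in one register_bits-wide vector."""
    if register_bits % lane_bits:
        raise ValueError("register width must be a multiple of the lane width")
    return register_bits // lane_bits

for width in (128, 256, 512):
    # e.g. a 128-bit register holds two 64-bit lanes
    print(f"{width}-bit vector: {lanes(width)} x 64-bit lanes")
```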

So the only thing you have done here is show that you can handle basic math. My question was and still is: do you have any proof that the Loongson 3 can handle 8 SSE instructions at a time? That there is nothing that might limit the CPU to only 4 instructions at a time (by the way, "instructions" is not really the right word here; it should be more like data packets or something), maybe because of translation issues or something?
Show me a clear statement that it can handle 8 SSE data packets at a time.

Your reading skill is questionable, because there is no comparison between native speed and x86-format speed. There is only a comparison between fully emulated x86-format speed and accelerated x86-format speed. The only logical conclusion to draw from this concerns the speed-up factor, not the absolute speed.

EPIC FAIL.

And about the "technical forum": I have contributed 1000 times more technical information on this topic to the forum than you, because you have not contributed any information at all.
So please improve your personal performance and be a useful member.

You should first learn what information is. The only information you gave was an article about someone translating MMX to a different CPU with an MMX-like SIMD unit, and an article that backs up only 33% of your claims. Your wild guessing and estimating is not information; it is bullshit.

**maldorordiscord** · 09 July 2012, 01:18 PM

Originally posted by ldesnogu:

You seem to have reading skill issues too: figure 6 caption states:

Highlighted just so you can reread it and hopefully understand.

This is an analysis error, because they don't benchmark real hardware; they only test one core in a hardware emulator. This means the 100% is a fictional number for the emulated hardware running native MIPS code.
This means the article does not give us the REAL performance figures of a Loongson 3A CPU.
The only logical conclusion we can draw from the data we have concerns the speed-up factor, not the absolute speed.

**ldesnogu** · 09 July 2012, 01:45 PM

I'm speechless...

**maldorordiscord** · 09 July 2012, 02:35 PM

Originally posted by ldesnogu:

I'm speechless...

Just in case you don't get the point:

"Before the chip returned from fabrication, we carried out the Godson-3 performance analysis on two platforms: a register-transfer-level (RTL) simulation platform and a field-programmable gate array (FPGA) prototyping platform. In the RTL simulation environment, we set the core clock frequency to 1 GHz, the DDR2/DDR3 clock frequency to 333 MHz, and the HyperTransport clock frequency to 800 MHz. To speed up the simulation, we used Cadence's Xtreme-313 simulation accelerator, which can achieve a speed of 200,000 to 400,000 cycles per second. Because of the difficulty of building a full-scale Godson-3 FPGA prototype system, we built a partial-scale prototype to evaluate a single processor core's performance. The prototype system includes one processor core, a 1-Mbyte L2 cache, one DDR2/DDR3 controller, and one HyperTransport controller. FPGA prototyping speed is 50 MHz,"

Source: "Godson-3: A Scalable Multicore RISC Processor with x86 Emulation"
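For a sense of scale, here is a back-of-the-envelope Python sketch of the figures quoted above: how many wall-clock seconds each pre-silicon platform needs to execute one second's worth of cycles at the 1 GHz core clock set in the RTL simulation. Using 1 GHz as the reference for the 50 MHz FPGA prototype as well is an assumption made here for comparison only, and `slowdown` is a hypothetical helper name.

```python
TARGET_HZ = 1_000_000_000  # core clock set in the RTL simulation (1 GHz)

def slowdown(platform_cycles_per_second: float, target_hz: float = TARGET_HZ) -> float:
    """Wall-clock seconds needed to execute one second's worth of target cycles."""
    return target_hz / platform_cycles_per_second

platforms = {
    "RTL simulation, Xtreme-313 (low end)": 200_000,
    "RTL simulation, Xtreme-313 (high end)": 400_000,
    # Assumption: the 50 MHz prototype is compared against the same 1 GHz target.
    "FPGA prototype (50 MHz)": 50_000_000,
}

for name, rate in platforms.items():
    print(f"{name}: {slowdown(rate):,.0f} s of wall time per simulated second")
```

Runs on such platforms are expensive in wall-clock time, which is one reason pre-silicon benchmarks tend to be short.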

The real one has 4 cores and 4 MB of cache.
We are waiting for real code on real hardware.
We are also waiting for SSE benchmarks.

Anyway, I'm such a smartass, sorry for that.

**TobiSGD** · 09 July 2012, 02:47 PM

Originally posted by maldorordiscord:

This is an analysis error, because they don't benchmark real hardware; they only test one core in a hardware emulator. This means the 100% is a fictional number for the emulated hardware running native MIPS code.

Absolute nonsense. 100% in that graph represents native MIPS code running in the simulator; the bars represent x86 execution speed on the same benchmarks (one bar emulated, one bar emulated with acceleration), but compiled to x86 code instead of MIPS code. No fiction there, just a qualitative statement about expected performance on the real hardware.
Man, sometimes it is hard to tell if you are willfully ignorant, plain stupid or both.

The only logical conclusion we can draw from the data we have concerns the speed-up factor, not the absolute speed.

No one except you is talking about absolute speed. Everyone who is able to read the graph and the explanatory statements knows that this is not about absolute speed. That you aren't able to read them was successfully proven by, hm, who was it again, ah yes: you.

**ldesnogu** · 09 July 2012, 02:49 PM

I work in a CPU design team, so I know for sure that if their RTL environment is properly set up, the results are correct, whether or not it's a real chip. And their benchmarks are small enough that going to a 4 MB cache wouldn't change the result.

Anyway, that doesn't change anything: you claim this has good x86 emulation speed, but by your own argument you can't say so, because this was not run on real hardware.