Announcement

**Manuel Jose** · 04 March 2017, 04:50 PM

I think the low Himeno results are due to the AVX2 instructions.

In the benchmark history:

pts/himeno-1.2.0
- Use AVX2 by default if available

**Manuel Jose** · 04 March 2017, 04:55 PM

I think the low Himeno result is caused by the AVX2 instructions.

Himeno benchmark history:

pts/himeno-1.2.0
- Use AVX2 by default if available.

**peter_g** · 08 March 2017, 03:18 PM

I looked at the Stockfish benchmark for a friend, and the result was worse than for the Core i3 7100.
It is strange that an 8-core, 16-thread processor is worse than a 2-core, 4-thread processor in a test that is described as "This is a test of Stockfish, an advanced C++11 chess benchmark that can scale up to 128 CPU cores."
The Intel processor closest to the 1800X was the Core i7 5775C (4 cores, 8 threads), and it is even stranger that it is slower than the i3.

After some investigation and testing of the benchmark, the problem is obvious.
Stockfish is started in the phoronix-test-suite with no parameters, with the line
./stockfish bench > $LOG_FILE 2>&1
And a look at the code at https://github.com/official-stockfis.../benchmark.cpp shows an interesting line for the default values:
string threads = (is >> token) ? token : "1";
Now everything is explained: the test is only using a single thread, so it is not strange that a higher-clocked i3 has the highest performance.

The benchmark needs to be changed, since it is quite misleading when the documentation says it scales with cores.

According to the benchmark documentation, the test can be changed with
stockfish bench hash threads depth
with the defaults
stockfish bench 16 1 13
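
As a rough sketch of how a test script could pass the thread count instead of relying on the default of 1: the helper below builds the command line in the `hash threads depth` order quoted above. The function name and the use of `os.cpu_count()` are illustrative assumptions, not something taken from the phoronix-test-suite profile.

```python
import os

def stockfish_bench_command(hash_mb=16, threads=None, depth=13, binary="./stockfish"):
    """Build the argv for `stockfish bench hash threads depth`.

    The defaults mirror the ones quoted in the post (16 1 13), except that
    threads falls back to the number of logical CPUs when not given
    (an assumption made for this sketch).
    """
    if threads is None:
        threads = os.cpu_count() or 1
    return [binary, "bench", str(hash_mb), str(threads), str(depth)]

# Command for the current machine, using all logical CPUs
print(" ".join(stockfish_bench_command()))
# The command used for the 4-thread run in this post
print(" ".join(stockfish_bench_command(hash_mb=1024, threads=4, depth=20)))
```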

I did some tests on an AMD A10-7850K that I use as a server. The test needs a higher depth, because 13 is too low a value to be used in practice and the multithread scaling gets better with longer runs. The hash also has to be increased, since 16 MB gets filled in a longer test; I used 1024, and the usage was 15% for depth 20.

I did some tests with depth 20, posted below.
The result parser used for the benchmark is
<OutputTemplate>Total time (ms) : #_RESULT_#</OutputTemplate>
But it is better to use Nodes/second, since the number of searched nodes varies with the number of threads, and nodes per second is the value often used for chess engine CPU tests.
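
To illustrate reading Nodes/second rather than the total time, here is a small sketch that pulls all three summary fields out of the bench output. The sample text is copied from the 2-thread run in this post; the function name and regular expression are my own assumptions and not part of the test profile.

```python
import re

# Summary lines as printed by the 2-thread run in this post
SAMPLE = """\
===========================
Total time (ms) : 48686
Nodes searched : 133431896
Nodes/second : 2740662
"""

FIELD = re.compile(r"\s*(Total time \(ms\)|Nodes searched|Nodes/second)\s*:\s*(\d+)")

def parse_bench(text):
    """Return the summary fields of a `stockfish bench` run as integers."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

result = parse_bench(SAMPLE)
print(result["Nodes/second"])
```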

The difference is even larger when the latest version is used instead of a 2014 version.

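| Threads | 1/Runtime | Nodes/s |
|---------|-----------|---------|
| 1       | 1         | 1       |
| 2       | 1.7       | 1.8     |
| 4       | 2.1       | 2.9     |

Stockfish 8

| Threads | 1/Runtime | Nodes/s |
|---------|-----------|---------|
| 1       | 1         | 1       |
| 2       | 1.45      | 2.06    |
| 4       | 1.6       | 3.6     |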

./stockfish bench 1024 1 20
===========================
Total time (ms) : 83924
Nodes searched : 124423702
Nodes/second : 1482575

./stockfish bench 1024 2 20
===========================
Total time (ms) : 48686
Nodes searched : 133431896
Nodes/second : 2740662

./stockfish bench 1024 4 20

==========================
Total time (ms) : 39932
Nodes searched : 177124223
Nodes/second : 4435646
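
For anyone who wants to check the scaling, this short sketch computes the speedup relative to the single-thread run from the three outputs above, once from the total time and once from Nodes/second. The dictionary layout and function name are just illustrative.

```python
# threads -> (total time in ms, nodes per second), copied from the runs above
runs = {
    1: (83924, 1482575),
    2: (48686, 2740662),
    4: (39932, 4435646),
}

def speedups(runs):
    """Speedup versus the 1-thread run, by runtime and by nodes/second."""
    base_ms, base_nps = runs[1]
    return {
        threads: (round(base_ms / ms, 2), round(nps / base_nps, 2))
        for threads, (ms, nps) in runs.items()
    }

for threads, (by_time, by_nps) in speedups(runs).items():
    print(threads, by_time, by_nps)
```

The runtime ratio comes out lower than the Nodes/second ratio because the multithreaded runs search more nodes in total, which is why Nodes/second is the fairer metric here.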

I would recommend that the benchmark be changed to use the appropriate number of threads, and that the hash and depth be increased. It would also be a good idea to update to a later release, since the old values are no longer relevant and the multithread performance has been improved. The performance difference between 1 and 4 cores increases by 23% in my test and is likely even larger for bigger tests.

Running at depth 20 might give a longer runtime than you would like. But increase it from 13 to whatever gives a reasonable runtime.

**peter_g** · 08 March 2017, 05:22 PM

./stockfish bench 1024 4 20

**amoratil** · 09 March 2017, 07:44 AM

I may have a note on the Blender test:

The default BMW27 benchmark (https://blenderartists.org/forum/sho...rk-for-Dummies) has default values for GPU testing, not CPU:
In the Performance tab, Tiles section, the X and Y values are set to 512px. Those values represent the dimensions of the rectangles that get rendered, so if the default image has a resolution of 960x540px, breaking it into 512x512px boxes gives us 4 boxes of different sizes: 1 full box, 1 almost full box and 2 almost empty boxes. It will only use 4 cores to complete the task, and only 1 of them is going to be busy for the whole test; the other 3 are going to idle as soon as they finish their boxes, 2 of them really quickly.
So the Blender results shown in the review are useless.

If you want to test CPU performance, the tile X and Y dimensions MUST be set small enough (ideally so that they divide the image resolution evenly, but as long as the values are small enough, it doesn't affect the end result). When we test CPUs with Blender, we usually use 16x16, as it is small enough to make almost full use of all available cores.
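
The tile arithmetic can be checked with a few lines. This is only a sketch of the geometry (a 960x540 image cut into fixed-size tiles, with smaller tiles at the right and bottom edges); the function names are made up and it does not call Blender itself.

```python
import math

def tile_grid(width, height, tile_x, tile_y):
    """Number of tile columns, rows and total tiles for an image."""
    cols = math.ceil(width / tile_x)
    rows = math.ceil(height / tile_y)
    return cols, rows, cols * rows

def tile_sizes(width, height, tile_x, tile_y):
    """Actual pixel size of every tile; edge tiles are cut off at the image border."""
    sizes = []
    for y in range(0, height, tile_y):
        for x in range(0, width, tile_x):
            sizes.append((min(tile_x, width - x), min(tile_y, height - y)))
    return sizes

# 512x512 tiles: only 4 tiles, two of them just 28 pixels high
print(tile_grid(960, 540, 512, 512), tile_sizes(960, 540, 512, 512))
# 16x16 tiles: plenty of work units for a 16-thread CPU
print(tile_grid(960, 540, 16, 16))
```

With 512x512 tiles there are never more than 4 work units, so a 16-thread CPU leaves most of its threads idle; with 16x16 tiles there are 2040.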

When we test a GPU, it doesn't matter, as a GPU behaves as a single core to the rendering process.

Oh, and keep in mind: if you change the X and Y box dimensions, the results are not comparable to other tests with different box dimensions...

Hope this helps to make a more valuable review.
Cheers!

**msroadkill612** · 15 March 2017, 05:16 AM

I am a ~newb, but I note here that GPUs have a role in VM servers.

Perhaps FYI re the upcoming (soon; it was demonstrated at the Ryzen release) Vega GPU architecture?

As above, dunno, but 5b TB of virtual memory sure sounds as if it opens possibilities to coders.

Further preamble is that Ryzen is coinciding with another tectonic shift in IT ground rules: the difference between RAM and storage speeds has been significantly reduced by M.2 PCIe memory sticks, running on controllers with 4 full PCIe lanes of bandwidth. Storage speeds ain't what they used to be.

Perhaps most significant in this context are the astonishing improvements in IOPS with the top Samsung M.2 SSDs.

e.g.

Samsung 960 PRO 2TB M.2 NVMe SSD Full Review - Even Faster! - PC Perspective

https://www.pcper.com/reviews/Storage/Samsung-960-PRO-2TB-M2-NVMe-SSD-Full-Review-Even-Faster#comment-318202


SO, my main point is:

pcgamesn.com
re amd-vega-gpu-specifications

"it's the High Bandwidth Cache and High Bandwidth Controller silicon which looks the most exciting and that's all related to moving outside of the limits of the graphics card's video memory. In normal GPUs developers have to fit all the data they need to render into the frame buffer, meaning all the polygons, shaders and textures have to squeeze into your card's VRAM.

That can be restrictive and devs have to find clever workarounds for large, open-world games. The revolution with AMD's Vega design is to break free of those limits. The High Bandwidth Cache and High Bandwidth Controller mean the GPU can now stream in rendering data from your PC's system memory, or even an SSD, meaning it doesn't have to come via the card's frame buffer."

PS, just a heads-up I have not seen mentioned here: builders should note that Ryzen mobos are a bit light on PCIe lanes vs. top Intels, 32 lanes vs. 40.

**oooverclocker** · 09 June 2017, 08:49 AM

I updated my BIOS to AGESA 1.0.0.6 today and was able to overclock my dual-rank ECC memory to 3333 MHz.

Code:

user@PC:~$ sudo dmidecode | grep MHz
    External Clock: 100 MHz
    Max Speed: 3900 MHz
    Current Speed: 3800 MHz
    Speed: 3334 MHz
    Configured Clock Speed: 1667 MHz
    Speed: 3334 MHz
    Configured Clock Speed: 1667 MHz

BTW: 3800 MHz is just my CPU frequency for those who are confused.

It feels like the performance has increased significantly. Games also seem to run extremely smoothly - even smoother than before. I tested with CS:GO, which was nearly stuck at around 230 FPS the whole time. I would really like to see new gaming tests with these memory clocks.


AMD Ryzen 7 1800X Linux Benchmarks
