Announcement

**edwaleni** · 28 May 2019, 01:57 PM

Originally posted by pegasus

Very interesting indeed.

Are these tests run at a fixed cpu frequency? If not, can the frequency be monitored and displayed with the results as well?

I was thinking the same thing.

It would help to chart CPU frequency and temperature/power during the scalar operations.
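For anyone who wants to log this alongside a run, here is a minimal sketch. It assumes a Linux x86 system whose `/proc/cpuinfo` reports `cpu MHz` lines; the function names `parse_cpu_mhz` and `sample_frequencies` are made up for illustration, and temperature/power are not covered.

```python
import re
import time


def parse_cpu_mhz(cpuinfo_text):
    """Return the per-core 'cpu MHz' values found in /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)\s*$"
    return [float(m) for m in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]


def sample_frequencies(samples=5, interval_s=0.5, path="/proc/cpuinfo"):
    """Collect (timestamp, min, mean, max) MHz rows; assumes Linux x86."""
    rows = []
    for _ in range(samples):
        with open(path) as f:
            mhz = parse_cpu_mhz(f.read())
        if mhz:  # some platforms have no 'cpu MHz' lines at all
            rows.append((time.time(), min(mhz), sum(mhz) / len(mhz), max(mhz)))
        time.sleep(interval_s)
    return rows


if __name__ == "__main__":
    try:
        for ts, lo, mean, hi in sample_frequencies(samples=3, interval_s=0.2):
            print(f"{ts:.1f}  min={lo:.0f}  mean={mean:.0f}  max={hi:.0f} MHz")
    except OSError:
        print("no /proc/cpuinfo here; this sketch assumes Linux")
```

Run in the background during a benchmark, the rows could be plotted next to the results to show whether the clock dropped under the wider vector workloads.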

**edwaleni** · 28 May 2019, 03:36 PM

Originally posted by pegasus

If you can't get your code ported to OpenMP/OpenCL/CUDA to eventually run on GPUs, take a look at NEC Aurora Tsubasa cards. They're super-wide vector engines built with NEC's decades of vector supercomputer know-how.

Correct me if I am wrong, but don't Tsubasa vector cards run their own OS, so you have to use their offload API to process your data?

This isn't much different from Intel's Knights Corner (Xeon Phi), where the card had its own OS and you had to use their API to get data into the vector engine.

Intel Movidius, I think, works the same way.

With OpenCL, don't you just declare or request a device ID? That makes it much more portable, I would think, since you could run your data set against any vector platform that supports it.

**celrod** · 28 May 2019, 06:33 PM

On Hacker News, someone named dragontamer just made a great post about SIMD and GPGPU, here.

I'd suggest reading all of it, but in particular I'd highlight the point that people too often think of SIMD as a way to calculate the answer to one function faster, rather than as a way to evaluate many at a time. I'd also highlight this:

My short summary is... the AMD Vega64 is effectively a 16384-wide SIMD unit. (or perhaps... 64 parallel CU clusters of 256-wide SIMD units).

In my applications, taking advantage of 8-wide SIMD units is easy. 256, on the other hand, is much too wide.
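One possible reading of "too wide" is lane utilization: if a problem only offers so many independent evaluations at a time, the extra lanes sit idle. A rough sketch of that arithmetic follows; the batch size of 24 is a made-up example, not a figure from my applications, and `lane_utilization` is a hypothetical helper.

```python
def lane_utilization(n_independent, simd_width):
    """Fraction of SIMD lanes doing useful work when n_independent
    evaluations are packed into vectors of simd_width lanes."""
    if n_independent <= 0 or simd_width <= 0:
        raise ValueError("both arguments must be positive")
    vectors_needed = -(-n_independent // simd_width)  # integer ceiling division
    return n_independent / (vectors_needed * simd_width)


if __name__ == "__main__":
    # The quoted Vega64 figure: 64 CUs of 256 lanes each.
    print(64 * 256)
    # Hypothetical batch of 24 independent evaluations.
    for width in (8, 256):
        print(width, f"{lane_utilization(24, width):.1%}")
```

With those assumed numbers, an 8-wide unit is fully occupied while a 256-wide one would be mostly idle, which is the kind of mismatch the quote seems to be pointing at.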

**willmore** · 01 June 2019, 11:36 AM

Could it be that a lot of this code uses SSE/AVX intrinsics that don't play well with the compiler throwing in 256-bit AVX and AVX-512 vectors, because the data isn't stored in a way that lets the wider vectors work ideally?
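To make the storage concern concrete, here is a small sketch of the arithmetic involved. It only checks whether a buffer address meets the natural alignment of each vector width and how many elements are left over for scalar cleanup. The address and element count are hypothetical, and the helper names are invented for illustration.

```python
def is_aligned(address, vector_bits):
    """True if address is a multiple of the vector size in bytes."""
    return address % (vector_bits // 8) == 0


def split_for_width(n_elems, elem_bytes, vector_bits):
    """Return (full_vectors, leftover_elems) for an array of n_elems."""
    lanes = (vector_bits // 8) // elem_bytes
    return n_elems // lanes, n_elems % lanes


if __name__ == "__main__":
    addr = 0x1010  # hypothetical buffer address, 16-byte aligned only
    for bits in (128, 256, 512):
        full, rest = split_for_width(100, 4, bits)  # 100 floats, assumed
        print(bits, is_aligned(addr, bits), full, rest)
```

In this made-up case the buffer suits 128-bit SSE exactly (aligned, no leftover), while the 256-bit and 512-bit widths are both misaligned and leave a 4-element remainder.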

Core i9 7980XE GCC 9 AVX Compiler Tuning Performance Benchmarks
