New Benchmark Test Profiles This Weekend: GIMP, Memcached, JPEG Turbo, More OpenCL
-
Originally posted by defaultUser View Post
The folder gegl/perf contains some performance benchmarks. However, it is important to note that, due to the GEGL architecture, the performance of these tests is suboptimal, since every tile (GEGL decomposes the image into tiles) has to be allocated on the GPU and copied to it. In this case, what you are going to measure is basically the time it takes to allocate things on the GPU and copy data to it. In real use, for instance in a GIMP session, the preallocated buffers are preserved. However, I believe it is still necessary to copy to and from the GPU, which again interferes with the results.
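To make the per-tile overhead argument concrete, here is a minimal sketch of a toy cost model in Python. It is not GEGL code: the function name `tile_benchmark_cost` and all the timings (in microseconds) are hypothetical numbers chosen for illustration, not measurements.

```python
def tile_benchmark_cost(n_tiles, alloc_us, copy_us, kernel_us, reuse_buffers):
    """Toy cost model for tile-by-tile GPU processing (all inputs hypothetical).

    Each tile is copied to the GPU, processed, and copied back.
    Without buffer reuse, every tile also pays for a fresh allocation.
    """
    allocations = 1 if reuse_buffers else n_tiles
    alloc_total = allocations * alloc_us
    copy_total = n_tiles * 2 * copy_us   # one copy in, one copy out per tile
    kernel_total = n_tiles * kernel_us   # the actual GPU work
    total = alloc_total + copy_total + kernel_total
    return {
        "alloc": alloc_total,
        "copy": copy_total,
        "kernel": kernel_total,
        "total": total,
        # Share of the measured time that is not computation at all.
        "overhead_share": (alloc_total + copy_total) / total,
    }


if __name__ == "__main__":
    for reuse in (False, True):
        cost = tile_benchmark_cost(
            n_tiles=256, alloc_us=50, copy_us=20, kernel_us=10,
            reuse_buffers=reuse,
        )
        print(f"reuse_buffers={reuse}: total={cost['total']} us, "
              f"overhead share={cost['overhead_share']:.2f}")
```

With these made-up numbers, reusing the buffers removes almost all of the allocation cost, but the copies to and from the GPU stay the same and still outweigh the kernel time.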
Here's a blurb from AMD itself about the benefits of HSA....
" HSA creates an improved processor design that exposes the benefits and capabilities of mainstream programmable compute elements, working together seamlessly. With HSA, applications can create data structures in a single unified address space and can initiate work items on the hardware most appropriate for a given task. Sharing data between compute elements is as simple as sending a pointer. Multiple compute tasks can work on the same coherent memory regions, utilizing barriers and atomic memory operations as needed to maintain data synchronization (just as multi-core CPUs do today).
The HSA team at AMD analyzed the performance of Haar Face Detect, a commonly used multi-stage video analysis algorithm used to identify faces in a video stream. The team compared a CPU/GPU implementation in OpenCL™ against an HSA implementation. The HSA version seamlessly shares data between CPU and GPU, without memory copies or cache flushes because it assigns each part of the workload to the most appropriate processor with minimal dispatch overhead. The net result was a 2.3x relative performance gain at a 2.4x reduced power level*. This level of performance is not possible using only multicore CPU, only GPU, or even combined CPU and GPU with today’s driver model. Just as important, it is done using simple extensions to C++, not a totally different programming model. "
-
Suggestion for benchmarking: Flightgear. I am not really familiar with the cli-options, but it looks quite possible:
-
Originally posted by taxi_bs View Post
Suggestion for benchmarking: Flightgear. I am not really familiar with the cli-options, but it looks quite possible:
http://wiki.flightgear.org/Command_line_options
Michael Larabel
https://www.michaellarabel.com/
-
Originally posted by Jumbotron View Post
Pardon my potential ignorance, but wouldn't HSA (Heterogeneous System Architecture), as promoted by AMD and ARM, and HMM (Heterogeneous Memory Management), as promoted by Intel and Nvidia, take care of this issue? Or at least it would once the appropriate code optimizations appear in GIMP?
As far as I understand, HSA is "just" a better way to use integrated accelerators on an SoC, for instance integrated graphics or DSP chips. Since these things are inside the package, all of them can access and share the main memory. However, this memory is very slow compared to the memory used on discrete GPUs (actually, back in the days of chipsets with a north bridge, Nvidia IGPs also had this ability). For discrete GPUs without direct access to the main memory, there is no magic: you need to copy data back and forth. (Because of that, Nvidia and others are investing in things like NVLink, which removes the PCI Express bottleneck.) However, there are various tricks for doing that, such as streaming the data to the GPU and overlapping the communication with computation.
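As a rough illustration of why overlapping communication with computation helps, here is a small Python sketch of a two-stage pipeline model. It is a simplification under stated assumptions, not real GPU code: only the host-to-device copy is modelled, there are assumed to be enough staging buffers that the copy engine never stalls, and the function names and timings are hypothetical.

```python
def serial_time(n_chunks, copy_us, compute_us):
    """Each chunk is copied, then processed, with nothing overlapping."""
    return n_chunks * (copy_us + compute_us)


def overlapped_time(n_chunks, copy_us, compute_us):
    """Copies are streamed back to back while earlier chunks are processed.

    Assumption: the copy engine and the compute units work independently,
    and a chunk can be processed as soon as its copy has finished and the
    previous chunk's computation is done.
    """
    copy_done = 0      # time at which the latest copy finished
    compute_done = 0   # time at which the latest computation finished
    for _ in range(n_chunks):
        copy_done += copy_us
        compute_done = max(compute_done, copy_done) + compute_us
    return compute_done


if __name__ == "__main__":
    n, copy_us, compute_us = 8, 30, 20  # hypothetical timings in microseconds
    print("serial:    ", serial_time(n, copy_us, compute_us), "us")
    print("overlapped:", overlapped_time(n, copy_us, compute_us), "us")
```

In this model the overlapped total approaches the cost of the slower stage alone, so overlap hides the cheaper of the two costs but cannot make a transfer-bound workload faster than the link allows.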