Announcement

**Linuxhippy** · 13 January 2016, 06:49 PM

From a technological point of view this is really impressive: finally, programming accelerator cores no longer requires thinking in two completely different worlds.

**boxie** · 13 January 2016, 08:47 PM

Quick question for those in the know - How does fallback work?

If you have an HSA-compliant CPU and system, it obviously works.
What happens if you run the same binary on a system that does not have the prerequisites? Does it refuse to run or crash, or does it fall back gracefully to just using the CPU cores?

**bridgman** · 13 January 2016, 10:30 PM

Originally posted by boxie

Quick question for those in the know - How does fallback work?

From a spec perspective, if you have the HSA stack but not an HSA compliant accelerator, then it falls back gracefully to executing HSAIL on the CPU cores.

From a practical perspective, most apps (or rather the toolkits they ran on) had runtime switches for presence/absence of specific HW already so it's easier to say "if we don't have an HSA stack and compliant HW then do things the way you did before".
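The two perspectives above can be written down as a small decision function. This is only a sketch: `SystemInfo`, `Backend`, and both function names are hypothetical and come from neither the HSA specification nor any toolkit. What the spec-side function returns when no HSA stack is installed is an assumption made for illustration.

```cpp
// Hypothetical description of what a toolkit found at startup.
// Real code would fill this in by probing its runtime; none of these
// names come from the HSA specification.
struct SystemInfo {
    bool has_hsa_stack;
    bool has_compliant_accelerator;
};

enum class Backend { Accelerator, HsailOnCpu, LegacyPath };

// Spec perspective: with the HSA stack present, a missing accelerator
// means the HSAIL runs on the CPU cores instead.
// Assumption: without the stack, this sketch reports the legacy path.
Backend choose_backend_spec(const SystemInfo& sys) {
    if (!sys.has_hsa_stack) return Backend::LegacyPath;
    return sys.has_compliant_accelerator ? Backend::Accelerator
                                         : Backend::HsailOnCpu;
}

// Practical perspective: unless both the stack and compliant hardware
// are present, keep doing things the way the toolkit did before.
Backend choose_backend_practical(const SystemInfo& sys) {
    if (sys.has_hsa_stack && sys.has_compliant_accelerator)
        return Backend::Accelerator;
    return Backend::LegacyPath;
}
```

The only case where the two functions disagree is "stack present, no compliant accelerator", which is exactly the case the question was about.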

**Rakot** · 13 January 2016, 11:18 PM

Originally posted by bridgman

From a spec perspective, if you have the HSA stack but not an HSA compliant accelerator, then it falls back gracefully to executing HSAIL on the CPU cores.

From a practical perspective, most apps (or rather the toolkits they ran on) had runtime switches for presence/absence of specific HW already so it's easier to say "if we don't have an HSA stack and compliant HW then do things the way you did before".

Hello John,

From a developer's point of view, does this mean that I can write programs using OpenMP and offload some calculations to the GPU? What programming languages are supported? Does it support Fortran, which is something of a standard in the scientific community?

**starshipeleven** · 14 January 2016, 07:02 AM

AFAIK, HSA is mostly handled by the compiler, which detects the parallelizable parts of your source and generates HSA bytecode for them (in some languages you need to flag them for it, or something like that). It also deals automatically with the fact that you now have "stuff run on CPU" and "stuff run on GPU" by adding the code that lets them call and communicate with each other.

So if the compiler is HSA-aware, it should work with any language and be relatively painless (compared to the current ways of getting GPU acceleration).

**Meteorhead** · 14 January 2016, 07:45 AM

The API support for the HSA back-end is indeed a good question. I am quite sure that it is not automatic, as no compiler in the world has managed to pull that off just yet (automatic offload to IGP/dGPU).

Most likely all of the offload-capable front-ends (OpenMP 4.0, OpenACC, CUDA, etc.) can make use of HSA-enabled HW, if it is available and the compiler has been built accordingly. I would like to know whether all supported front-ends can indeed benefit from the HSA back-end, or whether it only supports the OpenMP 4.0 API. As for the Fortran question: yes, it does support Fortran, as far as OpenMP 4.0 goes with the Fortran bindings.
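For reference, this is roughly what marking a loop for offload looks like with OpenMP 4.0 directives in C++. It is a minimal sketch: the function name `saxpy` and its signature are chosen for illustration. Whether the region actually runs on an HSA device depends on how the compiler was built and on the hardware. A compiler without OpenMP enabled ignores the pragma and runs the loop on the host.

```cpp
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i], with the loop marked for offload.
// The map clauses say which arrays go to the device and which come back.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();

    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The same `target` construct exists in the Fortran bindings as `!$omp target`, which is what the Fortran answer above relies on.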

ps.: I believe the significance of Fortran is decreasing, as more and more of that codebase is ported over to C++. Fortran has some features that are still missing from C++ (or are at least only provided as an external lib), but in the end it is all too painful to use and maintain. I am remotely related to CERN SW development, and the pain of Fortran is very apparent, and so is the need to migrate away from it. Such work in GCC supporting Fortran is something like filling a Skoda 120 with 100-octane fuel. It will be faster, but you know it doesn't feel right.

**starshipeleven** · 14 January 2016, 03:02 PM

Originally posted by Meteorhead

The API support for the HSA back-end is indeed a good question. I am quite sure that it is not automatic, as no compiler in the world has managed to pull that off just yet (automatic offload to IGP/dGPU).

I read this (the official documentation; I only read the simpler parts). If you can understand it better than me (which is likely), let me know what you make of it.

http://www.hsafoundation.com/html/HSA_Library.htm

The relevant part I'm talking about is here

http://www.hsafoundation.com/html/HSA_Library.htm#Runtime/Topics/01_Intro/overview.htm%3FTocPath%3DHSA%2520Runtime%2520Programmer%25E2%2580%2599s%2520Reference%2520Manual%2520Version%25201.0%2520|Chapter%25201.%2520Introduction|_____1

The way I understood it, the plan is:
- The HSA-aware compiler turns the parallel parts of a program into HSA bytecode at compile time (either by detecting them itself or by following flags left by the developer).
- The generic HSA bytecode is compiled into GPU/accelerator-specific code by a compiler from that specific GPU/accelerator's maker (I don't know whether this happens on the user's system or before the software ships).
- The linking between the part that runs on the CPU and the part that runs on the GPU seems to be done automatically by the compilers.

Besides, they also mention that Java can be accelerated by HSA, so I think it is a bit more revolutionary than just "AMD's version of CUDA"; I don't think it is just a fancy API.

HSA isn't just software, it is also hardware. The automatic offload is possible because the hardware shares an address space (or otherwise does things differently), so that both the GPU and the CPU can access the same RAM directly, rather than the GPU having to ask the CPU to work on RAM and move data back and forth (I've seen something that worked like that).
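The difference between the two memory models can be illustrated on the host alone. This is a toy sketch, not HSA code: a second `std::vector` stands in for separate device memory, and both function names are made up for the example.

```cpp
#include <cstddef>
#include <vector>

// Copy model: stage the data in a separate buffer (standing in for
// device memory), work on it there, then copy the result back.
// Returns the number of bytes moved between the two buffers.
std::size_t increment_with_staging(std::vector<int>& host) {
    std::vector<int> device(host);   // copy in
    for (int& v : device) v += 1;    // the "kernel"
    host = device;                   // copy out
    return 2 * host.size() * sizeof(int);
}

// Shared-address-space model: the "kernel" works on the same memory
// the CPU uses, so nothing has to be moved.
std::size_t increment_shared(std::vector<int>& host) {
    for (int& v : host) v += 1;
    return 0;
}
```

Both functions leave the same result in `host`; only the amount of data shuffled around differs.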

**Meteorhead** · 20 January 2016, 10:31 AM

You understand it correctly. The compiler turns the parallel parts of the code into HSAIL and embeds it into the executable. That is what gets shipped. The IL has been designed so that it can be compiled to binary rapidly. This is done when your .exe starts, at a very early stage of execution. Some omnipresent entity in your application (just like the C Runtime) takes care of detecting the available HSA finalizers on the system (very similar to ICD OpenCL runtimes) and tailors the HSAIL to your hardware. This happens so quickly that it is of no concern to you.
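The startup step described above (find the finalizers installed on the system, then turn the shipped IL into code for the hardware at hand) can be modelled with a small lookup table. Everything here is a hypothetical stand-in: the IL is a plain string, a "finalizer" is any function from string to string, and `FinalizerRegistry` is not part of the HSA runtime API. Falling back to a `"cpu"` entry is an assumption of this sketch.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy stand-in: a finalizer turns IL text into target-specific "code".
using Finalizer = std::function<std::string(const std::string&)>;

class FinalizerRegistry {
public:
    // Register the finalizer available for one target name.
    void add(const std::string& target, Finalizer f) {
        finalizers_[target] = std::move(f);
    }

    // Finalize for the preferred target if a finalizer is installed;
    // otherwise fall back to the "cpu" entry. Returns an empty string
    // when neither is available.
    std::string finalize(const std::string& il,
                         const std::string& preferred) const {
        auto it = finalizers_.find(preferred);
        if (it == finalizers_.end()) it = finalizers_.find("cpu");
        if (it == finalizers_.end()) return std::string();
        return it->second(il);
    }

private:
    std::map<std::string, Finalizer> finalizers_;
};
```

The point of the model is that the shipped binary carries only the IL; which finalizer runs is decided on the user's machine.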

The tricky part is the "parallel part" of the code. You have to mark such parts via some API. GPGPU requires not just any kind of parallelism, but data parallelism. It is one of the simplest kinds, but it is not easy to detect automatically in a way that is friendly to the GPU. The Portland Group strove for a long time to create automatically GPU-parallelizing compilers (before being bought by Nvidia), but even with this acquisition, what Nvidia managed to release was OpenACC, practically an OpenMP variant. You, as the developer, have to mark the parts of the code that you intend to execute in parallel, and you have to take care of being GPU-friendly on your own (ordering of loads, avoiding branching, explicitly using shared memory, syncing only in small execution groups (warps/wavefronts), etc.).
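To make "data parallelism" concrete, here is a small contrast between a loop whose iterations are independent and one whose iterations are not. The function names are invented for the example, and the OpenMP pragma is just one possible way of marking the loop. Without OpenMP enabled it is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <vector>

// Data-parallel: out[i] depends only on in[i], so every iteration can
// run independently and in any order.
std::vector<float> scale(const std::vector<float>& in, float k) {
    std::vector<float> out(in.size());
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = k * in[i];
    }
    return out;
}

// Not data-parallel as written: out[i] needs out[i - 1], so the
// iterations form a chain. Marking this loop parallel would be wrong;
// it has to be restructured first (for example as a parallel scan).
std::vector<float> running_sum(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += in[i];
        out[i] = acc;
    }
    return out;
}
```

A compiler can often prove the first loop safe on its own, but deciding how to restructure the second is the part that still falls to the developer.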

I am aware HSA is not only software, but HW also, and the common memory address space (and common hardware for memory) makes it fast, but HSA says nothing about how you obtain HSAIL. It provides no API for you to mark code as parallel, it does not define language constructs in any programming language that would enable you to decorate code naming all the different kinds of cache systems, buses and what not, things that HSA is actually aware of.

I hope I was clear.

**bridgman** · 20 January 2016, 10:55 AM

Originally posted by Meteorhead

The tricky part is the "parallel part" of the code. You have to mark such parts via some API. GPGPU requires not just any kind of parallelism, but data parallelism. ...

... but HSA says nothing about how you obtain HSAIL. It provides no API for you to mark code as parallel, it does not define language constructs in any programming language that would enable you to decorate code naming all the different kinds of cache systems, buses and what not, things that HSA is actually aware of.

Right. This is where HCC comes in. It includes support for Parallel STL extensions, which allow you to mark potentially-parallel sections of code.
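As a rough idea of what such marking looks like at the source level: the loop is written as a standard algorithm call, and the parallel overloads add an execution policy in front. The sketch below uses the plain sequential overload so that it builds anywhere. The function name `squares` is made up, and nothing here is specific to HCC.

```cpp
#include <algorithm>
#include <vector>

// Square every element, with the loop expressed as an algorithm call.
std::vector<int> squares(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    // Sequential overload. The C++17 parallel overload takes an
    // execution policy as an extra first argument, for example
    // std::execution::par from <execution>.
    std::transform(in.begin(), in.end(), out.begin(),
                   [](int v) { return v * v; });
    return out;
}
```

Because the per-element operation is a self-contained lambda, the library (or the compiler behind it) is free to decide where and how the iterations run once a policy allows it.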

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0069r0.pdf

AMD HSA Support Finally Appears Ready To Be Merged In GCC