AMD To Open-Source Its Linux Execution & Compilation Stack


  • #11
    Looking at slide 29 of the HSA presentation, it looks like the Khronos APIs (OpenGL, EGL, OpenVG, OpenGL ES, OpenCL, etc.) will be implemented in terms of HSA. If that's true, I would take that to mean that a programmer might also be able to program directly with the HSA "language" as well... Or will it be a nice protocol similar to OpenGL et al. The mind runs wild.
    Well the thing to keep in mind is that all 'code' is, well 'code'.

    C code, python code, machine code... all these things are software code that can be programmed and modified.

    With x86 you have what is called the ISA (instruction set architecture).

    The ISA can be seen as the API for your CPU. You may have heard of the CISC versus RISC debate and which is better or whatever. x86 is a CISC instruction set, but all modern x86 processors are in fact RISC processors internally. Even your x86-64 CPUs are built around RISC-style cores.

    How it works is that Intel and AMD have a hardware-based abstraction layer that takes the CISC instructions and translates them into RISC-like micro-operations, which are what actually get executed. The so-called 'x86 tax' in the ARM versus x86 wars refers to the complex set of transistors that perform this translation. This is just about the only reason the Atom processor hasn't displaced ARM on phones yet.

    Well, this ISA is very important. This is why code compiled for ancient 486 processors can be made to still work today.

    But we have a problem......

    Modern processors, thanks to Moore's Law, are ever increasing in complexity. Every 18 months or so the transistor count can double while the price stays roughly the same.

    Unfortunately there is a limit on how useful it is to increase the complexity of a single processor core to make it faster. We saw this with the Pentium 4 NetBurst architecture. It used very long pipelines to chase higher clock speeds, but it ended up being beaten by AMD, who took a different approach.

    So now instead of making the processor more complex we just add more processor cores. 2 cores. 4 cores. 8 cores. 16 cores. That means we can do more and more work on a single machine.

    But we still have a problem....

    There is a limit on how useful this approach is. On servers people have taken advantage of it through virtualization and the like, but on desktops and laptops a 32-core system isn't really going to be much faster than a 4- or 6-core system running the same type of cores at the same clock speeds. There just isn't a whole lot going on except for the main thing the person is paying attention to. And threaded programming isn't the panacea that most people seem to think it is.
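
    To put a rough number on that intuition, Amdahl's law says that if only a fraction p of a program's work can run in parallel, N cores give a speedup of 1 / ((1 - p) + p/N). A minimal sketch; the 80% parallel fraction below is just an illustrative assumption, not a figure from AMD:

    #include <cstdio>

    // Amdahl's law: speedup = 1 / ((1 - p) + p / n),
    // where p is the parallel fraction of the work and n is the core count.
    static double amdahl_speedup(double p, int cores) {
        return 1.0 / ((1.0 - p) + p / cores);
    }

    int main() {
        const double p = 0.80;  // assume 80% of the workload parallelizes (illustrative)
        const int core_counts[] = {4, 6, 32};
        for (int cores : core_counts) {
            std::printf("%2d cores -> %.2fx speedup\n", cores, amdahl_speedup(p, cores));
        }
        // With p = 0.80: 4 cores ~2.5x, 6 cores ~3.0x, 32 cores ~4.4x --
        // going from 6 to 32 cores buys surprisingly little for a typical desktop workload.
        return 0;
    }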


    What is the solution to making things go faster/more efficient then?

    Well, you use different types of cores for different things. Different processors are better at different things, so if you have multiple processors of different types you can go faster and save power.

    This is why closed source drivers and non-disclosing GPU designs are such shit when it comes to general computing. The GPU is ceasing to be a separate entity. It's not something you just add on to make games go faster or to make video smooth.

    The GPU is one of those 'different cores' that you can integrate into your processor to make everything go faster.

    But for the GPU to be used, the software needs to be able to target it.

    I think that is a big part of what AMD's HSA is about.

    Comment


    • #12
      Originally posted by rrohbeck
      Now if they could explain what they plan in a way that a simple computer scientist like myself understands what they're going to do...
      You use their library (sorts and other basic operations), and those then run on the CPU, or on the GPU if one is available.
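
      AMD's Bolt library is the obvious example of that model. A minimal sketch of what the caller writes; the header path and namespace are written from memory, so treat them as assumptions rather than a checked API reference:

      #include <bolt/cl/sort.h>   // AMD Bolt header -- path assumed, verify against the Bolt docs
      #include <vector>
      #include <cstdlib>

      int main() {
          std::vector<int> data(1 << 20);
          for (int &v : data) v = std::rand();

          // The library decides where this runs: on an OpenCL/HSA-capable GPU if one
          // is present, otherwise it falls back to a CPU path. The caller just
          // writes the one call below.
          bolt::cl::sort(data.begin(), data.end());
          return 0;
      }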

      Comment


      • #13
        Strip away all the ingrained bullshit PR and it's fun to finally see the "co-processor" make a comeback into the popular marketplace mindset. Make no mistake about it, that's exactly what they mean by Heterogeneous System Architecture (HSA): how you might collect up all the separate APIs, rebadge them, and call it something new.

        Personally I've always preferred May's Law, which states "Software efficiency halves every 18 months, compensating Moore's Law".

        http://en.wikipedia.org/wiki/David_M...ter_scientist) is the original inventor and lead architect of the so-called Heterogeneous System Architecture with his http://en.wikipedia.org/wiki/Transputer chips, and is now founder and Chief Technology Officer of XMOS Semiconductor.

        Last edited by popper; 20 June 2012, 02:26 PM.

        Comment


        • #14
          You are probably right. In the end, what AMD wants is to remove the FPU from the CPU and use the highly parallel GPU for that. This means they will have to have GPU<->CPU shared memory pointers, context-switch awareness, and all the fancy stuff mentioned in the slides.

          Originally posted by popper View Post
          Strip away all the ingrained bullshit PR and it's fun to finally see the "co-processor" make a comeback into the popular marketplace mindset. Make no mistake about it, that's exactly what they mean by Heterogeneous System Architecture (HSA): how you might collect up all the separate APIs, rebadge them, and call it something new.

          Personally I've always preferred May's Law, which states "Software efficiency halves every 18 months, compensating Moore's Law".

          http://en.wikipedia.org/wiki/David_M...ter_scientist) is the original inventor and lead architect of the so-called Heterogeneous System Architecture with his http://en.wikipedia.org/wiki/Transputer chips, and is now founder and Chief Technology Officer of XMOS Semiconductor.

          Comment


          • #15
            So you are saying that Apple doesn't use the best of the best?

            Originally posted by ldesnogu View Post
            They probably used what is considered in the industry to be the best C++ front end: Edison Design Group's. It's used by almost all the large companies that produce their own compilers (TI and Intel, to name a few).

            Comment


            • #16
              HSA is AMD's vision of the future of integrating the CPU and GPU, so that apps can use the GPU seamlessly for compute.

              It includes both hardware and software elements.

              On the software side, they are creating a generic IL (intermediate language) that is not vendor-specific, and allowing drivers to compile that down to the hardware. Actually, it sounds kind of like Gallium/TGSI. So languages would get compiled down to HSAIL, and then drivers would take that and run it on the GPU.
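
              To make that two-stage flow concrete, here is a rough sketch from an application's point of view. Every function here is hypothetical, invented purely to show where each compilation step sits; it is not the real HSA runtime API:

              // Hypothetical sketch of the source -> HSAIL -> device ISA pipeline.
              #include <string>
              #include <cstdio>

              // Offline / build time: a language front end (C++, OpenCL, etc.) lowers
              // kernels to the vendor-neutral HSAIL, much like Gallium's TGSI.
              static std::string compile_to_hsail(const std::string &kernel_source) {
                  return "hsail:" + kernel_source;  // placeholder
              }

              // Run time: the device driver's "finalizer" lowers HSAIL to the ISA of
              // whatever GPU (or CPU fallback) is actually present in the machine.
              static std::string finalize_for_device(const std::string &hsail, const std::string &device) {
                  return device + "-isa:" + hsail;  // placeholder
              }

              int main() {
                  std::string hsail = compile_to_hsail("vector_add");
                  std::string isa = finalize_for_device(hsail, "gcn");
                  std::printf("dispatching %s\n", isa.c_str());  // stand-in for the actual GPU dispatch
                  return 0;
              }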

              AMD is addressing this via HSA. HSA addresses these fundamental points by introducing an intermediate layer (HSAIL) that insulates software stacks from the individual ISAs. This is a fundamental enabler to the convergence of SW stacks on top of HC.

              Unless the install base is large enough, the investment to port *all* standard languages across to an ISA is forbiddingly large. Individual companies like AMD are motivated but can only target a few languages at a time. And the software community is not motivated if the install base is fragmented. HSA breaks this deadlock by providing a "virtual ISA" in the form of HSAIL that unifies the view of HW platforms for SW developers. It is important to note that this is not just about functionality but preserves performance sufficiently to make the SW stack truly portable across HSA platforms
              Hardware is supposed to be out in 2014, with certain elements done earlier:
              Existing APIs for GPGPU are not the easiest to use and have not had widespread adoption by mainstream programmers. In HSA we have taken a look at all the issues in programming GPUs that have hindered mainstream adoption of heterogeneous compute and changed the hardware architecture to address those. In fact the goal of HSA is to make the GPU in the APU a first class programmable processor as easy to program as today's CPUs. In particular, HSA incorporates critical hardware features which accomplish the following:

              1. GPU Compute C++ support: This gives heterogeneous compute access to a lot of the programming constructs that only CPU programmers can access today

              2. HSA Memory Management Unit: This allows all system memory to be accessible by either the CPU or the GPU, depending on need. In today's world, only a subset of system memory can be used by the GPU.

              3. Unified Address Space for CPU and GPU: The unified address space provides ease of programming for developers to create applications. By not requiring separate memory pointers for CPU and GPU, libraries can simplify their interfaces

              4. GPU uses pageable system memory via CPU pointers: This is the first time the GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference the data directly in the CPU domain. In all prior generations, data had to be copied between the two spaces or page-locked prior to use

              5. Fully coherent memory between CPU & GPU: This allows for data to be cached in the CPU or the GPU, and referenced by either. In all previous generations GPU caches had to be flushed at command buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU share a high speed coherent bus

              6. GPU compute context switch and GPU graphics pre-emption: GPU tasks can be context switched, making the GPU in the APU a multi-tasker. Context switching means faster application, graphics and compute interoperation. Users get a snappier, more interactive experience. As UIs are becoming increasingly touch-focused, it is critical for applications trying to respond to touch input to get access to the GPU with the lowest latency possible, to give users immediate feedback on their interactions. With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct access to the hardware for multiple users or multiple applications is either prioritized or equalized

              As a result, HSA is a purpose designed architecture to enable the software ecosystem to combine and exploit the complementary capabilities of CPUs (sequential programming) and GPUs (parallel processing) to deliver new capabilities to users that go beyond the traditional usage scenarios. It may be the first time a processor company has made such significant investment primarily to improve ease of programming!

              In addition, on an HSA architecture the application codes to the hardware, which enables user-mode queueing, hardware scheduling, much lower dispatch times and reduced memory operations. We eliminate memory copies, reduce dispatch overhead, eliminate unnecessary driver code, eliminate cache flushes, and enable the GPU to be applied to new workloads. We have done extensive analysis on several workloads and have obtained significant performance-per-joule savings for workloads such as face detection, image stabilization, gesture recognition, etc.

              Finally, AMD has stated from the beginning that our intention is to make HSA an open standard, and we have been working with several industry partners who share our vision for the industry and share our commitment to making this easy form of heterogeneous computing become prevalent in the industry. While I can't get into specifics at this time, expect to hear more about this in a few weeks at the AMD Fusion Developer Summit (AFDS).

              So you see why HSA is different and why we are excited
              Quotes are from AMD via Anandtech article: http://www.anandtech.com/show/5847/a...ds-manju-hegde
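
              Points 3-5 above are the ones that change how code actually looks: today's GPU APIs force data to be staged through a separate device buffer (or page-locked), while under HSA the GPU can chase the same pageable CPU pointers directly. A hypothetical before/after sketch; the gpu_* functions are stand-ins implemented on the CPU so the sketch compiles, not a real OpenCL or HSA API:

              #include <vector>
              #include <cstddef>
              #include <cstring>

              // Pre-HSA model: the runtime stages data through a separate device allocation,
              // so every launch pays for a copy in and a copy out (or a page-lock up front).
              static void gpu_scale_copyin(const float *src, float *dst, std::size_t n, float k) {
                  std::vector<float> device_buf(src, src + n);             // stand-in for the host->device copy
                  for (std::size_t i = 0; i < n; ++i) device_buf[i] *= k;  // stand-in for the kernel
                  std::memcpy(dst, device_buf.data(), n * sizeof(float));  // stand-in for the device->host copy
              }

              // HSA model: the GPU shares the CPU's pageable virtual address space and
              // coherent caches, so the "kernel" just dereferences the caller's pointers --
              // no staging copies and no cache flushes at the command-buffer boundary.
              static void gpu_scale_shared(const float *src, float *dst, std::size_t n, float k) {
                  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] * k;
              }

              int main() {
                  std::vector<float> in(1024, 1.0f), out(1024);
                  gpu_scale_copyin(in.data(), out.data(), in.size(), 2.0f);  // old path
                  gpu_scale_shared(in.data(), out.data(), in.size(), 2.0f);  // HSA-style zero-copy path
                  return 0;
              }
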
              Last edited by smitty3268; 20 June 2012, 02:50 PM.

              Comment


              • #17
                Certainly powerful stuff and the future. But I would rather see a proper spec for the new AMD GPUs any day of the week.

                Comment


                • #18
                  Originally posted by jvillain View Post
                  Certainly powerful stuff and the future. But I would rather see a proper spec for the new AMD GPUs any day of the week.
                  [Slightly offtopic]

                  Good point.

                  Are the documents found at http://www.x.org/docs/AMD/ complete, in terms of released docs?
                  If so - that means there's no specific documentation available for Evergreen (HD 5000) and newer. I wasn't aware of that. :/
                  Otherwise, can someone in charge please dump the newer stuff in there?
                  Is there another location for the docs that I'm overlooking?
                  Last edited by entropy; 20 June 2012, 04:24 PM.

                  Comment


                  • #19


                    Originally posted by entropy View Post
                    [Slightly offtopic]

                    Good point.

                    Are the documents found at http://www.x.org/docs/AMD/ complete, in terms of released docs?
                    If so - that means there's no specific documentation available for Evergreen (HD 5000) and newer. I wasn't aware of that. :/
                    Otherwise, can someone in charge please dump the newer stuff in there?
                    Is there another location for the docs that I'm overlooking?

                    Comment


                    • #20
                      Thanks!

                      Still, information seems a bit scattered all over the place.

                      Comment
