AMD Publishes Open-Source Linux HSA Kernel Driver


  • fteoOpty64
    replied
    Originally posted by seandarcy View Post
    So, once the HSA commits go into the kernel, x264 will use HSA through OpenCL?
    Yes, your x264 app would have to be OpenCL-aware in the first place. It is the OpenCL primitives that call into the HSA abstraction driver to perform the hardware/software function. As for dedicating specific HSA cores, that could be manual or automatic, either explicitly specified or defaulted. This is great progress for APU users on Linux.
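    To make "OpenCL aware" concrete, here is a rough sketch of how an app can ask the installed OpenCL runtime whether it exposes shared virtual memory, the capability the HSA stack is meant to back. This assumes OpenCL 2.0 headers; everything below is standard OpenCL, nothing x264-specific:

    /* Minimal sketch: query whether the runtime reports shared virtual
     * memory support. Standard OpenCL 2.0 calls only; error handling
     * trimmed for brevity. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_device_id device;
        cl_device_svm_capabilities svm = 0;

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES,
                        sizeof(svm), &svm, NULL);

        if (svm & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM)
            printf("fine-grained system SVM: CPU and GPU can share ordinary malloc'd memory\n");
        else if (svm & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER)
            printf("coarse-grained SVM buffers only\n");
        else
            printf("no SVM reported by this runtime\n");
        return 0;
    }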


  • Jammyamerica
    replied
    AMD APU generations always rock.


  • Jedibeeftrix
    replied
    3.17?


  • seandarcy
    replied
    Originally posted by bridgman View Post
    .... As examples, a C++ AMP app would need a rebuild while an OpenCL app probably would not.
    So, once the HSA commits go into the kernel, x264 will use HSA through OpenCL?


  • bridgman
    replied
    Not expecting benefits to the kernel directly -- the kernel driver exposes functionality to userspace that toolchains can call on to let apps more-or-less invisibly run faster. The app may or may not need to be rebuilt to take advantage of the new functionality, depending on the JIT-iness of the language & APIs already in use. As examples, a C++ AMP app would need a rebuild while an OpenCL app probably would not.
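    To illustrate the JIT-iness point: an OpenCL app typically ships its kernels as source strings and has whichever runtime is installed compile them when the app runs, so the application binary itself usually needs no rebuild. A minimal sketch, assuming a standard OpenCL setup and a made-up kernel, with error handling omitted:

    #include <CL/cl.h>

    /* Hypothetical kernel source, shipped inside the app as plain text. */
    static const char *src =
        "__kernel void scale(__global float *buf, float k) {\n"
        "    buf[get_global_id(0)] *= k;\n"
        "}\n";

    int main(void)
    {
        cl_platform_id plat;
        cl_device_id   dev;
        cl_int         err;

        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);

        /* The compile happens here, at run time, inside the vendor's
         * OpenCL runtime -- so a newer runtime/driver stack can retarget
         * the same source at an HSA back end without the app being
         * rebuilt. */
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
        clBuildProgram(prog, 1, &dev, "", NULL, NULL);
        cl_kernel kern = clCreateKernel(prog, "scale", &err);

        clReleaseKernel(kern);
        clReleaseProgram(prog);
        clReleaseContext(ctx);
        return 0;
    }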


  • seandarcy
    replied
    HSA requires application update??

    Will HSA provide benefits to the kernel itself? In other words, if I use existing applications, will I see any benefit? Or is this an API where applications must be modified to see _any_ benefit? I understand that if applications are modified they can take advantage of HSA. The question is whether kernel-level operations are helped even without application modification.


  • GreatEmerald
    replied
    So, will earlier AMD APU generations (Bobcat and whatnot) be able to make use of any of this? They probably don't have the Youmu, err, IOMMU to make full use of it, but maybe there could be some minor gains out of this anyway?


  • edgar_wibeau
    replied
    The A10-7800 has a 3500 MHz base and 3900 MHz max turbo frequency at 65 Watts; that was published when it was introduced on the 3rd of July.

    As opposed to when the A8-7600 was introduced in January, this time they decided not to communicate the speeds for the 45 Watt mode. Same for the desktop "Pro" models, some of which come at 65/45W and 65/35W TDP, with only the speeds for the higher TDP having been communicated. Sadly, AMD marketing is often a little questionable, to put it the sugar-sweet way.

    And almost nobody seems to have reported that news. Even more sadly.
    Since Computex, the AMD product web pages have offered extensive specifications for further models of the Kaveri processor series for desktop use. Until now, of the originally announced portfolio, only the models A10-7700K and ...


  • bridgman
    replied
    Originally posted by ObiWan View Post
    The upper (and higher) ones are the turbo clocks,
    the lower ones are the standard non-turbo clocks.
    That made sense for parts with a single power dissipation rating, but for parts with two ratings (eg 65W/45W), where I *think* the clock speeds are different at the different ratings, it's less clear how to interpret the numbers. I guess for now I'll stay with the cynic's view that the numbers represent the highest power rating.


  • name99
    replied
    Originally posted by kaprikawn View Post
    So if I understand this correctly, it means that both the CPU and GPU portions of an APU can access the same memory (like they've been banging on about for the PS4 and Xbone 180)?

    Does that mean that before, if you had an APU, some of your RAM was allocated to GPU tasks at startup, and when the CPU needed the GPU to do something it had to transfer the data from the memory addresses used by the CPU to the parts used by the GPU (even if that was on the same physical stick of RAM)?

    If my understanding is correct, I'm guessing it has no benefit for users with a CPU and a dedicated GPU where, obviously, the GPU has its own RAM on the card?
    The point is that, for historical reasons, the GPU has been treated as a kind of weird peripheral, not as a kind of CPU that just happens to use a different ISA from the main CPU. Suppose you have an SoC (ie GPU on the same chip as the CPU) and imagine that you ditched all that historical baggage. How would you do things? The obvious model is that the GPU would be treated by the OS as just another "CPU". (Depending on the details, "the" GPU might in fact be treated by the OS as four or six or eight GPU "cores".) CPU and GPU cores would share the same virtual address space in a coherent fashion. The OS would schedule code on the GPU, just like it does on the CPU. The GPU would use the same structured page tables (with the same permissions, and the same ability to fault code in or have it paged out). The GPU would support at least a small subset of interrupts (for example an interrupt which would allow for context switching).
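    In OpenCL 2.0 terms the contrast looks roughly like the sketch below. These are standard OpenCL calls rather than the HSA driver interface, and the context, queue and kernel are assumed to be created elsewhere:

    #include <stdlib.h>
    #include <CL/cl.h>

    #define N 4096

    /* Old model: the GPU works on its own allocation, so the data has to
     * be staged into it with an explicit copy. */
    static void run_with_copy(cl_context ctx, cl_command_queue queue, cl_kernel kern)
    {
        float *host = malloc(N * sizeof(float));
        cl_mem buf  = clCreateBuffer(ctx, CL_MEM_READ_WRITE, N * sizeof(float), NULL, NULL);

        clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, N * sizeof(float), host, 0, NULL, NULL);
        clSetKernelArg(kern, 0, sizeof(cl_mem), &buf);
        /* ... enqueue the kernel, then read results back with clEnqueueReadBuffer ... */

        clReleaseMemObject(buf);
        free(host);
    }

    /* Shared-virtual-memory model: CPU and GPU dereference the same pointer
     * in one address space, so no staging copy is needed (assuming the
     * device reports fine-grained SVM). */
    static void run_with_svm(cl_context ctx, cl_kernel kern)
    {
        float *shared = clSVMAlloc(ctx, CL_MEM_READ_WRITE, N * sizeof(float), 0);

        shared[0] = 1.0f;                          /* CPU writes directly...      */
        clSetKernelArgSVMPointer(kern, 0, shared); /* ...GPU sees the same pages. */
        /* ... enqueue the kernel as usual ... */

        clSVMFree(ctx, shared);
    }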

    Obviously for certain purposes you would arrange things for optimal performance (just like you arrange things for optimal audio performance on a CPU). If the task demands it, you would wire down certain pages being accessed by the GPU so that they don't have to fault, with the glitch that implies. You'd run certain GPU threads at real-time priority so they aren't interrupted by less important threads, etc. But the basic model is to have the OS controlling memory management and time scheduling for the GPU cores just like for CPUs. The value of this is most obvious when you imagine, for example, that you want a large compute job to run on your GPU, but you want to time-slice it with something realtime like video decoding or game playing, or just UI. The OS can, on demand, submit bundles of code representing UI updates to the real-time queue and have them immediately executed, while in any free time the compute job can do what it does, which might include (for very large jobs) occasionally page faulting to bring in new memory. Compute jobs will no longer have to be written like it's the 80s, manually handling segmentation to swap memory in and out, manually trying to reduce their outer loop to something that lasts for less than a 30th of a second à la co-operative multi-tasking.

    But all this is based on the idea that the CPU and GPU cores share a NoC, a common ultra-high-speed communication system, along with a shared address space and a high-performance coherency mechanism (eg a common L3 cache). That's not the case for existing discrete GPUs, and it's not clear (at least to me) if it could be made to work fast enough to be useful over existing PCIe. Basically this is a model based on the idea that the future of interest is the GPU integrated onto the CPU (or, if necessary, communicating with it by the sort of inter-socket communication pathways you see on multi-socket Xeon motherboards). This fact makes gamers scream in fury because it is very obvious that they are being left behind by this. Well, that's life --- gaming just isn't very important compared to mainstream desktop computing and mobile, the worlds that don't use and don't care about discrete GPUs.
