AMD Is Prepared To Release A Complete User-Space Open-Source Stack For HSA

  • #21
    Originally posted by mmstick View Post
    There's a reason why China made an announcement that they were buying 10,000,000 AMD GPUs

    http://www.phoronix.com/scan.php?pag...tem&px=MTEyNTE
    Link to statement from Chinese government announcing they're purchasing 10 million AMD GPUs?

    • #22
      Originally posted by johnc View Post
      Link to statement from Chinese government announcing they're purchasing 10 million AMD GPUs?
      They might be referencing this switch from Nvidia to AMD announced in mid 2012.

      • #23
        Would be nice for AMD. They need the ... umm, Yuan Renminbi.

        Anyway, nice addition. Sadly I don't have an HSA-ready system yet (IIRC Kabini is not HSA), but Kaveri will probably be my next one in my main box.
        Thanks for the work and for putting specs & source out to the people, AMD devs.

        • #24
          Originally posted by uid313 View Post
          Could HSA be used by Intel and Nvidia too?

          I find vendor-specific technology utterly boring and uninteresting.
          From a quick glance at http://www.hsafoundation.com/ it seems that Intel and NVidia are not members. However other companies that have GPU IP are, among them ARM (Mali), Broadcom (VideoCore), Imagination Technologies (PowerVR), Qualcomm (Adreno), and Vivante (GC).

          • #25
            Originally posted by Adarion View Post
            Anyway, nice addition. Sadly I don't have an HSA-ready system yet (IIRC Kabini is not HSA), but Kaveri will probably be my next one in my main box.
            What I find strange is that older AMD roadmaps listed Kabini as having "New HSA Features", while more recent AMD marketing material avoids mentioning Kabini and HSA together. Then again, according to AMD's Marc Diana, the PS4 and Xbox One support HSA/hUMA to some degree, and their SoCs are basically bigger Kabinis. Maybe it didn't work out as planned.

            • #26
              I think the distinction is between "some HSA features" (Kabini) and "all HSA features" (Kaveri). IIRC the key issue for Kabini is 40-bit GPU virtual addresses versus the 48-bit virtual addresses in Kaveri's GPU.

              The 48-bit virtual addresses in the Kaveri GPU match what you get in an AMD64 CPU today, allowing full pointer equivalency between CPU and GPU when accessing system memory via the IOMMUv2.
              Last edited by bridgman; 10 November 2014, 11:23 AM.

              • #27
                Originally posted by Nille View Post
                AMD is not well known for vendor specific stuff. But you can ask Nvidia about multivendor CUDA
                or G-Sync.

                • #28
                  The HSA 1.0 spec at http://amd-dev.wpengine.netdna-cdn.c...2/10/hsa10.pdf sheds some light on the memory semantics.

                  3. Memory Model

                  3.1. Overview

                  A key architectural feature of HSA is its unified memory model. In the HSA memory model, a combined
                  latency/throughput application uses a single unified virtual address space. All HSA-accessible memory
                  regions are mapped into a single virtual address space to achieve Shared Virtual Memory (SVM)
                  semantics.
                  Memory regions shared between the LCU and TCU are coherent. This simplifies programming by
                  eliminating the need for explicit cache coherency management primitives, and it also enables finer-
                  grained offload and more efficient producer/consumer interaction. The major benefit from coherent
                  memory comes from eliminating explicit data movement and eliminating the need for explicit heavyweight
                  synchronization (flushing or cache invalidation). Existing programming models that
                  already use flushing and cache invalidation can still be supported, if needed.

                  3.2. Virtual Address Space

                  Not all memory regions need to be accessible by all compute units. For example:

                  • TCU work-item or work-group private memory need not be accessible to the LCUs. In fact, each
                  work-item or work-group has its own copy of private memory, all visible in the same virtual address
                  space. Private memory accesses from different work-items through the same pointer result in
                  accesses to different memory by each work-item; each work-item accesses its own copy of private
                  memory. This is similar to thread-local storage in CPU multi-threaded applications. Access to
                  work-item or work-group memory directly by address from another accessor is not supported in HSA.

                  • LCU OS kernel memory should not be accessible to the TCUs. The OS kernel must have ownership of
                  its own private data (process control blocks, scheduling, memory management), so it is to be expected
                  that TCUs should not have access to this memory. The OS kernel, however, may expose specific
                  regions of memory to the TCUs, as needed.

                  When a compute unit dereferences an inaccessible memory location, HSA requires the compute unit to
                  generate a protection fault. HSA supports full 64-bit virtual addresses, but currently physical addresses
                  are limited to 48 bits, which is consistent with modern 64-bit CPU architectures.

                  3.2.1. Virtual Memory Regions

                  HSA abstracts memory into the following virtual memory regions. All regions support atomic and
                  unaligned accesses.

                  • Global: accessible by all work-items and work-groups in all LCUs and TCUs. Global memory embodies the main advantage of the HSA unified memory model: it provides data sharing between CUs and TCUs.
                  • Group: accessible to all work-items in a work-group.
                  • Private: accessible to a single work-item.
                  • Kernarg: read-only memory used to pass arguments into a compute kernel.
                  • Readonly: global read-only memory.
                  • Spill: used for load and store register spills. This segment provides hints to the finalizer to allow it to generate better code.
                  • Arg: read-write memory used to pass arguments into and out of functions.

                  3.3. Memory Consistency and Synchronization

                  3.3.1. Latency Compute Unit Consistency

                  LCU consistency is dictated by the host processor architecture. Different processor architectures
                  may have different memory consistency models, and it is not within the scope of HSA to define these
                  models. HSA needs to operate, however, within the constraints of those models.

                  3.3.2. Work-item Load/Store Consistency

                  Memory operations within a single work-item to the same address are fully consistent and ordered. As a
                  consequence, a load executed after a store by the same work-item will never receive stale data, so no
                  fence operations are needed for single work-item consistency. Memory operations (loads / stores) at
                  different addresses, however, could be re-ordered by the implementation.

                  3.3.3. Memory Consistency across Multiple Work-Items

                  The consistency model across work-items in the same work-group, or work-items across work-groups,
                  follows a “relaxed consistency model”: from the viewpoint of the threads running on different compute
                  units, memory operations can be reordered.

                  • Loads can be reordered after loads.
                  • Loads can be reordered after stores.
                  • Stores can be reordered after stores.
                  • Stores can be reordered after loads.

                  • Atomics can be reordered with loads.
                  • Atomics can be reordered with stores.

                  This relaxed consistency model allows better performance. In cases where a stricter consistency
                  model is required, explicit fence operations or the special load-acquire (ld_acq) and store-release
                  (st_rel) operations are needed.
                  Last edited by dibal; 23 November 2014, 12:32 PM.
