ChipStar 1.2 Released For Compiling & Running HIP/CUDA On SPIR-V/OpenCL Hardware


  • ChipStar 1.2 Released For Compiling & Running HIP/CUDA On SPIR-V/OpenCL Hardware

    Phoronix: ChipStar 1.2 Released For Compiling & Running HIP/CUDA On SPIR-V/OpenCL Hardware

    ChipStar 1.2 has been released as the open-source software enabling HIP/CUDA programs to be compiled and run atop SPIR-V whether it be OpenCL or Vulkan drivers...


  • #2
    I remember there being some ridiculous EULA crap enforced by NVidia about them not allowing third-party CUDA runtimes and tooling(?).

    Can anyone clarify if and why this would be exempt from their legal shenanigans? I'm a little out of the loop, but would be happy if this is truly untouchable by NVidia.



    • #3
      Originally posted by recyclebin View Post
      I remember there being some ridiculous EULA crap enforced by NVidia about them not allowing third-party CUDA runtimes and tooling(?).

      Can anyone clarify if and why this would be exempt from their legal shenanigans? I'm a little out of the loop, but would be happy if this is truly untouchable by NVidia.
      AFAIU, this works similarly to HIP, where you recompile the pipeline, unlike ZLUDA and similar projects, which translate an already-compiled pipeline.



      • #4
        Originally posted by QwertyChouskie View Post

        AFAIU, this works similarly to HIP, where you recompile the pipeline, unlike ZLUDA and similar projects, which translate an already-compiled pipeline.
        To answer that:

        Yes, ChipStar works similarly to HIP (it literally uses HIP as its base) in that it **recompiles** the CUDA pipeline at the source level, converting it to HIP for execution on different hardware (AMD, Intel, etc.).

        This is different from ZLUDA, which translates an **already-compiled** CUDA binary.

        ChipStar’s `cucc` allows direct compilation of CUDA sources into HIP, avoiding NVIDIA’s proprietary pipeline, whereas ZLUDA tries to run compiled CUDA binaries on GPUs through real-time translation.​

        Code:
        ==============================================================================
        ChipStar 1.2 Compilation Flow
        ==============================================================================

        1. Source Code Flow
        ────────────────────
        Source Code (CUDA)
        │
        └── cucc (drop-in nvcc replacement)
            ├── Translates CUDA → HIP
            ├── Handles headers (e.g., dummy cublas_v2.h)
            └── Compiles for platform-specific backends

        2. Translation Process
        ──────────────────────
        cucc
        │
        └── HIP Translation Layer
            ├── Converts the CUDA API to HIP
            │   └── Example: cudaMalloc → hipMalloc
            ├── Example translations:
            │   ├── cudaMemcpy → hipMemcpy
            │   ├── cudaFree → hipFree
            │   └── ... (the rest of the CUDA-to-HIP mappings are of a similar nature)
            └── ROCm-specific optimizations
                ├── Adjusts for ROCm's intrinsic optimizations
                ├── Enhances memory management for AMD GPUs
                └── Integrates GCN, CDNA, RDNA hardware specifics

        3. Backend Mapping
        ───────────────────
        Backends (target platforms)
        │
        ├── ROCm (AMD GPUs)
        ├── OpenCL (SPIR-V) → Intel/ARM GPUs
        └── Level Zero API → Intel GPUs (low-level execution)

        Intermediate representation details (OpenCL / SPIR-V)
        ├── Optimizes SPIR-V for the backend hardware
        ├── Facilitates cross-architecture compatibility
        └── Maintains performance without loss

        4. Execution Interplay (Detailed)
        ────────────────────────────────────
        ┌──────────────────────────────┐
        │      Backend Execution       │
        └──────────────┬───────────────┘
                       │
           ┌───────────┼───────────┐
           │           │           │
           ▼           ▼           ▼
         ROCm    OpenCL (SPIR-V)  Level Zero API
      (AMD GPUs)   (Intel/ARM)    (Intel GPUs)
           │           │           │
           │           │           │
           │           │           ├── Thread synchronization in Level Zero (nothing exciting here)
           │           │           │   ├── Mutexes
           │           │           │   ├── Barriers
           │           │           │   └── Memory management
           │           │           │
           │           └── Compiles to SPIR-V
           │               └── Executes on Intel and ARM GPUs
           │
           └── Direct execution via the HIP API
               ├── Optimized for AMD GPUs
               ├── (Meaning it is well integrated with the ROCm ecosystem)
               └── Utilizes ROCm-specific optimizations

        5. Fallback Paths and Redundancy Mechanisms
        ────────────────────────────────────────────
        Across backend execution steps:
        ├── If OpenCL support is lacking:
        │   └── Fall back to Level Zero or ROCm (if available)
        ├── If ROCm optimizations fail:
        │   └── Default to generic HIP execution
        └── Yay for execution continuity?

        6. Multi-Level Compilation Validation
        ──────────────────────────────────────────
        cucc
        ├── Validates the HIP translation
        └── Verifies target-specific backend optimizations

        7. Performance Profiling
        ────────────────────────
        ChipStar
        ├── Leverages the ROCm profiler for AMD GPUs
        ├── Utilizes OpenCL profiling tools for SPIR-V
        └── Employs Level Zero profiling for Intel GPUs
            └── Assesses execution performance
                └── Optimizes code paths based on profiling data

        8. Legal Shenanigans
        ────────────────────────
        - Source-level translation
          └── Avoids binary or runtime reverse engineering
        - Public CUDA documentation
          └── Maps API calls transparently
        - Open standards
          ├── Utilizes OpenCL, SPIR-V, Level Zero
          └── (Ensures compatibility without NVIDIA's proprietary components)

        ==============================================================================
        So, hopefully this will address the ZLUDA-related chat as well as the other poster’s legality concerns.
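
        To make that source-level mapping concrete, here is a minimal vector-add sketch of my own (illustrative only, not ChipStar code): the HIP version below differs from the CUDA original only in the API names noted in the comments, which is essentially what the source-level CUDA → HIP translation boils down to.

        Code:
        // Illustrative HIP vector add (hypothetical example, not ChipStar source).
        // The CUDA original is identical except for the renamed API calls noted below.
        #include <hip/hip_runtime.h>   // CUDA: <cuda_runtime.h>
        #include <cstdio>
        #include <vector>

        __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;   // kernel syntax is unchanged
            if (i < n) c[i] = a[i] + b[i];
        }

        int main() {
            const int n = 1 << 20;
            const size_t bytes = n * sizeof(float);
            std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

            float *da, *db, *dc;
            hipMalloc((void **)&da, bytes);                          // CUDA: cudaMalloc
            hipMalloc((void **)&db, bytes);
            hipMalloc((void **)&dc, bytes);
            hipMemcpy(da, ha.data(), bytes, hipMemcpyHostToDevice);  // CUDA: cudaMemcpy
            hipMemcpy(db, hb.data(), bytes, hipMemcpyHostToDevice);

            vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);         // same launch syntax
            hipMemcpy(hc.data(), dc, bytes, hipMemcpyDeviceToHost);

            std::printf("hc[0] = %f\n", hc[0]);                      // expect 3.0
            hipFree(da); hipFree(db); hipFree(dc);                   // CUDA: cudaFree
            return 0;
        }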



        • #5
          Originally posted by A1B2C3 View Post
          Intel can return to service and reach a new level if it implements royal cores that would be assembled into one core when needed, or uses an integrated GPU to get a huge number of threads, all depending on the task. If you need high performance per core, then royal cores; if you urgently need a large number of threads and cores, then an integrated GPU + CPU.
          That is pretty much how SYCL works: most of the C++ code runs on the CPU, and the heavy parallel work is done in lambda functions executed on the GPU.
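
          A rough sketch of what that looks like (my own illustrative example, assuming a SYCL 2020 compiler such as DPC++ or AdaptiveCpp; the names here are mine, not from the thread):

          Code:
          // Illustrative SYCL sketch: ordinary host C++ runs on the CPU, only the
          // lambda passed to parallel_for is offloaded to the selected device (GPU).
          #include <sycl/sycl.hpp>
          #include <cstdio>

          int main() {
              sycl::queue q;                                       // picks a default device
              const size_t N = 1024;
              float *data = sycl::malloc_shared<float>(N, q);      // USM: visible to host and device

              // Only this lambda runs on the device; everything else is plain CPU-side C++.
              q.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
                  data[i] = static_cast<float>(i) * 2.0f;
              }).wait();

              std::printf("data[10] = %f\n", data[10]);            // read back on the CPU
              sycl::free(data, q);
              return 0;
          }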



          • #6
            Originally posted by A1B2C3 View Post

            Please don't tell me children's stories. The idea of royal cores is simply gorgeous; they could even be called Gelsinger cores. If NVIDIA engineers had implemented this on CUDA cores, so that they could combine into one large core, it would be super. This solution is more suitable for a GPU. In general, the topic is very interesting. Thanks.
            Having one huge core is a dumb idea: current big cores are already very large and show diminishing benefits as they grow, especially Intel P-cores at ~4 mm² (the law of diminishing returns).

            Power consumption also rises steeply with clock speed, so you are pretty much stuck at ~5.7 GHz today and unlikely to go much higher (the Pentium 4 was already running at 3 GHz in 2003), and increasing IPC is difficult due to RAM latency.

            So putting in a few big cores is much more efficient than one huge core.

            Again, SYCL is a programming model that allows easy mixing of CPU and GPU code; it already provides what you suggest without any hardware changes.



            • #7
              ChipStar, don't give me hope! I'm on the ZLUDA copium train and it's running at full speed!

              Originally posted by A1B2C3 View Post

              Please don't tell me children's stories. The idea of royal cores is simply gorgeous; they could even be called Gelsinger cores. If NVIDIA engineers had implemented this on CUDA cores, so that they could combine into one large core, it would be super. This solution is more suitable for a GPU. In general, the topic is very interesting. Thanks.
              Not sure if troll or birdy



              • #8
                Originally posted by A1B2C3 View Post

                What kind of nonsense are you talking about? Do you understand the meaning? For example, one CPU/GPU core is choking and will soon run out of processor time. N cores will be connected to it, forming, as it were, one core. This is said figuratively and simplistically. These N cores will quickly process the task; they are treated as one core for that task.
                That is not how it works...

                You're suggesting automatically running a program in parallel. If that were easy, it would already be done: a program written to run on one core makes lots of assumptions which no longer hold when it runs on multiple cores, so there would be data races and race conditions all over the place.

                There are special compilers (Intel ISPC) and extensions (OpenMP) which allow running code on multiple cores with little work, but the compiler still needs to validate the code to see whether it can run in parallel, and then generate the parallel version.
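
                For example, here is a minimal OpenMP sketch (my own illustration, not from the thread): the pragma asserts that the loop iterations are independent, so the runtime may spread them across all cores, but it is still the programmer's job to make sure that assertion holds.

                Code:
                // Illustrative OpenMP example; compile with e.g. g++ -fopenmp example.cpp
                #include <cstdio>
                #include <vector>

                int main() {
                    const int N = 1'000'000;
                    std::vector<double> a(N, 1.0), b(N, 2.0), c(N);

                    // The pragma tells the compiler the iterations are independent, so they
                    // can be distributed across cores. If the loop body actually shared
                    // mutable state, this would introduce the data races mentioned above.
                    #pragma omp parallel for
                    for (int i = 0; i < N; ++i)
                        c[i] = a[i] + b[i];

                    std::printf("c[0] = %f\n", c[0]);   // expect 3.0
                    return 0;
                }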

                I will not respond to you again; I strongly suspect you are a troll.

