Shopping For A Launch-Day AMD Ryzen AI 300 Series Laptop For Linux Testing


  • oleid
    replied
    Originally posted by krzyzowiec View Post

    Yes, and absolutely, I require full Linux support. Qualcomm is not there yet, but they are working on it. They announced that the major features required to use this chip effectively will be mainlined in kernel version 6.11. Based on the kernel schedule, that should be around 3 months from now. (and even longer for the optimizations to hit) I don't mind waiting a bit though. I can check out what Microsoft's been up to while I've been away. :P
    Yeah, I read about kernel 6.11. But I was especially wondering what GPU driver they will provide. Will it be https://docs.mesa3d.org/drivers/freedreno.html or maybe something proprietary? And will the NPU be supported? In any case, I'll keep an eye on it as well.



  • krzyzowiec
    replied
    Originally posted by oleid View Post

    By the way, are all the Snapdragon drivers mainline, or will they be, especially the GPU driver? Is that a criterion for you?
    Yes, and absolutely, I require full Linux support. Qualcomm is not there yet, but they are working on it. They announced that the major features required to use this chip effectively will be mainlined in kernel version 6.11. Based on the kernel schedule, that should be around 3 months from now. (and even longer for the optimizations to hit) I don't mind waiting a bit though. I can check out what Microsoft's been up to while I've been away. :P



  • skeevy420
    replied
    Originally posted by avis View Post

    tmpfs is 100% dynamic. If you've got no directories/files on it, it consumes around 0 bytes. It doesn't take 50% of RAM by default contrary to what you say/claim/imply. And if you really need to store huge temporary files, you may as well use your storage instead of RAM.

    On three of my Fedora installations all the tmpfs directories combined occupy less than 1MB.
    It used to not be that way... at least I could swear I remember tmpfs preallocating memory. Perhaps I'm remembering back when we had to set all this up manually and I'm merging the memories together.

    I'm using 35GB with just Firefox, Steam, Dolphin, and Yakuake running... because my dynamic RAM user, ZFS, is gobbling up 22GB. /tmp is using a whopping 16K.
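
    (For anyone else whose ARC gets greedy: it does shrink under memory pressure, but you can also hard-cap it with a module option. The 8 GiB below is just an example value, and the arcstats path assumes ZFS on Linux:)

    Code:
    # current ARC size in bytes
    awk '/^size / {print $3}' /proc/spl/kstat/zfs/arcstats
    # hard-cap the ARC at 8 GiB -- this line goes in /etc/modprobe.d/zfs.conf
    options zfs zfs_arc_max=8589934592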



  • avis
    replied
    Originally posted by skeevy420 View Post

    24/32GB is fine for Windows end users. It may or may not be fine for professional users or Linux end users. Because I'm a Linux user who knows how systemd treats RAM, 32GB is basically 16GB of RAM after tmpfs needs are taken care of. An end user may or may not know to edit fstab to limit how much RAM /tmp will gobble up. So with 24GB of RAM and a game needing 12GB, there's nothing left for the actual system. It's all allocated to potential temp storage and a game.
    tmpfs is 100% dynamic. If you've got no directories/files on it, it consumes around 0 bytes. It doesn't take 50% of RAM by default contrary to what you say/claim/imply. And if you really need to store huge temporary files, you may as well use your storage instead of RAM.

    On three of my Fedora installations all the tmpfs directories combined occupy less than 1MB.
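
    Easy enough to check on any box; the Size column is only the ceiling, Used is what actually costs you RAM:

    Code:
    df -h -t tmpfs    # Used = RAM actually consumed by tmpfs mounts
    sudo du -sh /tmp  # what is actually stored under /tmp right now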



  • skeevy420
    replied
    Originally posted by avis View Post

    24/32GB soldered RAM is actually just fine for 99% of people out there. Thankfully, someone at ASUS or AMD has realized that 8/16GB laptops just don't cut it nowadays.

    Not only do some modern games require ~12GB of RAM (I'm not counting VRAM), but LLMs eat RAM for breakfast. Of course, it would be nice to see 64GB or even 128GB SKUs as well for those who need to run VMs or even larger LLMs.
    24/32GB is fine for Windows end users. It may or may not be fine for professional users or Linux end users. Because I'm a Linux user who knows how systemd treats RAM, 32GB is basically 16GB of RAM after tmpfs needs are taken care of. An end user may or may not know to edit fstab to limit how much RAM /tmp will gobble up. So with 24GB of RAM and a game needing 12GB, there's nothing left for the actual system. It's all allocated to potential temp storage and a game.
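
    For reference, the cap is a one-liner in fstab (2G here is just an example size, pick whatever fits your machine):

    Code:
    # /etc/fstab -- cap /tmp at 2 GiB instead of the default 50% of RAM
    tmpfs  /tmp  tmpfs  size=2G,mode=1777,nosuid,nodev  0  0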



  • avis
    replied
    Originally posted by nealeu View Post

    That's really a matter of opinion. A "nice" machine where you're limited to 24 or 32GB of soldered RAM and can't easily upgrade or repair it is exactly what's been frustrating me for years. When the Framework 16 launched I joined the waiting list, and I now have something far better than anything I've been able to get before.
    And as someone in the UK I can actually buy it. There have been ASUS, HP, and Lenovo laptops advertised as "coming soon" that never get sold in the UK.

    I really like that I have something that isn't everyone's take on a MacBook Air or Pro.
    24/32GB soldered RAM is actually just fine for 99% of people out there. Thankfully, someone at ASUS or AMD has realized that 8/16GB laptops just don't cut it nowadays.

    Not only do some modern games require ~12GB of RAM (I'm not counting VRAM), but LLMs eat RAM for breakfast. Of course, it would be nice to see 64GB or even 128GB SKUs as well for those who need to run VMs or even larger LLMs.
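
    Rough back-of-the-envelope math for LLM weights (my own rule of thumb, ignoring context and runtime overhead): parameter count times bytes per weight:

    Code:
    # rough LLM weight-memory estimate: params * bytes per weight
    def weights_gib(params_billion: float, bits_per_weight: int) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 2**30

    print(f"7B  @ 4-bit ~ {weights_gib(7, 4):.1f} GiB")    # ~3.3 GiB
    print(f"7B  @ FP16  ~ {weights_gib(7, 16):.1f} GiB")   # ~13.0 GiB
    print(f"70B @ 4-bit ~ {weights_gib(70, 4):.1f} GiB")   # ~32.6 GiB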



  • nealeu
    replied
    Originally posted by sophisticles View Post
    That ASUS Zenbook S is so much nicer than anything Framework, System76, and TUXEDO put out, and at a much better price point.

    It makes sense: ASUS is a big player and can afford to sell its products at a more competitive price point, while those other three are probably barely treading water and don't have the same flexibility.
    That's really a matter of opinion. A "nice" machine where you're limited to 24 or 32GB of soldered RAM and can't easily upgrade or repair it is exactly what's been frustrating me for years. When the Framework 16 launched I joined the waiting list, and I now have something far better than anything I've been able to get before.
    And as someone in the UK I can actually buy it. There have been ASUS, HP, and Lenovo laptops advertised as "coming soon" that never get sold in the UK.

    I really like that I have something that isn't everyone's take on a MacBook Air or Pro.



  • timofonic
    replied
    Cerebras WSE-3 (125 petaflops, 300,000 TOPS) compared to:
    - 31.25x (125/4) the NVIDIA H100 GPU (4 petaflops, 30,000 TOPS).
    - 2500x (125/0.05) AMD Ryzen AI NPUs (0.05 petaflops, 50 TOPS).
    - 2604x (125/0.048) Intel NPU 4 (0.048 petaflops, 48 TOPS).
    - 4166x (125/0.03) smartphone NPUs (0.03 petaflops, 30 TOPS).
    - 2500x (125/0.05) the AMD Ryzen AI 300 Series (0.05 petaflops, 50 TOPS).
    - 1689.19x (125/0.074) the AMD Radeon RX 7000 Series (0.074 petaflops, 70,000 TOPS).
    - 8503.4x (125/0.0147) the Intel Arc A750 (0.0147 petaflops, 2,000 TOPS).
    - 762.939x (125/0.16384) the NVIDIA RTX 4090 (0.16384 petaflops, 100,000 TOPS).

    My humble point of view: until we get at least WSE-3-level performance in cheap consumer hardware, AI stuff will remain a toy outside supercomputing and the cloud. 10 years to reach that? 40 years? Stuff like Copilot+ is like VR: still too experimental, more a cool gimmick than something of real use. I hope NPUs will be useful outside the AI/ML circus.



  • timofonic
    replied
    Originally posted by Mitch View Post

    It's funny to think you can ask the laptop this question about its own capabilities, given it has the right software.
    I doubt it. I experimented with LLMs and only managed the following after tons of iterations and guidance, asking it to generate very specific pieces separately, and so on. Even then the result is mediocre; I had to make some modifications myself.

    Ryzen AI processors combine traditional x86-64 CPU cores with a GPU and an NPU.

    Code:
    ┌──────────────────────────────────────────────────────────┐
    │                         Ryzen AI                         │
    │ ┌──────────────┐  ┌─────┐  ┌───────┐  ┌────────────────┐ │
    │ │ x86-64 Cores │─►│ GPU │─►│ XDNA2 │─►│ PC Peripherals │ │
    │ └──────────────┘  └─────┘  └───────┘  └────────────────┘ │
    └──────────────────────────────────────────────────────────┘

    Code:
    ┌────────────────────────────────────────────────────────────────────┐
    │                                NPU                                 │
    │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
    │ │ AI Engines  │─►│ DSP Blocks  │─►│ Memory Ctrl │─►│ I/O Ports   │ │
    │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │
    │                                                                    │
    │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
    │ │ Codecs      │─►│ Cache       │─►│ Data Fabric │─►│ Power Mgmt  │ │
    │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │
    └────────────────────────────────────────────────────────────────────┘
    - **AI Engines**: Specialized cores designed for AI computations.
    - **DSP Blocks**: Digital Signal Processing units for handling complex mathematical functions.
    - **Memory Control**: Manages the flow of data in and out of the NPU's memory.
    - **I/O Ports**: Input/Output interfaces for external communication.
    - **Codecs**: Used for encoding and decoding digital data streams.
    - **Cache**: Temporary storage for quick data access.
    - **Data Fabric**: The network within the NPU that connects different components.
    - **Power Management**: Regulates power usage within the NPU to ensure efficiency.

    Code:
    ┌─────────────────────────────────────────────────────────────┐
    │                        AI Engine Tile                       │
    │                                                             │
    │  Tile Interconnect Module ──► Memory Module ──► AI Engine   │
    │                                                             │
    │  Memory Module: 32 KB Data Memory divided into 8 banks;     │
    │  Memory Interface, DMA, Locks; access to neighboring        │
    │  tile memories                                              │
    │                                                             │
    │  AI Engine: SIMD Vector Processor (optimized for ML and     │
    │  signal processing) + 32-bit Scalar RISC Processor          │
    │  (Scalar Unit)                                              │
    └─────────────────────────────────────────────────────────────┘

    - **SIMD Vector Processor**: Executes parallel vector operations for machine learning and signal processing.
    - **Scalar Processing Unit**: Handles scalar operations and control flow.
    - **Local Memory**: Provides fast access to data for the processors.
    - **Memory Interface**: Manages data transfer between local and external memory.
    - **Direct Memory Access (DMA)**: Automates data transfers without processor intervention.
    - **Interconnects**: Facilitate communication with other AI Engine tiles and system components.
    - **Control and Status Registers (CSR)**: Used for configuring and monitoring the AI Engine's state.
    - **Instruction Cache**: Stores the instructions for the processors to reduce latency.
    - **Event and Interrupt Controllers**: Manage events and interrupts for synchronization and communication.



    The XDNA architecture came out of AMD's acquisition of Xilinx and initially mirrored the spatial dataflow design of the AI Engine processors in Xilinx Versal. The transition to XDNA2 brought enhancements tailored to generative AI.


    Code:
    ┌─────────────────────┐    ┌─────────────────────┐    ┌───────────────────────────────┐
    │  Xilinx Versal AI   │───►│  XDNA Architecture  │───►│  XDNA2 with AI Enhancements   │
    │     Core (IP)       │    │ (AMD's Integration) │    │ (Optimized for Generative AI) │
    └─────────────────────┘    └─────────────────────┘    └───────────────────────────────┘

    There's an XRT driver for Linux systems, with AMD promising ongoing development toward a comprehensive software stack (time will tell; I remain cautious and skeptical).

    Code:
    ┌───────────────┐    ┌─────────────┐    ┌─────────┐
    │ Linux Kernel  │───►│ XRT Driver  │───►│  XDNA2  │
    └───────────────┘    └─────────────┘    └─────────┘
    The XRT driver bridges XDNA2 with Linux environments.
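
    I haven't tried it myself, but a first sanity check that the NPU stack is alive would presumably look something like this (assuming the amdxdna module from the xdna-driver repo linked below; names and paths may vary between releases):

    Code:
    lsmod | grep amdxdna        # is the XDNA kernel driver loaded?
    sudo dmesg | grep -i xdna   # driver messages from boot
    ls /dev/accel/              # the NPU should show up as a DRM accel device
    # with the XRT userspace installed, 'xrt-smi examine' should summarize the device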

    NPUs can be utilized for a variety of non-AI tasks. These include digital signal processing, image and video processing, and scientific simulations where parallel processing can lead to performance gains.

    Code:
    ┌─────────────────────────────────────────────────────────────┐
    │                             NPU                             │
    └─────────────────────────────────────────────────────────────┘
           │                      │                      │
           ▼                      ▼                      ▼
    ┌──────────────┐     ┌─────────────────┐     ┌───────────────┐
    │ Digital      │     │ Image & Video   │     │ Scientific    │
    │ Signal       │     │ Processing      │     │ Simulations   │
    │ Processing   │     └─────────────────┘     └───────────────┘
    └──────────────┘
    NPUs are not limited to AI tasks and can enhance performance in various non-AI applications.

    The development cycle for AI models includes training, conversion to ONNX format, quantization to INT8, compilation into `.xclbin` files, and execution on the NPU (XDNA2).

    Code:
    ┌──────────┐   ┌───────────────┐   ┌──────────────┐   ┌──────────────┐
    │ Training │──►│ Conversion to │──►│ Quantization │──►│ Compilation  │──┐
    │          │   │ ONNX Format   │   │ to INT8      │   │ into .xclbin │  │
    └──────────┘   └───────────────┘   └──────────────┘   └──────────────┘  │
      ┌─────────────────────────────────────────────────────────────────────┘
      │  ┌──────────────┐   ┌─────────────────────┐   ┌──────────────┐   ┌───────────────┐
      └─►│ .xclbin File │──►│ Executable AI Model │──►│ Optimization │──►│ NPU Execution │
         │              │   │ (ONNX/Vitis AI)     │   │              │   │               │
         └──────────────┘   └─────────────────────┘   └──────────────┘   └───────────────┘
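
    For the execution end of that pipeline, this is roughly what running a quantized model looks like from Python with ONNX Runtime's Vitis AI execution provider, per the Ryzen AI docs linked below. Treat it as a sketch: model_int8.onnx and vaip_config.json are placeholder names, and provider options may differ between releases.

    Code:
    # Sketch: run an INT8 ONNX model on the NPU via ONNX Runtime's Vitis AI EP.
    # File names are placeholders; ONNX Runtime falls back to CPU if the EP is missing.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        "model_int8.onnx",
        providers=["VitisAIExecutionProvider"],
        provider_options=[{"config_file": "vaip_config.json"}],
    )

    # dummy input matching a typical image model's expected shape
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = session.run(None, {session.get_inputs()[0].name: x})
    print(outputs[0].shape)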

    Before deployment, AI applications undergo a crucial optimization phase through ONNX/Vitis AI, ensuring they run efficiently on the NPU. The `.xclbin` file is the final, executable form of an AI model, ready to be processed by the NPU.


    URLs
    - Ryzen AI Software GitHub Repository: https://github.com/amd/RyzenAI-SW
    - AMD Developer Resources for Ryzen AI Software: https://www.amd.com/en/developer/res...en-ai-software
    - Ryzen AI Software Documentation Portal: https://ryzenai.docs.amd.com/en/latest/
    - XRT Driver for Linux: https://github.com/amd/xdna-driver
    Last edited by timofonic; 07 June 2024, 12:48 AM.



  • oleid
    replied
    Originally posted by krzyzowiec View Post
    Nice, I was looking at this very same laptop. I want to go with either this Ryzen chip or the Snapdragon X Elite. Looking forward to the results.
    By the way, are all the Snapdragon drivers mainline, or will they be, especially the GPU driver? Is that a criterion for you?

