Intel Updates Its PyTorch Build With More Large Language Model Optimizations


  • Intel Updates Its PyTorch Build With More Large Language Model Optimizations

    Phoronix: Intel Updates Its PyTorch Build With More Large Language Model Optimizations

    Intel has released their Intel Extension for PyTorch v2.3 to succeed their earlier v2.1-derived extension. With this updated extension targeting PyTorch 2.3, Intel is rolling out more optimizations around Large Language Models (LLMs)...
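    For anyone curious what that looks like in practice, here is a minimal sketch of the LLM path, assuming the ipex.llm.optimize entry point documented for recent releases; the model id and generation settings below are placeholder examples:

        import torch
        import intel_extension_for_pytorch as ipex
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

        # Apply IPEX's LLM-specific optimizations (fused kernels, weight prepacking, bf16 path)
        model = ipex.llm.optimize(model, dtype=torch.bfloat16)

        inputs = tokenizer("The quick brown fox", return_tensors="pt")
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=32)
        print(tokenizer.decode(out[0], skip_special_tokens=True))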


  • #2
    Well that makes Llama.cpp way more exciting on Intel!



    • #3
      Originally posted by Eirikr1848
      Well that makes Llama.cpp way more exciting on Intel!
      I only started playing around with LLMs locally quite recently, with llama.cpp. Can you use this Intel extension with llama.cpp? How? It says it's for PyTorch.



      • #4
        Does the downstream IPEX extension for PyTorch (the CPU version; this week's 2.3 IPEX release is CPU-only) add any benefit on 12th-14th gen Intel desktop CPUs compared to standard upstream PyTorch?

        I do not believe I have seen any difference, but I am wondering if others have tried it and gotten different results.
        I am aware that AMX and AVX-512 are not present in consumer-grade CPUs, but I was hoping that some AVX2 optimisations would help. Anyway, I seem to get 1.8-2.3 iter/sec on a toy Stable Diffusion model from Hugging Face on an i9-14900K regardless of whether I use stock PyTorch (CPU) or Intel's version. I guess AVX2 is either not used or already upstreamed.
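        Roughly how I am comparing the two, as a sketch; the checkpoint name and step count are just examples, and ipex.optimize is the generic CPU entry point rather than anything Stable-Diffusion-specific:

            import time
            import torch
            import intel_extension_for_pytorch as ipex
            from diffusers import StableDiffusionPipeline

            # Example checkpoint; the pipeline runs on CPU by default
            pipe = StableDiffusionPipeline.from_pretrained(
                "runwayml/stable-diffusion-v1-5", torch_dtype=torch.bfloat16)

            def bench(label):
                start = time.time()
                pipe("a photo of a cat", num_inference_steps=20)
                print(label, 20 / (time.time() - start), "iter/s")

            bench("stock pytorch")

            # Swap in an IPEX-optimized UNet, which dominates the per-step time
            pipe.unet = ipex.optimize(pipe.unet.eval(), dtype=torch.bfloat16)
            bench("ipex cpu")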

        Another thing I notice is that PyTorch does not use all CPU threads. On the i9-12900K it used 16 (24-thread CPU), while on the i9-14900K it uses 24 (32-thread CPU), so it seems to leave 8 threads on the table either way.
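        You can check and override what it is doing with the standard PyTorch threading knobs, something like:

            import os
            import torch

            print(os.cpu_count())           # logical CPUs, e.g. 32 on an i9-14900K
            print(torch.get_num_threads())  # intra-op threads PyTorch is actually using

            # Force PyTorch to use every logical CPU; whether that helps depends on
            # E-core / hyper-threading contention, so it may not be a win
            torch.set_num_threads(os.cpu_count())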

        Btw, an Arc A770 using the IPEX XPU extension reaches 6-7 iter/s on the same model, so it is roughly 3x faster than the CPU.
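        The XPU side is basically the same ipex.optimize call with the model moved to the "xpu" device first; the toy module below is just to sketch the device handling:

            import torch
            import torch.nn as nn
            import intel_extension_for_pytorch as ipex  # the XPU build registers the "xpu" device

            device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

            # Tiny stand-in model; a real pipeline gets moved to the device the same way
            model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).eval().to(device)
            model = ipex.optimize(model)  # fp32 path shown; fp16/bf16 are the usual choices in practice

            with torch.no_grad():
                out = model(torch.randn(1, 3, 64, 64, device=device))
            print(out.shape, out.device)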
