Intel Releases OpenVINO 2024.2 With Llama 3 Optimizations, More AVX2 & AVX-512 Optimizations
Intel today released OpenVINO 2024.2, the newest version of its open-source AI toolkit for optimizing and deploying deep learning inference models across a range of AI frameworks and broad hardware types.
With OpenVINO 2024.2, Intel has continued optimizing for Meta's Llama 3 large language model. The release brings more Llama 3 optimizations for execution across CPUs, integrated GPUs, and discrete GPUs to further enhance performance and make more efficient use of memory.
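As a minimal sketch of how a Llama 3 model might be run with this release, here is an example using the OpenVINO GenAI Python package introduced alongside 2024.2; the model directory below is a hypothetical placeholder for a locally exported OpenVINO IR of the model:

```python
import openvino_genai as ov_genai

# Hypothetical local path to a Llama 3 model already exported to OpenVINO IR,
# e.g. via: optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B-Instruct <dir>
model_dir = "./Meta-Llama-3-8B-Instruct-ov"

# The target device can be "CPU" or "GPU" (integrated or discrete).
pipe = ov_genai.LLMPipeline(model_dir, "CPU")

print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```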
OpenVINO 2024.2 also adds support for Phi-3-mini AI models, broader large language model support, support for Intel Atom Processor X Series, preview support for Intel Xeon 6 processors, and more AVX2/AVX-512 tuning. Intel is seeing a "significant improvement" in second token latency and memory footprint for FP16-weight LLMs at small batch sizes, via AVX2 on Intel Core CPUs and AVX-512 on Intel Xeon processors.
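For lower-level control, the core OpenVINO runtime API can enumerate the available devices on a system and compile a model for one of them. A short sketch, with the IR path below being a hypothetical placeholder; note that on CPU the runtime dispatches to AVX2 or AVX-512 code paths at runtime based on what the processor supports:

```python
import openvino as ov

core = ov.Core()
# List the devices OpenVINO can target on this machine,
# e.g. ['CPU', 'GPU.0', 'GPU.1'] for a CPU plus integrated/discrete GPUs.
print(core.available_devices)

# Hypothetical path to a model stored as OpenVINO IR with FP16 weights.
model = core.read_model("model/openvino_model.xml")

# Compile for the CPU; the CPU plugin selects AVX2 or AVX-512 kernels
# automatically depending on the host's instruction set support.
compiled = core.compile_model(model, "CPU")
```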
Downloads and more details on the OpenVINO 2024.2 release are available via GitHub.