Tesseract OCR 5.2 Engine Finds Success With AVX-512F

Written by Michael Larabel in Intel on 7 July 2022 at 05:40 AM EDT.
As time goes on, more open-source projects are making better use of AVX-512, even though it is no longer enabled on Intel's latest Alder Lake processors. After the big AVX-512 wins reported for JSON parsing with simdjson, another open-source project seeing gains is the Tesseract optical character recognition (OCR) engine.

Tesseract 5.2 was released on Wednesday as the newest feature release of this open-source OCR engine, which began development at HP in the 1980s and for the past decade and a half has been worked on as an open-source project by Google. Tesseract remains one of the leading OCR engines, and with the v5.2 release it may deliver some performance gains on CPUs with AVX-512F support.


The AVX-512F support was merged into Tesseract earlier this year, with the developer finding around a 10% run-time reduction in the particular benchmark he was using (lstm_squashed_test). He did note, though, that Apple M1 performance was still "much better" than the AVX-512F-enabled Intel performance, per the pull request.
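To give a rough idea of what an AVX-512F code path looks like, here is a minimal C sketch of a double-precision dot product, the kind of inner-loop kernel an LSTM-based OCR engine is assumed here to spend much of its time in. It is not Tesseract's code: the function names `dot_scalar`, `dot_avx512f`, and `dot_product` are hypothetical, and the sketch assumes an x86-64 target built with GCC or Clang (for the `target` attribute and `__builtin_cpu_supports`). The AVX-512F path is only taken when the CPU reports support, so the same binary still runs on chips such as Alder Lake where AVX-512 is disabled.

```c
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical portable fallback used when AVX-512F is unavailable. */
static double dot_scalar(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical AVX-512F kernel: 8 doubles per iteration using fused
   multiply-add. The target attribute compiles only this function for
   AVX-512F, so the rest of the file stays baseline x86-64. */
__attribute__((target("avx512f")))
static double dot_avx512f(const double *a, const double *b, size_t n) {
    __m512d acc = _mm512_setzero_pd();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m512d va = _mm512_loadu_pd(a + i);   /* unaligned 512-bit loads */
        __m512d vb = _mm512_loadu_pd(b + i);
        acc = _mm512_fmadd_pd(va, vb, acc);    /* acc += va * vb */
    }
    double sum = _mm512_reduce_add_pd(acc);    /* horizontal sum of 8 lanes */
    for (; i < n; i++)                         /* scalar tail for leftovers */
        sum += a[i] * b[i];
    return sum;
}

/* Runtime dispatch: pick the AVX-512F kernel only if the CPU supports it. */
double dot_product(const double *a, const double *b, size_t n) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return dot_avx512f(a, b, n);
    return dot_scalar(a, b, n);
}
```

Handling eight elements per instruction and finishing the remainder with a scalar loop is a common pattern for such kernels. Because the vector path adds terms in a different order than the scalar loop, results can differ in the last bits for non-integer inputs.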

Tesseract OCR 5.2 also brings build system and CI improvements, better handling of very large PDFs on 32-bit systems, a fix for Arm NEON detection on FreeBSD, and various other improvements and fixes. More details on Tesseract OCR 5.2 are available on GitHub.
About The Author

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.
