Tesseract OCR 5.2 Engine Finds Success With AVX-512F
Written by Michael Larabel in Intel on 7 July 2022 at 05:40 AM EDT.
As time goes on, more open-source projects are beginning to make better use of AVX-512 support, even though it's no longer enabled in the latest Alder Lake processors. Following the big AVX-512 wins reported for JSON parsing with simdjson, another open-source project finding gains is the Tesseract optical character recognition (OCR) engine.

Tesseract 5.2 was released on Wednesday as the newest feature release to this open-source OCR engine. Its development goes back to the 1980s at HP, while for the past decade and a half it has been worked on as an open-source project by Google. Tesseract remains one of the leading OCR engines and with the v5.2 release may see some performance gains on CPUs with AVX-512F support.
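
For readers wondering whether their own CPU advertises AVX-512F, here is a minimal sketch for Linux that looks for the "avx512f" flag in /proc/cpuinfo. The helper names (has_cpu_flag, host_has_avx512f) are made up for illustration and are not part of Tesseract; the sketch also assumes an x86 Linux system where the kernel exposes a "flags" line, and says nothing about how Tesseract itself detects the feature.

```python
def has_cpu_flag(cpuinfo_text: str, flag: str = "avx512f") -> bool:
    """Return True if `flag` is listed on any 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" field counts; compare whole tokens so that
        # e.g. "avx512fp16" is not mistaken for "avx512f".
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


def host_has_avx512f(path: str = "/proc/cpuinfo") -> bool:
    """Check the running host; returns False if the file cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return has_cpu_flag(fh.read())
    except OSError:
        return False


if __name__ == "__main__":
    print("AVX-512F available:", host_has_avx512f())
```

On non-Linux systems, or on architectures whose /proc/cpuinfo has no "flags" line, this sketch simply reports False.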


The AVX-512F support for Tesseract was merged earlier this year, with the developer finding around a 10% run-time reduction for the particular benchmark he was using (lstm_squashed_test). He did note, though, that the Apple M1 performance was still "much better" than the Intel AVX-512F enabled performance, per the pull request.

Tesseract OCR 5.2 also has improvements to its build system and CI, better handling of very large PDFs on 32-bit systems, a fix for Arm NEON detection on FreeBSD, and various other improvements and fixes. More details on Tesseract OCR 5.2 are available on GitHub.
