Tesseract OCR 5.2 Engine Finds Success With AVX-512F
As time goes on, more open-source projects are beginning to make better use of AVX-512, even though it's no longer enabled in the latest Alder Lake processors. Following the big AVX-512 wins for JSON parsing with simdjson, another open-source project finding gains is the Tesseract optical character recognition (OCR) engine.
Tesseract 5.2 was released on Wednesday as the newest feature release of this open-source OCR engine. Its development goes back to the 80s at HP, and for the past decade and a half it has been worked on as an open-source project by Google. Tesseract remains one of the leading OCR engines, and with the v5.2 release it may see some performance gains on CPUs with AVX-512F support.
The AVX-512F support for Tesseract was merged earlier this year, with the developer finding around a 10% run-time reduction for the particular benchmark he was using (lstm_squashed_test). He did note, though, that the Apple M1 performance was still "much better" than the Intel AVX-512F enabled performance, per the pull request.
Tesseract OCR 5.2 also brings improvements to its build system and CI, better handling of very large PDFs on 32-bit systems, a fix for Arm NEON detection on FreeBSD, and various other improvements and fixes. Tesseract OCR 5.2 details are over on GitHub.