Tesseract 5.0 OCR Engine Bringing Faster Performance With "Fast Floats"
The Tesseract 5.0 Alpha has been available since the end of last year, while this weekend marked the first beta of Tesseract 5.0. Earlier Tesseract 5.0 Alpha releases brought improved performance, support for Apple Silicon, build system improvements, an overhaul of the public API, and many code improvements.
Yesterday's Tesseract 5.0 Beta brought more code modernization work, improved ARM NEON usage, and more.
Arguably the most exciting change in the Tesseract 5.0 Beta is support for using floats for LSTM model training and text recognition. Traditionally the Tesseract OCR engine has relied upon doubles, but when the new "fast float" option is enabled at build time, floats can be used instead. The hope is that this will lead to faster training and OCR performance while also requiring less system memory than earlier versions of Tesseract or builds without fast float enabled.
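For those building from source, the fast float mode is selected at configure time. A minimal sketch of a CMake-based build is below; the `FAST_FLOAT` option name is our reading of the Tesseract build scripts and release notes, so check the project's own documentation for your version:

```shell
# Hypothetical build sketch: enable the fast-float mode at configure time.
# FAST_FLOAT is assumed to be the CMake option controlling float vs. double
# for the LSTM code; verify against your Tesseract checkout.
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
cmake -B build -DFAST_FLOAT=ON
cmake --build build
```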
Tests by Tesseract developers found that the fast float mode makes dot product operations about 50% faster, while other operations should also benefit from this new mode in Tesseract 5.0. More fast float optimizations are pending as well, including around AVX/AVX-512.
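The intuition behind the speedup is generic rather than Tesseract-specific: single-precision values are half the size of doubles, so twice as many fit in a SIMD register and in cache, at the cost of some precision. A small NumPy sketch (not Tesseract code) illustrates the memory halving and the typically small accuracy loss for a dot product:

```python
import numpy as np

# Build two large vectors in double precision, then downcast copies to float32.
rng = np.random.default_rng(0)
a64 = rng.standard_normal(1_000_000)
b64 = rng.standard_normal(1_000_000)
a32 = a64.astype(np.float32)
b32 = b64.astype(np.float32)

# float32 storage is exactly half the size of float64 storage.
assert a32.nbytes * 2 == a64.nbytes

# The dot products agree closely despite the reduced precision;
# the float32 path is what SIMD units can process two-wide per lane.
d64 = np.dot(a64, b64)
d32 = float(np.dot(a32, b32))
print(a64.nbytes, a32.nbytes, abs(d64 - d32))
```

Whether the halved width translates into Tesseract's reported ~50% dot-product speedup depends on the vectorized code paths (e.g. the NEON and AVX kernels mentioned above), but the memory saving is unconditional.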
More details on the Tesseract 5.0 Beta release are available via GitHub.