Tesseract 5.0 Released For This Leading Open-Source OCR Engine
Tesseract 5.0 had been available as an alpha since the end of 2020, and the beta was released in August. On Tuesday, Tesseract 5.0.0 was officially released. Tesseract 5.0 delivers faster performance via "fast floats": it now uses floats instead of doubles for its LSTM model training and text recognition. This should lead to much faster training and OCR performance while using less system memory.
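To give a rough sense of why switching from doubles to floats saves memory, here is a minimal Python sketch. It is not Tesseract code: the buffer size and the `buffer_bytes` helper are hypothetical, chosen only for illustration, and it assumes the usual 4-byte single-precision and 8-byte double-precision layouts.

```python
from array import array

# Hypothetical number of network weights; not taken from any Tesseract model.
NUM_WEIGHTS = 1_000_000


def buffer_bytes(typecode: str, count: int) -> int:
    """Return the raw storage size of a zero-filled numeric buffer."""
    buf = array(typecode, [0.0]) * count
    return buf.itemsize * len(buf)


float_bytes = buffer_bytes("f", NUM_WEIGHTS)   # single precision
double_bytes = buffer_bytes("d", NUM_WEIGHTS)  # double precision

print(f"float buffer:  {float_bytes} bytes")
print(f"double buffer: {double_bytes} bytes")
```

Halving the size of each value also means more values fit in a CPU cache line or SIMD register, which is one plausible reason a float-based code path can run faster, though the actual gains depend on the hardware and workload.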
Tesseract 5.0 also has native support for Apple Silicon, build system enhancements, API improvements for its library, better ARM support, and more. There are also other code improvements besides fast floats that should further help Tesseract's OCR performance.
Tesseract development originated at HP decades ago before being open-sourced in 2005. Google took over development of this OCR engine after it was open-sourced, but in 2018 the company scaled back its contributions, which seems to be partly why Tesseract 5.0 took so long to materialize. Much of Tesseract's recent activity has come from Stefan Weil of UB Mannheim.
Tesseract 5.0 downloads and more details on this big open-source OCR update are available via GitHub.