itext/itext-pdfocr-java
pdfOCR is an iText add-on that recognizes and extracts text from scanned documents and images. It can also convert them into fully ISO-compliant PDF or PDF/A-3u files that are accessible, searchable, and suitable for archiving.
GitHub repository with 44 stars and 8 forks.
Language: Java
Topics: archival, character, data, diacritic, extractable, glyphs, hindi, image, iso-compliant, ligatures