kreuzberg-dev/html-to-markdown
High-performance, CommonMark-compliant HTML-to-Markdown converter, maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.
GitHub repository with 767 stars and 58 forks.
Language: HTML
Topics: hocr, html, html-converter, markdown, markdown-converter, rag, text-extraction, text-processing