A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.
GitHub repository with 8,443 stars and 497 forks.
Trending score: 1.04; stars gained: +11; forks gained: +0.
Language: Rust
Topics: text-extraction, document-intelligence, metadata-extraction, pdf-extraction, pdfium, python