opendatalab/MinerU
Transforms complex documents such as PDFs and Office docs into LLM-ready Markdown/JSON for your agentic workflows.
GitHub repository with 66,521 stars and 5,608 forks.
Language: Python
Topics: extract-data, layout-analysis, ocr, parser, pdf, pdf-converter, python, document-analysis, pdf-parser, pdf-extractor-llm