Zero-config entity resolution that scales from a single CSV to 100M+ rows on a Ray cluster (verified: 100M rows deduplicated in 213 s, 0.30 GB on the driver). Fuzzy, exact, and probabilistic deduplication; identity graph; privacy-preserving record linkage (PPRL); LLM boost. Python, with a full TypeScript port; SQL-native in PostgreSQL and DuckDB; MCP and REST servers; dbt and Airflow recipes.
GitHub repository with 79 stars and 10 forks.
Trending score: 0.94; stars gained: +8; forks gained: +1.
Language: Python
Topics: data-engineering, data-quality, deduplication, entity-resolution, fuzzy-matching, llm
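
As a rough illustration of what the description means by combining exact and fuzzy matching into an identity graph, here is a minimal standard-library sketch. It is not this project's API: `normalize`, `find`, `dedupe`, the `name`/`email` fields, and the 0.85 threshold are all hypothetical choices made for the example. It compares every pair of records, so it only suits small inputs; a system working at the scale quoted above would need some candidate-reduction step that this sketch leaves out.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(value.lower().split())


def find(parent: list[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def dedupe(records: list[dict], threshold: float = 0.85) -> list[list[int]]:
    """Group record indices that match exactly on email or fuzzily on name.

    Matches are merged transitively, so each returned cluster is one
    connected component of the (hypothetical) identity graph.
    """
    parent = list(range(len(records)))
    for a, b in combinations(range(len(records)), 2):
        # Exact rule: normalized emails are identical.
        exact = normalize(records[a]["email"]) == normalize(records[b]["email"])
        # Fuzzy rule: name similarity ratio meets the assumed threshold.
        similarity = SequenceMatcher(
            None, normalize(records[a]["name"]), normalize(records[b]["name"])
        ).ratio()
        if exact or similarity >= threshold:
            parent[find(parent, a)] = find(parent, b)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(parent, i), []).append(i)
    return sorted(clusters.values())
```

Calling `dedupe` on a list of dicts with `name` and `email` keys returns clusters of row indices. Because merges are transitive, two records can land in the same cluster without matching each other directly, as long as each matches a shared third record.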