moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
GitHub repository with 2,207 stars and 240 forks.
Language: Python
Topics: data-matching, data-science, deduplicate-data, deduplication, duckdb, em-algorithm, entity-resolution, fuzzy-matching, record-linkage, spark