sophie-nguyenthuthuy/data-engineering
100+ data engineering projects built from scratch: streaming, CDC, table formats, query engines, consensus, governance. 2,500+ tests, type-checked with mypy in strict mode.
GitHub repository with 46 stars and 22 forks.
Language: Python
Topics: cdc, data-engineering, delta-lake, iceberg, kafka, lsm-tree, python, query-optimizer, raft, streaming