datachain-ai/datachain
The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, and Azure
GitHub repository with 2,777 stars and 145 forks.
Language: Python
Topics: claude-code, codex, data-processing, harness-engineering, mlops, unstructured-data, data-context-layer, ai-agents, knowledge-base, multimodal