sophie-nguyenthuthuy/data-engineering

100+ data engineering projects from scratch — streaming, CDC, table formats, query engines, consensus, governance. 2,500+ tests, mypy strict.

GitHub repository with 46 stars and 22 forks.

Language: Python

Topics: cdc, data-engineering, delta-lake, iceberg, kafka, lsm-tree, python, query-optimizer, raft, streaming

24h trend summary

Trending score 0.71, freshness score 0.59, stars gained +3, forks gained +0.

Latest metric snapshot

2026-06-15: 46 stars and 22 forks.

Similar repositories

  1. sophie-nguyenthuthuy/data-engineering

    100+ data engineering projects from scratch — streaming, CDC, table formats, query engines, consensus, governance. 2,500+ tests, mypy strict.

    GitHub repository with 46 stars and 22 forks.

    Trending score: 0.71; stars gained: +3; forks gained: +0.

    Language: Python

    Topics: cdc, data-engineering, delta-lake, iceberg, kafka, lsm-tree

  2. Casheu1/perplexity-2api-python

    🔍 Connect Perplexity's powerful search capabilities easily to your AI applications with this versatile Python API package.

    GitHub repository with 11 stars and 2 forks.

    Trending score: 0.56; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: cdc, chatbot, coronavirus-tracking, discord, discord-server, disease

  3. confluentinc/agent-skills

    AI agent skills for stream processing and event streaming

    GitHub repository with 36 stars and 2 forks.

    Trending score: 0.36; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: ai, confluent, flink, kafka, skills, cdc

  4. Frostbound-northsea978/api2cursor

    Enable Cursor to use any LLM model API by converting and forwarding requests between Cursor formats and third-party proxy protocols.

    GitHub repository with 7 stars and 1 fork.

    Trending score: 0.24; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: actions, blockchain, capture, cdc, coronavirus, covid

Trending in Python

  1. harry0703/MoneyPrinterTurbo

    Generate high-definition short videos with one click using large AI models.

    GitHub repository with 88,031 stars and 12,625 forks.

    Trending score: 6.02; stars gained: +1,097; forks gained: +218.

    Language: Python

    Topics: ai, automation, chatgpt, moviepy, python, shortvideo

  2. pewdiepie-archdaemon/odysseus

    Self-hosted AI workspace.

    GitHub repository with 71,420 stars and 9,105 forks.

    Trending score: 5.98; stars gained: +834; forks gained: +140.

    Language: Python

  3. NousResearch/hermes-agent

    The agent that grows with you

    GitHub repository with 194,087 stars and 33,984 forks.

    Trending score: 5.92; stars gained: +753; forks gained: +209.

    Language: Python

    Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

  4. NVIDIA/SkillSpector

    Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

    GitHub repository with 5,654 stars and 427 forks.

    Trending score: 5.61; stars gained: +874; forks gained: +76.

    Language: Python

  5. rohitg00/ai-engineering-from-scratch

    Learn it. Build it. Ship it for others.

    GitHub repository with 32,676 stars and 5,366 forks.

    Trending score: 5.59; stars gained: +762; forks gained: +135.

    Language: Python

    Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course

  6. Agents365-ai/drawio-skill

    Generate draw.io diagrams from natural language — 6 presets, vision self-check + up to 5-round refinement, codebase-to-diagram, 10,000+ official shapes & 321 AI/LLM brand logos. Exports PNG/SVG/PDF/JPG.

    GitHub repository with 3,445 stars and 240 forks.

    Trending score: 5.51; stars gained: +1,369; forks gained: +113.

    Language: Python

    Topics: agent-skill, agent-skills, architecture-diagram, claude-code, claude-code-skill, claude-skills

Trending topic: cdc

  1. debezium/debezium

    Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues.

    GitHub repository with 12,820 stars and 2,954 forks.

    Trending score: 1.81; stars gained: +7; forks gained: +0.

    Language: Java

    Topics: apache-kafka, cdc, change-data-capture, data-pipeline, database, debezium

  2. supabase/realtime

    Broadcast, Presence, and Postgres Changes via WebSockets

    GitHub repository with 7,580 stars and 441 forks.

    Trending score: 0.88; stars gained: +1; forks gained: -1.

    Language: Elixir

    Topics: elixir, postgres, postgresql, realtime, phoenix, phoenix-framework

  3. PeerDB-io/peerdb

    A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.

    GitHub repository with 3,144 stars and 191 forks.

    Trending score: 0.85; stars gained: +1; forks gained: +0.

    Language: Go

    Topics: bigquery, cdc, clickhouse, cloud-native, distributed-systems, etl

  4. apache/seatunnel

    SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.

    GitHub repository with 9,380 stars and 2,269 forks.

    Trending score: 0.81; stars gained: -1; forks gained: +1.

    Language: Java

    Topics: data-integration, high-performance, offline, real-time, apache, batch

  5. sophie-nguyenthuthuy/data-engineering

    100+ data engineering projects from scratch — streaming, CDC, table formats, query engines, consensus, governance. 2,500+ tests, mypy strict.

    GitHub repository with 46 stars and 22 forks.

    Trending score: 0.71; stars gained: +3; forks gained: +0.

    Language: Python

    Topics: cdc, data-engineering, delta-lake, iceberg, kafka, lsm-tree

  6. Casheu1/perplexity-2api-python

    🔍 Connect Perplexity's powerful search capabilities easily to your AI applications with this versatile Python API package.

    GitHub repository with 11 stars and 2 forks.

    Trending score: 0.56; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: cdc, chatbot, coronavirus-tracking, discord, discord-server, disease