GovHub-br/data-application-gov-hub
Gov-Hub Data Pipeline
GitHub repository with 21 stars and 30 forks.
Language: Python
Topics: data-engineering, datascience, gov, government, government-data
2026-06-15: 21 stars and 30 forks.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
GitHub repository with 45,814 stars and 17,240 forks.
Trending score: 2.84; stars gained: +16; forks gained: +8.
Language: Python
Topics: airflow, apache, apache-airflow, automation, dag, data-engineering
Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both self-hosted and Cloud.
GitHub repository with 21,457 stars and 5,220 forks.
Trending score: 2.29; stars gained: +9; forks gained: +0.
Language: Python
Topics: data, pipeline, data-analysis, data-engineering, java, python
LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
GitHub repository with 6,656 stars and 631 forks.
Trending score: 1.90; stars gained: +8; forks gained: +1.
Language: Python
Topics: ai-agents, data-engineering, document-ai, generative-ai, idp, json-extraction
Zero-config entity resolution. The zero-tuning Fellegi-Sunter path beats hand-rolled Splink head-to-head; scales from a CSV to a verified 100M-row dedupe in 9.2 min on Ray. Fuzzy/exact/probabilistic + PPRL + LLM, identity graph. Python + edge-safe TypeScript (optional WASM), SQL-native in Postgres & DuckDB, MCP/REST + dbt/Airflow.
GitHub repository with 110 stars and 10 forks.
Trending score: 1.07; stars gained: +1; forks gained: +0.
Language: Python
Topics: active-learning, agent, airflow, auto-config, data-engineering, data-quality
100+ data engineering projects from scratch — streaming, CDC, table formats, query engines, consensus, governance. 2,500+ tests, mypy strict.
GitHub repository with 46 stars and 22 forks.
Trending score: 0.71; stars gained: +3; forks gained: +0.
Language: Python
Topics: cdc, data-engineering, delta-lake, iceberg, kafka, lsm-tree
A discipline harness for AI-assisted analytics: agent skills for every moment a number gets built, broken, or trusted — requirements, definitions, audits, triage, migrations, dashboards, briefs — every claim carrying its provenance in one living knowledge base.
GitHub repository with 8 stars and 0 forks.
Trending score: 0.58; stars gained: +2; forks gained: +0.
Language: Python
Topics: ai-agents, analytics, business-intelligence, claude-code, claude-code-plugin, data-engineering
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
GitHub repository with 27,902 stars and 1,891 forks.
Trending score: 6.49; stars gained: +2,776; forks gained: +250.
Language: Python
Topics: agent, ai, anthropic, claude-code, compression, context-engineering
Generate high-definition short videos with one click using large AI models (LLMs).
GitHub repository with 87,926 stars and 12,612 forks.
Trending score: 6.02; stars gained: +1,097; forks gained: +218.
Language: Python
Topics: ai, automation, chatgpt, moviepy, python, shortvideo
Self-hosted AI workspace.
GitHub repository with 71,291 stars and 9,086 forks.
Trending score: 5.98; stars gained: +834; forks gained: +140.
Language: Python
The agent that grows with you
GitHub repository with 193,883 stars and 33,934 forks.
Trending score: 5.92; stars gained: +753; forks gained: +209.
Language: Python
Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.
GitHub repository with 5,654 stars and 427 forks.
Trending score: 5.61; stars gained: +874; forks gained: +76.
Language: Python
Learn it. Build it. Ship it for others.
GitHub repository with 32,527 stars and 5,342 forks.
Trending score: 5.59; stars gained: +762; forks gained: +135.
Language: Python
Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course
ktx is an executable context layer for data and analytics agents 🐙 Allow Claude Code, Codex, or other AI agents to query data accurately and with full context of your company
GitHub repository with 1,204 stars and 64 forks.
Trending score: 3.00; stars gained: +21; forks gained: +2.
Language: TypeScript
Topics: agent, agent-skills, agents, ai-agent, ai-agents, analytics
Modern SeaTunnel Web UI with visual DAG pipelines, batch & streaming sync, connector management, built-in metrics, and runtime logs.
GitHub repository with 526 stars and 51 forks.
Trending score: 2.92; stars gained: +22; forks gained: +3.
Language: TypeScript
Topics: batch, dag, data-engineering, data-integration, data-pipeline, etl
Apache Superset is a Data Visualization and Data Exploration Platform
GitHub repository with 73,296 stars and 17,613 forks.
Trending score: 2.83; stars gained: +16; forks gained: +7.
Language: TypeScript
Topics: analytics, apache, apache-superset, asf, bi, business-analytics
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
GitHub repository with 2,955 stars and 172 forks.
Trending score: 2.70; stars gained: +21; forks gained: +0.
Language: Rust
Topics: apache-iceberg, apache-spark, arrow, artificial-intelligence, big-data, data-engineering
Event Driven Orchestration & Scheduling Platform for Mission Critical Applications
GitHub repository with 27,058 stars and 2,621 forks.
Trending score: 2.43; stars gained: +12; forks gained: +6.
Language: Java
Topics: ai-agents, automation, control-plane, data-engineering, data-orchestration, data-orchestrator