opendata-lab/opendataworks
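opendataworks is a unified data portal for big-data platforms. Built on open-source projects such as DolphinScheduler and Doris, it aims to give enterprises a one-stop solution for data asset management, task scheduling and orchestration, and data lineage tracking.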
GitHub repository with 39 stars and 13 forks.
Language: Java
Topics: data-engineering, datax, dolphinscheduler, doris
2026-06-05: 39 stars and 13 forks.
Agentic Data Engineering Harness for building data pipelines, data products, data APIs, and data lakes autonomously
GitHub repository with 213 stars and 25 forks.
Trending score: 0.09; stars gained: +0; forks gained: +0.
Language: Java
Topics: streaming, data-pipeline, api, event-driven, data-engineering, harness
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
GitHub repository with 1,408 stars and 139 forks.
Trending score: 0.04; stars gained: +0; forks gained: -1.
Language: Java
Topics: alerting, bigdata, data-catalog, data-discovery, data-engineering, data-exploration
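WebHomeTV is a fork of FongMi that adds a WebHome custom home page, an App Native SDK, cloud-drive link detection, and a Nostr-recommended home page. The project's core goal is to turn a CSP site's home page into a genuinely programmable web application: developers customize the home page with HTML/CSS/JavaScript, then use the native capabilities exposed by the app to handle search, playback, cross-origin requests, resource proxying, recently watched history, cloud-drive detection, and state synchronization.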
GitHub repository with 385 stars and 108 forks.
Trending score: 3.29; stars gained: +83; forks gained: +16.
Language: Java
AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.
GitHub repository with 1,003 stars and 58 forks.
Trending score: 3.24; stars gained: +77; forks gained: +1.
Language: Java
Topics: ai-agent, financial-research, llm-evaluation, pgvector, postgresql, rabbitmq
Apache Kafka - A distributed event streaming platform
GitHub repository with 32,714 stars and 15,251 forks.
Trending score: 2.36; stars gained: +10; forks gained: +7.
Language: Java
Topics: scala, kafka, java, streaming
An Application Framework for AI Engineering
GitHub repository with 8,869 stars and 2,616 forks.
Trending score: 2.31; stars gained: +22; forks gained: +5.
Language: Java
Topics: artificial-intelligence, hacktoberfest, java, spring-ai
IntelliJ IDEA & IntelliJ Platform
GitHub repository with 20,185 stars and 5,924 forks.
Trending score: 2.29; stars gained: +16; forks gained: +4.
Language: Java
Topics: code-editor, ide, intellij, intellij-community, intellij-platform
Apache Beam is a unified programming model for Batch and Streaming data processing.
GitHub repository with 8,605 stars and 4,576 forks.
Trending score: 2.18; stars gained: +5; forks gained: +4.
Language: Java
Topics: batch, beam, big-data, golang, java, python
ktx is an executable context layer for data and analytics agents 🐙 Allow Claude Code, Codex, and any AI agent to query data accurately through MCP with skills, memory and a semantic layer
GitHub repository with 895 stars and 46 forks.
Trending score: 2.76; stars gained: +24; forks gained: +1.
Language: TypeScript
Topics: agent, agent-skills, agents, ai-agent, ai-agents, analytics
Local-first ETL/ELT studio: a drag-and-drop visual pipeline designer that compiles to SQL and runs on DuckDB. Tiny desktop app, no servers, git-friendly workspaces.
GitHub repository with 284 stars and 20 forks.
Trending score: 2.64; stars gained: +36; forks gained: +0.
Language: Rust
Topics: data-engineering, data-integration, data-pipeline, data-quality, desktop-app, drag-and-drop
SeaTunnel Web is a visual tool for building and watching over your Apache SeaTunnel data pipelines, with drag-and-drop DAGs and simple connector setup.
GitHub repository with 290 stars and 30 forks.
Trending score: 1.85; stars gained: +57; forks gained: +5.
Language: TypeScript
Topics: batch, dag, data-engineering, data-integration, data-pipeline, etl
Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real time, at scale.
GitHub repository with 9,065 stars and 776 forks.
Trending score: 1.62; stars gained: +2; forks gained: +0.
Language: Rust
Topics: apache-iceberg, data-engineering, database, etl-pipeline, event-streaming, kafka
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼
GitHub repository with 41,902 stars and 8,303 forks.
Trending score: 1.49; stars gained: +23; forks gained: +3.
Language: Jupyter Notebook
Topics: course, data-engineering, dbt, docker, free, kafka
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
GitHub repository with 5,436 stars and 523 forks.
Trending score: 0.98; stars gained: +8; forks gained: +0.
Language: Python
Topics: data, data-engineering, data-lake, data-loading, data-warehouse, elt