apache/ozone
Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
GitHub repository with 1,215 stars and 608 forks.
Language: Java
Topics: big-data, hadoop, kubernetes, object-store, s3, storage
2026-06-04: 1,215 stars and 608 forks.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
GitHub repository with 11,753 stars and 2,435 forks.
Trending score: 1.36; stars gained: +2; forks gained: +1.
Language: Java
Topics: database, olap, sql, analytics, big-data, realtime-database
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
GitHub repository with 3,290 stars and 1,328 forks.
Trending score: 1.26; stars gained: +3; forks gained: +1.
Language: Java
Topics: big-data, data-ingestion, flink, paimon, real-time-analytics, spark
Apache IoTDB
GitHub repository with 6,340 stars and 1,140 forks.
Trending score: 0.52; stars gained: +1; forks gained: +3.
Language: Java
Topics: timeseries, iot, big-data, java, database, nosql
The AI search platform
GitHub repository with 6,944 stars and 717 forks.
Trending score: 0.32; stars gained: +1; forks gained: +0.
Language: Java
Topics: vespa, search-engine, big-data, ai, serving-recommendation, machine-learning
Mirror of Apache Helix
GitHub repository with 498 stars and 250 forks.
Trending score: 0.32; stars gained: +0; forks gained: +0.
Language: Java
Topics: helix, java, big-data, cloud
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
GitHub repository with 361 stars and 233 forks.
Trending score: 0.19; stars gained: +0; forks gained: +0.
Language: Java
Topics: big-data, deployment, cloud, java, hadoop, cloudera
GitHub repository with 724 stars and 105 forks.
Trending score: 2.91; stars gained: +47; forks gained: +5.
Language: Java
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
GitHub repository with 23,671 stars and 2,206 forks.
Trending score: 2.84; stars gained: +700; forks gained: +69.
Language: Java
Topics: json, markdown, pdf, ai, document-parsing, html
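WebHomeTV is a secondary development based on FongMi that enhances the WebHome custom home page, the App Native SDK, cloud-drive link detection, and the Nostr-recommended home page. The project's core goal is to turn a CSP site's home page into a genuinely developable web application: developers can customize the home page with HTML/CSS/JavaScript, then use the Native capabilities exposed by the App to handle search, playback, cross-origin requests, resource proxying, recently watched items, cloud-drive detection, and state synchronization.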
GitHub repository with 353 stars and 105 forks.
Trending score: 2.73; stars gained: +30; forks gained: +5.
Language: Java
AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.
GitHub repository with 975 stars and 57 forks.
Trending score: 2.63; stars gained: +20; forks gained: +5.
Language: Java
Topics: ai-agent, financial-research, llm-evaluation, pgvector, postgresql, rabbitmq
Apache Kafka - A distributed event streaming platform
GitHub repository with 32,712 stars and 15,248 forks.
Trending score: 2.24; stars gained: +6; forks gained: +1.
Language: Java
Topics: scala, kafka, java, streaming
Apache Doris is an easy-to-use, high performance and unified analytics database.
GitHub repository with 15,433 stars and 3,811 forks.
Trending score: 2.23; stars gained: +5; forks gained: +0.
Language: Java
Topics: olap, database, hudi, iceberg, real-time, sql
ClickHouse® is a real-time analytics database management system
GitHub repository with 47,815 stars and 8,467 forks.
Trending score: 2.61; stars gained: +24; forks gained: +4.
Language: C++
Topics: ai, analytics, big-data, clickhouse, cloud-native, cpp
Apache Spark - A unified analytics engine for large-scale data processing
GitHub repository with 43,395 stars and 29,214 forks.
Trending score: 0.76; stars gained: +5; forks gained: -1.
Language: Scala
Topics: big-data, java, jdbc, python, r, scala
YTsaurus is a scalable and fault-tolerant open-source big data platform.
GitHub repository with 2,195 stars and 205 forks.
Trending score: 0.65; stars gained: +1; forks gained: +0.
Language: C++
Topics: big-data, clickhouse, distributed-database, lakehouse, olap-database, spark
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
GitHub repository with 8,835 stars and 2,107 forks.
Trending score: 0.60; stars gained: +3; forks gained: +0.
Language: Scala
Topics: acid, analytics, big-data, delta-lake, spark