apache/ozone

Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

GitHub repository with 1,215 stars and 608 forks.

Language: Java

Topics: big-data, hadoop, kubernetes, object-store, s3, storage


Latest metric snapshot

2026-06-04: 1,215 stars and 608 forks.

Similar repositories

  1. StarRocks/starrocks

    The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

    GitHub repository with 11,753 stars and 2,435 forks.

    Trending score: 1.36; stars gained: +2; forks gained: +1.

    Language: Java

    Topics: database, olap, sql, analytics, big-data, realtime-database

  2. apache/paimon

    Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

    GitHub repository with 3,290 stars and 1,328 forks.

    Trending score: 1.26; stars gained: +3; forks gained: +1.

    Language: Java

    Topics: big-data, data-ingestion, flink, paimon, real-time-analytics, spark

  3. apache/iotdb

    Apache IoTDB

    GitHub repository with 6,340 stars and 1,140 forks.

    Trending score: 0.52; stars gained: +1; forks gained: +3.

    Language: Java

    Topics: timeseries, iot, big-data, java, database, nosql

  4. vespa-engine/vespa

    The AI search platform

    GitHub repository with 6,944 stars and 717 forks.

    Trending score: 0.32; stars gained: +1; forks gained: +0.

    Language: Java

    Topics: vespa, search-engine, big-data, ai, serving-recommendation, machine-learning

  5. apache/helix

    Mirror of Apache Helix

    GitHub repository with 498 stars and 250 forks.

    Trending score: 0.32; stars gained: +0; forks gained: +0.

    Language: Java

    Topics: helix, java, big-data, cloud

  6. hortonworks/cloudbreak

    CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.

    GitHub repository with 361 stars and 233 forks.

    Trending score: 0.19; stars gained: +0; forks gained: +0.

    Language: Java

    Topics: big-data, deployment, cloud, java, hadoop, cloudera

Trending in Java

  1. Lucas0623z/NoteLite

    GitHub repository with 724 stars and 105 forks.

    Trending score: 2.91; stars gained: +47; forks gained: +5.

    Language: Java

  2. opendataloader-project/opendataloader-pdf

    PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

    GitHub repository with 23,671 stars and 2,206 forks.

    Trending score: 2.84; stars gained: +700; forks gained: +69.

    Language: Java

    Topics: json, markdown, pdf, ai, document-parsing, html

  3. fish2018/webhtv

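    WebHomeTV is a derivative of FongMi that adds a customizable WebHome home page, an App Native SDK, cloud-drive link detection, and a Nostr-recommended home page. The project's core goal is to turn a CSP site's home page into a genuinely developable web application: developers can customize the home page with HTML/CSS/JavaScript, then use the native capabilities exposed by the app to handle search, playback, cross-origin requests, resource proxying, recently watched items, cloud-drive detection, and state synchronization.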

    GitHub repository with 353 stars and 105 forks.

    Trending score: 2.73; stars gained: +30; forks gained: +5.

    Language: Java

  4. juanjuandog/FinSight-AI

    AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.

    GitHub repository with 975 stars and 57 forks.

    Trending score: 2.63; stars gained: +20; forks gained: +5.

    Language: Java

    Topics: ai-agent, financial-research, llm-evaluation, pgvector, postgresql, rabbitmq

  5. apache/kafka

    Apache Kafka - A distributed event streaming platform

    GitHub repository with 32,712 stars and 15,248 forks.

    Trending score: 2.24; stars gained: +6; forks gained: +1.

    Language: Java

    Topics: scala, kafka, java, streaming

  6. apache/doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

    GitHub repository with 15,433 stars and 3,811 forks.

    Trending score: 2.23; stars gained: +5; forks gained: +0.

    Language: Java

    Topics: olap, database, hudi, iceberg, real-time, sql

Trending topic: big-data

  1. ClickHouse/ClickHouse

    ClickHouse® is a real-time analytics database management system

    GitHub repository with 47,815 stars and 8,467 forks.

    Trending score: 2.61; stars gained: +24; forks gained: +4.

    Language: C++

    Topics: ai, analytics, big-data, clickhouse, cloud-native, cpp

  2. StarRocks/starrocks

    The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

    GitHub repository with 11,753 stars and 2,435 forks.

    Trending score: 1.36; stars gained: +2; forks gained: +1.

    Language: Java

    Topics: database, olap, sql, analytics, big-data, realtime-database

  3. apache/paimon

    Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

    GitHub repository with 3,290 stars and 1,328 forks.

    Trending score: 1.26; stars gained: +3; forks gained: +1.

    Language: Java

    Topics: big-data, data-ingestion, flink, paimon, real-time-analytics, spark

  4. apache/spark

    Apache Spark - A unified analytics engine for large-scale data processing

    GitHub repository with 43,395 stars and 29,214 forks.

    Trending score: 0.76; stars gained: +5; forks gained: -1.

    Language: Scala

    Topics: big-data, java, jdbc, python, r, scala

  5. ytsaurus/ytsaurus

    YTsaurus is a scalable and fault-tolerant open-source big data platform.

    GitHub repository with 2,195 stars and 205 forks.

    Trending score: 0.65; stars gained: +1; forks gained: +0.

    Language: C++

    Topics: big-data, clickhouse, distributed-database, lakehouse, olap-database, spark

  6. delta-io/delta

    An open-source storage framework that enables building a Lakehouse architecture with compute engines, including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs

    GitHub repository with 8,835 stars and 2,107 forks.

    Trending score: 0.60; stars gained: +3; forks gained: +0.

    Language: Scala

    Topics: acid, analytics, big-data, delta-lake, spark