apache/spark

Apache Spark - A unified analytics engine for large-scale data processing

GitHub repository with 43,397 stars and 29,213 forks.

Language: Scala

Topics: big-data, java, jdbc, python, r, scala, spark, sql

Latest metric snapshot

2026-06-05: 43,397 stars and 29,213 forks.

Similar repositories

  1. delta-io/delta

    An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

    GitHub repository with 8,835 stars and 2,107 forks.

    Trending score: 0.60; stars gained: +3; forks gained: -1.

    Language: Scala

    Topics: acid, analytics, big-data, delta-lake, spark

  2. NVIDIA/spark-rapids

    Spark RAPIDS plugin - accelerate Apache Spark with GPUs

    GitHub repository with 977 stars and 283 forks.

    Trending score: 0.36; stars gained: +0; forks gained: +0.

    Language: Scala

    Topics: big-data, gpu, rapids, spark

Trending in Scala

  1. lichess-org/lila

    ♞ lichess.org: the forever free, adless and open source chess server ♞

    GitHub repository with 18,311 stars and 2,682 forks.

    Trending score: 1.10; stars gained: +7; forks gained: +2.

    Language: Scala

    Topics: scala, chess, play-framework, non-profit, functional-programming, type-safe

  2. delta-io/delta

    An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

    GitHub repository with 8,835 stars and 2,107 forks.

    Trending score: 0.60; stars gained: +3; forks gained: -1.

    Language: Scala

    Topics: acid, analytics, big-data, delta-lake, spark

  3. playframework/playframework

    The Community Maintained High Velocity Web Framework For Java and Scala.

    GitHub repository with 12,621 stars and 4,031 forks.

    Trending score: 0.47; stars gained: +1; forks gained: +0.

    Language: Scala

    Topics: scala, java, reactive, web-framework, restful, play

  4. unum-io/tyda

    Typed Dataset API for Scala 3

    GitHub repository with 6 stars and 4 forks.

    Trending score: 0.42; stars gained: +1; forks gained: +2.

    Language: Scala

  5. softwaremill/chimp

    Build type-safe, boilerplate-less MCP servers and clients in Scala

    GitHub repository with 82 stars and 7 forks.

    Trending score: 0.42; stars gained: +1; forks gained: +0.

    Language: Scala

  6. starlake-ai/quack-on-demand

    Production-grade Arrow FlightSQL gateway in front of DuckDB Quack + DuckLake. Multi-tenant pools, pluggable auth (DB/JWT/OIDC), table-level ACLs, role-aware routing, and a live admin console

    GitHub repository with 21 stars and 1 fork.

    Trending score: 0.39; stars gained: +1; forks gained: +0.

    Language: Scala

Trending topic: big-data

  1. ClickHouse/ClickHouse

    ClickHouse® is a real-time analytics database management system

    GitHub repository with 47,823 stars and 8,467 forks.

    Trending score: 2.96; stars gained: +53; forks gained: +10.

    Language: C++

    Topics: ai, analytics, big-data, clickhouse, cloud-native, cpp

  2. StarRocks/starrocks

    The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

    GitHub repository with 11,756 stars and 2,435 forks.

    Trending score: 2.61; stars gained: +10; forks gained: +1.

    Language: Java

    Topics: analytics, big-data, cloudnative, database, datalake, delta-lake

  3. apache/beam

    Apache Beam is a unified programming model for Batch and Streaming data processing.

    GitHub repository with 8,605 stars and 4,577 forks.

    Trending score: 2.18; stars gained: +5; forks gained: +4.

    Language: Java

    Topics: python, java, big-data, beam, batch, golang

  4. apache/datafusion

    Apache DataFusion SQL Query Engine

    GitHub repository with 8,848 stars and 2,153 forks.

    Trending score: 2.07; stars gained: +6; forks gained: +3.

    Language: Rust

    Topics: arrow, big-data, dataframe, datafusion, olap, python

  5. Eventual-Inc/Daft

    High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

    GitHub repository with 5,546 stars and 483 forks.

    Trending score: 1.26; stars gained: +2; forks gained: +1.

    Language: Rust

    Topics: machine-learning, python, data-engineering, distributed-computing, rust, big-data

  6. vespa-engine/vespa

    The AI search platform

    GitHub repository with 6,946 stars and 717 forks.

    Trending score: 1.18; stars gained: +8; forks gained: +0.

    Language: Java

    Topics: vespa, search-engine, big-data, ai, serving-recommendation, machine-learning