zaratsian/Spark

Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References

GitHub repository with 69 stars and 37 forks.

Language: Jupyter Notebook

Topics: spark, pyspark, nlp, machine-learning, text-analysis

24h trend summary

Trending score 0.03, activity score 0.17, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-02: 69 stars and 37 forks.

Similar repositories

  1. DataTalksClub/data-engineering-zoomcamp

    Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

    GitHub repository with 41,902 stars and 8,303 forks.

    Trending score: 1.49; stars gained: +23; forks gained: +3.

    Language: Jupyter Notebook

    Topics: course, data-engineering, dbt, docker, free, kafka

  2. zaratsian/Spark

    Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References

    GitHub repository with 69 stars and 37 forks.

    Trending score: 0.03; stars gained: +0; forks gained: +0.

    Language: Jupyter Notebook

    Topics: spark, pyspark, nlp, machine-learning, text-analysis

  3. phelps-sg/python-bigdata

    Data science and Big Data with Python

    GitHub repository with 135 stars and 164 forks.

    Trending score: 0.00; stars gained: +0; forks gained: +0.

    Language: Jupyter Notebook

    Topics: data-science, python, hbase, numpy, numerical-methods, notebook-jupyter

Trending in Jupyter Notebook

  1. NVIDIA/cosmos

    NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

    GitHub repository with 9,194 stars and 588 forks.

    Trending score: 2.37; stars gained: +326; forks gained: +20.

    Language: Jupyter Notebook

  2. GoogleCloudPlatform/generative-ai

    Sample code and notebooks for Generative AI on Google Cloud, with Gemini Enterprise Agent Platform

    GitHub repository with 16,980 stars and 4,249 forks.

    Trending score: 1.87; stars gained: +8; forks gained: +6.

    Language: Jupyter Notebook

    Topics: generative-ai, llm, vertex-ai, langchain, gemini, gemini-api

  3. DataTalksClub/llm-zoomcamp

    LLM Zoomcamp - a free online course about real-life applications of LLMs. In 10 weeks you will learn how to build an AI system that answers questions about your knowledge base.

    GitHub repository with 5,614 stars and 1,015 forks.

    Trending score: 1.87; stars gained: +93; forks gained: +13.

    Language: Jupyter Notebook

  4. Biohub/esm

    GitHub repository with 2,682 stars and 332 forks.

    Trending score: 1.82; stars gained: +48; forks gained: +12.

    Language: Jupyter Notebook

  5. nerdai/llm-agents-from-scratch

    Build LLM agents and multi-agent systems from scratch, with MCP, Skills, and A2A

    GitHub repository with 131 stars and 45 forks.

    Trending score: 1.70; stars gained: +32; forks gained: +14.

    Language: Jupyter Notebook

  6. openai/openai-cookbook

    Examples and guides for using the OpenAI API

    GitHub repository with 73,986 stars and 12,524 forks.

    Trending score: 1.57; stars gained: +24; forks gained: +12.

    Language: Jupyter Notebook

    Topics: chatgpt, gpt-4, openai, openai-api

Trending topic: spark

  1. apache/doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

    GitHub repository with 15,437 stars and 3,812 forks.

    Trending score: 2.65; stars gained: +11; forks gained: +7.

    Language: Java

    Topics: agent, ai, bigquery, database, dbt, delta-lake

  2. DataTalksClub/data-engineering-zoomcamp

    Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

    GitHub repository with 41,902 stars and 8,303 forks.

    Trending score: 1.49; stars gained: +23; forks gained: +3.

    Language: Jupyter Notebook

    Topics: course, data-engineering, dbt, docker, free, kafka

  3. tobymao/sqlglot

    Python SQL Parser and Transpiler

    GitHub repository with 9,303 stars and 1,158 forks.

    Trending score: 0.95; stars gained: +5; forks gained: +3.

    Language: Python

    Topics: transpiler, sql, python, parser, optimizer, bigquery

  4. getredash/redash

    Make your company data-driven. Connect to any data source, and easily visualize, dashboard, and share your data.

    GitHub repository with 28,616 stars and 4,601 forks.

    Trending score: 0.88; stars gained: +3; forks gained: +1.

    Language: Python

    Topics: redash, python, visualization, analytics, bi, redshift

  5. lakehq/sail

    Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

    GitHub repository with 2,845 stars and 163 forks.

    Trending score: 0.88; stars gained: +7; forks gained: +0.

    Language: Rust

    Topics: apache-iceberg, apache-spark, arrow, artificial-intelligence, big-data, data-engineering

  6. ytsaurus/ytsaurus

    YTsaurus is a scalable and fault-tolerant open-source big data platform.

    GitHub repository with 2,195 stars and 205 forks.

    Trending score: 0.84; stars gained: +2; forks gained: +0.

    Language: C++

    Topics: big-data, clickhouse, distributed-database, lakehouse, olap-database, spark