zaratsian/Spark
Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References
GitHub repository with 69 stars and 37 forks.
Language: Jupyter Notebook
Topics: spark, pyspark, nlp, machine-learning, text-analysis
Trending score 0.03, activity score 0.17, stars gained +0, forks gained +0.
2026-06-02: 69 stars and 37 forks.
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼
GitHub repository with 41,902 stars and 8,303 forks.
Trending score: 1.49; stars gained: +23; forks gained: +3.
Language: Jupyter Notebook
Topics: course, data-engineering, dbt, docker, free, kafka
Data science and Big Data with Python
GitHub repository with 135 stars and 164 forks.
Trending score: 0.00; stars gained: +0; forks gained: +0.
Language: Jupyter Notebook
Topics: data-science, python, hbase, numpy, numerical-methods, notebook-jupyter
NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
GitHub repository with 9,194 stars and 588 forks.
Trending score: 2.37; stars gained: +326; forks gained: +20.
Language: Jupyter Notebook
Sample code and notebooks for Generative AI on Google Cloud, with Gemini Enterprise Agent Platform
GitHub repository with 16,980 stars and 4,249 forks.
Trending score: 1.87; stars gained: +8; forks gained: +6.
Language: Jupyter Notebook
Topics: generative-ai, llm, vertex-ai, langchain, gemini, gemini-api
LLM Zoomcamp - a free online course about real-life applications of LLMs. In 10 weeks you will learn how to build an AI system that answers questions about your knowledge base.
GitHub repository with 5,614 stars and 1,015 forks.
Trending score: 1.87; stars gained: +93; forks gained: +13.
Language: Jupyter Notebook
GitHub repository with 2,682 stars and 332 forks.
Trending score: 1.82; stars gained: +48; forks gained: +12.
Language: Jupyter Notebook
Build LLM agents and multi-agent systems from scratch, with MCP, Skills, and A2A
GitHub repository with 131 stars and 45 forks.
Trending score: 1.70; stars gained: +32; forks gained: +14.
Language: Jupyter Notebook
Examples and guides for using the OpenAI API
GitHub repository with 73,986 stars and 12,524 forks.
Trending score: 1.57; stars gained: +24; forks gained: +12.
Language: Jupyter Notebook
Topics: chatgpt, gpt-4, openai, openai-api
Apache Doris is an easy-to-use, high performance and unified analytics database.
GitHub repository with 15,437 stars and 3,812 forks.
Trending score: 2.65; stars gained: +11; forks gained: +7.
Language: Java
Topics: agent, ai, bigquery, database, dbt, delta-lake
Python SQL Parser and Transpiler
GitHub repository with 9,303 stars and 1,158 forks.
Trending score: 0.95; stars gained: +5; forks gained: +3.
Language: Python
Topics: transpiler, sql, python, parser, optimizer, bigquery
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
GitHub repository with 28,616 stars and 4,601 forks.
Trending score: 0.88; stars gained: +3; forks gained: +1.
Language: Python
Topics: redash, python, visualization, analytics, bi, redshift
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
GitHub repository with 2,845 stars and 163 forks.
Trending score: 0.88; stars gained: +7; forks gained: +0.
Language: Rust
Topics: apache-iceberg, apache-spark, arrow, artificial-intelligence, big-data, data-engineering
YTsaurus is a scalable and fault-tolerant open-source big data platform.
GitHub repository with 2,195 stars and 205 forks.
Trending score: 0.84; stars gained: +2; forks gained: +0.
Language: C++
Topics: big-data, clickhouse, distributed-database, lakehouse, olap-database, spark