linkedin/openhouse
Open Control Plane for Tables in Data Lakehouse
GitHub repository with 389 stars and 78 forks.
Language: Java
Topics: big-data, catalog, datalake, datalakehouse, declarative, iceberg, management, tables
2026-06-10: 389 stars and 78 forks.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
GitHub repository with 12,931 stars and 3,661 forks.
Trending score: 2.03; stars gained: +5; forks gained: -1.
Language: Java
Topics: analytics, big-data, data-science, database, databases, datalake
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
GitHub repository with 11,786 stars and 2,449 forks.
Trending score: 1.93; stars gained: +5; forks gained: +2.
Language: Java
Topics: analytics, big-data, cloudnative, database, datalake, delta-lake
Apache Hive
GitHub repository with 5,978 stars and 4,792 forks.
Trending score: 1.05; stars gained: +1; forks gained: -1.
Language: Java
Topics: apache, big-data, database, hadoop, hive, java
Apache Ignite
GitHub repository with 5,067 stars and 1,938 forks.
Trending score: 0.91; stars gained: +2; forks gained: -1.
Language: Java
Topics: big-data, cache, cloud, data-management-platform, database, distributed-sql-database
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
GitHub repository with 3,299 stars and 1,336 forks.
Trending score: 0.81; stars gained: +0; forks gained: +2.
Language: Java
Topics: big-data, data-ingestion, flink, paimon, real-time-analytics, spark
Apache Fluss is a streaming storage built for real-time analytics.
GitHub repository with 1,940 stars and 561 forks.
Trending score: 0.78; stars gained: +0; forks gained: +0.
Language: Java
Topics: streaming, fluss, lakehouse, real-time-analytics, big-data, hacktoberfest
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
GitHub repository with 25,066 stars and 2,364 forks.
Trending score: 4.94; stars gained: +514; forks gained: +54.
Language: Java
Topics: a11y, accessibility, ai, bounding-box, document-parsing, eaa
Ghidra is a software reverse engineering (SRE) framework
GitHub repository with 69,674 stars and 7,648 forks.
Trending score: 3.84; stars gained: +105; forks gained: +11.
Language: Java
Topics: disassembler, reverse-engineering, software-analysis
AgentScope Java: Agent-Oriented Programming for Building LLM Applications
GitHub repository with 3,837 stars and 819 forks.
Trending score: 3.82; stars gained: +104; forks gained: +22.
Language: Java
Topics: agent, agentic, agentic-ai, ai, llm
Agentic AI Framework for Java Developers
GitHub repository with 10,020 stars and 2,232 forks.
Trending score: 3.45; stars gained: +80; forks gained: +23.
Language: Java
Topics: artificial-intelligence, java, spring-ai, agentic, context-engineering, multi-agent
Ghidra MCP Server — 200+ MCP tools for AI-powered reverse engineering. GUI plugin + headless server, lazy tool loading, convention enforcement, batch operations, Ghidra Server integration, and Docker deployment.
GitHub repository with 2,440 stars and 32 forks.
Trending score: 3.42; stars gained: +86; forks gained: +6.
Language: Java
Topics: binary-analysis, ghidra, java, mcp, model-context-protocol, python
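Halo is a powerful, easy-to-use open-source website-building tool. From personal blogs and knowledge bases to corporate websites and online stores, Halo helps you build them with ease, meeting your diverse site-building needs in one place.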
GitHub repository with 39,039 stars and 10,297 forks.
Trending score: 3.32; stars gained: +60; forks gained: +9.
Language: Java
Topics: halo, cms, halocms, content-management-system, blog, blog-engine
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
GitHub repository with 2,964 stars and 173 forks.
Trending score: 2.70; stars gained: +21; forks gained: +0.
Language: Rust
Topics: apache-iceberg, apache-spark, arrow, artificial-intelligence, big-data, data-engineering
ClickHouse® is a real-time analytics database management system
GitHub repository with 48,009 stars and 8,511 forks.
Trending score: 2.67; stars gained: +11; forks gained: +4.
Language: C++
Topics: ai, analytics, big-data, clickhouse, cloud-native, cpp
Apache Spark - A unified analytics engine for large-scale data processing
GitHub repository with 43,456 stars and 29,225 forks.
Trending score: 2.10; stars gained: +9; forks gained: +4.
Language: Scala
Topics: big-data, java, jdbc, python, r, scala
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
GitHub repository with 12,931 stars and 3,661 forks.
Trending score: 2.03; stars gained: +5; forks gained: -1.
Language: Java
Topics: analytics, big-data, data-science, database, databases, datalake
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
GitHub repository with 11,786 stars and 2,449 forks.
Trending score: 1.93; stars gained: +5; forks gained: +2.
Language: Java
Topics: analytics, big-data, cloudnative, database, datalake, delta-lake
The Open Source Feature Store for AI/ML
GitHub repository with 7,092 stars and 1,344 forks.
Trending score: 1.70; stars gained: +3; forks gained: +3.
Language: Python
Topics: machine-learning, features, ml, big-data, feature-store, python