kkrugler/flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
GitHub repository with 53 stars and 18 forks.
Language: Java
Topics: web-crawler, web-crawling, crawler, crawling, spider, flink
2026-06-04: 53 stars and 18 forks.
Apache Nutch is an extensible and scalable web crawler
GitHub repository with 3,155 stars and 1,266 forks.
Trending score: 0.04; stars gained: +0; forks gained: +0.
Language: Java
Topics: apache, crawling, hadoop, java, nutch, web-crawler
GitHub repository with 722 stars and 104 forks.
Trending score: 2.91; stars gained: +47; forks gained: +5.
Language: Java
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
GitHub repository with 23,671 stars and 2,206 forks.
Trending score: 2.84; stars gained: +700; forks gained: +69.
Language: Java
Topics: json, markdown, pdf, ai, document-parsing, html
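WebHomeTV is a derivative of FongMi that adds WebHome custom home pages, an App Native SDK, cloud-drive link detection, and a Nostr recommendation home page. The project's core goal is to turn a CSP site's home page into a truly developable web application: developers can customize the home page with HTML/CSS/JavaScript, then use the Native capabilities exposed by the App to handle search, playback, cross-origin requests, resource proxying, recently watched history, cloud-drive detection, and state synchronization.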
GitHub repository with 353 stars and 105 forks.
Trending score: 2.73; stars gained: +30; forks gained: +5.
Language: Java
AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.
GitHub repository with 975 stars and 57 forks.
Trending score: 2.63; stars gained: +20; forks gained: +5.
Language: Java
Topics: ai-agent, financial-research, llm-evaluation, pgvector, postgresql, rabbitmq
Apache Kafka - A distributed event streaming platform
GitHub repository with 32,711 stars and 15,249 forks.
Trending score: 2.24; stars gained: +6; forks gained: +1.
Language: Java
Topics: java, kafka, scala, streaming
Apache Doris is an easy-to-use, high performance and unified analytics database.
GitHub repository with 15,433 stars and 3,811 forks.
Trending score: 2.23; stars gained: +5; forks gained: +0.
Language: Java
Topics: olap, database, hudi, iceberg, real-time, sql
The API to search, scrape, and interact with the web at scale. 🔥
GitHub repository with 128,674 stars and 7,663 forks.
Trending score: 4.43; stars gained: +426; forks gained: +10.
Language: TypeScript
Topics: ai, ai-agents, ai-crawler, ai-scraping, ai-search, crawler
Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
GitHub repository with 1,286 stars and 148 forks.
Trending score: 1.18; stars gained: +16; forks gained: +0.
Language: Rust
Topics: ai-agents, cli, llm, markdown, mcp, rust
Low latency web data collector
GitHub repository with 2,522 stars and 207 forks.
Trending score: 0.49; stars gained: +2; forks gained: +1.
Language: Rust
Topics: crawler, rust, spider, headless-chrome, scraping, automation
Golang Crawling and scraping framework
GitHub repository with 200 stars and 26 forks.
Trending score: 0.35; stars gained: +1; forks gained: +0.
Language: Go
Topics: go-framework, golang, web-crawler, web-scraping, crawler, go
High-performance web crawling engine with bindings for 11 languages
GitHub repository with 103 stars and 13 forks.
Trending score: 0.33; stars gained: +1; forks gained: +0.
Language: Rust
Topics: crawling, csharp, elixir, ffi, golang, java
Powerful Google Maps Crawler / Scraper tool with REST API, Docker support & multi-format export
GitHub repository with 74 stars and 7 forks.
Trending score: 0.28; stars gained: +1; forks gained: +0.
Language: Python
Topics: browser-automation, data-extraction, docker, google, google-maps, maps