kkrugler/flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

GitHub repository with 53 stars and 18 forks.

Language: Java

Topics: web-crawler, web-crawling, crawler, crawling, spider, flink


Latest metric snapshot

2026-06-04: 53 stars and 18 forks.

Similar repositories

1. apache/nutch

Apache Nutch is an extensible and scalable web crawler

GitHub repository with 3,155 stars and 1,266 forks.

Trending score: 0.04; stars gained: +0; forks gained: +0.

Language: Java

Topics: apache, crawling, hadoop, java, nutch, web-crawler

Trending in Java

1. Lucas0623z/NoteLite

GitHub repository with 722 stars and 104 forks.

Trending score: 2.91; stars gained: +47; forks gained: +5.

Language: Java
2. opendataloader-project/opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

GitHub repository with 23,671 stars and 2,206 forks.

Trending score: 2.84; stars gained: +700; forks gained: +69.

Language: Java

Topics: json, markdown, pdf, ai, document-parsing, html
3. fish2018/webhtv

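WebHomeTV is a derivative of FongMi that adds a customizable WebHome homepage, an App Native SDK, cloud-drive link detection, and a Nostr-based recommendation homepage. The project's core goal is to turn a CSP site's homepage into a fully developable web application. Developers can customize the homepage with HTML/CSS/JavaScript, then use the native capabilities exposed by the app for search, playback, cross-origin requests, resource proxying, recently watched items, cloud-drive detection, and state synchronization.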

GitHub repository with 353 stars and 105 forks.

Trending score: 2.73; stars gained: +30; forks gained: +5.

Language: Java
4. juanjuandog/FinSight-AI

AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.

GitHub repository with 975 stars and 57 forks.

Trending score: 2.63; stars gained: +20; forks gained: +5.

Language: Java

Topics: ai-agent, financial-research, llm-evaluation, pgvector, postgresql, rabbitmq
5. apache/kafka

Apache Kafka - A distributed event streaming platform

GitHub repository with 32,711 stars and 15,249 forks.

Trending score: 2.24; stars gained: +6; forks gained: +1.

Language: Java

Topics: java, kafka, scala, streaming
6. apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

GitHub repository with 15,433 stars and 3,811 forks.

Trending score: 2.23; stars gained: +5; forks gained: +0.

Language: Java

Topics: olap, database, hudi, iceberg, real-time, sql

Trending topic: web-crawler

1. firecrawl/firecrawl

2. 0xMassi/webclaw

3. spider-rs/spider

4. gosom/scrapemate

5. kreuzberg-dev/kreuzcrawl

6. Ayyouboss0011/SherlockMaps