kkrugler/flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

GitHub repository with 53 stars and 18 forks.

Language: Java

Topics: web-crawler, web-crawling, crawler, crawling, spider, flink


Latest metric snapshot

2026-06-04: 53 stars and 18 forks.

Similar repositories

  1. apache/nutch

    Apache Nutch is an extensible and scalable web crawler

    GitHub repository with 3,155 stars and 1,266 forks.

    Trending score: 0.04; stars gained: +0; forks gained: +0.

    Language: Java

    Topics: apache, crawling, hadoop, java, nutch, web-crawler

Trending in Java

  1. Lucas0623z/NoteLite

    GitHub repository with 722 stars and 104 forks.

    Trending score: 2.91; stars gained: +47; forks gained: +5.

    Language: Java

  2. opendataloader-project/opendataloader-pdf

    PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

    GitHub repository with 23,671 stars and 2,206 forks.

    Trending score: 2.84; stars gained: +700; forks gained: +69.

    Language: Java

    Topics: json, markdown, pdf, ai, document-parsing, html

  3. fish2018/webhtv

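    WebHomeTV is built on FongMi and adds a customizable WebHome home page, an App Native SDK, cloud-drive link detection, and a Nostr-recommended home page. The project's core goal is to turn a CSP site's home page into a genuinely developable web application: developers can customize the home page with HTML/CSS/JavaScript, then use the Native capabilities exposed by the App for search, playback, cross-origin requests, resource proxying, recently watched items, cloud-drive detection, and state synchronization.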

    GitHub repository with 353 stars and 105 forks.

    Trending score: 2.73; stars gained: +30; forks gained: +5.

    Language: Java

  4. juanjuandog/FinSight-AI

    AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.

    GitHub repository with 975 stars and 57 forks.

    Trending score: 2.63; stars gained: +20; forks gained: +5.

    Language: Java

    Topics: ai-agent, financial-research, llm-evaluation, pgvector, postgresql, rabbitmq

  5. apache/kafka

    Apache Kafka - A distributed event streaming platform

    GitHub repository with 32,711 stars and 15,249 forks.

    Trending score: 2.24; stars gained: +6; forks gained: +1.

    Language: Java

    Topics: java, kafka, scala, streaming

  6. apache/doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

    GitHub repository with 15,433 stars and 3,811 forks.

    Trending score: 2.23; stars gained: +5; forks gained: +0.

    Language: Java

    Topics: olap, database, hudi, iceberg, real-time, sql

Trending topic: web-crawler

  1. firecrawl/firecrawl

    The API to search, scrape, and interact with the web at scale. 🔥

    GitHub repository with 128,674 stars and 7,663 forks.

    Trending score: 4.43; stars gained: +426; forks gained: +10.

    Language: TypeScript

    Topics: ai, ai-agents, ai-crawler, ai-scraping, ai-search, crawler

  2. 0xMassi/webclaw

    Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

    GitHub repository with 1,286 stars and 148 forks.

    Trending score: 1.18; stars gained: +16; forks gained: +0.

    Language: Rust

    Topics: ai-agents, cli, llm, markdown, mcp, rust

  3. spider-rs/spider

    Low latency web data collector

    GitHub repository with 2,522 stars and 207 forks.

    Trending score: 0.49; stars gained: +2; forks gained: +1.

    Language: Rust

    Topics: crawler, rust, spider, headless-chrome, scraping, automation

  4. gosom/scrapemate

    Golang Crawling and scraping framework

    GitHub repository with 200 stars and 26 forks.

    Trending score: 0.35; stars gained: +1; forks gained: +0.

    Language: Go

    Topics: go-framework, golang, web-crawler, web-scraping, crawler, go

  5. kreuzberg-dev/kreuzcrawl

    High-performance web crawling engine with bindings for 11 languages

    GitHub repository with 103 stars and 13 forks.

    Trending score: 0.33; stars gained: +1; forks gained: +0.

    Language: Rust

    Topics: crawling, csharp, elixir, ffi, golang, java

  6. Ayyouboss0011/SherlockMaps

    Powerful Google Maps Crawler / Scraper tool with REST API, Docker support & multi-format export

    GitHub repository with 74 stars and 7 forks.

    Trending score: 0.28; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: browser-automation, data-extraction, docker, google, google-maps, maps