commoncrawl/web-languages

Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.

GitHub repository with 69 stars and 93 forks.

Topics: crawling, dataset, language-detection

24h trend summary

Trending score 0.42, activity score 0.67, stars gained +1, forks gained +2.

Latest metric snapshot

2026-06-05: 69 stars and 93 forks.

Similar repositories

1. apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

GitHub repository with 9,142 stars and 747 forks.

Trending score: 1.08; stars gained: +7; forks gained: -1.

Language: Python

Topics: apify, automation, beautifulsoup, crawler, crawling, headless
2. kreuzberg-dev/kreuzcrawl

High-performance web crawling engine with bindings for 11 languages

GitHub repository with 103 stars and 13 forks.

Trending score: 0.49; stars gained: +2; forks gained: +1.

Language: Rust

Topics: crawling, csharp, elixir, ffi, golang, java
3. commoncrawl/web-languages

Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.

GitHub repository with 69 stars and 93 forks.

Trending score: 0.42; stars gained: +1; forks gained: +2.

Topics: crawling, dataset, language-detection
4. MarshalX/telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

GitHub repository with 351 stars and 46 forks.

Trending score: 0.41; stars gained: +1; forks gained: +0.

Language: Python

Topics: crawler, parser, telegram, telegram-org, telegram-updates, crawling
5. ScrapeRouter/awesome-scraping

The definitive list of the latest libraries, tools, APIs and providers for web scraping. The only daily-updated collection of web scraping resources.

GitHub repository with 9 stars and 2 forks.

Trending score: 0.18; stars gained: +0; forks gained: +0.

Language: Python

Topics: ai-scraping, awesome, crawler, crawling, http, scraper
6. bitmakerla/estela

estela, an elastic web scraping cluster 🕸

GitHub repository with 196 stars and 18 forks.

Trending score: 0.09; stars gained: +0; forks gained: +0.

Language: TypeScript

Topics: crawling, django, kubernetes, python, react, scraping

Trending topic: crawling