commoncrawl/web-languages

Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.

GitHub repository with 69 stars and 93 forks.

Topics: crawling, dataset, language-detection


24h trend summary

Trending score 0.42, activity score 0.67, stars gained +1, forks gained +2.

Latest metric snapshot

2026-06-05: 69 stars and 93 forks.

Similar repositories

  1. apify/crawlee-python

    Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

    GitHub repository with 9,142 stars and 747 forks.

    Trending score: 1.08; stars gained: +7; forks gained: -1.

    Language: Python

    Topics: apify, automation, beautifulsoup, crawler, crawling, headless

  2. kreuzberg-dev/kreuzcrawl

    High-performance web crawling engine with bindings for 11 languages

    GitHub repository with 103 stars and 13 forks.

    Trending score: 0.49; stars gained: +2; forks gained: +1.

    Language: Rust

    Topics: crawling, csharp, elixir, ffi, golang, java

  3. commoncrawl/web-languages

    Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.

    GitHub repository with 69 stars and 93 forks.

    Trending score: 0.42; stars gained: +1; forks gained: +2.

    Topics: crawling, dataset, language-detection

  4. MarshalX/telegram-crawler

    🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

    GitHub repository with 351 stars and 46 forks.

    Trending score: 0.41; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: crawler, parser, telegram, telegram-org, telegram-updates, crawling

  5. ScrapeRouter/awesome-scraping

    The definitive list of the latest libraries, tools, APIs and providers for web scraping. The only daily-updated collection of web scraping resources.

    GitHub repository with 9 stars and 2 forks.

    Trending score: 0.18; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: ai-scraping, awesome, crawler, crawling, http, scraper

  6. bitmakerla/estela

    estela, an elastic web scraping cluster 🕸

    GitHub repository with 196 stars and 18 forks.

    Trending score: 0.09; stars gained: +0; forks gained: +0.

    Language: TypeScript

    Topics: crawling, django, kubernetes, python, react, scraping

Trending topic: crawling
