Norconex/crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers that collect, parse, and manipulate data from the web or a filesystem and store it in various data repositories, such as search engines.
GitHub repository with 202 stars and 70 forks.
Language: Java
Topics: search-engine, web-crawler, java, collector-http, flexible, crawler, crawlers, filesystem-crawler, collector-fs