supercrawler
A web crawler for Node.js. Supercrawler crawls websites automatically; you define custom handlers to parse the content it fetches. It obeys robots.txt, rate limits, and concurrency limits.
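The rate-limit and concurrency-limit behaviour described above can be sketched in plain JavaScript. The `crawl` function, its options, and the stubbed fetch callback below are all illustrative inventions for this sketch, not supercrawler's real interface:

```javascript
// Illustrative sketch (NOT supercrawler's actual API): a queue that starts
// requests on a fixed interval (rate limit) and caps how many run at once
// (concurrency limit) -- the two limits described above.
function crawl(urls, { intervalMs, concurrency }, fetchFn) {
  return new Promise((resolve) => {
    const results = [];
    let inFlight = 0;
    let index = 0;
    const timer = setInterval(() => {
      // Respect the concurrency limit before starting another request.
      if (inFlight >= concurrency) return;
      if (index >= urls.length) {
        // All URLs dispatched; finish once outstanding requests drain.
        if (inFlight === 0) {
          clearInterval(timer);
          resolve(results);
        }
        return;
      }
      const url = urls[index++];
      inFlight++;
      Promise.resolve(fetchFn(url)).then((body) => {
        results.push({ url, body });
        inFlight--;
      });
    }, intervalMs);
  });
}

// Usage with a stubbed fetch function (no network access):
crawl(
  ["https://example.com/a", "https://example.com/b"],
  { intervalMs: 100, concurrency: 1 },
  (url) => Promise.resolve("<html>" + url + "</html>")
).then((results) => console.log(results.length)); // prints 2
```

In supercrawler itself these limits are configuration options on the crawler, and per-host politeness additionally comes from robots.txt; the sketch only shows the queueing idea.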
How to download and set up supercrawler
Open a terminal and run:
git clone https://github.com/brendonboshell/supercrawler.git
git clone creates a local copy of the supercrawler repository. You pass git clone a repository URL; git supports several network protocols and corresponding URL formats.
Alternatively, you can download supercrawler as a zip file: https://github.com/brendonboshell/supercrawler/archive/master.zip
Or clone supercrawler over SSH:
git clone [email protected]:brendonboshell/supercrawler.git
If you run into problems with supercrawler, you can open an issue on the project's issue tracker: https://github.com/brendonboshell/supercrawler/issues

Similar to supercrawler repositories
Here are some supercrawler alternatives and analogs:
scrapy, Sasila, Price-monitor, webmagic, colly, headless-chrome-crawler, Lulu, newcrawler, scrapple, goose-parser, arachnid, gopa, scrapy-zyte-smartproxy, node-crawler, arachni, newspaper, webster, spidy, N2H4, easy-scraping-tutorial, antch, pomp, talospider, podcastcrawler, FileMasta, lux, scrapy-redis, haipproxy, DotnetSpider, TumblThree