stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm

How to download and set up stormcrawler

Open a terminal and run:
git clone https://github.com/apache/stormcrawler.git
git clone creates a local copy (clone) of the stormcrawler repository. You pass git clone a repository URL.
Git supports several network protocols, each with its own URL format.

You can also download stormcrawler as a zip file: https://github.com/apache/stormcrawler/archive/master.zip

Or clone stormcrawler over SSH:
git@github.com:apache/stormcrawler.git
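
The download options above differ only in the URL handed to git or to a download tool. As a minimal sketch, the helper below prints the URL for each option. The function name stormcrawler_url and the method labels https, ssh and zip are hypothetical names chosen for illustration; they are not part of stormcrawler or git.

```shell
#!/bin/sh
# Hypothetical helper (not part of stormcrawler or git): prints the
# download URL for the requested method.
REPO="apache/stormcrawler"

stormcrawler_url() {
    case "$1" in
        https) printf '%s\n' "https://github.com/${REPO}.git" ;;
        ssh)   printf '%s\n' "git@github.com:${REPO}.git" ;;
        zip)   printf '%s\n' "https://github.com/${REPO}/archive/master.zip" ;;
        *)     printf 'unknown method: %s\n' "$1" >&2
               return 1 ;;
    esac
}

# Example usage (needs network access; the SSH form also needs an SSH key
# registered with GitHub):
#   git clone "$(stormcrawler_url https)"
#   git clone "$(stormcrawler_url ssh)"
#   curl -L -o stormcrawler.zip "$(stormcrawler_url zip)"
```

The HTTPS form works without any account setup, which is why it is usually the simplest choice for a read-only copy. The zip archive contains the files only, without the git history.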

If you have problems with stormcrawler

You can open an issue on the stormcrawler issue tracker: https://github.com/apache/stormcrawler/issues

Repositories similar to stormcrawler

Alternatives and analogs to stormcrawler:

 tensorflow    scrapy    CNTK    diaspora    Qix    handson-ml    Sasila    Price-monitor    infinit    diplomat    olric    qTox    LightGBM    h2o-3    catboost    distributed    tns    webmagic    colly    headless-chrome-crawler    scrapy-cluster    Lulu    newcrawler    scrapple    goose-parser    arachnid    crawler    scrapy-zyte-smartproxy    EvaEngine.js    dgraph