17 Forks
51 Stars
51 Watchers

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

How to download and set up flink-crawler

Open a terminal and run the following command:
git clone https://github.com/ScaleUnlimited/flink-crawler.git
git clone creates a local copy (a clone) of the flink-crawler repository. You pass git clone a repository URL;
it supports a few different network protocols and corresponding URL formats.

You can also download flink-crawler as a zip file: https://github.com/ScaleUnlimited/flink-crawler/archive/master.zip

Or clone flink-crawler over SSH by passing this URL to git clone:
git@github.com:ScaleUnlimited/flink-crawler.git
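
The three ways of fetching the repository listed above differ only in the URL that is used. As a rough illustration, the Python sketch below builds each form from the owner and repository name. The helper names fetch_urls and clone_command are hypothetical and not part of flink-crawler or git, and the default branch name "master" is taken from the zip link above.

```python
# Minimal sketch: build the HTTPS, SSH, and zip-archive URLs for a GitHub
# repository. fetch_urls and clone_command are hypothetical helper names
# introduced only for this illustration.

GITHUB_HOST = "github.com"


def fetch_urls(owner, repo, branch="master"):
    """Return the URL for each way of fetching the repository."""
    return {
        # HTTPS clone URL, as used with `git clone https://...`
        "https": "https://{}/{}/{}.git".format(GITHUB_HOST, owner, repo),
        # SSH clone URL in git's scp-like syntax (needs an SSH key on GitHub)
        "ssh": "git@{}:{}/{}.git".format(GITHUB_HOST, owner, repo),
        # Zip archive of a single branch, downloadable without git
        "zip": "https://{}/{}/{}/archive/{}.zip".format(
            GITHUB_HOST, owner, repo, branch
        ),
    }


def clone_command(url):
    """Return the git command (as an argument list) that clones the given URL."""
    return ["git", "clone", url]


if __name__ == "__main__":
    urls = fetch_urls("ScaleUnlimited", "flink-crawler")
    for name, url in urls.items():
        print(name, url)
    # Print the command rather than running it, so the sketch needs no network.
    print(" ".join(clone_command(urls["https"])))
```

The argument list returned by clone_command could be passed to subprocess.run to perform the clone; it is only printed here so that the example has no side effects.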

If you have problems with flink-crawler

You can open an issue on the flink-crawler issue tracker: https://github.com/ScaleUnlimited/flink-crawler/issues

Repositories similar to flink-crawler

The following projects are alternatives to flink-crawler:

 scrapy    Sasila    Price-monitor    webmagic    colly    headless-chrome-crawler    Lulu    newcrawler    scrapple    goose-parser    arachnid    gopa    scrapy-zyte-smartproxy    node-crawler    arachni    newspaper    isp-data-pollution    webster    cdp4j    spidy    stopstalk-deployment    N2H4    memorious    easy-scraping-tutorial    antch    pomp    Harvester    diffbot-php-client    talospider    corpuscrawler