nutch-solr-integration
A very small proof of concept (PoC) showing how to combine Apache Nutch and Apache Solr: Nutch crawls web pages and stores the results in Solr for querying.
How to download and set up nutch-solr-integration
Open a terminal and run:
git clone https://github.com/basraven/nutch-solr-integration.git
git clone creates a local copy (clone) of the nutch-solr-integration repository.
You pass git clone a repository URL; it supports several network protocols and corresponding URL formats.
You can also download nutch-solr-integration as a zip file: https://github.com/basraven/nutch-solr-integration/archive/master.zip
Or clone nutch-solr-integration over SSH:
git clone git@github.com:basraven/nutch-solr-integration.git
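The clone options above can be summarized in a small shell sketch. It only builds and prints the HTTPS and SSH URL forms for this repository. The variable names (REPO, HTTPS_URL, SSH_URL) are illustrative and not part of the project, and the clone commands are left commented out so nothing is fetched until you choose one.

```shell
#!/bin/sh
# Illustrative variable names; only the repository path comes from this page.
REPO="basraven/nutch-solr-integration"

# HTTPS form: works without an SSH key.
HTTPS_URL="https://github.com/${REPO}.git"

# SSH form: requires an SSH key registered with your GitHub account.
SSH_URL="git@github.com:${REPO}.git"

echo "HTTPS: ${HTTPS_URL}"
echo "SSH:   ${SSH_URL}"

# Uncomment exactly one of the following lines to clone:
# git clone "${HTTPS_URL}"
# git clone "${SSH_URL}"
```

Both forms produce the same working copy. The HTTPS form is usually the simplest choice for a read-only checkout.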
If you have problems with nutch-solr-integration
You can open an issue on the nutch-solr-integration issue tracker: https://github.com/basraven/nutch-solr-integration/issues
Repositories similar to nutch-solr-integration
Here are some alternatives and analogs to nutch-solr-integration:
scrapy Sasila colly headless-chrome-crawler Lulu gopa newspaper isp-data-pollution webster cdp4j spidy stopstalk-deployment N2H4 memorious easy-scraping-tutorial antch pomp Harvester diffbot-php-client talospider corpuscrawler Python-Crawling-Tutorial learn.scrapinghub.com crawling-projects dig-etl-engine crawlkit scrapy-selenium spidyquotes zcrawl podcastcrawler