robots.txt
A simple robots.txt template. It keeps unwanted robots out (disallow) and whitelists (allow) legitimate user agents. Useful for all websites.
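As a rough sketch of the pattern such a template follows (the user agents here are examples; the actual rules in the repository may differ), whitelisted crawlers get an empty Disallow, and every other robot is blocked:

# Whitelist a legitimate crawler
User-agent: Googlebot
Disallow:

# Keep all other robots out
User-agent: *
Disallow: /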
How to download and set up robots.txt
Open a terminal and run the following command:
git clone https://github.com/jonasjacek/robots.txt.git
git clone creates a local copy (clone) of the robots.txt repository.
You pass git clone a repository URL; it supports a few different network protocols and corresponding URL formats.
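For instance, the same repository can also be cloned into a directory of your choosing (the directory name below is an arbitrary example):

git clone https://github.com/jonasjacek/robots.txt.git my-robots-txt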
Alternatively, you may download a zip file of robots.txt from https://github.com/jonasjacek/robots.txt/archive/master.zip
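A minimal sketch, assuming curl and unzip are available on your system, for fetching and extracting the archive from the command line (the archive unpacks into a robots.txt-master directory):

curl -L -o robots.txt-master.zip https://github.com/jonasjacek/robots.txt/archive/master.zip
unzip robots.txt-master.zip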
Or simply clone robots.txt with SSH:
git clone git@github.com:jonasjacek/robots.txt.git
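Whichever way you download it, setup amounts to placing the robots.txt file at the root of your website so it is served at /robots.txt. A minimal sketch, assuming the template file sits at the repository root and your web root is /var/www/html (both paths vary by setup):

cp robots.txt/robots.txt /var/www/html/robots.txt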
If you have problems with robots.txt
You may open an issue on the robots.txt support forum here: https://github.com/jonasjacek/robots.txt/issues
Similar to robots.txt repositories
Here you can find robots.txt alternatives and analogs:
scrapy, learn-anything, elasticsearch, Sasila, colly, headless-chrome-crawler, Lulu, gopa, MHTextSearch, Mailpile, newspaper, isp-data-pollution, webster, cdp4j, spidy, stopstalk-deployment, N2H4, memorious, easy-scraping-tutorial, antch, pomp, Harvester, diffbot-php-client, talospider, corpuscrawler, Python-Crawling-Tutorial, learn.scrapinghub.com, crawling-projects, dig-etl-engine, crawlkit