wiki-scraper
This web crawler uses Scrapy to crawl Wikipedia. It writes each page's title, total word count, and category to an Excel workbook (using openpyxl), in order to analyze the verbosity of articles by category.
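Below is a minimal sketch of how such a spider could look, assuming Wikipedia's standard page markup; the selectors, start URL, and output file name are illustrative and may differ from the actual spider in the repository.

    import scrapy
    from openpyxl import Workbook

    class WikiSpider(scrapy.Spider):
        name = "wiki"
        # Illustrative start page; the real spider may crawl differently
        start_urls = ["https://en.wikipedia.org/wiki/Web_scraping"]

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.wb = Workbook()
            self.ws = self.wb.active
            self.ws.append(["Title", "Word count", "Category"])

        def parse(self, response):
            title = response.css("h1#firstHeading ::text").get()
            # Rough word count over the visible article body text
            body = " ".join(response.css("div#mw-content-text ::text").getall())
            # First category listed at the bottom of the page
            category = response.css("div#mw-normal-catlinks ul li a::text").get()
            self.ws.append([title, len(body.split()), category])

        def closed(self, reason):
            # Save the workbook once the crawl finishes
            self.wb.save("articles.xlsx")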
How to download and set up wiki-scraper
Open a terminal and run:
git clone https://github.com/marinakiseleva/wiki-scraper.git
git clone creates a local copy (clone) of the wiki-scraper repository.
You pass git clone a repository URL; it supports a few different network protocols and corresponding URL formats.
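For example, GitHub accepts both HTTPS and SSH URLs for this repository:

    # HTTPS
    git clone https://github.com/marinakiseleva/wiki-scraper.git
    # SSH (scp-like syntax)
    git clone [email protected]:marinakiseleva/wiki-scraper.git
    # SSH (full URL syntax)
    git clone ssh://[email protected]/marinakiseleva/wiki-scraper.git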
Alternatively, you can download wiki-scraper as a ZIP file: https://github.com/marinakiseleva/wiki-scraper/archive/master.zip
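If you prefer the ZIP route from the command line, a sketch (assuming curl and unzip are available):

    curl -LO https://github.com/marinakiseleva/wiki-scraper/archive/master.zip
    unzip master.zip   # extracts to wiki-scraper-master/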
Or clone wiki-scraper over SSH:
git clone [email protected]:marinakiseleva/wiki-scraper.git
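After cloning, you will also need the two libraries mentioned above. Assuming the repository does not pin its own requirements, a plain pip install should suffice:

    cd wiki-scraper
    pip install scrapy openpyxl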
If you have problems with wiki-scraper
You can open an issue on the project's GitHub issue tracker: https://github.com/marinakiseleva/wiki-scraper/issues
Similar to wiki-scraper repositories
Here are some wiki-scraper alternatives and analogs:
scrapy, requests-html, Sasila, webmagic, colly, headless-chrome-crawler, Embed, artoo, instagram-scraper, django-dynamic-scraper, scrapy-cluster, Lulu, newcrawler, panther, facebook_data_analyzer, ImageScraper, scrapple, parsel, nickjs, jsoup-annotations, jekyll, Musoq, goose-parser, arachnid, lambdasoup, gopa, geeksforgeeks.pdf, scrapy-zyte-smartproxy, sqrape, comic-dl