punjabi_news_website_crawlers
This project contains three Python files for creating a Punjabi news corpus by crawling three Punjabi news websites: punjabitribuneonline.com, punjabijagran.com, and jagbani.punjabkesari.in. Each file crawls one of these websites.
How to download and set up punjabi_news_website_crawlers
Open a terminal and run the command:
git clone https://github.com/GurjotSinghMahi/punjabi_news_website_crawlers.git
git clone creates a local copy (clone) of the punjabi_news_website_crawlers repository.
You pass git clone a repository URL. It supports several network protocols and the corresponding URL formats.
You may also download the repository as a ZIP file: https://github.com/GurjotSinghMahi/punjabi_news_website_crawlers/archive/master.zip
Or clone punjabi_news_website_crawlers over SSH using this URL:
git@github.com:GurjotSinghMahi/punjabi_news_website_crawlers.git
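The ZIP download described above can also be scripted. The following is a minimal sketch using only the Python standard library, not part of the project itself. The function names extract_archive and download_repository are hypothetical and chosen for illustration; the URL is the one given in this README. It assumes Python 3.9 or later.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# Archive URL as given in this README (the "master" branch comes from that link).
ARCHIVE_URL = (
    "https://github.com/GurjotSinghMahi/"
    "punjabi_news_website_crawlers/archive/master.zip"
)


def extract_archive(data: bytes, dest: Path) -> list[str]:
    """Unpack ZIP bytes into dest and return the names of the archive members."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)
        return archive.namelist()


def download_repository(dest: Path, url: str = ARCHIVE_URL) -> list[str]:
    """Fetch the archive over HTTPS and unpack it into dest (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return extract_archive(response.read(), dest)
```

Calling download_repository(Path("crawlers")) would unpack the archive into a crawlers directory and return the list of extracted file names. The download and the extraction are kept in separate functions so that the extraction step can be checked without a network connection.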
If you have problems with punjabi_news_website_crawlers
You may open an issue on the punjabi_news_website_crawlers issue tracker here: https://github.com/GurjotSinghMahi/punjabi_news_website_crawlers/issues
Repositories similar to punjabi_news_website_crawlers
The following projects are alternatives and analogs to punjabi_news_website_crawlers:
scrapy, Sasila, colly, headless-chrome-crawler, Lulu, gopa, newspaper, isp-data-pollution, webster, cdp4j, spidy, stopstalk-deployment, N2H4, memorious, easy-scraping-tutorial, antch, pomp, Harvester, diffbot-php-client, talospider, corpuscrawler, Python-Crawling-Tutorial, learn.scrapinghub.com, crawling-projects, dig-etl-engine, crawlkit, scrapy-selenium, spidyquotes, zcrawl, podcastcrawler