Most popular scraping repositories and open source projects

conformist

Bend CSVs to your will with declarative schemas.

6   61   61  

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC...

10   61   61  

pomp

Screen scraping and web crawling framework

10   60   60  

ksoup

Kotlin Wrapper for Jsoup

6   60   60  

mangalivre-api

Unofficial Mangá Livre API built with Node.js and Express.js.

10   58   58  

Pahe.ph-Scraper

Pahe.ph [Pahe.in] Movies Website Scraper

16   58   58  

PythonScrapyBasicSetup

Basic setup with random user agents and IP addresses for Python Scrapy...

14   57   57  

angel.co-companies-list-scraping

34   57   57  

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

21   57   57  

Tor_Spider

Python project to crawl and scrape the lesser known deep web or one can...

16   57   57  

WebScrapper

Telegram Bot to scrape webpages using Requests, html5lib and Beautifuls...

43   57   57  

crawler-chrome-extensions

Chrome extensions commonly used by crawler engineers | Chrome extensions used by crawler dev...

10   56   56  

actor-facebook-scraper

Scrape public Facebook pages, posts, reviews and comments

32   56   56  

pycaching

A Python 3 interface for working with Geocaching.com website.

42   55   55  

selectorlib

A library to read a YML file with Xpath or CSS Selectors and extract d...

11   55   55  

Euro2016_TerminalApp

:soccer: Instantly find :trophy:EURO 2016 live-streams & highlights, n...

10   54   54  

local-api-client-typescript

Official JavaScript/TypeScript library for interacting with Kameleo Cl...

1   54   54  

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The offici...

20   53   53  

learn.scrapinghub.com

Scrapinghub Learning Center. Report issues in Jira: Report issues in J...

24   53   53  

mtnt

Code for the collection and analysis of the MTNT dataset

4   53   53  

SearchEngineScrapy

Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com,...

18   53   53  

amazon_scraper

Amazon product scraper using rotating proxies and headless Ch...

16   53   53  

Miyou

An anime discovery, streaming site made with React.js. It uses AniList...

20   53   53  

scraper-fourone-jobs

This is an anti-scraping cracker for extracting apply information of on...

11   52   52  

local-api-client-python

Official Python library for interacting with Kameleo Client

3   52   52  

media-search-engine

Search geolocations for (social) media posts in databases like Belling...

8   52   52  

playlist2links

This Bash script extracts video links from a YouTube playlist

10   51   51  

whatsapp-tracking

Scraping the status of WhatsApp contacts

13   51   51  

puppeteer-botcheck

🕵‍♂ Bot detection tests for Puppeteer. Hide and seek!

6   51   51  

api-client

API client to develop tools for competitive programming

16   51   51  

pge-outages

Tracking PG&E outages

7   50   50  

sample-web-scraping-with-electron

Sample project for web scraping with Electron

14   50   50  

foundation

🧱 A uniform template to use as a foundation for Puppeteer bot construc...

7   50   50  

chegg-scraper

Download Chegg homework-help questions to self-sufficient HTML files

17   50   50  

dart-scraper

Using the DART system operated by Korea's Financial Supervisory Service, corporate financial statements...

21   49   49  

pypatent

Search for and retrieve US Patent and Trademark Office Patent Data

15   47   47  

linkedin-scrapper

LinkedIn scrapper is an advanced search result scraper script built with...

21   47   47  

scrapers

scrapers for building your own image databases

6   46   46  

ogpParser

Open Graph Protocol Parser for Node.js

9   46   46  

maps-to-lead

This project aims to obtain leads in JSON format and send...

8   46   46  

datasette-scraper

Add website scraping abilities to Datasette

1   46   46  

torrent-tracker-scraper

A UDP torrent tracker scraper library written in Python 3

14   45   45  

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-base...

12   44   44  

xdsl-exporter

xDSL Prometheus Exporter

2   44   44  

oversmash

Overwatch API library for player details and career stats

6   43   43  

jason-the-miner

⛏ A versatile Web scraper for Node.js

11   43   43  

bluebird

Unofficial Python client for Twitter

11   43   43  

go-ps4

Search your favorite PS4 games from Playstation Store using the Comman...

6   42   42  

image-collector

Download images from Google Image Search

22   42   42  

hext

Domain-specific language for extracting structured data from HTML docu...

3   42   42