Most popular scraping repositories and open source projects

newcrawler

Free Web Scraping Tool with Java

115   581   581  

secret-agent

The web scraper that's nearly impossible to block - now called @ulixee...

42   578   578  

dataflowkit

Extract structured data from web sites. Web sites scraping.

74   577   577  

social-media-profiles-regexs

📇 Extract social media profiles and more with regular expre...

74   556   556  

facebook_data_analyzer

Analyze your Facebook data copy with the Ruby language. Download zip fi...

56   542   542  

lookyloo

Lookyloo is a web interface that allows users to capture a website pag...

62   526   526  

nickjs

Web scraping library made by the Phantombuster team. Modern, simple &...

57   499   499  

gogoanime-api

Anime Streaming, Discovery API made with Cheerio and Express. Uses dat...

138   498   498  

comic-dl

Comic-dl is a command line tool to download manga and comics from vari...

68   495   495  

socialreaper

Social media scraping / data collection library for Facebook, Twitter,...

96   495   495  

jekyll

Jekyll-based static site for The Programming Historian

224   490   490  

scrapple

A framework for creating semi-automatic web content extractors

41   489   489  

api.consumet.org

A Modern Search Engine API for Anime, Movies/TVShows, Books, Light Nov...

130   473   473  

linkedin

Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker...

92   471   471  

spidermon

Scrapy Extension for monitoring spiders execution.

86   455   455  

search-engine-parser

Lightweight package to query popular search engines and scrape for res...

78   395   395  

post-tuto-deployment

Build and deploy a machine learning app from scratch 🚀

105   387   387  

tinking

🧢 Extract data from any website without code, just clicks.

23   384   384  

List-of-user-agents

List of major web + mobile browser user agent strings. +1 Bonus script...

229   380   380  

scrape-linkedin-selenium

`scrape_linkedin` is a python package that allows you to scrape person...

162   379   379  

coronadatascraper

COVID-19 Coronavirus data scraped from government and curated data sou...

184   369   369  

dude

dude uncomplicated data extraction: A simple framework for writing web...

22   367   367  

PHPScraper

A universal web-util for PHP.

58   362   362  

lambdasoup

Functional HTML scraping and rewriting with CSS in OCaml

29   355   355  

linkedin-profile-scraper

πŸ•΅οΈβ€β™‚οΈ LinkedIn profile scraper returning structured profile data in J...

114   346   346  

jikan-rest

The REST API for Jikan

240   345   345  

reaper

Social media scraping / data collection tool for the Facebook, Twitter...

71   340   340  

quetre

A libre front-end for Quora

21   340   340  

scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

89   337   337  

pdf.tocgen

A CLI toolset to generate table of contents for PDF files automaticall...

15   334   334  

elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

48   324   324  

ScrapySharp

A revival of https://bitbucket.org/rflechner/scrapysharp

74   317   317  

geeksforgeeks.pdf

Topic wise PDFs of Geeks for Geeks articles. (Last updated in October...

125   315   315  

gopa

GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://inde...

82   296   296  

Sasila

A flexible, friendly crawler framework

76   293   293  

memorious

Lightweight web scraping toolkit for documents and structured data.

53   287   287  

PulsarRPA

Automate webpages at scale, scrape web data completely and accurately...

59   287   287  

google-search-results-python

Google Search Results via SERP API pip Python Package

71   287   287  

docker-selenium-lambda

The simplest demo of Chrome automation with Python and Selenium in AWS L...

77   283   283  

fingerprint-suite

Browser fingerprinting tools for anonymizing your scrapers. Developed...

37   278   278  

juriscraper

An API to scrape American court websites for metadata.

76   269   269  

Musoq

Use SQL on various data sources

14   259   259  

arachnid

Crawl all unique internal links found on a given website, and extract...

64   252   252  

antch

Antch, a fast, powerful and extensible web crawling & scraping framewo...

42   250   250  

jsoup-annotations

Jsoup Annotations POJO

19   240   240  

dark-knowledge

😈📚 A curated library of research papers and presentations for counter-...

12   238   238  

bbb-face-recognizer

Face recognition system using MTCNN, FACENET, SVM and FAST API to trac...

32   233   233  

idt

Image Dataset Tool (idt) is a cli tool designed to make the otherwise...

29   232   232  

anime-dl

Anime-dl is a command-line program to download anime from CrunchyRoll...

38   225   225  

goose-parser

Universal scraping tool, which allows you to extract data using multip...

16   223   223