Topic: scraping

Repositories (1626)

Deals-Scraper JustSxm Python

Deals Scraper is a Canadian tool for finding good deals on websites such as Facebook Marketplace, Kijiji, eBay, Amazon, and Lespacs.

89
ARGUS datawizard1337 Python

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different website...

88
feedbridge dewey Go

Plugin-based RSS feed generator for sites that don't offer one. Serves RSS, Atom, and JSON feeds.

88
shopify-spy ndgigliotti Python

Extract structured data from Shopify websites.

88
billy openstates Python

Legacy backend for Open States.

87
html-table-extractor yuanxu-li Python

Extract data from HTML tables.

86
newser lnenad Go

Newser is a simple utility that generates a PDF of your favorite news articles.

86
spiderbuf hhuayuan Python

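Spiderbuf is a website focused on Python web-scraping practice. It offers a wide range of scraping tutorials, case-study walkthroughs, and practice exercises. Intensive practice in Python scraper development: keep improving your skills in the spear-and-shield contest of attack and defense,...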

86
top-github-scraper khuyentran1401 HTML

Scrape top GitHub repositories and users based on keywords.

85
amazon_scraper ScrapingAnt JavaScript

Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt.

85
introWebScraping ksahin Java

Code examples for my blog posts.

83
open-australian-legal-corpus-creator isaacus-dev Python

The code used to create and update the Open Australian Legal Corpus, the first and only multijurisdictional open corpus of Australian legislative and...

82
Whatsapp-Net OfirKP JavaScript

Generate a network graph of connections from your WhatsApp group data.

81
linkedin-bot FujiwaraChoki Python

Automate your LinkedIn Outreach with Selenium and GeckoDriver.

81
Solana_Twitter_Token_NFT_Sniper_Bot solagent99 TypeScript

🔥 Solana token/NFT sniping bot with Twitter: a Raydium and Pumpfun sniping bot that scrapes Twitter.

81
google-covid19-mobility-reports vitorbaptista HTML

Data extraction of Google's COVID-19 Mobility Reports

80
pypatent daneads Python

Search for and retrieve US Patent and Trademark Office Patent Data

79
requests-random-user-agent DavidWittman Python

Configures the requests library to randomly select a desktop User-Agent

79
pydork blacknon Python

Scrape and list text and image search results from Google, Bing, DuckDuckGo, Baidu, and Yahoo Japan.

79
spidercreator carlosplanchon Python

Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of Playwright-based spiders with minimal manual coding. I...

79
UltimateTab BenoitBellegarde TypeScript

An enhanced, ad-free, fast, and responsive interface for browsing guitar tabs scraped from Ultimate Guitar.

79
linkedin-scraper fabriziomiano Python

Tool to scrape LinkedIn.

78
WebScraper MLArtist Python

Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blo...

78
abx-dl ArchiveBox JavaScript

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless...

78
instagram-users-scraper floriandiud TypeScript

Instagram Scraper. Scrape Instagram followers, following list, and post authors. Download CSV files with Instagram users from followers, following, ta...

77
rota alpkeskin Go

A high-performance proxy rotation engine with automated IP management and real-time health monitoring

77
feedsearch-crawler DBeath Python

Crawl sites for RSS, Atom, and JSON feeds.

77
agentql-mcp tinyfish-io JavaScript

Model Context Protocol server that integrates AgentQL's data extraction capabilities.

76
Miyou debsishu JavaScript

An anime discovery and streaming site built with React.js. It uses the AniList API and video data from GogoAnime. No ads and no VPN required.

76
map-email-scraper MickeyUK JavaScript

An open-source tool for collating publicly available contact information for businesses.

76
goClone shurco Go

🌱 goClone - clone websites in seconds

76
linkpreview linkpreview JavaScript

Open Graph, Twitter Card, and oEmbed previews. Shows visual cards that mimic link previews on social media such as Facebook, Twitter, VK, and other sites that...

75
gsocanalyzer Sparsh1212 JavaScript

A blazingly fast tool to analyze all the selected organizations in Google Summer of Code in the form of graphical analytics.

75
outscraper-python outscraper Python

The library provides convenient access to the Outscraper API from applications written in the Python language. Allows using Outscraper's services from...

75
webdext seagatesoft HTML

Intelligent Web Data Extractor

74
venom PreferredAI Java

Your preferred open source focused crawler for the deep web.

74
chegg-scraper ThreeGiantNoobs HTML

Download Chegg homework-help questions to self-sufficient HTML files

74
copycat gidlov PHP

A PHP Scraping Class

73
Captcha-Tools Matthew17-21 Go

All-in-one Python (and now Go!) module to help solve CAPTCHAs with the Capmonster, 2captcha, Anticaptcha, and Capsolver APIs!

73
docudigger Disane87 TypeScript

Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)

72
api-client online-judge-tools Python

API client to develop tools for competitive programming

71
Instagram-downloader fernandod1 Python

Downloader for an Instagram user's photos and videos. Downloads all media files from any username. Working as of 2022!

71
linkedin-scrapper info3g Python

LinkedIn scrapper is an advanced search-result scraper script built with Python's Selenium and BeautifulSoup modules to find all people of different profi...

71
undetected_geckodriver bytexenon Python

A custom Firefox Selenium-based Webdriver. Passes all bot mitigation systems

70
mangalivre-api nezzzumi JavaScript

Unofficial API for Mangá Livre, built with Node.js and Express.js.

70
SourceScraper OpenByteDev TypeScript

Simple library that helps you retrieve the source of various video streaming sites.

69
rebrowser-bot-detector rebrowser JavaScript

Modern tests to detect automated browser behavior. Covers the most important leaks from Puppeteer and Playwright.

68
instagram-without-api-node orsifrancesco JavaScript

A simple Node.js script to get unlimited public Instagram pictures from any user, without the API and without credentials.

68
TikScraperPHP pablouser1 JavaScript

Wrapper for TikTok API

68
moneyman daniel-hauser TypeScript

Automatically save transactions from all major Israeli banks and credit card companies, using GitHub actions (or a self hosted docker image)

67