Topic

scraping

Repositories (1766)

mechaml
mechaml yannham OCaml

OCaml functional web scraping library

92
outscraper-python
outscraper-python outscraper Python

The library provides convenient access to the Outscraper API from applications written in the Python language. Allows using Outscraper's services from...

92
feedsearch-crawler
feedsearch-crawler DBeath Python

Crawl sites for RSS, Atom, and JSON feeds.

92
apify-client-python
apify-client-python apify Python

Apify API client for Python

91
newser
newser lnenad Go

Newser is a simple utility to generate a PDF with your favorite news articles

91
Pinterest-infinite-crawler
Pinterest-infinite-crawler mirusu400 Python

An infinite Pinterest crawler/scraper. Crawl images with infinite scroll!

90
ARGUS
ARGUS datawizard1337 Python

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different website...

89
html-table-extractor
html-table-extractor yuanxu-li Python

Extract data from HTML tables

88
feedbridge
feedbridge dewey Go

Plugin based RSS feed generator for sites that don't offer any. Serves RSS, Atom and JSON Feeds.

88
actor-whitepaper
actor-whitepaper apify Python

This whitepaper describes a new concept for building serverless microapps called Actors, which are easy to develop, share, integrate, and build upon....

88
top-github-scraper
top-github-scraper khuyentran1401 HTML

Scrape top GitHub repositories and users based on keywords

88
map-email-scraper
map-email-scraper MickeyUK JavaScript

An open-source tool for collating publicly available contact information for businesses.

88
shopify-spy
shopify-spy ndgigliotti Python

Extract structured data from Shopify websites.

88
WebScraper
WebScraper MLArtist Python

Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blo...

88
billy
billy openstates Python

legacy backend for Open States

87
amazon_scraper
amazon_scraper ScrapingAnt JavaScript

Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt

87
pydork
pydork blacknon Python

Scraping and listing text and image searches on Google, Bing, DuckDuckGo, Baidu, Yahoo Japan.

87
dom_query
dom_query niklak Rust

A Flexible Rust Crate for DOM Querying and Manipulation

86
playwright
playwright playwright-php PHP

Playwright PHP library for browser automation: navigation, E2E tests, assertions, screenshots, and so much more!

85
pypatent
pypatent daneads Python

Search for and retrieve US Patent and Trademark Office Patent Data

85
Whatsapp-Net
Whatsapp-Net OfirKP JavaScript

Generate a network graph of connections from your WhatsApp groups data

84
lightweight-tweet-scraper
lightweight-tweet-scraper nermalcat69 JavaScript

Scrape Tweets while Scrolling

82
introWebScraping
introWebScraping ksahin Java

Code examples for my blog posts

82
undetected_geckodriver
undetected_geckodriver bytexenon Python

A custom Firefox Selenium-based Webdriver. Passes all bot mitigation systems

81
Solana_Twitter_Token_NFT_Sniper_Bot
Solana_Twitter_Token_NFT_Sniper_Bot solagent99 TypeScript

🔥Solana Token/NFT sniping bot w/ Twitter - Raydium, Pumpfun sniping bot w/ Twitter scraping.

81
Porn-Novel-Scraper
Porn-Novel-Scraper ystemsrx Python

A script that can be used to capture various porn novels for machine learning

81
google-covid19-mobility-reports
google-covid19-mobility-reports vitorbaptista HTML

Data extraction of Google's COVID-19 Mobility Reports

80
Outlook-account-creator
Outlook-account-creator Skuxblan Python

Python tool that automatically creates Outlook accounts with auto-captcha

80
serpapi-javascript
serpapi-javascript serpapi TypeScript

Scrape and parse search engine results using SerpApi.

80
api-client
api-client online-judge-tools Python

API client to develop tools for competitive programming

80
linkedin-scrapper
linkedin-scrapper info3g Python

LinkedIn scrapper is an advanced search result scraper script built with the Python Selenium and BeautifulSoup modules to find all people of different profi...

80
linkedin-scraper
linkedin-scraper fabriziomiano Python

Tool to scrape LinkedIn

79
UltimateTab
UltimateTab BenoitBellegarde TypeScript

Enhanced, ads-free and fast responsive interface to browse guitar tabs scraped from Ultimate Guitar.

79
requests-random-user-agent
requests-random-user-agent DavidWittman Python

Configures the requests library to randomly select a desktop User-Agent

78
webforai
webforai inaridiy TypeScript

The best HTML-to-Markdown library: ESM-native, with useful utilities; simple, lightweight, and epic quality.

78
crypto-vision
crypto-vision nirholas TypeScript

The most comprehensive cryptocurrency API. Real-time prices, OHLCV, order books & market cap for 10,000+ tokens across 500+ exchanges. DeFi TVL, yield...

77
Captcha-Tools
Captcha-Tools Matthew17-21 Go

All-in-one Python (and now Go!) module to help solve captchas with Capmonster, 2captcha, Anticaptcha, and Capsolver APIs!

77
venom
venom PreferredAI Java

Your preferred open source focused crawler for the deep web.

76
linkpreview
linkpreview linkpreview JavaScript

Open Graph, Twitter Card, oEmbed preview. Shows visual cards that mimic link previews in social media like Facebook, Twitter, VK and other sites that...

75
gsocanalyzer
gsocanalyzer Sparsh1212 JavaScript

A blazingly fast tool to analyze all the selected organizations in Google Summer of Code in the form of graphical analytics.

75
Pasta
Pasta Kr0ff Python

A Pastebin scraper that doesn't rely on the Pastebin scrape API

75
Google-Patents-Scraper
Google-Patents-Scraper wenyalintw Python

Automatically download all PDF files of search results and their patent families found on Google Patents.

75
Miyou
Miyou debsishu JavaScript

An anime discovery and streaming site made with React.js. It uses the AniList API and video data from GogoAnime. No ads and no VPN required.

75
webdext
webdext seagatesoft HTML

Intelligent Web Data Extractor

74
wajik-anime-api
wajik-anime-api wajik45 TypeScript

REST API for streaming and downloading anime with Indonesian subtitles | sub Indo

74
selectorlib
selectorlib scrapehero HTML

A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them

74
mangahook-api
mangahook-api kiraaziz JavaScript

Free open-source manga API: fetch all manga or a single manga, with search support, alongside a Next.js demo.

74
Website-Crawler
Website-Crawler pc8544 Java

Extract data from websites in LLM-ready JSON or CSV format. Crawl or scrape an entire website with Website Crawler

74
Instagram-downloader
Instagram-downloader fernandod1 Python

Instagram user's photos and videos downloader. Download all media files from any username. Working 2022!

74
chegg-scraper
chegg-scraper ThreeGiantNoobs HTML

Download Chegg homework-help questions to self-sufficient HTML files

74