Most popular crawling repositories and open source projects

puppet-master saasify-sh TypeScript

Puppeteer as a service hosted on Saasify.

28 6 28

amharic_spell_corrector yididiyan Python

Amharic Spelling Corrector based on SymSpell - Spelling corrector which is 1 million times faster through Symmetric Delete spelling correction algori...

27 12 27

translators krtk-dev TypeScript

🌐 Comparison of Google, Papago, and Kakao Translator

27 6 27

crawlbase-python crawlbase Python

Fast Python library for the Crawlbase API

25 2 25

scrapy-requests rafyzg Python

Scrapy middleware to handle JavaScript pages using requests-html

25 1 25

scraper capturr TypeScript

All-in-one API to easily scrape data from any website, without worrying about captchas and bot detection mechanisms.

25 4 25

crawlkit crawlkit JavaScript

A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.

24 7 24

popular_restaurants_from_officials jy617lee Jupyter Notebook

A project to find genuinely good restaurants by analyzing the business expense records of Seoul city officials

24 8 24

SentimentPoliticalCompass JulianMar11 Jupyter Notebook

Framework to analyze newspapers with respect to their political conviction using entity sentiments of party representatives.

24 3 24

arxiv2text dsdanielpark Jupyter Notebook

Converting PDF files to text, mainly with a focus on arXiv papers.

24 2 24

crawl-original-google-images thaoshibe Python

Python scripts for crawling original images from Google Images

24 3 24

zcrawl zcrawl Go

An open source web crawling platform

23 4 23

trend-monitoring thisishoon Python

tremo, a real-time trend data analysis and monitoring system

23 4 23

Mimo-Crawler NikosRig JavaScript

A web crawler that uses Firefox and JS injection to interact with webpages and crawl their content, written in Node.js.

23 2 23

proxycrawl-node crawlbase JavaScript

ProxyCrawl Node library for scraping and crawling

23 5 23

udemy-crawler petehouston JavaScript

Crawl Udemy course info and save it in JSON format.

23 7 23

PyCarGr Florents-Tselai Python

PyCarGr - Unofficial car.gr API

23 14 23

app-crawler maguowei Python

Crawling apps with uiautomator2 & mitmproxy

23 9 23

DCinsideAlarm aldlfkahs Python

New-post notification program for DCinside and Arca Live

23 7 23

DDMKL ByungjunKim Jupyter Notebook

Bibliographic data analysis of doctoral dissertations on modern Korean literature

23 5 23

GlassFrog 4xx404 Python

Keyword Search & Information Gathering Tool

23 4 23

ragno fukamachi Common Lisp

Common Lisp Web crawling library based on Psychiq.

22 2 22

product-integrations oxylabs PHP

Code examples and general information

22 10 22

crawler mediamonks PHP

Crawl your own website with various clients for SEO and indexing purposes.

21 4 21

SlackWebhooksGithubCrawler Gruppio JavaScript

Search for Slack webhook tokens publicly exposed on GitHub

21 1 21

crawling-framework tokenmill Java

Easily crawl news portals or blog sites using Storm Crawler.

21 4 21

html-article-extractor woojubb JavaScript

A web page content extractor

21 1 21

gakido HappyHackingSpace Python

gakido (餓鬼道) - the hungry ghost

21 1 21

proxycrawl-php crawlbase PHP

ProxyCrawl PHP library for scraping and crawling websites

21 5 21

the-seinfeld-chronicles 4m4n5 Jupyter Notebook

A dataset for textual analysis on arguably the best written comedy television show ever.

21 2 21

mida teamnsrg Go

MIDA: A Tool for Measuring the Internet

20 4 20

rsl-editor onurkanbakirci TypeScript

The open content licensing editor for the AI-first Internet. Easily create, edit, and manage your RSL (Really Simple Licensing) documents.

20 1 20

path-finder-rl VMS-Solutions Jupyter Notebook

Method For Establishing Database For Global Value Chain For Parts Procurement

20 13 20

scrapyteer miroshnikov TypeScript

Web crawling & scraping framework for Node.js on top of headless Chrome browser

20 1 20

fastcrawler fast-crawler

Modern, fast (high-performance) asynchronous scraping framework based on standard Python type hints and Pydantic.

20 2 20

afreecatv-chat-crawler cha2hyun Python

⚡️ Real-time AfreecaTV chat crawling using WebSockets

20 3 20

xXx___dead___xXx dumblr JavaScript

b̶̡̪̬͒l̸̰̗̝̀ỏ̷̡̩g̴͇̑g̶̲̱̽͐i̵̹͗n̶̤̥͂̅̆g̴̮̾̅͜ ̷̧͎͆i̷̛͒͜͠n̸̥̺͒ ̶͚͚͊̿͜t̸̺͙̭̆̊̈́ḧ̶̟́̐e̸̱͔̟̓̓͝ ̶̨͔̾͛̑d̵̥̣̏ȧ̷̼̊r̷̰̝̥̅̌͝k̵̟̥̞̉̍͛

19 1 19

scrapy-fieldstats stummjr Python

A Scrapy extension to log items coverage when the spider shuts down

19 4 19

web-search-engine-UIC mirkomantovani Python

CS 582 Information Retrieval at University of Illinois at Chicago. Multithreaded crawling of UIC domain, inverted index, page rank, SEO with Context P...

19 4 19

mobile-de-car-data-collector robertciotoiu Java

Crawl, scrape and persist Mobile.de car listings data in a smart & responsible way

19 3 19

abx-spec-behaviors ArchiveBox JavaScript

🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser environments, p...

19 0 19

old_ver_bot sinramyeon Python

A Slack crawling bot written in Python, built with Python + Flask + bs4. A Go version is linked below.

18 7 18

XML-Parser ElyaConrad JavaScript

A Node.js XML DOM, Parser & Stringifier.

18 7 18

FundCrawler SivanLaai Python

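Tiantian Fund (天天基金) crawler that scrapes information on all funds on the market: fund details, net asset values, fund holdings, fund companies, and fund managers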

18 6 18

scrapy-mcp-server scrapoxy

MCP server that enables self-healing automatic repair of Scrapy spiders. When websites change, your scrapers fix themselves.

17 1 17

scrapy-scraper ivan-sincek Python

Web crawler and scraper based on Scrapy and Playwright's headless browser.

17 7 17

deephotel gkzz Python

Scraping TripAdvisor and Booking.com with Scrapy

17 10 17

webscrape-tutorial octobug1 Python

A basic tutorial on web scraping with Python for beginners

17 0 17

CSCI572-Information_Retrieval_And_Web_Search_Engines Keerthivasan13 Java

Search Engine projects

17 17 17

Awesome-Web-Scraping luminati-io

A list of libraries, tools, and APIs for web scraping and data processing. Find everything you need for extracting, managing, and processing data from...

17 6 17
