Most popular crawling repositories and open source projects

trasparenzai.github.io TrasparenzAI JavaScript

Documentation for the platform for analyzing and browsing the administrative transparency of public administrations

4 1 4

order-metrics-data-automation ssaadh Ruby

OrderMetrics.io Automation for data from there to Google Sheets (spreadsheets). Mainly used for e-commerce Shopify, Facebook advertising, Google Adwor...

4 0 4

lebonscrap wbwlkr Python

LeBonScrap is a spider that collects data from Leboncoin.fr, crawling all the pagination links to scrape every ad in the list from one search result of t...

4 1 4

johnny-cache Sonictherocketman Python

A simple forward caching proxy. Useful for reducing the bandwidth of polling or crawling public sites.

4 1 4

Crawl-Google-Play MOHED1224 Python

Google Play crawler script using Python

4 0 4

bonobo-selenium python-bonobo Python

PRE-ALPHA - Write web crawlers using Bonobo

4 2 4

data-scraper complexorganizations Go

❤️ The data scraper for big data

4 5 4

simple-crawler hseghetti Shell

Simple crawler using Apache Nutch and Elasticsearch

4 1 4

FirstSelenium BaseMax Python

Some sample code for using Selenium in Python, just for fun.

4 0 4

Emotion-Regconition-Youtube ToVinhKhang Jupyter Notebook

Emotion Recognition for Vietnamese Social Media Text (Youtube Comments)

4 0 4

ya-local-graph esemi Python

Graph of rock and metal artists from Yandex Music

4 2 4

laravel-crawler ahmedmohamed1101140 PHP

Use the app to scrape the product amount from Souq, Amazon, or Jumia; log in and give it a try

4 0 4

web-crawler asanakoy C++

Web crawler for simple.wikipedia.org in C++

4 1 4

bitsky-builder bitskyai Shell

Build BitSky Desktop Application, Web Application, and Docker images

4 0 4

easy-puppeteer-crawling-boilerplate tsugitta Dockerfile

Simple boilerplate to start crawling with Puppeteer + TypeScript + DB(TypeORM) + Docker

4 2 4

FALL DevanshRaghav75 Python

An automated penetration testing tool

4 0 4

crawling-from-scratch ZenRows Python

Repository with the final code for the blog post "Mastering Web Scraping in Python: Crawling from Scratch".

4 1 4

estela-cli bitmakerla Python

estela Command Line Client 🕸

4 4 4

scrapy-source hideaki-kawahara

Sample code for scraping with Python Scrapy.

4 0 4

strainer internetarchive Go

Heritrix frontier files manipulation tool.

4 0 4

Naver-dictionary-crawler entiff Jupyter Notebook

Crawling Naver dictionary example

4 1 4

STUDY_Python Jiyeon1104 Jupyter Notebook

🎈 A repository where I post my Python study notes. 🎈

4 0 4

sce-domain-discovery nasa-jpl-memex Java

Domain Discovery for the Sparkler Crawl Environment

4 8 4

buscando-meu-carro FelipeGaleao Jupyter Notebook

buscando-meu-carro is a repository containing a Python project that uses scraping techniques to build a Data Warehouse (DW) containing inform...

4 0 4

Crawl-weather-data-in-cities-and-provinces-of-Vietnam charliehuynhorz Python

This project provides a simple Python script that crawls current weather data from Thời tiết 24h for all 63 provinces and cities of Vietnam. The data...

3 0 3

deadlink-checker-python arif98741 Python

A Python tool to crawl websites and check for broken/dead links with detailed reporting in both text and PDF formats.

3 0 3

dss_prjt_crawling jungryo Jupyter Notebook

A crawling project that crawls restaurant sites and maps to implement and visualize an algorithm that recommends restaurants at the midpoint of a route

3 8 3

ticketseer occidere Kotlin

A system that sends ticket information updates and showing-status notifications for musicals, concerts, and similar events

3 2 3

scraping-cnbcindonesia-api vnurhaqiqi Python

Indonesian news API built by scraping CNBC Indonesia

3 1 3

waybacksteroids LucasKatashi Go

waybacksteroids — Fast multi-domain Wayback Machine endpoint enumerator.

3 0 3

frog-cloud myawesomebike TypeScript

ScreamingFrog in Docker with an API

3 4 3

Search-It pradeep583 Python

A lightweight web search engine built using BM25 for keyword relevance, BERT embeddings for semantic similarity, and PageRank for link-based importanc...

3 1 3

SMART-SEARCH-ENGINE VETURISRIRAM Python

This repository includes implementation of an Intelligent Search Engine from scratch.

3 1 3

MissingSemester_Crawling hufslion9th Python

2021 HUFS Missing Semester : Crawling

3 0 3

PyWebCrawling MohanSha Python

Web scraping and automation using Python

3 1 3

anjinmascanner arrester Python

anjinma scanner version 1.0 is a GUI web scanner (URL, Connect, Header, Cookie, IP, Port, Directory, vulnerability, Crawling, etc.)

3 2 3

daily-menu wolfika JavaScript

Scraper utility tool to fetch daily menus.

3 5 3

Cheerio Decodo JavaScript

Cheerio.js proxy authentication example for Decodo

3 0 3

awesome-web-crawler ilovedevs

List of best web crawlers to extract data from the web. Find web crawling tools for different needs.

3 1 3

Github-Commits-Crawling EtzionR Python

Scrapes the dates of all GitHub commits by a given user

3 0 3

CourseraCrawler khanof89 Python

This Python script crawls course titles, ratings, descriptions, and instructors from coursera.org

3 1 3

persian-news-NLP aliamiri1380

A dataset for classifying Persian news into 4 classes

3 0 3

price-extract capturr TypeScript

A performant way to extract the price amount and metadata (currency, decimal and thousands separators) from any string.

3 0 3

Nightmare Decodo JavaScript

Nightmare.js proxy authentication example for Decodo

3 0 3

crawl4ai-mcp-server amienbou121 Python

🕷️ Enable AI agents to scrape and crawl the web effortlessly with this lightweight Model Context Protocol server, integrating seamlessly into your wor...

3 0 3

aiocrawler sashgorokhov Python

WIP: Asynchronous web scraping, heavily inspired by Scrapy

3 2 3

php-crawler buzz8year PHP

Deep crawling PHP server-client application (extendable, OOP, strategy/factory patterns, console-client, linux/windows, cron-friendly, vm/screen-frien...

3 1 3

store-gpt-scraper apify-projects TypeScript

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract...

3 2 3

SentiNews junhoKim-iib Python

A Django project for news sentiment analysis.

3 0 3

Theater-Noti SeonHyungJo JavaScript

When can I book tickets at this theater for the movie I want to see?

3 1 3

crawling

Repositories (1350)