crawlee-python
Crawlee is a web scraping and browser automation library for Python, used to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. It works with BeautifulSoup, Playwright, and raw HTTP, in both headful and headless modes, and supports proxy rotation.
How to download and set up crawlee-python
Open a terminal and run the following command:
git clone https://github.com/apify/crawlee-python.git
git clone creates a local copy (a clone) of the crawlee-python repository.
You pass git clone a repository URL. It supports several network protocols, each with its own URL format.
Alternatively, you can download crawlee-python as a zip file: https://github.com/apify/crawlee-python/archive/master.zip
Or clone crawlee-python over SSH:
git clone git@github.com:apify/crawlee-python.git
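The three download options above (HTTPS clone, zip archive, SSH clone) differ only in the URL format. The following minimal Python sketch builds each URL for a GitHub repository and assembles the matching git clone command. The function names (https_url, ssh_url, zip_url, clone_command) are hypothetical helpers made up for this illustration; they are not part of crawlee-python or git. The sketch only prints the commands and does not touch the network.

```python
from typing import List, Optional


def https_url(owner: str, name: str) -> str:
    """HTTPS clone URL, as used in the first command above."""
    return f"https://github.com/{owner}/{name}.git"


def ssh_url(owner: str, name: str) -> str:
    """SSH clone URL (requires an SSH key registered with GitHub)."""
    return f"git@github.com:{owner}/{name}.git"


def zip_url(owner: str, name: str, branch: str = "master") -> str:
    """Zip archive URL for a branch; 'master' matches the link above."""
    return f"https://github.com/{owner}/{name}/archive/{branch}.zip"


def clone_command(url: str, dest: Optional[str] = None) -> List[str]:
    """Argument list for 'git clone', suitable for subprocess.run()."""
    cmd = ["git", "clone", url]
    if dest is not None:
        cmd.append(dest)  # optional target directory
    return cmd


if __name__ == "__main__":
    owner, name = "apify", "crawlee-python"
    # Print the commands instead of running them, so nothing is downloaded.
    print(" ".join(clone_command(https_url(owner, name))))
    print(" ".join(clone_command(ssh_url(owner, name))))
    print(zip_url(owner, name))
```

To actually clone, the list returned by clone_command can be passed to subprocess.run(cmd, check=True), assuming git is installed and on the PATH.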
If you have problems with crawlee-python
You can open an issue on the crawlee-python issue tracker here: https://github.com/apify/crawlee-python/issues
Similar to crawlee-python repositories
Here are some crawlee-python alternatives and analogs:
scrapy requests-html Sasila webmagic colly headless-chrome-crawler Embed artoo instagram-scraper django-dynamic-scraper scrapy-cluster Lulu newcrawler panther facebook_data_analyzer ImageScraper scrapple parsel nickjs jsoup-annotations jekyll Musoq goose-parser arachnid lambdasoup crawler geeksforgeeks.pdf scrapy-zyte-smartproxy sqrape comic-dl