
crawlee-python

Crawlee is a web scraping and browser automation library for Python for building reliable crawlers. Use it to extract data for AI, LLMs, RAG, or GPTs, and to download HTML, PDF, JPG, PNG, and other files from websites. It works with BeautifulSoup, Playwright, and raw HTTP, supports both headful and headless mode, and includes proxy rotation.

How to download and set up crawlee-python

Open a terminal and run:
git clone https://github.com/apify/crawlee-python.git
The git clone command creates a local copy of the crawlee-python repository from the URL you pass to it.
Git supports several network protocols, such as HTTPS and SSH, each with its own URL format.
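As a rough illustration of those URL formats, the following Python sketch builds the clone URL for a GitHub repository in three common forms: HTTPS, SSH with scp-like syntax, and SSH with an explicit ssh:// URL. The helpers github_clone_url and clone_command are hypothetical names for this example only; they are not part of Git or crawlee-python.

```python
# Hypothetical helpers (not part of Git or crawlee-python): build the clone URL
# for a GitHub repository in the URL formats Git accepts for HTTPS and SSH.


def github_clone_url(owner: str, repo: str, protocol: str = "https") -> str:
    """Return a clone URL for github.com/<owner>/<repo> using the given protocol."""
    formats = {
        # HTTPS: no SSH key needed to clone a public repository.
        "https": "https://github.com/{owner}/{repo}.git",
        # SSH, scp-like syntax: requires an SSH key registered with GitHub.
        "ssh": "git@github.com:{owner}/{repo}.git",
        # SSH, explicit URL syntax: equivalent to the scp-like form.
        "ssh-url": "ssh://git@github.com/{owner}/{repo}.git",
    }
    if protocol not in formats:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return formats[protocol].format(owner=owner, repo=repo)


def clone_command(owner: str, repo: str, protocol: str = "https") -> list[str]:
    """Return the argument list for `git clone`, suitable for subprocess.run()."""
    return ["git", "clone", github_clone_url(owner, repo, protocol)]


if __name__ == "__main__":
    # Print the equivalent clone commands for each supported URL form.
    for proto in ("https", "ssh", "ssh-url"):
        print(" ".join(clone_command("apify", "crawlee-python", proto)))
```

The returned list can be passed to subprocess.run() to perform the clone, which requires Git to be installed and, for the SSH forms, an SSH key registered with your GitHub account.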

Alternatively, you can download crawlee-python as a ZIP file: https://github.com/apify/crawlee-python/archive/master.zip
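If you prefer to script the ZIP download, here is a minimal Python sketch that uses only the standard library (urllib.request and zipfile). It assumes the archive URL given here stays valid and that, like other GitHub source archives, the ZIP contains a single top-level folder (for this URL, crawlee-python-master). The function names download_archive and extract_archive are hypothetical, chosen for this example.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# Archive URL taken from the text; the helper functions below are hypothetical.
ARCHIVE_URL = "https://github.com/apify/crawlee-python/archive/master.zip"


def download_archive(url: str = ARCHIVE_URL, timeout: float = 60.0) -> bytes:
    """Fetch the ZIP archive and return its raw bytes (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def extract_archive(data: bytes, dest: Path) -> Path:
    """Extract ZIP bytes into dest and return the archive's top-level folder.

    GitHub source archives wrap all files in one folder named <repo>-<ref>.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # The first path component of every entry identifies the top-level folder(s).
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
        archive.extractall(dest)
    if len(top_level) != 1:
        raise ValueError(f"expected one top-level folder, found {sorted(top_level)}")
    return dest / top_level.pop()
```

To use it, call extract_archive(download_archive(), Path("downloads")). Keeping the download and extraction steps separate lets you verify or cache the downloaded bytes before unpacking them.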

Or clone crawlee-python over SSH:
git clone git@github.com:apify/crawlee-python.git

If you have problems with crawlee-python

You can open an issue on the crawlee-python issue tracker: https://github.com/apify/crawlee-python/issues