trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

How to download and set up trafilatura

Open a terminal and run the command:
git clone https://github.com/adbar/trafilatura.git
git clone creates a local copy (clone) of the trafilatura repository. You pass git clone a repository URL.
It supports several network protocols and their corresponding URL formats.

You may also download trafilatura as a zip archive: https://github.com/adbar/trafilatura/archive/master.zip

Or clone trafilatura over SSH:
git clone git@github.com:adbar/trafilatura.git
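
The three ways of fetching the repository described above (HTTPS clone, zip archive, SSH clone) differ only in the URL format. As a rough illustration, the following Python sketch derives all three URLs from the owner and repository name. The helper names repo_urls and clone are hypothetical and are not part of trafilatura or git; the "master" branch name is taken from the zip link above.

```python
import subprocess


def repo_urls(owner: str, repo: str, branch: str = "master") -> dict:
    """Return the HTTPS, SSH and zip-archive URLs for a GitHub repository.

    Hypothetical helper for illustration; it only formats strings.
    """
    base = f"https://github.com/{owner}/{repo}"
    return {
        "https": f"{base}.git",
        "ssh": f"git@github.com:{owner}/{repo}.git",
        "zip": f"{base}/archive/{branch}.zip",
    }


def clone(url: str, target_dir: str) -> None:
    """Run `git clone URL TARGET_DIR`; raises CalledProcessError on failure."""
    subprocess.run(["git", "clone", url, target_dir], check=True)


urls = repo_urls("adbar", "trafilatura")
for name, url in urls.items():
    print(f"{name}: {url}")

# To actually clone (requires git and network access), uncomment:
# clone(urls["https"], "trafilatura")
```

The clone call is left commented out so the sketch can be run without touching the network. SSH cloning additionally assumes an SSH key registered with your GitHub account.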

If you have problems with trafilatura

You may open an issue on the trafilatura issue tracker: https://github.com/adbar/trafilatura/issues