
heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

How to download and set up heritrix3

Open a terminal and run the command:
git clone https://github.com/internetarchive/heritrix3.git
git clone creates a local copy (a clone) of the heritrix3 repository. You pass git clone a repository URL;
it supports several network protocols and their corresponding URL formats.

Alternatively, you may download heritrix3 as a zip file: https://github.com/internetarchive/heritrix3/archive/master.zip

Or clone heritrix3 over SSH:
git clone git@github.com:internetarchive/heritrix3.git
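
The HTTPS and SSH clone URLs above differ only in form, so a small wrapper can select one by protocol name. The following is a minimal sketch, not part of heritrix3: the function name heritrix_clone_url is hypothetical, and the SSH form assumes GitHub's usual git@github.com:OWNER/REPO.git layout and an SSH key registered with your GitHub account.

```shell
# Hypothetical helper (not part of heritrix3): print the clone URL
# for the requested protocol, "https" or "ssh".
heritrix_clone_url() {
  case "$1" in
    https) printf '%s\n' "https://github.com/internetarchive/heritrix3.git" ;;
    ssh)   printf '%s\n' "git@github.com:internetarchive/heritrix3.git" ;;
    *)     printf 'unknown protocol: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

It could then be used as git clone "$(heritrix_clone_url https)". HTTPS works without any account setup, while SSH avoids repeated credential prompts once a key is configured.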

If you have problems with heritrix3

You may open an issue on the heritrix3 issue tracker: https://github.com/internetarchive/heritrix3/issues