
leechcrawler

Incremental crawling capabilities for Apache Tika. Crawl content from file systems, HTTP(S) sources (web crawling), IMAP(S) servers, or your own arbitrary data sources. LeechCrawler provides these crawling capabilities as additional Tika parsers.

How to download and set up leechcrawler

Open a terminal and run the command:
git clone https://github.com/DFKI/leechcrawler.git
git clone creates a local copy of the leechcrawler repository from the repository URL you pass to it.
It supports several network protocols and their corresponding URL formats.

Alternatively, you can download leechcrawler as a zip file: https://github.com/DFKI/leechcrawler/archive/master.zip

Or clone leechcrawler over SSH using this URL:
git@github.com:DFKI/leechcrawler.git
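
The three download options above can be collected into a small shell sketch. The function names leechcrawler_url and fetch_leechcrawler are hypothetical helpers invented for this illustration and are not part of leechcrawler. The sketch assumes git, curl, and unzip are installed, and that the SSH address follows GitHub's standard git@github.com form.

```shell
#!/bin/sh
# Hypothetical helper: print the download URL for a given method.
# Accepts "https" (default), "ssh", or "zip".
leechcrawler_url() {
    case "${1:-https}" in
        https) echo "https://github.com/DFKI/leechcrawler.git" ;;
        ssh)   echo "git@github.com:DFKI/leechcrawler.git" ;;
        zip)   echo "https://github.com/DFKI/leechcrawler/archive/master.zip" ;;
        *)     echo "unknown method: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: fetch leechcrawler into the current directory.
fetch_leechcrawler() {
    method="${1:-https}"
    url=$(leechcrawler_url "$method") || return 1
    if [ "$method" = "zip" ]; then
        # -L follows GitHub's redirect to the archive host.
        curl -L -o leechcrawler-master.zip "$url" && unzip leechcrawler-master.zip
    elif [ -d leechcrawler/.git ]; then
        # Skip the clone if a working copy is already present.
        echo "leechcrawler is already cloned"
    else
        git clone "$url" leechcrawler
    fi
}
```

Calling fetch_leechcrawler with no argument clones over HTTPS; fetch_leechcrawler ssh or fetch_leechcrawler zip selects one of the other two options. Cloning over SSH requires an SSH key registered with your GitHub account.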

If you have problems with leechcrawler

You can open an issue on the leechcrawler issue tracker: https://github.com/DFKI/leechcrawler/issues