
tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

How to download and set up tika

Open a terminal and run the command:
git clone https://github.com/apache/tika.git
git clone creates a local copy of the tika repository. You pass it the repository URL.
Git supports several network protocols, each with its own URL format.

You can also download tika as a zip file: https://github.com/apache/tika/archive/master.zip

Or clone tika over SSH:
git clone git@github.com:apache/tika.git
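
If you want to script the clone step, the following is a minimal sketch rather than an official tool. It assumes git is installed and on your PATH. The helper names clone_command and clone are made up for this example, and the SSH address is the standard GitHub form for this repository.

```python
import subprocess

# Clone URLs for the two protocols described above.
# The SSH entry assumes the standard GitHub form git@github.com:<owner>/<repo>.git
CLONE_URLS = {
    "https": "https://github.com/apache/tika.git",
    "ssh": "git@github.com:apache/tika.git",
}


def clone_command(protocol="https", dest="tika"):
    """Build the argument list for `git clone` (hypothetical helper)."""
    if protocol not in CLONE_URLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return ["git", "clone", CLONE_URLS[protocol], dest]


def clone(protocol="https", dest="tika"):
    """Run the clone. Needs git on PATH and network access;
    the SSH variant also needs an SSH key registered with GitHub."""
    subprocess.run(clone_command(protocol, dest), check=True)
```

Calling clone() clones over HTTPS into a directory named tika, and clone("ssh") does the same over SSH. Building the command as a list instead of a single string avoids shell quoting problems if the destination path contains spaces.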

If you have problems with tika

You can open an issue in the tika issue tracker: https://github.com/apache/tika/issues