Incremental crawling capabilities for Apache Tika. Crawl content out of e.g. file systems, http(s) sources (webcrawling) imap(s) servers or your own arbitrary data sources. LeechCrawler offers additional Tika parsers providing these crawling capabilities.
What is the DFKI/leechcrawler GitHub project? Description: "Incremental crawling capabilities for Apache Tika. Crawl content out of e.g. file systems, http(s) sources (webcrawling) imap(s) servers or your own arbitrary data sources. LeechCrawler offers additional Tika parsers providing these crawling capabilities.". Written in Java. Explain what it does, its main use cases, key features, and who would benefit from using it.
Question is copied to clipboard — paste it after the AI opens.
Clone via HTTPS
Clone via SSH
Download ZIP
Download master.zipReport bugs or request features on the leechcrawler issue tracker:
Open GitHub Issues