2 repositories on SrcLog
OCR, Archive, Index and Search: Implementation-agnostic OCR framework.
Datasets with text data for use in NLP, text analysis, information extraction, and ML research.