2 repositories on SrcLog
A scalable, mature and versatile web crawler based on Apache Storm
Behemoth is an open-source platform for large-scale document analysis based on Apache Hadoop.