Data collection (scraping and dynamic crawling) for the domain "Computer Scientists" from 13 websites, including Wikipedia, Google Scholar, and DBLP, and merging of the results into a high-quality tabular dataset.
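The merge step described above could be sketched roughly as follows; this is a minimal illustration using the standard library only, and the record fields, sample values, and `merge_sources` helper are hypothetical rather than the project's actual schema:

```python
# Hypothetical per-source records keyed by a scientist's name.
# Field names and values are illustrative assumptions, not the
# project's real schema.
wikipedia = [
    {"name": "Donald Knuth", "birth_year": 1938, "field": "Algorithms"},
    {"name": "Barbara Liskov", "birth_year": 1939, "field": "Programming languages"},
]
dblp = [
    {"name": "Donald Knuth", "publication_count": 150},
    {"name": "Tim Berners-Lee", "publication_count": 60},
]

def merge_sources(*sources):
    """Combine records from several sources into one row per scientist.

    Later sources fill in (or overwrite) fields for a name already seen,
    so each output row aggregates every attribute found for that person.
    """
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["name"], {}).update(record)
    return list(merged.values())

rows = merge_sources(wikipedia, dblp)
for row in sorted(rows, key=lambda r: r["name"]):
    print(row)
```

A real pipeline would also need entity resolution (name variants, missing values) before rows from different sites can be safely merged, which is where most of the "high quality" effort in such a dataset tends to go.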
Report bugs or request features on the Domain-specific-data-collection-from-structured-and-unstructured-sources issue tracker on GitHub.