wit

WIT (Wikipedia-based Image Text) is a large multimodal, multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

Topic: nlp


Stars: 1.1k
Forks: 46
Watchers: 1.1k
License: other
SrcLog Score: 100

Cost to Build: $261.6K
Market Value: $794.3K


Growth over time (Stars, Forks, Watchers): 8 data points, 2021-08-01 → 2026-04-01


How to clone wit

Clone via HTTPS

git clone https://github.com/google-research-datasets/wit.git

Clone via SSH

git clone git@github.com:google-research-datasets/wit.git

Download ZIP: master.zip

Found an issue?

Report bugs or request features on the wit GitHub issue tracker.

Similar to wit

lectures, spaCy, HanLP, compromise, gensim, stanford-tensorflow-tutorials, nltk, awesome-nlp, TextBlob, ailearning, CoreNLP, ansj_seg, rasa, tensorflow_cookbook, allennlp, flashtext, TagUI, franc, mycroft-core, practical-pytorch, text_classification, nlp_tasks, DeepPavlov, snips-nlu, Awesome-pytorch-list, kcws, Awesome-Chinese-NLP, sentiment, prose, DeepLearn