wit

google-research-datasets

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
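The three figures in the description (image-text sets, unique images, languages) can be tallied for any local shard of the data. The sketch below assumes that a shard is a gzip-compressed, tab-separated file with a header row, that each row is one image-text set, and that rows carry `language` and `image_url` columns. Treat the file layout and column names as assumptions and check them against the repository's README. `read_tsv_gz` and `shard_stats` are hypothetical helpers, not part of any official WIT tooling.

```python
import csv
import gzip
import io
from typing import Dict, Iterable, Iterator

# Allow very long text fields; csv's default per-field cap is 131072 characters.
# (Assumption: some text columns in a shard may exceed that.)
csv.field_size_limit(2**31 - 1)


def read_tsv_gz(path) -> Iterator[Dict[str, str]]:
    """Stream rows from a gzip-compressed, tab-separated file with a header line."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        yield from csv.DictReader(fh, delimiter="\t")


def shard_stats(rows: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Hypothetical helper: count rows, distinct image URLs and distinct languages.

    Assumes each row has 'image_url' and 'language' columns.
    """
    n_rows = 0
    images = set()
    languages = set()
    for row in rows:
        n_rows += 1  # one image-text row
        images.add(row["image_url"])  # the same image may appear in several rows
        languages.add(row["language"])
    return {
        "image_text_rows": n_rows,
        "unique_images": len(images),
        "languages": len(languages),
    }


if __name__ == "__main__":
    # Made-up in-memory rows standing in for a real shard.
    sample = (
        "language\timage_url\tcaption_reference_description\n"
        "en\thttps://example.org/a.jpg\tA bridge\n"
        "fr\thttps://example.org/a.jpg\tUn pont\n"
        "en\thttps://example.org/b.jpg\tA tower\n"
    )
    rows = csv.DictReader(io.StringIO(sample), delimiter="\t")
    print(shard_stats(rows))
    # prints {'image_text_rows': 3, 'unique_images': 2, 'languages': 2}
```

For a downloaded shard, `shard_stats(read_tsv_gz("path/to/shard.tsv.gz"))` streams the file row by row, so memory grows with the number of distinct image URLs and languages rather than with the file size. The path here is a placeholder.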

Stars: 1.1k
Forks: 46
Watchers: 1.1k
License: other
SrcLog Score: 100
Cost to Build: $261.6K
Market Value: $794.3K

Growth over time (chart of stars, forks and watchers; 8 data points, 2021-08-01 to 2026-04-01)

How to clone wit

Clone via HTTPS

git clone https://github.com/google-research-datasets/wit.git

Clone via SSH

git clone git@github.com:google-research-datasets/wit.git

Download ZIP

Download the master branch as master.zip.

Found an issue?

Report bugs or request features on the wit GitHub issue tracker.