AToMiC (v0.1) Dataset Released
The AToMiC (v0.1) dataset is available on the HuggingFace Hub.
See the about page or our white paper for more details about the task.
Purpose
Multimedia retrieval evaluation and tool development
Dataset descriptions
Split | # Texts | # Images | # Qrels |
---|---|---|---|
Training | 5,030,748 | 3,723,512 | 5,030,748 |
Validation | 38,859 | 30,365 | 38,859 |
Test | 30,938 | 20,732 | 30,938 |
Total | 5,100,545 | 3,774,609 | 5,100,545 |
- Format:
  - Texts: parquet
  - Images: parquet with embedded images
  - Qrels: space-separated TREC qrel format
- Source:
  - Image–Text tuples (Qrels) from WIT
  - Images from Wikimedia
- Language: English
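As an illustration of the qrels format listed above, here is a minimal sketch of a parser for space-separated TREC qrel lines. It assumes the standard four-column TREC layout (query ID, iteration, document ID, relevance). The function name `parse_qrels` and the sample IDs are hypothetical and are not taken from the AToMiC files.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse TREC qrel lines into {query_id: {doc_id: relevance}}.

    Assumes the standard layout: query_id iteration doc_id relevance,
    separated by whitespace. The iteration column is ignored.
    """
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        query_id, _iteration, doc_id, relevance = line.split()
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


# Hypothetical sample lines; real AToMiC identifiers will look different.
sample = [
    "text-0001 0 image-0042 1",
    "text-0001 0 image-0077 1",
    "text-0002 0 image-0042 1",
]
qrels = parse_qrels(sample)
print(qrels["text-0001"])
```

To read an actual qrels file, the same function can be passed an open file handle, since iterating over a file yields its lines.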
Requirements
Code snippets
from datasets import load_dataset

# Load the training split of the AToMiC image collection from the HuggingFace Hub
dataset = load_dataset(
    "TREC-AToMiC/AToMiC-Images-v0.1",
    split="train",
)
print(dataset)
For other processing options, see the HuggingFace Datasets usage documentation.