AToMiC (v0.1) Dataset Released


The AToMiC (v0.1) dataset is available on the HuggingFace Hub.

See the About page or our white paper for more details about the task.

Purpose

Multimedia retrieval evaluation and tool development

Dataset descriptions

Split	# Texts	# Images	# Qrels
Training	5,030,748	3,723,512	5,030,748
Validation	38,859	30,365	38,859
Test	30,938	20,732	30,938
Total	5,100,545	3,774,609	5,100,545

Format:
- Texts: parquet
- Images: parquet with embedded images
- Qrels: space-separated TREC Qrel format
Source:
- Image–Text tuples (Qrels) from WIT
- Images from Wikimedia
Language: English
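
The Qrels format listed above can be read with a few lines of Python. This is a minimal sketch, assuming the usual four-column TREC layout (query ID, iteration, document ID, relevance); the IDs in the sample are hypothetical placeholders, not actual AToMiC identifiers, and parse_qrels is an illustrative helper rather than part of any released tooling.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines into {query_id: {doc_id: relevance}}."""
    qrels = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed column order: query_id, iteration, doc_id, relevance
        query_id, _iteration, doc_id, relevance = line.split()
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


# Hypothetical sample lines; real AToMiC IDs will look different.
sample = [
    "text_001 0 image_042 1",
    "text_001 0 image_043 0",
    "text_002 0 image_007 1",
]
qrels = parse_qrels(sample)
print(qrels["text_001"])
```

To read an actual qrels file, pass an open file handle to parse_qrels in place of the sample list.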

Requirements

HuggingFace Datasets >= 2.6.0

Code snippet:

from datasets import load_dataset

# Load the training split of the AToMiC image collection from the HuggingFace Hub
dataset = load_dataset(
    "TREC-AToMiC/AToMiC-Images-v0.1",
    split="train",
)
print(dataset)

For other processing options, see the HuggingFace Datasets usage documentation.
