AToMiC (v0.1) Dataset Released


The AToMiC (v0.1) dataset is available on the HuggingFace Hub.

See the About page or our white paper for more details about the task.

Purpose

Multimedia retrieval evaluation and tool development

Dataset descriptions

Split	# Texts	# Images	# Qrels
Training	5,030,748	3,723,512	5,030,748
Validation	38,859	30,365	38,859
Test	30,938	20,732	30,938
Total	5,100,545	3,774,609	5,100,545

Format:
- Texts: parquet
- Images: parquet with embedded images
- Qrels: space-separated TREC qrel format
Source:
- Image–Text tuples (Qrels) from WIT
- Images from Wikimedia
Language: English
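As a rough illustration of the Qrels format listed above, the sketch below parses space-separated TREC qrel lines, which by TREC convention carry four fields: topic ID, iteration, document ID, and relevance. The helper name `parse_qrels` and the sample IDs are hypothetical and chosen only for illustration; actual AToMiC identifiers will look different.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC qrel lines of the form: <topic_id> <iteration> <doc_id> <relevance>."""
    qrels = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        topic_id, _iteration, doc_id, relevance = line.split()
        qrels[topic_id][doc_id] = int(relevance)
    return dict(qrels)


# Hypothetical IDs for illustration only; real AToMiC IDs will differ.
sample = [
    "text_001 0 image_042 1",
    "text_001 0 image_077 1",
    "text_002 0 image_013 1",
]
qrels = parse_qrels(sample)
print(qrels["text_001"])
```

The nested mapping (topic ID to document ID to relevance) is one common in-memory layout for qrels; other layouts work just as well.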

Requirements

HuggingFace Datasets >= 2.6.0
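One way to check this requirement before loading the data is to compare version strings numerically. The sketch below is a minimal, standard-library-only approach; the helper names `version_tuple` and `meets_minimum` are hypothetical, and pre-release suffixes (such as `.dev0`) are simply ignored, which is an assumption rather than full version semantics.

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum="2.6.0"):
    """Check installed >= minimum, ignoring any pre-release suffix (assumption)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so "2.6" compares equal to "2.6.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("2.10.1"))
print(meets_minimum("2.5.2"))
```

In practice the installed version string could be taken from `datasets.__version__` and passed to `meets_minimum`.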

Code snippets:

from datasets import load_dataset

dataset = load_dataset(
    "TREC-AToMiC/AToMiC-Images-v0.1",
    split="train",
)
print(dataset)

For other processing options, see the HuggingFace Datasets usage documentation.


