News Media Bias Plus (2024)

Multi-modal image and text bias classification dataset (The Vector Institute)

90k rows | 2024 | The Vector Institute

The dataset includes around 90,000 news articles dated from May 2023 to September 2024, curated from a broad spectrum of reliable sources, including major news outlets from around the globe. The articles were gathered from open data sources using Google RSS, adhering to research ethics guidelines.

NMB+ includes the image for each news article, along with multi-modal labels for each article's text + image pair.

📑 Contents

| Field | Description |
| --- | --- |
| unique_id | Unique identifier for each news item. Each unique_id is associated with the image (top image) for the same news article. |
| outlet | Publisher of the news article. |
| headline | Headline of the news article. |
| article_text | Full text content of the news article. |
| image_description | Description of the image paired with the article. |
| image | File path of the image associated with the article. |
| date_published | Publication date of the news article. |
| source_url | Original URL of the news article. |
| canonical_link | Canonical URL of the news article, if different from the source URL. |
| new_categories | Categories assigned to the article. |
| news_categories_confidence_scores | Confidence scores for the assigned categories. |
| text_label | Annotation for the textual content, indicating whether it is 'Likely' or 'Unlikely' to be disinformation. |
| multimodal_label | Annotation for the combined text snippet (first paragraph of the news story) and image content, assessing whether the pair is 'Likely' or 'Unlikely' to be disinformation. |
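Because every article carries both a text_label and a multimodal_label, one natural first check is how often the two annotations disagree. The snippet below is a minimal sketch of that check, not an official loader: it assumes rows are available as plain Python dictionaries keyed by the field names listed above, and the sample rows, outlet names, and helper function names are hypothetical placeholders rather than real dataset content.

```python
from collections import Counter

# Label values as described in the field list above.
LABELS = ("Likely", "Unlikely")


def label_counts(rows, field):
    """Count how many rows carry each label value in the given field."""
    return Counter(row[field] for row in rows)


def label_disagreements(rows):
    """Return the unique_id of every row whose two labels differ."""
    return [
        row["unique_id"]
        for row in rows
        if row["text_label"] != row["multimodal_label"]
    ]


# Hypothetical rows for illustration only; real rows also contain
# headline, article_text, image, and the other fields described above.
sample_rows = [
    {"unique_id": "a1", "outlet": "Example Outlet A",
     "text_label": "Likely", "multimodal_label": "Likely"},
    {"unique_id": "b2", "outlet": "Example Outlet B",
     "text_label": "Unlikely", "multimodal_label": "Likely"},
    {"unique_id": "c3", "outlet": "Example Outlet C",
     "text_label": "Unlikely", "multimodal_label": "Unlikely"},
]

disagreeing = label_disagreements(sample_rows)
text_counts = label_counts(sample_rows, "text_label")
multimodal_counts = label_counts(sample_rows, "multimodal_label")

print("Rows where labels disagree:", disagreeing)
print("text_label counts:", dict(text_counts))
print("multimodal_label counts:", dict(multimodal_counts))
```

Rows flagged this way are cases where adding the image changed the assessment relative to the text alone, which may make them a useful subset to inspect by hand.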

🤗 HuggingFace Dataset (Request access)

Website (Official Docs)

📰 Blog Post
