News Media Bias Plus (2024)
Multi-modal image and text bias classification dataset (The Vector Institute)
90k rows | 2024 | The Vector Institute
The dataset contains around 90,000 news articles published between May 2023 and September 2024, curated from a broad range of reliable sources, including major news outlets from around the globe. The articles were gathered from open data sources using Google RSS, in line with research ethics guidelines.
NMB+ (News Media Bias Plus) also includes images, along with a multi-modal label for the text and image pair of each news article.
📑 Contents
unique_id
Unique identifier for each news item. Each unique_id is associated with the image (top image) for the same news article.
outlet
Publisher of the news article.
headline
Headline of the news article.
article_text
Full text content of the news article.
image_description
Description of the image paired with the article.
image
File path of the image associated with the article.
date_published
Publication date of the news article.
source_url
Original URL of the news article.
canonical_link
Canonical URL of the news article, if different from the source URL.
new_categories
Categories assigned to the article.
news_categories_confidence_scores
Confidence scores for the assigned categories.
text_label
Annotation for the textual content, indicating whether it is 'Likely' or 'Unlikely' to be disinformation.
multimodal_label
Annotation for the combined text snippet (first paragraph of the news story) and image content, assessing whether the pair is 'Likely' or 'Unlikely' to be disinformation.
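As a rough illustration of how the two label fields relate, the sketch below compares text_label and multimodal_label on a few rows. The rows are hypothetical and invented for this example (they are not drawn from the dataset), and the helper names label_agreement and disagreements are assumptions, not part of any official tooling. Only the field names and the 'Likely' / 'Unlikely' values come from the listing above.

```python
from collections import Counter

# Hypothetical rows for illustration only; real rows carry all the fields listed above.
rows = [
    {"unique_id": "a1", "text_label": "Likely", "multimodal_label": "Likely"},
    {"unique_id": "a2", "text_label": "Unlikely", "multimodal_label": "Likely"},
    {"unique_id": "a3", "text_label": "Unlikely", "multimodal_label": "Unlikely"},
]

# The two label values documented for text_label and multimodal_label.
VALID_LABELS = {"Likely", "Unlikely"}


def label_agreement(rows):
    """Count each (text_label, multimodal_label) pair, rejecting unknown values."""
    counts = Counter()
    for row in rows:
        pair = (row["text_label"], row["multimodal_label"])
        if not set(pair) <= VALID_LABELS:
            raise ValueError(f"unexpected label in row {row['unique_id']}: {pair}")
        counts[pair] += 1
    return counts


def disagreements(rows):
    """Return the unique_id of every row whose two labels differ."""
    return [r["unique_id"] for r in rows if r["text_label"] != r["multimodal_label"]]


if __name__ == "__main__":
    print(label_agreement(rows))
    print(disagreements(rows))
```

Rows where the two labels differ may be worth inspecting by hand, since the multi-modal annotation considers only the first paragraph together with the image, while the text annotation covers the textual content on its own.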
🤗 Hugging Face Dataset (Request access)
Website (Official Docs)
📰 Blog Post