The Fair-ly Project

News Media Bias Plus (2024)

Multi-modal image and text bias classification dataset (The Vector Institute)


Last updated 7 months ago

90k rows | 2024 | The Vector Institute

The dataset includes around 90,000 news articles, curated from a broad spectrum of reliable sources, including major news outlets from around the globe, and published between May 2023 and September 2024. The articles were gathered through open data sources using Google RSS, in adherence to research ethics guidelines.

NMB+ also includes images, along with multi-modal labels for the text + image pair of each news article.

📑 Contents

| Field | Description |
| --- | --- |
| unique_id | Unique identifier for each news item. Each unique_id is associated with the image (top image) for the same news article. |
| outlet | Publisher of the news article. |
| headline | Headline of the news article. |
| article_text | Full text content of the news article. |
| image_description | Description of the image paired with the article. |
| image | File path of the image associated with the article. |
| date_published | Publication date of the news article. |
| source_url | Original URL of the news article. |
| canonical_link | Canonical URL of the news article, if different from the source URL. |
| new_categories | Categories assigned to the article. |
| news_categories_confidence_scores | Confidence scores for the assigned categories. |
| text_label | Annotation for the textual content, indicating 'Likely' or 'Unlikely' to be disinformation. |
| multimodal_label | Annotation for the combined text snippet (first paragraph of the news story) and image content, assessing 'Likely' or 'Unlikely' to be disinformation. |
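Because every article carries both a text_label and a multimodal_label, one quick sanity check is to see how often the two annotations agree. The sketch below is illustrative only: it assumes each row is a plain mapping keyed by the field names listed above, and that the labels are stored as the exact strings 'Likely' and 'Unlikely'. The helper name `label_agreement` and the sample rows are made up for the example and are not taken from the dataset.

```python
from collections import Counter
from typing import Iterable, Mapping

# Assumption: labels are stored exactly as the strings shown in the field table.
VALID_LABELS = {"Likely", "Unlikely"}


def label_agreement(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count rows where text_label and multimodal_label agree or disagree.

    Rows whose labels fall outside VALID_LABELS are counted as "invalid".
    """
    counts: Counter = Counter()
    for row in rows:
        text, multi = row["text_label"], row["multimodal_label"]
        if text not in VALID_LABELS or multi not in VALID_LABELS:
            counts["invalid"] += 1
        elif text == multi:
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    return counts


# Hypothetical rows, invented purely to exercise the helper.
sample_rows = [
    {"unique_id": "a1", "text_label": "Likely", "multimodal_label": "Likely"},
    {"unique_id": "b2", "text_label": "Unlikely", "multimodal_label": "Likely"},
    {"unique_id": "c3", "text_label": "Unlikely", "multimodal_label": "Unlikely"},
]

summary = label_agreement(sample_rows)
print(dict(summary))  # {'agree': 2, 'disagree': 1}
```

Rows where the two labels disagree may be worth inspecting by hand, since they are the cases where the image changes the assessment made from the text alone.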

🤗 HuggingFace Dataset (Request access): vector-institute/newsmediabias-plus

Website (Official Docs): News Media Bias Plus

📰 Blog Post: New multimodal dataset will help in the development of ethical AI systems (Vector Institute for Artificial Intelligence)
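Once access has been granted on Hugging Face, the dataset can probably be loaded with the `datasets` library. The following is a minimal sketch, not an official recipe: the repository id is the one shown in the Hugging Face link on this page, while the split name "train" is an assumption that should be checked against the dataset card. The helper names `load_nmb_plus` and `first_rows` are hypothetical.

```python
from itertools import islice

# Repository id as shown in the Hugging Face link for this dataset.
REPO_ID = "vector-institute/newsmediabias-plus"


def load_nmb_plus(split: str = "train", streaming: bool = True):
    """Load NMB+ from the Hugging Face Hub.

    Assumptions: a split called "train" exists, and you are already
    authenticated with an account that has been granted access.
    """
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset  # pip install datasets

    # streaming=True avoids downloading all ~90k rows and images up front.
    return load_dataset(REPO_ID, split=split, streaming=streaming)


def first_rows(dataset, n: int = 3) -> list:
    """Return the first n items of any iterable dataset."""
    return list(islice(dataset, n))
```

After authenticating (for example with `huggingface-cli login`), calling `first_rows(load_nmb_plus())` should return a few rows whose keys can be compared against the field table on this page.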