News Media Bias Plus (2024)
Multi-modal image and text bias classification dataset (The Vector Institute)
90k rows | 2024 | The Vector Institute
The dataset contains around 90,000 news articles published between May 2023 and September 2024. They were curated from a broad spectrum of reliable sources, including major news outlets from around the globe, and gathered from open data sources via Google RSS in accordance with research ethics guidelines.
In addition to text, NMB+ includes the image for each news article, along with multi-modal labels for each article's text-image pair.
📑 Contents
| Field | Description |
|---|---|
| | Unique identifier for each news item. Each … |
| | Publisher of the news article. |
| | Headline of the news article. |
| | Full text content of the news article. |
| | Description of the image paired with the article. |
| | File path of the image associated with the article. |
| | Publication date of the news article. |
| | Original URL of the news article. |
| | Canonical URL of the news article, if different from the source URL. |
| | Categories assigned to the article. |
| | Confidence scores for the assigned categories. |
| | Annotation for the textual content, indicating: … |
| | Annotation for the combined text snippet (first paragraph of the news story) and image content, assessing: … |
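The table above does not give the actual column identifiers, so the Python sketch below uses hypothetical field names (`unique_id`, `article_text`, `image_path`, `categories`, `category_scores`, and so on) chosen only to mirror the descriptions. It shows one way to represent a record and to build the text-snippet-plus-image unit that the multi-modal annotation assesses. Two further assumptions are labeled in the code: paragraphs in the article text are separated by blank lines, and the category and confidence-score lists are aligned by position. Check both against the real data before relying on this.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# HYPOTHETICAL field names: the catalog does not list the real column
# identifiers, so these only mirror the table's descriptions.
@dataclass
class NewsRecord:
    unique_id: str
    outlet: str
    headline: str
    article_text: str
    image_description: str
    image_path: str
    date_published: str
    source_url: str
    canonical_url: str
    # ASSUMPTION: categories[i] is scored by category_scores[i].
    categories: List[str] = field(default_factory=list)
    category_scores: List[float] = field(default_factory=list)
    text_label: str = ""
    multimodal_label: str = ""


def first_paragraph(text: str) -> str:
    """Return the first non-empty paragraph.

    ASSUMPTION: paragraphs are separated by blank lines ("\\n\\n").
    """
    for block in text.split("\n\n"):
        block = block.strip()
        if block:
            return block
    return ""


def multimodal_input(record: NewsRecord) -> Tuple[str, str]:
    """Pair the text snippet (first paragraph) with the image file path."""
    return first_paragraph(record.article_text), record.image_path


def top_category(record: NewsRecord) -> Tuple[str, float]:
    """Return the (category, score) pair with the highest confidence."""
    pairs = list(zip(record.categories, record.category_scores))
    if not pairs:
        return ("", 0.0)
    return max(pairs, key=lambda p: p[1])


if __name__ == "__main__":
    # Made-up example record, for illustration only.
    rec = NewsRecord(
        unique_id="0001",
        outlet="Example Outlet",
        headline="Example headline",
        article_text="Lead paragraph.\n\nSecond paragraph.",
        image_description="A photo",
        image_path="images/0001.jpg",
        date_published="2024-01-01",
        source_url="https://example.com/a",
        canonical_url="https://example.com/a",
        categories=["politics", "economy"],
        category_scores=[0.8, 0.2],
    )
    print(multimodal_input(rec))  # ('Lead paragraph.', 'images/0001.jpg')
    print(top_category(rec))      # ('politics', 0.8)
```

Keeping the record as a plain dataclass makes the sketch independent of any particular loading library. Once the real column names are known, only the field declarations need to change.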
🤗HuggingFace Dataset (Request access)
Website (Official Docs)
📰 Blog Post