Image + Text Pair Classification
Classifying the use of an image with a text sequence as biased/fair.
Though images are often used (for better or for worse) in articles and social media posts, incorporating them into the bias or fake-news classification pipeline is still relatively unexplored in research.
Image/text pair classification relies on the same kind of text embeddings used in sequence classification and NER (created by a text encoder like BERT). This time, we also process images with an image encoder, then fuse the text and image encodings together for classification tasks such as binary bias classification.
90k rows | 2024
The dataset includes around 90,000 news articles, curated from a broad spectrum of reliable sources, including major news outlets from around the globe, from May 2023 to September 2024. These articles were gathered through open data sources using Google RSS, adhering to research ethics guidelines.
NMB+ has images and multimodal labels for the text + image pair of each news article.
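As a minimal sketch, the dataset can be loaded with the Hugging Face datasets library. The dataset ID below is a placeholder (substitute the one from the NMB+ dataset card), and the column names follow the field table at the end of this page.

```python
# Minimal sketch: load NMB+ and inspect one text/image pair.
# "org/nmb-plus" is a placeholder ID; use the actual ID from the dataset card.
from datasets import load_dataset

dataset = load_dataset("org/nmb-plus", split="train")

example = dataset[0]
print(example["headline"])           # text side of the pair
print(example["image"])              # image side of the pair
print(example["multimodal_label"])   # 'Likely' / 'Unlikely' annotation
```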
Fine-tune Llama 3.2 Vision Instruct with QLoRA for image/text classification: 💻Notebook
Train your own VLM for bias detection: 💻(4) Notebooks
BERT (or another text encoder model) processes a text sequence into an encoding sequence, where self-attention heads fold the meaning of surrounding context words into each token representation.
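Here is a minimal sketch of producing such an encoding with Hugging Face transformers. The bert-base-uncased checkpoint and the use of the [CLS] token as a pooled sequence representation are illustrative choices, not a requirement of the method.

```python
# Minimal sketch: encode a news text with BERT and pool a single vector from it.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")

text = "Headline of the news article. First paragraph of the story..."
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    outputs = text_encoder(**inputs)

# Each token gets a contextual 768-d vector; the [CLS] token is often used
# as a single fixed-size representation of the whole sequence.
token_embeddings = outputs.last_hidden_state   # shape: (1, seq_len, 768)
text_embedding = token_embeddings[:, 0, :]     # shape: (1, 768)
```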
ResNet (or other image encoder models) processes an image into a convolutional representation.
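A corresponding sketch for the image side, assuming torchvision's ResNet-50 with its ImageNet classification layer dropped so that the pooled 2048-dimensional features are returned. The image path is hypothetical.

```python
# Minimal sketch: turn the article's top image into a feature vector with ResNet-50.
import torch
from torchvision import models, transforms
from PIL import Image

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()   # drop the classifier, keep the pooled features
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("path/to/top_image.jpg").convert("RGB")   # hypothetical path
with torch.no_grad():
    image_embedding = resnet(preprocess(image).unsqueeze(0))  # shape: (1, 2048)
```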
We combine/pool the text and image representations into one set of features that we can classify. There are many techniques (sketched in code after this list), such as:
Concatenation: Joining the representations end to end, one after the other.
Dot product alignment: Using the dot product of the two representations as the fused representation.
Fusion layer: One or more linear layers that process the representations before classification.
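The snippet below sketches all three options. The 768/2048 sizes match the BERT and ResNet-50 sketches above, and the 512-dimensional projection size is an arbitrary illustrative choice.

```python
# Minimal sketch of the three fusion strategies above.
import torch
import torch.nn as nn

text_embedding = torch.randn(1, 768)     # stand-in for the BERT [CLS] embedding
image_embedding = torch.randn(1, 2048)   # stand-in for the ResNet-50 features

# 1. Concatenation: join the two vectors end to end -> (1, 768 + 2048)
fused_concat = torch.cat([text_embedding, image_embedding], dim=-1)

# 2. Dot product alignment: project both into a shared space, then take their dot product
text_proj = nn.Linear(768, 512)
image_proj = nn.Linear(2048, 512)
alignment_score = (text_proj(text_embedding) * image_proj(image_embedding)).sum(dim=-1)

# 3. Fusion layer: pass the concatenated features through one or more linear layers
fusion = nn.Sequential(nn.Linear(768 + 2048, 512), nn.ReLU())
fused_features = fusion(fused_concat)    # shape: (1, 512)
```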
The fused embeddings are passed into a classification head, which produces an output logit that is activated (typically with a sigmoid or softmax function) to give a probability between 0 and 1.
A threshold is sometimes applied to the output (e.g. probability > 0.5 is "Biased").
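Continuing the sketch, a one-logit head with a sigmoid activation and a 0.5 decision threshold (the feature size and the threshold are both illustrative) looks like this:

```python
# Minimal sketch: classification head, sigmoid activation, and thresholding.
import torch
import torch.nn as nn

fused_features = torch.randn(1, 512)   # stand-in for the fused features above

classifier = nn.Linear(512, 1)          # single output logit for biased vs. neutral
logit = classifier(fused_features)
probability = torch.sigmoid(logit)      # falls between 0 and 1
label = "Biased" if probability.item() > 0.5 else "Neutral"
```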
Metrics:
When evaluating a model's performance at binary classification, you should understand how positive (biased) and negative (neutral) examples fall into the categories of correct (true) and incorrect (false) predictions.
Your individual requirements will guide your interpretation (e.g. maybe you REALLY want to avoid false positives).
Confusion Matrix: Used to visualize the counts of correct and incorrect classifications; the goal is to have as many predictions as possible on the diagonal (true positives and true negatives) and as few as possible off it (false positives and false negatives).
Precision: Of everything the model predicted as biased, the fraction that is actually biased: TP / (TP + FP).
Recall: Of everything that is actually biased, the fraction the model catches: TP / (TP + FN).
F1 Score: The harmonic mean of precision and recall: 2 · (precision · recall) / (precision + recall).
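These can all be computed with scikit-learn. The labels below are made up purely to show the calls (1 = biased, 0 = neutral).

```python
# Minimal sketch: confusion matrix, precision, recall, and F1 with scikit-learn.
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))   # rows: actual class, columns: predicted class
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
```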
| Field | Description |
| --- | --- |
| unique_id | Unique identifier for each news item. Each unique_id is associated with the image (top image) for the same news article. |
| outlet | Publisher of the news article. |
| headline | Headline of the news article. |
| article_text | Full text content of the news article. |
| image_description | Description of the image paired with the article. |
| image | File path of the image associated with the article. |
| date_published | Publication date of the news article. |
| source_url | Original URL of the news article. |
| canonical_link | Canonical URL of the news article, if different from the source URL. |
| new_categories | Categories assigned to the article. |
| news_categories_confidence_scores | Confidence scores for the assigned categories. |
| text_label | Annotation for the textual content, indicating 'Likely' or 'Unlikely' to be disinformation. |
| multimodal_label | Annotation for the combined text snippet (first paragraph of the news story) and image content, assessing 'Likely' or 'Unlikely' to be disinformation. |
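As a small usage sketch, the 'Likely'/'Unlikely' annotations can be mapped to binary training targets. The dataset ID is again a placeholder, and the column names follow the table above.

```python
# Minimal sketch: convert the 'Likely' / 'Unlikely' annotations into 0/1 targets.
from datasets import load_dataset

dataset = load_dataset("org/nmb-plus", split="train")   # placeholder dataset ID
label_map = {"Likely": 1, "Unlikely": 0}

dataset = dataset.map(
    lambda row: {
        "text_target": label_map[row["text_label"]],
        "multimodal_target": label_map[row["multimodal_label"]],
    }
)
```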