Token Classification
Labeling words/subwords as "biased," or deeper labels like parts-of-speech.
Named-entity recognition (NER) is a cornerstone of NLP tools. NER enables token-level label classification (i.e. of words or subwords).
In the context of bias classification, this could be used to identify the "biased words" in a text sequence, offering a different view and granularity than sequence level classification.
Similar to sequence classification tasks, we can fine-tune pre-trained NLP models like BERT to perform token classification. BERT processes the text sequence to create contextual representations for each token. Then, instead of pooling the representations and classifying the whole sentence, we classify each token's representation individually. This can be done with the same type of classification head and activation functions, for multi-class or multi-label labels.
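The difference from sequence classification can be sketched in a few lines of PyTorch. This is a minimal illustration, not a full fine-tuning setup: the encoder output is stubbed with random hidden states, and the shapes follow bert-base-uncased (hidden size 768).

```python
import torch
import torch.nn as nn

# Illustrative shapes; the hidden states stand in for a real BERT forward pass.
batch_size, seq_len, hidden_size, num_labels = 2, 16, 768, 3

hidden_states = torch.randn(batch_size, seq_len, hidden_size)  # encoder output

token_classifier = nn.Linear(hidden_size, num_labels)  # (768 -> n) head

# Unlike sequence classification, we classify EVERY token's representation,
# not a pooled summary of the whole sentence.
logits = token_classifier(hidden_states)   # (batch, seq_len, num_labels)
predictions = logits.argmax(dim=-1)        # one label id per token
```

The same dense head used for sequence classification is simply applied at every position, which is why fine-tuning for token classification looks so similar in practice.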
Note: NER formatting commonly follows B/I/O format (i.e. Beginning, Inside, Outside) to describe the boundaries of an entity. Entities may span multiple tokens, so we use B- tags for the beginning of the entity, and I- tags for consecutive tokens inside of the entity.
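The B/I/O scheme described above can be shown with a tiny example. The tag names ("B-BIAS", "I-BIAS", "O") and the span indices are illustrative, not taken from a specific dataset.

```python
# Tagging a "biased" span in B/I/O format.
tokens = ["the", "radical", "left-wing", "mob", "voted", "today"]
# Suppose annotators marked "radical left-wing mob" as one biased span:
biased_spans = [(1, 3)]  # inclusive (start, end) word indices

tags = ["O"] * len(tokens)  # everything outside an entity starts as O
for start, end in biased_spans:
    tags[start] = "B-BIAS"              # Beginning of the entity
    for i in range(start + 1, end + 1):
        tags[i] = "I-BIAS"              # Inside the same entity
```

The B-/I- distinction lets a decoder tell two adjacent entities apart from one multi-token entity.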
Similar to how UnBIAS's classifier was a refresh of the Dbias architecture, UnBIAS NER is a refresh of another prominent paper: Nbias. Both UnBIAS and Nbias allow for word-level label prediction of bias, enabling deeper insights into which words might be contributing the most to bias classifications.
Base Model: bert-base-uncased
Dataset: BEADs (3.67M rows, 2024)
The BEADs corpus was compiled from the following datasets: MBIC, Hyperpartisan news, Toxic comment classification, Jigsaw Unintended Bias, Age Bias, Multi-dimensional news (Ukraine), and Social biases.
It was first annotated by humans, then extended with semi-supervised learning, and finally human-verified.
It's one of the largest and most up-to-date datasets for bias and toxicity classification, though it's currently private, so you'll need to request access through Hugging Face.
🤗Hugging Face Dataset (request access)
📑 Contents
| Field | Description |
|---|---|
| | The sentence or sentence fragment. |
| | Descriptive category of the text. |
| | A compilation of words regarded as biased. |
| | Specific sub-topic within the main content. |
| | The bias label. It is ternary: highly biased, slightly biased, or neutral. |
| | Indicates the presence (True) or absence (False) of toxicity. |
| | Mention of any identity, based on word match. |
While BEADs doesn't have a binary label for bias, the ternary labels (neutral, slightly biased, and highly biased) of the label field can be collapsed into biased (1) or unbiased (0). Additionally, the toxicity field contains binary labels.
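Collapsing the ternary labels into a binary one is a one-liner. The exact label strings below are assumptions about the dataset's values; check the actual field contents before relying on them.

```python
def to_binary(label: str) -> int:
    """Map a ternary bias label to biased (1) / unbiased (0)."""
    # Both "slightly biased" and "highly biased" count as biased.
    return 0 if label == "neutral" else 1

binary = [to_binary(l) for l in ["neutral", "slightly biased", "highly biased"]]
```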
📄 Research Paper
Train your own multi-label model: 💻 fairlyAspects Training Notebook
BERT (and other encoder models) process an input sequence into a sequence of encodings, as shown in the figure above, where self-attention heads encode the meaning of surrounding context words into each token's representation.
These encodings are the foundation of many NLP tasks, and it's common (in BERT sequence classification) to then classify the CLS encoding into the desired classes (e.g. Neutral, Slightly Biased, Highly Biased).
The CLS token (pooler_output) is a built-in pooling mechanism, but you can also use your own (e.g. averaging all the token representations for a mean-pooled representation).
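The two pooling strategies mentioned above differ only in how they reduce the per-token encodings to one vector. This sketch uses random tensors in place of a real encoder output; in a Hugging Face model the CLS representation would come from position 0 of last_hidden_state.

```python
import torch

# Stand-in for an encoder's per-token outputs: (batch, seq_len, hidden).
last_hidden_state = torch.randn(2, 8, 768)

cls_representation = last_hidden_state[:, 0, :]  # the [CLS] token at position 0
mean_pooled = last_hidden_state.mean(dim=1)      # average over all token positions
```

Either vector can then be fed to the same classification head; which pools better is an empirical question for your task.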
bert-base-uncased has 768 output features for each token, and we can pass the CLS token into a (768 -> n) dense layer for multi-class or multi-label classification (where "n" is the number of classes).
The activation function used (e.g. softmax for multi-class, sigmoid for multi-label) turns the output logits for each class into a probability for each one.
Data engineers will usually set a threshold above which a probability counts as a presence (the threshold can be a single global value or calculated individually for each class).
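The softmax/sigmoid distinction and the thresholding step can be shown on a single logit vector. The 0.5 cutoff below is a common default, not a universal rule, and the logit values are made up.

```python
import torch

logits = torch.tensor([[2.0, -1.0, 0.5]])  # one example, three classes

# Multi-class: softmax yields one distribution; pick the argmax class.
multiclass_probs = torch.softmax(logits, dim=-1)
predicted_class = multiclass_probs.argmax(dim=-1)

# Multi-label: sigmoid scores each class independently, then threshold.
multilabel_probs = torch.sigmoid(logits)
threshold = 0.5                       # could instead be tuned per class
present = multilabel_probs > threshold
```

Note that under sigmoid several classes can be "present" at once, which is exactly what multi-label classification requires.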
When evaluating a model's performance at binary classification, you should understand how positive (biased) and negative (neutral) examples fall into the categories of correct (true) and incorrect (false) predictions.
Your individual requirements will guide your interpretation (e.g. maybe you REALLY want to avoid false positives).
Confusion Matrix: Used to visualize the counts of correct and incorrect classifications per class; the goal is to concentrate predictions on the diagonal (true positives and true negatives).
Precision: TP / (TP + FP) — of everything predicted biased, how much actually was.
Recall: TP / (TP + FN) — of everything actually biased, how much the model caught.
F1 Score: the harmonic mean of precision and recall, 2 × (precision × recall) / (precision + recall).
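These metrics are straightforward to compute from raw counts. The counts below are made-up numbers purely for illustration.

```python
# Example confusion-matrix counts (invented for illustration).
tp, fp, fn = 40, 10, 20  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # 40 / 50 = 0.8
recall = tp / (tp + fn)     # 40 / 60 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)
```

Because F1 is a harmonic mean, it is dragged toward the weaker of the two metrics, which is why it's preferred over a simple average when precision and recall diverge.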