Finding the needle in the haystack: Fine-tuning transformers to classify protest events in a sea of news articles, with Bayesian uncertainty measures

Master thesis

Ghai, Chris (2020) Finding the needle in the haystack: Fine-tuning transformers to classify protest events in a sea of news articles, with Bayesian uncertainty measures. MA thesis, Department of Mathematics, University of Oslo, Oslo.

Read the thesis here

In this thesis, we build predictive natural language processing models to support peace and conflict researchers. We consider pre-trained transformer language models and fine-tune them on a dataset of news articles tagged with information about protest events. This news corpus records many different aspects of each article, among them whether the article describes a protest, the form of the protest, the target of the protest and the issue of the protest. The first of these is a binary classification task, and the others are multiclass classification tasks. With several different tasks as our objective, we build transformer-based models and fine-tune them to solve the tasks both separately and jointly. We explore many different architectures, regularisation techniques and data augmentation methods, and evaluate how they affect the final performance.

A problem with deep neural networks, however, is that they usually do not provide uncertainty estimates for their predictions. Such estimates are desirable and useful: a user of the model can then consider how trustworthy each prediction is and make decisions with this knowledge. We therefore explore how dropout can be combined with Monte Carlo integration to make predictions with uncertainty estimates.
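
The idea of combining dropout with Monte Carlo integration can be illustrated with a minimal sketch. It is not the thesis implementation: the tiny logistic model, the function names, the toy weights and the dropout rate are all assumptions made for illustration. Dropout is kept active at prediction time, the same input is passed through the model many times, and the mean and spread of the sampled probabilities serve as the prediction and its uncertainty estimate.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_with_dropout(features, weights, bias, drop_prob, rng):
    """One stochastic forward pass of a toy logistic model (hypothetical).

    Each input feature is kept with probability 1 - drop_prob and rescaled
    by 1 / (1 - drop_prob), the usual "inverted dropout" convention.
    """
    keep_prob = 1.0 - drop_prob
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep_prob:
            z += w * x / keep_prob
    return sigmoid(z)


def mc_dropout_predict(features, weights, bias, drop_prob=0.1,
                       n_samples=100, seed=0):
    """Monte Carlo estimate of the predictive mean and standard deviation."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(features, weights, bias, drop_prob, rng)
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.pstdev(samples)


# Toy values, chosen only for illustration.
features = [1.2, -0.4, 0.9]
weights = [2.0, 1.0, -0.5]
mean, std = mc_dropout_predict(features, weights, bias=0.1)
print(f"P(protest) ~ {mean:.3f} +/- {std:.3f}")
```

In a fine-tuned transformer the same pattern would apply to the full network: the dropout layers stay in training mode during inference, and a larger spread across samples signals a less trustworthy prediction.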

For the binary classification task, we observe F2 scores of around 0.92 on an independent test set. This is much better than most non-transformer models and shows that predictive models can be used to automatically detect relevant bodies of text. On the multiclass classification tasks, we achieve Matthews correlation coefficients ranging from 0.75 to 0.85 on the test set, depending on the task. These tasks are much harder to tune because they have many classes, but we observe that the resulting models are very capable of identifying useful events. We also evaluate the best models on ten different news articles picked from the Internet and inspect their predictions together with the uncertainty estimates, observing that they seem to work well in practice.
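
For readers unfamiliar with the two metrics reported above, the following sketch shows how they can be computed from lists of true and predicted labels. The function names and the toy labels are hypothetical and are not taken from the thesis. The F2 score is the F-beta score with beta = 2, which weights recall more heavily than precision; the multiclass Matthews correlation coefficient is computed here with the standard multiclass generalisation based on class counts.

```python
import math
from collections import Counter


def f_beta(y_true, y_pred, beta=2.0, positive=1):
    """F-beta score for a binary task; beta=2 gives the F2 score."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for two or more classes."""
    s = len(y_true)                                     # number of samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                          # true count per class
    p_counts = Counter(y_pred)                          # predicted count per class
    labels = set(t_counts) | set(p_counts)
    sum_pt = sum(p_counts[k] * t_counts[k] for k in labels)
    sum_pp = sum(v * v for v in p_counts.values())
    sum_tt = sum(v * v for v in t_counts.values())
    denom = math.sqrt((s * s - sum_pp) * (s * s - sum_tt))
    return (c * s - sum_pt) / denom if denom else 0.0


# Toy labels, chosen only for illustration (1 = article describes a protest).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(f_beta(y_true, y_pred))          # 0.75
print(multiclass_mcc(y_true, y_pred))  # 0.5
```

Both metrics are less sensitive to class imbalance than plain accuracy, which matters when protest articles are rare among all news articles.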
