Fairness in transfer learning for natural language processing
dc.contributor.advisor
Lopez, Adam
dc.contributor.advisor
Ross, Björn
dc.contributor.author
Goldfarb-Tarrant, Seraphina
dc.date.accessioned
2024-06-06T15:35:15Z
dc.date.available
2024-06-06T15:35:15Z
dc.date.issued
2024-06-06
dc.description.abstract
Natural Language Processing (NLP) systems have come to permeate so many areas
of daily life that it is difficult to live a day without having one or many experiences
mediated by an NLP system. These systems bring with them many promises: more
accessible information in more languages, real-time content moderation, more data-driven decision making, intuitive access to information via Question Answering and
chat interfaces. But there is a dark side to these promises, for the past decade of
research has shown that NLP systems can contain social biases and deploying them can
incur serious social costs. Each of these promises has been found to have unintended
consequences: racially charged errors and rampant gender stereotyping in language
translation, censorship of minority voices and dialects, Human Resource systems that
discriminate based on demographic data, a proliferation of toxic generated text and
misinformation, and many subtler issues.
Yet despite these consequences, and the proliferation of bias research attempting to
correct them, NLP systems have not improved very much. There are a few reasons
for this. First, measuring bias is difficult; there are no standardised methods of measurement, and much research relies on one-off metrics that are often neither sufficiently careful nor thoroughly tested. Thus many works have contradictory results that cannot
be reconciled, because of minor differences or assumptions in their metrics. Without
thorough testing, these metrics can even mislead and give the illusion of progress.
Second, much research adopts an overly simplistic view of the causes and mediators
of bias in a system. NLP systems have multiple components and stages of training,
and many works test fairness at only one stage. They do not study how different parts
of the system interact, and how fairness changes during this process. So it is unclear
whether these isolated results will hold in the full complex system. Here, we address
both of these shortcomings. We conduct a detailed analysis of fairness metrics applied
to upstream language models (models that will be used in a downstream task in transfer
learning). We find that a) the most commonly used upstream fairness metric is not predictive of downstream fairness, and so should not be used, but that b) information-theoretic probing is a good alternative to existing fairness metrics, as it is both predictive of downstream bias and robust to different modelling choices. We then
use our findings to track how unfairness, having entered a system, persists and travels
throughout it. We track how fairness issues travel between tasks (from language modelling
to classification) in monolingual transfer learning, and between languages, in
multilingual transfer learning. We find that multilingual transfer learning often exacerbates
fairness problems and should be used with care, whereas monolingual transfer
learning generally improves fairness. Finally, we track how fairness travels between
source documents and retrieved answers to questions, in fact-based generative systems.
Here we find that, though retrieval systems strongly represent demographic data such
as gender, bias in retrieval question answering benchmarks does not come from the
model representations, but from the queries or the corpora. We reach all of our findings
only by looking at the transfer learning system as a whole, and we hope
that this encourages other researchers to do the same. We hope that our results can
guide future fairness research to be more consistent between works, more predictive of real-world fairness outcomes, and better able to prevent unfairness from propagating
between different parts of a system.
en
dc.identifier.uri
https://hdl.handle.net/1842/41857
dc.identifier.uri
http://dx.doi.org/10.7488/era/4580
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Seraphina Goldfarb-Tarrant, Eddie Ungless, Esma Balkir, and Su Lin Blodgett. 2023. This prompt is measuring <mask>: evaluating bias evaluation in language models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2209–2225, Toronto, Canada. Association for Computational Linguistics.
en
dc.relation.hasversion
Seraphina Goldfarb-Tarrant, Adam Lopez, Roi Blanco, and Diego Marcheggiani. 2023. Bias beyond English: Counterfactual tests for bias in sentiment analysis in four languages. In Findings of the Association for Computational Linguistics: ACL 2023, pages 4458–4468, Toronto, Canada. Association for Computational Linguistics.
en
dc.relation.hasversion
Seraphina Goldfarb-Tarrant, Rebecca Marchant, Ricardo Muñoz Sánchez, Mugdha Pandya, and Adam Lopez. 2021. Intrinsic bias metrics do not correlate with application bias. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1926–1940, Online. Association for Computational Linguistics.
en
dc.subject
Natural Language Processing (NLP)
en
dc.subject
multilingual transfer learning
en
dc.subject
fairness problems
en
dc.title
Fairness in transfer learning for natural language processing
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Goldfarb-TarrantS_2024.pdf
- Size:
- 4.92 MB
- Format:
- Adobe Portable Document Format