Recalibrating machine learning for social biases: demonstrating a new methodology through a case study classifying gender biases in archival documentation
dc.contributor.advisor
Alex, Beatrice
dc.contributor.advisor
Bach, Benjamin
dc.contributor.advisor
Terras, Melissa
dc.contributor.author
Havens, Lucy Joan
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.contributor.sponsor
School of Informatics Graduate School
en
dc.date.accessioned
2024-02-07T10:13:09Z
dc.date.available
2024-02-07T10:13:09Z
dc.date.issued
2024-02-07
dc.description.abstract
This thesis proposes a recalibration of Machine Learning for social biases to minimize harms from existing approaches and practices in the field. Prioritizing quality over quantity, accuracy over efficiency, representativeness over convenience, and situated thinking over universal thinking, the thesis demonstrates an alternative approach to creating Machine Learning models. Drawing on GLAM, the Humanities, the Social Sciences, and Design, the thesis focuses on understanding and communicating biases in a specific use case. 11,888 metadata descriptions from the University of Edinburgh Heritage Collections' Archives catalog were manually annotated for gender biases and text classification models were then trained on the resulting dataset of 55,260 annotations. Evaluations of the models' performance demonstrate that annotating gender biases can be automated; however, the subjectivity of bias as a concept complicates the generalizability of any one approach.
The contributions are: (1) an interdisciplinary and participatory Bias-Aware Methodology, (2) a Taxonomy of Gendered and Gender Biased Language, (3) data annotated for gender biased language, (4) gender biased text classification models, and (5) a human-centered approach to model evaluation. The contributions have implications for Machine Learning, demonstrating how bias is inherent to all data and models; more specifically for Natural Language Processing, providing an annotation taxonomy, annotated datasets and classification models for analyzing gender biased language at scale; for the Gallery, Library, Archives, and Museum sector, offering guidance to institutions seeking to reconcile with histories of marginalizing communities through their documentation practices; and for historians, who utilize cultural heritage documentation to study and interpret the past. Through a real-world application of the Bias-Aware Methodology in a case study, the thesis illustrates the need to shift away from removing social biases and towards acknowledging them, creating data and models that surface the uncertainty and multiplicity characteristic of human societies.
en
dc.identifier.uri
https://hdl.handle.net/1842/41420
dc.identifier.uri
http://dx.doi.org/10.7488/era/4154
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Ames, S., & Havens, L. (2022). Exploring National Library of Scotland Datasets with Jupyter Notebooks. IFLA Journal, 48(1), 50–56. https://doi.org/10.1177/03400352211065484
en
dc.relation.hasversion
Havens, L. (2021). An Information Space with More Than a Search Bar for Discovery. ACH2021 Conference. https://lucyhavens.com/more-than-a-search-bar-for-discovery
en
dc.relation.hasversion
Havens, L., Bach, B., Terras, M., & Alex, B. (2022). Beyond Explanation: A Case for Exploratory Text Visualizations of Non-Aggregated, Annotated Datasets. Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, 73–82. https://aclanthology.org/2022.nlperspectives-1.10
en
dc.relation.hasversion
Havens, L., Hosker, R., Bach, B., Terras, M., & Alex, B. (2023). Collaboration Across the Archival and Computational Sciences to Address Legacies of Gender Bias in Descriptive Metadata. Digital Humanities 2023: Book of Abstracts, 267–270. https://zenodo.org/record/7961822
en
dc.relation.hasversion
Havens, L., Terras, M., Bach, B., & Alex, B. (2024). Routledge Handbook of Heritage and Gender. Routledge
en
dc.relation.hasversion
Havens, L., Terras, M., Bach, B., & Alex, B. (2020). Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research. Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, 107–124. https://aclanthology.org/2020.gebnlp-1.10
en
dc.relation.hasversion
Havens, L., Terras, M., Bach, B., & Alex, B. (2022). Uncertainty and Inclusivity in Gender Bias Annotation: An Annotation Taxonomy and Annotated Datasets of British English Text. Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), 30–57. https://doi.org/10.18653/v1/2022.gebnlp-1.4
en
dc.subject
gender
en
dc.subject
bias
en
dc.subject
machine learning
en
dc.subject
natural language processing
en
dc.subject
archives
en
dc.subject
methodology
en
dc.subject
dataset
en
dc.subject
classification
en
dc.subject
model
en
dc.title
Recalibrating machine learning for social biases: demonstrating a new methodology through a case study classifying gender biases in archival documentation
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Havens2024.pdf
- Size: 12.96 MB
- Format: Adobe Portable Document Format

