Recalibrating machine learning for social biases: demonstrating a new methodology through a case study classifying gender biases in archival documentation

Havens, Lucy Joan

dc.contributor.advisor

Alex, Beatrice

dc.contributor.advisor

Bach, Benjamin

dc.contributor.advisor

Terras, Melissa

dc.contributor.author

Havens, Lucy Joan

dc.contributor.sponsor

Engineering and Physical Sciences Research Council (EPSRC)

en

dc.contributor.sponsor

School of Informatics Graduate School

en

dc.date.accessioned

2024-02-07T10:13:09Z

dc.date.available

2024-02-07T10:13:09Z

dc.date.issued

2024-02-07

dc.description.abstract

This thesis proposes a recalibration of Machine Learning for social biases to minimize harms from existing approaches and practices in the field. Prioritizing quality over quantity, accuracy over efficiency, representativeness over convenience, and situated thinking over universal thinking, the thesis demonstrates an alternative approach to creating Machine Learning models. Drawing on Gallery, Library, Archives, and Museum (GLAM) practice, the Humanities, the Social Sciences, and Design, the thesis focuses on understanding and communicating biases in a specific use case. A total of 11,888 metadata descriptions from the University of Edinburgh Heritage Collections' Archives catalog were manually annotated for gender biases, and text classification models were then trained on the resulting dataset of 55,260 annotations. Evaluations of the models' performance demonstrate that annotating gender biases can be automated; however, the subjectivity of bias as a concept complicates the generalizability of any one approach. The contributions are: (1) an interdisciplinary and participatory Bias-Aware Methodology, (2) a Taxonomy of Gendered and Gender Biased Language, (3) data annotated for gender biased language, (4) gender biased text classification models, and (5) a human-centered approach to model evaluation. The contributions have implications for Machine Learning, demonstrating how bias is inherent to all data and models; more specifically, for Natural Language Processing, providing an annotation taxonomy, annotated datasets, and classification models for analyzing gender biased language at scale; for the GLAM sector, offering guidance to institutions seeking to reconcile with histories of marginalizing communities through their documentation practices; and for historians, who utilize cultural heritage documentation to study and interpret the past.
Through a real-world application of the Bias-Aware Methodology in a case study, the thesis illustrates the need to shift away from removing social biases and towards acknowledging them, creating data and models that surface the uncertainty and multiplicity characteristic of human societies.

en

dc.identifier.uri

https://hdl.handle.net/1842/41420

dc.identifier.uri

http://dx.doi.org/10.7488/era/4154

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Ames, S., & Havens, L. (2022). Exploring National Library of Scotland Datasets with Jupyter Notebooks. IFLA Journal, 48(1), 50–56. https://doi.org/10.1177/03400352211065484

en

dc.relation.hasversion

Havens, L. (2021). An Information Space with More Than a Search Bar for Discovery. ACH2021 Conference. https://lucyhavens.com/more-than-a-search-bar-for-discovery

en

dc.relation.hasversion

Havens, L., Bach, B., Terras, M., & Alex, B. (2022). Beyond Explanation: A Case for Exploratory Text Visualizations of Non-Aggregated, Annotated Datasets. Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, 73–82. https://aclanthology.org/2022.nlperspectives-1.10

en

dc.relation.hasversion

Havens, L., Hosker, R., Bach, B., Terras, M., & Alex, B. (2023). Collaboration Across the Archival and Computational Sciences to Address Legacies of Gender Bias in Descriptive Metadata. Digital Humanities 2023: Book of Abstracts, 267–270. https://zenodo.org/record/7961822

en

dc.relation.hasversion

Havens, L., Terras, M., Bach, B., & Alex, B. (2024). Routledge Handbook of Heritage and Gender. Routledge.

en

dc.relation.hasversion

Havens, L., Terras, M., Bach, B., & Alex, B. (2020). Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research. Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, 107–124. https://aclanthology.org/2020.gebnlp-1.10

en

dc.relation.hasversion

Havens, L., Terras, M., Bach, B., & Alex, B. (2022). Uncertainty and Inclusivity in Gender Bias Annotation: An Annotation Taxonomy and Annotated Datasets of British English Text. Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), 30–57. https://doi.org/10.18653/v1/2022.gebnlp-1.4

en

dc.subject

gender

en

dc.subject

bias

en

dc.subject

machine learning

en

dc.subject

natural language processing

en

dc.subject