Modelling crowd work in open task systems
This thesis aims to harness modern machine learning techniques to understand how and why people interact in large, open, collaborative online platforms: task systems. Participants in task systems have diverse goals and reasons for contributing, and the data logged from their participation is observational. These two factors pose significant challenges for researchers who wish to understand what motivates continued contribution to projects such as Wikipedia and Stack Overflow. Existing approaches to scientific investigation in such domains often take a “one-size-fits-all” approach, studying aggregated trends and drawing conclusions from overview statistics. In contrast, I motivate a three-stage framework for scientific enquiry into the behaviour of participants in task systems. First, I propose a modelling step in which assumptions and hypotheses from the behavioural sciences are encoded directly into a model’s structure. I show that it is important to allow for multiple competing hypotheses in one model: because participants’ goals and motivations are diverse, a range of hypotheses is needed to account for the different interaction patterns present in the data. Second, I design deep generative models that combine the power of deep learning with the structured inference of variational methods to infer parameters for the structured models from the first step. Such methods allow maximum likelihood estimation of parameter values while harnessing amortised learning across a dataset. The inference schemes proposed here allow for posterior assignment of interaction data to specific hypotheses, giving insight into the validity of each hypothesis.
This approach also naturally allows for inference over both categorical and continuous latent variables in one model, an aspect that is crucial when modelling data in which competing hypotheses describe the users’ interaction. Finally, in working to understand how and why people interact in such online settings, we must understand the model parameters associated with the various aspects of their interaction. In many cases these parameters are given specific meaning by construction of the model; however, I argue that it is still important to evaluate the interpretability of such models, and I therefore investigate several tests for performing such an evaluation. My contributions additionally include designing bespoke models that describe people’s interactions in complex online domains. I present examples from real-world domains where the data consist of people’s actual interactions with the system.
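The posterior-assignment idea can be illustrated with a minimal sketch: each competing hypothesis is a simple generative model over an interaction statistic, a categorical latent variable selects the hypothesis, and Bayes’ rule assigns each data point a posterior over hypotheses. All parameters and the session-length setting below are invented for illustration; they are not models or data from the thesis.

```python
import numpy as np

# Two competing (hypothetical) hypotheses about session length:
# H0: casual contributors, short sessions; H1: committed contributors, longer ones.
# Each hypothesis is a Gaussian generative model with assumed parameters.
mus = np.array([2.0, 8.0])      # hypothesis-specific means (assumed)
sigmas = np.array([1.0, 2.0])   # hypothesis-specific std devs (assumed)
prior = np.array([0.5, 0.5])    # prior over the categorical hypothesis latent

def log_gauss(x, mu, sigma):
    """Log density of a univariate Gaussian, evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def posterior_assignment(x):
    """Posterior p(hypothesis | x) for each data point (Bayes' rule)."""
    log_joint = np.log(prior) + np.stack(
        [log_gauss(x, m, s) for m, s in zip(mus, sigmas)], axis=-1)
    log_joint -= log_joint.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=-1, keepdims=True)

# Simulated interaction data: two short sessions and two long ones.
x = np.array([1.5, 2.2, 7.5, 9.0])
resp = posterior_assignment(x)  # rows: data points, columns: hypotheses
```

In the full framework the hypothesis-specific likelihoods would be parameterised by neural networks and the posterior approximated by an amortised inference network, but the per-datapoint assignment to a categorical hypothesis follows the same logic as this toy computation.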