Investigating the epidemiology of canine health using data science techniques
Woolley, Charlotte Sarah Catherine
Research into dog health has historically relied on small-scale cross-sectional studies or specialised medical and clinical data, which are subject to bias and difficult to generalise to the wider canine population. The digital era presents an opportunity to collect sources of Big Data, defined as data of high volume, velocity or variability, for health surveillance and research. Data science techniques have made accessing, managing and analysing such datasets more achievable. Large-scale cohort studies are needed to estimate the incidence of disease and to identify factors associated with long-term canine health. This project was based primarily on dog owner questionnaires from Dogslife, an internet-based cohort of Labrador Retrievers in the UK established in 2010. In this thesis, I designed data cleaning methods for Dogslife, validated some of them on veterinary and human medical records, and investigated the epidemiology of canine health using Dogslife data, Google Trends and 16S ribosomal RNA gene sequencing data derived from canine faecal samples.
A decision-making algorithm for identifying, correcting or removing implausible values in growth measurements was designed and tested in combination with five different data cleaning methods, which were then applied to five datasets. The algorithm was most effective in combination with non-linear mixed effects models, increasing the average sensitivity and specificity of the models alone by 7.68% and 0.42%, respectively. The method was adaptable and offered several useful features, including allowing for individual growth trajectories, preserving data where possible and removing duplicates.
A vomiting outbreak was evident in UK dogs between December 2019 and March 2020 in both Dogslife data and Google Trends search queries. The odds ratio for a vomiting incident being reported to Dogslife was 1.51 (95% CI: 1.24 – 1.84) compared with the same period in previous years (December to March, 2010 to 2019).
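The results above are reported as sensitivity and specificity and as odds ratios with 95% confidence intervals. The Python sketch below shows the standard crude calculations behind these quantities; it is a minimal illustration, not the method used in the thesis. It assumes sensitivity is the share of truly implausible measurements that a cleaning method flags and specificity the share of plausible measurements it leaves unflagged, and it computes an unadjusted odds ratio from a 2x2 table with a Wald (Woolf) interval on the log scale. The thesis estimates may instead come from regression models adjusted for other factors. The function names and all counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sensitivity_specificity(flagged: Sequence[bool],
                            implausible: Sequence[bool]) -> Tuple[float, float]:
    """Score a cleaning method's flags against a reference labelling.

    Hypothetical helper: implausible[i] is True when measurement i is known
    (e.g. from manual review) to be implausible.
    """
    if len(flagged) != len(implausible):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(flagged, implausible))
    tp = sum(1 for f, t in pairs if f and t)          # implausible and flagged
    fn = sum(1 for f, t in pairs if not f and t)      # implausible but missed
    tn = sum(1 for f, t in pairs if not f and not t)  # plausible and kept
    fp = sum(1 for f, t in pairs if f and not t)      # plausible but flagged
    return tp / (tp + fn), tn / (tn + fp)


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Wald (Woolf) confidence interval.

    2x2 table: a = outcome in period of interest, b = no outcome in period of
    interest, c = outcome in comparison period, d = no outcome in comparison
    period. All cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical example data, not taken from Dogslife.
sens, spec = sensitivity_specificity([True, True, False, False, True],
                                     [True, False, False, True, True])
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.67, specificity=0.50

or_, lower, upper = odds_ratio_ci(30, 70, 20, 80)
print(f"OR={or_:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

The interval is built on the log-odds scale and exponentiated, which is why it is asymmetric around the odds ratio, as in the reported 1.51 (1.24 – 1.84).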
Dogslife data also identified risk factors for a dog experiencing a vomiting episode, and differences in owner decision-making about seeking veterinary attention for vomiting during the outbreak.
Compared with previous years (March 23rd to July 4th, 2010 to 2019), the COVID-19 restrictions study period (March 23rd to July 4th, 2020) was associated with owners reporting increases in their dogs’ exercise and worming and decreases in insurance, titbit-feeding and vaccination. The odds of owners reporting that their dogs had an episode of coughing (OR 0.20, 95% CI: 0.04 – 0.92), and that they took their dogs to a veterinarian with an episode of any illness (OR 0.58, 95% CI: 0.45 – 0.76), were lower during the COVID-19 restrictions than before.
A longitudinal sub-study of Dogslife Labrador Retriever puppies was designed to investigate associations between environmental and health factors and the development of the canine microbiome. When their puppies were three to four, seven and 12 months of age, owners submitted digestive health questionnaires and faecal samples, which were used to produce 16S ribosomal RNA gene sequencing data. Dogs’ faecal microbiota were successfully characterised at each of the three sampling ages. Differences between individual dogs were the largest source of variation in microbiome composition, explaining approximately 50%. Age, sex, coat colour, UK geographical region, household type, coprophagia, contact with other animals, recent antibiotic use and recent diarrhoea were also associated with differences in the diversity and composition of the microbiome.
Owner-derived data can be used alongside other sources of Big Data and provide valid and valuable information for veterinary health surveillance, including detail about environmental factors not typically present in medical records or clinical studies.
Such information is becoming easier to handle and analyse with data science techniques. Furthermore, cohort studies can be used to recruit participants to sub-studies that address specialised questions, such as those in microbiome research.
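The microbiome findings refer to differences in diversity and composition without naming the metrics used. As a hedged illustration only, the sketch below assumes two widely used measures: the Shannon index for within-sample (alpha) diversity and Bray-Curtis dissimilarity for between-sample differences in composition. The thesis may have used other metrics. The helper names, taxon labels and read counts are hypothetical, and in practice counts are usually rarefied or converted to relative abundances before samples are compared.

```python
import math
from typing import Mapping


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa present in a sample."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample contains no reads")
    h = 0.0
    for n in counts.values():
        if n > 0:  # absent taxa contribute nothing to H'
            p = n / total
            h -= p * math.log(p)
    return h


def bray_curtis(x: Mapping[str, int], y: Mapping[str, int]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical samples, 1 = no shared taxa."""
    taxa = set(x) | set(y)
    diff = sum(abs(x.get(t, 0) - y.get(t, 0)) for t in taxa)
    total = sum(x.get(t, 0) + y.get(t, 0) for t in taxa)
    if total == 0:
        raise ValueError("both samples are empty")
    return diff / total


# Hypothetical read counts for one puppy at two sampling ages (equal depth of 250 reads).
sample_4m = {"taxon_A": 120, "taxon_B": 80, "taxon_C": 50}
sample_12m = {"taxon_A": 60, "taxon_B": 140, "taxon_D": 50}
print(round(shannon_index(sample_4m), 3), round(shannon_index(sample_12m), 3))
print(round(bray_curtis(sample_4m, sample_12m), 2))  # 0.44
```

Shannon values summarise each sample on its own, whereas Bray-Curtis compares pairs of samples; a matrix of such pairwise dissimilarities is the usual input when partitioning compositional variation between factors such as individual dog, age or household.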