Molecular endotyping and integration with computer-aided CT imaging to stratify fibrotic lung disease
Item status: Restricted Access
Embargo end date: 17/01/2024
Przybylski, Alexander Antony
Interstitial lung diseases (ILDs) are a conglomerate of disorders, compounded by challenges in clinical diagnostic methods. The resulting patient groups are highly heterogeneous, with often unpredictable prognoses or responses to treatment. Thus, current disease categorisations are not conducive to effective patient management or to understanding inter-individual variability. This dissertation explores the value of data-driven methods for investigating this heterogeneity. Our primary aim was to evaluate the utility and potential additive prognostic value of molecular markers and quantitative imaging features over routinely collected clinical data. Our overarching hypothesis was that shared pathobiological mechanisms may be implicated across ILD subtypes, and that analysis of a combined cohort could lead towards identification of endotype-phenotype subgroups. This exploratory study analysed data from a retrospective, observational cohort of ILD patients from the Edinburgh Lung Fibrosis Molecular Endotyping (ELFMEN) study, enrolled between 2007 and 2017. The first stage of the project comprised acquisition of molecular data and processing of computed tomography (CT) scans. Banked patient serum samples were assayed via Luminex (R&D Systems) for a set of analytes identified from proteomic screening and the literature. Historical CT scans conducted during patient follow-up were processed using commercial Imbio lung texture analysis software (Imbio, Minneapolis, MN) to generate a quantitative radiological characterisation of the lungs. We present an evaluation of the datasets and their suitability for downstream analysis. For molecular measurements affected by instrument limits of detection, we conducted simulation experiments to compare an adapted probabilistic matrix factorization approach against other imputation techniques.
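The idea of a simulation experiment for measurements below a limit of detection (LOD) can be illustrated with a minimal sketch. It does not implement the adapted probabilistic matrix factorization approach evaluated in the thesis; it only compares two simple substitution rules (LOD/2 and LOD/√2) on synthetic log-normal data. The function name, the distribution, the censoring fraction, and the error metric are all assumptions made for illustration.

```python
import numpy as np

def simulate_lod_imputation(n=2000, lod_quantile=0.2, seed=0):
    """Compare two simple substitution rules for values below a limit of detection.

    All settings are hypothetical; they are not taken from the ELFMEN data.
    """
    rng = np.random.default_rng(seed)
    # Synthetic "true" analyte concentrations (assumed log-normal).
    true = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    # Hypothetical instrument LOD, set so that roughly 20% of values are censored.
    lod = np.quantile(true, lod_quantile)
    censored = true < lod

    fills = {"lod_half": lod / 2, "lod_sqrt2": lod / np.sqrt(2)}
    results = {}
    for name, fill in fills.items():
        imputed = np.where(censored, fill, true)
        # RMSE on the log scale, computed over the censored entries only.
        err = np.log(imputed[censored]) - np.log(true[censored])
        results[name] = float(np.sqrt(np.mean(err ** 2)))
    return results, float(censored.mean())

if __name__ == "__main__":
    rmse, fraction_censored = simulate_lod_imputation()
    print(rmse, fraction_censored)
```

Because the true values are known in a simulation, any candidate imputation method can be scored in the same way, which is what makes this kind of comparison possible.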
Clinically important outcomes, consisting of all-cause mortality and respiratory-cause non-elective hospitalisations, were evaluated on datasets combining clinical, molecular, and quantitative imaging parameters. Factors associated with patient prognosis were identified using Cox regression analysis and variable selection procedures utilising penalized likelihood estimation. The additive value of candidate molecular biomarkers was assessed via internal validation of prognostic models. Disease progression was additionally evaluated using longitudinal lung function data and mixed-effects models. Our presentation of results aims to facilitate decision making in a biomarker discovery context and to provide hypothesis-generating insights into ILD mechanisms and potential targets for future confirmatory studies. Out of 52 molecular markers analysed, we highlight eight which demonstrated consistent prognostic added value over routine clinical variables. Six have been reported previously and two are potentially novel associations. Discrimination metrics were improved and calibration remained mostly consistent for models incorporating biomarkers, compared to routine clinical variables alone. Our results were broadly consistent between the two time-to-event outcomes and in sub-analysis when evaluating the added value over an established ILD prognostic model. Fewer biomarkers showed associations with the rate of change in lung function; however, a subset of those identified via survival analysis were reported. The greatest challenge in interpreting the results was the small sample size relative to the number of candidate parameters, which resulted in instability of variable selection. While no unique set of biomarkers formed the optimal model, the eight biomarkers reported were most frequently selected in various combinations across bootstrap samples.
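To make the notion of a discrimination metric concrete, the sketch below computes Harrell's concordance index (C-index) for a risk score on right-censored time-to-event data. The abstract does not state which discrimination metrics were used, so the choice of the C-index is an assumption, and the toy data and variable names are hypothetical rather than taken from the thesis.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the subject with the
    earlier observed event also has the higher predicted risk.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    risk   : predicted risk scores (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

if __name__ == "__main__":
    # Hypothetical toy data: five patients, two of them censored.
    times = [2, 4, 6, 8, 10]
    events = [1, 1, 0, 1, 0]
    clinical_only = [0.9, 0.3, 0.5, 0.4, 0.1]   # risk from clinical variables alone
    with_biomarker = [0.9, 0.7, 0.5, 0.4, 0.1]  # risk after adding a biomarker
    print(concordance_index(times, events, clinical_only))
    print(concordance_index(times, events, with_biomarker))
```

Comparing the index for a model with and without a candidate biomarker is one simple way to express added prognostic value; in practice such comparisons would be made with optimism-corrected estimates from internal validation rather than on the data used to fit the model.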
In extended analysis including quantitative CT (QCT) data, we found QCT features were interpretable and aligned with existing clinical knowledge. Novel radiological features may be derived and, in agreement with previous studies, we found that pulmonary vessel volume may serve as a strong prognostic factor. There remain, however, significant challenges that limit the applicability of QCT in practice, especially in the context of retrospective study designs. Sensitivity to scan acquisition parameters results in technical variability, and further methodological improvements are required. Overall, this thesis presents a unique ILD resource in terms of cohort, datasets and analyses, which will support advances in collaborative ILD research. Future work will focus on generalisability and validation of our findings in prospective studies.