Evaluating the utility of gene expression data from patient-matched samples for studying breast cancer
Breast cancer is a heterogeneous disease with distinct subtypes and many different clinical presentations. Neoadjuvant therapy of breast cancer offers a window of opportunity to study the translational changes that occur in tumours as a result of treatment alone, and may help to identify tumour response status. Pairs of samples collected from different sites, or sequentially from the same individual, can potentially provide additional prognostic information for the risk stratification of breast cancer. Here, we aggregate multiple studies of multi-sampled, patient-matched cohorts for meta-analysis, to test whether pooling them improves our ability to make new and significant findings about the mechanisms underlying tumour response to treatment. Multiple patient-matched datasets, comprising sequential pre- and on-treatment samples and matched primary tumour and lymph node samples, were collected and examined for differentially expressed genes and pathways indicative of pathological response. Machine learning methods were applied to identify biomarkers of response from the on-treatment samples, and profiling comparisons were made to assess the additional value of matched patient samples for accurate risk prediction. Lastly, five sequentially sampled datasets were aggregated for meta-analysis by combining the normalised pre- to on-treatment differences in expression level, to identify commonalities in the response to therapy across both endocrine and chemotherapy treatment strategies. The gene AAGAB was identified through iterative differential analysis and was 78% accurate in validation for predicting pathological complete response in breast cancer treated with neoadjuvant chemotherapy. AAGAB also significantly separated patient survival curves (log-rank p = 0.0036), and the on-treatment samples reflected patient risk more accurately than the pretreatment samples.
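The pre- to on-treatment difference values can be illustrated with a minimal sketch for a single gene. The abstract does not state which normalisation was used, so this sketch assumes log2-scale expression and divides each patient's change by the gene's standard deviation across that study's matched samples. Dividing without centring keeps the sign of each change, which preserves the direction of tumour change. The function names `pairwise_changes`, `scaled_changes` and `aggregate` are hypothetical, chosen for illustration only.

```python
from statistics import pstdev


def pairwise_changes(pre, on):
    """On-treatment minus pre-treatment expression for one gene.

    pre, on: dicts mapping patient ID -> expression (assumed log2 scale).
    Only patients with both samples (matched pairs) are kept.
    """
    shared = sorted(pre.keys() & on.keys())
    return {pid: on[pid] - pre[pid] for pid in shared}


def scaled_changes(pre, on):
    """Scale one study's changes by the gene's SD over its matched samples.

    This is an assumed normalisation: it puts studies on a comparable scale
    but does not centre the values, so the sign (direction) is preserved.
    """
    changes = pairwise_changes(pre, on)
    if not changes:
        return {}
    matched = [pre[p] for p in changes] + [on[p] for p in changes]
    sd = pstdev(matched)
    if sd == 0:
        return changes  # all values identical, so every change is already 0
    return {pid: d / sd for pid, d in changes.items()}


def aggregate(studies):
    """Pool scaled change values from several studies for one gene.

    studies: dict mapping study name -> (pre, on) tuple of dicts.
    Keys are prefixed with the study name so patient IDs cannot collide.
    """
    pooled = {}
    for name, (pre, on) in studies.items():
        for pid, value in scaled_changes(pre, on).items():
            pooled[f"{name}:{pid}"] = value
    return pooled
```

Applied per gene, the pooled values form one patient-by-gene matrix of change scores that could be passed to a classifier. Unmatched samples are dropped rather than imputed, which is one simple design choice among several.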
Matched lymph node tissue captured the patient’s risk of recurrence more successfully than the primary biopsy, correctly identifying 83% (10/12) of recurring patients compared with 25% (3/12) from the primary. Differential expression analysis also showed a considerable number of high-profile breast cancer genes over-represented in the lymph node. Integrating the expression data of multiple sequential studies resulted in low post-integration concordance with the reference patient data (<30% profiling agreement), so this approach is not recommended for this type of analysis. However, combining the pairwise change values of gene expression level was successful, yielding highly accurate models for predicting patient response (F1 score, 0.92) and identifying potential common escape pathways from breast cancer therapies. Analysis of the matched pre- and on-treatment samples revealed the intrinsic value of multiple on-treatment biopsies: these samples offer valuable new targets for biomarker identification, with significant increases in accuracy for predicting response and long-term outcome in neoadjuvant chemotherapy. Additional sampling of involved metastatic lymph nodes also improves the prognostic capabilities available to clinicians by providing a potentially more accurate view of each patient’s risk profile. Lastly, the pairwise expression change values show the direction of tumour change, which can be used to build new models for predicting and classifying patient risk and to further our understanding of the mechanisms behind patient non-response.
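The evaluation figures quoted above can be made concrete with a small sketch. Only the counts 10/12 and 3/12 come from the text. The sketch assumes a binary label (recurrence or response coded as 1), because the abstract does not say whether the F1 score was computed for one class or averaged across classes. It also assumes that "profiling agreement" means the fraction of shared patients whose subtype or risk call matches the reference. All function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """True positives, false positives and false negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, fn


def recall(tp, fn):
    """Fraction of true cases (e.g. recurring patients) correctly flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall, as 2TP/(2TP+FP+FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def concordance(reference, integrated):
    """Assumed profiling agreement: share of patients with identical calls."""
    shared = reference.keys() & integrated.keys()
    if not shared:
        return 0.0
    agree = sum(1 for pid in shared if reference[pid] == integrated[pid])
    return agree / len(shared)


# Recall on the 12 recurring patients, using the counts reported above
print(f"lymph node: {recall(10, 2):.0%}")  # lymph node: 83%
print(f"primary:    {recall(3, 9):.0%}")   # primary:    25%
```

Recall is used for the recurrence comparison because the text reports only how many of the recurring patients each tissue identified. It does not report false positives among non-recurring patients.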