Studying the ability of finding single and interaction effects with Random Forest, and its application in psychiatric genetics
Abstract
Psychotic disorders such as schizophrenia and bipolar disorder have a strong genetic
component. The aetiology of psychoses is known to be complex, including additive
effects from multiple susceptibility genes, interactions between genes, environmental
risk factors, and gene by environment interactions. With the development of new
technologies such as genome-wide association studies and imputation of ungenotyped
variants, the amount of genomic data has increased dramatically, making the use of
Machine Learning techniques necessary. Random Forest has been widely used
to study the genetic factors underlying psychiatric disorders, including epistasis
(gene-gene interactions). Several authors have investigated the ability of this algorithm
to find single and interaction effects, but have reported contradictory results.
Therefore, to examine the ability of Random Forest to detect single and
interaction effects using different variable importance measures, I conducted a
simulation study assessing whether the algorithm was able to detect single and
interaction models under different correlation conditions. The results suggest that the
optimal variable importance measure to use in real situations under correlation is the
unconditional unscaled permutation variable importance measure. Several studies
have shown bias in one of the most popular variable importance measures, the Gini
importance. Hence, in a second simulation study I examined whether the Gini variable
importance is influenced by the variability of predictors, the precision of measuring
them, and the variability of the error. Evidence of other biases in this variable
importance was found. The results from the first simulation study were used to study
whether genes related to 29 molecular biomarkers, which have been associated with
schizophrenia, influence risk for schizophrenia in a case-control study of 26476 cases
and 31804 controls from 39 different European ancestry cohorts. Single effects from
the ACAT2 and TNC genes were found to contribute to risk for schizophrenia. ACAT2 is a
gene on chromosome 6 that is related to energy metabolism. Transcriptional
differences have been shown in schizophrenia brain tissue studies. TNC is expressed
in the brain, where it is involved in the migration of neurons and axons. In addition,
I also used the simulation results to examine whether interactions between genes
associated with abnormal emotion/affect behaviour influence risk for psychosis and
cognition in humans, in a case-control study of 2049 cases and 1794 controls. Before
correcting for multiple testing, significant interactions between CRHR1 and ESR1,
between MAPT and ESR1, among CRHR1, ESR1 and TOM1L2, and among MAPT,
ESR1 and TOM1L2 were observed in the abnormal fear/anxiety-related behaviour
pathway. There was no evidence for epistasis after Bonferroni correction.
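
The unconditional unscaled permutation importance named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses a fixed toy classifier in place of a Random Forest with out-of-bag samples, and the data, the function names (accuracy, permutation_importance, toy_predict) and the number of repeats are all hypothetical. "Unconditional" here means the predictor is shuffled across all rows, and "unscaled" means the mean drop in accuracy is reported without dividing by its standard deviation.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(labels)


def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one predictor across all rows.

    Unconditional: the column is permuted over the whole sample.
    Unscaled: the mean drop is NOT divided by its standard deviation.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + [value] + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical toy data: two SNP-like predictors coded 0/1/2.
# The outcome depends on predictor 0 only; predictor 1 is pure noise.
data_rng = random.Random(1)
rows = [[data_rng.randint(0, 2), data_rng.randint(0, 2)] for _ in range(500)]
labels = [1 if row[0] >= 1 else 0 for row in rows]


def toy_predict(row):
    """Stand-in for a fitted model (assumption: a real study would use a forest)."""
    return 1 if row[0] >= 1 else 0


imp_causal = permutation_importance(toy_predict, rows, labels, column=0)
imp_noise = permutation_importance(toy_predict, rows, labels, column=1)
print(f"causal predictor: {imp_causal:.3f}, noise predictor: {imp_noise:.3f}")
```

In this toy setting the noise predictor receives an importance of zero because the stand-in model never uses it, while shuffling the causal predictor lowers accuracy. In a real forest the same quantity would be computed per tree on out-of-bag samples and averaged over trees.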