We explore several algorithmic approaches to identifying causal genetic variants for autism. These include game-theoretic approaches (Shapley values), novel applications of the maximum flow algorithm, family-based statistical studies, and machine learning approaches.
We have designed a novel algorithm based on maximum flow that exploits linkage disequilibrium to perform biologically informed dimensionality reduction and improve model interpretability. We apply this methodology to whole-genome sequencing (WGS) data from 2182 children with autism spectrum disorder (ASD) and 379 control patients in order to identify the contribution of variants in simple repeat sequences (SRS) to the ASD phenotype.
We have also explored the role of small inherited deletions in multiplex families with idiopathic autism. We show that a hidden Markov model can accurately identify inherited deletions in nuclear families using variant calls from whole genome sequencing data.
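As a rough illustration of the kind of decoding a hidden Markov model performs here, the following minimal sketch runs the Viterbi algorithm over a sequence of per-site observations in a parent-child trio. Everything in it is an assumption made for illustration and is not taken from this work: the two hidden states (`normal`, `deletion`), the three observation symbols (`consistent`, `mendelian_error`, `missing`), and all probabilities are hypothetical. The intuition it encodes is that a run of apparent Mendelian errors is better explained by an inherited deletion than by independent genotyping errors.

```python
import math

# Hypothetical two-state model; all names and probabilities below are
# illustrative assumptions, not parameters from the original work.
STATES = ("normal", "deletion")
START = {"normal": 0.99, "deletion": 0.01}
TRANS = {
    "normal": {"normal": 0.99, "deletion": 0.01},
    "deletion": {"normal": 0.10, "deletion": 0.90},
}
EMIT = {
    "normal": {"consistent": 0.98, "mendelian_error": 0.01, "missing": 0.01},
    "deletion": {"consistent": 0.40, "mendelian_error": 0.50, "missing": 0.10},
}


def viterbi(observations):
    """Return the most likely hidden-state path for a list of observations."""
    if not observations:
        return []
    # Log-probabilities of the best path ending in each state at the first site.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
               for s in STATES}]
    backpointers = []
    for obs in observations[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this site.
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            current[s] = (prev[best] + math.log(TRANS[best][s])
                          + math.log(EMIT[s][obs]))
            pointers[s] = best
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy input: a short run of Mendelian errors flanked by consistent sites.
observations = ["consistent"] * 4 + ["mendelian_error"] * 3 + ["consistent"] * 4
path = viterbi(observations)
```

With these toy parameters, the decoded path labels the three error sites as `deletion` and the flanking sites as `normal`. A realistic model would need separate states per parental haplotype and emission probabilities estimated from data.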
We have repurposed the Shapley value, a game-theoretic measure, to quantify the marginal contribution of a “player” (in this case, a single nucleotide polymorphism) to the “game outcome” (in this case, the prediction of autism).
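To make the notion of marginal contribution concrete, the sketch below computes exact Shapley values by enumerating every coalition of players. It is a textbook computation under stated assumptions, not the procedure used in this work: the SNP names and the characteristic function `toy_value` (standing in for a model's predictive score on a subset of SNPs) are hypothetical, and exact enumeration is only feasible for a handful of players.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a score."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that excludes p: k!(n-k-1)!/n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of p when joining this coalition.
                phi[p] += weight * (value(coalition | {p}) - value(coalition))
    return phi


def toy_value(coalition):
    """Hypothetical predictive score for a set of SNPs (illustration only)."""
    score = 0.0
    if "snp_a" in coalition:
        score += 0.2
    if "snp_b" in coalition:
        score += 0.1
    if "snp_a" in coalition and "snp_b" in coalition:
        score += 0.1  # interaction effect, split equally by the Shapley value
    return score


phi = shapley_values(["snp_a", "snp_b", "snp_c"], toy_value)
```

In this toy game, `snp_a` and `snp_b` each receive their individual effect plus half of the interaction term, `snp_c` receives zero, and the values sum to the score of the full set. For genome-scale inputs, sampling-based approximations would be needed in place of full enumeration.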
Many of these approaches are explored in family contexts, especially multiplex families (two or more relatives have autism) and simplex families (only one family member is affected).