I develop methods to analyze genome sequencing data in the context of other ‘omics and clinical health data to prioritize and functionally interpret genetic variants with roles in human disease. See my CV for details.
Image from our Nature Communications, 2023 paper.
Functional genomics for “N of 1” analyses
The genome is a big space, and accurately pinpointing the variants that underlie specific human health conditions is a formidable challenge. Traditionally, genes have been treated as black-box functional units, but we now know that individual variants within and between genes can have wildly different impacts. Because comprehensive in vivo (in a living system) functional assessment of all possible genetic variants is currently infeasible, we instead turn to in silico (computational) predictions of variant functionality. We develop integrative tools for assessing the functionality of specific genomic positions, and we are interested in leveraging multimodal biological and biomedical data to derive new insights into the function of genetic variants. [30535108, 33580225]
Integration of clinical phenotyping
Clinical phenotype data from patients are essential for interpreting the impact of genetic variants on human health. These data can be noisy, unstructured, and difficult to obtain, and using them well often requires deep clinical intuition. We are interested in developing computational approaches that use standardized phenotype data to automate diagnostic gene prioritization and interpretation. [37828001, medRxiv, bioRxiv]
Deriving insights from population-level analyses
Even though the genome is a big space, it is also a finite space. This means that as the number of sequenced genomes continues to grow, we will begin to observe all possible variants (and the recurrence of functional variants in phenotypically matched cohorts). Indeed, the number of sequenced tumor genomes has surpassed tens of thousands, collective cohorts of sequenced Mendelian patients now exceed hundreds of thousands, and the number of sequenced individuals from diverse, healthy populations is set to pass a million. By integrating variant functionality information, evolutionary constraint, and mutational models, we will have the power to detect extremely rare variants that play roles in human cancers and other diseases. [32711844, bioRxiv]
Publications
- Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations. bioRxiv, 2024.
- Simulation of undiagnosed patients with novel genetic conditions. Nature Communications, 2023.
- Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases. Genetics in Medicine, 2021.
- Innovative methodological approaches for data integration to derive patterns across diverse, large-scale biomedical datasets. Pac Symp Biocomput, 2021.
- How medical mysteries push back the frontier of genomics knowledge. UDN PEER Newsletter, 2021.
- PertInInt: An integrative, analytical approach to rapidly uncover cancer driver genes with perturbed interactions and functionalities. Cell Systems, 2020.
- Ongoing challenges and innovative approaches for recognizing patterns across large-scale, integrative biomedical datasets. Pac Symp Biocomput, 2020.
- Systematic, domain-based aggregation of protein structures highlights DNA-, RNA-, and other ligand-binding positions. Nucleic Acids Research, 2019.
- Pervasive variation of transcription factor orthologs contributes to regulatory network divergence. PLoS Genetics, 2015.
- Formatt: Correcting protein structural alignments by sequence peeking. ACM-BCB’11, 2011.
- Phenotypic overlap between rare disease patients and variant carriers in a large population cohort informs biological mechanisms. medRxiv, 2024.
- VarPPUD: Variant post prioritization developed for undiagnosed genetic disorders. medRxiv, 2024.
- Deep learning for diagnosing patients with rare genetic diseases. medRxiv, 2023.
- Polygenic risk scores for autoimmune related diseases are significantly different and skewed in cancer exceptional responders. medRxiv, 2023.
- RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci. Genome Biology, 2024.
- The contribution of mosaicism to genetic diseases and de novo pathogenic variants. Am J Med Genet Part A, 2023.
- Formatt: Correcting protein structural alignments by incorporating sequence alignment. BMC Bioinformatics, 2012.
- Evolving soft robotic locomotion in PhysX. ACM-GECCO’09, 2009.
- A patient-centric information commons for a national undiagnosed diseases network. In preparation, 2024.