Project Description
Introduction: This interdisciplinary proposal applies high-performance computing (HPC) approaches to the analysis of next-generation sequencing (NGS) data from the genomes of brown and polar bears. It aims to improve and accelerate the detection of common variants in cohorts of related genomes in order to establish the evolutionary trajectories of the corresponding species. The work will be performed by a graduate student under the supervision of Dr. Andrey Grigoriev, Professor at the Biology Department and Center for Computational and Integrative Biology at Rutgers-Camden. Remote work is the most likely mode of operation for this project.
Genomes of all organisms undergo constant change, and the resulting mutations vary widely in scale. Structural variants (SVs) typically affect much larger genome intervals than single nucleotide variants (SNVs) or short insertions/deletions (indels). Currently, comparative genomics efforts focus mostly on SNVs and indels in protein-coding regions, while the role of SVs (especially outside those regions) remains largely unknown. There is an unmet need for, and a growing interest in, understanding the effect of SVs on evolution using NGS.
Project details: Current SV-finding pipelines have low accuracy and take a long time to run on many samples. Moreover, none of these pipelines combines evidence from multiple samples for variant detection. This precludes fast detection of commonalities among related species and their integration with gene expression and other large-scale phenotypic datasets. We hypothesize that a parallel search across multiple samples, combining weaker evidence at similar locations in similar subspecies, will further improve SV prediction accuracy. Once representative groups of such variants are established, we will also investigate the possibility of applying machine learning methods to SV prediction.
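The pooling idea in this hypothesis can be illustrated with a minimal Python sketch. It is not the GROM algorithm or any existing pipeline: the record layout (`Evidence`), the function names, the clustering window, and all thresholds are hypothetical choices made only for illustration. The sketch assumes that each sample has already been scanned independently (a step that could run in parallel, one sample per worker) and shows only how weak per-sample signals at nearby positions might be combined into a cohort-level call.

```python
from collections import defaultdict
from typing import NamedTuple


class Evidence(NamedTuple):
    """One candidate SV signal from one sample (hypothetical record layout)."""
    sample: str
    chrom: str
    pos: int       # approximate breakpoint coordinate
    sv_type: str   # e.g. "DEL" or "DUP"
    score: float   # per-sample support in arbitrary, assumed units


def _summarize(cluster, solo_min, pooled_min, min_samples):
    """Return a call for the cluster if it passes either threshold, else None."""
    samples = sorted({r.sample for r in cluster})
    total = sum(r.score for r in cluster)
    # Callable from one sample alone, as in a single-sample pipeline.
    strong_alone = any(r.score >= solo_min for r in cluster)
    # Callable only because several samples agree on the location.
    strong_pooled = len(samples) >= min_samples and total >= pooled_min
    if not (strong_alone or strong_pooled):
        return None
    return {
        "chrom": cluster[0].chrom,
        "sv_type": cluster[0].sv_type,
        "start": cluster[0].pos,
        "end": cluster[-1].pos,
        "samples": samples,
        "score": total,
    }


def pool_evidence(records, window=500, solo_min=10.0, pooled_min=10.0,
                  min_samples=2):
    """Cluster signals of the same type within `window` bp and pool their scores."""
    by_key = defaultdict(list)
    for r in records:
        by_key[(r.chrom, r.sv_type)].append(r)

    calls = []
    for key in sorted(by_key):
        group = sorted(by_key[key], key=lambda r: r.pos)
        # Single-linkage clustering: a new cluster starts whenever the gap
        # to the previous signal exceeds the window.
        clusters = [[group[0]]]
        for r in group[1:]:
            if r.pos - clusters[-1][-1].pos <= window:
                clusters[-1].append(r)
            else:
                clusters.append([r])
        for cluster in clusters:
            call = _summarize(cluster, solo_min, pooled_min, min_samples)
            if call is not None:
                calls.append(call)
    return calls


# Toy input: three weak deletion signals near the same chr1 locus in three
# samples, plus one isolated weak signal on chr2 (all values invented).
records = [
    Evidence("bear_A", "chr1", 1000, "DEL", 4.0),
    Evidence("bear_B", "chr1", 1100, "DEL", 4.0),
    Evidence("bear_C", "chr1", 1250, "DEL", 4.0),
    Evidence("bear_A", "chr2", 5000, "DEL", 4.0),
]
for call in pool_evidence(records):
    print(call)
```

In this toy input no single signal reaches the single-sample threshold, yet the three chr1 signals together pass the pooled threshold, while the isolated chr2 signal is not called. Summing scores and single-linkage clustering are the simplest possible choices here; a real implementation would likely weight samples by relatedness and coverage and would need to guard against long chains of loosely linked signals.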
We will build upon our recent success with GROM (Genome Rearrangement Omni-Mapper), a very fast and accurate novel algorithm that we developed (1,2) to identify all types of genome variants (from large SVs down to single-base-pair changes) in a single run. It has been used to compare elephant and mammoth genomes (3), and we expect it to perform well in the proposed multi-species comparisons.
References
1. Smith, S.D., Kawash, J.K., Grigoriev, A. (2015) GROM-RD: resolving genomic biases to improve read depth detection of copy number variants. PeerJ 3: e836.
2. Smith, S., Kawash, J., Grigoriev, A. (2017) Lightning-fast genome variant detection with GROM. GigaScience 6(10): 1-7.
3. Smith, S., Kawash, J., Karaiskos, S., Biluck, I., Grigoriev, A. (2017) Evolutionary adaptation revealed by comparative genome analysis of woolly mammoths and elephants. DNA Research 24(4): 359-369.