Brian W. Davis and Terje Raudsepp
Texas A&M University, College Station, Texas
A reference genome is the most essential asset in modern genetics research. This singular community resource has the power to shape every aspect of species investigations, including population structure and health, evolution, morphological variation, and heritable disease. Additionally, an understanding of the genetic variation within a species is extremely important for identifying variants associated with phenotypes and disease by comparing affected individuals to the entire population. The current alpaca reference has been helpful but is extremely fragmented, existing in 204,000 small individual segments that are stitched together into ~77,000 regions with unknown gaps between regions. No database of genetic variation exists for alpaca at the genome level, and very little information exists on the placement of genes in the genome. This project will correct all three of these deficiencies by using cutting-edge genomic technologies and computational methods to construct a chromosome-level genome that rivals those of other important agricultural species. This will be followed by a characterization of genomic variation across all publicly available and privately held alpaca sequence data and an annotation to unambiguously identify the location and structure of genes in alpaca. The finished deliverables will be provided to the research community immediately upon completion. All of the above goals were completed, with the added bonuses of a characterization of novel genome structure (SAC-SAT and NOR) and a much larger variant database of 120 individuals from both North and South America.