D.M. Church (1), J. Wang (1), W. Hlavina (1), Y. Kapustin (1), P. Meric (1), M. Shumway (2), D. Maglott (1)
(1) National Center for Biotechnology Information, Bethesda, MD, USA; (2) The Institute for Genomic Research, Rockville, MD, USA
The genome sequence of the laboratory mouse, Mus musculus (C57BL/6J strain), is the second mammalian genome to be finished. Build 36 (February 2006 data freeze) represents a highly accurate, essentially finished version of the mouse genome. All gaps and unfinished regions of the assembly have been curated to achieve the highest-quality assembly. The remaining 105 gaps lie in regions containing segmental duplications, and additional effort will be needed to complete these regions. Because highly inbred strains have been developed, the mouse is an excellent organism in which to assess variation. Each strain represents a single haplotype, facilitating both mapping experiments and genome assembly. In addition to sequence from the C57BL/6J strain, finished sequence from other strains has been assembled in a strain-specific fashion, and these partial assemblies have been annotated using our genome annotation pipeline. Recently, the Celera whole-genome mouse assembly was made available. This assembly does not represent a single haplotype, but rather was assembled using sequence from multiple strains. Annotation data from Build 35 provide evidence of the utility of treating multiple assemblies. Although there is only roughly 50 Mb of sequence from 129/Sv, we have uncovered two regions of large-scale copy number variation. This variation leads to differences in gene content in these regions. On a larger scale, comparison of annotation between the reference and Celera assemblies uncovers roughly 200 genes unique to the Celera assembly and about 1000 genes unique to the reference. Genomic alignments of the Celera and reference assemblies suggest that about 7% of the reference sequence is not represented in the Celera assembly, while about 4% of the Celera assembly is not represented in the reference. Updated annotation and alignment analysis based on mouse Build 36 will be discussed.