False gene and chromosome losses affected by assembly and sequence errors

bioRxiv 2021
Kim J. et al.

Many genome assemblies have been found to be incomplete and to contain misassemblies. The Vertebrate Genomes Project (VGP) has been producing assemblies that aim to be as complete and error-free as possible, using long reads, long-range scaffolding data, new assembly algorithms, and manual curation. Here we evaluate these new vertebrate genome assemblies against the previous reference assemblies for the same species: a mammal (platypus), two birds (zebra finch and Anna’s hummingbird), and a fish (climbing perch). We found that 3 to 11% of the genomic sequence was entirely missing from the previous reference assemblies, including nearly entire GC-rich and repeat-rich microchromosomes with high gene density. Genome-wide, between 25 and 60% of genes were either completely or partially missing in the previous assemblies, due in part to a bias against GC-rich 5’-proximal promoters and 5’ exon regions. Our findings reveal novel regulatory landscapes and protein-coding sequences that were greatly underrepresented in previous assemblies and are now present in the VGP assemblies.
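The abstract's two measurements (genes "completely or partially missing" and a bias against GC-rich regions) can be made concrete with a small sketch. This is not the authors' pipeline. It assumes, hypothetically, that regions of the new assembly covered by alignments from the old assembly are available as half-open `(start, end)` intervals, and the 90%/10% cutoffs separating "present", "partial", and "missing" are illustrative values chosen here, not thresholds from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) intervals on the new
# assembly that are covered by alignments from the old assembly.
Interval = Tuple[int, int]


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0  # no informative bases, e.g. a run of Ns
    return (seq.count("G") + seq.count("C")) / acgt


def covered_bases(start: int, end: int, aligned: List[Interval]) -> int:
    """Count bases in [start, end) covered by at least one aligned interval."""
    # Clip intervals to the gene span, then merge overlaps so no base counts twice.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in aligned if s < end and e > start
    )
    total = 0
    cur_s, cur_e = None, None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def classify_gene(start: int, end: int, aligned: List[Interval],
                  present_min: float = 0.9, missing_max: float = 0.1) -> str:
    """Label a gene 'present', 'partial' or 'missing' by its aligned fraction.

    present_min and missing_max are hypothetical cutoffs for illustration only.
    """
    if end <= start:
        raise ValueError("gene span must be non-empty")
    frac = covered_bases(start, end, aligned) / (end - start)
    if frac >= present_min:
        return "present"
    if frac <= missing_max:
        return "missing"
    return "partial"
```

For example, with aligned intervals `[(0, 500), (400, 800)]`, a gene spanning `[100, 900)` has 700 of 800 bases covered (87.5%) and is labeled `"partial"`, while a gene at `[1000, 2000)` has no coverage and is labeled `"missing"`. Comparing `gc_content` of the 5’ region of each gene across the three classes would be one simple way to look for the kind of GC bias the abstract describes.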


Juwan Kim, Chul Lee, Byung June Ko, DongAhn Yoo, Sohyoung Won, Adam Phillippy, Olivier Fedrigo, Guojie Zhang, Kerstin Howe, Jonathan Wood, Richard Durbin, Giulio Formenti, Samara Brown, Lindsey Cantin, Claudio V. Mello, Seoae Cho, Arang Rhie, Heebal Kim, Erich D. Jarvis

