Moritz Smolka, Luis F. Paulin, Christopher M. Grochowski, Medhat Mahmoud, Sairam Behera, Mira Gandhi, Karl Hong, Davut Pehlivan, Sonja W. Scholz, Claudia M.B. Carvalho, Christos Proukakis, Fritz J Sedlazeck.
Long-read Structural Variation (SV) calling is a highly accurate way to identify complex genomic rearrangements, but it remains challenging. Here, we present Sniffles2, which is faster and more accurate than state-of-the-art SV callers across different coverages, sequencing technologies, and SV types. Furthermore, Sniffles2 solves the problem of family- to population-level SV calling, producing fully genotyped VCF files by introducing a gVCF file concept. Across 31 Mendelian samples, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. Using this capability, we identified multiple mosaic SVs across the brain of a patient with multiple system atrophy. The identified SVs showed remarkable diversity within the cingulate cortex, affecting both genes involved in neuron function and repetitive elements. In summary, we demonstrate the utility and versatility of Sniffles2 for identifying SVs from the mosaic to the population level.