Erik Torstensson, Gaurav Goyal, Anna Johnning, Fredrik Westerlund, Tobias Ambjörnsson.
Optical DNA mapping (ODM) is based on fluorescent labeling, stretching and imaging of single DNA molecules to obtain sequence-specific fluorescence profiles, so-called DNA barcodes. These barcodes can be mapped to theoretical counterparts obtained from DNA reference sequences, which in turn allows for DNA identification in complex samples and for detection of structural changes in individual DNA molecules. Several DNA labeling schemes exist for ODM, and each labeling type has one or more associated match scoring methods. Combining the information from multiple labeling schemes can potentially improve mapping confidence; however, combining match scores from different labeling assays has not yet been implemented. In this study, we introduce two theoretical methods for analyzing DNA molecules with multiple label types. In our first method, we convert the alignment scores, given as output from the different assays, into p-values using carefully crafted null models. We then combine the p-values for the different label types using standard methods to obtain a combined match score and an associated combined p-value. In the second method, we use a block bootstrap approach to check the uniqueness of a match to a database for all barcodes that match with a combined p-value below a predefined threshold. To obtain experimental dual-labeled DNA barcodes, we introduce a novel assay in which plasmid DNA molecules from bacteria are cut with restriction enzymes. The cut sites serve as sequence-specific markers which, together with barcodes obtained using the established competitive binding labeling method, form a dual-labeled barcode. All experimental data in this study originate from this assay, but we point out that our theoretical framework can be used to combine data from all kinds of available optical DNA mapping assays.
We test our multiple labeling frameworks on experimental barcodes from two different plasmids and on synthetically generated barcodes (combining competitive-binding labeling and nick labeling). We demonstrate that simultaneously using the information from all label types substantially increases the significance of matches between experimental barcodes and a database consisting of theoretical barcodes for all sequenced plasmids.
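As an illustration of the p-value combination step in the first method, the sketch below uses Fisher's method, which is one standard choice. The abstract does not state which combination method is used, so this choice is an assumption, as is the function name `fisher_combine`. Fisher's method also assumes that the p-values from the different label types are independent under the null model.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (assumed choice).

    Returns (statistic, combined_p). Under the null hypothesis the statistic
    -2 * sum(ln p_i) is chi-square distributed with 2k degrees of freedom,
    where k is the number of p-values (here: the number of label types).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2

    # For even degrees of freedom 2k the chi-square survival function
    # has the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term

    return 2.0 * half_stat, math.exp(-half_stat) * total


# Hypothetical p-values for one barcode, one per label type.
statistic, combined_p = fisher_combine([0.05, 0.05])
print(f"combined score = {statistic:.3f}, combined p-value = {combined_p:.5f}")
```

In this sketch the Fisher statistic plays the role of the combined match score, and two individually borderline p-values of 0.05 combine to a p-value of roughly 0.017, which shows how agreement between label types can increase significance.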