Sorghum is a drought-tolerant staple crop for half a billion people in Africa and Asia, an important source of animal feed throughout the world and a biofuel feedstock of growing importance. Cultivated sorghum and its inter-fertile wild relatives constitute the primary gene pool for sorghum. Understanding and characterizing the diversity within this valuable resource is fundamental to its effective use in crop improvement. Here, we report an analysis of a sorghum pan-genome to explore genetic diversity within the sorghum primary gene pool. We assembled 13 genomes representing cultivated sorghum and its wild relatives and integrated them with 3 previously published genomes to construct a pan-genome of 44,079 gene families, identifying 222.6 Mb of new sequence. The pan-genome displays substantial gene-content variation, with 64% of gene families showing presence/absence variation among genomes. Comparisons between core and dispensable genes suggest that dispensable genes are important for sorghum adaptation. Extensive genetic variation was uncovered within the pan-genome, and its distribution across the genome was influenced by variation in recombination rate and transposable element content. We identified presence/absence variants that were under selection during sorghum domestication and improvement, and demonstrated that such variants had important phenotypic consequences that could contribute to crop improvement. The constructed sorghum pan-genome represents an important resource for sorghum improvement and gene discovery.
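The core/dispensable distinction rests on counting, for each gene family, how many of the assembled genomes carry it. The following is a minimal Python sketch, assuming a binary presence/absence (PAV) matrix with one row per gene family and one column per genome, and assuming the simplest two-way definition: core means present in every genome, dispensable means absent from at least one. The function names and toy data are hypothetical illustrations, not the study's pipeline or results; real pan-genome analyses often add finer categories (for example softcore or private families).

```python
from typing import Dict, List


def classify_families(pav: Dict[str, List[int]]) -> Dict[str, str]:
    """Label each gene family 'core' or 'dispensable' from its 0/1 PAV row.

    Hypothetical helper: assumes each row has one entry per genome,
    with 1 = family present and 0 = family absent in that genome.
    """
    labels: Dict[str, str] = {}
    for family, row in pav.items():
        present = sum(row)
        if present == 0:
            # A family absent from all genomes should not be in the pan-genome.
            raise ValueError(f"{family} is absent from every genome")
        # Core only if present in every genome; otherwise it shows PAV.
        labels[family] = "core" if present == len(row) else "dispensable"
    return labels


def pav_fraction(labels: Dict[str, str]) -> float:
    """Fraction of gene families showing presence/absence variation."""
    dispensable = sum(1 for label in labels.values() if label == "dispensable")
    return dispensable / len(labels)


if __name__ == "__main__":
    # Hypothetical toy matrix: 4 gene families x 4 genomes.
    toy = {
        "fam1": [1, 1, 1, 1],
        "fam2": [1, 0, 1, 1],
        "fam3": [0, 0, 1, 0],
        "fam4": [1, 1, 1, 1],
    }
    labels = classify_families(toy)
    print(labels)
    print(pav_fraction(labels))  # 0.5
```

Under these assumptions, a figure such as "64% of gene families showing presence/absence variation" corresponds to the value `pav_fraction` would return for the full matrix.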
- Extensive variation within the pan-genome of cultivated and wild sorghum