Complete genome assembly of a virulent barcoded M. tuberculosis Erdman strain
Publication: Microbiology Resource Announcements, February 2025
Overview
The non-human primate (NHP) model of tuberculosis is one of the closest experimental systems to human M. tuberculosis (Mtb) infection, and relies on a specific barcoded Mtb Erdman strain used for aerosol infection studies. Despite its widespread use, this strain lacked a complete, high-quality reference genome, a gap that limited the accuracy of variant calling and downstream genomic analyses.
We assembled and annotated a complete genome sequence (Erdman_SF2024) of this strain using a combination of long-read (Oxford Nanopore) and short-read (Illumina) sequencing, producing a closed circular chromosome of 4,416,075 bp (65.61% GC). Annotations were transferred from the well-characterized H37Rv reference genome and supplemented with de novo predictions, yielding 4,011 coding sequences.
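Summary figures such as chromosome length and GC content can be recomputed from any FASTA copy of an assembly. The following is a minimal Python sketch of that check, not the authors' pipeline: the function names and the toy record are hypothetical, and it assumes a single-record FASTA in which only A, C, G, and T count toward the GC percentage.

```python
import io


def read_fasta_sequence(handle):
    """Return the sequence of the first record in a FASTA file handle."""
    parts = []
    seen_header = False
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
            continue
        parts.append(line.upper())
    return "".join(parts)


def gc_percent(seq):
    """Percent G+C among unambiguous bases (assumption: N and others ignored)."""
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


# Toy input standing in for a real assembly file (hypothetical record name).
toy = io.StringIO(">toy_record\nGGCC\nATGC\n")
seq = read_fasta_sequence(toy)
print(len(seq), round(gc_percent(seq), 2))
```

Replacing the toy handle with an opened FASTA file of the deposited chromosome would give the length and GC percentage to compare against the values reported above.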
Compared to the existing Erdman reference (ATCC35801), Erdman_SF2024 has fewer predicted indels and pseudogenes among essential genes, suggesting those discrepancies in ATCC35801 may be assembly errors. The strain was phylogenetically placed in the Mtb Erdman sub-lineage L4.1.2.1.
We also performed a mappability analysis to define repetitive regions of low alignment confidence — a resource that can be used to mask problematic sites in future short-read variant calling studies.
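To illustrate how such a mask might be applied, the sketch below drops variant positions that fall inside low-mappability regions. It is an illustration under stated assumptions rather than the method used in the paper: regions are assumed to be BED-style intervals (0-based, half-open), variant positions are assumed to be 1-based as in VCF, and all names and coordinates are hypothetical.

```python
import bisect


def merge_intervals(intervals):
    """Sort and merge overlapping or adjacent 0-based half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def filter_variants(positions_1based, low_map_regions):
    """Keep only the 1-based positions that lie outside every masked region."""
    merged = merge_intervals(low_map_regions)
    starts = [s for s, _ in merged]
    kept = []
    for pos in positions_1based:
        pos0 = pos - 1  # convert to 0-based to match BED-style coordinates
        i = bisect.bisect_right(starts, pos0) - 1  # last region starting <= pos0
        masked = i >= 0 and pos0 < merged[i][1]
        if not masked:
            kept.append(pos)
    return kept


# Hypothetical regions and variant positions, for illustration only.
regions = [(100, 200), (150, 250), (1000, 1100)]
variants = [50, 101, 250, 251, 1000, 1001, 1100, 1101]
print(filter_variants(variants, regions))
```

Merging the intervals first keeps the lookup to a single binary search per variant, which matters little for a toy list but scales to genome-wide call sets.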
Data availability
- GenBank: CP172229
- SRA: SRX26089191 (Nanopore), SRX26089190 (Illumina) — BioProject PRJNA1161419
- GitHub: maxgmarin/erdman-asm-explore
Citation
Maximilian G. Marin, Michael R. Chase, Natalia Quinones, Shoko Wakabayashi, Douaa Mugahid, Sarah M. Fortune, Maha R. Farhat, Michael C. Chao. Complete genome sequence of a virulent barcoded Mycobacterium tuberculosis str. Erdman commonly used for non-human primate infection studies. Microbiology Resource Announcements (2025). https://doi.org/10.1128/mra.01232-24