Complete genome assembly of a virulent barcoded M. tuberculosis Erdman strain

publication

Microbiology Resource Announcements, February 2025

M. tuberculosis, genomics, genome assembly, long-read sequencing, annotation

Overview

The non-human primate (NHP) model of tuberculosis is one of the closest experimental systems to human M. tuberculosis (Mtb) infection, and it relies on a specific barcoded Mtb Erdman strain used for aerosol infection studies. Despite its widespread use, this strain lacked a complete, high-quality reference genome, a gap that limits the accuracy of variant calling and downstream genomic analyses.

We assembled and annotated a complete genome sequence (Erdman_SF2024) of this strain using a combination of long-read (Oxford Nanopore) and short-read (Illumina) sequencing, producing a closed circular chromosome of 4,416,075 bp (65.61% GC). Annotations were transferred from the well-characterized H37Rv reference genome and supplemented with de novo predictions, yielding 4,011 coding sequences.
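As an illustration of how summary statistics such as chromosome length and GC content can be checked against an assembly, here is a minimal sketch using only the Python standard library. It is not the pipeline used in the publication. The FASTA record below is toy data, and the function names `read_fasta` and `length_and_gc` are hypothetical helpers introduced for this example.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def length_and_gc(seq):
    """Return (length, GC percent).

    GC percent is computed over unambiguous A/C/G/T bases only,
    so ambiguity codes such as N do not dilute the value.
    """
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    pct = 100.0 * gc / acgt if acgt else 0.0
    return len(seq), pct


# Toy input (hypothetical); a real run would open the assembly FASTA instead.
toy = StringIO(">toy_contig\nGGCCATGC\nGCAT\n")
for name, seq in read_fasta(toy):
    n, gc = length_and_gc(seq)
    print(f"{name}\t{n} bp\t{gc:.2f}% GC")
```

Run on the actual Erdman_SF2024 FASTA, the same functions would be expected to report a single record with the length and GC content stated above, assuming the file contains only the closed chromosome.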

Compared with the existing Erdman reference (ATCC35801), Erdman_SF2024 has fewer predicted indels and pseudogenes among essential genes, which suggests that those discrepancies in ATCC35801 may be assembly errors. Phylogenetically, the strain falls in the Mtb Erdman sub-lineage L4.1.2.1.

We also performed a mappability analysis to define repetitive regions of low alignment confidence. The resulting regions can be used to mask problematic sites in future short-read variant calling studies.
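The masking step can be sketched as follows, assuming the low-mappability regions are available as BED-style intervals (0-based, half-open) and variant positions are 1-based, as in VCF. This is a hedged illustration and does not reproduce the publication's tooling. The coordinates are invented, and `merge_intervals`, `build_mask`, and `is_masked` are hypothetical helper names.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort and merge overlapping or adjacent 0-based half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def build_mask(intervals):
    """Precompute parallel start/end lists for binary search."""
    merged = merge_intervals(intervals)
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    return starts, ends


def is_masked(pos_1based, mask):
    """Return True if a 1-based position lies inside a masked interval."""
    starts, ends = mask
    pos0 = pos_1based - 1  # convert VCF-style position to 0-based
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


# Hypothetical low-mappability intervals and variant positions.
low_mappability = [(100, 200), (150, 250), (1000, 1010)]
mask = build_mask(low_mappability)

variant_positions = [100, 101, 250, 251, 1010]
kept = [p for p in variant_positions if not is_masked(p, mask)]
print(kept)
```

Merging the intervals first keeps the lookup a single binary search per variant, which matters little for a 4.4 Mb genome but avoids double counting when regions overlap.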

Data availability

Citation

Maximilian G. Marin, Michael R. Chase, Natalia Quinones, Shoko Wakabayashi, Douaa Mugahid, Sarah M. Fortune, Maha R. Farhat, Michael C. Chao. Complete genome sequence of a virulent barcoded Mycobacterium tuberculosis str. Erdman commonly used for non-human primate infection studies. Microbiology Resource Announcements (2025). https://doi.org/10.1128/mra.01232-24