What is the problem?
Studying the genetic material of bacteria (their DNA) can give us vital information, such as which genes they carry and how closely they are related to each other. This can help us detect outbreaks and track how antimicrobial resistance genes spread, so we can contain them better. Putting the whole genome together tells a clearer story than the short DNA fragments that come out of a sequencer, just as a bound book tells a clearer story than a pile of scattered pages. We can be more certain of where particular genes sit in the genome, which can indicate, for example, whether a gene was borrowed from another bacterium.
Traditionally, the most accurate way to put together, or assemble, a bacterial genome into the correct sequence of A, T, C and G nucleotide ‘rungs’ on the DNA ladder has been to combine two different technologies in a ‘hybrid’ method: highly accurate but hard-to-assemble Illumina short reads, and error-prone but easier-to-assemble long reads. Long reads make it easier to get the overall structure right, a bit like doing a 1,000-piece instead of a 500,000-piece jigsaw puzzle of a modern art painting. The bigger ‘jigsaw’ pieces are more likely to include a distinctive feature that shows where they fit in the overall picture. The highly accurate Illumina short reads then swoop in to spell-check everything. The problem with this approach is that running two different sequencing experiments for every sample is expensive, and this cost is a barrier to using bacterial whole genome sequencing on a larger scale in public health surveillance.
Luckily, continued improvements in long-read sequencing, including the use of powerful computers and the latest machine learning techniques, have driven down its error rates. This raises the possibility that long reads alone might now be enough to build accurate genomes.
What did we do?
To test this, we assembled the genomes of 96 bacteria, taken from bloodstream infections in patients across England, using both hybrid and long-read-only methods. We found that, whilst both approaches produced very high-quality genomes, the long-read-only method was actually better at putting the genomes together than some of the hybrid methods.
So what?
This is great news for public health professionals! It means that creating complete and accurate bacterial genomes is now much more cost-effective than it was before, which takes us one step closer to building it into our routine surveillance of disease-causing bacteria. It could help us detect outbreaks earlier and pinpoint where they came from more accurately. This matters especially when different bacteria swap small pieces of DNA, such as resistance genes, with each other: these outbreaks can be missed by our current methods because the bacteria involved may not otherwise appear to be linked. Ultimately, complete bacterial genomes let us track the spread of bacteria at a higher resolution than the more fragmented genomes produced by short-read sequencing, helping us to prevent more people from getting infected.