This has been a long time coming, but last month the Ellison Institute of Technology launched EIT Pathogena. It is a website where anyone, anywhere in the world, can work out which species of Mycobacteria are in a sample and, if one of them is Mycobacterium tuberculosis (the causative agent of tuberculosis, as the name suggests), which antibiotics are likely to be effective. It also tells you whether the genome of that sample is sufficiently similar to any other samples you’ve uploaded that they could be part of the same outbreak.
So how does it do this? First, you have to put your mycobacterial sample through a genetic sequencing machine. This gives you two output files (called FASTQ files) containing lots of short stretches of DNA found in the sample, which will have come from the patient, other bacteria, the odd virus and, probably, some Mycobacteria. Historically, sifting through these files, working out what is what, and then seeing whether you can build a genome from some of the short stretches (a bit like a really big jigsaw, just one where the pieces overlap and some have mistakes) has been the job of a bioinformatician, and it is difficult.
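To make the FASTQ format a little more concrete, here is a minimal sketch in Python of reading such a file. It is an illustration only, not the code EIT Pathogena uses. The names `FastqRecord`, `parse_fastq` and `mean_phred` are made up for this example, and it assumes the common layout of four lines per read (a header starting with `@`, the DNA sequence, a `+` separator, and a quality string) with Phred+33 quality encoding.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    """One short stretch of DNA (a 'read') and its per-base quality string."""
    name: str
    sequence: str
    quality: str


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ entry (blank lines are ignored)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(record: FastqRecord) -> float:
    """Average base quality, assuming Phred+33 ASCII encoding."""
    return sum(ord(char) - 33 for char in record.quality) / len(record.quality)


# A tiny, invented two-read example; real files hold millions of reads.
EXAMPLE = """\
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGACC
+
!!!!!!
"""

records = list(parse_fastq(EXAMPLE))
for record in records:
    print(record.name, record.sequence, mean_phred(record))
```

Real FASTQ files are usually compressed and far too large to load as a single string, so a production parser would stream them line by line; the structure of each record is the same.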
EIT Pathogena makes that simple: all you have to do is drag and drop the FASTQ files onto the web portal. It uploads them, then automatically removes and discards any human DNA (since this could, in theory, be used to identify the patient) before working out which species are present, and so on.
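The idea of removing human reads can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of how EIT Pathogena does it: it assumes some earlier step (not shown) has already decided which read names look human, and that the two FASTQ files hold the two ends of each DNA fragment in the same order. The function name `drop_flagged_pairs` and the example reads are hypothetical.

```python
from typing import List, Set, Tuple

# (name, sequence, quality) for a single read
Read = Tuple[str, str, str]


def drop_flagged_pairs(
    forward: List[Read],
    reverse: List[Read],
    flagged: Set[str],
) -> Tuple[List[Read], List[Read]]:
    """Discard a read pair if either of its two reads has been flagged.

    Both reads of a pair are dropped together so the two files stay in sync.
    """
    if len(forward) != len(reverse):
        raise ValueError("paired files must contain the same number of reads")
    kept_forward: List[Read] = []
    kept_reverse: List[Read] = []
    for fwd, rev in zip(forward, reverse):
        if fwd[0] in flagged or rev[0] in flagged:
            continue  # drop the whole pair
        kept_forward.append(fwd)
        kept_reverse.append(rev)
    return kept_forward, kept_reverse


# Invented example data: three read pairs, one of which is flagged as human.
forward_reads = [
    ("pair1/1", "ACGT", "IIII"),
    ("pair2/1", "GGCC", "IIII"),
    ("pair3/1", "TTAA", "IIII"),
]
reverse_reads = [
    ("pair1/2", "ACGT", "IIII"),
    ("pair2/2", "GGCC", "IIII"),
    ("pair3/2", "TTAA", "IIII"),
]
flagged_as_human = {"pair2/1"}  # assumed output of an upstream classifier

clean_forward, clean_reverse = drop_flagged_pairs(
    forward_reads, reverse_reads, flagged_as_human
)
print([read[0] for read in clean_forward])
print([read[0] for read in clean_reverse])
```

Dropping both reads of a pair, even when only one is flagged, is the cautious choice here: it errs on the side of discarding anything that might identify the patient.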
We have written all the computer code that handles all the short stretches of DNA. Much of the software used to predict which antibiotics are likely to work was originally written as part of our earlier CRyPTIC project, but it has been rewritten by our Research Software Engineers (RSEs) to bring it in line with modern software engineering practices.
If you like looking at code, head over to GitHub and check out gnomonicus, which in turn uses gumpy and piezo. All of these are written in Python 3. Jeremy Westhead, who is one of our RSEs, noticed that we could speed up this part of the pipeline significantly by rewriting gumpy in Rust. He called this new version grumpy, of course! All of this software is released under a license that allows anyone to use it for research but prohibits commercial use.