Getting results in the 100,000 Genomes Project – The Journey
The first attempt to sequence a whole human genome, all 3 billion letters of the genetic code, took 13 years. Super-fast new technology means this can now be done in as little as 24 hours.
In practice, because we batch genomes together for efficiency, it takes us 3 days to sequence a whole genome.
But sequencing is only the beginning.
Looking for a needle in a haystack
When we look at your genome, we are looking for a needle – a glitch – in a vast haystack. The first thing we need to do is make the haystack a bit smaller.
Luckily, human genomes are 99.8% the same. But that still leaves around 4 million potential differences, most of which are healthy variations that make us the individuals we are.
Sequencing your genome produces two files. One is the raw data: all six billion letters, because you carry two copies of the 3 billion-letter code, one from each parent. The other is what’s called a variant call file. That’s the 4 million. This is the ‘small’ haystack we now work with.
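For readers curious about what sits inside a variant call file, here is a minimal sketch in Python. It assumes the widely used VCF text layout, where each line describes one difference using tab-separated columns (chromosome, position, an ID, the reference letter, the alternative letter, and so on). The example line and the names `Variant` and `parse_vcf_line` are made up for illustration; this is not the project's own software.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str  # which chromosome the difference is on
    pos: int    # position along that chromosome
    ref: str    # letter(s) in the reference genome
    alt: str    # letter(s) found in this person's genome


def parse_vcf_line(line: str) -> Variant:
    """Read the first five tab-separated columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    return Variant(chrom, int(pos), ref, alt)


# A made-up example line: an A in the reference is a G in this genome.
example = "7\t117559590\t.\tA\tG\t50\tPASS\t."
variant = parse_vcf_line(example)
print(variant)
```

A real file holds millions of such lines, plus header lines starting with `#`, which this sketch does not handle.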
The process so far has largely been automatic. (Though at the time we started this project, no-one in the world had sequenced 100,000 whole genomes – and we’ve made something sound easy that even 5 years ago would have been thought impossible. Hats off to Illumina who do this for us in Cambridge.)
Next comes the bit that takes the time.
Call in the Bioinformaticians
Bioinformaticians – scientists who are brilliant at organising information and spotting patterns – trawl through the 4 million, looking for the glitches that might possibly account for someone’s symptoms. They work out how each one might affect a person and pull out many hundreds of potential ‘needles’ – changes that might possibly be responsible for a problem. This bit is called ‘annotation’.
But they’ve still got hundreds and hundreds of potential suspect glitches. A small haybale’s worth.
“Trawling through the 4 million used to take a year or more.”
Some of these now get discarded thanks to a filtering system, which uses specialist tools that draw on huge databases of knowledge. Out go, for instance, changes where there’s good evidence that they’re commonly found in the population and don’t cause a problem. Out also go changes that don’t fit the pattern of disease in your family.
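The two filtering rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` fields, the 1% cut-off for "commonly found" and the three example changes are all invented here, and the real filtering tools and databases are far richer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    population_frequency: float  # share of people in reference databases with this change
    known_harmless: bool         # good evidence that it does not cause a problem
    fits_family_pattern: bool    # consistent with who is affected in the family


# Assumed cut-off for "commonly found"; chosen for illustration only.
COMMON_THRESHOLD = 0.01


def keep(candidate: Candidate) -> bool:
    """Apply the two rules: drop common harmless changes, drop poor family fits."""
    if candidate.population_frequency >= COMMON_THRESHOLD and candidate.known_harmless:
        return False
    if not candidate.fits_family_pattern:
        return False
    return True


# Invented examples, not real findings.
candidates = [
    Candidate("variant_A", 0.20, True, True),      # common and harmless: out
    Candidate("variant_B", 0.0001, False, False),  # does not fit the family: out
    Candidate("variant_C", 0.0001, False, True),   # rare and fits: stays
]

shortlist = [c.name for c in candidates if keep(c)]
print(shortlist)
```

Each rule only removes changes, so what remains is a smaller set of suspects, not an answer.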