The 100,000 Genomes Project’s primary goal is to transform the NHS: embedding genomic medicine for earlier diagnosis and more effective treatments. At the same time, the Project is tasked with making the resulting data available to researchers across the world to better interpret genomic data – leading to improved clinical understanding and patient outcomes.
In mid-June, we took a big step towards achieving our research goals, with the first groups of scientists accessing data from the Project’s main programme. These research groups are organised into domains within the Genomics England Clinical Interpretation Partnership (GeCIP). This initial phase brings in 34 researchers from three domains: two disease-focused (neurology and colorectal cancer) and one “cross-cutting” (machine learning). See the infographic below for details:
A pre-GeCIP group of researchers has already begun working on separate batches of data from the Project’s Pilot phase: chronic lymphocytic leukaemia (CLL), led by Professor Anna Schuh, and rare disease, led by Genomics England.
The Research Environment
Work on this scale has never been attempted before and we need to learn how best to interact with the data. So, we’ve populated our initial learning environment with a subset of data from the cancer and rare disease arms of the Project, comprising 1,207 individuals. The first researchers are helping us test the suitability of the environment, before we give researchers access to a much larger data resource in the near future.
One of the biggest challenges in our GeCIP work has been to create systems that give researchers access while also ensuring the safety and security of participants’ data. To meet it, Genomics England has had to build a unique solution from the ground up.
This solution is an ‘airlock’ between the data and the outside world. Think of the data repository as a huge sterile laboratory: researchers wishing to access it go through the airlock, which ensures that they and the tools they wish to use are properly authorised. Once cleared, they can enter and work on the data, but ‘sealed off’ from the outside world. When they wish to leave, they go back through the airlock, which ensures that what they take out is similarly appropriate and authorised. To protect participants’ data, only analysis results can be taken out – not the individual-level data itself.
In effect, we have created a completely new, virtual and globally accessible Research Environment. It is designed to evolve – embracing as yet unknown tools and techniques – with our growing understanding of the power of genomic medicine. Underpinning everything, however, is our obligation of trust: ensuring that 100,000 Genomes Project participants’ data is always safe and secure.
The opening of this virtual space to the GeCIP is a significant step and I would like to thank everyone for investing their time, patience and commitment to make this work. I know that this global collaboration – bringing together the best minds in genomic research – will deliver real advances in genomic medicine and greatly improved patient outcomes.
– Professor Mark Caulfield