Genomics England’s 100,000 Genomes project gets EMC support
Project, which will sequence genomes from 70,000 people, will rely on an Isilon data lake for all the data collected during genome sequencing
Genomics England, the company owned by the Department of Health to deliver the 100,000 Genomes project, is to adopt EMC technology to support the data collection and analytics involved in genome sequencing.
The 100,000 Genomes project is focused on NHS patients with rare diseases and their families, as well as patients with common cancers. It will sequence 100,000 genomes from around 70,000 participants.
Once a genome has been sequenced, the resulting information, which amounts to hundreds of gigabytes per genome, is stored digitally. The volume of data in the project is expected to increase tenfold over the next two years, so the storage platform will need to provide the agility to analyse and compare these immense data sets.
De-identified data from the 100,000 Genomes project will also be made available to approved researchers from academia and industry to help accelerate the development of new treatments and diagnostic tests that are targeted at the genetic characteristics of individual patients.
EMC said Genomics England will adopt its storage technology, notably the company's VCE vScale alongside EMC Isilon and EMC XtremIO.
Previously Genomics England used EMC Isilon for storage of its sequence library. Now Genomics England plans to use an Isilon data lake for all the data collected during genome sequencing.
Once captured at the sequencing centre in Cambridge, the data will be stored on Genomics England's IT infrastructure. The Isilon data lake will initially allow 17PB of data to be stored and made available for multi-protocol analytics, including Hadoop. Alongside the data lake, 24 X-Bricks of all-flash XtremIO storage are in place to support the organisation's virtualised applications. EMC's Data Domain and NetWorker will also be used to provide back-up services.
The Genomics England IT environment uses both on-premises servers and infrastructure-as-a-service (IaaS) provided by cloud service providers on G-Cloud. One of Genomics England's key legacies is expected to be an ecosystem of cloud service providers offering low-cost, elastic compute on demand through G-Cloud, bringing the benefits of scale to smaller research groups.
Dave Brown, head of informatics infrastructure at Genomics England, said: "This project is at the cutting edge of science and technology. EMC's data lake platform provides the secure data storage that we need, with the flexibility and power to undertake complex analysis."
“There are few better examples of the fundamental impact that analysis of datasets can have on society. It’s a privilege to be chosen as the IT storage provider for Genomics England and to be part of a revolutionary time for genome analysis. Genomics has the potential to transform healthcare and redefine the way the NHS operates, uncovering medical treatments, benefitting patient experiences and transforming the economics of universal healthcare in the UK”, added Ross Fraser, vice president and managing director, UK&I, EMC.
"Delivering the platform for this large-scale analytics in a hybrid cloud model will help accelerate the impact big data analytics could have on the NHS, potentially delivering billions in efficiencies in care delivery and improving patient outcomes immeasurably."