A network diagram showing protein interactions inside a cell. Red and yellow are drug targets; red is cancer, yellow is other diseases (image: Dr Bissan Al-Lazikani)
Earlier this week, Dr Bissan Al-Lazikani, Head of Data Science here at The Institute of Cancer Research, London, addressed the Parliamentary and Scientific Committee about how we use data in the fight against cancer.
She revealed how Big Data was creating new opportunities for drug discovery as well as informing the way we treat patients - leading to better cures with fewer side effects.
However, the incredible surge in data brings a range of challenges that need to be addressed if we are to use it to its greatest potential.
The promises of Big Data
We are generating unprecedented quantities of complex and varied data about cancer, and advances in technology have made it faster and cheaper to collect. The next step is working out how we make use of it all.
Computational approaches are increasingly allowing us to share and integrate diverse sets of data, often referred to as ‘Big Data’. That gives us the potential to uncover knowledge that could not be observed by working on smaller, individual subsets of data.
Here at the ICR, we believe that using Big Data effectively will help us meet the biggest challenge in improving cancer survival – cancer’s complexity and ability to evolve resistance to treatment.
But by definition, Big Data refers to a volume, complexity and diversity of data that challenges our current capacity to handle and analyse it. To give you an idea of the scale, the ICR’s scientific database about cancer, canSAR, contains more than 10 billion data points!
We’re generating genomic data, structural data, images of cells, tissue and the whole body, chemical and pharmacological data, radiotherapy data and clinical notes. Our ability to collect data hasn’t been matched with new systems for analysing all these different types of data together.
So what do we need to make the most of this data? I spoke to our researchers and asked what needs to happen to fully seize the opportunities afforded by Big Data.
A coordinated approach to sharing
Scientific organisations need to work together to develop compatible data management approaches that will enable effective data sharing between different organisations.
There is a risk that huge volumes of data are currently being stored in ways that will not prove compatible between organisations, and this is likely to become an increasing problem as data sharing becomes more common.
Common protocols on how to label, search and access data, together with compatible infrastructure, will allow greater sharing between collaborators.
We don’t want to, and can’t, standardise everything – sometimes it will be about creating interoperable systems that can talk to each other and convert one data format into another.
Culture change in data collection
We need to future-proof data collection during clinical trials and routine treatment to ensure data are collected and annotated in a way that maximises the benefit to patients through enabling Big Data analysis.
Certain types of data have common agreed standards, whereas other forms – in particular clinical notes outside trials – do not.
We need to establish clearer protocols and also develop education programmes, in particular for clinicians, to drive changes in the culture surrounding data storage and exchange.
Dr Bissan Al-Lazikani’s Computational Biology and Chemogenomics Team develops tools that help drug discovery efforts process large amounts of data obtained from research.
Training people with the right skills
Research organisations need to have access to people with the skills to deliver a Big Data approach. Because we are seeing such rapid technological change, higher education institutions need to train people with fundamental skills that will allow them to adapt to future changes, rather than in specific technologies that risk becoming out of date.
We need training and internship programmes to build the skills base in mathematics, statistics and computer science, along with a relaxation of visa restrictions for people with skills relevant to Big Data, who may not necessarily possess or require PhD qualifications.
Investment in infrastructure
Current software and hardware infrastructure will soon be unable to support the vast and ever increasing volume of data transfer required.
A coordinated effort is needed to design and implement the next generation of data handling infrastructure and technologies in order to create Big Data federation highways.
We also need to implement new protocols that can analyse data distributed across different systems and sites. This needs to involve Government, industry and academia working together.
Working with patients and the public
We need to get the greatest possible benefit out of the data that we collect from patients to maximise the impact of our research. Researchers must work together with patients to ensure that research delivers value to them.
More information needs to be provided to both the public and clinicians to enhance their understanding of the value of Big Data, of how their data is stored and accessed, and of the safeguards that exist to protect their privacy and security.
We need to develop and promote standards for gaining patient consent in a flexible, secure and traceable way so that data can be used in future – including in slightly different ways to what was originally envisioned as technologies develop.
And we need to ensure we collect broad enough data to allow us to ask all the questions that we may later have, maximising the research we can do and the benefits we can deliver for people with cancer.
There is already some incredible work going on at the ICR to use Big Data analysis to create new understanding about cancer. The ICR recently published its position statement on Big Data and what we need to do to capitalise on the opportunities it offers.
The hope is that if academic organisations can work together – with Government and industry – Big Data can become a central pillar of medical research.