Researchers have developed a new tool that could help clinicians diagnose the most common type of breast cancer more accurately and make better treatment decisions. The tool, called EMBeddER (EMBER), integrates two types of datasets – previously not seen as compatible – to provide more comprehensive information about a person’s cancer.
In this study, led by scientists at The Institute of Cancer Research, London, the team integrated data from multiple RNA-based patient cohorts to create a comprehensive, shared visual ‘space’ to which new patient samples could easily be added. This allowed for a more precise interpretation of the disease, including the likelihood of a certain treatment being effective. Their work therefore paves the way for using RNA profiling as standard in clinical practice.
The research was funded by the European Union’s Horizon 2020 Research and Innovation Programme Marie Skłodowska-Curie and the charity Breast Cancer Now, which funds the Breast Cancer Now Toby Robins Research Centre based within the Division of Breast Cancer Research at the Institute of Cancer Research (ICR). The findings were published in the journal npj Breast Cancer.
A longstanding hurdle in precision medicine
Precision medicine, sometimes referred to as personalised medicine, aims to target treatment to the specific patient based on information about their cancer, including its genetic make-up. This information comes from RNA profiling tools, which measure the gene activity levels in cells.
Several diagnostic panels based on RNA profiling technologies have already been approved for use. For example, Prosigna – which quantifies the expression of 50 key genes – can help clinicians identify patients with oestrogen-receptor-positive (ER+) breast cancer who are likely to benefit from hormone therapy alone, sparing them from unnecessary chemotherapy. However, these tests only provide information about certain aspects of a person’s cancer.
As RNA sequencing becomes cheaper, it has growing potential to play a more important part in routine clinical practice. Until now, though, it has been hampered by two limitations. First, samples from different platforms are difficult to compare. Second, it has not been possible to evaluate a single sample against previously generated data from other patients.
EMBER looks set to remove these obstacles by enabling the integration of newly generated RNA data with retrospective patient databases. This opens the door to more accurate diagnoses and better tailored treatment approaches, which in turn could lead to improved outcomes for many people with breast cancer.
Creating a unified space
By combining RNA datasets, EMBER serves as a reference point for new patient samples. When a new patient’s RNA profile is combined with data from previously profiled patients, their localisation in the EMBER space allows researchers to get additional biological information by interpreting the tumour’s molecular subtypes on a continuum. Importantly, EMBER also makes it easier to predict how the cancer will respond to endocrine therapy.
To create the model, the team focused on early-stage breast cancer data, which they accessed via the Cancer Genome Atlas Program (TCGA) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) study. Across both datasets, they looked at the expression levels of 1,044 genes of interest, testing different numerical schemes with various mathematical approaches to find the one that best explained the data.
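The idea of a shared space that a single new sample can be placed into can be illustrated with a short sketch. This is not the published EMBER method: it assumes, purely for illustration, that each sample is normalised on its own by ranking its genes, and that the shared space is a principal component projection fitted on the pooled reference cohorts. The function names, the toy data and the choice of 50 genes and two dimensions are all hypothetical.

```python
import numpy as np

def rank_normalise(expr):
    """Replace each sample's values by within-sample ranks scaled to [0, 1].

    expr has shape (n_samples, n_genes). Only the ordering of genes inside a
    sample is used, so a sample can be normalised without any other samples.
    """
    expr = np.asarray(expr, dtype=float)
    ranks = expr.argsort(axis=1).argsort(axis=1)  # 0 = lowest-expressed gene
    return ranks / (expr.shape[1] - 1)

def fit_reference_space(reference_expr, n_components=2):
    """Fit a principal component projection on the pooled reference cohorts."""
    normed = rank_normalise(reference_expr)
    gene_means = normed.mean(axis=0)
    _, _, vt = np.linalg.svd(normed - gene_means, full_matrices=False)
    return gene_means, vt[:n_components]

def project(sample_expr, gene_means, components):
    """Place one or more samples into the already fitted space."""
    normed = rank_normalise(np.atleast_2d(sample_expr))
    return (normed - gene_means) @ components.T

# Toy stand-ins for two cohorts measured on different scales (assumed data).
rng = np.random.default_rng(0)
n_genes = 50
cohort_a = rng.normal(size=(30, n_genes))          # log-scale-like values
cohort_b = np.exp(rng.normal(size=(30, n_genes)))  # count-like values

gene_means, components = fit_reference_space(np.vstack([cohort_a, cohort_b]))

# A single new sample is projected without refitting or batching.
new_sample = rng.normal(size=n_genes)
coords = project(new_sample, gene_means, components)
print(coords.shape)  # (1, 2)
```

Because ranks depend only on the order of genes within a sample, rescaling the new sample (for example, `np.exp(new_sample)`) leaves its coordinates unchanged in this sketch. This is one simple way a single sample could be compared against reference data from a different platform.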
In the second part of the study, the researchers took several steps to validate EMBER. Firstly, they tested data from an additional patient cohort and showed that the samples projected to the expected regions of the EMBER space according to their molecular subtypes. Then, they looked at the activity of various molecular pathways, finding that EMBER was able to capture information beyond the molecular subtypes, including details about cell proliferation.
The researchers were able to associate this information with survival rates among patients receiving endocrine therapy, which revealed certain markers linked to poorer outcomes. In the final stage, EMBER was shown to be superior to the currently used immunohistochemistry-based index in predicting responses to endocrine therapy.
“It outperforms the existing options”
EMBER was primarily developed by Carlos Ronchi while he was a PhD student at École Polytechnique Fédérale de Lausanne (EPFL). At the time, Ronchi was working in Professor Cathrin Brisken’s lab at EPFL, using his extensive mathematical knowledge to help the team overcome hurdles in the field of breast cancer. He has since moved to an AI company in Chicago.
Professor Brisken, who is senior author of the study, Group Leader of the Endocrine Control Mechanisms Group at the ICR and Associate Professor of Life Sciences at EPFL, said:
“Our story is remarkable in that Carlos joined my biology-based lab at EPFL because of a previous publication we had in applied maths. He learnt all about breast cancer during his PhD, so we were able to benefit from his mathematical prowess. His role is a nice example of how a transdisciplinary PhD – which the ICR can offer through its Cancer Research UK Convergence Science Initiative with Imperial – can open entirely new possibilities that benefit not only the student themselves but also research teams and, in the longer term, clinicians and patients.
“Carlos has successfully developed an approach that places the major databases into a common space. This study has shown that it is possible to add additional cohorts into this space and even individual samples, with their position in the EMBER space providing additional biological information.”
Dr Syed Haider, second author and Group Leader of the Breast Cancer Research Bioinformatics Group at the ICR, said:
“There have previously been many efforts to integrate big data in breast cancer datasets, but their application to clinical samples has been somewhat limited. The tool that Carlos has developed for data integration is different; it provides a formal basis for understanding the aggressivity of a new patient’s tumour against a large knowledge base created from retrospective patient cohorts. We are very excited to see how it can be used in the future to help clinical decision making in breast cancer.
“Now, in theory, any time a patient is diagnosed with breast cancer and RNA sequencing can be performed on their biopsy, the sample can be placed in the EMBER space, and different prognostic and predictive factors can be determined. Until now, this has not been feasible because large numbers of samples must be accumulated and run in a single batch in order to extract enough information about the tumour to guide clinical decision making.”