Dr Stephen-John Sammut
Group Leader: Cancer Dynamics
OrcID: 0000-0003-4472-904X
Phone: +44 20 3437 6036
Email: [email protected]
Also on: @stephensammut
Location: Chelsea
Biography
Dr Stephen John Sammut graduated in medicine from the University of Malta in 2008. He developed an interest in computational biology while reading medicine and was awarded two scholarships by the Wellcome Trust Sanger Institute in Cambridge to apply novel statistical methods for protein sequence classification and analysis. His work remains part of the InterPro consortium database, which is regarded as the gold standard resource for protein family and domain information.
After completing general medical training in Cambridge in 2012, Dr Sammut commenced NIHR-funded joint clinical and academic specialist training in Medical Oncology at Cambridge University Hospitals, the EMBL-European Bioinformatics Institute and the Cancer Research UK (CRUK) Cambridge Institute. Here, he co-developed a systems biology computational framework for analysing large-scale biophysical models within their anatomical context and designed computational algorithms that leverage network graph theory to identify druggable protein targets in breast cancer.
In 2014, Dr Sammut commenced a Wellcome Trust funded PhD in breast cancer genomics at the University of Cambridge and the CRUK Cambridge Institute. Here, he specialised in the molecular characterisation of early and metastatic breast cancer. Dr Sammut charted the molecular evolution of early breast cancer during treatment with neoadjuvant chemotherapy and showed that response to therapy was associated with distinct tumour ecosystem evolutionary trajectories. In addition, his work in metastatic breast cancer showed that the adaptive immune system co-evolves with the tumour genome, providing further support to the cancer immunoediting hypothesis. In recognition of his outstanding scientific and translational research, he was awarded the Milo Keynes Prize and the Salje Medal by the University of Cambridge.
Following completion of his PhD in 2018, Dr Sammut was awarded a postdoctoral Academic Clinical Lectureship in breast cancer by the University of Cambridge. Here, he characterised the biological processes associated with response to chemo- and targeted therapies in early breast cancer and developed the first machine learning framework that combined genomic, transcriptomic and digital pathology data from diagnostic cancer biopsies to predict response to therapy. This major advance in personalised precision breast cancer medicine resulted in a landmark publication in Nature, which was cited as one of the top 10 cancer research publications by the European Association for Cancer Research in 2022. In recognition of this work, Dr Sammut received several awards, including the Scholar-in-Training Award by the American Association for Cancer Research, the McElwain Prize by the UK Association for Cancer Physicians, and the Whitney-Wood Cancer Scholarship by the Royal College of Physicians.
Dr Sammut joined the ICR in November 2022. His research focuses on developing and implementing methods that enable the delivery of personalised cancer medicine. This includes predicting response to treatment using dynamic biomarker technologies that integrate serially acquired multiplatform data to model tumour biology as it is perturbed by treatment.
Types of Publications
Journal articles
Breast cancers are complex ecosystems of malignant cells and the tumour microenvironment<sup>1</sup>. The composition of these tumour ecosystems and interactions within them contribute to responses to cytotoxic therapy<sup>2</sup>. Efforts to build response predictors have not incorporated this knowledge. We collected clinical, digital pathology, genomic and transcriptomic profiles of pre-treatment biopsies of breast tumours from 168 patients treated with chemotherapy with or without HER2 (encoded by ERBB2)-targeted therapy before surgery. Pathology end points (complete response or residual disease) at surgery<sup>3</sup> were then correlated with multi-omic features in these diagnostic biopsies. Here we show that response to treatment is modulated by the pre-treated tumour ecosystem, and its multi-omics landscape can be integrated in predictive models using machine learning. The degree of residual disease following therapy is monotonically associated with pre-therapy features, including tumour mutational and copy number landscapes, tumour proliferation, immune infiltration and T cell dysfunction and exclusion. Combining these features into a multi-omic machine learning model predicted a pathological complete response in an external validation cohort (75 patients) with an area under the curve of 0.87. In conclusion, response to therapy is determined by the baseline characteristics of the totality of the tumour ecosystem captured through data integration and machine learning. This approach could be used to develop predictors for other cancers.
DNA methylation is aberrant in cancer, but the dynamics, regulatory role and clinical implications of such epigenetic changes are still poorly understood. Here, reduced representation bisulfite sequencing (RRBS) profiles of 1538 breast tumors and 244 normal breast tissues from the METABRIC cohort are reported, facilitating detailed analysis of DNA methylation within a rich context of genomic, transcriptional, and clinical data. Tumor methylation from immune and stromal signatures are deconvoluted leading to the discovery of a tumor replication-linked clock with genome-wide methylation loss in non-CpG island sites. Unexpectedly, methylation in most tumor CpG islands follows two replication-independent processes of gain (MG) or loss (ML) that we term epigenomic instability. Epigenomic instability is correlated with tumor grade and stage, TP53 mutations and poorer prognosis. After controlling for these global trans-acting trends, as well as for X-linked dosage compensation effects, cis-specific methylation and expression correlations are uncovered at hundreds of promoters and over a thousand distal elements. Some of these targeted known tumor suppressors and oncogenes. In conclusion, this study demonstrates that global epigenetic instability can erode cancer methylomes and expose them to localized methylation aberrations in-cis resulting in transcriptional changes seen in tumors.
The rates and routes of lethal systemic spread in breast cancer are poorly understood owing to a lack of molecularly characterized patient cohorts with long-term, detailed follow-up data. Long-term follow-up is especially important for those with oestrogen-receptor (ER)-positive breast cancers, which can recur up to two decades after initial diagnosis<sup>1-6</sup>. It is therefore essential to identify patients who have a high risk of late relapse<sup>7-9</sup>. Here we present a statistical framework that models distinct disease stages (locoregional recurrence, distant recurrence, breast-cancer-related death and death from other causes) and competing risks of mortality from breast cancer, while yielding individual risk-of-recurrence predictions. We apply this model to 3,240 patients with breast cancer, including 1,980 for whom molecular data are available, and delineate spatiotemporal patterns of relapse across different categories of molecular information (namely immunohistochemical subtypes; PAM50 subtypes, which are based on gene-expression patterns<sup>10,11</sup>; and integrative or IntClust subtypes, which are based on patterns of genomic copy-number alterations and gene expression<sup>12,13</sup>). We identify four late-recurring integrative subtypes, comprising about one quarter (26%) of tumours that are both positive for ER and negative for human epidermal growth factor receptor 2, each with characteristic tumour-driving alterations in genomic copy number and a high risk of recurrence (mean 47-62%) up to 20 years after diagnosis. We also define a subgroup of triple-negative breast cancers in which cancer rarely recurs after five years, and a separate subgroup in which patients remain at risk. Use of the integrative subtypes improves the prediction of late, distant relapse beyond what is possible with clinical covariates (nodal status, tumour size, tumour grade and immunohistochemical subtype). These findings highlight opportunities for improved patient stratification and biomarker-driven clinical trials.
Purpose: The glomerular filtration rate (GFR) is essential for carboplatin chemotherapy dosing; however, the best method to estimate GFR in patients with cancer is unknown. We identify the most accurate and least biased method. Methods: We obtained data on age, sex, height, weight, serum creatinine concentrations, and results for GFR from chromium-51 (<sup>51</sup>Cr) EDTA excretion measurements (<sup>51</sup>Cr-EDTA GFR) from white patients ≥ 18 years of age with histologically confirmed cancer diagnoses at the Cambridge University Hospital NHS Trust, United Kingdom. We developed a new multivariable linear model for GFR using statistical regression analysis. <sup>51</sup>Cr-EDTA GFR was compared with the estimated GFR (eGFR) from seven published models and our new model, using the statistics root-mean-squared-error (RMSE) and median residual and on an internal and external validation data set. We performed a comparison of carboplatin dosing accuracy on the basis of an absolute percentage error > 20%. Results: Between August 2006 and January 2013, data from 2,471 patients were obtained. The new model improved the eGFR accuracy (RMSE, 15.00 mL/min; 95% CI, 14.12 to 16.00 mL/min) compared with all published models. Body surface area (BSA)-adjusted chronic kidney disease epidemiology (CKD-EPI) was the most accurate published model for eGFR (RMSE, 16.30 mL/min; 95% CI, 15.34 to 17.38 mL/min) for the internal validation set. Importantly, the new model reduced the fraction of patients with a carboplatin dose absolute percentage error > 20% to 14.17% in contrast to 18.62% for the BSA-adjusted CKD-EPI and 25.51% for the Cockcroft-Gault formula. The results were externally validated. Conclusion: In a large data set from patients with cancer, BSA-adjusted CKD-EPI is the most accurate published model to predict GFR. The new model improves this estimation and may present a new standard of care.
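The three comparison statistics named in this abstract (RMSE, median residual, and the fraction of patients whose carboplatin dose is off by more than 20%) can be illustrated with a minimal Python sketch. This is not the study's code. It assumes that doses are derived from GFR with the Calvert formula (dose in mg = target AUC × (GFR + 25)), which the abstract does not state, and it assumes the residual is measured minus estimated GFR. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def rmse(measured, estimated):
    """Root-mean-squared error between measured and estimated GFR (mL/min)."""
    squared = [(e - m) ** 2 for m, e in zip(measured, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def median_residual(measured, estimated):
    """Median of (measured - estimated); a simple indicator of bias.

    The sign convention is an assumption, not taken from the abstract.
    """
    return median(m - e for m, e in zip(measured, estimated))


def calvert_dose(gfr, target_auc=5.0):
    """Carboplatin dose in mg from the Calvert formula.

    Assumption: the abstract does not say which dosing formula was used.
    """
    return target_auc * (gfr + 25.0)


def fraction_dose_error_above(measured, estimated, threshold=0.20, target_auc=5.0):
    """Fraction of patients whose dose absolute percentage error exceeds threshold."""
    n_bad = 0
    for m, e in zip(measured, estimated):
        reference_dose = calvert_dose(m, target_auc)
        estimated_dose = calvert_dose(e, target_auc)
        if abs(estimated_dose - reference_dose) / reference_dose > threshold:
            n_bad += 1
    return n_bad / len(measured)


# Hypothetical values, not data from the study.
measured_gfr = [60.0, 90.0, 120.0, 45.0]
estimated_gfr = [66.0, 81.0, 150.0, 80.0]

print(rmse(measured_gfr, estimated_gfr))
print(median_residual(measured_gfr, estimated_gfr))
print(fraction_dose_error_above(measured_gfr, estimated_gfr))
```

Because the Calvert formula adds a constant to GFR before scaling, a given relative error in GFR produces a smaller relative error in dose, which is one reason dose error is reported separately from RMSE.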
The inter- and intra-tumor heterogeneity of breast cancer needs to be adequately captured in pre-clinical models. We have created a large collection of breast cancer patient-derived tumor xenografts (PDTXs), in which the morphological and molecular characteristics of the originating tumor are preserved through passaging in the mouse. An integrated platform combining in vivo maintenance of these PDTXs along with short-term cultures of PDTX-derived tumor cells (PDTCs) was optimized. Remarkably, the intra-tumor genomic clonal architecture present in the originating breast cancers was mostly preserved upon serial passaging in xenografts and in short-term cultured PDTCs. We assessed drug responses in PDTCs on a high-throughput platform and validated several ex vivo responses in vivo. The biobank represents a powerful resource for pre-clinical breast cancer pharmacogenomic studies (http://caldaslab.cruk.cam.ac.uk/bcape), including identification of biomarkers of response or resistance.
The genomic landscape of breast cancer is complex, and inter- and intra-tumour heterogeneity are important challenges in treating the disease. In this study, we sequence 173 genes in 2,433 primary breast tumours that have copy number aberration (CNA), gene expression and long-term clinical follow-up data. We identify 40 mutation-driver (Mut-driver) genes, and determine associations between mutations, driver CNA profiles, clinical-pathological parameters and survival. We assess the clonal states of Mut-driver mutations, and estimate levels of intra-tumour heterogeneity using mutant-allele fractions. Associations between PIK3CA mutations and reduced survival are identified in three subgroups of ER-positive cancer (defined by amplification of 17q23, 11q13-14 or 8q24). High levels of intra-tumour heterogeneity are in general associated with a worse outcome, but highly aggressive tumours with 11q13-14 amplification have low levels of intra-tumour heterogeneity. These results emphasize the importance of genome-based stratification of breast cancer, and have important implications for designing therapeutic strategies.
Complex focal chromosomal rearrangements in cancer genomes, also called "firestorms", can be scored from DNA copy number data. The complex arm-wise aberration index (CAAI) is a score that captures DNA copy number alterations that appear as focal complex events in tumors, and has potential prognostic value in breast cancer. This study aimed to validate this DNA-based prognostic index in breast cancer and test for the first time its potential prognostic value in ovarian cancer. Copy number alteration (CNA) data from 1950 breast carcinomas (METABRIC cohort) and 508 high-grade serous ovarian carcinomas (TCGA dataset) were analyzed. Cases were classified as CAAI positive if at least one complex focal event was scored. Complex alterations were frequently localized on chromosome 8p (n = 159), 17q (n = 176) and 11q (n = 251). CAAI events on 11q were most frequent in estrogen receptor positive (ER+) cases and on 17q in estrogen receptor negative (ER-) cases. We found only a modest correlation between CAAI and the overall rate of genomic instability (GII) and number of breakpoints (r = 0.27 and r = 0.42, p < 0.001). Breast cancer specific survival (BCSS), overall survival (OS) and ovarian cancer progression free survival (PFS) were used as clinical end points in Cox proportional hazard model survival analyses. CAAI positive breast cancers (43%) had higher mortality: hazard ratio (HR) of 1.94 (95%CI, 1.62-2.32) for BCSS, and of 1.49 (95%CI, 1.30-1.71) for OS. Representations of the 70-gene and the 21-gene predictors were compared with CAAI in multivariable models and CAAI was independently significant with a Cox adjusted HR of 1.56 (95%CI, 1.23-1.99) for ER+ and 1.55 (95%CI, 1.11-2.18) for ER- disease. None of the expression-based predictors were prognostic in the ER- subset. We found that a model including CAAI and the two expression-based prognostic signatures outperformed a model including the 21-gene and 70-gene signatures but excluding CAAI. Inclusion of CAAI in the clinical prognostication tool PREDICT significantly improved its performance. CAAI positive ovarian cancers (52%) also had worse prognosis: HRs of 1.3 (95%CI, 1.1-1.7) for PFS and 1.3 (95%CI, 1.1-1.6) for OS. This study validates CAAI as an independent predictor of survival in both ER+ and ER- breast cancer and reveals a significant prognostic value for CAAI in high-grade serous ovarian cancer.
The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.
Background: Patient-Derived Tumour Xenografts (PDTXs) have emerged as the pre-clinical models that best represent clinical tumour diversity and intra-tumour heterogeneity. The molecular characterization of PDTXs using High-Throughput Sequencing (HTS) is essential; however, the presence of mouse stroma is challenging for HTS data analysis. Indeed, the high homology between the two genomes results in a proportion of mouse reads being mapped as human. Results: In this study we generated Whole Exome Sequencing (WES), Reduced Representation Bisulfite Sequencing (RRBS) and RNA sequencing (RNA-seq) data from samples with known mixtures of mouse and human DNA or RNA and from a cohort of human breast cancers and their derived PDTXs. We show that using an In silico Combined human-mouse Reference Genome (ICRG) for alignment discriminates between human and mouse reads with up to 99.9% accuracy and decreases the number of false positive somatic mutations caused by misalignment by >99.9%. We also derived a model to estimate the human DNA content in independent PDTX samples. For RNA-seq and RRBS data analysis, the use of the ICRG allows dissecting computationally the transcriptome and methylome of human tumour cells and mouse stroma. In a direct comparison with previously reported approaches, our method showed similar or higher accuracy while requiring significantly less computing time. Conclusions: The computational pipeline we describe here is a valuable tool for the molecular analysis of PDTXs as well as any other mixture of DNA or RNA species.
Pathology archives with linked clinical data are an invaluable resource for translational research, with the limitation that most cancer samples are formalin-fixed paraffin-embedded (FFPE) tissues. Therefore, FFPE tissues are an important resource for genomic profiling studies but are under-utilised due to the low amount and quality of extracted nucleic acids. We profiled the copy number landscape of 356 breast cancer patients using DNA extracted FFPE tissues by shallow whole genome sequencing. We generated a total of 491 sequencing libraries from 2 kits and obtained data from 98.4% of libraries with 86.4% being of good quality. We generated libraries from as low as 3.8 ng of input DNA and found that the success was independent of input DNA amount and quality, processing site and age of the fixed tissues. Since copy number alterations (CNA) play a major role in breast cancer, it is imperative that we are able to use FFPE archives and we have shown in this study that sWGS is a robust method to do such profiling.
Circulating tumour DNA (ctDNA) detection and monitoring have enormous potential clinical utility in oncology. We describe here a fast, flexible and cost-effective method to profile multiple genes simultaneously in low input cell-free DNA (cfDNA): Next Generation-Targeted Amplicon Sequencing (NG-TAS). We designed a panel of 377 amplicons spanning 20 cancer genes and tested the NG-TAS pipeline using cell-free DNA from two HapMap lymphoblastoid cell lines. NG-TAS consistently detected mutations in cfDNA when mutation allele fraction was > 1%. We applied NG-TAS to a clinical cohort of metastatic breast cancer patients, demonstrating its potential in monitoring the disease. The computational pipeline is available at https://github.com/cclab-brca/NGTAS_pipeline .
Background: Tumor-infiltrating lymphocytes (TILs) represent a prognostic factor for survival in primary breast cancer (BC). Nonetheless, neoepitope load and TILs cytolytic activity are modest in BC, compromising the efficacy of immune-activating antibodies, which do not yet compete against immunogenic chemotherapy. Patients and methods: We analyzed by functional flow cytometry the immune dynamics of primary and metastatic axillary nodes [metastatic lymph nodes (mLN)] in early BC (EBC) after exposure to T-cell bispecific antibodies (TCB) bridging CD3ε and human epidermal growth factor receptor 2 (HER2) or Carcinoembryonic Antigen-Related Cell Adhesion Molecule 5 (CEACAM5), before and after chemotherapy. Human leukocyte antigen (HLA) class I loss was assessed by whole exome sequencing and immunohistochemistry. One hundred primary BC, 64 surrounding 'healthy tissue' and 24 mLN-related parameters were analyzed. Results: HLA loss of heterozygosity was observed in EBC, at a clonal and subclonal level and was associated with regulatory T cells and T-cell immunoglobulin and mucin-domain-3 expression restraining the immuno-stimulatory effects of neoadjuvant chemotherapy. TCB bridging CD3ε and HER2 or CEACAM5 could bypass major histocompatibility complex (MHC) class I loss, partially rescuing T-cell functions in mLN. Conclusion: TCB should be developed in BC to circumvent low MHC/peptide complexes.
The detailed molecular characterization of lethal cancers is a prerequisite to understanding resistance to therapy and escape from cancer immunoediting. We performed extensive multi-platform profiling of multi-regional metastases in autopsies from 10 patients with therapy-resistant breast cancer. The integrated genomic and immune landscapes show that metastases propagate and evolve as communities of clones, reveal their predicted neo-antigen landscapes, and show that they can accumulate HLA loss of heterozygosity (LOH). The data further identify variable tumor microenvironments and reveal, through analyses of T cell receptor repertoires, that adaptive immune responses appear to co-evolve with the metastatic genomes. These findings reveal in fine detail the landscapes of lethal metastatic breast cancer.
Molecular profiling of breast cancer has enabled the development of more robust molecular prognostic signatures and therapeutic options for breast cancer patients. However, non-Caucasian populations remain understudied. Here, we present the mutational, transcriptional, and copy number profiles of 560 Malaysian breast tumours and a comparative analysis of breast cancers arising in Asian and Caucasian women. Compared to breast tumours in Caucasian women, we show an increased prevalence of HER2-enriched molecular subtypes and higher prevalence of TP53 somatic mutations in ER+ Asian breast tumours. We also observe elevated immune scores in Asian breast tumours, suggesting potential clinical response to immune checkpoint inhibitors. Whilst HER2-subtype and enriched immune score are associated with improved survival, presence of TP53 somatic mutations is associated with poorer survival in ER+ tumours. Taken together, these population differences unveil opportunities to improve the understanding of this disease and lay the foundation for precision medicine in different populations.
The biology of breast cancer response to neoadjuvant therapy is underrepresented in the literature and provides a window-of-opportunity to explore the genomic and microenvironment modulation of tumours exposed to therapy. Here, we characterised the mutational, gene expression, pathway enrichment and tumour-infiltrating lymphocytes (TILs) dynamics across different timepoints of 35 HER2-negative primary breast cancer patients receiving neoadjuvant eribulin therapy (SOLTI-1007 NEOERIBULIN-NCT01669252). Whole-exome data (N = 88 samples) generated mutational profiles and candidate neoantigens and were analysed along with RNA-Nanostring 545-gene expression (N = 96 samples) and stromal TILs (N = 105 samples). Tumour mutation burden varied across patients at baseline but not across the sampling timepoints for each patient. Mutational signatures were not always conserved across tumours. There was a trend towards higher odds of response and less hazard to relapse when the percentage of subclonal mutations was low, suggesting that more homogenous tumours might have better responses to neoadjuvant therapy. Few driver mutations (5.1%) generated putative neoantigens. Mutation and neoantigen load were positively correlated (R<sup>2</sup> = 0.94, p = <0.001); neoantigen load was weakly correlated with stromal TILs (R<sup>2</sup> = 0.16, p = 0.02). An enrichment in pathways linked to immune infiltration and reduced programmed cell death expression were seen after 12 weeks of eribulin in good responders. VEGF was downregulated over time in the good responder group and FABP5, an inductor of epithelial mesenchymal transition (EMT), was upregulated in cases that recurred (p < 0.05). Mutational heterogeneity, subclonal architecture and the improvement of immune microenvironment along with remodelling of hypoxia and EMT may influence the response to neoadjuvant treatment.
Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize gene expression programs driving lineage choice, we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identified extensive cell type-specific expression changes: 6711 genes and 10,724 transcripts, enriched in non-protein-coding elements at early stages of differentiation. In addition, we found 7881 novel splice junctions and 2301 differentially used alternative splicing events, enriched in genes involved in regulatory processes. We demonstrated experimentally cell-specific isoform usage, identifying nuclear factor I/B (NFIB) as a regulator of maturation of the megakaryocyte, the platelet precursor. Our data highlight the complexity of fating events in closely related progenitor populations, the understanding of which is essential for the advancement of transplantation and regenerative medicine.
Invasive lobular carcinoma (ILC) is the second most frequently occurring histological breast cancer subtype after invasive ductal carcinoma (IDC), accounting for around 10% of all breast cancers. The molecular processes that drive the development of ILC are still largely unknown. We have performed a comprehensive genomic, transcriptomic and proteomic analysis of a large ILC patient cohort and present here an integrated molecular portrait of ILC. Mutations in CDH1 and in the PI3K pathway are the most frequent molecular alterations in ILC. We identified two main subtypes of ILCs: (i) an immune related subtype with mRNA up-regulation of PD-L1, PD-1 and CTLA-4 and greater sensitivity to DNA-damaging agents in representative cell line models; (ii) a hormone related subtype, associated with Epithelial to Mesenchymal Transition (EMT), and gain of chromosomes 1q and 8q and loss of chromosome 11q. Using the somatic mutation rate and eIF4B protein level, we identified three groups with different clinical outcomes, including a group with extremely good prognosis. We provide a comprehensive overview of the molecular alterations driving ILC and have explored links with therapy response. This molecular characterization may help to tailor treatment of ILC through the application of specific targeted, chemo- and/or immune-therapies.
Background: MammaPrint® is a microarray-based gene expression test cleared by the US Food and Drug Administration to assess recurrence risk in early-stage breast cancer, aimed to guide physicians in making neoadjuvant and adjuvant treatment decisions. The increase in the incidence of invasive lobular carcinomas (ILCs) over the past decades and the modest representation of ILC in the MammaPrint development data set calls for a stratified survival analysis dedicated to this specific subgroup. Study aim: The current study aimed to validate the prognostic value of the MammaPrint test for breast cancer patients with early-stage ILCs. Materials and methods: Univariate and multivariate survival associations for overall survival (OS), distant metastasis-free interval (DMFI), and distant metastasis-free survival (DMFS) were studied in a study population of 217 early-stage ILC breast cancer patients from five different clinical studies. Results and discussion: A significant association between MammaPrint High Risk and poor clinical outcome was shown for OS, DMFI, and DMFS. A subanalysis was performed on the lymph node-negative study population. In the lymph node-negative study population, we report an up to 11 times higher change in the diagnosis of an event in the MammaPrint High Risk group. For DMFI, the reported hazard ratio is 11.1 (95% confidence interval = 2.3-53.0). Conclusion: Study results validate MammaPrint as an independent factor for breast cancer patients with early-stage invasive lobular breast cancer. Hazard ratios up to 11 in multivariate analyses emphasize the independent value of MammaPrint, specifically in lymph node-negative ILC breast cancers.
Bioinformatic analysis of genomic sequencing data to identify somatic mutations in cancer samples is far from achieving the required robustness and standardisation. In this study we generated a whole exome sequencing benchmark dataset using the platinum genome sample NA12878 and developed an intersect-then-combine (ITC) approach to increase the accuracy in calling single nucleotide variants (SNVs) and indels in tumour-normal pairs. We evaluated the effect of alignment, base quality recalibration, mutation caller and filtering on sensitivity and false positive rate. The ITC approach increased the sensitivity up to 17.1%, without increasing the false positive rate per megabase (FPR/Mb) and its validity was confirmed in a set of clinical samples.
The use of circulating tumour DNA (ctDNA) to provide a non-invasive, personalised genomic snapshot of a patient's tumour has huge potential. Over the past five years this area of research has gained huge momentum. A number of studies in metastatic breast cancer have shown the potential of ctDNA to predict prognosis and treatment response. Further developments have included deeper sequencing using whole exome and shallow whole genome approaches, which have the potential to identify new mutations and chromosomal copy number changes that appear upon resistance to treatment. In early breast cancer, recent work utilising personalised digital PCR probes has shown huge potential in predicting disease relapse and the detection of micrometastatic disease, which could lead to improved treatment and outcome for these patients. Specific pathways of resistance can also be monitored, and liquid biopsy approaches for the detection of ESR1 mutations have been used which could identify patients who have become resistant to particular endocrine therapies. The identification of PIK3CA mutations in plasma has also been shown to predict a higher response rate to specific PI3K inhibitors and could be used as a non-invasive screening tool prior to treatment. Further work on the detection of exosomal miRNA and hypermethylated DNA in plasma has shown promise in terms of specificity for early breast cancer detection and could be used to monitor treatment response. This review will focus on technological advances in the field, early detection of relapse and the detection of tumour-specific genomic alterations which could predict treatment response and resistance in patients with breast cancer.
Longitudinal analysis of circulating tumor DNA (ctDNA) has shown promise for monitoring treatment response. However, most current methods lack adequate sensitivity for residual disease detection during or after completion of treatment in patients with nonmetastatic cancer. To address this gap and to improve sensitivity for minute quantities of residual tumor DNA in plasma, we have developed targeted digital sequencing (TARDIS) for multiplexed analysis of patient-specific cancer mutations. In reference samples, by simultaneously analyzing 8 to 16 known mutations, TARDIS achieved 91 and 53% sensitivity at mutant allele fractions (AFs) of 3 in 10<sup>4</sup> and 3 in 10<sup>5</sup>, respectively, with 96% specificity, using input DNA equivalent to a single tube of blood. We successfully analyzed up to 115 mutations per patient in 80 plasma samples from 33 women with stage I to III breast cancer. Before treatment, TARDIS detected ctDNA in all patients with 0.11% median AF. After completion of neoadjuvant therapy, ctDNA concentrations were lower in patients who achieved pathological complete response (pathCR) compared to patients with residual disease (median AFs, 0.003 and 0.017%, respectively, <i>P</i> = 0.0057, AUC = 0.83). In addition, patients with pathCR showed a larger decrease in ctDNA concentrations during neoadjuvant therapy. These results demonstrate high accuracy for assessment of molecular response and residual disease during neoadjuvant therapy using ctDNA analysis. TARDIS has achieved up to 100-fold improvement beyond the current limit of ctDNA detection using clinically relevant blood volumes, demonstrating that personalized ctDNA tracking could enable individualized clinical management of patients with cancer treated with curative intent.
Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror sites in France (http://pfam.jouy.inra.fr/) and South Korea (http://pfam.ccbb.re.kr/).
A significant proportion of biomedical resources carries information that cross-references anatomical structures across multiple scales. To improve the visualization of such resources in their anatomical context, we developed an automated methodology that produces anatomy schematics in a consistent manner, and provides for the overlay of anatomy-related resource information onto the same diagram. This methodology, called ApiNATOMY, draws upon the topology of ontology graphs to automatically lay out treemaps representing body parts as well as semantic metadata linking to such ontologies. More generally, ApiNATOMY treemaps provide an efficient and manageable way to visualize large biomedical ontologies in a meaningful and consistent manner. In the anatomy domain, such treemaps will allow epidemiologists, clinicians, and biomedical scientists to review, and interact with, anatomically aggregated heterogeneous data and model resources. Such an approach supports the visual identification of functional relations between anatomically colocalized resources that may not be immediately amenable to automation by ontology-based inferencing. We also describe the application of ApiNATOMY schematics to integrate, and add value to, human phenotype-related information; results are found at http://apinatomy.org. The long-term goal for the ApiNATOMY toolkit is to support clinical and scientific graphical user interfaces and dashboards for biomedical resource management and data analytics.
<h4>Background</h4>Neutropenic fever in patients receiving chemotherapy is a medical emergency and should be treated with antibiotics within 1 h, as specified in the 2009 NCAG report on chemotherapy services.<h4>Aim</h4>To determine door-to-assessment, door-to-treatment and door-to-investigation intervals for patients with febrile neutropenia who presented to the inpatient Oncology Ward, the outpatient Oncology Day Unit and the Emergency Department in Addenbrooke's Hospital, Cambridge.<h4>Design</h4>Retrospective observational audit.<h4>Methods</h4>Thirty-two patients on treatment for solid cancers who were admitted with febrile neutropenia between January and December 2010 were identified, and paper and electronic medical records were analysed to determine door-to-assessment, door-to-treatment and door-to-investigation intervals.<h4>Results and conclusions</h4>Patients in this series were assessed more quickly and received the first dose of antibiotics sooner when they presented to an oncology ward rather than the emergency department. However, imaging was performed faster and blood results were issued more quickly in the emergency department, owing to better infrastructure that has been tailored to comply with national targets. Nonetheless, compliance with optimum standards of care was poor, with only 9% of sampled patients receiving antibiotics within 1 h of presenting to hospital, and 53% within 1 h of being assessed by a clinician.
<h4>Background</h4>Previous studies have independently validated the prognostic relevance of residual cancer burden (RCB) after neoadjuvant chemotherapy. We used results from several independent cohorts in a pooled patient-level analysis to evaluate the relationship of RCB with long-term prognosis across different phenotypic subtypes of breast cancer, to assess generalisability in a broad range of practice settings.<h4>Methods</h4>In this pooled analysis, 12 institutes and trials in Europe and the USA were identified by personal communications with site investigators. We obtained participant-level RCB results, and data on clinical and pathological stage, tumour subtype and grade, and treatment and follow-up in November, 2019, from patients (aged ≥18 years) with primary stage I-III breast cancer treated with neoadjuvant chemotherapy followed by surgery. We assessed the association between the continuous RCB score and the primary study outcome, event-free survival, using mixed-effects Cox models with the incorporation of random RCB and cohort effects to account for between-study heterogeneity, and stratification to account for differences in baseline hazard across cancer subtypes defined by hormone receptor status and HER2 status. The association was further evaluated within each breast cancer subtype in multivariable analyses incorporating random RCB and cohort effects and adjustments for age and pretreatment clinical T category, nodal status, and tumour grade. Kaplan-Meier estimates of event-free survival at 3, 5, and 10 years were computed for each RCB class within each subtype.<h4>Findings</h4>We analysed participant-level data from 5161 patients treated with neoadjuvant chemotherapy between Sept 12, 1994, and Feb 11, 2019. Median age was 49 years (IQR 20-80). 1164 event-free survival events occurred during follow-up (median follow-up 56 months [IQR 0-186]). 
RCB score was prognostic within each breast cancer subtype, with higher RCB score significantly associated with worse event-free survival. The univariable hazard ratio (HR) associated with one unit increase in RCB ranged from 1·55 (95% CI 1·41-1·71) for hormone receptor-positive, HER2-negative patients to 2·16 (1·79-2·61) for the hormone receptor-negative, HER2-positive group (with or without HER2-targeted therapy; p<0·0001 for all subtypes). RCB score remained prognostic for event-free survival in multivariable models adjusted for age, grade, T category, and nodal status at baseline: the adjusted HR ranged from 1·52 (1·36-1·69) in the hormone receptor-positive, HER2-negative group to 2·09 (1·73-2·53) in the hormone receptor-negative, HER2-positive group (p<0·0001 for all subtypes).<h4>Interpretation</h4>RCB score and class were independently prognostic in all subtypes of breast cancer, and generalisable to multiple practice settings. Although variability in hormone receptor subtype definitions and treatment across patients are likely to affect prognostic performance, the association we observed between RCB and a patient's residual risk suggests that prospective evaluation of RCB could be considered to become part of standard pathology reporting after neoadjuvant therapy.<h4>Funding</h4>National Cancer Institute at the US National Institutes of Health.
The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) is an international pan-cancer registry with the goal of informing cancer research and clinical care worldwide. Founded in late 2015, the milestone GENIE 9.1-public release contains data from >110,000 tumors from >100,000 people treated at 19 cancer centers from the United States, Canada, the United Kingdom, France, the Netherlands, and Spain. Here, we demonstrate the use of these real-world data, harmonized through a centralized data resource, to accurately predict enrollment on genome-guided trials, discover driver alterations in rare tumors, and identify cancer types without actionable mutations that could benefit from comprehensive genomic analysis. The extensible data infrastructure and governance framework support additional deep patient phenotyping through biopharmaceutical collaborations and expansion to include new data types such as cell-free DNA sequencing. AACR Project GENIE continues to serve as a global precision medicine knowledge base of increasing impact to inform clinical decision-making and bring together cancer researchers internationally.<h4>Significance</h4>AACR Project GENIE has now accrued data from >110,000 tumors, placing it among the largest repositories of publicly available, clinically annotated genomic data in the world. GENIE has emerged as a powerful resource to evaluate genome-guided clinical trial design, uncover drivers of cancer subtypes, and inform real-world use of genomic data. This article is highlighted in the In This Issue feature, p. 2007.
Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgr.ki.se/Pfam/ and in the US at http://pfam.wustl.edu/. The latest version (4.3) of Pfam contains 1815 families. These Pfam families match 63% of proteins in SWISS-PROT 37 and TrEMBL 9. For complete genomes Pfam currently matches up to half of the proteins. Genomic DNA can be directly searched against the Pfam library using the Wise2 package.
Analysis of circulating tumor DNA (ctDNA) to monitor cancer dynamics and detect minimal residual disease has been an area of increasing interest. Multiple methods have been proposed but few studies have compared the performance of different approaches. Here, we compare detection of ctDNA in serial plasma samples from patients with breast cancer using different tumor-informed and tumor-naïve assays designed to detect structural variants (SVs), single nucleotide variants (SNVs), and/or somatic copy-number aberrations, by multiplex PCR, hybrid capture, and different depths of whole-genome sequencing. Our results demonstrate that the ctDNA dynamics and allele fractions (AFs) were highly concordant when analyzing the same patient samples using different assays. Tumor-informed assays showed the highest sensitivity for detection of ctDNA at low concentrations. Hybrid capture sequencing targeting between 1,347 and 7,491 tumor-identified mutations at high depth was the most sensitive assay, detecting ctDNA down to an AF of 0.00024% (2.4 parts per million, ppm). Multiplex PCR targeting 21-47 tumor-identified SVs per patient detected ctDNA down to 0.00047% AF (4.7 ppm) and has potential as a clinical assay.
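The allele fractions above are quoted both as percentages and as parts per million. As a small worked check of that unit conversion, the sketch below converts a percentage AF to ppm and to the approximate number of molecules per mutant copy. The function names are hypothetical and chosen for illustration only.

```python
def percent_to_ppm(af_percent: float) -> float:
    """Convert an allele fraction in percent to parts per million (1% = 10,000 ppm)."""
    return af_percent * 10_000.0


def molecules_per_mutant(af_percent: float) -> float:
    """Average number of molecules at a locus that contain one mutant copy."""
    return 1.0 / (af_percent / 100.0)


if __name__ == "__main__":
    for af in (0.00024, 0.00047):
        print(f"AF {af}% = {percent_to_ppm(af):.1f} ppm, "
              f"about 1 mutant in {molecules_per_mutant(af):,.0f} molecules")
```

At these fractions a single locus would rarely yield a mutant molecule from a typical plasma sample, which is consistent with the reported sensitivity coming from assays that target many tumor-identified variants per patient.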
Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgb.ki.se/Pfam/, in France at http://pfam.jouy.inra.fr/ and in the US at http://pfam.wustl.edu/. The latest version (6.6) of Pfam contains 3071 families, which match 69% of proteins in SWISS-PROT 39 and TrEMBL 14. Structural data, where available, have been utilised to ensure that Pfam families correspond with structural domains, and to improve domain-based annotation. Predictions of non-domain regions are now also included. In addition to secondary structure, Pfam multiple sequence alignments now contain active site residue mark-up. New search tools, including taxonomy search and domain query, greatly add to the functionality and usability of the Pfam resource.
B cells and T cells are important components of the adaptive immune system and mediate anticancer immunity. The T cell landscape in cancer is well characterized, but the contribution of B cells to anticancer immunosurveillance is less well explored. Here we show an integrative analysis of the B cell and T cell receptor repertoire from individuals with metastatic breast cancer and individuals with early breast cancer during neoadjuvant therapy. Using immune receptor, RNA and whole-exome sequencing, we show that both B cell and T cell responses seem to coevolve with the metastatic cancer genomes and mirror tumor mutational and neoantigen architecture. B cell clones associated with metastatic immunosurveillance and temporal persistence were more expanded and distinct from site-specific clones. B cell clonal immunosurveillance and temporal persistence are predictable from the clonal structure, with higher-centrality B cell antigen receptors more likely to be detected across multiple metastases or across time. This predictability was generalizable across other immune-mediated disorders. This work lays a foundation for prioritizing antibody sequences for therapeutic targeting in cancer.
Advances in artificial intelligence have paved the way for leveraging hematoxylin and eosin-stained tumor slides for precision oncology. We present ENLIGHT-DeepPT, an indirect two-step approach consisting of (1) DeepPT, a deep-learning framework that predicts genome-wide tumor mRNA expression from slides, and (2) ENLIGHT, which predicts response to targeted and immune therapies from the inferred expression values. We show that DeepPT successfully predicts transcriptomics in all 16 The Cancer Genome Atlas cohorts tested and generalizes well to two independent datasets. ENLIGHT-DeepPT successfully predicts true responders in five independent patient cohorts involving four different treatments spanning six cancer types, with an overall odds ratio of 2.28 and a 39.5% increased response rate among predicted responders versus the baseline rate. Notably, its prediction accuracy, obtained without any training on the treatment data, is comparable to that achieved by directly predicting the response from the images, which requires specific training on the treatment evaluation cohorts.
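For readers unfamiliar with the two summary statistics quoted above, the following sketch shows how an odds ratio and a relative increase in response rate can be computed from a 2x2 table of predicted versus observed response. The counts are made up for illustration and are not from the study, and reading the "increased response rate" as a relative increase over the baseline rate is an assumption.

```python
def odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Odds of response among predicted responders divided by odds among the rest.

    tp: predicted responders who responded
    fp: predicted responders who did not respond
    fn: predicted non-responders who responded
    tn: predicted non-responders who did not respond
    """
    return (tp * tn) / (fp * fn)


def relative_response_rate_increase(tp: int, fp: int, fn: int, tn: int) -> float:
    """Relative increase of the response rate among predicted responders over baseline."""
    baseline_rate = (tp + fn) / (tp + fp + fn + tn)
    predicted_responder_rate = tp / (tp + fp)
    return predicted_responder_rate / baseline_rate - 1.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    tp, fp, fn, tn = 30, 20, 20, 30
    print(f"odds ratio = {odds_ratio(tp, fp, fn, tn):.2f}")
    print(f"relative increase = {relative_response_rate_increase(tp, fp, fn, tn):.1%}")
```

An odds ratio above 1 indicates that predicted responders respond more often than predicted non-responders, while the relative increase compares predicted responders with the whole cohort.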