Dr Luis Zapata Ortiz
Group Leader: Evolutionary Immunogenomics
Biography
Dr Luis Zapata Ortiz leads the Evolutionary Immunogenomics Group at the ICR. His work has focused on developing immunogenomic-based metrics to predict response to immune checkpoint inhibitor therapies. His groundbreaking research, including publications in esteemed journals such as Nature Genetics and Genome Biology, has shed light on the clinical impact of immunoediting and the strength of immune selection.
His academic journey began with a major in Biotechnology Engineering at the University of Chile and a short stay at the University of California, Davis, followed by a Ph.D. in Biomedicine at the Centre for Genomic Regulation in Barcelona. As a Marie Curie Postdoctoral Fellow at the ICR, Luis made significant contributions to understanding how evolution shapes the genome of cells within our bodies, particularly focusing on the impact of the immune system on genetic variability.
He possesses a diverse skill set, including expertise in mathematical modelling, evolutionary theory, population genetics, and proficiency in computational biology. Dr Zapata Ortiz excels in implementing statistical and machine learning algorithms and has extensive experience analysing large datasets using high-performance computing clusters.
Furthermore, Dr Zapata Ortiz has made significant contributions to the understanding of genetic variation in cancer genomes and healthy somatic tissues. His studies have demonstrated negative selection forces against alterations in essential genes and regions exposed to the immune system in the cancer genome. He also explored signals of selection on healthy somatic tissues and developed a mathematical model of dN/dS using stochastic branching processes.
Dr Zapata Ortiz's multidisciplinary expertise, ranging from mathematical modelling and population genetics to computational analysis and genomics, positions him as a leading expert in the field. His contributions have advanced our understanding of evolution's impact on the genome and its implications for disease.
Outside the lab, he enjoys time with his family, swimming, playing football and tennis with friends and competing in poker tournaments.
Journal articles
In cancer, evolutionary forces select for clones that evade the immune system. Here we analyzed >10,000 primary tumors and 356 immune-checkpoint-treated metastases using immune dN/dS, the ratio of nonsynonymous to synonymous mutations in the immunopeptidome, to measure immune selection in cohorts and individuals. We classified tumors as immune edited when antigenic mutations were removed by negative selection and immune escaped when antigenicity was covered up by aberrant immune modulation. Only in immune-edited tumors was immune predation linked to CD8 T cell infiltration. Immune-escaped metastases experienced the best response to immunotherapy, whereas immune-edited patients did not benefit, suggesting a preexisting resistance mechanism. Similarly, in a longitudinal cohort, nivolumab treatment removes neoantigens exclusively in the immunopeptidome of nonimmune-edited patients, the group with the best overall survival response. Our work uses dN/dS to differentiate between immune-edited and immune-escaped tumors, measuring potential antigenicity and ultimately helping predict response to treatment.
Remarkable progress in molecular analyses has improved our understanding of the evolution of cancer cells toward immune escape<sup>1-5</sup>. However, the spatial configurations of immune and stromal cells, which may shed light on the evolution of immune escape across tumor geographical locations, remain unaddressed. We integrated multiregion exome and RNA-sequencing (RNA-seq) data with spatial histology mapped by deep learning in 100 patients with non-small cell lung cancer from the TRACERx cohort<sup>6</sup>. Cancer subclones derived from immune cold regions were more closely related in mutation space, diversifying more recently than subclones from immune hot regions. In TRACERx and in an independent multisample cohort of 970 patients with lung adenocarcinoma, tumors with more than one immune cold region had a higher risk of relapse, independently of tumor size, stage and number of samples per patient. In lung adenocarcinoma, but not lung squamous cell carcinoma, geometrical irregularity and complexity of the cancer-stromal cell interface significantly increased in tumor regions without disruption of antigen presentation. Decreased lymphocyte accumulation in adjacent stroma was observed in tumors with low clonal neoantigen burden. Collectively, immune geospatial variability elucidates tumor ecological constraints that may shape the emergence of immune-evading subclones and aggressive clinical phenotypes.
Colorectal malignancies are a leading cause of cancer-related death<sup>1</sup> and have undergone extensive genomic study<sup>2,3</sup>. However, DNA mutations alone do not fully explain malignant transformation<sup>4-7</sup>. Here we investigate the co-evolution of the genome and epigenome of colorectal tumours at single-clone resolution using spatial multi-omic profiling of individual glands. We collected 1,370 samples from 30 primary cancers and 8 concomitant adenomas and generated 1,207 chromatin accessibility profiles, 527 whole genomes and 297 whole transcriptomes. We found positive selection for DNA mutations in chromatin modifier genes and recurrent somatic chromatin accessibility alterations, including in regulatory regions of cancer driver genes that were otherwise devoid of genetic mutations. Genome-wide alterations in accessibility for transcription factor binding involved CTCF, downregulation of interferon and increased accessibility for SOX and HOX transcription factor families, suggesting the involvement of developmental genes during tumourigenesis. Somatic chromatin accessibility alterations were heritable and distinguished adenomas from cancers. Mutational signature analysis showed that the epigenome in turn influences the accumulation of DNA mutations. This study provides a map of genetic and epigenetic tumour heterogeneity, with fundamental implications for understanding colorectal cancer biology.
The distribution of fitness effects (DFE) defines how new mutations spread through an evolving population. The ratio of non-synonymous to synonymous mutations (dN/dS) has become a popular method to detect selection in somatic cells. However, the link between dN/dS values and fitness coefficients in somatic evolution is missing. Here we present a quantitative model of somatic evolutionary dynamics that determines the selective coefficients of individual driver mutations from dN/dS estimates. We then measure the DFE for somatic mutant clones in ostensibly normal oesophagus and skin. We reveal a broad distribution of fitness effects, with the largest fitness increases found for TP53 and NOTCH1 mutants (proliferative bias 1-5%). This study provides the theoretical link between dN/dS values and selective coefficients in somatic evolution, and measures the DFE of mutations in human tissues.
Background: Immunotherapy with immune checkpoint inhibitors (ICIs) is highly effective in microsatellite instability-high (MSI-H) metastatic colorectal cancer (mCRC); however, specific predictive biomarkers are lacking. Patients and methods: Data and samples from 85 patients with MSI-H mCRC treated with ICIs were gathered. Tumor infiltrating lymphocytes (TILs) and tumor mutational burden (TMB) were analyzed in an exploratory cohort of "super" responders and "clearly" refractory patients; TILs were then evaluated in the whole cohort of patients. Primary objectives were the correlation between the number of TILs and TMB and their role as biomarkers of ICI efficacy. Main endpoints included response rate (RR), progression-free survival (PFS), and overall survival (OS). Results: In the exploratory cohort, an increasing number of TILs correlated to higher TMB (Pearson's test, p = .0429). In the whole cohort, median number of TILs was 3.6 in responders compared with 1.8 in nonresponders (Mann-Whitney test, p = .0448). RR was 70.6% in patients with high number of TILs (TILs-H) compared with 42.9% in patients with low number of TILs (odds ratio = 3.20, p = .0291). Survival outcomes differed significantly in favor of TILs-H (PFS: hazard ratio [HR] = 0.42, p = .0278; OS: HR = 0.41, p = .0463). Conclusion: A significant correlation between higher TMB and increased number of TILs was shown. A significantly higher activity and better PFS and OS with ICI in MSI-H mCRC were reported in cases with high number of TILs, thus supporting further studies of TIL count as predictive biomarker of ICI efficacy. Implications for practice: Microsatellite instability is the result of mismatch repair protein deficiency, caused by germline mutations or somatic modifications in mismatch repair genes. In metastatic colorectal cancer (mCRC), immunotherapy (with immune checkpoint inhibitors [ICIs]) demonstrated remarkable clinical benefit in microsatellite instability-high (MSI-H) patients. ICI primary resistance has been observed in approximately 25% of patients with MSI-H mCRC, underlining the need for predictive biomarkers. In this study, tumor mutational burden (TMB) and tumor infiltrating lymphocyte (TIL) analyses were performed in an exploratory cohort of patients with MSI-H mCRC treated with ICIs, demonstrating a significant correlation between higher TMB and increased number of TILs. Results also demonstrated a significant correlation between high number of TILs and clinical responses and survival benefit in a large data set of patients with MSI-H mCRC treated with ICI. TMB and TILs could represent predictive biomarkers of ICI efficacy in MSI-H mCRC and should be incorporated in future trials testing checkpoint inhibitors in colorectal cancer.
Genetic and epigenetic variation, together with transcriptional plasticity, contribute to intratumour heterogeneity<sup>1</sup>. The interplay of these biological processes and their respective contributions to tumour evolution remain unknown. Here we show that intratumour genetic ancestry only infrequently affects gene expression traits and subclonal evolution in colorectal cancer (CRC). Using spatially resolved paired whole-genome and transcriptome sequencing, we find that the majority of intratumour variation in gene expression is not strongly heritable but rather 'plastic'. Somatic expression quantitative trait loci analysis identified a number of putative genetic controls of expression by cis-acting coding and non-coding mutations, the majority of which were clonal within a tumour, alongside frequent structural alterations. Consistently, computational inference on the spatial patterning of tumour phylogenies finds that a considerable proportion of CRCs did not show evidence of subclonal selection, with only a subset of putative genetic drivers associated with subclone expansions. Spatial intermixing of clones is common, with some tumours growing exponentially and others only at the periphery. Together, our data suggest that most genetic intratumour variation in CRC has no major phenotypic consequence and that transcriptional plasticity is, instead, widespread within a tumour.
Most cancer genomic data are generated from bulk samples composed of mixtures of cancer subpopulations, as well as normal cells. Subclonal reconstruction methods based on machine learning aim to separate those subpopulations in a sample and infer their evolutionary history. However, current approaches are entirely data driven and agnostic to evolutionary theory. We demonstrate that systematic errors occur in the analysis if evolution is not accounted for, and this is exacerbated with multi-sampling of the same tumor. We present a novel approach for model-based tumor subclonal reconstruction, called MOBSTER, which combines machine learning with theoretical population genetics. Using public whole-genome sequencing data from 2,606 samples from different cohorts, new data and synthetic validation, we show that this method is more robust and accurate than current techniques in single-sample, multiregion and longitudinal data. This approach minimizes the confounding factors of nonevolutionary methods, thus leading to more accurate recovery of the evolutionary history of human cancers.
Circulating tumour DNA (ctDNA) allows tracking of the evolution of human cancers at high resolution, overcoming many limitations of tissue biopsies. However, exploiting ctDNA to determine how a patient's cancer is evolving in order to aid clinical decisions remains difficult. This is because ctDNA is a mix of fragmented alleles, and the contribution of different cancer deposits to ctDNA is largely unknown. Profiling ctDNA almost invariably requires prior knowledge of what genomic alterations to track. Here, we leverage a rapid autopsy programme to demonstrate that unbiased genomic characterisation of several metastatic sites and concomitant ctDNA profiling at whole-genome resolution reveals the extent to which ctDNA is representative of widespread disease. We also present a methylation profiling method that allows tracking evolutionary changes in ctDNA at single-molecule resolution without prior knowledge. These results have critical implications for the use of liquid biopsies to monitor cancer evolution in humans and guide treatment.
Cancers accumulate mutations that lead to neoantigens, novel peptides that elicit an immune response, and consequently undergo evolutionary selection. Here we establish how negative selection shapes the clonality of neoantigens in a growing cancer by constructing a mathematical model of neoantigen evolution. The model predicts that, without immune escape, tumor neoantigens are either clonal or at low frequency; hypermutated tumors can only establish after the evolution of immune escape. Moreover, the site frequency spectrum of somatic variants under negative selection appears more neutral as the strength of negative selection increases, which is consistent with classical neutral theory. These predictions are corroborated by the analysis of neoantigen frequencies and immune escape in exome and RNA sequencing data from 879 colon, stomach and endometrial cancers.
Single-cell RNA sequencing studies on gene co-expression patterns could yield important regulatory and functional insights, but have so far been limited by the confounding effects of differentiation and cell cycle. We apply a tailored experimental design that eliminates these confounders, and report thousands of intrinsically covarying gene pairs in mouse embryonic stem cells. These covariations form a network with biological properties, outlining known and novel gene interactions. We provide the first evidence that miRNAs naturally induce transcriptome-wide covariations and compare the relative importance of nuclear organization, transcriptional and post-transcriptional regulation in defining covariations. We find that nuclear organization has the greatest impact, and that genes encoding for physically interacting proteins specifically tend to covary, suggesting importance for protein complex formation. Our results lend support to the concept of post-transcriptional RNA operons, but we further present evidence that nuclear proximity of genes may provide substantial functional regulation in mammalian single cells.
Background: Mosaic mutations acquired during early embryogenesis can lead to severe early-onset genetic disorders and cancer predisposition, but are often undetectable in blood samples. The rate and mutational spectrum of embryonic mosaic mutations (EMMs) have only been studied in few tissues, and their contribution to genetic disorders is unknown. Therefore, we investigated how frequently mosaic mutations occur during embryogenesis across all germ layers and tissues. Methods: Mosaic mutation detection in 49 normal tissues from 570 individuals (Genotype-Tissue Expression (GTEx) cohort) was performed using a newly developed multi-tissue, multi-individual variant calling approach for RNA-seq data. Our method allows for reliable identification of EMMs and the developmental stage during which they appeared. Results: The analysis of EMMs in 570 individuals revealed that newborns on average harbor 0.5-1 EMMs in the exome affecting multiple organs (1.3230 × 10<sup>-8</sup> per nucleotide per individual), a similar frequency as reported for germline de novo mutations. Our multi-tissue, multi-individual study design allowed us to distinguish mosaic mutations acquired during different stages of embryogenesis and adult life, as well as to provide insights into the rate and spectrum of mosaic mutations. We observed that EMMs are dominated by a mutational signature associated with spontaneous deamination of methylated cytosines and the number of cell divisions. After birth, cells continue to accumulate somatic mutations, which can lead to the development of cancer. Investigation of the mutational spectrum of the gastrointestinal tract revealed a mutational pattern associated with the food-borne carcinogen aflatoxin, a signature that has so far only been reported in liver cancer. Conclusions: In summary, our multi-tissue, multi-individual study reveals a surprisingly high number of embryonic mosaic mutations in coding regions, implying novel hypotheses and diagnostic procedures for investigating genetic causes of disease and cancer predisposition.
In recent years, next-generation sequencing (NGS) has become a cornerstone of clinical genetics and diagnostics. Many clinical applications require high precision, especially if rare events such as somatic mutations in cancer or genetic variants causing rare diseases need to be identified. Although random sequencing errors can be modeled statistically and deep sequencing minimizes their impact, systematic errors remain a problem even at high depth of coverage. Understanding their source is crucial to increase precision of clinical NGS applications. In this work, we studied the relation between recurrent biases in allele balance (AB), systematic errors, and false positive variant calls across a large cohort of human samples analyzed by whole exome sequencing (WES). We have modeled the AB distribution for biallelic genotypes in 987 WES samples in order to identify positions recurrently deviating significantly from the expectation, a phenomenon we termed allele balance bias (ABB). Furthermore, we have developed a genotype callability score based on ABB for all positions of the human exome, which detects false positive variant calls that passed state-of-the-art filters. Finally, we demonstrate the use of ABB for detection of false associations proposed by rare variant association studies. Availability: https://github.com/Francesc-Muyas/ABB.
Colorectal adenomas are common precancerous lesions with the potential for malignant transformation to colorectal adenocarcinoma. Endoscopic polypectomy provides an opportunity for cancer prevention; however, recurrence rates are high. We collected formalin-fixed paraffin-embedded tissue of 15 primary adenomas with recurrence, 15 adenomas without recurrence, and 14 matched pair samples (primary adenoma and the corresponding recurrent adenoma). The samples were analysed by array-comparative genomic hybridisation (aCGH) and single-cell multiplex interphase fluorescence in situ hybridisation (miFISH) to understand clonal evolution, to examine the dynamics of copy number alterations (CNAs) and to identify molecular markers for recurrence prediction. The miFISH probe panel consisted of 14 colorectal carcinogenesis-relevant genes (COX2, PIK3CA, APC, CLIC1, EGFR, MYC, CCND1, CDX2, CDH1, TP53, HER2, SMAD7, SMAD4 and ZNF217), and a centromere probe (CEP10). The aCGH analysis confirmed the genetic landscape typical for colorectal tumorigenesis, that is, CNAs of chromosomes 7, 13q, 18 and 20q. Focal aberrations (≤10 Mbp) were mapped to chromosome bands 6p22.1-p21.33 (33.3%), 7q22.1 (31.4%) and 16q21 (29.4%). MiFISH detected gains of EGFR (23.6%), CDX2 (21.8%) and ZNF217 (18.2%). Most adenomas exhibited a major clone population which was accompanied by multiple smaller clone populations. Gains of CDX2 were exclusively seen in primary adenomas with recurrence (25%) compared to primary adenomas without recurrence (0%). Generation of phylogenetic trees for matched pair samples revealed four distinct patterns of clonal dynamics. In conclusion, adenoma development and recurrence are complex genetic processes driven by multiple CNAs whose evaluations by miFISH, with emphasis on CDX2, might serve as a predictor of recurrence.
Background: Natural selection shapes cancer genomes. Previous studies used signatures of positive selection to identify genes driving malignant transformation. However, the contribution of negative selection against somatic mutations that affect essential tumor functions or specific domains remains a controversial topic. Results: Here, we analyze 7546 individual exomes from 26 tumor types from TCGA data to explore the portion of the cancer exome under negative selection. Although we find most of the genes neutrally evolving in a pan-cancer framework, we identify essential cancer genes and immune-exposed protein regions under significant negative selection. Moreover, our simulations suggest that the amount of negative selection is underestimated. We therefore choose an empirical approach to identify genes, functions, and protein regions under negative selection. We find that expression and mutation status of negatively selected genes is indicative of patient survival. Processes that are most strongly conserved are those that play fundamental cellular roles such as protein synthesis, glucose metabolism, and molecular transport. Intriguingly, we observe strong signals of selection in the immunopeptidome and proteins controlling peptide exposition, highlighting the importance of immune surveillance evasion. Additionally, tumor type-specific immune activity correlates with the strength of negative selection on human epitopes. Conclusions: In summary, our results show that negative selection is a hallmark of cell essentiality and immune response in cancer. The functional domains identified could be exploited therapeutically, ultimately allowing for the development of novel cancer treatments.
Tumors are composed of an evolving population of cells subjected to tissue-specific selection, which fuels tumor heterogeneity and ultimately complicates cancer driver gene identification. Here, we integrate cancer cell fraction, population recurrence, and functional impact of somatic mutations as signatures of selection into a Bayesian model for driver prediction. We demonstrate that our model, cDriver, outperforms competing methods when analyzing solid tumors, hematological malignancies, and pan-cancer datasets. Applying cDriver to exome sequencing data of 21 cancer types from 6,870 individuals revealed 98 unreported tumor type-driver gene connections. These novel connections are highly enriched for chromatin-modifying proteins, hinting at a universal role of chromatin regulation in cancer etiology. Although infrequently mutated as single genes, we show that chromatin modifiers are altered in a large fraction of cancer patients. In summary, we demonstrate that integration of evolutionary signatures is key for identifying mutational driver genes, thereby facilitating the discovery of novel therapeutic targets for cancer treatment.
Sézary syndrome is a leukemic form of cutaneous T-cell lymphoma with an aggressive clinical course. The genetic etiology of the disease is poorly understood, with chromosomal abnormalities and mutations in some genes being involved in the disease. The goal of our study was to understand the genetic basis of the disease by looking for driver gene mutations and fusion genes in 15 erythrodermic patients with circulating Sézary cells, 14 of them fulfilling the diagnostic criteria of Sézary syndrome. We have discovered genes that could be involved in the pathogenesis of Sézary syndrome. Some of the genes that are affected by somatic point mutations include ITPR1, ITPR2, DSC1, RIPK2, IL6, and RAG2, with some of them mutated in more than one patient. We observed several somatic copy number variations shared between patients, including deletions and duplications of large segments of chromosome 17. Genes with potential function in the T-cell receptor signaling pathway and tumorigenesis were disrupted in Sézary syndrome patients, for example, CBLB, RASA2, BCL7C, RAMP3, TBRG4, and DAD1. Furthermore, we discovered several fusion events of interest involving RASA2, NFKB2, BCR, FASN, ZEB1, TYK2, and SGMS1. Our work has implications for the development of potential therapeutic approaches for this aggressive disease.
Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Besides the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation, which will be revealed once complete assemblies replace resequencing or other reference-dependent methods.
Objectives: Here we report on a long-term outbreak from 2009 to 2012 with an XDR Pseudomonas aeruginosa on two wards at a university hospital in southern Germany. Methods: Whole-genome sequencing was performed on the outbreak isolates and a core genome was constructed for molecular epidemiological analysis. We applied a time-place-sequence algorithm to improve estimation of transmission probabilities. Results: By using conventional infection control methods we identified 49 P. aeruginosa strains, including eight environmental isolates that belonged to ST308 (by MLST) and carried the metallo-β-lactamase IMP-8. Phylogenetic analysis on the basis of a non-recombinant core genome that contained 22 outbreak-specific SNPs revealed a pattern of four dominant clades with a strong phylogeographic structure and allowed us to determine the potential temporal origin of the outbreak to July 2008, 1 year before the index case was diagnosed. Superspreaders at the root of clades exhibited a high number of probable and predicted transmissions, indicating their exceptional position in the outbreak. Conclusions: Our results suggest that the initial expansion of dominant sublineages was driven by a few superspreaders, while environmental contamination seemed to sustain the outbreak for a long period despite regular environmental control measures.
Knowledge of the exact distribution of meiotic crossovers (COs) and gene conversions (GCs) is essential for understanding many aspects of population genetics and evolution, from haplotype structure and long-distance genetic linkage to the generation of new allelic variants of genes. To this end, we resequenced the four products of 13 meiotic tetrads along with 10 doubled haploids derived from Arabidopsis thaliana hybrids. GC detection through short reads has previously been confounded by genomic rearrangements. Rigid filtering for misaligned reads allowed GC identification at high accuracy and revealed an ∼80-kb transposition, which undergoes copy-number changes mediated by meiotic recombination. Non-crossover associated GCs were extremely rare most likely due to their short average length of ∼25-50 bp, which is significantly shorter than the length of CO-associated GCs. Overall, recombination preferentially targeted non-methylated nucleosome-free regions at gene promoters, which showed significant enrichment of two sequence motifs. DOI: http://dx.doi.org/10.7554/eLife.01426.001.
SalmonDB is a new multiorganism database containing EST sequences from Salmo salar and Oncorhynchus mykiss and the whole genome sequences of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from the GMOD project, the GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, ortholog prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make the SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/
Background: Transposable elements are major players in genome evolution. Transposon insertion polymorphisms can translate into phenotypic differences in plants and animals and are linked to different diseases including human cancer, making their characterization highly relevant to the study of genome evolution and genetic diseases. Results: Here we present Jitterbug, a novel tool that identifies transposable element insertion sites at single-nucleotide resolution based on the paired-end mapping and clipped-read signatures produced by NGS alignments. Jitterbug can be easily integrated into existing NGS analysis pipelines, using the standard BAM format produced by frequently applied alignment tools (e.g. bwa, bowtie2), with no need to realign reads to a set of consensus transposon sequences. Jitterbug is highly sensitive and able to recall transposon insertions with a very high specificity, as demonstrated by benchmarks in the human and Arabidopsis genomes, and validation using long PacBio reads. In addition, Jitterbug estimates the zygosity of transposon insertions with high accuracy and can also identify somatic insertions. Conclusions: We demonstrate that Jitterbug can identify mosaic somatic transposon movement using sequenced tumor-normal sample pairs and allows for estimating the cancer cell fraction of clones containing a somatic TE insertion. We suggest that the independent methods we use to evaluate performance are a step towards creating a gold standard dataset for benchmarking structural variant prediction tools.
Despite the concern of within-tumor genetic diversity, this diversity is in fact limited by the kinship among cells in the tumor. Indeed, genomic studies have amply supported the 'Nowell dogma' whereby cells of the same tumor descend from a single progenitor cell. In parallel, genomic data also suggest that the diversity could be >10-fold larger if tumor cells are of multiple origins. We develop an evolutionary hypothesis that a single tumor may often harbor multiple cell clones of independent origins, but only one would be large enough to be detected. To test the hypothesis, we search for independent tumors within a larger one (or tumors-in-tumor). Very high density sampling was done on two cases of colon tumors. Case 1 indeed has 13 independent clones of disparate sizes, many having heavy mutation burdens and potentially highly tumorigenic. In Case 2, despite a very intensive search, only two small independent clones could be found. The two cases show very similar movements and metastasis of the dominant clone. Cells initially move actively in the expanding tumor but become nearly immobile in late stages. In conclusion, tumors-in-tumor are plausible but could be very demanding to find. Despite their small sizes, they can enhance the within-tumor diversity by orders of magnitude. Such increases may contribute to the missing genetic diversity associated with the resistance to cancer therapy.
<h4>Background</h4>Carcinogenesis is driven by interactions between genetic mutations and the local tumor microenvironment. Recent research has identified hundreds of cancer driver genes; however, these studies often include a mixture of different molecular subtypes and ecological niches and ignore the impact of the immune system.<h4>Results</h4>In this study, we compare the landscape of driver genes in tumors that escaped the immune system (escape +) versus those that did not (escape -). We analyze 9896 primary tumors from The Cancer Genome Atlas using the ratio of non-synonymous to synonymous mutations (dN/dS) and find 85 driver genes, including 27 and 16 novel genes, in escape - and escape + tumors, respectively. The dN/dS of driver genes in immune escaped tumors is significantly lower and closer to neutrality than in non-escaped tumors, suggesting selection buffering in driver genes fueled by immune escape. Additionally, we find that immune evasion leads to more mutated sites, a diverse array of mutational signatures and is linked to tumor prognosis.<h4>Conclusions</h4>Our findings highlight the need for improved patient stratification to identify new therapeutic targets for cancer treatment.
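As a rough illustration of the dN/dS ratio mentioned in the abstract above, the sketch below divides the per-site rate of non-synonymous mutations by the per-site rate of synonymous mutations. It is a minimal textbook-style calculation, not the method used in the study: the function name `dnds`, its parameters, and the example counts are all hypothetical, and real analyses typically also correct for mutational context and sequence composition. A value near 1 is conventionally read as neutrality, above 1 as positive selection, and below 1 as negative selection.

```python
def dnds(n_nonsyn: int, n_syn: int, nonsyn_sites: float, syn_sites: float) -> float:
    """Naive dN/dS: per-site non-synonymous rate over per-site synonymous rate.

    All arguments are hypothetical inputs for illustration:
      n_nonsyn      observed non-synonymous mutations
      n_syn         observed synonymous mutations
      nonsyn_sites  number of sites where a mutation would be non-synonymous
      syn_sites     number of sites where a mutation would be synonymous
    """
    if n_syn <= 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("n_syn, nonsyn_sites and syn_sites must be positive")
    dn = n_nonsyn / nonsyn_sites  # non-synonymous mutations per available site
    ds = n_syn / syn_sites        # synonymous mutations per available site
    return dn / ds


# Made-up counts: three times as many non-synonymous mutations, but also
# three times as many non-synonymous sites, so the ratio sits at neutrality.
neutral = dnds(n_nonsyn=30, n_syn=10, nonsyn_sites=3000, syn_sites=1000)

# Made-up counts with an excess of non-synonymous mutations per site.
positive = dnds(n_nonsyn=60, n_syn=10, nonsyn_sites=3000, syn_sites=1000)
```

Under these made-up inputs, `neutral` evaluates to a ratio of 1 and `positive` to a ratio of 2, which is how a shift "closer to neutrality" would show up numerically in this simplified form.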
Mismatch repair (MMR)-deficient cancer evolves through the stepwise erosion of coding homopolymers in target genes. Curiously, the MMR genes MutS homolog 6 (MSH6) and MutS homolog 3 (MSH3) also contain coding homopolymers, and these are frequent mutational targets in MMR-deficient cancers. The impact of incremental MMR mutations on MMR-deficient cancer evolution is unknown. Here we show that microsatellite instability modulates DNA repair by toggling hypermutable mononucleotide homopolymer runs in MSH6 and MSH3 through stochastic frameshift switching. Spontaneous mutation and reversion modulate subclonal mutation rate, mutation bias and HLA and neoantigen diversity. Patient-derived organoids corroborate these observations and show that MMR homopolymer sequences drift back into reading frame in the absence of immune selection, suggesting a fitness cost of elevated mutation rates. Combined experimental and simulation studies demonstrate that subclonal immune selection favors incremental MMR mutations. Overall, our data demonstrate that MMR-deficient colorectal cancers fuel intratumor heterogeneity by adapting subclonal mutation rate and diversity to immune selection.
Cancer evolution lays the groundwork for predictive oncology. Testing evolutionary metrics requires quantitative measurements in controlled clinical trials. We mapped genomic intratumor heterogeneity in locally advanced prostate cancer using 642 samples from 114 individuals enrolled in clinical trials with a 12-year median follow-up. We concomitantly assessed morphological heterogeneity using deep learning in 1,923 histological sections from 250 individuals. Genetic and morphological (Gleason) diversity were independent predictors of recurrence (hazard ratio (HR) = 3.12 and 95% confidence interval (95% CI) = 1.34-7.3; HR = 2.24 and 95% CI = 1.28-3.92). Combined, they identified a group with half the median time to recurrence. Spatial segregation of clones was also an independent marker of recurrence (HR = 2.3 and 95% CI = 1.11-4.8). We identified copy number changes associated with Gleason grade and found that chromosome 6p loss correlated with reduced immune infiltration. Matched profiling of relapse, decades after diagnosis, confirmed that genomic instability is a driving force in prostate cancer progression. This study shows that combining genomics with artificial intelligence-aided histopathology leads to the identification of clinical biomarkers of evolution.