Dr Norman Davey
Group Leader: Short Linear Motif
OrcID: 0000-0001-6988-4850
Phone: +44 20 3437 7662
Email: [email protected]
Also on: daveylab
Location: Chelsea
Biography
Dr Norman Davey received his PhD (2009) from the Conway Institute of Biomolecular & Biomedical Research at University College Dublin, Ireland, where he worked on short linear motif discovery methods.
He subsequently moved to the European Molecular Biology Laboratory (EMBL), Heidelberg, Germany, as an EIPOD postdoctoral fellow, working on various aspects of motif biology, including the prominent role of short linear motifs (SLiMs) in regulatory decision-making, splice isoform-specific functionality, and viral pathogenesis.
In 2013, he joined the Department of Physiology at the University of California, San Francisco (UCSF) as a postdoctoral fellow with Professor David O. Morgan characterizing novel motifs in the cell cycle. In September 2014, he returned to University College Dublin to start his own group studying motif function.
Dr Davey continues to use evolutionary, proteomic, and genomic data to examine two major open questions about intrinsically disordered regions: (i) which modules are responsible for their functionality, and (ii) how perturbations in the cell modulate the functionality of these modules.
Journal articles
After two decades of research, intrinsically disordered regions (IDRs) are established as a widespread phenomenon. The growing understanding of the significant functional role of IDRs has challenged the structure-function paradigm, proving irrefutably that a stably folded structure is not a strict requirement for function. Nonetheless, (un)structure-function relationships remain at the core of IDR-mediated interactions. An IDR can populate a continuum of structural conformations, from fully disordered to stable globular states. In these ensembles, only a subset of conformations is binding-competent, and intramolecular IDR contacts serve as important determinants of intermolecular binding. Here, we review our current understanding of different types of intramolecular IDR interactions, their effects on IDR complex formation and their modes of biological regulation.
The anaphase-promoting complex or cyclosome (APC/C) is a ubiquitin ligase that polyubiquitinates specific substrates at precise times in the cell cycle, thereby triggering the events of late mitosis in a strict order. The robust substrate specificity of the APC/C prevents the potentially deleterious degradation of non-APC/C substrates and also averts the cell-cycle errors and genomic instability that could result from mistimed degradation of APC/C targets. The APC/C recognizes short linear sequence motifs, or degrons, on its substrates. The specific and timely modification and degradation of APC/C substrates is likely to be modulated by variations in degron sequence and context. We discuss the extensive affinity, specificity, and selectivity determinants encoded in APC/C degrons, and we describe some of the extrinsic mechanisms that control APC/C-substrate recognition. As an archetype for protein motif-driven regulation of cell function, the APC/C-substrate interaction provides insights into the general properties of post-translational regulatory systems.
Low-throughput experiments and high-throughput proteomic and genomic analyses have created enormous quantities of data that can be used to explore protein function and evolution. The ability to consolidate these data into an informative and intuitive format is vital to our capacity to comprehend these distinct but complementary sources of information. However, existing tools to visualize protein-related data are restricted by their presentation, sources of information, functionality or accessibility. We introduce ProViz, a powerful browser-based tool to aid biologists in building hypotheses and designing experiments by simplifying the analysis of functional and evolutionary features of proteins. Feature information is retrieved in an automated manner from resources describing protein modular architecture, post-translational modification, structure, sequence variation and experimental characterization of functional regions. These features are mapped to evolutionary information from precomputed multiple sequence alignments. Data are displayed in an interactive and information-rich yet intuitive visualization, accessible through a simple protein search interface. This allows users with limited bioinformatic skills to rapidly access data pertinent to their research. Visualizations can be further customized with user-defined data either manually or using a REST API. ProViz is available at http://proviz.ucd.ie/.
A substantial portion of the regulatory interactions in the higher eukaryotic cell are mediated by simple sequence motifs in the regulatory segments of genes and (pre-)mRNAs, and in the intrinsically disordered regions of proteins. Although these regulatory modules are physicochemically distinct, they share an evolutionary plasticity that has facilitated a rapid growth of their use and resulted in their ubiquity in complex organisms. The ease of motif acquisition simplifies access to basal housekeeping functions, facilitates the co-regulation of multiple biomolecules allowing them to respond in a coordinated manner to changes in the cell state, and supports the integration of multiple signals for combinatorial decision-making. Consequently, motifs are indispensable for temporal, spatial, conditional and basal regulation at the transcriptional, post-transcriptional and post-translational levels. In this review, we highlight that many of the key regulatory pathways of the cell are recruited by motifs and that the ease of motif acquisition has resulted in large networks of co-regulated biomolecules. We discuss how co-operativity allows simple static motifs to perform the conditional regulation that underlies decision-making in higher eukaryotic biological systems. We observe that each gene and its products have a unique set of DNA, RNA or protein motifs that encode a regulatory program to define the logical circuitry that guides the life cycle of these biomolecules, from transcription to degradation. Finally, we contrast the regulatory properties of protein motifs and the regulatory elements of DNA and (pre-)mRNAs, advocating that co-regulation, co-operativity, and motif-driven regulatory programs are common mechanisms that emerge from the use of simple, evolutionarily plastic regulatory modules.
Short sequence motifs are ubiquitous across the three major types of biomolecules: hundreds of classes and thousands of instances of DNA regulatory elements, RNA motifs and protein short linear motifs (SLiMs) have been characterised. The increase in complexity of transcriptional, post-transcriptional and post-translational regulation in higher eukaryotes has coincided with a significant expansion of motif use. But how did the eukaryotic cell acquire such a vast repertoire of motifs? In this review, we curate the available literature on protein motif evolution and discuss the evidence that suggests SLiMs can be acquired by mutations, insertions and deletions in disordered regions. We propose a mechanism of ex nihilo SLiM evolution (the evolution of a novel SLiM from "nothing"), adding a functional module to a previously non-functional region of protein sequence. In our model, hundreds of motif-binding domains in higher eukaryotic proteins connect simple motif specificities with useful functions to create a large functional motif space. Accessible peptides that match the specificity of these motif-binding domains are continuously created and destroyed by mutations in rapidly evolving disordered regions, creating a dynamic supply of new interactions that may have advantageous phenotypic novelty. This provides a reservoir of diversity to modify existing interaction networks. Evolutionary pressures will act on these motifs to retain beneficial instances. However, most will be lost on an evolutionary timescale as negative selection and genetic drift act on deleterious and neutral motifs, respectively. In light of the parallels between the presented model and the evolution of motifs in the regulatory segments of genes and (pre-)mRNAs, we suggest our understanding of regulatory networks would benefit from the creation of a shared model describing the evolution of transcriptional, post-transcriptional and post-translational regulation.
Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, which argues for the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood that a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E.
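The idea of "relative" conservation, judging a residue against its neighbours rather than against an absolute cut-off, can be illustrated with a simple z-score. This is a swapped-in stand-in, not the SLiMPrints statistical framework: the per-residue conservation scores, the window size and the function name `relative_conservation` are all hypothetical.

```python
from statistics import mean, pstdev


def relative_conservation(scores, window=10):
    """Z-score of each residue's conservation relative to its flanking residues.

    scores: per-residue conservation values (hypothetical input, any scale).
    window: number of residues taken on each side as local background.
    Positions with fewer than two flanking values, or a flat background,
    get a z-score of 0.0.
    """
    z = []
    for i, s in enumerate(scores):
        # Local background: up to `window` residues on each side, excluding i.
        flank = scores[max(0, i - window):i] + scores[i + 1:i + 1 + window]
        if len(flank) < 2:
            z.append(0.0)
            continue
        sd = pstdev(flank)
        z.append((s - mean(flank)) / sd if sd > 0 else 0.0)
    return z


# A single residue that is far more conserved than its surroundings stands out.
z = relative_conservation([0.1, 0.2, 0.1, 0.9, 0.2, 0.1, 0.2], window=3)
print(z.index(max(z)))  # position of the most relatively conserved residue
```

Residues with high z-scores that cluster within a few positions of each other in a disordered region would be the kind of candidate grouping the abstract describes; deciding when such a grouping is significant is the part this sketch leaves out.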
Traditionally, protein-protein interactions were thought to be mediated by large, structured domains. However, it has become clear that the interactome comprises a wide range of binding interfaces with varying degrees of flexibility, ranging from rigid globular domains to disordered regions that natively lack structure. The enrichment of disorder in highly connected hub proteins and its correlation with organism complexity hint at the functional importance of disordered regions. Nevertheless, these regions have not yet been extensively characterised. Shifting the attention from globular domains to disordered regions of the proteome might bring us closer to elucidating the dense and complex connectivity of the interactome. An important class of disordered interfaces are the compact, mono-partite short linear motifs (SLiMs, or eukaryotic linear motifs (ELMs)). They are evolutionarily plastic and interact with relatively low affinity due to the limited number of residues that make direct contact with the binding partner. These features confer on SLiMs the ability to evolve convergently and to mediate transient interactions, which are imperative for network evolution and for maintaining robust cell signalling, respectively. The ability to discriminate biologically relevant SLiMs by means of different attributes will improve our understanding of the complexity of the interactome and aid the development of bioinformatics tools for motif discovery. In this paper, the curated instances currently available in the Eukaryotic Linear Motif (ELM) database are analysed to provide a clear overview of the defining attributes of SLiMs. These analyses suggest that functional SLiMs have higher levels of conservation than their surrounding residues, frequently evolve convergently, preferentially occur in disordered regions and often form a secondary structure when bound to their interaction partner. These results advocate searching for small groupings of residues in disordered regions with higher relative conservation and a propensity to form secondary structure. Finally, the most interesting conclusions are examined with regard to their functional consequences.
Viruses, as obligate intracellular parasites, are the pathogens that have the most intimate relationship with their host, and as such, their genomes have been shaped directly by interactions with the host proteome. Every step of the viral life cycle, from entry to budding, is orchestrated through interactions with cellular proteins. Accordingly, viruses will hijack and manipulate these proteins using any achievable mechanism. Yet the extensive interactions of viral proteomes have yielded a conundrum: how do viruses commandeer so many diverse pathways and processes, given the obvious spatial constraints imposed by their compact genomes? One important strategy is slowly being revealed: the extensive mimicry of host protein short linear motifs (SLiMs).
Background: Large datasets of protein interactions provide a rich resource for the discovery of Short Linear Motifs (SLiMs) that recur in unrelated proteins. However, existing methods for estimating the probability of motif recurrence may be biased by the size and composition of the search dataset, such that p-value estimates from different datasets, or from motifs containing different numbers of non-wildcard positions, are not strictly comparable. Here, we develop more exact methods and explore the potential biases of computationally efficient approximations.
Results: A widely used heuristic for the calculation of motif over-representation approximates motif probability by assuming that all proteins have the same length and composition. We introduce pv, which calculates the probability exactly. Secondly, the recently introduced SLiMFinder statistic Sig accounts for multiple testing (across all possible motifs) in motif discovery. However, it approximates the probability of all other possible motifs occurring with a score of p or less as being equal to p. Here, we show that the exhaustive calculation of the probability of all possible motif occurrences that are as rare or rarer than the motif of interest, Sig', may be carried out efficiently by grouping motifs of a common probability (i.e. those which have permuted orders of the same residues). Sig'v, which corrects both approximations, is shown to be uniformly distributed in a random dataset when searching for non-ambiguous motifs, indicating that it is a robust significance measure.
Conclusions: A method is presented to compute exactly the true probability of a non-ambiguous short protein sequence motif, and the utility of an approximate approach for novel motif discovery across a large number of datasets is demonstrated.
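The contrast between the "same length and composition" heuristic and a per-protein calculation can be sketched for a non-ambiguous motif. This is an illustration under our own simplifying assumptions (residues at each position drawn independently from the protein's composition, proteins independent of each other); it is not the pv, Sig' or Sig'v calculation from the paper, and the function names are hypothetical. Per-protein probabilities are combined with a Poisson-binomial tail; the heuristic is the special case where every protein gets the same probability, which reduces to a binomial tail.

```python
from math import prod


def per_protein_prob(motif, seq):
    """P(motif occurs at least once in seq), using seq's own length and
    composition and treating positions as independent (an approximation)."""
    k, n = len(motif), len(seq)
    if n < k:
        return 0.0
    freq = {aa: seq.count(aa) / n for aa in set(seq)}
    p_site = prod(freq.get(aa, 0.0) for aa in motif)
    return 1.0 - (1.0 - p_site) ** (n - k + 1)


def heuristic_prob(motif, seqs):
    """Same calculation with one pooled composition and the mean length,
    i.e. pretending every protein in the dataset looks identical."""
    pooled = "".join(seqs)
    freq = {aa: pooled.count(aa) / len(pooled) for aa in set(pooled)}
    mean_len = round(len(pooled) / len(seqs))
    k = len(motif)
    if mean_len < k:
        return 0.0
    p_site = prod(freq.get(aa, 0.0) for aa in motif)
    return 1.0 - (1.0 - p_site) ** (mean_len - k + 1)


def prob_at_least(m, probs):
    """P(motif occurs in at least m proteins) when protein i contains it
    independently with probability probs[i] (Poisson-binomial tail, by DP)."""
    dist = [1.0]  # dist[j] = P(exactly j proteins contain the motif so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)      # this protein lacks the motif
            nxt[j + 1] += q * p          # this protein contains the motif
        dist = nxt
    return sum(dist[m:])


seqs = ["MKLSPEELLA", "AAPSTPLLKE", "WWHHGGMMCC"]  # toy sequences
exact = prob_at_least(2, [per_protein_prob("LL", s) for s in seqs])
approx = prob_at_least(2, [heuristic_prob("LL", seqs)] * len(seqs))
print(exact, approx)
```

In the toy data the third protein contains no leucine, so its per-protein probability is zero, whereas the pooled heuristic still credits it with a chance of matching; this is the kind of dataset-composition bias the abstract describes.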
Short linear motifs (SLiMs) are protein binding modules that play major roles in almost all cellular processes. SLiMs are short, often highly degenerate, difficult to characterize and hard to detect. The eukaryotic linear motif (ELM) resource (elm.eu.org) is dedicated to SLiMs, consisting of a manually curated database of over 275 motif classes and over 3000 motif instances, and a pipeline to discover candidate SLiMs in protein sequences. For 15 years, ELM has been one of the major resources for motif research. In this database update, we present the latest additions to the database, including 32 new motif classes, and new features including UniProt and Reactome integration. Finally, to help provide cellular context, we present some biological insights about SLiMs in the cell cycle, as targets for bacterial pathogenicity and their functionality in the human kinome.
The MobiDB (URL: mobidb.bio.unipd.it) database of protein disorder and mobility annotations has been significantly updated and upgraded since its last major renewal in 2014. Several curated datasets for intrinsic disorder and folding upon binding have been integrated from specialized databases. The indirect evidence has also been expanded to better capture information available in the PDB, such as high temperature residues in X-ray structures and overall conformational diversity. Novel nuclear magnetic resonance chemical shift data provides an additional experimental information layer on conformational dynamics. Predictions have been expanded to provide new types of annotation on backbone rigidity, secondary structure preference and disordered binding regions. MobiDB 3.0 contains information for the complete UniProt protein set and synchronization has been improved by covering all UniParc sequences. An advanced search function allows the creation of a wide array of custom-made datasets for download and further analysis. A large amount of information and cross-links to more specialized databases are intended to make MobiDB the central resource for the scientific community working on protein intrinsic disorder and mobility.
The extensive intrinsically disordered regions of higher eukaryotic proteomes contain vast numbers of functional interaction modules known as short linear motifs (SLiMs). Here, we present SLiMSearch, a motif discovery tool that scans a motif consensus, representing the specificity determinants of a motif-binding domain, against a proteome to discover putative novel motif instances. SLiMSearch applies several distinct and complementary approaches exploiting the common properties of SLiMs to predict novel motifs. Consensus matches are annotated with overlapping sequence annotation, including feature information describing protein modular architecture, post-translational modification, structure, sequence variation and experimental characterisation of functional regions. Discriminatory motif attributes such as conservation and accessibility are also calculated. In addition, SLiMSearch provides functional enrichment and evolutionary analysis tools. The enrichment tool analyses GO terms, keywords and interacting partner enrichment to indicate possible motif function. The evolutionary tool evaluates motif taxonomic range and the conservation of motif sequence context. Consensus matches can be filtered based on motif attributes such as accessibility and taxonomic range; or by the localisation, interacting partners or ontology annotation of the peptide-containing protein. SLiMSearch supports a range of species of experimental and therapeutic relevance and is available online at http://slim.ucd.ie/slimsearch/.
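The core scanning step described above, matching a motif consensus against every protein in a proteome, can be sketched with regular expressions. This is a minimal illustration, not SLiMSearch's implementation: the consensus is written as a Python regex, the B56-binding LxxIxE motif described in this list is used in a literal reading (`L..I.E`, a simplification of the real consensus), and the disorder filter is a hypothetical stand-in for SLiMSearch's accessibility annotation.

```python
import re


def scan_consensus(pattern, proteome):
    """Return (protein_id, start, end, peptide) for every match of a regex
    consensus in a {protein_id: sequence} dict, using 1-based inclusive
    coordinates and reporting overlapping matches."""
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    rx = re.compile(f"(?=({pattern}))")
    hits = []
    for pid, seq in proteome.items():
        for m in rx.finditer(seq):
            peptide = m.group(1)
            start = m.start() + 1
            hits.append((pid, start, start + len(peptide) - 1, peptide))
    return hits


def in_disorder(hit, disordered):
    """Keep a hit only if it lies entirely inside an annotated disordered
    region; `disordered` maps protein_id -> [(start, end), ...] (hypothetical)."""
    pid, start, end, _ = hit
    return any(s <= start and end <= e for s, e in disordered.get(pid, []))


proteome = {"P1": "MALSEIKEAAA", "P2": "GGGGGG"}  # toy sequences
hits = scan_consensus("L..I.E", proteome)
accessible = [h for h in hits if in_disorder(h, {"P1": [(1, 10)]})]
print(accessible)
```

Overlapping matches are kept on purpose: degenerate consensuses often match shifted windows of the same region, and deciding which of those is functional is left to later filtering and annotation.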
The anaphase-promoting complex or cyclosome (APC/C) is the ubiquitin ligase that regulates mitosis by targeting specific proteins for degradation at specific times under the control of the spindle assembly checkpoint (SAC). How the APC/C recognizes its different substrates is a key problem in the control of cell division. Here, we have identified the ABBA motif in cyclin A, BUBR1, BUB1, and Acm1, and we show that it binds to the APC/C coactivator CDC20. The ABBA motif in cyclin A is required for its proper degradation in prometaphase through competing with BUBR1 for the same site on CDC20. Moreover, the ABBA motifs in BUBR1 and BUB1 are necessary for the SAC to work at full strength and to recruit CDC20 to kinetochores. Thus, we have identified a conserved motif integral to the proper control of mitosis that connects APC/C substrate recognition with the SAC.
There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
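Building a PSSM from aligned motif-containing peptides and scoring a sequence with it can be sketched as follows. This uses one common scheme, log-odds scores with pseudocounts against a uniform background, chosen here for illustration; PSSMSearch offers several scoring methods and its own significance framework, neither of which is reproduced, and the function names are hypothetical.

```python
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, background=None, pseudocount=1.0):
    """Log-odds PSSM (one dict per position) from equal-length aligned peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    # Uniform background unless the caller supplies residue frequencies.
    bg = background or {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for i in range(length):
        column = [p[i] for p in peptides]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: log2(((column.count(aa) + pseudocount) / total) / bg[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score_windows(pssm, seq):
    """Score every window of seq (1-based start), best first; windows that
    contain residues outside the 20 standard amino acids are skipped."""
    k = len(pssm)
    results = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if all(aa in pssm[0] for aa in window):
            score = sum(col[aa] for col, aa in zip(pssm, window))
            results.append((start + 1, window, score))
    return sorted(results, key=lambda r: r[2], reverse=True)


pssm = build_pssm(["LA", "LA"])  # toy aligned peptides
print(score_windows(pssm, "GGLAGG")[0])
```

Pseudocounts keep unseen residues from scoring negative infinity, which matters when the model is built from only a handful of peptides; a user-edited heatmap, as described above, would amount to adjusting individual entries of `pssm`.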
Polyphosphates (polyP) are chains of inorganic phosphates found in all cells. Previous work has implicated these chains in diverse functions, but the mechanism of action is unclear. A recent study reports that polyP can be non-enzymatically and covalently attached to lysine residues on yeast proteins Nsr1 and Top1. One question emerging from this work is whether so-called "polyphosphorylation" is unique to these proteins or instead functions as a global regulator akin to other lysine post-translational modifications. Here, we present the results of a screen for polyphosphorylated proteins in yeast. We uncovered 15 targets including a conserved network of proteins functioning in ribosome biogenesis. Multiple genes contribute to polyphosphorylation of targets by regulating polyP synthesis, and disruption of this synthesis results in translation defects as measured by polysome profiling. Finally, we identify 6 human proteins that can be modified by polyP, highlighting the therapeutic potential of manipulating polyphosphorylation in vivo.
Studies over the past decade have highlighted the functional significance of intrinsically disordered proteins (IDPs). Due to conformational heterogeneity and inherent dynamics, structural studies of IDPs have relied mostly on NMR spectroscopy, despite IDPs having characteristics that make them challenging to study using traditional ¹H-detected biomolecular NMR techniques. Here, we develop a suite of 3D ¹⁵N-detected experiments that take advantage of the slower transverse relaxation property of ¹⁵N nuclei, the associated narrower linewidth, and the greater chemical shift dispersion compared with those of ¹H and ¹³C resonances. The six 3D experiments described here start with aliphatic ¹H magnetization to take advantage of its higher initial polarization, and are broadly applicable for backbone assignment of proteins that are disordered, dynamic, or have unfavorable amide proton exchange rates. Using these experiments, backbone resonance assignments were completed for the unstructured regulatory domain (residues 131-294) of the human transcription factor nuclear factor of activated T cells (NFATC2), which includes 28 proline residues located in functionally important serine-proline (SP) repeats. The complete assignment of the NFATC2 regulatory domain enabled us to study phosphorylation of NFAT by kinase PKA and phosphorylation-dependent binding of chaperone protein 14-3-3 to NFAT, providing mechanistic insight into how 14-3-3 regulates NFAT nuclear translocation.
Transcription of the Ebola virus genome depends on the viral transcription factor VP30 in its unphosphorylated form, but the underlying molecular mechanism of VP30 dephosphorylation is unknown. Here we show that the Ebola virus nucleoprotein (NP) recruits the host PP2A-B56 protein phosphatase through a B56-binding LxxIxE motif and that this motif is essential for VP30 dephosphorylation and viral transcription. The LxxIxE motif and the binding site of VP30 in NP are in close proximity, and both binding sites are required for the dephosphorylation of VP30. We generate a specific inhibitor of PP2A-B56 and show that it suppresses Ebola virus transcription and infection. This work dissects the molecular mechanism of VP30 dephosphorylation by PP2A-B56, and it pinpoints this phosphatase as a potential target for therapeutic intervention.
The intrinsically disordered regions of eukaryotic proteomes are enriched in short linear motifs (SLiMs), which are of crucial relevance for cellular signaling and protein regulation; many mediate interactions by providing binding sites for peptide-binding domains. The vast majority of SLiMs remain to be discovered, highlighting the need for experimental methods for their large-scale identification. We present a novel proteomic peptide phage display (ProP-PD) library that displays peptides representing the disordered regions of the human proteome, allowing direct large-scale interrogation of most potential binding SLiMs in the proteome. The performance of the ProP-PD library was validated through selections against SLiM-binding bait domains with distinct folds and binding preferences. The vast majority of identified binding peptides contained sequences that matched the known SLiM-binding specificities of the bait proteins. For SHANK1 PDZ, we establish a novel consensus TxF motif for its non-C-terminal ligands. The binding peptides mostly represented novel target proteins; however, several previously validated protein-protein interactions (PPIs) were also discovered. We determined the affinities between the VHS domain of GGA1 and three identified ligands to be 40-130 μM through isothermal titration calorimetry, and confirmed interactions through coimmunoprecipitation using full-length proteins. Taken together, we outline a general pipeline for the design and construction of ProP-PD libraries and the analysis of ProP-PD-derived, SLiM-based PPIs. We demonstrated the method's potential to identify low-affinity motif-mediated interactions for modular domains with distinct binding preferences. The approach is a highly useful complement to the current toolbox of methods for PPI discovery.
Tandem mass spectrometry (MS/MS) techniques, developed for protein identification, are increasingly being applied in the field of peptidomics. Using this approach, the set of protein fragments observed in a sample of interest can be determined to gain insights into important biological processes such as signaling and other bioactivities. As the peptidomics era progresses, there is a need for robust and convenient methods to inspect and analyze MS/MS-derived data. Here, we present Peptigram, a novel tool dedicated to the visualization and comparison of peptides detected by MS/MS. The principal advantage of Peptigram is that it provides visualizations at both the protein and peptide level, allowing users to simultaneously visualize the peptide distributions of one or more samples of interest, mapped to their parent proteins. In this way, rapid comparisons between samples can be made in terms of their peptide coverage and abundance. Moreover, Peptigram integrates and displays key sequence features from external databases and links with peptide analysis tools to offer the user a comprehensive peptide discovery resource. Here, we illustrate the use of Peptigram on a data set of milk hydrolysates. For convenience, Peptigram is implemented as a web application, and is freely available for academic use at http://bioware.ucd.ie/peptigram.
The Spindle Assembly Checkpoint (SAC) ensures genomic stability by preventing sister chromatid separation until all chromosomes are attached to the spindle. It catalyzes the production of the Mitotic Checkpoint Complex (MCC), which inhibits Cdc20 to inactivate the Anaphase Promoting Complex/Cyclosome (APC/C). Here we show that two Cdc20-binding motifs in BubR1 of the recently identified ABBA motif class are crucial for the MCC to recognize active APC/C-Cdc20. Mutating these motifs eliminates MCC binding to the APC/C, thereby abolishing the SAC and preventing cells from arresting in response to microtubule poisons. These ABBA motifs flank a KEN box to form a cassette that is highly conserved through evolution, both in the arrangement and spacing of the ABBA-KEN-ABBA motifs, and association with the amino-terminal KEN box required to form the MCC. We propose that the ABBA-KEN-ABBA cassette holds the MCC onto the APC/C by binding the two Cdc20 molecules in the MCC-APC/C complex.
The Database of Protein Disorder (DisProt, URL: www.disprot.org) has been significantly updated and upgraded since its last major renewal in 2007. The current release holds information on more than 800 entries of IDPs/IDRs, i.e. intrinsically disordered proteins or regions that exist and function without a well-defined three-dimensional structure. We have re-curated previous entries to purge conflicting cases from DisProt, and have also upgraded the functional classification scheme to reflect continuous advances in the field over the past 10 years or so. We define IDPs as proteins that are disordered along their entire sequence, i.e. entirely lack structural elements, and IDRs as regions of at least five consecutive residues without well-defined structure. We base our assessment of disorder strictly on experimental evidence, such as X-ray crystallography and nuclear magnetic resonance (primary techniques) and a broad range of other experimental approaches (secondary techniques). Confident and ambiguous annotations are highlighted separately. DisProt 7.0 presents classified knowledge regarding the experimental characterization and functional annotations of IDPs/IDRs, and is intended to provide an invaluable resource for the research community for a better understanding of structural disorder and for developing better computational tools for studying disordered proteins.
Dynamic protein phosphorylation is a fundamental mechanism regulating biological processes in all organisms. Protein phosphatase 2A (PP2A) is the main source of phosphatase activity in the cell, but the molecular details of substrate recognition are unknown. Here, we report that a conserved surface-exposed pocket on PP2A regulatory B56 subunits binds to a consensus sequence on interacting proteins, which we term the LxxIxE motif. The composition of the motif modulates the affinity for B56, which in turn determines the phosphorylation status of associated substrates. Phosphorylation of amino acid residues within the motif increases B56 binding, allowing integration of kinase and phosphatase activity. We identify conserved LxxIxE motifs in essential proteins throughout the eukaryotic domain of life and in human viruses, suggesting that the motifs are required for basic cellular function. Our study provides a molecular description of PP2A binding specificity with broad implications for understanding signaling in eukaryotes.
The Eukaryotic Linear Motif (ELM) resource (http://elm.eu.org) is a manually curated database of short linear motifs (SLiMs). In this update, we present the latest additions to this resource, along with more improvements to the web interface. ELM 2016 contains more than 240 different motif classes with over 2700 experimentally validated instances, manually curated from more than 2400 scientific publications. In addition, more data have been made available as individually searchable pages and are downloadable in various formats.
Although histone acetylation and deacetylation machineries (HATs and HDACs) regulate important aspects of cell function by targeting histone tails, recent work highlights that non-histone protein acetylation is also pervasive in eukaryotes. Here, we use quantitative mass-spectrometry to define acetylations targeted by the sirtuin family, previously implicated in the regulation of non-histone protein acetylation. To identify HATs that promote acetylation of these sites, we also performed this analysis in gcn5 (SAGA) and esa1 (NuA4) mutants. We observed strong sequence specificity for the sirtuins and for each of these HATs. Although the Gcn5 and Esa1 consensus sequences are entirely distinct, the sirtuin consensus overlaps almost entirely with that of Gcn5, suggesting a strong coordination between these two regulatory enzymes. Furthermore, by examining global acetylation in an ada2 mutant, which dissociates Gcn5 from the SAGA complex, we found that a subset of Gcn5 targets did not depend on an intact SAGA complex for targeting. Our work provides a framework for understanding how HAT and HDAC enzymes collaborate to regulate critical cellular processes related to growth and division.
Huge research effort has been invested over many years to determine the phenotypes of natural or artificial mutations in HIV proteins; interpretation of mutation phenotypes is an invaluable source of new knowledge. The results of this research effort are recorded in the scientific literature, but virologists find it difficult to locate them rapidly. Manually locating data on phenotypic variation within the approximately 270,000 available HIV-related research articles, or the further 1,500 articles published each month, is a daunting task. Accordingly, the HIV research community would benefit from a resource cataloguing the available HIV mutation literature. We have applied computational text-mining techniques to parse and map mutagenesis and polymorphism information from the HIV literature, have enriched the data with ancillary information and have developed a public, web-based interface through which it can be intuitively explored: the HIV mutation browser. The current release of the HIV mutation browser describes the phenotypes of 7,608 unique mutations at 2,520 sites in the HIV proteome, resulting from the analysis of 120,899 papers. The mutation information for each protein is organised in a residue-centric manner and each residue is linked to the relevant experimental literature. The importance of HIV as a global health burden advocates extensive effort to maximise the efficiency of HIV research. The HIV mutation browser provides a valuable new resource for the research community. The HIV mutation browser is available at: http://hivmut.org.
The ubiquitin protein ligase anaphase-promoting complex or cyclosome (APC/C) controls mitosis by promoting ordered degradation of securin, cyclins, and other proteins. The mechanisms underlying the timing of APC/C substrate degradation are poorly understood. We explored these mechanisms using quantitative fluorescence microscopy of GFP-tagged APC/C(Cdc20) substrates in living budding yeast cells. Degradation of the S cyclin, Clb5, begins early in mitosis, followed 6 min later by the degradation of securin and Dbf4. Anaphase begins when less than half of securin is degraded. The spindle assembly checkpoint delays the onset of Clb5 degradation but does not influence securin degradation. Early Clb5 degradation depends on its interaction with the Cdk1-Cks1 complex and the presence of a Cdc20-binding "ABBA motif" in its N-terminal region. The degradation of securin and Dbf4 is delayed by Cdk1-dependent phosphorylation near their Cdc20-binding sites. Thus, a remarkably diverse array of mechanisms generates robust ordering of APC/C(Cdc20) substrate destruction.
Disease mutations are traditionally thought to impair protein functionality by disrupting the folded globular structure of proteins. However, 22% of human disease mutations occur in natively unstructured segments of proteins known as intrinsically disordered regions (IDRs). This therefore implicates defective IDR functionality in various human diseases including cancer. The functionality of IDRs is partly attributable to short linear motifs (SLiMs), but it remains an open question how much defects in SLiMs contribute to human diseases. A proteome-wide comparison of the distribution of missense mutations from disease and non-disease mutation datasets revealed that, in IDRs, disease mutations are more likely to occur within SLiMs than neutral missense mutations. Moreover, compared to neutral missense mutations, disease mutations more frequently impact functionally important residues of SLiMs, cause changes in the physicochemical properties of SLiMs, and disrupt more SLiM-mediated interactions. Analysis of these mutations resulted in a comprehensive list of experimentally validated or predicted SLiMs disrupted in disease. Furthermore, this in-depth analysis suggests that the 'prostate cancer pathway' is particularly enriched for proteins with disease-related SLiMs. The contribution of mutations in SLiMs to disease may currently appear small when compared to mutations in globular domains. However, our analysis of mutations in predicted SLiMs suggests that this contribution might be more substantial. Therefore, when analysing the functional impact of mutations on proteins, SLiMs in proteins should not be neglected. Our results suggest that an increased focus on SLiMs in the coming decades will improve our understanding of human diseases and aid in the development of targeted treatments.
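One way to frame the comparison described above (disease versus neutral missense mutations, inside versus outside SLiMs) is a 2x2 contingency table. The sketch below computes an odds ratio and a one-sided Fisher's exact P-value from such a table using only the standard library. It is an illustrative assumption about how such a comparison could be tested, not necessarily the statistic used in the study, and the counts in the example are invented.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: disease vs neutral mutations; columns: inside vs outside SLiMs.
    Raises ZeroDivisionError when b or c is zero.
    """
    return (a * d) / (b * c)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact P-value for cell `a` being at least as large as observed.

    With all margins fixed, `a` follows a hypergeometric distribution.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total


# Toy counts, purely illustrative.
print(odds_ratio(3, 1, 1, 3), f"{fisher_greater(3, 1, 1, 3):.4f}")  # 9.0 0.2429
```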
A molecular description of functional modules in the cell is the focus of many high-throughput studies in the postgenomic era. A large portion of biomolecular interactions in virtually all cellular processes is mediated by compact interaction modules, referred to as peptide motifs. Such motifs are typically less than ten residues in length, occur within intrinsically disordered regions, and are recognized and/or posttranslationally modified by structured domains of the interacting partner. In this review, we suggest that there might be over a million instances of peptide motifs in the human proteome. While this staggering number suggests that peptide motifs are numerous and the most understudied functional module in the cell, it also holds great opportunities for new discoveries.
The eukaryotic linear motif (ELM http://elm.eu.org) resource is a hub for collecting, classifying and curating information about short linear motifs (SLiMs). For >10 years, this resource has provided the scientific community with a freely accessible guide to the biology and function of linear motifs. The current version of ELM contains ∼200 different motif classes with over 2400 experimentally validated instances manually curated from >2000 scientific publications. Furthermore, detailed information about motif-mediated interactions has been annotated and made available in standard exchange formats. Where appropriate, links are provided to resources such as switches.elm.eu.org and KEGG pathways.
Short linear motifs (SLiMs) are protein interaction sites that play an important role in cell regulation by controlling protein activity, localization, and local abundance. The functionality of a SLiM can be modulated in a context-dependent manner to induce a gain, loss, or exchange of binding partners, which will affect the function of the SLiM-containing protein. As such, these conditional interactions underlie molecular decision-making in cell signaling. We identified multiple types of pre- and posttranslational switch mechanisms that can regulate the function of a SLiM and thereby control its interactions. The collected examples of experimentally characterized SLiM-based switch mechanisms were curated in the freely accessible switches.ELM resource (http://switches.elm.eu.org). On the basis of these examples, we defined and integrated rules to analyze SLiMs for putative regulatory switch mechanisms. We applied these rules to known validated SLiMs, providing evidence that more than half of these are likely to be pre- or posttranslationally regulated. In addition, we showed that posttranslationally modified sites are enriched around SLiMs, which enables cooperative and integrative regulation of protein interaction interfaces. We foresee switches.ELM complementing available resources to extend our knowledge of the molecular mechanisms underlying cell signaling.
The SPla/Ryanodine receptor (SPRY)/B30.2 domain is one of the most common folds in higher eukaryotes. The human genome encodes 103 SPRY/B30.2 domains, several of which are involved in the immune response. Approximately 45% of human SPRY/B30.2-containing proteins are E3 ligases. The role and function of the majority of SPRY/B30.2 domains are still poorly understood; however, in several cases mutations in this domain have been linked to congenital disorders. The recent characterization of SPRY/B30.2-mediated protein interactions has provided evidence for a role of this domain as an adaptor module to assemble macromolecular complexes, analogous to Src homology (SH)2, SH3, and WW domains. However, functional and structural evidence suggests that SPRY/B30.2 is a more versatile fold, allowing a wide range of binding modes.
Intrinsically disordered regions in eukaryotic proteomes contain key signaling and regulatory modules and mediate interactions with many proteins. Many viral proteomes encode disordered proteins and modulate host factors through the use of short linear motifs (SLiMs) embedded within disordered regions. However, the degree of viral protein disorder across different viruses is not well understood, so we set out to establish the constraints acting on viruses, in terms of their use of disordered protein regions. We surveyed predicted disorder across 2,278 available viral genomes in 41 families, and correlated the extent of disorder with genome size and other factors. Protein disorder varies strikingly between viral families (from 2.9% to 23.1% of residues), and also within families. However, this substantial variation did not follow the established trend among their hosts, with increasing disorder seen across eubacteria, archaebacteria, protists, and multicellular eukaryotes. For example, among large mammalian viruses, poxviruses and herpesviruses showed markedly differing disorder (5.6% and 17.9%, respectively). Viral families with smaller genome sizes have more disorder within each of five main viral types (ssDNA, dsDNA, ssRNA+, dsRNA, retroviruses), except for negative single-stranded RNA viruses, where disorder increased with genome size. However, across all viruses, a comparison spanning tiny and enormous viruses over a much bigger range of genome sizes shows no strong association of genome size with protein disorder. We conclude that there is extensive variation in the disorder content of viral proteomes. While a proportion of this may relate to base composition, to the extent of gene overlap, and to genome size within viral types, there remain important additional family- and virus-specific effects. Differing disorder strategies are likely to impact on how different viruses modulate host factors, and on how rapidly viruses can evolve novel instances of SLiMs subverting host functions, such as innate and acquired immunity.
Mass spectrometric analysis of peptides contained in enzymatically digested hydrolysates of proteins is increasingly being used to characterize potentially bioactive or otherwise interesting hydrolysates. However, when preparations containing mixtures of enzymes are used, from either biological or experimental sources, it is unclear which of these enzymes have been most important in hydrolyzing the sample. We have developed a tool to rapidly evaluate the evidence for which enzymes are most likely to have cleaved the sample. EnzymePredictor is web-based software designed to (i) identify the protein sources of fragments found in the hydrolysates and map the fragments back onto those proteins, (ii) identify enzymes that could yield such cleavages, and (iii) generate a colored visualization of the hydrolysate, the source proteins, the fragments, and the predicted enzymes. It tabulates the enzymes ranked according to their cleavage counts. The provision of odds ratio and standard error in the table permits users to evaluate how distinctively particular enzymes may be favored over other enzymes as the most likely cleavers of the samples. Finally, the method displays cleavages not only by peptide but also by protein, permitting evaluation of whether the cleavage pattern is general across all proteins or specific to a subset. We illustrate the application of this method using milk hydrolysates, and show how it can rapidly identify the enzymes or enzyme combinations used in generating the peptides. The approach developed here will accelerate the identification of enzymes most likely to have been used in hydrolyzing a set of mass spectrometrically identified peptides derived from proteins. This has utility not only in understanding the results of mass spectrometry experiments, but also in choosing enzymes likely to yield similar cleavage patterns. EnzymePredictor can be found at http://bioware.ucd.ie/~enzpred/Enzpred.php.
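The core counting step (map each peptide back onto its source protein, treat its termini as cleavage sites, and tally which enzymes could have produced each site) can be sketched as below. The specificity rules in `ENZYME_RULES` are deliberately simplified assumptions, the function names are hypothetical, and the sketch does not compute the odds ratios or standard errors that EnzymePredictor reports.

```python
from collections import Counter

# Simplified, assumed specificity rules: enzyme -> (allowed P1 residues,
# P1' residues that block cleavage). Real enzymes are more nuanced.
ENZYME_RULES = {
    "trypsin": ("KR", "P"),
    "chymotrypsin": ("FYW", "P"),
    "Glu-C": ("E", ""),
}


def cleavage_sites(protein: str, peptide: str) -> list[int]:
    """Return cleavage positions implied by a peptide's termini.

    Site i denotes the bond between protein[i-1] (P1) and protein[i] (P1').
    Protein termini are not counted as cleavages.
    """
    sites = []
    start = protein.find(peptide)
    while start != -1:
        end = start + len(peptide)
        for site in (start, end):
            if 0 < site < len(protein):
                sites.append(site)
        start = protein.find(peptide, start + 1)
    return sites


def rank_enzymes(protein: str, peptides: list[str]) -> list[tuple[str, int]]:
    """Count, per enzyme, the distinct observed sites consistent with its rule."""
    sites = {s for pep in peptides for s in cleavage_sites(protein, pep)}
    counts = Counter()
    for site in sites:
        p1, p1_prime = protein[site - 1], protein[site]
        for enzyme, (p1_set, blockers) in ENZYME_RULES.items():
            if p1 in p1_set and p1_prime not in blockers:
                counts[enzyme] += 1
    # Highest count first; ties broken alphabetically (case-insensitive).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0].lower()))


print(rank_enzymes("GAKLMRSTFQEAK", ["LMR", "STF", "QE"]))
# [('trypsin', 2), ('chymotrypsin', 1), ('Glu-C', 1)]
```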
Adeno-associated virus (AAV) capsid assembly requires expression of the assembly-activating protein (AAP) together with capsid proteins VP1, VP2, and VP3. AAP is encoded by an alternative open reading frame of the cap gene. Sequence analysis and site-directed mutagenesis revealed that AAP contains two hydrophobic domains in the N-terminal part of the molecule that are essential for its assembly-promoting activity. Mutation of these sequences reduced the interaction of AAP with the capsid proteins. Deletions and a point mutation in the capsid protein C terminus also abolished capsid assembly and strongly reduced the interaction with AAP. Interpretation of these observations on a structural basis suggests an interaction of AAP with the VP C terminus, which forms the capsid protein interface at the 2-fold symmetry axis. This interpretation is supported by a decrease in the interaction of monoclonal antibody B1 with VP3 under nondenaturing conditions in the presence of AAP, indicative of steric hindrance of B1 binding to its C-terminal epitope by AAP. In addition, AAP forms high-molecular-weight oligomers and changes the conformation of nonassembled VP molecules as detected by conformation-sensitive monoclonal antibodies A20 and C37. Combined, these observations suggest a possible scaffolding activity of AAP in the AAV capsid assembly reaction.
Microtubule plus-end tracking proteins (+TIPs) are structurally and functionally diverse factors that accumulate at the growing microtubule plus-ends, connect them to various cellular structures, and control microtubule dynamics [1, 2]. EB1 and its homologs are +TIPs that can autonomously recognize growing microtubule ends and recruit to them a variety of other proteins. Numerous +TIPs bind to end binding (EB) proteins through natively unstructured basic and serine-rich polypeptide regions containing a core SxIP motif (serine-any amino acid-isoleucine-proline) [3]. The SxIP consensus sequence is short, and the surrounding sequences show high variability, raising the possibility that undiscovered SxIP-containing +TIPs are encoded in mammalian genomes. Here, we performed a proteome-wide search for mammalian SxIP-containing +TIPs by combining biochemical and bioinformatics approaches. We have identified a set of previously uncharacterized EB partners that have the capacity to accumulate at the growing microtubule ends, including protein kinases, a small GTPase, and centriole-, membrane-, and actin-associated proteins. We show that one of the newly identified +TIPs, CEP104, interacts with CP110 and CEP97 at the centriole and is required for ciliogenesis. Our study reveals the complexity of the mammalian +TIP interactome and provides a basis for investigating the molecular crosstalk between microtubule ends and other cellular structures.
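Because the SxIP core is so short, a bare pattern match returns many chance hits; the abstract notes that functional SxIP motifs sit in basic, serine-rich regions. The sketch below matches S-x-I-P and reports K/R and S counts in a flanking window, keeping only hits with some basic context. The window size and `min_basic` threshold are illustrative assumptions, not values from the study, and this is not the combined biochemical and bioinformatic search the authors performed.

```python
import re

# S, any residue, I, P; lookahead so overlapping matches are reported.
SXIP = re.compile(r"(?=(S.IP))")


def sxip_candidates(sequence: str, flank: int = 10, min_basic: int = 2):
    """Return (1-based start, motif, K/R count, S count) for SxIP matches
    whose flanking window contains at least `min_basic` basic residues.

    The window spans `flank` residues either side of the motif and includes
    the motif itself, so its own serine is counted in the S total.
    """
    seq = sequence.upper()
    hits = []
    for m in SXIP.finditer(seq):
        start = m.start()
        window = seq[max(0, start - flank): start + 4 + flank]
        basic = sum(window.count(aa) for aa in "KR")
        serine = window.count("S")
        if basic >= min_basic:
            hits.append((start + 1, m.group(1), basic, serine))
    return hits


print(sxip_candidates("KRSKIPRS"))  # [(3, 'SKIP', 4, 2)]
```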
The pre-translational modification of messenger ribonucleic acids (mRNAs) by alternative promoter usage and alternative splicing is an important source of pleiotropy. Despite intensive efforts, our understanding of the functional implications of this dynamically created diversity is still incomplete. Using the available knowledge of interaction modules, particularly within intrinsically disordered regions (IDRs), we analysed the occurrences of protein modules within alternative exons. We find that regions removed or included by pre-translational variation are enriched in linear motifs suggesting that the removal or inclusion of exons containing these interaction modules is an important regulatory mechanism. In particular, we observe that PDZ-, PTB-, SH2- and WW-domain binding motifs are more likely to occur within alternative exons. We also determine that regions removed or included by alternative promoter usage are enriched in IDRs suggesting that protein isoform diversity is tightly coupled to the modulation of IDRs. This study, therefore, demonstrates that short linear motifs are key components for establishing protein diversity between splice variants.
The assembly of retroviruses such as HIV-1 is driven by oligomerization of their major structural protein, Gag. Gag is a multidomain polyprotein including three conserved folded domains: MA (matrix), CA (capsid) and NC (nucleocapsid). Assembly of an infectious virion proceeds in two stages. In the first stage, Gag oligomerization into a hexameric protein lattice leads to the formation of an incomplete, roughly spherical protein shell that buds through the plasma membrane of the infected cell to release an enveloped immature virus particle. In the second stage, cleavage of Gag by the viral protease leads to rearrangement of the particle interior, converting the non-infectious immature virus particle into a mature infectious virion. The immature Gag shell acts as the pivotal intermediate in assembly and is a potential target for anti-retroviral drugs both in inhibiting virus assembly and in disrupting virus maturation. However, detailed structural information on the immature Gag shell has not previously been available. For this reason it is unclear what protein conformations and interfaces mediate the interactions between domains and therefore the assembly of retrovirus particles, and what structural transitions are associated with retrovirus maturation. Here we solve the structure of the immature retroviral Gag shell from Mason-Pfizer monkey virus by combining cryo-electron microscopy and tomography. The 8-Å resolution structure permits the derivation of a pseudo-atomic model of CA in the immature retrovirus, which defines the protein interfaces mediating retrovirus assembly. We show that transition of an immature retrovirus into its mature infectious form involves marked rotations and translations of CA domains, that the roles of the amino-terminal and carboxy-terminal domains of CA in assembling the immature and mature hexameric lattices are exchanged, and that the CA interactions that stabilize the immature and mature viruses are almost completely distinct.
RNA-binding proteins (RBPs) determine RNA fate from synthesis to decay. Employing two complementary protocols for covalent UV crosslinking of RBPs to RNA, we describe a systematic, unbiased, and comprehensive approach, termed "interactome capture," to define the mRNA interactome of proliferating human HeLa cells. We identify 860 proteins that qualify as RBPs by biochemical and statistical criteria, adding more than 300 RBPs to those previously known and shedding light on RBPs in disease, RNA-binding enzymes of intermediary metabolism, RNA-binding kinases, and RNA-binding architectures. Unexpectedly, we find that many proteins of the HeLa mRNA interactome are highly intrinsically disordered and enriched in short repetitive amino acid motifs. Interactome capture is broadly applicable to study mRNA interactome composition and dynamics in varied biological settings.
Tight regulation of gene products from transcription to protein degradation is required for reliable and robust control of eukaryotic cell physiology. Many of the mechanisms directing cell regulation rely on proteins detecting the state of the cell through context-dependent, tuneable interactions. These interactions underlie the ability of proteins to make decisions by combining regulatory information encoded in a protein's expression level, localisation and modification state. This raises the question of how proteins integrate the available information to make correct decisions. Over the past decade, pioneering work on the nature and function of intrinsically disordered protein regions has revealed many elegant switching mechanisms that underlie cell signalling and regulation, prompting a reevaluation of their role in cooperative decision-making.
<h4>Motivation</h4>Eukaryotic proteins are highly modular, containing multiple interaction interfaces that mediate binding to a network of regulators and effectors. Recent advances in high-throughput proteomics have rapidly expanded the number of known protein-protein interactions (PPIs); however, the molecular basis for the majority of these interactions remains to be elucidated. There has been a growing appreciation of the importance of a subset of these PPIs, namely those mediated by short linear motifs (SLiMs), particularly the canonical and ubiquitous SH2, SH3 and PDZ domain-binding motifs. However, these motif classes represent only a small fraction of known SLiMs and outside these examples little effort has been made, either bioinformatically or experimentally, to discover the full complement of motif instances.<h4>Results</h4>In this article, interaction data are analysed to identify and characterize an important subset of PPIs, those involving SLiMs binding to globular domains. To do this, we introduce iELM, a method to identify interactions mediated by SLiMs and add molecular details of the interaction interfaces to both interacting proteins. The method identifies SLiM-mediated interfaces from PPI data by searching for known SLiM-domain pairs. This approach was applied to the human interactome to identify a set of high-confidence putative SLiM-mediated PPIs.<h4>Availability</h4>iELM is freely available at http://elmint.embl.de<h4>Supplementary information</h4>Supplementary data are available at Bioinformatics online.
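The central idea (annotate an interaction as putatively SLiM-mediated when one partner carries a motif and the other carries the domain known to bind it) can be sketched with a lookup table of motif-domain pairs. The two motif classes and their regular expressions below are hypothetical simplifications of ELM-style patterns, and the sketch ignores the additional filtering a method like iELM would apply to reach high-confidence predictions.

```python
import re

# Toy motif classes (hypothetical simplifications of ELM-style patterns):
# class name -> (regular expression, domain family that binds the motif).
MOTIF_DOMAIN_PAIRS = {
    "SH3_PxxP": (r"P..P", "SH3"),
    "PDZ_class1": (r"[ST].[VIL]$", "PDZ"),
}


def slim_mediated(interactions, sequences, domains):
    """Yield (motif protein, domain protein, motif class, 1-based start).

    interactions: iterable of (protein_a, protein_b) pairs.
    sequences: protein -> amino-acid sequence.
    domains: protein -> set of domain families it contains.
    """
    for a, b in interactions:
        # Either partner may carry the motif; check both orientations.
        for motif_prot, domain_prot in ((a, b), (b, a)):
            for name, (pattern, domain) in MOTIF_DOMAIN_PAIRS.items():
                if domain not in domains.get(domain_prot, set()):
                    continue
                for m in re.finditer(pattern, sequences.get(motif_prot, "")):
                    yield motif_prot, domain_prot, name, m.start() + 1


seqs = {"A": "MAPPLPKETSV", "B": "MKWNDEF"}
doms = {"B": {"PDZ"}}
print(list(slim_mediated([("A", "B")], seqs, doms)))  # [('A', 'B', 'PDZ_class1', 9)]
```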
Linear motifs are short, evolutionarily plastic components of regulatory proteins and provide low-affinity interaction interfaces. These compact modules play central roles in mediating every aspect of the regulatory functionality of the cell. They are particularly prominent in mediating cell signaling, controlling protein turnover and directing protein localization. Given their importance, our understanding of motifs is surprisingly limited, largely as a result of the difficulty of discovery, both experimentally and computationally. The Eukaryotic Linear Motif (ELM) resource at http://elm.eu.org provides the biological community with a comprehensive database of known experimentally validated motifs, and an exploratory tool to discover putative linear motifs in user-submitted protein sequences. The current update of the ELM database comprises 1800 annotated motif instances representing 170 distinct functional classes, including approximately 500 novel instances and 24 novel classes. Several older motif class entries have also been revisited, improving annotation and adding novel instances. Furthermore, the addition of full-text search capabilities, an enhanced interface and simplified batch download has improved the overall accessibility of the ELM data. In the motif discovery portion of the ELM resource, conservation and structural attributes have been incorporated to help users discriminate biologically relevant motifs from stochastically occurring non-functional instances.
Many of the specific functions of intrinsically disordered protein segments are mediated by Short Linear Motifs (SLiMs) interacting with other proteins. Well-known examples include SLiMs that interact with 14-3-3, PDZ, SH2, SH3, and WW domains, but the true extent and diversity of SLiM-mediated interactions is largely unknown. Here, we attempt to expand our knowledge of human SLiMs by applying in silico SLiM prediction to the human interactome. Combining data from seven different interaction databases, we analysed approximately 6000 protein-centred and 1600 domain-centred human interaction datasets of 3+ unrelated proteins that interact with a common partner. Results were placed in context through comparison to randomised datasets of similar size and composition. The search returned thousands of evolutionarily conserved, intrinsically disordered occurrences of hundreds of significantly enriched recurring motifs, including many that have never been previously identified. In addition to true-positive results for at least 25 different known SLiMs, a striking number of "off-target" proteins/domains also returned significantly enriched known motifs. Often, this was due to the non-independence of the datasets, with many proteins sharing interaction partners or contributing interactions to multiple domain datasets. The majority of these motif classes, however, were also found to be significantly enriched in one or more randomised datasets. This highlights the need for care when interpreting motif predictions of this nature but also raises the possibility that SLiM occurrences may be successfully identified independently of interaction data. Although the results were less compositionally biased than those of previous studies, patterns matching known SLiMs tended to cluster into a few large groups of similar sequence, while novel predictions tended to be more distinctive and less abundant. Whether this is due to ascertainment bias or a true functional composition bias of SLiMs is not clear and warrants further investigation.
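Comparing a real dataset against randomised datasets yields an empirical P-value: the fraction of randomised datasets in which the motif looks at least as enriched as in the real one. The sketch below assumes that "similar size and composition" can be approximated by shuffling each sequence (which preserves its length and amino-acid composition) and uses the number of proteins containing the motif as the enrichment statistic. Both choices are illustrative assumptions rather than the randomisation used in the study.

```python
import random
import re


def proteins_with_motif(sequences, pattern):
    """Number of sequences containing at least one match to `pattern`."""
    rx = re.compile(pattern)
    return sum(1 for s in sequences if rx.search(s))


def empirical_p(sequences, pattern, n_random=1000, seed=0):
    """Compare the observed count with composition-preserving shuffles.

    Each randomised dataset shuffles every sequence independently, keeping
    its length and amino-acid composition. Returns (observed, P-value).
    """
    rng = random.Random(seed)
    observed = proteins_with_motif(sequences, pattern)
    at_least_as_extreme = 0
    for _ in range(n_random):
        shuffled = ["".join(rng.sample(s, len(s))) for s in sequences]
        if proteins_with_motif(shuffled, pattern) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed dataset itself, so P is never zero.
    return observed, (at_least_as_extreme + 1) / (n_random + 1)


print(empirical_p(["MRGDSA", "KRGDLE", "AAGRDS"], "RGD", n_random=200))
```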
Intracellular juxtamembrane regions of transmembrane proteins play pivotal roles in cell signalling, mediated by protein-protein interactions. Disordered protein regions, and short conserved motifs within them, are emerging as key determinants of many such interactions. Here, we investigated whether disorder and conserved motifs are enriched in the juxtamembrane area of human single-pass transmembrane proteins. Conserved motifs were defined as short disordered regions that were much more conserved than the adjacent disordered residues. Human single-pass proteins had higher mean disorder in their cytoplasmic segments than their extracellular parts. Some, but not all, of this effect reflected the shorter length of the cytoplasmic tail. A peak of cytoplasmic disorder was seen at around 30 residues from the membrane. We noted a significant increase in the incidence of conserved motifs within the disordered regions at the same location, even after correcting for the extent of disorder. We conclude that elevated disorder within the cytoplasmic tail of many transmembrane proteins is likely to be associated with enrichment for signalling interactions mediated by conserved short motifs.
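A positional profile such as the disorder peak roughly 30 residues from the membrane can be obtained by averaging per-residue disorder scores across proteins at each distance from the transmembrane segment. This sketch assumes the cytoplasmic tail is C-terminal to the transmembrane helix (as in type I single-pass proteins) and that disorder scores have already been predicted; the function name and input layout are hypothetical.

```python
def disorder_profile(proteins, max_distance=100):
    """Mean predicted disorder at each distance from the membrane.

    proteins: iterable of (tm_end, disorder), where tm_end is the 0-based
    index of the last transmembrane residue and disorder holds per-residue
    scores for the whole sequence (cytoplasmic tail assumed C-terminal).
    Entry d-1 of the result is the mean disorder d residues past the
    membrane, or None where no protein extends that far.
    """
    totals = [0.0] * max_distance
    counts = [0] * max_distance
    for tm_end, disorder in proteins:
        for d in range(1, max_distance + 1):
            i = tm_end + d
            if i >= len(disorder):
                break  # this protein's tail ends before distance d
            totals[d - 1] += disorder[i]
            counts[d - 1] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


toy = [(1, [0.0, 0.0, 0.25, 0.5]), (1, [0.0, 0.0, 0.75])]
print(disorder_profile(toy, max_distance=3))  # [0.5, 0.5, None]
```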
Several major human pathogens, including the filoviruses, paramyxoviruses, and rhabdoviruses, package their single-stranded RNA genomes within helical nucleocapsids, which bud through the plasma membrane of the infected cell to release enveloped virions. The virions are often heterogeneous in shape, which makes it difficult to study their structure and assembly mechanisms. We have applied cryo-electron tomography and sub-tomogram averaging methods to derive structures of Marburg virus, a highly pathogenic filovirus, both after release and during assembly within infected cells. The data demonstrate the potential of cryo-electron tomography methods to derive detailed structural information for intermediate steps in biological pathways within intact cells. We describe the location and arrangement of the viral proteins within the virion. We show that the N-terminal domain of the nucleoprotein contains the minimal assembly determinants for a helical nucleocapsid with variable number of proteins per turn. Lobes protruding from alternate interfaces between each nucleoprotein are formed by the C-terminal domain of the nucleoprotein, together with viral proteins VP24 and VP35. Each nucleoprotein packages six RNA bases. The nucleocapsid interacts in an unusual, flexible "Velcro-like" manner with the viral matrix protein VP40. Determination of the structures of assembly intermediates showed that the nucleocapsid has a defined orientation during transport and budding. Together the data show striking architectural homology between the nucleocapsid helix of rhabdoviruses and filoviruses, but unexpected, fundamental differences in the mechanisms by which the nucleocapsids are then assembled together with matrix proteins and initiate membrane envelopment to release infectious virions, suggesting that the viruses have evolved different solutions to these conserved assembly steps.
Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch 2.0 (Short, Linear Motif Search) web server allows researchers to identify occurrences of a user-defined SLiM in a proteome, using conservation and protein disorder context statistics to rank occurrences. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. For each motif occurrence, overlapping UniProt features and annotated SLiMs are displayed. Visualization also includes annotated multiple sequence alignments surrounding each occurrence, showing conservation and protein disorder statistics in addition to known and predicted SLiMs, protein domains and known post-translational modifications. In addition, enrichment of Gene Ontology terms and protein interaction partners is provided as an indicator of possible motif function. All web server results are available for download. Users can search motifs against the human proteome or a subset thereof defined by UniProt accession numbers or GO term. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch2.html.
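Ranking occurrences of a user-defined motif by their conservation and disorder context can be sketched as follows. The per-residue scores are assumed to be precomputed in the range 0 to 1, and ranking by the product of the two means is an illustrative choice, not the statistic SLiMSearch uses. Patterns that can match the empty string are not supported.

```python
import re


def rank_occurrences(sequence, pattern, disorder, conservation):
    """Rank motif matches by mean disorder and mean conservation over the match.

    disorder, conservation: per-residue scores in [0, 1], same length as
    `sequence`. Returns (1-based start, peptide, mean disorder, mean
    conservation), best first, ranked by the product of the two means.
    """
    if not (len(sequence) == len(disorder) == len(conservation)):
        raise ValueError("score arrays must match the sequence length")
    hits = []
    # Wrapping the user pattern in a capturing lookahead finds overlapping hits;
    # the outer group is always group 1.
    for m in re.finditer(f"(?=({pattern}))", sequence):
        start = m.start()
        span = range(start, start + len(m.group(1)))
        mean_dis = sum(disorder[i] for i in span) / len(span)
        mean_con = sum(conservation[i] for i in span) / len(span)
        hits.append((start + 1, m.group(1), mean_dis, mean_con))
    return sorted(hits, key=lambda h: h[2] * h[3], reverse=True)


seq = "RGDAARGD"
dis = [0.9, 0.9, 0.9, 0.1, 0.1, 0.2, 0.2, 0.2]
con = [0.8, 0.8, 0.8, 0.5, 0.5, 0.9, 0.9, 0.9]
print([(pos, pep) for pos, pep, *_ in rank_occurrences(seq, "RGD", dis, con)])
# [(1, 'RGD'), (6, 'RGD')]
```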
A wealth of in silico tools is available for protein motif discovery and structural analysis. The aim of this chapter is to collect some of the most common and useful tools and to guide the biologist in their use. A detailed explanation is provided for the use of Distill, a suite of web servers for the prediction of protein structural features and the prediction of full-atom 3D models from a protein sequence. Besides this, we also provide pointers to many other tools available for motif discovery and secondary and tertiary structure prediction from a primary amino acid sequence. The prediction of protein intrinsic disorder and the prediction of functional sites and SLiMs are also briefly discussed. Given that user queries vary greatly in size, scope and character, the trade-offs in speed, accuracy and scale need to be considered when choosing which methods to adopt.
The assembly of retroviruses is driven by oligomerization of the Gag polyprotein. We have used cryo-electron tomography together with subtomogram averaging to describe the three-dimensional structure of in vitro-assembled Gag particles from human immunodeficiency virus, Mason-Pfizer monkey virus, and Rous sarcoma virus. These represent three different retroviral genera: the lentiviruses, betaretroviruses and alpharetroviruses. Comparison of the three structures reveals the features of the supramolecular organization of Gag that are conserved between genera and therefore reflect general principles of Gag-Gag interactions and the features that are specific to certain genera. All three Gag proteins assemble to form approximately spherical hexameric lattices with irregular defects. In all three genera, the N-terminal domain of CA is arranged in hexameric rings around large holes. Where the rings meet, 2-fold densities, assigned to the C-terminal domain of CA, extend between adjacent rings, and link together at the 6-fold symmetry axis with a density, which extends toward the center of the particle into the nucleic acid layer. Although this general arrangement is conserved, differences can be seen throughout the CA and spacer peptide regions. These differences can be related to sequence differences among the genera. We conclude that the arrangement of the structural domains of CA is well conserved across genera, whereas the relationship between CA, the spacer peptide region, and the nucleic acid is more specific to each genus.
Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein-protein interactions. The Short, Linear Motif Finder (SLiMFinder) web server is a de novo motif discovery tool that identifies statistically over-represented motifs in a set of protein sequences, accounting for the evolutionary relationships between them. Motifs are returned with an intuitive P-value that greatly reduces the problem of false positives and is accessible to biologists of all disciplines. Input can be uploaded by the user or extracted directly from UniProt. Numerous masking options give the user great control over the contextual information to be included in the analyses. The SLiMFinder server combines these with user-friendly output and visualizations of motif context to allow the user to quickly gain insight into the validity of a putatively functional motif. These visualizations include alignments of motif occurrences, alignments of motifs and their homologues and a visual schematic of the top-ranked motifs. Returned motifs can also be compared with known SLiMs from the literature using CompariMotif. All results are available for download. The SLiMFinder server is available at: http://bioware.ucd.ie/slimfinder.html.
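The over-representation statistic can be illustrated with a simplified binomial model: estimate the chance that a motif occurs at random in one sequence from amino-acid frequencies and sequence length, then ask how likely it is to see the motif in at least k of n evolutionarily unrelated sequences. This sketch assumes independent positions, treats each unrelated sequence (or group of related ones) as one trial, and omits the correction for the number of motifs tested; it is not SLiMFinder's actual implementation. The amino-acid frequencies in the example are invented.

```python
from math import comb, prod


def motif_probability(motif, aa_freq):
    """Chance that a fixed-length motif matches at a single position.

    Wildcard positions ('.') match any residue; other positions use aa_freq.
    """
    return prod(1.0 if aa == "." else aa_freq[aa] for aa in motif)


def per_sequence_probability(motif, aa_freq, length):
    """Chance of at least one chance match in a sequence of `length` residues."""
    positions = max(0, length - len(motif) + 1)
    return 1.0 - (1.0 - motif_probability(motif, aa_freq)) ** positions


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical background frequencies and a 400-residue average length.
freqs = {"L": 0.10, "I": 0.05, "E": 0.07}
p_seq = per_sequence_probability("L..I.E", freqs, length=400)
print(binomial_tail(5, 8, p_seq))  # chance of >= 5 of 8 unrelated proteins
```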
Short linear motifs (SLiMs) in proteins can act as targets for proteolytic cleavage, sites of post-translational modification, determinants of sub-cellular localization, and mediators of protein-protein interactions. Computational discovery of SLiMs involves assembling a group of proteins postulated to share a potential motif, masking out residues less likely to contain such a motif, down-weighting shared motifs arising through common evolutionary descent, and calculating statistical probabilities that allow for the multiple testing of all possible motifs. Much of the challenge for motif discovery lies in assembling and masking datasets of proteins likely to share motifs: because motifs are typically short (between 3 and 10 amino acids in length), potential signals can easily be swamped by the noise of stochastically recurring motifs. Focusing on disordered regions of proteins, where SLiMs are predominantly found, and masking out non-conserved residues can reduce the level of noise, but more work is required to improve the quality of high-throughput experimental datasets (e.g. of physical protein interactions) as input for computational discovery.
<h4>Motivation</h4>Short linear motifs (SLiMs) are important mediators of protein-protein interactions. Their short and degenerate nature presents a challenge for computational discovery. We sought to improve SLiM discovery by incorporating evolutionary information, since SLiMs are more conserved than surrounding residues.<h4>Results</h4>We have developed a new method that assesses the evolutionary signal of a residue in its sequence and structural context. Under-conserved residues are masked out prior to SLiM discovery, allowing incorporation into the existing statistical model employed by SLiMFinder. The method shows considerable robustness in terms of both the conservation score used for individual residues and the size of the sequence neighbourhood. Optimal parameters significantly improve return of known functional motifs from benchmarking data, raising the return of significant validated SLiMs from typical human interaction datasets from 20% to 60%, while retaining the high level of stringency needed for application to real biological data. The success of this regime indicates that it could be of general benefit to computational annotation and prediction of protein function at the sequence level.<h4>Availability</h4>All data and tools in this article are available at http://bioware.ucd.ie/~slimdisc/slimfinder/conmasking/.
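As an illustration of conservation masking, the following Python sketch masks residues that are less conserved than their sequence neighbourhood. It is not the published method: it assumes per-residue conservation scores have already been computed (for example from an alignment of orthologues), ignores structural context, and uses a simple hypothetical rule in which a residue is replaced by `X` when its score falls below the mean score of a window of up to `window` residues on either side. The function name and default window size are illustrative choices.

```python
from typing import Sequence


def mask_underconserved(seq: str, scores: Sequence[float], window: int = 5) -> str:
    """Mask residues that are less conserved than their local neighbourhood.

    Hypothetical rule: residue i is replaced by 'X' when its conservation score
    is below the mean score of residues i-window .. i+window (the window is
    truncated at the sequence ends).
    """
    if len(seq) != len(scores):
        raise ValueError("one conservation score is needed per residue")
    masked = []
    for i, residue in enumerate(seq):
        start, end = max(0, i - window), min(len(seq), i + window + 1)
        local = scores[start:end]
        local_mean = sum(local) / len(local)
        # Keep residues that are at least as conserved as their neighbourhood.
        masked.append(residue if scores[i] >= local_mean else "X")
    return "".join(masked)


# Toy example: a conserved 'LSP' core flanked by poorly conserved residues.
print(mask_underconserved("PPLSPEEKQ", [0.1, 0.2, 0.9, 0.8, 0.9, 0.2, 0.1, 0.1, 0.2], window=2))
```

Judging each residue against its own neighbourhood, rather than against a single global cut-off, avoids masking entire regions that evolve quickly overall. The truncated window at the sequence ends lets the final `Q` in the toy example survive, an edge effect a real implementation would need to handle deliberately.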
Protein-protein interactions are fundamental in mediating biological processes including metabolism, cell growth, and signaling. The ability to selectively inhibit or induce protein activity or complex formation is key to controlling disease. For those situations in which protein-protein interactions derive substantial affinity from short linear peptide sequences, or motifs, we can develop search algorithms for peptidomimetic compounds that resemble the short peptide's structure but are not compromised by poor pharmacological properties. SAAMCO is a Web service (http://bioware.ucd.ie/~saamco) that facilitates the screening of motifs with known structures against bioactive compound databases. It is built on an algorithm that defines compound similarity based on the presence of appropriate amino acid side chain fragments and a favorable root mean square deviation (RMSD) between compound and motif structure. The methodology is efficient because the available compound databases are preprocessed and fast regular expression searches filter potential matches before time-intensive 3D superposition is performed. The required input information is minimal, and the compound databases have been selected to maximize the availability of information on biological activity. "Hits" are accompanied by a visualization window and links to source database entries. Motif matching can be defined on partial or full similarity, which respectively increases or reduces the number of potential mimetic compounds. The Web server provides the functionality for rapid screening of known or putative interaction motifs against prepared compound libraries using a novel search algorithm. The tabulated results can be analyzed by linking to appropriate databases and by visualization.
CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs.<h4>Availability</h4>CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/
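To make the idea of scoring motif relationships by shared information content concrete, here is a simplified Python sketch. It is not CompariMotif's implementation: it handles only single residues, bracketed ambiguities such as `[ST]` and the `.` wildcard (no variable-length spacers), assumes a uniform background of 20 amino acids, and uses a hypothetical scoring rule in which each aligned pair of compatible positions contributes the information content of the union of their allowed residues, log2(20/k) for k residues. Motifs are compared at every ungapped offset so that overlapping motifs can be detected, and any aligned pair of positions with no residue in common rules that offset out.

```python
import math
from typing import FrozenSet, List

AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def parse_motif(pattern: str) -> List[FrozenSet[str]]:
    """Convert a simple motif such as 'R.[ST]P' into one residue set per position."""
    positions, i = [], 0
    while i < len(pattern):
        char = pattern[i]
        if char == ".":
            positions.append(AMINO_ACIDS)  # wildcard: any residue
            i += 1
        elif char == "[":
            end = pattern.index("]", i)  # raises ValueError if the bracket is unclosed
            positions.append(frozenset(pattern[i + 1:end]))
            i = end + 1
        else:
            positions.append(frozenset(char))
            i += 1
    return positions


def information(residues: FrozenSet[str]) -> float:
    """Information content of one position under a uniform background: log2(20 / k)."""
    return math.log2(len(AMINO_ACIDS) / len(residues))


def shared_information(motif_a: str, motif_b: str) -> float:
    """Best shared information content over all ungapped offsets of motif_b against motif_a."""
    pos_a, pos_b = parse_motif(motif_a), parse_motif(motif_b)
    best = 0.0
    for offset in range(1 - len(pos_b), len(pos_a)):
        total, compatible = 0.0, True
        for j, residues_b in enumerate(pos_b):
            i = offset + j
            if not 0 <= i < len(pos_a):
                continue  # this position of motif_b overhangs motif_a
            if not pos_a[i] & residues_b:
                compatible = False  # aligned positions share no residue
                break
            # Both motifs agree on the union of their allowed residues.
            total += information(pos_a[i] | residues_b)
        if compatible:
            best = max(best, total)
    return best


print(shared_information("R.[ST]P", "[ST]P"))  # log2(10) + log2(20), about 7.64
```

Taking the union at each position means that a degenerate motif and one of its more specific variants (for example `[KR]` against `K`) are credited only with the information they both encode.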
Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein-protein interactions. Overrepresentation of convergent occurrences of motifs in proteins with a common attribute (such as similar subcellular location or a shared interaction partner) provides a feasible means to discover novel occurrences computationally. The SLiMDisc (Short, Linear Motif Discovery) web server corrects for common ancestry in describing shared motifs, concentrating on the convergently evolved motifs. The server returns a list of the most interesting motifs found within unmasked regions, ranked according to an information content-based scoring scheme. It allows interactive input masking according to various criteria. Scoring allows for evolutionary relationships in the data sets through treatment of BLAST local alignments. Alongside this ranked list, visualizations of the results improve understanding of the context of suggested motifs, helping to identify true motifs of interest. These visualizations include alignments of motif occurrences, alignments of motifs and their homologues and a visual schematic of the top-ranked motifs. Additional options for filtering and/or re-ranking motifs further permit the user to focus on motifs with desired attributes. Returned motifs can also be compared with known SLiMs from the literature. SLiMDisc is available at: http://bioware.ucd.ie/~slimdisc/.
<h4>Background</h4>Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many biological systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as two sites may be important for activity, making identification of novel SLiMs extremely difficult. In particular, it can be very difficult to distinguish a randomly recurring "motif" from a truly over-represented one. Incorporating ambiguous amino acid positions and/or variable-length wildcard spacers between defined residues further complicates the matter.<h4>Methodology/principal findings</h4>In this paper we present two algorithms. SLiMBuild identifies convergently evolved, short motifs in a dataset of proteins. Motifs are built by combining dimers into longer patterns, retaining only those motifs occurring in a sufficient number of unrelated proteins. Motifs with fixed amino acid positions are identified and then combined to incorporate amino acid ambiguity and variable-length wildcard spacers. The algorithm is computationally efficient compared to alternatives, particularly when datasets include homologous proteins, and provides great flexibility in the nature of motifs returned. The SLiMChance algorithm estimates the probability of returned motifs arising by chance, correcting for the size and composition of the dataset, and assigns a significance value to each motif. These algorithms are implemented in a software package, SLiMFinder. SLiMFinder default settings identify known SLiMs with 100% specificity, and have a low false discovery rate on random test data.<h4>Conclusions/significance</h4>The efficiency of SLiMBuild and low false discovery rate of SLiMChance make SLiMFinder highly suited to high throughput motif discovery and individual high quality analyses alike. Examples of such analyses on real biological data, and how SLiMFinder results can help direct future discoveries, are provided. 
SLiMFinder is freely available for download under a GNU license from http://bioinformatics.ucd.ie/shields/software/slimfinder/.
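The kind of calculation SLiMChance performs can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the per-site match probability is the product, over motif positions, of the summed background frequencies of the allowed residues (dataset composition); each unrelated protein has at least one chance match with probability 1 - (1 - p)^n for n possible start positions (dataset size); the number of proteins containing the motif is modelled as binomial using the mean per-protein probability; and a final step corrects for the number of motifs tested. Positions are treated as independent, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def site_probability(motif: List[str], aa_freq: Dict[str, float]) -> float:
    """Chance that a motif match starts at a given position.

    Each motif position is given as a string of allowed residues, e.g. 'ST'.
    """
    prob = 1.0
    for allowed in motif:
        prob *= sum(aa_freq.get(aa, 0.0) for aa in allowed)
    return prob


def motif_significance(
    motif: List[str],
    seq_lengths: Sequence[int],
    k: int,
    aa_freq: Dict[str, float],
    n_tested: int = 1,
) -> float:
    """Probability of seeing the motif in >= k unrelated proteins by chance,
    corrected for having tested n_tested motifs."""
    p_site = site_probability(motif, aa_freq)
    width = len(motif)
    # Probability of at least one chance match in each protein.
    per_protein = [1.0 - (1.0 - p_site) ** max(0, length - width + 1) for length in seq_lengths]
    p_mean = sum(per_protein) / len(per_protein)
    n = len(per_protein)
    # Binomial upper tail: P(X >= k).
    p_chance = sum(
        math.comb(n, x) * p_mean**x * (1.0 - p_mean) ** (n - x) for x in range(k, n + 1)
    )
    # Chance that at least one of n_tested motifs does this well.
    return 1.0 - (1.0 - p_chance) ** n_tested


uniform = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # assumed flat composition
motif = ["R", "ST", "P"]  # R[ST]P, no wildcards
print(motif_significance(motif, [300] * 10, k=6, aa_freq=uniform, n_tested=1000))
```

Replacing the flat composition with the dataset's own amino acid frequencies makes motifs built from common residues correspondingly less surprising.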
Sequences of human proteins are frequently prepared as synthetic oligopeptides to assess their functional ability to act as compounds modulating pathways involving the parent protein. Our objective was to analyze a set of oligopeptides to determine whether their solubility or activity correlated with features of their primary sequence, or with properties inferred from three-dimensional structural models derived by conformational searches. We generated a conformational database for a set of 78 oligopeptides derived from human proteins, and correlated their 3D structures with solubility and biological assay activity (as measured by platelet activation and inhibition). Parameters of these conformers (frequency of coil, frequency of turns, the degree of packing, and the energy) did not correlate with solubility, which was instead partly predicted by two measures obtained from primary sequence analysis, namely the hydrophobic moment and the number of charges. The platelet activity of peptides was correlated with a parameter derived from the structural modeling: the second virial coefficient (a measure of the tendency for a structure to autoaggregate). This could be explained by an excess, among the active peptides, of those with either a large number of positive charges or, in some cases, a large number of negative charges, and a corresponding deficit of peptides with a mixture of negative and positive charges. We subsequently determined that a panel of 523 commercially available (and biologically active) peptides shared this elevation of absolute net charge: peptides of mixed charge were significantly less frequent than expected. We conclude that the design of biologically active peptides should consider favoring those with a higher absolute net charge.
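The two primary-sequence measures mentioned above can be computed in a few lines. The sketch below is an illustration under assumptions, not the authors' code: it uses the Eisenberg consensus hydrophobicity scale (values as commonly tabulated; check them against the original source before use) with 100 degrees per residue, the angle for an ideal alpha-helix, and it counts Lys and Arg as positive and Asp and Glu as negative, treating His as neutral. It also flags "mixed" peptides carrying both charge types, the class reported above to be under-represented among active peptides.

```python
import math
from typing import Dict, Union

# Eisenberg consensus hydrophobicity scale (normalised values as commonly tabulated).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Magnitude of the vector sum of residue hydrophobicities around a helical wheel."""
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum)


def charge_profile(seq: str) -> Dict[str, Union[int, bool]]:
    """Count charged residues (K/R positive, D/E negative; His treated as neutral)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return {
        "charged": positive + negative,
        "net": positive - negative,
        "abs_net": abs(positive - negative),
        "mixed": positive > 0 and negative > 0,
    }


peptide = "KKLLKKLLKK"  # hypothetical example peptide
print(round(hydrophobic_moment(peptide), 2), charge_profile(peptide))
```

Both measures are raw totals here; normalising the hydrophobic moment by peptide length is a common alternative when comparing peptides of different sizes.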
Many important interactions of proteins are facilitated by short, linear motifs (SLiMs) within a protein's primary sequence. Our aim was to establish robust methods for discovering putative functional motifs. The strongest evidence for such motifs is obtained when the same motifs occur in unrelated proteins, evolving by convergence. In practice, searches for such motifs are often swamped by motifs shared in related proteins that are identical by descent. Motifs were predicted among sets of biologically related proteins, including those both with and without detectable similarity, using the TEIRESIAS algorithm. The number of motif occurrences arising through common evolutionary descent was normalized based on treatment of BLAST local alignments. Motifs were ranked according to a score derived from the product of the normalized number of occurrences and the information content. The method was shown to significantly outperform methods that do not discount evolutionary relatedness when applied to known SLiMs from a subset of the eukaryotic linear motif (ELM) database. An implementation of Multiple Spanning Tree weighting outperformed two other weighting schemes in a variety of settings.
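The ranking score described above, normalized occurrences multiplied by information content, can be sketched in Python. The weighting here is a deliberately simple, hypothetical stand-in for the BLAST-based schemes evaluated in the paper (including the spanning-tree weighting): proteins are grouped into homology clusters assumed to be precomputed, each cluster contributes a total weight of 1, and information content is the sum of log2(20/k) over motif positions with k allowed residues, assuming a uniform background.

```python
import math
from typing import Dict, Iterable, List


def information_content(motif: List[str]) -> float:
    """Sum of log2(20 / k) over positions; each position lists its k allowed residues."""
    return sum(math.log2(20 / len(allowed)) for allowed in motif)


def cluster_weights(clusters: List[List[str]]) -> Dict[str, float]:
    """Hypothetical weighting: every protein in a cluster of size s gets weight 1/s."""
    return {protein: 1.0 / len(cluster) for cluster in clusters for protein in cluster}


def rank_score(motif: List[str], occurrences: Iterable[str], weights: Dict[str, float]) -> float:
    """Normalized occurrence count multiplied by motif information content."""
    normalized = sum(weights[protein] for protein in set(occurrences))
    return normalized * information_content(motif)


clusters = [["P1", "P2", "P3"], ["P4"], ["P5"]]  # P1-P3 are homologues
weights = cluster_weights(clusters)
# R[ST]P found in three homologues plus one unrelated protein: counts as 2, not 4.
print(rank_score(["R", "ST", "P"], ["P1", "P2", "P3", "P4"], weights))
```

Without the weighting, the same motif would score as if it had four independent occurrences, which is exactly the inflation by common descent that the method sets out to remove.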
<h4>Objective</h4>Brown adipose tissue (BAT) is important for thermoregulation in many mammals. Uncoupling protein 1 (UCP1) is the critical regulator of thermogenesis in BAT. Here we aimed to investigate the deacetylation control of BAT and a possible functional connection between UCP1 and sirtuin 3 (SIRT3), the master mitochondrial lysine deacetylase.<h4>Methods</h4>We carried out physiological, molecular, and proteomic analyses of BAT from wild-type and Sirt3KO mice when BAT is activated. Mice were either cold exposed for 2 days or injected with the β3-adrenergic agonist CL316,243 (1 mg/kg; i.p.). Mutagenesis studies were conducted in a cellular model to assess the impact of acetylated lysine sites on UCP1 function. Blood collected by cardiac puncture was used for proteomic analysis of acylcarnitines. Isolated mitochondria were used for functional analysis of OXPHOS proteins.<h4>Results</h4>Our findings showed that SIRT3 absence in mice resulted in impaired BAT lipid use, whole-body thermoregulation, and respiration in BAT mitochondria, without affecting UCP1 expression. Acetylome profiling of BAT mitochondria revealed that SIRT3 regulates the acetylation status of many BAT mitochondrial proteins, including UCP1 and crucial upstream proteins. Mutagenesis work in cells suggested that UCP1 activity was independent of direct SIRT3-regulated lysine acetylation. However, SIRT3 affected the activities of BAT mitochondrial proteins involved in acylcarnitine metabolism and of specific electron transport chain complexes, CI and CII.<h4>Conclusions</h4>Our data highlight that SIRT3 likely controls BAT thermogenesis indirectly by targeting pathways upstream of UCP1.
PP1 and PP2A-B56 are major serine/threonine phosphatase families that achieve specificity by colocalizing with substrates. At the kinetochore, however, both phosphatases localize to an almost identical molecular space and yet they still manage to regulate unique pathways and processes. By switching or modulating the positions of PP1/PP2A-B56 at kinetochores, we show that their unique downstream effects are not due to either the identity of the phosphatase or its precise location. Instead, these phosphatases signal differently because their kinetochore recruitment can be either inhibited (PP1) or enhanced (PP2A) by phosphorylation inputs. Mathematical modeling explains how these inverse phospho-dependencies elicit unique forms of cross-regulation and feedback, which allows otherwise indistinguishable phosphatases to produce distinct network behaviors and control different mitotic processes. Furthermore, our genome-wide analysis suggests that these major phosphatase families may have evolved to respond to phosphorylation inputs in opposite ways because many other PP1 and PP2A-B56-binding motifs are also phospho-regulated.
Dynamic protein phosphorylation constitutes a fundamental regulatory mechanism in all organisms. Phosphoprotein phosphatase 4 (PP4) is a conserved and essential nuclear serine and threonine phosphatase. Despite the importance of PP4, general principles of substrate selection are unknown, hampering the study of signal regulation by this phosphatase. Here, we identify and thoroughly characterize a general PP4 consensus-binding motif, the FxxP motif. X-ray crystallography studies reveal that FxxP motifs bind to a conserved pocket in the PP4 regulatory subunit PPP4R3. Systems-wide in silico searches integrated with proteomic analysis of PP4 interacting proteins allow us to identify numerous FxxP motifs in proteins controlling a range of fundamental cellular processes. We identify an FxxP motif in the cohesin release factor WAPL and show that this regulates WAPL phosphorylation status and is required for efficient cohesin release. Collectively our work uncovers basic principles of PP4 specificity with broad implications for understanding phosphorylation-mediated signaling in cells.
The eukaryotic linear motif (ELM) resource is a repository of manually curated experimentally validated short linear motifs (SLiMs). Since the initial release almost 20 years ago, ELM has become an indispensable resource for the molecular biology community for investigating functional regions in many proteins. In this update, we have added 21 novel motif classes, made major revisions to 12 motif classes and added >400 new instances mostly focused on DNA damage, the cytoskeleton, SH2-binding phosphotyrosine motifs and motif mimicry by pathogenic bacterial effector proteins. The current release of the ELM database contains 289 motif classes and 3523 individual protein motif instances manually curated from 3467 scientific publications. ELM is available at: http://elm.eu.org.
Modern biology produces data at a staggering rate. Yet much of these biological data are still isolated in the text, figures, tables and supplementary materials of articles. As a result, biological information created at great expense is significantly underutilised. The protein motif biology field does not have sufficient resources to curate the corpus of motif-related literature and, to date, only a fraction of the available articles have been curated. In this study, we develop a set of tools and a web resource, 'articles.ELM', to rapidly identify the motif literature articles pertinent to a researcher's interest. At the core of the resource is a manually curated set of about 8000 motif-related articles. These articles are automatically annotated with a range of relevant biological data, allowing in-depth search functionality. Machine-learning article classification is used to group articles based on their similarity to manually curated motif classes in the Eukaryotic Linear Motif resource. Articles can also be manually classified within the resource. The 'articles.ELM' resource permits the rapid and accurate discovery of relevant motif articles, thereby improving the visibility of motif literature and simplifying the recovery of valuable biological insights sequestered within scientific articles. Consequently, this web resource removes a critical bottleneck in scientific productivity for the motif biology field. Database URL: http://slim.icr.ac.uk/articles/.
Short linear motifs (SLiMs) drive dynamic protein-protein interactions essential for signaling, but sequence degeneracy and low binding affinities make them difficult to identify. We harnessed unbiased systematic approaches for SLiM discovery to elucidate the regulatory network of calcineurin (CN)/PP2B, the Ca<sup>2+</sup>-activated phosphatase that recognizes LxVP and PxIxIT motifs. In vitro proteome-wide detection of CN-binding peptides, in vivo SLiM-dependent proximity labeling, and in silico modeling of motif determinants uncovered unanticipated CN interactors, including NOTCH1, which we establish as a CN substrate. Unexpectedly, CN shows SLiM-dependent proximity to centrosomal and nuclear pore complex (NPC) proteins, structures where Ca<sup>2+</sup> signaling is largely uncharacterized. CN dephosphorylates human and yeast NPC proteins and promotes accumulation of a nuclear transport reporter, suggesting conserved NPC regulation by CN. The CN network assembled here provides a resource to investigate Ca<sup>2+</sup> and CN signaling and demonstrates synergy between experimental and computational methods, establishing a blueprint for examining SLiM-based networks.
Protein kinase B (AKT1) is a central node in a signaling pathway that regulates cell survival. The diverse pathways regulated by AKT1 are communicated in the cell via the phosphorylation of perhaps more than 100 cellular substrates. AKT1 is itself activated by phosphorylation at Thr-308 and Ser-473. Despite the fact that these phosphorylation sites are biomarkers for cancers and tumor biology, their individual roles in shaping AKT1 substrate selectivity are unknown. We recently developed a method to produce AKT1 with programmed phosphorylation at either or both of its key regulatory sites. Here, we used both defined and randomized peptide libraries to map the substrate selectivity of site-specific, singly and doubly phosphorylated AKT1 variants. To globally quantitate AKT1 substrate preferences, we synthesized three AKT1 substrate peptide libraries: one based on 84 "known" substrates and two independent and larger oriented peptide array libraries (OPALs) of ∼10<sup>11</sup> peptides each. We found that each phospho-form of AKT1 has common and distinct substrate requirements. Compared with pAKT1<sup>T308</sup>, the addition of Ser-473 phosphorylation increased AKT1 activities on some, but not all of its substrates. This is the first report that Ser-473 phosphorylation can positively or negatively regulate kinase activity in a substrate-dependent fashion. Bioinformatics analysis indicated that the OPAL-activity data effectively discriminate known AKT1 substrates from closely related kinase substrates. Our results also enabled predictions of novel AKT1 substrates that suggest new and expanded roles for AKT1 signaling in regulating cellular processes.
The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore helping to illuminate the 'dark' proteome.
Many protein-modifying enzymes recognize their substrates via docking motifs, but the range of functionally permissible motif sequences is often poorly defined. During eukaryotic cell division, cyclin-specific docking motifs help cyclin-dependent kinases (CDKs) phosphorylate different substrates at different stages, thus enforcing a temporally ordered series of events. In budding yeast, CDK substrates with Leu/Pro-rich (LP) docking motifs are recognized by Cln1/2 cyclins in late G1 phase, yet the key sequence features of these motifs were unknown. Here, we comprehensively analyze LP motif requirements in vivo by combining a competitive growth assay with deep mutational scanning. We quantified the effect of all single-residue replacements in five different LP motifs by using six distinct G1 cyclins from diverse fungi including medical and agricultural pathogens. The results uncover substantial tolerance for deviations from the consensus sequence, plus requirements at some positions that are contingent on the favorability of other motif residues. They also reveal the basis for variations in functional potency among wild-type motifs, and allow derivation of a quantitative matrix that predicts the strength of other candidate motif sequences. Finally, we find that variation in docking motif potency can advance or delay the time at which CDK substrate phosphorylation occurs, and thereby control the temporal ordering of cell cycle regulation. The overall results provide a general method for surveying viable docking motif sequences and quantifying their potency in vivo, and they reveal how variations in docking strength can tune the degree and timing of regulatory modifications.
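Applying a quantitative matrix of the kind described above to new candidate sequences usually amounts to summing position-specific contributions. The sketch below assumes an additive model and uses an entirely hypothetical four-position matrix with a fixed penalty for residues that are not listed; the real LP-motif matrix, its width and its values are not reproduced here.

```python
from typing import Dict, List

# Entirely hypothetical matrix: one dict of residue contributions per motif position.
MATRIX: List[Dict[str, float]] = [
    {"L": 1.0, "I": 0.6},
    {"P": 1.5},
    {"L": 0.5, "I": 0.4, "V": 0.3},
    {"P": 1.0, "A": 0.2},
]


def motif_strength(candidate: str, matrix: List[Dict[str, float]], default: float = -1.0) -> float:
    """Additive score: sum of per-position contributions, `default` for unlisted residues."""
    if len(candidate) != len(matrix):
        raise ValueError("candidate length must match the matrix width")
    return sum(column.get(residue, default) for residue, column in zip(candidate, matrix))


for candidate in ("LPLP", "IPVA", "LALP"):
    print(candidate, motif_strength(candidate, MATRIX))
```

An additive matrix scores each position independently, so on its own it cannot capture the contingent requirements between positions reported above; pairwise terms would be needed for that.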
The MobiDB database (URL: https://mobidb.org/) provides predictions and annotations for intrinsically disordered proteins. Here, we report recent developments implemented in MobiDB version 4 concerning the database format, novel types of annotations and an improved update process. The new website includes a redesigned user interface, a more effective search engine and an advanced API for programmatic access. The new database schema gives users more flexibility and simplifies maintenance and updates. In addition, the new entry page provides more visualisation tools, including a customizable feature viewer and graphs of the residue contact maps. MobiDB v4 annotates the binding modes of disordered proteins, whether they undergo disorder-to-order transitions or remain disordered in the bound state. In addition, disordered regions undergoing liquid-liquid phase separation or post-translational modifications are defined. The integrated information is presented in a simplified interface, which enables faster searches and allows large customized datasets to be downloaded in TSV, FASTA or JSON formats. An alternative advanced interface allows users to drill deeper into features of interest. A new statistics page provides information at database and proteome levels. The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.
The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, a refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations are provided by a newly implemented reviewing process and the training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.
Almost twenty years after its initial release, the Eukaryotic Linear Motif (ELM) resource remains an invaluable source of information for the study of motif-mediated protein-protein interactions. ELM provides a comprehensive, regularly updated and well-organised repository of manually curated, experimentally validated short linear motifs (SLiMs). An increasing number of SLiM-mediated interactions are discovered each year and keeping the resource up-to-date continues to be a great challenge. In the current update, 30 novel motif classes have been added and five existing classes have undergone major revisions. The update includes 411 new motif instances mostly focused on cell-cycle regulation, control of the actin cytoskeleton, membrane remodelling and vesicle trafficking pathways, liquid-liquid phase separation and integrin signalling. Many of the newly annotated motif-mediated interactions are targets of pathogenic motif mimicry by viral, bacterial or eukaryotic pathogens, providing invaluable insights into the molecular mechanisms underlying infectious diseases. The current ELM release includes 317 motif classes incorporating 3934 individual motif instances manually curated from 3867 scientific publications. ELM is available at: http://elm.eu.org.
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a 'Bar Code' format, which also displays known instances from homologous proteins through a novel 'Instance Mapper' protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
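The basic operation behind suggesting candidate instances, scanning a sequence with a motif regular expression and discarding matches outside predicted disorder, can be sketched with Python's `re` module. The FxxP-style pattern `F..P` is used purely as an illustration, the disorder mask is assumed to come from some external predictor, and this is a simplification of the context filtering that ELM performs.

```python
import re
from typing import List, Tuple


def find_candidates(sequence: str, pattern: str, disordered: List[bool]) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for motif matches lying entirely in disordered residues."""
    if len(disordered) != len(sequence):
        raise ValueError("the disorder mask needs one entry per residue")
    hits = []
    # Wrapping the pattern in a zero-width lookahead also reports overlapping matches.
    for match in re.finditer(f"(?=({pattern}))", sequence):
        start, text = match.start(), match.group(1)
        if all(disordered[start:start + len(text)]):
            hits.append((start + 1, text))
    return hits


sequence = "MFAAPGGFSTPK"
mask = [False] * 6 + [True] * 6  # hypothetical predictor output: C-terminal half disordered
print(find_candidates(sequence, "F..P", mask))  # [(8, 'FSTP')]
```

The first match, `FAAP`, falls in the region marked as ordered and is discarded, which is the kind of false positive that context filtering is meant to remove.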
Specific protein-protein interactions are central to all processes that underlie cell physiology. Numerous studies have together identified hundreds of thousands of human protein-protein interactions. However, many interactions remain to be discovered, and low-affinity, conditional, and cell type-specific interactions are likely to be disproportionately underrepresented. Here, we describe an optimized proteomic peptide-phage display library that tiles all disordered regions of the human proteome and allows the screening of ~1,000,000 overlapping peptides in a single binding assay. We define guidelines for processing, filtering, and ranking the results and provide PepTools, a toolkit to annotate the identified hits. We uncovered >2,000 interaction pairs for 35 known short linear motif (SLiM)-binding domains and confirmed the quality of the produced data by complementary biophysical or cell-based assays. Finally, we show how the amino acid-resolution binding site information can be used to pinpoint functionally important disease mutations and phosphorylation events in intrinsically disordered regions of the proteome. The optimized human disorderome library paired with PepTools represents a powerful pipeline for unbiased proteome-wide discovery of SLiM-based interactions.
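Tiling disordered regions with overlapping peptides, as in the library described above, can be sketched as a sliding window. The peptide length of 16 and step of 4 (a 12-residue overlap) are illustrative defaults assumed here, and region boundaries are taken to come from an upstream disorder annotation; the sketch adds one extra peptide when needed so that the C-terminal end of each region is covered.

```python
from typing import List, Tuple


def tile_region(sequence: str, start: int, end: int,
                length: int = 16, step: int = 4) -> List[Tuple[int, str]]:
    """Overlapping peptides covering sequence[start:end] (0-based, end exclusive).

    Returns (0-based start, peptide) pairs; a region shorter than `length`
    yields a single shorter peptide.
    """
    region = sequence[start:end]
    if not region:
        return []
    if len(region) <= length:
        return [(start, region)]
    offsets = list(range(0, len(region) - length + 1, step))
    if offsets[-1] != len(region) - length:
        offsets.append(len(region) - length)  # ensure the region's C-terminus is covered
    return [(start + o, region[o:o + length]) for o in offsets]


protein = "ACDEFGHIKLMNPQRSTVWY" * 3  # arbitrary toy sequence
for pos, peptide in tile_region(protein, 10, 36):
    print(pos, peptide)
```

Overlap between neighbouring peptides means that a short motif is likely to appear intact in at least one peptide, and comparing signals across overlapping hits helps narrow down where the binding site lies.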
Viral proteins make extensive use of short peptide interaction motifs to hijack cellular host factors. However, most current large-scale methods do not identify this important class of protein-protein interactions. Uncovering peptide-mediated interactions provides both a molecular understanding of viral interactions with their host and the foundation for developing novel antiviral reagents. Here we describe a viral peptide discovery approach covering 23 coronavirus strains that provides high-resolution information on direct virus-host interactions. We identify 269 peptide-based interactions for 18 coronaviruses, including a specific interaction between the human G3BP1/2 proteins and a ΦxFG peptide motif in the SARS-CoV-2 nucleocapsid (N) protein. This interaction supports viral replication, and through its ΦxFG motif N rewires the G3BP1/2 interactome to disrupt stress granules. A peptide-based inhibitor disrupting the G3BP1/2-N interaction dampened SARS-CoV-2 infection, showing that our results can be directly translated into novel specific antiviral reagents.
Intrinsically disordered proteins, defying the traditional protein structure-function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has F<sub>max</sub> = 0.483 on the full dataset and F<sub>max</sub> = 0.792 after bona fide structured regions are filtered out. Disordered binding regions remain hard to predict, with F<sub>max</sub> = 0.231. Interestingly, computing times can vary by up to four orders of magnitude between methods.
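F<sub>max</sub>, the headline metric above, is the best F1 score over all decision thresholds applied to a predictor's residue scores. A minimal Python sketch, assuming scores and reference labels for all residues have been pooled into two flat lists (only one of several possible ways to aggregate over proteins, and not necessarily the one CAID uses):

```python
from typing import Sequence


def f_max(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Maximum F1 over all thresholds; a residue is predicted positive when score >= threshold."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive residue is required")
    best = 0.0
    for threshold in sorted(set(scores)):
        predicted = [score >= threshold for score in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        if tp == 0:
            continue  # F1 is zero at this threshold
        precision = tp / (tp + fp)
        recall = tp / positives
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


scores = [0.95, 0.80, 0.40, 0.30, 0.10]
labels = [True, False, True, False, False]
print(f_max(scores, labels))
```

Because the threshold is chosen after the fact for each method, F<sub>max</sub> measures how well a predictor ranks residues rather than how well its default cut-off is calibrated.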
Hub proteins participate in cellular regulation by dynamic binding of multiple proteins within interaction networks. The hub protein LC8 reversibly interacts with more than 100 partners through a flexible pocket at its dimer interface. To explore the diversity of the LC8 partner pool, we screened for LC8 binding partners using a proteomic phage display library composed of peptides from the human proteome, which had no bias toward a known LC8 motif. Of the identified hits, we validated binding of 29 peptides using isothermal titration calorimetry. Of the 29 peptides, 19 were entirely novel, and all had the canonical TQT motif anchor. A striking observation is that numerous peptides containing the TQT anchor do not bind LC8, indicating that residues outside of the anchor facilitate LC8 interactions. Using both LC8-binding and nonbinding peptides containing the motif anchor, we developed the "LC8Pred" algorithm that identifies critical residues flanking the anchor and parses random sequences to predict LC8-binding motifs with ∼78% accuracy. Our findings significantly expand the scope of the LC8 hub interactome.
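Because all validated binders carry the TQT anchor but many TQT-containing peptides do not bind, a predictor of this kind first needs to locate each anchor and collect its flanking residues for scoring. The sketch below performs only that extraction step. The flank size and the function name are hypothetical, and the scoring model that LC8Pred applies to the flanks is not reproduced here.

```python
import re

# Lookahead so that overlapping anchors (e.g. "TQTQT") are all reported.
TQT_ANCHOR = re.compile(r"(?=TQT)")


def find_tqt_anchors(sequence, flank=4):
    """Return (0-based anchor start, window) with up to `flank` residues on each side."""
    hits = []
    for match in TQT_ANCHOR.finditer(sequence):
        start = match.start()
        lo = max(0, start - flank)
        hi = min(len(sequence), start + 3 + flank)
        hits.append((start, sequence[lo:hi]))
    return hits


print(find_tqt_anchors("AAKATQTDAA", flank=2))  # [(4, 'KATQTDA')]
```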
The scaffold protein liprin-α1 is required to assemble dynamic plasma membrane-associated platforms (PMAPs) at the front of migrating breast cancer cells, to promote protrusion and invasion. We show that the N-terminal region of liprin-α1 contains an LxxIxE motif that interacts with B56 regulatory subunits of serine/threonine protein phosphatase 2A (PP2A). The specific interaction of B56γ with liprin-α1 requires an intact motif, since two point mutations strongly reduce the interaction. B56γ mediates the interaction of liprin-α1 with the heterotrimeric PP2A holoenzyme. Most B56γ protein is recovered in the cytosolic fraction of invasive MDA-MB-231 breast cancer cells, where B56γ is complexed with liprin-α1. While mutation of the short linear motif (SLiM) does not affect localization of liprin-α1 to PMAPs, localization of B56γ at these sites specifically requires liprin-α1. Silencing of B56γ or liprin-α1 inhibits cell spreading on extracellular matrix, invasion, motility and lamellipodia dynamics to a similar extent in migrating MDA-MB-231 cells, suggesting that B56γ/PP2A is a novel component of the PMAPs machinery regulating tumor cell motility. Consistent with this, inhibition of cell spreading by silencing liprin-α1 is not rescued by expression of a B56γ-binding-defective liprin-α1 mutant. We propose that liprin-α1-mediated recruitment of PP2A via B56γ regulates cell motility by controlling protrusion in migrating MDA-MB-231 cells.
Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure prediction have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications as well as experimentally determined structures, provided the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.
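A consensus such as LxxIxE can be written as a regular expression in which each x is a wildcard position. The sketch below scans a sequence for the literal pattern only; the test sequence and function name are invented for illustration, and functional B56-binding motifs may tolerate substitutions and depend on flanking context, so a regex hit is only a candidate.

```python
import re

# Literal LxxIxE pattern: L, any two residues, I, any residue, E.
# A lookahead is used so that overlapping occurrences are all reported.
LXXIXE = re.compile(r"(?=(L..I.E))")


def find_lxxixe(sequence):
    """Return (0-based start, matched peptide) for every LxxIxE occurrence."""
    return [(m.start(), m.group(1)) for m in LXXIXE.finditer(sequence)]


print(find_lxxixe("MSLSPIEELKAIVE"))  # [(2, 'LSPIEE'), (8, 'LKAIVE')]
```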
Short linear motif (SLiM)-mediated interactions offer a unique strategy for viral intervention due to their compact interfaces, ease of convergent evolution, and key functional roles. Consequently, many viruses extensively mimic host SLiMs to hijack or deregulate cellular pathways and the same motif-binding pocket is often targeted by numerous unrelated viruses. A toolkit of therapeutics targeting commonly mimicked SLiMs could provide prophylactic and therapeutic broad-spectrum antivirals and vastly improve our ability to treat ongoing and future viral outbreaks. In this opinion article, we discuss the therapeutic relevance of SLiMs, advocating their suitability as targets for broad-spectrum antiviral inhibitors.
Short linear motifs (SLiMs) are a unique and ubiquitous class of protein interaction modules that perform key regulatory functions and drive dynamic complex formation. For decades, interactions mediated by SLiMs have accumulated through detailed low-throughput experiments. Recent methodological advances have opened this previously underexplored area of the human interactome to high-throughput protein-protein interaction discovery. In this article, we argue that SLiM-based interactions represent a significant blind spot in current interactomics data, introduce the key methods that are illuminating the elusive SLiM-mediated interactome of the human cell on a large scale, and consider the implications for the field.
Viruses mimic host short linear motifs (SLiMs) to hijack and deregulate cellular functions. Studies of motif-mediated interactions therefore provide insight into virus-host dependencies and reveal targets for therapeutic intervention. Here, we describe the pan-viral discovery of 1712 SLiM-based virus-host interactions using a phage peptidome tiling the intrinsically disordered protein regions of 229 RNA viruses. We find mimicry of host SLiMs to be a ubiquitous viral strategy, reveal novel host proteins hijacked by viruses, and identify cellular pathways frequently deregulated by viral motif mimicry. Using structural and biophysical analyses, we show that viral mimicry-based interactions have binding strengths and bound conformations similar to those of endogenous interactions. Finally, we establish polyadenylate-binding protein 1 as a potential target for broad-spectrum antiviral agent development. Our platform enables rapid discovery of mechanisms of viral interference and the identification of potential therapeutic targets that can aid in combating future epidemics and pandemics.
Phosphorylation is a ubiquitous post-translational modification that regulates protein function by promoting, inhibiting or modulating protein-protein interactions. Hundreds of thousands of phosphosites have been identified, but the vast majority have not been functionally characterised, and it remains a challenge to decipher which phosphorylation events modulate interactions. We generated a phosphomimetic proteomic peptide-phage display library to screen for phosphosites that modulate short linear motif-based interactions. The peptidome covers ~13,500 phospho-serine/threonine sites found in the intrinsically disordered regions of the human proteome. Each phosphosite is represented as both a wild-type and a phosphomimetic variant. We screened 71 protein domains to identify 248 phosphosites that modulate motif-mediated interactions. Affinity measurements confirmed the phospho-modulation of 14 out of 18 tested interactions. We performed a detailed follow-up on a phospho-dependent interaction between clathrin and the mitotic spindle protein hepatoma-upregulated protein (HURP), demonstrating that the phospho-dependency is essential to the mitotic function of HURP. Structural characterisation of the clathrin-HURP complex elucidated the molecular basis for the phospho-dependency. Our work showcases the power of phosphomimetic ProP-PD to discover novel phospho-modulated interactions required for cellular function.
Phosphoprotein phosphatases (PPPs) regulate major signaling pathways, but the determinants of phosphatase specificity are poorly understood, largely because methods to investigate specificity at scale are lacking. Here, we develop a novel in vitro assay, MRBLE:Dephos, that allows multiplexing of dephosphorylation reactions to determine phosphatase preferences. Using MRBLE:Dephos, we establish amino acid preferences of the residues surrounding the dephosphorylation site for PP1 and PP2A-B55, revealing both common and unique preferences. To compare the MRBLE:Dephos results to cellular substrates, we focused on mitotic exit, which requires extensive dephosphorylation by PP1 and PP2A-B55. We use specific inhibition of PP1 and PP2A-B55 in mitotic exit lysates coupled with phosphoproteomics to identify more than 2,000 regulated sites. Importantly, the sites dephosphorylated during mitotic exit reveal key signatures that are consistent with MRBLE:Dephos. Furthermore, integration of our phosphoproteomic data with mitotic interactomes of PP1 and PP2A-B55 provides insight into how binding of phosphatases to substrates shapes dephosphorylation. Collectively, we develop novel approaches to investigate protein phosphatases that provide insight into mitotic exit regulation.
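Each phosphosite in the library appears twice: once as the wild-type peptide and once with the phosphorylatable residue replaced by a phosphomimetic residue. A minimal sketch of generating such a pair follows. Substituting glutamate (E) is an assumption for illustration (aspartate is also commonly used as a phosphomimetic), and the helper name and example peptide are hypothetical.

```python
def phosphomimetic_pair(peptide, site_index, mimic="E"):
    """Return (wild_type, phosphomimetic) versions of a peptide.

    site_index is the 0-based position of the phospho-serine/threonine;
    mimic is the substituting residue (glutamate by default, an assumption).
    """
    if not 0 <= site_index < len(peptide):
        raise IndexError("site_index outside the peptide")
    residue = peptide[site_index]
    if residue not in ("S", "T"):
        raise ValueError(f"position {site_index} is {residue!r}, not S or T")
    mutant = peptide[:site_index] + mimic + peptide[site_index + 1:]
    return peptide, mutant


print(phosphomimetic_pair("RPLSPEK", 3))  # ('RPLSPEK', 'RPLEPEK')
```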
Short Linear Motifs (SLiMs) are the smallest structural and functional components of modular eukaryotic proteins. They are also the most abundant, especially when considering post-translational modifications. As well as being found throughout the cell as part of regulatory processes, SLiMs are extensively mimicked by intracellular pathogens. At the heart of the Eukaryotic Linear Motif (ELM) Resource is a representative (not comprehensive) database. The ELM entries are created by a growing community of skilled annotators and provide an introduction to linear motif functionality for biomedical researchers. The 2024 ELM update includes 346 novel motif instances in areas ranging from innate immunity to both protein and RNA degradation systems. In total, 39 classes of newly annotated motifs have been added, and another 17 existing entries have been updated in the database. The 2024 ELM release now includes 356 motif classes incorporating 4283 individual motif instances manually curated from 4274 scientific publications and including >700 links to experimentally determined 3D structures. In a recent development, the InterPro protein module resource now also includes ELM data. ELM is available at: http://elm.eu.org.
The virus life cycle depends on host-virus protein-protein interactions, which often involve a disordered protein region binding to a folded protein domain. Here, we used proteomic peptide phage display (ProP-PD) to identify peptides from the intrinsically disordered regions of the human proteome that bind to folded protein domains encoded by the SARS-CoV-2 genome. Eleven folded domains of SARS-CoV-2 proteins were found to bind 281 peptides from human proteins, and affinities of 31 interactions involving eight SARS-CoV-2 protein domains were determined (KD ∼ 7-300 μM). Key specificity residues of the peptides were established for six of the interactions. Two of the peptides, binding Nsp9 and Nsp16, respectively, inhibited viral replication. Our findings demonstrate how high-throughput peptide binding screens simultaneously identify potential host-virus interactions and peptides with antiviral properties. Furthermore, the high number of low-affinity interactions suggests that overexpression of viral proteins during infection may perturb multiple cellular pathways.
The identification of protein surfaces required for interaction with other biomolecules broadens our understanding of protein function, their regulation by post-translational modification, and the deleterious effect of disease mutations. Protein interaction interfaces are often identifiable as patches of conserved residues on a protein's surface. However, finding conserved accessible surfaces on folded regions requires an understanding of the protein structure to discriminate between functional and structural constraints on residue conservation. With the emergence of deep learning methods for protein structure prediction, high-quality structural models are now available for any protein. In this study, we introduce tools to identify conserved surfaces on AlphaFold2 structural models. We define autonomous structural modules from the structural models and convert these modules to a graph encoding residue topology, accessibility, and conservation. Conserved surfaces are then extracted using a novel eigenvector centrality-based approach. We apply the tool to the human proteome identifying hundreds of uncharacterised yet highly conserved surfaces, many of which contain clinically significant mutations. The xProtCAS tool is available as open-source Python software and an interactive web server.
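The graph step can be illustrated with a small sketch. Here each residue is a node, residues in spatial contact are joined by an edge weighted by the product of their conservation scores, and eigenvector centrality is computed by power iteration. The weighting scheme, the contact list and the function names are assumptions for illustration; the actual xProtCAS graph encoding (which also uses accessibility) and its surface-extraction rules are not reproduced.

```python
import math


def conservation_graph(n_residues, contacts, conservation):
    """Weighted adjacency matrix: edge weight = product of the two residues' conservation."""
    adjacency = [[0.0] * n_residues for _ in range(n_residues)]
    for i, j in contacts:
        weight = conservation[i] * conservation[j]
        adjacency[i][j] = adjacency[j][i] = weight
    return adjacency


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-12):
    """Power iteration on (A + I); the shift avoids oscillation on bipartite graphs.

    A + I has the same eigenvectors as A, so the result is A's leading eigenvector.
    """
    n = len(adjacency)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        y = [x[i] + sum(adjacency[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Toy structure: a conserved triangle (0, 1, 2) plus a poorly conserved residue 3.
adj = conservation_graph(4, [(0, 1), (1, 2), (0, 2), (2, 3)], [1.0, 1.0, 1.0, 0.1])
scores = eigenvector_centrality(adj)
print(max(range(4), key=scores.__getitem__))  # 2
```

Residues that sit in densely connected, conserved neighbourhoods receive the highest scores, which is why centrality is a natural way to pick out the core of a conserved surface patch.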
Viruses are obligate intracellular parasites that exploit the host cellular machinery to replicate their genome. During the viral life cycle, viruses manipulate the host cell through interactions with host proteins. Many of these protein-protein interactions are mediated through the recognition of host globular domains by short linear motifs (SLiMs), or longer intrinsically disordered domains (IDDs), in the disordered regions of viral proteins. However, viruses also employ their own globular domains to bind SLiMs and IDDs present in host proteins or viral proteins. In this review, we focus on the different strategies adopted by viruses to use proteins or protein domains for binding to the disordered regions of human and/or viral ligands. With a set of examples, we describe viral domains that bind human SLiMs. We also provide examples of viral proteins that bind to SLiMs, or IDDs, of viral proteins as part of complex assembly and regulation of protein functions. These protein-protein interactions are often crucial for viral replication and may thus offer possibilities for innovative inhibitor design.
Whole-genome and exome sequencing studies are reporting hundreds of thousands of missense mutations. Taking a pan-disease approach, we explored how mutations in intrinsically disordered regions (IDRs) break or generate protein interactions mediated by short linear motifs. We created a peptide-phage display library tiling ~57,000 peptides from the IDRs of the human proteome overlapping 12,301 single nucleotide variants associated with diverse phenotypes, including cancer, metabolic diseases and neurological diseases. By screening 80 human proteins, we identified 366 mutation-modulated interactions, with half of the mutations diminishing binding and half enhancing binding or creating novel interaction interfaces. The effects of the mutations were confirmed by affinity measurements. In cellular assays, the effects of motif-disruptive mutations were validated, including loss of a nuclear localisation signal in the cell division control protein CDC45 by a mutation associated with Meier-Gorlin syndrome. The study provides insights into how disease-associated mutations may perturb and rewire the motif-based interactome.
Several novel high-throughput experimental techniques have been developed in recent years that generate large datasets of putative biologically functional peptides. However, many of the computational tools required to process these datasets have not yet been created. In this study, we introduce FaSTPACE, a fast and scalable computational tool to rapidly align short peptides and extract enriched specificity determinants. The tool aligns peptides in a pairwise manner to produce a position-specific global similarity matrix for each peptide. Peptides are then realigned iteratively: each updated alignment is scored against the global similarity matrices of the peptides, and the global similarity matrices are updated based on the new alignment. The method iterates until the global similarity matrices converge. Finally, an alignment and consensus motif are extracted from the resulting global similarity matrices. The tool is the first to support custom weighting of input peptides, addressing the pressing need to include experimental attributes that encode peptide confidence in specificity determinant extraction. FaSTPACE exhibited state-of-the-art performance and accuracy when benchmarked against similar tools on motif datasets generated using curated peptides and high-throughput data from proteomic peptide phage display. FaSTPACE is available as an open-source Python package and a web server.
Translesion DNA synthesis (TLS) is a cellular process that enables the bypass of DNA lesions encountered during DNA replication and is emerging as a primary target of chemotherapy. Among vertebrate DNA polymerases, polymerase κ (Polκ) has the distinctive ability to bypass minor groove DNA adducts in vitro. However, Polκ is also required for cells to overcome major groove DNA adducts, but the basis of this requirement is unclear. Here, we combine CRISPR base-editor screening technology in human cells with TLS analysis of defined DNA lesions in Xenopus egg extracts to unravel the functions and regulation of Polκ during lesion bypass. Strikingly, we show that Polκ has two main functions during TLS, which are differentially regulated by Rev1 binding. On the one hand, Polκ is essential to replicate across a minor groove DNA lesion in a process that depends on PCNA ubiquitylation but is independent of Rev1. On the other hand, through its cooperative interaction with Rev1 and ubiquitylated PCNA, Polκ appears to stabilize the Rev1-Polζ extension complex on DNA to allow extension past major groove DNA lesions and abasic sites, in a process that is independent of Polκ's catalytic activity. Together, our work identifies catalytic and noncatalytic functions of Polκ in TLS and reveals important regulatory mechanisms underlying the unique domain architecture present at the C-terminal end of Y-family TLS polymerases.
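The final consensus-extraction step, and the effect of per-peptide weights on it, can be illustrated with a much simpler sketch. It assumes the peptides are already aligned to equal length and applies a weighted majority rule with a hypothetical 50% cut-off, writing x for positions without a dominant residue. It does not implement FaSTPACE's iterative global-similarity-matrix alignment, and the function name is hypothetical.

```python
from collections import defaultdict


def weighted_consensus(aligned, weights=None, min_fraction=0.5, gap="-"):
    """Consensus string from equal-length aligned peptides with optional per-peptide weights.

    A position is reported as 'x' (any residue) when its most heavily weighted
    residue carries less than `min_fraction` of the total weight.
    """
    if not aligned:
        raise ValueError("no peptides supplied")
    if weights is None:
        weights = [1.0] * len(aligned)
    length = len(aligned[0])
    if any(len(p) != length for p in aligned):
        raise ValueError("peptides must be aligned to the same length")
    total = sum(weights)
    consensus = []
    for pos in range(length):
        counts = defaultdict(float)
        for peptide, weight in zip(aligned, weights):
            if peptide[pos] != gap:
                counts[peptide[pos]] += weight
        residue, score = max(counts.items(), key=lambda kv: kv[1], default=("x", 0.0))
        consensus.append(residue if score / total >= min_fraction else "x")
    return "".join(consensus)


peptides = ["LPPEL", "LAPEL", "LQPDV"]
print(weighted_consensus(peptides))                     # LxPEL
print(weighted_consensus(peptides, weights=[1, 1, 5]))  # LQPDV
```

Up-weighting a single high-confidence peptide shifts the consensus towards it, which is the kind of behaviour experimental confidence weights are meant to capture.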
Motivation: Short linear motifs (SLiMs) are compact functional modules that mediate low-affinity protein-protein interactions. SLiMs direct the function of many dynamic signalling and regulatory complexes, playing a central role in most biological processes of the cell. Motif-binding determinants describe the contribution of each residue in a motif-containing peptide to the affinity and specificity of binding to the motif-binding partner. Motif-binding determinants are generally defined as a motif consensus pattern or a position-specific scoring matrix (PSSM) encoding quantitative preferences. Motif-binding determinant comparison is an important motif analysis task and can be applied to motif annotation, classification, clustering, discovery and benchmarking. Currently, binding determinant comparison is generally performed by analysing consensus similarity; however, this ignores important quantitative information in both the consensus and non-consensus positions. Results: We have created a new tool, CompariPSSM, that quantifies the similarity between motif-binding determinants using sliding window PSSM-PSSM comparison and scores PSSM similarity using a randomisation-based probabilistic framework. The tool has been benchmarked on curated data from the eukaryotic linear motif database and experimental data from proteomic peptide phage display. CompariPSSM can be used for peptide classification to validate motif classes, peptide clustering to group functionally related conserved disordered regions, and benchmarking experimental motif discovery methods. Availability and implementation: CompariPSSM is available at https://slim.icr.ac.uk/projects/comparipssm.
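A rough sketch of sliding-window PSSM-PSSM comparison with a randomisation-based significance estimate follows. The specific choices are assumptions for illustration, not CompariPSSM's published scoring: columns are compared by Pearson correlation, each alignment offset is scored by the mean column similarity over at least `min_overlap` overlapping columns, and the null distribution comes from shuffling residue scores within each column of the second PSSM. All function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def best_window_score(p, q, min_overlap=3):
    """Best mean column similarity over all offsets of PSSM q against PSSM p.

    Each PSSM is a list of columns; each column lists scores in a fixed residue order.
    """
    best = -math.inf
    for offset in range(-(len(q) - min_overlap), len(p) - min_overlap + 1):
        pairs = [(p[i], q[i - offset]) for i in range(len(p)) if 0 <= i - offset < len(q)]
        best = max(best, sum(pearson(a, b) for a, b in pairs) / len(pairs))
    return best


def compare_pssms(p, q, n_shuffles=200, min_overlap=3, seed=0):
    """Return (observed score, empirical p-value) for the similarity of p and q."""
    if min(len(p), len(q)) < min_overlap:
        raise ValueError("both PSSMs need at least min_overlap columns")
    rng = random.Random(seed)
    observed = best_window_score(p, q, min_overlap)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        # Null model: permute residue scores within each column of q.
        shuffled = [rng.sample(column, len(column)) for column in q]
        if best_window_score(p, shuffled, min_overlap) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the p-value estimate strictly positive.
    return observed, (at_least_as_good + 1) / (n_shuffles + 1)


# Toy example: a random 6-column PSSM over 20 residues compared with itself.
gen = random.Random(1)
pssm = [[gen.uniform(-2, 2) for _ in range(20)] for _ in range(6)]
score, p_value = compare_pssms(pssm, pssm)
print(round(score, 3), p_value < 0.05)  # 1.0 True
```

Comparing every column of both matrices, rather than only the consensus positions, is what lets this kind of score use the quantitative information that consensus matching discards.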