Image: An illustration of a DNA strand and human figure made up of code representing DNA. Credit: Dr James Wright
As you sit reading, nestled away in the heart of each of your roughly 30 trillion cells is a bundle of chemicals, known as DNA, encoding your unique source code, your genome. Written as a text document it would be less than 3 gigabytes. No more than a small file on your computer.
The difference between your code and that of the person next to you is only about 0.1 per cent, but that small difference makes all the difference. In fact, as we live our lives, exposing ourselves to the world, the source code in our cells can become mutated so that there are tiny differences between your lung source code and the source code in your foot.
In 2004, the human genome was published and launched a genomics revolution. It has opened the floodgates for discovery about a vast range of diseases, fashioning a new data-driven era for medicine and biology. That famous ‘human genome’, however, is not real. It does not represent any one person.
Instead, it is an amalgamation, a patchwork, of human genomes, the most common baseline version of Homo sapiens. The study of the human genome is known as ‘genomics’ and for some time has mainly relied on this baseline genome. Now the age of the personal genome is dawning. With faster and more accurate DNA sequencing technologies, we are examining individual human and cancer-specific genomes.
Our DNA source code is an instruction manual compiled over millions of years. It has different parts, including some structural bridging that holds the manual together. Some bits are obsolete instructions superseded or no longer needed. Other parts regulate how and when the source code is read.
Finally, there are the core instructions describing how to build the many different proteins our cells need. It is hard to overstate just how crucial a role proteins play in our health and very existence. They form all our cellular structure and machinery, they carry messages around, and they interact to produce the reactions that make a cell alive. Proteins are the building blocks of life and the study of proteins is called ‘proteomics’.
Variations in the code
Variations in our DNA source code make us each unique individuals because of their effect on the proteins encoded in our genome. A variant gene can change a protein’s effectiveness, abundance and interactions. Slight variations in sets of proteins during development in the womb define our many physical attributes and, to some extent, our personality.
The significant impact of variation in our source code and resulting proteins can also render us less or more susceptible to diseases and change how our bodies and cells respond to particular drugs and treatments.
As we go about our daily lives the source code in our cells acquires more of these variations through exposure to different environmental factors that damage and mutate our DNA. Our cells have developed sophisticated systems to detect this damage and either repair the DNA or push the self-destruct button in the affected cells.
However, some mutations manage to persist and when a bad combination of pre-existing variants and mutations occurs, cells can lose control of their tightly regulated cellular machinery, metamorphosing into dreaded cancer cells. Left unchecked, these cells spread out of control, forming a tumour, the malignancy and speed of which relate back to the individual’s source code and the specific mutations that have been acquired.
Cancer research is seeing a seismic shift towards integrated personal genomics and proteomics research in the rapidly developing field of proteogenomics. In this new cross-disciplinary field we are collecting and sequencing genomics data specific to individuals or specific subtypes of cancer and using this data to feed information about variant and mutated proteins into proteomic experiments and analyses.
These experiments are able to characterise and accurately measure the presence of personal or cancer-specific proteins. Proteogenomic approaches leverage huge quantities of data to find key mutations and variants in a cancer so that we can learn which ones are linked to disease and the efficacy of different treatments.
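To make the idea of feeding genomic variants into protein analysis concrete, here is a minimal Python sketch. It is an illustration only, not the method used in any real proteogenomic pipeline: the DNA sequence, the mutation and the tiny codon table are all invented for this example, and a real analysis would use the full genetic code and sequenced patient data.

```python
# Toy codon table covering only the codons used below (an assumption for
# brevity; the standard genetic code has 64 codons).
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "GGT": "G", "GAT": "D", "CTG": "L", "TAA": "*",
}


def translate(dna):
    """Translate a coding DNA sequence into a protein, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_substitution(dna, position, new_base):
    """Return a copy of the DNA with one base replaced (0-based position)."""
    return dna[:position] + new_base + dna[position + 1:]


def protein_differences(reference, variant):
    """List (1-based position, reference residue, variant residue) mismatches."""
    return [
        (i + 1, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


# Hypothetical reference gene and a hypothetical single-letter mutation.
REFERENCE_DNA = "ATGAAAGGTCTGTAA"
variant_dna = apply_substitution(REFERENCE_DNA, 7, "A")  # GGT becomes GAT

reference_protein = translate(REFERENCE_DNA)
variant_protein = translate(variant_dna)
print(reference_protein, variant_protein)
print(protein_differences(reference_protein, variant_protein))
```

A single changed letter in the source code swaps one building block of the protein. That altered protein sequence is the kind of information a proteomic experiment can then be told to look for.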
The wider context of this proteogenomic research is that it leads to tailored and personalised treatments. One hugely promising avenue of clinical research stemming from this is the development of immunotherapies, which can tune and train our own immune systems to target specific cancers.
Cells are constantly presenting fragments of proteins, known as peptides or antigens, on their outer surface. These peptides act to signal the state and health of the cell to passing immune cells. They may come from the proteins of viruses or bacteria that have infected the cell. Immune cells can detect these non-self peptides that the cell is holding up for inspection and learn to recognise them. This leads to an immunogenic response where the immune system attacks cells presenting these dangerous peptides to prevent the spread of infection.
The future of cancer therapy
Leveraging this system to direct the immune system to attack cancer cells, as it would cells infected with a virus, is the next big thing in cancer therapy. To be able to discover and develop cancer immunotherapies we need to detect cancer-specific antigen peptides and encourage their presentation to the immune system. Our source code, and that of every cancer, is different.
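The search for cancer-specific peptides can be pictured with a short Python sketch. It is a simplification under stated assumptions: both protein sequences are made up, the peptide length of nine is one typical choice rather than a rule, and the function names are hypothetical. It also ignores the hard part of the real problem, which is predicting which peptides a cell will actually present and which the immune system will respond to.

```python
def peptides(protein, length=9):
    """Return every fragment of the given length found in a protein."""
    return {protein[i:i + length] for i in range(len(protein) - length + 1)}


def cancer_specific_peptides(normal, tumour, length=9):
    """Return fragments found in the tumour protein but not in the normal one."""
    return sorted(peptides(tumour, length) - peptides(normal, length))


# Hypothetical protein from healthy tissue and its mutated tumour counterpart
# (one building block changed: R becomes L at position 11).
NORMAL_PROTEIN = "MKGLSTAWQERVNDPHFYCI"
TUMOUR_PROTEIN = "MKGLSTAWQELVNDPHFYCI"

candidates = cancer_specific_peptides(NORMAL_PROTEIN, TUMOUR_PROTEIN)
for peptide in candidates:
    print(peptide)
```

Every fragment printed overlaps the mutated position, so none of them appears in the healthy version of the protein. Fragments like these are the starting candidates; whether any would make a useful immunotherapy target is a separate experimental question.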
With rapid developments in personal proteogenomic research approaches, we can hunt down the variants and mutations specific to cancerous cells and use them to create new precision treatments. One of the biggest drawbacks of past approaches to cancer treatment has been their imprecision in targeting cancer cells. This results in burning down the forest to get rid of a few diseased trees, causing side effects and damaging quality of life. We want to move away from such approaches.
We look forward to a future that promises bespoke cancer treatment. One in which therapies are tailored to the specific cancer, the individual, and their personal source code.
This piece is shortlisted for the 2021 Mel Greaves Science Writing Prize.
Dr James Wright is a bioinformatician with over a decade of experience in genomics and proteomics research and has published widely in top research journals, including Science and Nature.
James specialises in proteogenomics and personal proteomics, using advanced computational and machine learning techniques in the investigation of multi-omics datasets to explore cancer development and the impact of novel cancer therapies.
In 2017, James joined Professor Jyoti Choudhary’s Functional Proteomics team at the ICR. He is also a Visiting Scientist at the European Bioinformatics Institute. Previously, he was principal bioinformatician at the Wellcome Trust Sanger Institute, a bioinformatician at AstraZeneca, and a member of the GENCODE Project, which led the comprehensive annotation of the human and mouse genomes. James was awarded a PhD in Techniques for Cross Species Proteomics jointly by the University of Manchester and University of Liverpool in 2010.