Beatrice Di Caro
Have you had your genome sequenced? If you had asked this question five years ago, people would have thought you were crazy. Today, genome sequencing is already for sale in supermarkets. Having your genome sequenced costs under $600, takes less than a week and can even be ordered online.
Fifteen years ago, the first ever genome sequencing project carried out on humans, called the Human Genome Project (HGP), was completed. It took more than 13 years to accomplish, cost more than $3 billion and drew on efforts from scientists and non-scientists around the world.
Since then, the entire landscape of genome sequencing has been revolutionised. It is hard to believe that we are now talking about sequencing the genomes of a million people. In fact, the rate of progress in genomics development resembles – if not surpasses – that of Moore’s Law in processor development. Tremendous advances have been made in the field of DNA sequencing – but this is just the beginning.
From HGP to personal genome, and from personal genome to genome for all
The completion of the HGP granted us knowledge of the entire structure of the human genome, including its length, sequences, number of protein-coding genes and the presence of a large number of repeats and noncoding sequences which were originally thought to be ‘junk DNA’.
Many questions remain unanswered, however. For instance, only about 1.5% of the genome encodes proteins, and the functions of most genes are unknown. In addition, most genes do not function independently, but seem to participate in complex pathways, networks and systems. The HGP also raised awareness that subtle genomic variations can be associated with human diseases, including but not limited to cancer and genetic diseases.
The genomes of all individuals are nearly identical, and it is the small differences among them (around 0.1% of the genome), such as a single letter change, that result in the differences between us. Understanding these differences is critical to understanding the principles and mechanisms of human health and disease. To do so, we need more genome sequencing datasets drawn from larger populations.
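To give a sense of scale, the 0.1% figure can be turned into a rough count of differing letters. This is only a back-of-the-envelope sketch: the genome length of about 3.2 billion base pairs is an assumption not stated in this article, and the function name is purely illustrative.

```python
# Rough estimate of how many base pairs differ between two people.
# ASSUMPTION: a human genome of roughly 3.2 billion base pairs
# (this figure is not given in the article).
GENOME_LENGTH_BP = 3_200_000_000

# From the text: around 0.1% of the genome differs between individuals.
DIFFERENCE_FRACTION = 0.001


def differing_base_pairs(genome_length_bp: int, fraction: float) -> int:
    """Return the approximate number of base pairs that differ."""
    return round(genome_length_bp * fraction)


estimate = differing_base_pairs(GENOME_LENGTH_BP, DIFFERENCE_FRACTION)
print(f"Approximate differing base pairs: {estimate:,}")
```

Under these assumptions, even a 0.1% difference corresponds to a few million letters, which is why large datasets are needed to tell meaningful variants from harmless ones.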
In 2008, five years after the HGP was completed, the first personal genome was sequenced. In the same year, the first Asian personal genome, known as ‘YH’, was published. A new era of large-scale personal genomics followed, bringing the first African personal genome, the first Korean personal genome and the first cancer patient genome.
As population studies have revealed the role played by rare genetic variants in individuals’ predisposition to disease, more and more physicians have begun to offer genetic screening to help with cancer diagnosis and to guide medication. Studies of complex traits and disorders have led to new discoveries about the genetic architecture of autism spectrum disorders, adiponectin levels (which are linked to obesity) and height.
Rare variants also determine individual responses to drug treatment. For example, only people with lung cancer who carry certain mutations in the gene EGFR will respond to treatment with tyrosine kinase inhibitors. Moreover, approximately 6% of Europeans carry an HLA-B allele (HLA-B*57:01) associated with a life-threatening hypersensitivity to the antiretroviral drug abacavir, which is used to treat HIV.
To reveal more about the genetic basis of disease, many government-funded, population-scale sequencing programmes have been launched. The 1000 Genomes Project, the 10K project and the Icelandic genome project have greatly improved our understanding of global patterns of human genetic variation, human evolutionary history and the genetic contributions to many diseases.
More results are expected soon from the UK’s 100,000 Genomes Project and from other large-scale sequencing plans begun in the US, Canada, France, Saudi Arabia, China, Korea and Australia that aim at sequencing from 100 to 1 million genomes.
However, as we embrace these advances in genomic medicine, we must also consider emerging problems caused by inconsistent results among underrepresented populations. Genetic research and disease treatments have traditionally been misled by genome datasets skewed towards well-studied groups, such as the Caucasian population. Solving this problem calls for an understanding of all genetic diversity – perhaps sequencing for all.
The sequencing race
The main factor stopping us from moving beyond million-level population genomics studies is the cost of sequencing. The cost per genome has fallen from $3 billion to less than $1,000 in the past 15 years, and is expected to fall a further tenfold within the next five years.
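The comparison with Moore’s Law made earlier can be checked with a little arithmetic on the figures above. The sketch below assumes a steady exponential decline and takes Moore’s Law to mean a doubling every two years; both are simplifying assumptions rather than claims from this article, and the function names are illustrative only.

```python
import math

# Figures from the text: about $3 billion down to about $1,000 in 15 years.
START_COST_USD = 3e9
END_COST_USD = 1e3
YEARS = 15

# ASSUMPTION: Moore's Law taken as one doubling every two years.
MOORE_DOUBLING_YEARS = 2.0


def halving_time_years(start_cost: float, end_cost: float, years: float) -> float:
    """Time for the cost to halve, assuming a steady exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings


def fold_change(years: float, doubling_time: float) -> float:
    """Improvement factor after `years` at one doubling per `doubling_time`."""
    return 2 ** (years / doubling_time)


seq_halving = halving_time_years(START_COST_USD, END_COST_USD, YEARS)
seq_fold = START_COST_USD / END_COST_USD
moore_fold = fold_change(YEARS, MOORE_DOUBLING_YEARS)

print(f"Sequencing cost halved roughly every {seq_halving * 12:.1f} months")
print(f"Sequencing improvement over {YEARS} years: {seq_fold:,.0f}-fold")
print(f"Moore's Law improvement over {YEARS} years: {moore_fold:,.0f}-fold")
```

On these assumptions, sequencing costs halved in well under a year on average, against two years for a Moore’s Law doubling. The real decline was not smooth, so the halving time is an average and not a constant rate.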
The original Sanger sequencing method, developed by Frederick Sanger in 1977, took weeks to sequence a short DNA fragment. A highly automated, higher-throughput, capillary-based Sanger sequencer was then invented in 1986; it enabled 96 DNA fragments to be sequenced at the same time, with read lengths of 1,000 base pairs and, on the latest version of the sequencer, a price of less than $1 per base pair.
Since 2005, the development of short-read sequencing technology (also called next-generation sequencing, or NGS) has cut the cost of sequencing by a factor of thousands and improved throughput on a similar scale.
In 2010, as a complement to existing low-cost short-read sequencing technology, single-molecule long-read sequencers were invented. These can read DNA fragments 10,000 base pairs long, which improves sequencing coverage in regions of the genome that were previously inaccessible or difficult to analyse with short-read sequencers.
Due to the nature of the technology, however, long-read sequencers are limited in both efficiency and automation, while lower costs and higher throughputs can be achieved with short-read sequencers. The recent rapid development of image sensors has gone a long way to solving the efficiency problem, while developments in the field of AI and robotics are addressing the automation gap. We envision that the blossoming of technology within the genomics field will continue to push the frontier for ‘sequencing for all’ at affordable prices.
As the foundation of life’s information, our genome plays a very important role in health management, disease diagnosis and treatment. The accumulation of genomic big data will enable greater understanding of all aspects of health and disease. We believe that genome sequencing will become part of everyone’s life, and that everyone should get their own genome sequenced. So, when will you have your genome sequenced?