Zelalem H. Mekuria, DVM, PhD
Associate Editor, P2P e-Health Newsletter
Recently, I attended a scientific conference where a keynote presenter argued that the word “astronomical!”, which we colloquially use to describe exponential increases of “things”, is now an outdated concept, and he suggested the word “genomical!” instead. The proposal was well received by the crowd after he demonstrated that the sheer scale of genomic data produced in the past two decades exceeds anything else humans have generated in history.
Indeed, the last twenty years have been nothing less than the era of a genomic revolution. However, the story of DNA, and of our ability to read it, dates back more than a century. In fact, the 1910 Nobel Prize in Physiology or Medicine was awarded to Albrecht Kossel for his work on cell chemistry, which included identifying the bases found in nucleic acids such as DNA. Then, James Watson and Francis Crick came up with their famous DNA double helix, deciphering the structural framework of the molecule. Another landmark event was the development by Frederick Sanger and his colleagues of a method called chain termination, which has enabled researchers across the globe to sequence (read) stretches of genetic code from living organisms. Although these were novel and fundamental discoveries, the limited throughput they offered constrained progress on DNA sequencing for some time. In this regard, the turning point was the enormous effort led by the US government to sequence the human genome through the Human Genome Project.
The publicly funded scientific endeavor involved thousands of researchers and collaborators in the US and across the globe. The sequencing project required more than two billion US dollars and fifteen years to complete. In the end, the project produced the first contiguous map of the human chromosomes, represented in three billion letters. But far beyond deciphering the human genetic landscape, the project served as the launching platform for many approaches that were futuristic at the time, now known as second-generation (next-generation) sequencing technologies, which have certainly spearheaded the genomic revolution of the following two decades.
These new technologies introduced massive parallelization of the DNA sequencing process, resulting in an exponential increase in the volume of data generated while significantly lowering the cost of sequencing. To put it in perspective: compared to the mammoth investment of two billion dollars and fifteen years, a mammalian genome can now be sequenced for a few thousand dollars within three to five days. Yes, that is the leap in science and technology achieved in a time frame of twenty years.
The era of “the genomic revolution” has witnessed remarkable improvements, even outpacing the “computing revolution” described by Moore’s law. Moore’s law states that the number of transistors on a microchip doubles approximately every two years, whereas sequencing capability between 2004 and 2010 was doubling every five months. As a result, genetic data-driven fields in medicine, agriculture, biotechnology, genetics, ecology and microbiology have benefited greatly from these advancements. New knowledge and insights in each of these disciplines have provided opportunities to study the codes of life in unprecedented molecular detail, leading to countless discoveries.
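To get a feel for how different those two doubling times are, the short Python sketch below compounds each rate over the same period. It is only an illustration: the helper name `growth_factor` is hypothetical, and the 2004 to 2010 window is taken as exactly six years, which is an assumption rather than a figure from the talk.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Return the fold increase after elapsed_months for a given doubling time."""
    return 2 ** (elapsed_months / doubling_months)


# Assumption: the 2004-2010 window is treated as exactly six years.
ELAPSED_MONTHS = 6 * 12

# Doubling every two years (Moore's law) versus every five months (sequencing).
moore_fold = growth_factor(ELAPSED_MONTHS, 24)
sequencing_fold = growth_factor(ELAPSED_MONTHS, 5)

print(f"Moore's law pace: about {moore_fold:,.0f}-fold")
print(f"Sequencing pace:  about {sequencing_fold:,.0f}-fold")
```

Under these assumptions, the two-year doubling gives an 8-fold gain over six years, while the five-month doubling gives a gain of more than 20,000-fold.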
However, despite the massive gains and progress, the speed at which the revolution occurred has itself been a major challenge. In global terms, the sheer size of genomic data generated has simply outpaced our ability to analyze it. The current high-end sequencers on the market produce five to seven terabytes (TB) of data in a single run. Theoretically, this amount is enough to complete seven human genomes, each covered thirty times over (thirty-fold coverage). However, the data coming straight out of the sequencing machine is in the form of highly fragmented DNA pieces, which require downstream computational analysis (assembly) to stitch them together. Performing these steps requires highly skilled professionals called bioinformaticians, who have a working knowledge of computer scripting. A seasoned bioinformatician working on seven terabytes of DNA sequence data can take months or even years to complete a mammalian genome. As a result, there is a huge backlog of genomic data waiting to be analyzed across the board. Additionally, the current competitive market for hiring bioinformaticians makes it more difficult to retain a dedicated person in a position. Global efforts to solve this problem have focused on expanding graduate-level training in bioinformatics to produce more professionals, especially those with primarily a biology background. However, the level of mastery required in the field is a considerable factor that slows progress.
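The thirty-fold coverage figure above can be turned into a rough count of sequenced letters. The Python sketch below is a back-of-the-envelope calculation only: the function name `bases_required` is hypothetical, the genome size is the three billion letters quoted earlier, and no attempt is made to convert bases into terabytes, since that depends on the file format and compression used.

```python
# Figures taken from the article; the names are illustrative.
GENOME_SIZE_BASES = 3_000_000_000  # three billion letters in a human genome
COVERAGE = 30                      # each position read thirty times on average
GENOMES_PER_RUN = 7                # genomes per sequencing run, as stated above


def bases_required(genome_size: int, coverage: int, n_genomes: int = 1) -> int:
    """Total number of sequenced bases needed for n_genomes at a given coverage."""
    return genome_size * coverage * n_genomes


per_genome = bases_required(GENOME_SIZE_BASES, COVERAGE)
per_run = bases_required(GENOME_SIZE_BASES, COVERAGE, GENOMES_PER_RUN)

print(f"Bases per genome at {COVERAGE}x: {per_genome:,}")
print(f"Bases for {GENOMES_PER_RUN} genomes:     {per_run:,}")
```

With these inputs, one genome needs 90 billion sequenced bases and seven genomes need 630 billion, all of which arrive as short fragments that still have to be assembled.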
Apart from the side effects of its own fast growth, critics argue that the genomic revolution of the past two decades was only a scientific revolution of the western nations and their institutions. Except in a few instances, the revolution has largely failed to include the poor and middle-income countries of Asia, Latin America and Africa. Obviously, investment in DNA sequencing would not be a priority for the poor or for a government serving an underprivileged public. Equally, companies owning the rights to these technologies have not made any palpable effort to democratize sequencing in resource-limited countries by developing compatible and innovative solutions. As a result, disparities in the data can easily be noted by browsing through repositories like GenBank, where relevant genetic information from third world countries is infrequent, or available only when it serves the interests of collaborating western counterparts.
In summary, predictions for the coming decades support continued strong growth and widespread adoption of genomic technologies worldwide. Certainly, what has been achieved so far is only a subset of what can be done. With so many new technologies and next-generation computing devices in sight, we are welcoming a new era of ubiquitous integration of DNA sequencing in every aspect of life. It also seems inevitable that companies will open up their technologies to poor and middle-income countries, treating them as a new frontier of unexplored niches for new discoveries. Initial validation work on emerging sequencing platforms in Africa has already created a lot of optimism about the application of genomic technologies and genomic science in resource-limited settings. The computational puzzle may remain elusive for some time; however, the emergence of affordable third-generation long-read sequencing technologies can mitigate the problems, at least in part. Undoubtedly, applications to personalized precision medicine and disease diagnostics are among the most highly anticipated future directions of a continued genomic era.