Read Time: 726 words, 7 minutes

How genes taught us about the origin of the novel coronavirus

A group of scientists collected samples from some of the first COVID-19 patients and compared the virus they found with other coronaviruses that infect humans and animals, allowing them to trace where it may have come from.

Image Credit: Wikimedia Commons

The situation regarding the novel coronavirus, which causes the disease known as COVID-19, has become more and more serious in the past few months, leading to a global pandemic that has affected every aspect of life for billions of people. As researchers work to find vaccines and help those who are already infected, the question inevitably comes up: how did this all start?

To find the origins of the virus, which was called 2019-nCoV at the time this paper was published, researchers in China went to the source of the outbreak in Wuhan. They took throat swab samples from nine patients hospitalized in late December for pneumonia, but no previously known viruses or bacteria were found. Eight of the nine patients had visited the Huanan seafood market in Wuhan, where a variety of non-aquatic animals were also sold.

Suspecting viral pneumonia, the scientists extracted small fragments of viral RNA from the patients' throat samples. The ends of these small fragments overlapped with the ends of other fragments, allowing the scientists to piece them together like a puzzle and reconstruct larger fragments of RNA. These larger fragments were similar to RNA sequences in a coronavirus found in bats, bat-SL-CoVZC45. The scientists could therefore use the genome of the bat-derived coronavirus as a reference for assembling the entire genome of 2019-nCoV, much like putting together puzzle pieces by looking at the picture on the front of the puzzle box. However, since this was a brand-new virus, the bat coronavirus (the “picture on the box” in this case) was only an approximation, making this an incredibly difficult puzzle to put together.
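The puzzle-piecing step can be illustrated with a small Python sketch. This is a toy greedy merge of overlapping fragments, not the software the study's authors used; the function names, the made-up RNA fragments, and the minimum overlap of three letters are all assumptions chosen for illustration.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_fragments(fragments, min_overlap=3):
    """Repeatedly join the two fragments with the largest overlap.

    Stops when only one piece is left or no remaining pair overlaps.
    """
    pieces = list(fragments)
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more
        # Glue the pair together, keeping the shared letters only once.
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Hypothetical fragments, invented for this example.
fragments = ["AUGGCUAC", "CUACGGAU", "GGAUUCGA"]
print(merge_fragments(fragments))
```

Real sequencing produces millions of fragments that contain errors, which is why a reference genome (the "picture on the box") is so helpful in practice.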

Using these methods, the researchers put together eight whole 2019-nCoV genomes that were 99.98% similar to one another. The fact that the viral sequences extracted from different patients were so similar is striking. Mutations accumulate naturally over time as a virus copies itself and spreads. A high degree of sequence similarity across genomes taken from different patients therefore indicates that few mutations have occurred and that not much time has passed between infections. As a result, the scientists concluded that 2019-nCoV came from one source and began infecting humans very recently, spreading rapidly over a short period of time.
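To get a feel for what a similarity percentage means, here is a minimal Python sketch that counts matching letters between two sequences that are already lined up position by position. The function names and the short example sequences are hypothetical, and the genome length of about 30,000 letters is a rounded figure used as an assumption, not a number taken from the paper.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def expected_differences(genome_length, identity_percent):
    """Number of differing positions implied by a given percent identity."""
    return genome_length * (100.0 - identity_percent) / 100.0


# Two invented 10-letter sequences that differ at one position.
print(percent_identity("AUGGCUACGG", "AUGGCUACGA"))

# Assumed genome length of roughly 30,000 letters.
print(round(expected_differences(30000, 99.98)))
```

Under that assumption, 99.98% similarity corresponds to only about six differing letters across the whole genome, which is why the researchers read it as a sign of a very recent common source.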

2019-nCoV has a genome that closely matches those of two bat-derived coronaviruses, indicating that bats are likely the original hosts of this virus. However, the authors of the paper note that another animal is likely acting as an intermediate host between bats and humans, given that bats were not found or sold at the Huanan seafood market and that bat species in Wuhan are hibernating in late December. Additionally, the roughly 87% sequence similarity between 2019-nCoV and the bat coronaviruses is not close enough to indicate direct ancestry, which means the virus may have gone through changes in an intermediate animal host before reaching humans.

In the past, other coronaviruses have infected human populations and caused deaths, such as the severe acute respiratory syndrome coronavirus (SARS-CoV) and the Middle East respiratory syndrome coronavirus (MERS-CoV). The researchers compared the sequence of 2019-nCoV with those of SARS-CoV and MERS-CoV and found similarities of 79% and 50%, respectively, indicating that the current coronavirus is even less closely related to these viruses than it is to the bat coronaviruses.

Coronaviruses have a characteristic structure known as a spike protein. This structure helps the virus attach itself to the outside of a cell and ultimately infect it. In this paper, researchers found that the spike protein in 2019-nCoV was longer than those of SARS-CoV, MERS-CoV, and the two bat-derived coronaviruses. Interestingly, while 2019-nCoV was more closely related to the two bat coronaviruses overall, the part of the spike protein that interacts directly with human cells, known as the receptor-binding domain, is more similar to that of the 2003 SARS virus. This suggests that the 2003 SARS virus and the present 2019 virus may enter human cells at the same entry point.

In conclusion, the authors of this paper were able to determine the genomic structure of 2019-nCoV. Using this, they gained some insight into the virus’s origins and described how wild animals can host dangerous, disease-causing viruses that spill over into human populations. More studies are required to fully understand how 2019-nCoV infects human cells so well and what the intermediate host between bats and humans may be.

Study Information

Original study: Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding

Study was published on: January 30, 2020

Study author(s): Roujian Lu*, Xiang Zhao*, Juan Li*, Peihua Niu*, Bo Yang*, Honglong Wu*, Wenling Wang, Hao Song, Baoying Huang, Na Zhu, Yuhai Bi, Xuejun Ma, Faxian Zhan, Liang Wang, Tao Hu, Hong Zhou, Zhenhong Hu, Weimin Zhou, Li Zhao, Jing Chen, Yao Meng, Ji Wang, Yang Lin, Jianying Yuan, Zhihao Xie, Jinmin Ma, William J Liu, Dayan Wang, Wenbo Xu, Edward C Holmes, George F Gao, Guizhen Wu¶, Weijun Chen¶, Weifeng Shi¶, Wenjie Tan¶

The study was done at:

The study was funded by:

Raw data availability:

Featured image credit: Wikimedia Commons

This summary was edited by: Gina Misra