Less than a year into the COVID-19 pandemic, genomic scientists have revealed more about the underlying SARS-CoV-2 virus than has ever been known about any organism in such a short period of time. “Everything is unprecedented about the pandemic, and genomics is no exception,” says Oliver Pybus, an infectious disease expert at the University of Oxford. At a recent virtual conference, many genomic scientists reported on key discoveries about the virus, how it spreads, where it came from, and more. Here are some highlights.

1. SARS-CoV-2 Has a Molecular Clock

Every organism on the planet acquires genetic mutations, and some are passed on to the broader population. The mutation rate varies tremendously across species, but RNA viruses like SARS-CoV-2 tend to accumulate such changes faster than most. Scientists studying this coronavirus learned that its small genome acquires about two mutations per month. That’s slower than average, which explains why we haven’t seen major changes in the virus in the 10 months or so it’s been spreading. But it’s enough to let researchers count backward and estimate quite closely when the virus crossed over into humans. While one model suggests SARS-CoV-2 could have emerged as early as August or September of 2019, most evidence points to it jumping into humans by mid-November of 2019.
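To make the clock idea concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a perfectly steady rate of two mutations per month (the figure quoted above) and a single hypothetical sample; the function name, the sample date, and the mutation count are illustrative assumptions, not data from the conference. Real analyses fit statistical models to thousands of dated genomes rather than dividing one number by another.

```python
from datetime import date, timedelta

MUTATIONS_PER_MONTH = 2.0  # approximate rate quoted in the article
DAYS_PER_MONTH = 30.44     # average length of a calendar month


def estimate_emergence(sample_date: date, mutations_from_ancestor: int) -> date:
    """Count backward from a dated sample to a rough emergence date.

    Assumes mutations accumulate at a constant rate, which is a
    simplification of how molecular clock models actually work.
    """
    months_elapsed = mutations_from_ancestor / MUTATIONS_PER_MONTH
    days_elapsed = round(months_elapsed * DAYS_PER_MONTH)
    return sample_date - timedelta(days=days_elapsed)


# Hypothetical sample: collected 15 September 2020, carrying 20 mutations
# relative to the inferred ancestral genome (made-up numbers).
print(estimate_emergence(date(2020, 9, 15), 20))  # 2019-11-16
```

With these made-up inputs, 20 mutations at two per month works out to about 10 months, which places the estimate in mid-November 2019.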


2. Natural Origins

Early on, one of the biggest research efforts aimed to find out where SARS-CoV-2 came from. The question was urgent, especially amid concerns that the virus might have been created in a lab. Today scientists have an unequivocal answer to that last part: “The data very clearly shows it did not come from a lab,” says Kristian Andersen, director of infectious disease genomics at Scripps Research. He can be certain it was not bioengineered because every element of the viral genome has been found in other coronaviruses naturally circulating among bats and pangolins, two common host species.

Its specific origin, though, has not yet been discovered. Jemma Geoghegan, a virologist at the University of Otago in New Zealand, says that while SARS-CoV-2 has close genetic relatives that have infected both bats and pangolins, nobody has yet tracked down an exact copy of this virus in any animal host. One of the challenges is how easily coronaviruses move from one host species to another. “There’s been a long history of host-jumping between these viruses,” she says. After the first SARS outbreak back in 2002, it took researchers two years to find the source; many human infections believed to come from animals have never been traced back to their original host.

And remember the Wuhan wet market where SARS-CoV-2 was initially believed to have emerged? Andersen, at least, now believes that this was more likely a very early superspreader event than the place where the virus jumped to humans. Genomic data point to human-to-human transmission from the very beginning rather than to multiple introductions from animals to humans over time, which is the pattern that would be expected if the virus had crossed over at a site like the wet market.


3. Data Volume Matters

While the speed of genomic scientists’ response to COVID-19 was impressive — the first sequence of the viral genome was generated just a few weeks after the outbreak was reported to the World Health Organization — it’s the volume of data that has been game-changing. The early genome sequencing projects “really put us off to the races” for developing diagnostics, finding possible treatments, and more, says Phil Febbo, chief medical officer at DNA sequencer manufacturer Illumina. “Now we’re seeing sequencing play a role across the range of activities of this pandemic.”

In fact, scientists have now sequenced tens of thousands of SARS-CoV-2 genomes by collecting and analyzing the virus from infected individuals. That trove of data has made it possible not only to track the spread of the virus around the world but also to reconstruct transmission chains within countries and communities. It also allowed scientists to spot a new strain of the virus, named for the D614G mutation it carries, that quickly outcompeted the original strain and became the dominant cause of COVID-19 around the world. Compared to the original, D614G is better at spreading among humans but leads to about the same severity of disease.

4. First Confirmed Reinfection

While fears swirled early on that COVID-19 patients were getting reinfected with the virus, such cases were later chalked up to false-negative or false-positive diagnostic results. It was genomic data that confirmed what is believed to be the first known case of reinfection: a 33-year-old man in Hong Kong who tested positive for the virus in March and again in August. Genome analysis showed that the second infection was caused by a slightly different strain of the virus than the one the patient had initially, indicating two separate infection events.
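The comparison behind that conclusion can be illustrated with a small Python sketch that counts the positions at which two aligned genome sequences differ. The function name and the short toy sequences below are invented for illustration and are not the patient's data; an actual analysis would align full genomes of roughly 30,000 bases and place each one on a family tree of known lineages.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has an ambiguous call (anything other
    than A, C, G, or T) are skipped, since they cannot be compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in unambiguous and b in unambiguous and a != b
    )


# Toy fragments standing in for the March and August samples (made up).
first_sample = "ATGGCTAGCTNAGG"
second_sample = "ATGACTAGGTTAGC"
print(count_differences(first_sample, second_sample))  # 3
```

The more differences two samples show relative to the time between them, the less likely it is that the second is simply a lingering copy of the first.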

5. ‘Pandemic Potential’

Unlike its viral predecessors, SARS-CoV-2 emerged in humans with two traits that made it extremely good at spreading and at causing serious disease: first, strong binding to a receptor commonly found on human cells, and second, remarkable efficiency at hijacking those cells. Typically, viruses can do one or the other well, but not both. Worryingly, subsequent genomic studies of coronaviruses living in bats and other animal hosts have found these same traits coexisting in other strains that could cross over to humans given the right opportunity. Andersen refers to this as “pandemic potential,” noting that viruses with high infection efficiency and the ability to target human cells are widely circulating in bat populations. This points to a key role for genomics in spotting new outbreaks and preventing pandemics. “Genomic epidemiology can and should be more tightly integrated into outbreak surveillance and control,” Pybus emphasizes.