剑桥博士：美国新冠病毒类型比武汉的更原始，是第一代病毒 -6parkbbs.com

[返回科技频道首页]·[所有跟帖]·[ 回复本帖 ] ·[热门原创] ·[繁體閱讀]·[版主管理]

剑桥博士：美国新冠病毒类型比武汉的更原始，是第一代病毒

送交者: hgao[♂☆★★★★声望勋衔19★★★★☆♂] 于 2020-04-16 22:17 已读 1668 次 1 赞

hgao的个人频道

美国“病毒猎手”：新冠病毒可能已在人类中传播数月甚至数年
近日，国际知名学术期刊《美国科学院院报》发表了德国和英国研究团队合作撰写的有关新冠病毒变种的论文。论文指出根据研究结果，没有证据表明新冠病毒起源于中国武汉。4月14日，南都记者通过邮件联系了论文通讯作者，剑桥大学麦克唐纳考古研究所的彼得·福斯特博士。他指出，据其研究显示新冠病毒原始病毒类型（下称为“A型病毒”）在武汉只有极少数的感染病例，而在美国和澳大利亚却占了三分之二。

彼得·福斯特博士。

福斯特是知名研究人类起源和史前语言的专家。据他介绍，自20世纪90年代以来，他和他的同事们开发了用于追踪史前人类DNA的“系统进化网络”算法。这套算法这次被用来帮新冠病毒“寻根问祖”。

福斯特的论文分析了160个完整的新冠病毒全基因组。根据突变分析，研究人员将新冠病毒分为A型、B型和C型。其中A型与蝙蝠和穿山甲身上发现的病毒最为接近，是原始病毒类型，B型演化自A，C型演化自B。

福斯特指出，不同地区分布的新冠病毒变体类型差距极大。疫情在武汉发生时，最早被发现的是B型，这使得研究人员误以为B型病毒就是原始病毒。但事实上，A型病毒才是原始病毒。A型新冠病毒在武汉虽然也存在，但并不是该市样本中的主要病毒类型。来自武汉样本的主要病毒类型是B型，并且B型也是中国和东亚其他地区患者感染病毒的主要类型。而更原始的A型新冠病毒更多地存在于美国和澳大利亚的病例中，占了这两地研究病例的三分之二。

福斯特表示，他的研究推测新冠肺炎的首例感染病例发生在2019年9月13日到12月7日之间。因此2019年12月24日从武汉采样的病毒基因组不能准确地表明病毒起源。

福斯特称，目前已经有多名中国科学家就该篇论文与他联系。他们探讨了福斯特的研究如何用于评估新病毒变种对未来的威胁，以及该研究如何帮助开发治疗方法和疫苗。 6park.com

英文原文： 6park.com

Phylogenetic network analysis of SARS-CoV-2 genomes
Peter Forster, Lucy Forster, Colin Renfrew, and View ORCID ProfileMichael Forster 6park.com

PNAS first published April 8, 2020 https://doi.org/10.1073/pnas.2004999117 6park.com

6park.com

Contributed by Colin Renfrew, March 30, 2020 (sent for review March 17, 2020; reviewed by Toomas Kivisild and Carol Stocking)
6park.com

Article Figures & SI Info & Metrics PDF 6park.com

Significance

This is a phylogenetic network of SARS-CoV-2 genomes sampled from across the world. These genomes are closely related and under evolutionary selection in their human hosts, sometimes with parallel evolution events, that is, the same virus mutation emerges in two different human hosts. This makes character-based phylogenetic networks the method of choice for reconstructing their evolutionary paths and their ancestral genome in the human host. The network method has been used in around 10,000 phylogenetic studies of diverse organisms, and is mostly known for reconstructing the prehistoric population movements of humans and for ecological studies, but is less commonly employed in the field of virology. 6park.com

Abstract

In a phylogenetic network analysis of 160 complete human severe acute respiratory syndrome coronavirus 2 (SARS-Cov-2) genomes, we find three central variants distinguished by amino acid changes, which we have named A, B, and C, with A being the ancestral type according to the bat outgroup coronavirus. The A and C types are found in significant proportions outside East Asia, that is, in Europeans and Americans. In contrast, the B type is the most common type in East Asia, and its ancestral genome appears not to have spread outside East Asia without first mutating into derived B types, pointing to founder effects or immunological or environmental resistance against this type outside Asia. The network faithfully traces routes of infections for documented coronavirus disease 2019 (COVID-19) cases, indicating that phylogenetic networks can likewise be successfully used to help trace undocumented COVID-19 infection sources, which can then be quarantined to prevent recurrent spread of the disease worldwide. 6park.com

SARS-CoV-2 evolutionsubtypeancestral type

The search for human origins seemed to take a step forward with the publication of the global human mitochondrial DNA tree (1). It soon turned out, however, that the tree-building method did not facilitate an unambiguous interpretation of the data. This motivated the development, in the early 1990s, of phylogenetic network methods which are capable of enabling the visualization of a multitude of optimal trees (2, 3). This network approach, based on mitochondrial and Y chromosomal data, allowed us to reconstruct the prehistoric population movements which colonized the planet (4, 5). The phylogenetic network approach from 2003 onward then found application in the reconstruction of language prehistory (6). It is now timely to apply the phylogenetic network approach to virological data to explore how this method can contribute to an understanding of coronavirus evolution.

In early March 2020, the GISAID database (https://www.gisaid.org/) contained a compilation of 253 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) complete and partial genomes contributed by clinicians and researchers from across the world since December 2019. To understand the evolution of this virus within humans, and to assist in tracing infection pathways and designing preventive strategies, we here present a phylogenetic network of 160 largely complete SARS-Cov-2 genomes (Fig. 1). 6park.com

6park.com

R. affinis) coronavirus isolate BatCoVRaTG13 from Yunnan Province. Circle areas are proportional to the number of taxa, and each notch on the links represents a mutated nucleotide position. The sequence range under consideration is 56 to 29,797, with nucleotide position (np) numbering according to the Wuhan 1 reference sequence (8). The median-joining network algorithm (2) and the Steiner algorithm (9) were used, both implemented in the software package Network5011CS (https://www.fluxus-engineering.com/), with the parameter epsilon set to zero, generating this network containing 288 most-parsimonious trees of length 229 mutations. The reticulations are mainly caused by recurrent mutations at np11083. The 161 taxa (160 human viruses and one bat virus) yield 101 distinct genomic sequences. The phylogenetic diagram is available for detailed scrutiny in A0 poster format (SI Appendix, Fig. S5) and in the free Network download files. 6park.com

" data-icon-position="" data-hide-link-title="0" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; background-color: transparent; text-decoration-line: none; color: rgb(0, 90, 150); display: block; border: 0px; box-shadow: rgba(0, 0, 0, 0.15) 0px 2px 10px 0px; outline: 0px !important;">

6park.com

Download figure Open in new tab Download powerpoint 6park.com

Fig. 1.

Phylogenetic network of 160 SARS-CoV-2 genomes. Node A is the root cluster obtained with the bat (R. affinis) coronavirus isolate BatCoVRaTG13 from Yunnan Province. Circle areas are proportional to the number of taxa, and each notch on the links represents a mutated nucleotide position. The sequence range under consideration is 56 to 29,797, with nucleotide position (np) numbering according to the Wuhan 1 reference sequence (8). The median-joining network algorithm (2) and the Steiner algorithm (9) were used, both implemented in the software package Network5011CS (https://www.fluxus-engineering.com/), with the parameter epsilon set to zero, generating this network containing 288 most-parsimonious trees of length 229 mutations. The reticulations are mainly caused by recurrent mutations at np11083. The 161 taxa (160 human viruses and one bat virus) yield 101 distinct genomic sequences. The phylogenetic diagram is available for detailed scrutiny in A0 poster format (SI Appendix, Fig. S5) and in the free Network download files. 6park.com

Zhou et al. (7) recently reported a closely related bat coronavirus, with 96.2% sequence similarity to the human virus. We use this bat virus as an outgroup, resulting in the root of the network being placed in a cluster of lineages which we have labeled “A.” Overall, the network, as expected in an ongoing outbreak, shows ancestral viral genomes existing alongside their newly mutated daughter genomes.

There are two subclusters of A which are distinguished by the synonymous mutation T29095C. In the T-allele subcluster, four Chinese individuals (from the southern coastal Chinese province of Guangdong) carry the ancestral genome, while three Japanese and two American patients differ from it by a number of mutations. These American patients are reported to have had a history of residence in the presumed source of the outbreak in Wuhan. The C-allele subcluster sports relatively long mutational branches and includes five individuals from Wuhan, two of which are represented in the ancestral node, and eight other East Asians from China and adjacent countries. It is noteworthy that nearly half (15/33) of the types in this subcluster, however, are found outside East Asia, mainly in the United States and Australia.

Two derived network nodes are striking in terms of the number of individuals included in the nodal type and in mutational branches radiating from these nodes. We have labeled these phylogenetic clusters B and C.

For type B, all but 19 of the 93 type B genomes were sampled in Wuhan (n = 22), in other parts of eastern China (n = 31), and, sporadically, in adjacent Asian countries (n = 21). Outside of East Asia, 10 B-types were found in viral genomes from the United States and Canada, one in Mexico, four in France, two in Germany, and one each in Italy and Australia. Node B is derived from A by two mutations: the synonymous mutation T8782C and the nonsynonymous mutation C28144T changing a leucine to a serine. Cluster B is striking with regard to mutational branch lengths: While the ancestral B type is monopolized (26/26 genomes) by East Asians, every single (19/19) B-type genome outside of Asia has evolved mutations. This phenomenon does not appear to be due to the month-long time lag and concomitant mutation rate acting on the viral genome before it spread outside of China (Dataset S1, Supplementary Table 2). A complex founder scenario is one possibility, and a different explanation worth considering is that the ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia.

Type C differs from its parent type B by the nonsynonymous mutation G26144T which changes a glycine to a valine. In the dataset, this is the major European type (n = 11), with representatives in France, Italy, Sweden, and England, and in California and Brazil. It is absent in the mainland Chinese sample, but evident in Singapore (n = 5) and also found in Hong Kong, Taiwan, and South Korea.

One practical application of the phylogenetic network is to reconstruct infection paths where they are unknown and pose a public health risk. The following cases where the infection history is well documented may serve as illustrations (SI Appendix). On 25 February 2020, the first Brazilian was reported to have been infected following a visit to Italy, and the network algorithm reflects this with a mutational link between an Italian and his Brazilian viral genome in cluster C (SI Appendix, Fig. S1). In another case, a man from Ontario had traveled from Wuhan in central China to Guangdong in southern China and then returned to Canada, where he fell ill and was conclusively diagnosed with coronavirus disease 2019 (COVID-19) on 27 January 2020. In the phylogenetic network (SI Appendix, Fig. S2), his virus genome branches from a reconstructed ancestral node, with derived virus variants in Foshan and Shenzhen (both in Guangdong province), in agreement with his travel history. His virus genome now coexists with those of other infected North Americans (one Canadian and two Californians) who evidently share a common viral genealogy. The case of the single Mexican viral genome in the network is a documented infection diagnosed on 28 February 2020 in a Mexican traveler to Italy. Not only does the network confirm the Italian origin of the Mexican virus (SI Appendix, Fig. S3), but it also implies that this Italian virus derives from the first documented German infection on 27 January 2020 in an employee working for the Webasto company in Munich, who, in turn, had contracted the infection from a Chinese colleague in Shanghai who had received a visit by her parents from Wuhan. This viral journey from Wuhan to Mexico, lasting a month, is documented by 10 mutations in the phylogenetic network.

This viral network is a snapshot of the early stages of an epidemic before the phylogeny becomes obscured by subsequent migration and mutation. The question may be asked whether the rooting of the viral evolution can be achieved at this early stage by using the oldest available sampled genome as a root. As SI Appendix, Fig. S4 shows, however, the first virus genome that was sampled on 24 December 2019 already is distant from the root type according to the bat coronavirus outgroup rooting.

The described core mutations have been confirmed by a variety of contributing laboratories and sequencing platforms and can be considered reliable. The phylogeographic patterns in the network are potentially affected by distinctive migratory histories, founder events, and sample size. Nevertheless, it would be prudent to consider the possibility that mutational variants might modulate the clinical presentation and spread of the disease. The phylogenetic classification provided here may be used to rule out or confirm such effects when evaluating clinical and epidemiological outcomes of SARS-CoV-2 infection, and when designing treatment and, eventually, vaccines. 6park.com

Materials and Methods

The Global Initiative on Sharing Avian Influenza Data (GISAID) was founded in 2006, and, since 2010, has been hosted by the German Federal Ministry of Food, Agriculture and Consumer Protection. GISAID has also become a coronavirus repository since December 2019. As of 4 March 2020, the cutoff point for our phylogenetic analysis, the GISAID database (https://www.gisaid.org/) had compiled 254 coronavirus genomes, isolated from 244 humans, nine Chinese pangolins, and one bat Rhinolophus affinis (BatCoVRaTG13 from Yunnan Province, China). The sequences have been deposited by 82 laboratories listed in Dataset S1, Supplementary Table 1. Although SARS-CoV-2 is an RNA virus, the deposited sequences, by convention, are in DNA format. Our initial alignment confirmed an earlier report by Zhou et al. (7) that the pangolin coronavirus sequences are poorly conserved with respect to the human SARS-CoV-2 virus, while the bat coronavirus yielded a sequence similarity of 96.2% in our analysis, in agreement with the 96.2% published by Zhou et al. We discarded partial sequences, and used only the most complete genomes that we aligned to the full reference genome by Wu et al. (8) comprising 29,903 nucleotides. Finally, to ensure comparability, we truncated the flanks of all sequences to the consensus range 56 to 29,797, with nucleotide position numbering according to the Wuhan 1 reference sequence (8). The laboratory codes of the resulting 160 sequences and the bat coronavirus sequences are listed in Dataset S1, Supplementary Table 2 (Coronavirus Isolate Labels).

The 160 human coronavirus sequences comprised exactly 100 different types. We added to the data the bat coronavirus as an outgroup to determine the root within the phylogeny. Phylogenetic network analyses were performed with the Network 5011CS package, which includes, among other algorithms, the median joining network algorithm (3) and a Steiner tree algorithm to identify most-parsimonious trees within complex networks (9). We coded gaps of adjacent nucleotides as single deletion events (these deletions being rare, up to 24 nucleotides long, and mostly in the amino acid reading frame) and ran the data with the epsilon parameter set to zero, and performed an exploratory run by setting the epsilon parameter to 10. Both settings yielded a low-complexity network. The Steiner tree algorithm was then run on both networks and provided the identical result that the most-parsimonious trees within the network were of length 229 mutations. The structures of both networks were very similar, with the epsilon 10 setting providing an additional rectangle between the A and B clusters. The network output was annotated using the Network Publisher option to indicate geographic regions, sample collection times, and cluster nomenclature. 6park.com

Data Availability.

The nucleotide sequences of the SARS-CoV-2 genomes used in this analysis are available, upon free registration, from the GISAID database (https://www.gisaid.org/). The Network5011 software package and coronavirus network files are available as shareware on the Fluxus Technology website (https://www.fluxus-engineering.com/). 6park.com

Acknowledgments

We gratefully acknowledge the authors and originating and submitting laboratories of the sequences from GISAID’s EpiFlu(TM) Database on which this research is based. We are grateful to Trevor Bedford (GISAID) for providing instructions and advice on the database. A table of the contributors is available in Dataset S1, Supplementary Table 1. We thank Arne R?hl for assessing the network. 6park.com

Footnotes

?1To whom correspondence may be addressed. Email: pf223@cam.ac.uk or acr10@cam.ac.uk.

Author contributions: P.F. and M.F. performed research; P.F., L.F., and M.F. analyzed data; P.F. and M.F. performed statistical analyses; P.F., C.R., and M.F. wrote the paper; and C.R. wrote the Introduction.

Reviewers: T.K., Katholieke Universiteit Leuven; and C.S., University Medical Center Hamburg-Eppendorf.

The authors declare no competing interest.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2004999117/-/DCSupplemental.

Copyright ? 2020 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

6park.com

References

? 6park.com

R. L. Cann, M. Stoneking, A. C. Wilson, Mitochondrial DNA and human evolution. Nature 325, 31–36 (1987). 6park.com

CrossRefPubMedGoogle Scholar 6park.com

? 6park.com

H. J. Bandelt, P. Forster, B. C. Sykes, M. B. Richards, Mitochondrial portraits of human populations using median networks. Genetics 141, 743–753 (1995). 6park.com

Abstract/FREE Full TextGoogle Scholar 6park.com

? 6park.com

H. J. Bandelt, P. Forster, A. R?hl, Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 16, 37–48 (1999). 6park.com

CrossRefPubMedGoogle Scholar 6park.com

? 6park.com

P. Forster, C. Renfrew, Evolution. Mother tongue and Y chromosomes. Science 333, 1390–1391 (2011). 6park.com

Abstract/FREE Full TextGoogle Scholar 6park.com

? 6park.com

C. Renfrew, P. Bahn, The Cambridge World Prehistory (Cambridge University Press, 2014). 6park.com

Google Scholar 6park.com

? 6park.com

P. Forster, C. Renfrew, Phylogenetic Methods and the Prehistory of Languages (McDonald Institute Press, 2006). 6park.com

Google Scholar 6park.com

? 6park.com

P. Zhou et al., A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020). 6park.com

CrossRefPubMedGoogle Scholar 6park.com

? 6park.com

F. Wu et al., A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020). 6park.com

CrossRefPubMedGoogle Scholar 6park.com

? 6park.com

T. Polzin, S. V. Daneshmand, On Steiner trees and minimum-spanning trees in hypergraphs. Oper. Res. Lett. 31, 12–20 (2003). 6park.com

CrossRefGoogle Scholar 6park.com

6park.com

链接如下： 6park.com

https://www.pnas.org/content/pnas/early/2020/04/07/2004999117.full.pdf 6park.com

喜欢hgao朋友的这个贴子的话，

请点这里投票，“赞”助支持！

[举报反馈]·[ hgao的个人频道 ]·[-->>参与评论回复]·[用户前期主贴]·[手机扫描浏览分享]·[返回科技频道首页]

帖子内容是网友自行贴上分享，如果您认为其中内容违规或者侵犯了您的权益，请与我们联系，我们核实后会第一时间删除。

所有跟帖: ( 主贴楼主有权删除不文明回复，拉黑不受欢迎的用户 )

要讲证据 (无内容) - 支持3银元 (0 bytes) 05/07/20
应该让邓亚萍剑桥博士发 (无内容) - 老新领导 (0 bytes) 04/18/20
isthatso 给 hgao 点“赞”支持3银元奖励！！ (无内容) - isthatso (0 bytes) 04/17/20
- hgao 给 isthatso 点“赞”支持3银元奖励！！ (无内容) - hgao (0 bytes) 04/17/20
国自然在1月底就公布了对SARS-CoV-2溯源性研究的支持计划。 - 董小骚很OK (434 bytes) 04/17/20

手机扫描进入，浏览分享更畅快！

楼主本栏目热帖推荐：

>>>>查看更多楼主社区动态...

[ 留园条例 ] [ 广告服务 ] [ 联系我们 ] [ 个人帐户 ] [ 版主申请 ] [ Contact us ]