Taxonium, a web-based tool for exploring large phylogenetic trees

  1. Theo Sanderson (corresponding author)
  1. The Francis Crick Institute, United Kingdom


The COVID-19 pandemic has resulted in a step change in the scale of sequencing data, with more genomes of SARS-CoV-2 having been sequenced than of any other organism on earth. These sequences reveal key insights when represented as a phylogenetic tree, which captures the evolutionary history of the virus and allows the identification of transmission events and the emergence of new variants. However, existing web-based tools for exploring phylogenies do not scale to the size of datasets now available for SARS-CoV-2. We have developed Taxonium, a new tool that uses WebGL to allow the exploration of trees with tens of millions of nodes in the browser for the first time. Taxonium links each node to associated metadata and supports mutation-annotated trees, which are able to capture all known genetic variation in a dataset. It can be run entirely locally in the browser, from a server-based backend, or as a desktop application. We describe insights that analysing a tree of five million sequences can provide into SARS-CoV-2 evolution, and provide a tool for exploring a public tree of more than five million SARS-CoV-2 sequences. Taxonium can be applied to any tree, and both the tool and its source code are available online.
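To illustrate the idea behind a mutation-annotated tree, the following is a minimal sketch and not Taxonium's actual data model. It assumes each node stores only the mutations that arose on the branch leading to it, so the state of any node can be recovered by replaying mutations along the path from the root. The `Node` class, the `genotype` function, and the example positions are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical minimal mutation record: (genome position, new base).
Mutation = Tuple[int, str]


@dataclass
class Node:
    """A tree node annotated with the mutations on the branch leading to it."""
    name: str
    mutations: List[Mutation] = field(default_factory=list)
    parent: Optional["Node"] = None


def genotype(node: Node) -> Dict[int, str]:
    """Replay mutations from the root down to `node`.

    Returns a mapping of position -> base for every position that was
    mutated somewhere on the root-to-node path. Later mutations at the
    same position overwrite earlier ones.
    """
    # Collect the path from the node up to the root.
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent

    # Apply mutations in root-to-node order.
    state: Dict[int, str] = {}
    for ancestor in reversed(path):
        for position, base in ancestor.mutations:
            state[position] = base
    return state


# Illustrative three-node tree (positions and bases are made up).
root = Node("root")
internal = Node("internal", [(200, "G")], parent=root)
leaf = Node("leaf", [(100, "T"), (200, "A")], parent=internal)

print(genotype(leaf))
```

Because each branch stores only its own changes, sequences that share ancestry share most of their annotation, which is what lets a single tree describe the variation across a very large set of genomes compactly.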

Data availability

All code is available on GitHub. Data was not generated as part of this study. Data sources are indicated in the manuscript and raw data is available in all cases, without the need for requests.

The following previously published data sets were used

Article and author information

Author details

  1. Theo Sanderson

    The Francis Crick Institute, London, United Kingdom
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-4177-2851


Funding

Wellcome Trust (210918/Z/18/Z)

  • Theo Sanderson

Wellcome Trust (FC001043)

  • Theo Sanderson

Cancer Research UK (FC001043)

  • Theo Sanderson

Medical Research Council (FC001043)

  • Theo Sanderson

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Richard A Neher, University of Basel, Switzerland

Publication history

  1. Received: August 2, 2022
  2. Accepted: October 27, 2022
  3. Accepted Manuscript published: November 15, 2022 (version 1)


© 2022, Sanderson

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Theo Sanderson
     Taxonium, a web-based tool for exploring large phylogenetic trees
     eLife 11:e82392.