(1) Sequences and metadata are automatically downloaded and ‘ingested’ on a daily basis. Data for https://covidcg.org are downloaded from a custom GISAID endpoint, but the pipeline can be configured for any data provider, for example GenBank (see Methods for details). (2) Based on best practices, we filter out sequences that are on NextStrain’s exclusion list, that come from non-human hosts, that are shorter than 29,700 nt, or that have >5% ambiguous base calls (van Dorp et al., 2020b). (3) Single-nucleotide variations (SNVs) at the nucleotide and amino acid level are determined by aligning (via bowtie2) each sequence to the WIV04 reference, a high-quality December 2019 genome recommended by GISAID; NextStrain uses the 100% identical Wuhan-Hu-1 (Langmead et al., 2009). Importantly, spurious SNVs and probable sequencing errors are filtered out prior to downstream analysis. (4) Viral lineages (defined by the pangolin tool) and clades are provided by GISAID. In accordance with pangolin, SNVs present in >90% of sequences within each lineage are assigned as lineage-defining SNVs. (5) The curated data and metadata, SNVs, and lineage-assigned SNVs are associated with their respective sequence identifiers and compiled into a compact data set. (6) These data are uploaded to the COVID-19 CG web application. (7) New analyses will be built into the COVID-19 CG application throughout the course of the pandemic. (8–10) Features and modules that integrate knowledge from other COVID-19 initiatives are continuously incorporated into COVID-19 CG.
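The filtering rule in step (2) and the >90% rule in step (4) can be illustrated with a minimal Python sketch. This is not the COVID-19 CG implementation: the function names, the record layout, and the choice to count every base other than A, C, G, or T as ambiguous are assumptions made for illustration, and the lineage and SNV labels in the usage note are hypothetical placeholders. Only the thresholds (29,700 nt, 5%, 90%) come from the text.

```python
from collections import Counter, defaultdict

MIN_LENGTH = 29_700             # nt, threshold from step (2)
MAX_AMBIGUOUS_FRACTION = 0.05   # threshold from step (2)
LINEAGE_SNV_THRESHOLD = 0.90    # threshold from step (4)

# Assumption: any character other than A, C, G, T counts as ambiguous.
UNAMBIGUOUS = set("ACGT")


def passes_quality_filter(seq, host, accession, exclusion_list):
    """Return True if a sequence survives the step (2) filters."""
    if accession in exclusion_list:
        return False
    if host.lower() != "human":
        return False
    if len(seq) < MIN_LENGTH:
        return False
    ambiguous = sum(1 for base in seq.upper() if base not in UNAMBIGUOUS)
    return ambiguous / len(seq) <= MAX_AMBIGUOUS_FRACTION


def lineage_defining_snvs(records):
    """Map each lineage to the SNVs found in >90% of its sequences.

    `records` is an iterable of (lineage, snvs) pairs, one per sequence,
    where `snvs` is a collection of hashable SNV labels.
    """
    totals = Counter()                   # sequences per lineage
    snv_counts = defaultdict(Counter)    # per-lineage SNV occurrence counts
    for lineage, snvs in records:
        totals[lineage] += 1
        snv_counts[lineage].update(set(snvs))  # count each SNV once per sequence
    return {
        lineage: {
            snv
            for snv, n in counts.items()
            if n / totals[lineage] > LINEAGE_SNV_THRESHOLD
        }
        for lineage, counts in snv_counts.items()
    }
```

For example, if a hypothetical lineage has ten sequences, all carrying `"snv_1"` and eight carrying `"snv_2"`, then `lineage_defining_snvs` returns only `"snv_1"` for that lineage, because 80% does not exceed the 90% threshold.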