Published graphs in the context of automatically found graphs.
We compared 22 different graphs from 8 publications to alternative graphs inferred on the same or very similar data; the corresponding findGraphs runs are highlighted in blue in the ‘Iterations’ column. In total, 51 findGraphs runs are summarized here, since in some cases models more or less complex than the published one were explored and/or different population compositions were tested (see the ‘Topology search constraints and population modifications’ column and the footnotes for details). Columns with blue headers show information on the published graphs or their modified versions and on properties of the published population sets. Columns with magenta headers show settings used for calculating f-statistics and for exploring the admixture graph (AG) space, along with the number of SNPs used, which depends on those settings. Columns with black headers summarize the outcomes of the findGraphs runs, that is, the properties of the alternative model sets found.
Publication: Last name of the first author and year of the relevant publication.
Figure in the original publication: Figure number in the original paper where the AG is presented.
Groups (populations): The number of populations in each graph.
Singleton pseudo-haploid populations: The number of populations in the graph that consist of a single pseudo-haploid individual. Negative ‘admixture’ f3-statistics cannot be calculated for such populations since their heterozygosity cannot be estimated (see the text for details).
No. of negative f3-stats (allsnps: YES): The number of negative f3-statistics among all possible f3-statistics for a given set of populations when all available sites are used for each statistic. If no negative f3-statistics exist for a set of populations, AG fits are not affected by the ‘minac2=2’ setting intended for accurate calculation of f-statistics for non-singleton pseudo-haploid groups.
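As an illustration, a count of this kind can be obtained with ADMIXTOOLS 2 roughly as follows. This is a minimal sketch, not the pipeline used for this table: the population labels are hypothetical, and it assumes that qp3pop expands vectors of population labels into all unique combinations, as described in the package documentation.

```r
# Minimal sketch: counting negative f3-statistics among all possible
# (target; source1, source2) combinations of a population set.
# Assumes f2_blocks was created beforehand with extract_f2 (see below).
library(admixtools)

pops = c("popA", "popB", "popC", "popD")      # hypothetical population labels
f3_res = qp3pop(f2_blocks, pops, pops, pops)  # f3(pop1; pop2, pop3) for all combinations
sum(f3_res$est < 0)                           # number of negative f3-statistics
```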
Admixture events: The number of admixture events in each graph.
Publ. model: log-likelihood (LL): Log-likelihood score of the published graph fitted to the SNP set shown in the ‘SNPs used’ column.
Publ. model: LL, median of bootstrap distr.: Median of the log-likelihood scores of 100 or 500 fits of the published graph on bootstrap-resampled SNPs.
Publ. model: worst residual (WR), SE: The worst f-statistic residual of the published graph fitted to the SNP set shown in the ‘SNPs used’ column, measured in standard errors (SE).
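The three ‘Publ. model’ columns can be reproduced, in outline, with ADMIXTOOLS 2 as sketched below. The output field names (score, worst_residual, score_test) follow the ADMIXTOOLS 2 documentation but should be treated as assumptions that may vary across package versions; ‘published_graph’ is a placeholder.

```r
# Minimal sketch: fitting a published graph and reading off the statistics in
# the three 'Publ. model' columns. 'published_graph' is a hypothetical edge
# list (or igraph object) encoding the published topology.
library(admixtools)

fit = qpgraph(f2_blocks, published_graph)
fit$score            # log-likelihood (LL) score of the fit
fit$worst_residual   # worst f-statistic residual, in SEs

# Median LL across fits on bootstrap-resampled SNP blocks
boot = qpgraph_resample_multi(f2_blocks, list(published_graph), nboot = 100)
median(boot[[1]]$score_test)
```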
SNPs used: The number of SNPs (with no missing data at the group level) used for fitting the AG. For all case studies, we tested the original data (SNPs, population composition, and the published graph topology) and obtained model fits very similar to the published ones. However, for the purpose of efficient topology search we adjusted settings for f3-statistic calculation, population composition, or graph complexity as shown here and discussed in the text.
Settings for calculating f2-statistics: Arguments of the extract_f2 function used for calculating all possible f2-statistics for a set of groups, which were then used by findGraphs for calculating f3-statistics needed for fitting AG models. See Appendix 1, Section 2.A for descriptions of each argument.
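For reference, a typical extract_f2 call using the arguments discussed in this table might look as follows. File paths and population labels are placeholders, and the argument set follows the ADMIXTOOLS 2 documentation rather than the exact configurations of the case studies.

```r
# Minimal sketch: precomputing f2-statistics for a set of groups.
library(admixtools)

extract_f2(
  "genotype_prefix",            # hypothetical PLINK/EIGENSTRAT file prefix
  "f2_dir",                     # output directory for per-block f2 arrays
  pops = c("popA", "popB", "popC", "popD"),  # hypothetical group labels
  maxmiss = 0,                  # 0 = keep only SNPs with no missing data at the group level
  minac2 = 2,                   # accurate f-statistics for non-singleton pseudo-haploid groups
  adjust_pseudohaploid = TRUE,  # detect pseudo-haploid samples automatically
  blgsize = 0.05                # resampling block size (in Morgans)
)
f2_blocks = f2_from_precomp("f2_dir")  # load the precomputed blocks
```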
Topology search constraints and population modifications: Constraints applied when generating random starting graphs and/or when searching the topology space. Modifications of the original population composition are also described in this column, where applicable.
Iterations: The number of findGraphs iterations (runs), each started from a random graph of a certain complexity. For each case study, findGraphs setups that were considered optimal are highlighted in blue in this column.
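A run of the kind counted in this column (and constrained as described in the previous one) can be set up roughly as follows. This is a minimal sketch assuming the find_graphs interface of ADMIXTOOLS 2 (findGraphs); the settings shown are hypothetical, not those used in the case studies.

```r
# Minimal sketch: repeated findGraphs runs, each starting from a random graph
# of fixed complexity; the settings below are placeholders.
library(admixtools)
library(dplyr)

runs = lapply(1:100, function(i) {
  find_graphs(
    f2_blocks,
    numadmix = 3,        # number of admixture events in candidate graphs
    outpop = "popOut",   # hypothetical outgroup constraint on the topology search
    stop_gen = 100       # cap on optimization generations per run
  ) %>%
    slice_min(score, n = 1)  # keep the best-fitting graph of this run
})
```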
Iterations confirming published graph: The number of iterations (runs) in which the resulting graph was topologically identical to the published graph. In cases where the published model was irrelevant because more complex graphs were explored, ‘N/A’ appears in this and the subsequent columns. If less complex models were explored, the published model remained relevant, since a version of it with selected admixture edges removed was tested.
Distinct alternative topologies found: The number of distinct newly found topologies. If graph complexity was equal to (or less than) that of the published graph, the published topology (or its simplified version) is not counted here. If graph complexity exceeded that of the published graph, all newly found topologies are counted. If the published topology was recovered by findGraphs, the numbers in this column are shown in bold.
Significantly better fitting topologies: The number of distinct topologies that fit significantly better than the published graph according to the bootstrap model comparison test (two-tailed empirical p-value <0.05). If the number of distinct topologies was very large, a representative sample of models (1/20 to 1/3 of models evenly distributed along the log-likelihood spectrum) was compared to the published one instead. These cases are marked as ‘a fraction of models tested’ in this column. If model complexity was higher than that of the published model, model comparison was irrelevant and was not performed.
Non-significantly better fitting topologies: The number of distinct topologies that fit non-significantly (nominally) better than the published graph according to the bootstrap model comparison test (two-tailed empirical p-value ≥0.05).
Non-significantly worse fitting topologies: The number of distinct topologies that fit non-significantly (nominally) worse than the published graph according to the bootstrap model comparison test (two-tailed empirical p-value ≥0.05).
Significantly worse fitting topologies: The number of distinct topologies that fit significantly worse than the published graph according to the bootstrap model comparison test (two-tailed empirical p-value <0.05).
Significantly better fitting topologies, %: The percentage of distinct topologies that fit significantly better than the published graph according to the bootstrap model comparison test (two-tailed empirical p-value <0.05). If the number of distinct topologies was very large, a representative sample of models (1/20 to 1/3 of models evenly distributed along the log-likelihood spectrum) was compared to the published one instead, and the percentages in this and following columns were calculated on this sample.
Non-significantly better fitting topologies, %: The percentage of distinct topologies that fit non-significantly (nominally) better than the published graph according to the bootstrap model comparison test (two-tailed empirical p-value ≥0.05).
Non-significantly worse fitting topologies, %: The percentage of distinct topologies that fit non-significantly (nominally) worse than the published graph according to the bootstrap model comparison test (two-tailed empirical p-value ≥0.05).
Significantly worse fitting topologies, %: The percentage of distinct topologies that fit significantly worse than the published graph according to the bootstrap model comparison test (two-tailed empirical p-value <0.05).
p-value best alternative vs. publ.: The empirical two-tailed p-value of a test comparing log-likelihood distributions across bootstrap replicates for two topologies: the highest-ranking newly found topology and the published topology. In some cases, the highest-ranking newly found topology (according to LL) does not fit significantly better than the published model, yet other newly found models do, despite having higher (i.e., worse) LL scores. P-values below 0.05 are highlighted in green.
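The underlying comparison can be sketched as follows, after the pattern in the ADMIXTOOLS 2 documentation. The graph objects and the nboot value are placeholders, and compare_fits is assumed to return the empirical two-tailed p-value among its outputs.

```r
# Minimal sketch: bootstrap comparison of the best newly found topology
# against the published one.
library(admixtools)

fits = qpgraph_resample_multi(f2_blocks,
                              list(best_alternative_graph, published_graph),
                              nboot = 100)
compare_fits(fits[[1]]$score_test, fits[[2]]$score_test)
```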
Used in Table 1: Marks the findGraphs runs featured in Table 1.