Evolutionary Biology

Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits

  1. Melissa L Spear (corresponding author)
  2. Alex Diaz-Papkovich
  3. Elad Ziv
  4. Joseph M Yracheta
  5. Simon Gravel
  6. Dara G Torgerson
  7. Ryan D Hernandez (corresponding author)
  1. Biomedical Sciences Graduate Program, University of California, San Francisco, United States
  2. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
  3. McGill Genome Centre, McGill University, Canada
  4. Department of Human Genetics, McGill University, Canada
  5. Quantitative Life Sciences Program, McGill University, Canada
  6. Division of General Internal Medicine, University of California, San Francisco, United States
  7. Department of Medicine, University of California, San Francisco, United States
  8. Institute of Human Genetics, University of California, San Francisco, United States
  9. Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, United States
  10. Native BioData Consortium, United States
  11. Bloomberg School of Public Health, Johns Hopkins University, United States
  12. Department of Epidemiology and Biostatistics, University of California, San Francisco, United States
  13. Bakar Computational Health Sciences Institute, University of California, San Francisco, United States
  14. Quantitative Biosciences Institute, University of California, San Francisco, United States
Research Article
Cite this article as: eLife 2020;9:e56029 doi: 10.7554/eLife.56029

Abstract

People in the Americas represent a diverse continuum of populations with varying degrees of admixture among African, European, and Amerindigenous ancestries. In the United States, populations with non-European ancestry remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Amerindigenous ancestry increased by an average of ~20% among Mexican Americans born between the 1940s and the 1990s. These patterns result from complex interactions between several population and cultural factors, which shaped patterns of genetic variation and influenced the genetic architecture of complex traits in Mexican Americans. Using height as an example, we show that polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a need to improve our understanding of admixed populations.

Introduction

The United States Census Bureau refers to the Hispanic/Latino ethnicity as a category for individuals who self-identify as ‘a person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin regardless of race’ (United States Government, Executive Office of the President, Office of Management and Budget, Office of Information and Regulatory Affairs, 1997). As such, this broad ethnic group living in the United States is a culturally, phenotypically, and genetically diverse continuum of populations. Individuals who identify as Hispanic/Latino have varying proportions of Amerindigenous, African, and European genetic ancestries, each with its own unique continental demographic history. Demographic forces such as population bottlenecks, expansions, and migration, as well as adaptation to novel environments, resulted in observable differences in continental patterns of genetic variation (Nelson et al., 2008; Abecasis et al., 2012; Auton et al., 2015). These differing patterns were shaped by many historical migration events, including the founding of the Americas by Amerindigenous populations, the colonization by Europeans, and the African slave trade (Gravel et al., 2013; Homburger et al., 2015; Moreno-Estrada et al., 2014; Moreno-Estrada et al., 2013; Reich et al., 2012; Bryc et al., 2015; Conomos et al., 2016; Han et al., 2017; Baharian et al., 2016; Jordan et al., 2019; Micheletti et al., 2020). However, additional complexities surrounding these events remain highly understudied.

Demographic history has shaped the genetic architecture of modern human phenotypic variation (Agarwala et al., 2013; Eyre-Walker, 2010; Maher et al., 2013; Simons et al., 2014; Uricchio et al., 2016; Yang et al., 2015), and is critical to consider in the search for the genetic basis of complex diseases. The demography of the United States has changed drastically over the 20th century, and by 2044 the country is projected to become a ‘minority-majority’ nation in which no single racial/ethnic group comprises more than 50% of the population. By 2060, Hispanics/Latinos are projected to make up 29% of the US population, or 119 million individuals (Colby and Ortman, 2015). However, to date, population-based medical genomics research [and its subsequent benefits, including polygenic risk score (PRS) profiling] has been disproportionately focused on individuals of European descent, with the findings primarily benefiting European populations (Bustamante et al., 2011; Martin et al., 2019). Despite increases in sample sizes, rates of discovery, and traits studied, Hispanic or Latin American participation in genome-wide association studies (GWAS) has continued to hover around 1% (Popejoy and Fullerton, 2016; Mills and Rahal, 2019). This trend, along with factors ranging from research abuse and community mistrust to community superstition and apathy, has left these populations (and other non-European populations) particularly vulnerable to falling behind in receiving the benefits of the precision medicine revolution (Martin et al., 2019; Popejoy and Fullerton, 2016).

In this study, we use the largest genetic study of Hispanics/Latinos in the U.S. to date, the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) (Conomos et al., 2016), to understand how patterns of genetic variation in Hispanic/Latino populations in the United States have changed over the last century and to evaluate the impact such changes may be having on complex traits.

Results

Global ancestry proportions among HCHS/SOL Hispanic/Latino Populations

Using the subset of sites that overlapped with our African, European, and Amerindigenous reference panels, we estimated three-way global ancestry proportions for 10,268 unrelated HCHS/SOL individuals (see Materials and methods). Figure 1A summarizes these global ancestry proportions in a ternary plot shaded by admixture estimates, recapitulating the original HCHS/SOL analysis of continental ancestry (Conomos et al., 2016). However, while several population groups appear to have overlapping ancestry proportions (Figure 1B), this analysis masks more subtle structure in subcontinental ancestry. To investigate subtle population structure across these self-identified population groups, we applied UMAP to the top three principal components (see Materials and methods and Figure 1—figure supplement 1C) and found substantial structure across self-identified groups (Figure 1C–D). Dominicans, who have the highest average proportions of African ancestry, lie in the middle of the embedding, with Puerto Ricans and Cubans diverging in opposite directions (Figure 1D) along clines of increasing European ancestry (Figure 1C). Further, while self-identified Mexican, Central American, and South American groups appear to have overlapping ancestry proportions in Figure 1A–B, UMAP represents Mexican Americans and Central/South Americans as large, separate wings that diverge from self-identified Cubans and Dominicans, with both clusters extending along clines of increasing ancestry toward different Amerindigenous (AI) populations (Figure 1C–D and Figure 1—figure supplement 1B). Whether we included multiple European and African reference populations or no reference populations at all, UMAP maintained separate clusters for each of the HCHS/SOL populations (Figure 1—figure supplement 1C–G).
These clusters with varying AI ancestries are consistent with Conomos et al., 2016; however, the UMAP embedding consolidates the signal present in the top three PCs into a succinct two-dimensional visualization.
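For readers reproducing a plot like Figure 1A, each individual's three-way ancestry estimate can be placed in a ternary plot by treating the proportions as barycentric coordinates of an equilateral triangle. The sketch below assumes a vertex layout (AFR at the lower left, EUR at the lower right, AI at the top) that may differ from the published figure, and the function name `ternary_xy` is hypothetical.

```python
import math


def ternary_xy(afr: float, eur: float, ai: float) -> tuple[float, float]:
    """Map three-way ancestry proportions to 2D ternary-plot coordinates.

    Assumed (hypothetical) vertex layout: AFR at (0, 0), EUR at (1, 0),
    AI at (0.5, sqrt(3)/2). The point is the proportion-weighted average
    of the three vertices.
    """
    total = afr + eur + ai
    if total <= 0:
        raise ValueError("ancestry proportions must sum to a positive value")
    # Renormalize so small rounding errors in the estimates stay on the simplex.
    afr, eur, ai = afr / total, eur / total, ai / total
    x = eur * 1.0 + ai * 0.5
    y = ai * math.sqrt(3) / 2
    return x, y


# Example: an individual with 5% AFR, 45% EUR and 50% AI ancestry.
print(ternary_xy(0.05, 0.45, 0.50))
```

Applying `ternary_xy` to every individual and colouring points by population or by one ancestry component gives the two ternary views described for Figure 1A–B.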

Figure 1
Genomic ancestry and population structure in HCHS/SOL.

(A) Ternary plot of HCHS/SOL (n = 10,268) colored by admixture proportions. (B) Ternary plot of global ancestry proportions colored by population for 10,268 HCHS/SOL individuals. (C) Uniform Manifold Approximation and Projection (UMAP) plot depicting the genetic diversity of HCHS/SOL and the reference panel (n = 10,591) using three principal components, colored by admixture proportions. Within the legend, AFR, EUR, and AI refer to African, European, and Amerindigenous global ancestries, respectively. (D) UMAP plot of HCHS/SOL and the reference panel (n = 10,591) using three principal components, colored by HCHS/SOL population.

Dynamic global ancestry proportions in Mexican Americans

For each of the HCHS/SOL populations, we evaluated differences in global ancestry estimates over time while accounting for the sampling method (referred to as ‘sampling weight’, see Materials and methods) used in the design of the HCHS/SOL study (Sorlie et al., 2010). In all populations, the estimated effect of birth year on AI ancestry is positive, but it remains statistically significant after multiple-testing correction only in the Mexican American (β=0.0023; 95% CI:0.0021–0.0025, p=3.58E-22; Figure 2A–B) and Central American (β=0.0013; 95% CI:0.0009–0.0017, p=0.0013) cohorts (Supplementary file 1). Given the larger sample size, larger effect, and stronger statistical significance, we focus on Mexican Americans from here on. In Mexican Americans, the increase in AI global ancestry over time was consistent across multiple data stratifications, including recruitment region, US-born or not US-born, educational attainment, and gender (Table 1 and Supplementary file 2), and was robust to alternative methods for estimating global ancestry proportions (e.g. based on the summation of RFMix local ancestry estimates; Figure 2—figure supplements 1 and 2). We identified significant differences in AI ancestry between recruitment regions (t-test, 95% CI:0.12–0.15, p<2.2E-16), between US-born and non-US-born individuals (t-test, 95% CI:0.06–0.09, p<2.2E-16), and across levels of educational attainment, which can be considered a proxy for socioeconomic status (one-way ANOVA, p<2E-16). To further assess changes in global ancestry distributions over time, we performed bootstrap resampling over individuals (1000 iterations) of global AI ancestry for the Mexican Americans. We observed a consistent increase in AI ancestry both in the fitted locally estimated scatterplot smoothing (LOESS) curves (Figure 2B) and when individuals were binned by birth decade (Figure 2—figure supplement 3). On average, global AI ancestry increased by ~20% among Mexican Americans born between 1940 and 1990.
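The Figure 2A legend describes the fitted line as a multiple regression of AI ancestry on birth year plus sampling weight. The sketch below fits that kind of model by ordinary least squares in plain Python. It treats sampling weight as an ordinary covariate, as the legend describes, rather than applying survey weighting, and it runs on synthetic, noise-free data (not HCHS/SOL data) so that the recovered coefficients can be checked by eye. The helper name `ols` is hypothetical.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows of predictor values; an intercept column is prepended.
    Returns the coefficients [intercept, b1, b2, ...].
    """
    rows = [[1.0, *r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for row in range(k):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [a - f * c for a, c in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic, noise-free example: AI = 0.40 + 0.0023 * (year - 1940) + 0.01 * weight.
years = list(range(1940, 1991))
weights = [0.5 + 0.75 * (i % 3) for i in range(len(years))]
ai = [0.40 + 0.0023 * (yr - 1940) + 0.01 * w for yr, w in zip(years, weights)]

# Birth year is centred on 1940 to keep the normal equations well conditioned.
b0, b_year, b_weight = ols([[yr - 1940, w] for yr, w in zip(years, weights)], ai)
print(f"intercept={b0:.4f}, birth year={b_year:.4f}, sampling weight={b_weight:.4f}")
```

With real data one would add residual noise and report standard errors; a statistics package would normally be used for that.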

Figure 2
Amerindigenous ancestry has increased over time in Mexican Americans.

(A) Global Amerindigenous ancestry proportions plotted by birth year for Mexican Americans (n = 3,622). Fitted line is multiple regression of Amerindigenous ~ birth year + sampling weight. Bars represent 95% confidence intervals for individuals grouped by decade. (B) Bootstrap resampling (n = 1000 iterations) of Amerindigenous global ancestry for the Mexican American individuals with a fitted LOESS curve for each iteration. Dashed lines represent the 95% quantile range of LOESS curves and the blue line represents the fitted regression line from A.

Table 1
Relationship of Amerindigenous global ancestry and birth year for Mexican Americans stratified by recruitment region, US-born vs non-US-born status, gender and educational attainment.

For recruitment region, stratified analyses were limited to Chicago and San Diego because sample sizes for the Bronx and Miami were small (124 and 25 individuals, respectively). Educational attainment was categorized as less than a high school diploma or equivalent degree (<HS), a high school diploma or equivalent degree (=HS), or post-secondary education (>HS). The significance threshold was set at 0.05/9 ≈ 0.0056 (Bonferroni correction for the nine stratified tests).

Category      N     Mean AI  Median AI  R²     Effect (β/yr)  Std. err.  p
All           3622  0.489    0.468      0.027  0.0023         0.0002     3.58E-22
Chicago       1310  0.562    0.550      0.017  0.0016         0.0005     0.0006
San Diego     2163  0.428    0.422      0.012  0.0012         0.0002     4.29E-07
US-born       634   0.427    0.418      0.063  0.0027         0.0004     1.77E-10
Non-US-born   2987  0.502    0.481      0.050  0.0032         0.0003     1.38E-30
Male          1500  0.494    0.475      0.038  0.0028         0.0004     3.83E-14
Female        2122  0.485    0.462      0.022  0.0019         0.0003     3.07E-10
<HS           1518  0.520    0.500      0.045  0.0026         0.0004     1.39E-12
=HS           960   0.501    0.479      0.022  0.0018         0.0005     0.0003
>HS           1140  0.436    0.422      0.045  0.0027         0.0004     6.53E-13
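As a quick check on Table 1, the nine stratified p-values can be compared against the Bonferroni threshold of 0.05/9. The values below are transcribed from the table; the "All" row is excluded because the correction covers the nine strata. All nine strata fall below the threshold.

```python
# Stratum p-values transcribed from Table 1 (the "All" row is excluded
# because the Bonferroni correction covers the nine stratified tests).
p_values = {
    "Chicago": 6e-4, "San Diego": 4.29e-7, "US-born": 1.77e-10,
    "Non-US-born": 1.38e-30, "Male": 3.83e-14, "Female": 3.07e-10,
    "<HS": 1.39e-12, "=HS": 3e-4, ">HS": 6.53e-13,
}
alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni: 0.05 / 9
significant = [name for name, p in p_values.items() if p < threshold]
print(f"threshold = {threshold:.4f}; {len(significant)}/{len(p_values)} strata pass")
```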

We replicated the increase in global AI ancestry over time in a smaller, independent cohort of self-identified Mexican Americans (n = 705) from the Health and Retirement Study (HRS) (Fisher and Ryan, 2018). The HRS Mexican Americans are older than the HCHS/SOL Mexican Americans (birth year range: 1915–1981; mean = 1943, median = 1942) and have lower levels of global AI ancestry on average (mean = 0.29), but we still observed an increase in global AI ancestry over time (β=0.00082; 95% CI: 0.0005–0.0012; p=0.02; Figure 2—figure supplement 4A). We performed 1000 bootstrap resampling iterations of the linear regression model (global AI ancestry ~ birth year) fitted to the data. Across these iterations, 98.2% of the fitted slopes were > 0 and 61.5% of the regression p-values were less than 0.05 (Figure 2—figure supplement 4B–4D).
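The HRS bootstrap summarizes how often a resampled regression slope is positive. The sketch below resamples individuals with replacement, refits the least-squares slope of AI ancestry on birth year, and reports the fraction of positive slopes. It uses synthetic data loosely shaped like the HRS sample (705 birth years between 1915 and 1981), not HRS genotypes, and it omits the per-iteration p-values; the helper names `slope` and `bootstrap_slopes` are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slopes(x, y, n_boot=1000, seed=1):
    """Resample individuals with replacement and refit the slope each time."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # skip degenerate resamples with no spread in x
            continue
        slopes.append(slope(xs, [y[i] for i in idx]))
    return slopes


# Illustrative synthetic data (not HRS data): small positive trend plus noise,
# clamped to the valid [0, 1] range for an ancestry proportion.
data_rng = random.Random(42)
years = [data_rng.randint(1915, 1981) for _ in range(705)]
ai = [min(1.0, max(0.0, 0.29 + 0.0008 * (yr - 1943) + data_rng.gauss(0, 0.15)))
      for yr in years]

slopes = bootstrap_slopes(years, ai)
frac_positive = sum(s > 0 for s in slopes) / len(slopes)
print(f"{frac_positive:.1%} of bootstrap slopes are > 0")
```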

A previous study (Baharian et al., 2016) identified ancestry-biased migration in African Americans, in which individuals with higher proportions of European ancestry migrated out of the South first during the Great Migration, followed by individuals with higher proportions of African ancestry. We hypothesized that a similar process occurred in US Hispanic/Latino populations, whereby earlier immigrants to the US had higher proportions of European ancestry and more recent immigrants had higher proportions of global AI ancestry. In our non-US-born individuals (N = 2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight, and identified a significant effect of years in the US (β=−0.0009; 95% CI: −0.0012, −0.0006; p=0.0006), suggesting that individuals who arrived in the US earlier had less AI ancestry. However, accounting for this did not change the effect of birth year on the proportion of global AI ancestry (β=0.0028; 95% CI: 0.0025–0.0031; p<2E-16), suggesting that ancestry-biased migration does not fully explain the dynamic AI ancestry patterns we have inferred.

For US-born individuals, we assessed whether parental birthplace could explain the increase in global AI ancestry. Of the 634 US-born individuals, 385 had both parents born outside the US, 149 had one parent born outside the US, and 97 had both parents born in the US. We tested a model with an interaction between estimated birth year and the number of parents born in the United States. We found a strong positive relationship between estimated birth year and AI ancestry for those with both parents born outside the US, who formed the baseline group in this model (β=0.004; 95% CI: 0.0034–0.0046; p=4.85e-12) (Figure 2—figure supplement 5). For those with one parent born in the US, the relationship between estimated birth year and AI ancestry was still positive but smaller once the interaction term (β=−0.0034; 95% CI: −0.0043, −0.0025; p=0.000123) was added to the baseline slope, and for those with both parents born in the US the combined relationship was negative (interaction β=−0.0049; 95% CI: −0.006, −0.0038; p=1.04e-5).
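Because the interaction coefficients are reported relative to the baseline group, each group's birth-year slope is the baseline slope plus that group's interaction term. The short calculation below makes this arithmetic explicit with the coefficients quoted above, assuming treatment (dummy) coding with the both-parents-born-outside-the-US group as the reference, as the text implies.

```python
# Coefficients quoted in the text: the baseline slope (both parents born
# outside the US) and the birth-year interaction terms for the other groups.
baseline_slope = 0.004
interaction = {"one US-born parent": -0.0034, "two US-born parents": -0.0049}

group_slopes = {"no US-born parents": baseline_slope}
for group, delta in interaction.items():
    # Under treatment coding, a group's slope is the baseline slope plus
    # that group's interaction coefficient.
    group_slopes[group] = baseline_slope + delta

for group, s in group_slopes.items():
    print(f"{group}: {s:+.4f} per birth year")
```

The one-parent slope works out to about +0.0006 per year (positive but smaller), and the two-parent slope to about −0.0009 per year (negative), matching the description above.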

Little evidence for subcontinental population structure

We explored whether the increase in global AI ancestry over time could occur in tandem with changes in the specific subcontinental AI ancestries over time. If this were the case, we would expect subtle signals of genetic divergence in AI ancestry tracts over time. To investigate this, we calculated FST within AI ancestry tracts between all pairs of birth decades (see Materials and methods). Figure 3—figure supplement 1 shows all pairwise comparisons among birth decades and demonstrates that, while the estimates of FST are negligible (with many estimates below 0), there is a subtle trend of increasing FST as the difference between birth decades increases (though individuals born in the 1980s and 1990s show a conflicting pattern).
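Negative FST estimates are expected from bias-corrected estimators when two groups are essentially undifferentiated, because the sample-size correction can exceed the observed difference in allele frequencies. The sketch below uses Hudson's estimator, combined across SNPs as a ratio of sums; the section does not say which estimator was used, so this choice and the function name `hudson_fst` are assumptions for illustration.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's F_ST, combined across SNPs as a ratio of summed terms.

    p1, p2: per-SNP allele frequencies in the two groups (e.g. birth decades).
    n1, n2: per-SNP numbers of sampled haplotypes in each group (each >= 2).
    """
    num = den = 0.0
    for a, b, na, nb in zip(p1, p2, n1, n2):
        # Squared frequency difference minus its expected sampling inflation.
        num += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # Between-group heterozygosity.
        den += a * (1 - b) + b * (1 - a)
    return num / den


freqs = [0.2, 0.5, 0.7]
print(hudson_fst(freqs, freqs, [100] * 3, [100] * 3))  # identical groups: slightly negative
print(hudson_fst([0.1, 0.9], [0.9, 0.1], [100, 100], [100, 100]))  # strongly diverged
```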

We further investigated patterns of subcontinental population structure using genetic diversity, π, in AI ancestry tracts for each birth decade (see Materials and methods). We hypothesized that if there had been increased migration from multiple AI source populations (coupled with rapid population growth in Mexican American communities), genetic diversity should have increased over time. We found the opposite: Figure 3A shows a subtle decrease in genetic diversity (π) from the 1930s to the 1980s in non-US-born Mexican Americans, and a subtle decrease in US-born Mexican Americans from the 1970s to the 1990s (with diversity remaining roughly constant from the 1930s to the 1970s).
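One way to compute π restricted to AI ancestry tracts is to average, over all pairs of haplotypes from individuals born in the same decade, the fraction of differing sites among the sites where both haplotypes carry AI ancestry. The sketch below assumes haplotypes have already been masked to AI tracts (non-AI positions set to `None`); the masking scheme and the function name `pairwise_pi` are hypothetical, not the authors' pipeline.

```python
from itertools import combinations


def pairwise_pi(haplotypes):
    """Mean per-site pairwise difference over all pairs of haplotypes.

    Each haplotype is a list of 0/1 alleles, with None at sites outside an
    AI ancestry tract; such sites are skipped for any pair that includes them.
    """
    total = 0.0
    n_pairs = 0
    for h1, h2 in combinations(haplotypes, 2):
        compared = diffs = 0
        for a, b in zip(h1, h2):
            if a is None or b is None:
                continue
            compared += 1
            diffs += a != b
        if compared:  # ignore pairs with no shared AI-ancestry sites
            total += diffs / compared
            n_pairs += 1
    return total / n_pairs if n_pairs else float("nan")


# Toy haplotypes from one hypothetical birth decade.
haps = [[0, 1, None, 1], [0, 0, 1, 1], [1, 0, 1, None]]
print(pairwise_pi(haps))
```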

Figure 3
Architecture of genetic diversity in Mexican American genomes.

(A) Genetic diversity (π) in Amerindigenous ancestry tracts stratified by US-born/not US-born status, and calculated between pairs of individuals born within each decade (with shaded envelopes showing 95% confidence intervals for each group). (B) Proportion of total Amerindigenous (AI) ancestral tracts in the HCHS/SOL Mexican American population by decade. (C) Variation in ROH by birth year. Solid lines show LOESS of the proportion of the genome with AI ancestry that overlap ROH of different lengths, while dotted lines show LOESS of the proportion of the genome with European ancestry that overlap ROH of different lengths. (D) Scatter plot of parents’ inferred global Amerindigenous (AI) ancestries using ANCESTOR.

AI ancestry tract lengths have not changed, but runs of homozygosity (ROH) have increased

If there had been a rapid increase in the migration of individuals with high AI ancestry, we would expect to see an increase in long AI tracts over time. To test this, we calculated the length of each RFMix-inferred local ancestry tract in each Mexican American individual and tested for differences in the distribution of tract lengths across birth decades using a multiple linear regression model (see Materials and methods). We found no significant association between birth decade and the proportion of AI ancestral tracts at various lengths (Figure 3B; β = 0.04, CI: −0.019 to 0.099; p=0.19), and this held under alternative specifications that address potential violations of model assumptions (e.g. normalizing the tracts per bin by the number of individuals, or excluding the 1930s and/or 1990s individuals because of the small sample size in each bin).
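Tract lengths come from collapsing runs of identical local ancestry calls along a haplotype. The sketch below assumes RFMix-style calls have already been reduced to one label per window with known genetic-map boundaries in centimorgans; the input layout and the helper name `ancestry_tracts` are assumptions. A real analysis would run this per haplotype and per chromosome before tallying AI tract lengths by birth decade.

```python
from itertools import groupby


def ancestry_tracts(positions_cm, calls):
    """Collapse per-window local ancestry calls into contiguous tracts.

    positions_cm: increasing genetic positions (cM) of window boundaries,
                  with one more entry than `calls` (the last is the end).
    calls:        ancestry label per window, e.g. "AI", "EUR", "AFR".
    Returns a list of (label, length_cM) tuples in genomic order.
    """
    if len(positions_cm) != len(calls) + 1:
        raise ValueError("need exactly one more boundary than calls")
    tracts = []
    start = 0
    for label, run in groupby(calls):
        k = sum(1 for _ in run)  # number of consecutive windows with this label
        tracts.append((label, positions_cm[start + k] - positions_cm[start]))
        start += k
    return tracts


bounds = [0.0, 1.5, 3.0, 4.0, 6.5, 7.0]
calls = ["AI", "AI", "EUR", "EUR", "AI"]
tracts = ancestry_tracts(bounds, calls)
ai_lengths = [length for label, length in tracts if label == "AI"]
print(tracts, ai_lengths)
```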

While there are no statistical differences in the length of admixture tracts, it is possible that local ancestry tracts have accumulated in specific regions of the genome to drive the increased global ancestry proportions over time. We used local ancestry estimates generated across the genome to perform admixture mapping in HCHS/SOL Mexican Americans to determine if younger individuals harbored excess AI ancestry in certain regions of the genome. Although we tried two different models (see Materials and methods), we did not find any loci to be significantly associated with birth year across the genome (Figure 3—figure supplement 2).

We find no changes in AI ancestry tract lengths over time and no regions of the genome that seem to be accumulating AI ancestry at disproportionate rates, yet genetic diversity in the AI ancestry tracts of Mexican Americans has decreased over time despite rapid growth of the census population size. We therefore investigated whether this population has experienced increased haplotype homozygosity over time by exploring runs of homozygosity (ROH) across the genomes of each of the 3622 Mexican Americans. We classified ROH into three categories (short, medium, and long) based on the length distribution in the population. Generally, short ROH are tens of kilobases in length and likely reflect the homozygosity of old haplotypes; medium ROH are hundreds of kilobases in length and likely reflect background relatedness in the population; and long ROH are hundreds of kilobases to several megabases in length and are likely the result of recent parental relatedness. Overall, we find a significant positive correlation between birth year and total ROH (summed across size classes; τ = 0.0449, p=6.12e-5, Kendall’s rank correlation), and this signal becomes stronger when we restrict our analysis to ROH calls that overlap AI ancestry tracts (τ=0.065, p=7.39e-9). Figure 3C shows LOESS curves fitted to the proportion of the genome with AI (or European) ancestry covered by ROH in Mexican Americans as a function of birth year, broken down by ROH size class (see Figure 3—figure supplement 3 for the distribution of ROH by length class and ancestry). When stratified by size class and normalized by AI global ancestry, the associations in AI ROH (all Kendall’s rank correlations) were driven primarily by the short (τ=0.097, p<2.2E-16) and medium (τ=0.084, p=1.27E-13) size classes; the association for long ROH was not significant after multiple-testing correction, owing to the small number of long ROH across individuals (τ = −0.032, p=0.004).
We observed the opposite pattern when ROH were restricted to European ancestry segments of the genome: there is a significant negative correlation between birth year and the total ROH that overlap European ancestry tracts (τ=−0.089, p=1.82E-14).
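The ROH trends above are reported as Kendall's rank correlations. For reference, the sketch below computes Kendall's tau-b, the tie-corrected variant, in plain Python with an O(n²) loop over pairs. Which tau variant the authors used is not stated here, so tau-b is an assumption; in practice one would more likely call `scipy.stats.kendalltau`, which computes tau-b by default and also returns a p-value. The input values in the example are hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation with tie correction (O(n^2) pairs)."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - tied_x) * (n0 - tied_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical birth years and fractions of AI-ancestry genome in ROH.
birth_years = [1945, 1952, 1960, 1968, 1975, 1983]
roh_ai = [0.010, 0.012, 0.011, 0.015, 0.016, 0.018]
print(kendall_tau_b(birth_years, roh_ai))
```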

Strong ancestry-related assortative mating in HCHS/SOL Mexicans

Given that short and medium ROH have increased over time, background relatedness within AI ancestry in Mexican Americans appears to have increased over time (without an increase in recent parental relatedness). One way for this to occur is if individuals with similar ancestry patterns mate with one another more often than expected under random mating (i.e. assortative mating). To measure assortative mating, we estimated the ancestral proportions of the biological parents of each HCHS/SOL Mexican American (see Materials and methods). With individuals from all decades pooled together, we found the inferred parental AI ancestries to be significantly correlated (Figure 3D, r = 0.708, 95% CI:0.69–0.72, p<2.2E-16, Pearson correlation). When stratified by decade, the distributions of the difference in parental AI ancestry overlapped, and the correlation in inferred parental AI global ancestry ranged from 0.65 to 0.74 (Figure 3—figure supplement 4), with no statistically significant differences between decades. This shows a consistent pattern of strong parental ancestry correlations among Mexican Americans across generations. This signature of assortative mating is not due to recent parental relatedness, because there is no trend in long ROH with birth year (and an overall low rate of long ROH among Mexican Americans).
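A standard way to attach a confidence interval to a Pearson correlation is Fisher's z transformation. Assuming the pooled analysis used all n = 3622 Mexican Americans, the sketch below gives an interval of roughly 0.69 to 0.72 for r = 0.708, in line with the interval reported above. Whether the authors used this exact method is not stated, so treat it as a consistency check rather than their procedure; the function name `pearson_ci` is hypothetical.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate two-sided 95% CI for Pearson's r via Fisher's z transform."""
    z = math.atanh(r)          # Fisher z is approximately normal with SE 1/sqrt(n - 3)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = pearson_ci(0.708, 3622)  # n = 3622 is an assumption (all individuals pooled)
print(f"95% CI: {lo:.3f} to {hi:.3f}")
```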

Population genetic factors affecting changes in ancestry proportions over time

We developed a Moran-style simulator (Moran, 1958) to evaluate how migration, assortative mating, population growth, and variance in reproduction affect ancestry proportions over the timescale shown in Figure 2 (for details, see Materials and methods). Briefly, a Moran model is a forward simulation approach in which, at each iteration, a single individual is replaced by a new individual produced by parents chosen from the population. For a population of N individuals, it takes N steps to simulate a single generation, and as such the Moran model is commonly used to represent overlapping generations.

In our simulations, the initial mean AI ancestry proportion was set at 0.42, and two generations were simulated (assuming ~26 years/generation) (Moorjani et al., 2016). Each iteration incorporated population growth, assortative mating, ancestry-based fecundity differences and migration. Simulating a standard neutral model of random mating and constant population size showed no change in ancestry proportions over time (Figure 3—figure supplement 5).

Population growth affects diversity by increasing the number and proportion of variants that are rare and by decreasing the rate of genetic drift. While population growth can intensify the strength of natural selection (causing deleterious alleles to decrease in frequency and adaptive alleles to increase in frequency), it does not cause systematic changes in the frequency of segregating neutral alleles. Consistent with this, including population growth did not affect the mean ancestry proportion in a population (Figure 3—figure supplement 6).

In our simulations, the assortative mating parameter (AM) ranged from 0 (random mating) to 1 (parents are chosen as nearest neighbors when sorted by ancestry proportions). Ancestry-based assortative mating can lead to increased ROH and decreased genetic diversity (see Figure 3), but because parents are drawn proportionally from across the ancestry spectrum, it does not induce any change in the mean ancestry proportion of the population (albeit with a slight increase in the variance of ancestry proportions over time; Figure 3—figure supplement 7). Note that AM = 0.75 results in a correlation in parental ancestry proportions similar to that in our observed data.
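The sketch below is a deliberately simplified Moran-style simulator, not the authors' implementation. Each individual is represented only by an AI ancestry proportion, a child's ancestry is the average of its parents' proportions (ignoring recombination and tract segregation), and `am` is a hypothetical stand-in for the assortative-mating parameter: with probability `am`, the second parent is the first parent's nearest neighbour in ancestry. Population growth, fecundity differences, and migration are omitted. Even in this toy version, mean ancestry stays close to its starting value with or without assortative mating, while stronger assortative mating slows the loss of variance that averaging causes under random mating.

```python
import random
import statistics


def moran_ancestry(init, generations=2, am=0.0, seed=0):
    """Toy Moran-style simulation of individual AI ancestry proportions.

    init: starting AI ancestry proportions, one per individual (N >= 2).
    am:   hypothetical assortative-mating knob in [0, 1]; with probability am
          the second parent is the first parent's nearest neighbour when the
          population is sorted by ancestry, otherwise it is drawn at random.
    """
    rng = random.Random(seed)
    pop = list(init)
    n = len(pop)
    for _ in range(generations * n):  # N replacement events = one generation
        mom = rng.randrange(n)
        if rng.random() < am:
            order = sorted(range(n), key=pop.__getitem__)
            pos = order.index(mom)
            neighbours = [order[k] for k in (pos - 1, pos + 1) if 0 <= k < n]
            dad = rng.choice(neighbours)
        else:
            dad = rng.randrange(n - 1)
            if dad >= mom:  # uniform draw from everyone except mom
                dad += 1
        child = (pop[mom] + pop[dad]) / 2  # blending shortcut for ancestry
        pop[rng.randrange(n)] = child      # a uniformly chosen individual dies
    return pop


start_rng = random.Random(1)
start = [start_rng.uniform(0.22, 0.62) for _ in range(500)]
for am in (0.0, 0.75):
    end = moran_ancestry(start, am=am)
    print(f"AM={am}: mean {statistics.fmean(start):.3f} -> {statistics.fmean(end):.3f}, "
          f"variance {statistics.pvariance(start):.4f} -> {statistics.pvariance(end):.4f}")
```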

Many social and cultural factors produce variance in fecundity within and between populations, and some of these factors may be correlated with genomic ancestry proportions. We tested whether ancestry-based fecundity differences could induce changes in mean ancestry proportions, and how strong the fecundity differences had to be to induce an effect similar to what we see in the data. To simulate this process, we sampled individuals to reproduce based on their ancestry proportion using a Beta(1, 1(1+FAI)) distribution, where FAI=0 induces a uniform distribution (i.e. no ancestry-based fecundity differences) and FAI=1 induces a strong ancestry-based fecundity difference (Figure 3—figure supplement 8A). Ancestry-based fecundity differences can induce systematic changes in ancestry proportions in the population (Figure 3—figure supplement 8B–E), but we are unaware of estimates of this effect in Mexican Americans. Further, ancestry-based assortative mating can magnify the effects of ancestry-based fecundity differences (Figure 3—figure supplement 8F–I). Together, strong ancestry-related assortative mating (AM=0.75) and fecundity differences (FAI=0.8) result in a change in ancestry proportions over time similar to our observed data (Figure 3—figure supplement 8I).

While migration cannot explain all the changes in ancestry proportions we report, it is clearly a contributor. To model migration, we specified two parameters: mAI, which affects migrant Amerindigenous ancestry proportions (Figure 3—figure supplement 9), and M, the probability that a new individual is a migrant. We simulated the joint effects of these parameters (Figure 3—figure supplement 10) and added the effects of ancestry-related assortative mating (AM=0.75) and increasing degrees of ancestry-related differences in fecundity (Figure 3—figure supplements 11–14: FAI = {0.1, 0.2, 0.4, 0.8}, respectively). We find that a large number of parameter combinations are consistent with our observed ancestry trends in Mexican Americans.

Genetic association of global AI ancestry with biomedical traits

We have shown that patterns of genetic variation changed over time in the Mexican American population, with AI ancestry increasing over a short period of time (combined with decreased genetic diversity and increased short and medium ROH within AI ancestry tracts). These features may have implications for the genetic architecture of complex traits within Mexican Americans, a topic that is understudied and poorly understood. To further our understanding of the genetic architecture of complex traits in Mexican Americans, we investigated the relationship between AI ancestry and 69 biomedical phenotypes (while controlling for several environmental and other factors; see Materials and methods). As illustrated in Figure 4A, 18 of these traits (26%) are significantly associated with AI ancestry (Bonferroni-corrected threshold, p<6.6E-5) after adjusting for several factors including birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. While this suggests that genetic ancestry has an effect on several traits, other unmodeled socioeconomic variables that are correlated with AI ancestry may also contribute to these patterns (though AI ancestry has among the strongest effects on a range of biomedical traits, comparable to the effects of gender; Figure 4—figure supplement 1). Regardless, these findings highlight the need for further investigation into the role of AI genetic ancestry in admixed populations such as Mexican Americans.

Figure 4
Global Amerindigenous ancestry and biomedical traits in HCHS/SOL Mexican Americans.

(A) The effect size of global AI ancestry on each of 69 quantile-normalized traits (see Materials and methods) while controlling for birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. (B–C) The relationship between (B) birth year and height and (C) height and polygenic height score (PHS). The black line indicates the fitted linear model for all individuals. Each color represents a different quartile of Amerindigenous global ancestry. Polygenic height scores were computed using UKBB summary statistics for 1,078 SNPs.
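The Figure 4A legend refers to quantile-normalized traits. A common choice is the rank-based inverse normal transform sketched below, which maps ranks to standard-normal quantiles so that skewed traits become approximately N(0, 1). The offset `(rank - 0.5) / n`, the handling of ties by average rank, and the function name `quantile_normalize` are assumptions; the exact transform used in the study is described in its Materials and methods, not here.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Rank-based inverse normal transform; tied values share their average rank.

    The value with 1-based rank r out of n is mapped to the standard-normal
    quantile of (r - 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical skewed trait values, including a tie.
print(quantile_normalize([3.1, 0.4, 1.7, 1.7, 9.8]))
```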

Assessing the genetic contribution of AI ancestry to height

Among the traits we tested for association with global AI ancestry, height showed the strongest effect. Our regression model also indicated that height had a strong positive relationship with birth year (Supplementary file 3). Globally, populations have grown taller over time owing to a variety of non-genetic, environmental factors (NCD Risk Factor Collaboration (NCD-RisC), 2016). We find a similar trend in the HCHS/SOL Mexican Americans (β=0.096, 95% CI:0.077–0.114; p=5.95E-23) (Figure 4B and Supplementary file 4). When we stratified individuals by quartiles of global AI ancestry, all quartiles increased in height by a similar amount over the period investigated (though individuals with lower AI ancestry were taller on average). The rates of change in height were positive and significant in every AI quartile (p<5e-6). The largest was for the quartile with the highest AI ancestry, but the rates did not change monotonically with AI ancestry across quartiles. The estimates for the quartiles with their 95% CIs are: β=0.135 (CI:0.097–0.173) for AI >0.58; β=0.124 (CI:0.089–0.160) for 0.46 <= AI <= 0.58; β=0.083 (CI:0.047–0.119) for 0.37 <= AI <= 0.46; and β=0.113 (CI:0.074–0.151) for AI <0.37 (Supplementary file 4).

Height is one of the most highly studied complex traits, with GWAS sample sizes numbering in the hundreds of thousands (Yengo et al., 2018). Results for many of these studies have been made readily available in public databases as summary association statistics that can be leveraged to build genetic predictions through polygenic risk scores (PRS) (Pasaniuc and Price, 2017). In Europeans, PRS have been shown to have strong predictive power for several traits, including breast cancer, prostate cancer, and type 1 diabetes (Maas et al., 2016; Sharp et al., 2019; Schumacher et al., 2018). PRS are most effective in populations of European descent, as GWAS have been performed primarily in these populations (Bustamante et al., 2011; Martin et al., 2019; Popejoy and Fullerton, 2016), and are expected to be biased when applied to other populations because of differences in the genetic architecture of traits across diverse populations (Martin et al., 2017). Since Mexican Americans have some fraction of European ancestry, we sought to determine whether PRS calculated using GWAS summary statistics from European populations could still provide useful insight.

To evaluate the effectiveness of a PRS for height calculated from 1078 genome-wide SNPs selected from the UKBB GWAS of height (i.e. the polygenic height score, or PHS; see Materials and methods), we first tested for an association between observed height and predicted height while controlling for sampling weight, gender, recruitment center, educational attainment, US-born status, and number of US-born parents (see Materials and methods). Allele frequencies for these SNPs showed good concordance between the 1000 Genomes Americas Superpopulation and UKBB (Figure 4—figure supplement 2; r = 0.93, 95% CI: 0.92–0.94, p<2.2E-16, Pearson correlation). We identified a significant association between observed and predicted height for the population as a whole (β=0.004, 95% CI: 0.0023–0.005; p=9.91E-8; Figure 4C, Supplementary file 5). However, when we stratified by quartiles of global AI ancestry, the association remained only for individuals in the lower two quartiles (AI < 0.37: β=0.005, 95% CI: 0.0022–0.0076; p=4.42E-4; and 0.37 < AI < 0.46: β=0.006, 95% CI: 0.0032–0.009; p=4.38E-5; Supplementary file 5). The association between predicted and observed height was not significant for individuals in the upper two quartiles (0.46 < AI < 0.58: β=0.0007, p=0.6; and AI > 0.58: β=0.003, p=0.08; Supplementary file 5).
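
The text does not state how the 95% CI for the allele-frequency correlation was obtained; one standard approach, used for example by R's `cor.test`, is the Fisher z-transformation. The sketch below assumes that approach; `pearson_with_ci` and the example frequencies are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_with_ci(x, y, level=0.95):
    """Pearson correlation with a confidence interval from the Fisher z-transformation."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two equal-length sequences with at least 4 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3)
    z, se = math.atanh(r), 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return r, (math.tanh(z - crit * se), math.tanh(z + crit * se))


# Hypothetical allele frequencies for the same SNPs in two reference populations
amr = [0.10, 0.25, 0.40, 0.55, 0.70, 0.15, 0.33]
ukbb = [0.12, 0.22, 0.45, 0.50, 0.75, 0.20, 0.30]
r, (lo, hi) = pearson_with_ci(amr, ukbb)
```

The transformation is undefined at r = ±1 (`math.atanh` raises), which is not a practical concern for real allele-frequency data.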

Because global AI ancestry increased over time, and observed height is strongly associated with both AI ancestry and birth year, we hypothesized that PHS would also change over time. However, we did not find a significant effect of birth year on PHS (Figure 4—figure supplement 3; p=0.14), and this held when we stratified by quartiles of global AI ancestry.

Discussion

The US population is dynamic and rapidly changing, and this change will continue as the population grows (Colby and Ortman, 2015). Hispanics/Latinos are the largest and fastest-growing minority group and are projected to comprise ~29% of the US population by 2060. They are a genetically and phenotypically diverse population as a result of extensive admixture between Amerindigenous populations and immigrants from multiple geographic regions around the world. In this study, we identified additional complexities in population substructure that may contribute to phenotypic variation within Hispanics/Latinos.

Specifically, we demonstrated how the admixture composition of Mexican Americans has changed over time, resulting in an average increase of ~20% in Amerindigenous ancestry over the 50-year period studied, equivalent to a mean increase of ~0.4% per year. Although effect sizes vary to some extent, we replicate the underlying pattern across multiple data stratifications (two metropolitan areas, US-born and non-US-born individuals) and in an independent cohort of Mexican Americans. Further, a similar trend holds across multiple self-identified Hispanic/Latino populations in the US (and is statistically significant in Central Americans). This effect does not appear to have a simple explanation: we see no statistically significant increase in Amerindigenous ancestry at individual loci and no more than negligible population differentiation over time, and our analyses of non-US-born individuals show that the increase cannot be entirely explained by very recent migration.

What could be driving the increased Amerindigenous ancestry in Mexican Americans? We hypothesize that several population, cultural, and environmental factors operating in unison have altered the genetic architecture of Mexican Americans. First, we identify strong ancestry-based assortative mating. However, while assortative mating could explain the increased ROH and decreased genetic diversity we inferred over time, ancestry-based assortative mating alone should not change mean global ancestry proportions, because high- and low-Amerindigenous ancestry parents should still contribute offspring in proportion to their numbers (see simulations in Figure 3—figure supplement 7). Second, we infer a subtle increase in Amerindigenous ancestry among individuals who migrated to the US more recently compared with those who migrated earlier. Independent analyses have shown that migration from Mexico to the US has shifted over the years from states with less Amerindigenous ancestry to states with more (Moreno-Estrada et al., 2014; Terrazas, 2010). However, these subtle shifts in recent migration cannot fully explain the changes in Amerindigenous ancestry we infer, and including them in our statistical model did not change the effect of birth year on Amerindigenous ancestry. Third, US census data show that Hispanic/Latino is the fastest-growing ethnicity, with Mexican Americans constituting the plurality. However, as with assortative mating, population size changes alone should not drive mean changes in global ancestry proportions (Figure 3—figure supplement 6). Although none of these factors alone can adequately explain the temporal dynamics of Amerindigenous ancestry we observed, simulations show that their joint effects can drive substantial changes in global ancestry patterns (Figure 3—figure supplements 5–14). More research is necessary, however, to determine which parameter values are consistent with the continuum of Mexican American populations across the US.

Regardless of the mechanisms driving increased Amerindigenous ancestry in Mexican Americans, this additional source of temporal substructure within the population has substantial consequences for phenotypic variation in biomedical traits. We identify several biomedical traits associated with Amerindigenous ancestry, with effect sizes comparable to those of gender, and show that height exhibits both ancestry and temporal effects. Individuals with lower AI ancestry were taller on average than individuals with higher AI ancestry, pointing to a role of AI ancestry in the trait, yet height increased over time at similar rates in all AI ancestry groups. Further study is necessary to understand whether other biomedical traits are also changing over time as a result of the change in genomic ancestry proportions, and the degree to which other socio-economic factors independently drive both ancestry patterns and biomedical traits.

In our study, we draw specific attention to the biases that persist when European GWAS summary statistics are used to calculate polygenic risk scores in admixed populations such as Mexican Americans, who carry European, Amerindigenous, and African genetic ancestries. In particular, for height we found that the polygenic height score (PHS) was associated with observed height only in the two quartiles with the lowest levels of Amerindigenous ancestry (i.e. the individuals with the highest European ancestry). As the population dynamics of the US continue to change, it is imperative that we study diverse populations, or we risk exacerbating existing health disparities. To date, population-based medical genomics research (and its subsequent benefits) has been disproportionately focused on populations of European ancestry. To improve the design and implementation of medical genetics studies for the ethnically diverse US population, we need detailed insights into the population history of diverse US populations. This includes characterizing the admixture dynamics of Hispanic/Latino populations, as well as the evolutionary forces that shaped patterns of genetic variation in the ancestral populations that contributed to modern-day Hispanic/Latino populations.

The genetic variation of the Hispanic community in the United States belies categorization under a single label (Conomos et al., 2016). The events that have shaped and continue to shape this genetic diversity are complex, numerous, and nuanced, and the social history of such a diverse population is intrinsic to any genetic study. Mexico’s society was largely defined by an established social caste system based on ancestry, which disappeared after Mexico’s independence in 1821 (Lisker et al., 1990). Even so, social inequalities persist today, with skin color having a significant effect on wealth and education (Martinez, 2017). A multitude of factors within and outside Mexico (whether related to trade, immigration policies, or armed conflicts) acted to influence who immigrated to the United States, and the impact of each of these fluctuates over time (Contreras, 2014; Verea and Verea, 2014; Fernández-Kelly and Massey, 2007). These changes shift the demographics of immigration, which is inherently related to the genetic ancestry of the population.

These shifts in turn shape the genetic architecture of complex traits. Diverse populations are at risk not only because of underrepresentation in research, but also because of a poor understanding of the temporal and spatial dynamics of their genetic variation. The promise of equitable precision medicine, one of the ultimate goals of medical genomics, cannot be kept without understanding this interplay. Health disparities in the United States are fed by structural inequalities. For example, an algorithm widely used to guide health care decisions has been shown to amplify existing disparities between Black Americans and White Americans (Obermeyer et al., 2019). Such biases, whether arising from algorithms, study designs, or misunderstandings of subtleties in data, feed into the larger systemic pressures faced by minority populations in the United States.

While we have shown a dramatic shift in ancestry proportions in US Hispanics/Latinos, one caveat of this study is that the HCHS/SOL cohort is not representative of all US Hispanics/Latinos. HCHS/SOL participants were recruited at four centers: Bronx, Chicago, Miami, and San Diego. There may be additional genetic diversity not captured by this dataset, and trends observed here may not translate to Hispanic/Latino populations living in other regions of the US (although the temporal increase in Amerindigenous ancestry was replicated in an independent sample of Mexican Americans). Further, our reference panel includes only limited numbers of individuals with various Amerindigenous, European, and African ancestries. With better population genetic modeling and a deeper understanding of the social and historical aspects of Hispanic/Latino populations, we will be able to improve our understanding of genetic and phenotypic diversity across these populations, and in turn our ability to understand genetic contributions to complex traits and disease. These insights will help optimize population sampling in the design of future medical genetic studies, aid the identification of disease risk variants, and ultimately advance precision medicine for all.

Materials and methods

Study dataset and initial quality control

The HCHS/SOL study is a community-based cohort study of self-identified Hispanic/Latino individuals from four US metropolitan areas, with the general goal of identifying risk and protective factors for medical conditions including cardiovascular disease, diabetes, pulmonary disease, and sleep disorders (Sorlie et al., 2010). The sample survey design for HCHS/SOL has been described previously (LaVange et al., 2010). Briefly, census block groups were selected in defined communities near each of the four recruitment centers, and households were sampled within census block groups. Households with Hispanic/Latino surnames, as well as residents over 45 years old, were oversampled in order to increase representation of the Hispanic/Latino target population and achieve a uniform age distribution. Sampling weights were calculated for each individual to reflect the probability of sampling (Conomos et al., 2016). In total, 12,434 participants with estimated birth years between 1934 and 1993 who self-identified as being of Cuban, Dominican, Puerto Rican, Mexican, Central American, or South American background consented to genetic studies and to posting of their genetic and phenotype data on the publicly available Database of Genotypes and Phenotypes (dbGaP) through Study Accession phs000810.v1.p1. Samples were genotyped on a custom Illumina array, SoL HCHS Custom 15041502 (annotation B3, genome build 37), consisting of the Illumina Omni 2.5M array and 148,353 custom single nucleotide polymorphisms (SNPs) (Conomos et al., 2016). Data posted to dbGaP had passed initial sample quality control filters, including removal of samples with discordant reported vs. genetic sex, call rates below 95%, or evidence of sample contamination (e.g. from heterozygosity and sample call rates).
For initial SNP quality control, we filtered out SNPs that were monomorphic, positional duplicates, or Illumina technical failures, as well as SNPs with cluster separation ≤ 0.3, missing call rate ≥ 2%, > 2 discordant calls in 291 duplicate samples, > 3 Mendelian errors in parent-offspring pairs/trios, Hardy-Weinberg equilibrium combined p-value < 10⁻⁵, or a sex difference in allele frequency ≥ 0.2. This filtering resulted in 1,763,935 genotyped SNPs with minor allele frequency (MAF) > 0.01.
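
The SNP filters above amount to a set of per-SNP threshold checks. A minimal sketch, assuming the per-SNP metrics have already been computed; the `SnpMetrics` record and `passes_qc` function are hypothetical, the call-rate filter is read here as removing SNPs with a missing-call rate of 2% or more (our interpretation), and the MAF > 0.01 cut is applied in the same pass for brevity.

```python
from dataclasses import dataclass


@dataclass
class SnpMetrics:
    snp_id: str
    cluster_sep: float      # Illumina cluster separation score
    missing_rate: float     # fraction of samples without a call
    dup_discordant: int     # discordant calls across duplicate samples
    mendel_errors: int      # Mendelian errors in parent-offspring pairs/trios
    hwe_p: float            # Hardy-Weinberg equilibrium p-value
    sex_af_diff: float      # |allele frequency in males - allele frequency in females|
    maf: float              # minor allele frequency
    flagged: bool = False   # monomorphic, positional duplicate, or technical failure


def passes_qc(s: SnpMetrics) -> bool:
    """True if the SNP survives every filter listed in the text."""
    return (
        not s.flagged
        and s.cluster_sep > 0.3
        and s.missing_rate < 0.02   # assumption: missing-call rate >= 2% is removed
        and s.dup_discordant <= 2
        and s.mendel_errors <= 3
        and s.hwe_p >= 1e-5
        and s.sex_af_diff < 0.2
        and s.maf > 0.01
    )


# Hypothetical SNPs: one clean, one failing HWE, one flagged as a technical failure
snps = [
    SnpMetrics("rs1", 0.5, 0.01, 0, 0, 0.3, 0.05, 0.2),
    SnpMetrics("rs2", 0.5, 0.01, 0, 0, 1e-6, 0.05, 0.2),
    SnpMetrics("rs3", 0.5, 0.01, 0, 0, 0.3, 0.05, 0.2, flagged=True),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
```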

Additional sample quality control in the HCHS/SOL dataset included filtering out (1) samples with large chromosomal anomalies, (2) samples with substantial Asian ancestry, as previously identified in HCHS/SOL (Conomos et al., 2016), and (3) individuals related up to the third degree to another individual in the dataset, as inferred by REAP (Thornton et al., 2012). For relatedness filtering, the individual retained from each related pair was chosen to maximize representation of the birth year distribution, leaving 10,268 unrelated individuals.

In the original HCHS/SOL analysis, individuals were classified into genetic-analysis groups, which resemble self-identified background groups in that their members share cultural and environmental characteristics, but which are also more genetically homogeneous (Conomos et al., 2016).

Birth year for each individual was estimated by subtracting age from the date of the first clinic visit for the baseline examination (Sorlie et al., 2010). Year of arrival in the US was estimated by subtracting the number of years lived in the US from the date of the first clinic visit.

Global, local, and parental ancestry inference

All ancestry analyses were restricted to the 211,152 autosomal SNP markers that overlapped between the study and reference panel genotyping arrays. For the HCHS/SOL dataset, global African, European, and Amerindigenous ancestries were inferred with ADMIXTURE (Alexander et al., 2009) in unsupervised mode with K = 3. Amerindigenous ancestry refers to estimates of Indigenous genetic ancestry from the Americas. For some analyses, HCHS/SOL individuals with more than 95% of a single ancestry (i.e. African, European, or Amerindigenous) were filtered out, resulting in 9913 individuals: 1,099 Central American, 1,536 Cuban, 954 Dominican, 3,622 Mexican, 1,783 Puerto Rican, 652 South American, and 267 ‘Other’ individuals.
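
ADMIXTURE writes its global ancestry estimates to a `.Q` file with one line per individual and K whitespace-separated proportions. The sketch below, assuming that layout, drops individuals whose largest component exceeds 0.95, mirroring the filter described above; the helper names and toy values are hypothetical. In unsupervised mode the K columns are unlabeled, so matching them to African, European, and Amerindigenous ancestry (for example, by inspecting reference individuals) is a separate step not shown here.

```python
def load_q(text):
    """Parse ADMIXTURE .Q output: one row per individual, K whitespace-separated proportions."""
    return [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]


def keep_admixed(q_rows, threshold=0.95):
    """Indices of individuals whose largest ancestry component does not exceed `threshold`."""
    return [i for i, row in enumerate(q_rows) if max(row) <= threshold]


# Hypothetical K = 3 output for four individuals
q_text = """0.40 0.55 0.05
0.97 0.02 0.01
0.30 0.30 0.40
0.01 0.01 0.98
"""
kept = keep_admixed(load_q(q_text))
```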

Ancestral tracts along the genome, known as ‘local’ ancestry, were inferred for all HCHS/SOL individuals using RFMix (Maples et al., 2013) and a three-population reference panel comprising 315 individuals: 104 HapMap phase 3 CEU (European) and 107 YRI (African) individuals (Altshuler et al., 2010) and 112 Amerindigenous individuals from throughout Latin America (Reich et al., 2012). The reference panel was limited to individuals with at least 99% continental ancestry as inferred by unsupervised ADMIXTURE (Alexander et al., 2009). Prior to local ancestry inference, HCHS/SOL individuals were merged with the reference panels and phased using SHAPEIT2 (Delaneau et al., 2013). For all HCHS/SOL Mexican American individuals, parental genomic ancestry was inferred with ANCESTOR (Zou et al., 2015) from the RFMix local ancestry estimates.

Bootstrap analyses (Figure 2B and Figure 2—figure supplement 3) were performed by calculating relevant statistics based on repeated resampling of individuals with replacement. Bootstrap resampling results in an estimate of the variance of the statistics that we are calculating in our data, and allows us to assess the impact of outliers (who are only resampled in a subset of iterations).
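
A nonparametric bootstrap of this kind can be sketched in a few lines. The example below, a minimal illustration on simulated data rather than the study's analysis, resamples individuals with replacement and recomputes an ordinary least-squares slope of ancestry on birth year in each replicate; the spread of the replicate slopes estimates the sampling variability. The functions `bootstrap` and `slope`, the number of replicates, and the simulated values are assumptions.

```python
import numpy as np


def bootstrap(stat, data, n_boot=1000, seed=0):
    """Resample rows (individuals) with replacement and return the statistic for each replicate."""
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        reps[b] = stat(data[idx])
    return reps


def slope(d):
    """OLS slope of column 1 (ancestry proportion) on column 0 (birth year)."""
    return np.polyfit(d[:, 0], d[:, 1], 1)[0]


# Simulated data (hypothetical): 300 individuals, true slope 0.002 per year
rng = np.random.default_rng(42)
birth_year = rng.uniform(1940, 1990, 300)
ancestry = 0.40 + 0.002 * (birth_year - 1940) + rng.normal(0, 0.05, 300)
data = np.column_stack([birth_year, ancestry])

reps = bootstrap(slope, data, n_boot=500)
se = reps.std(ddof=1)                 # bootstrap standard error
ci = np.percentile(reps, [2.5, 97.5])  # percentile interval
```

Because an outlier appears in only some replicates, a replicate distribution that is much wider than the analytic standard error is one sign that a few individuals drive the estimate.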

Uniform manifold approximation and projection (UMAP)

Principal components for HCHS/SOL and the reference panel were computed using smartPCA (Patterson et al., 2006). UMAP (version 0.3.8) was run using the freely available Python script at https://github.com/diazale/gt-dimred (Diaz-Papkovich, 2019), with 15 nearest neighbors and a minimum distance between points of 0.5.
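
For readers unfamiliar with genotype PCA, the sketch below shows the idea behind tools such as smartPCA: each SNP is centred by twice its allele frequency and scaled by its binomial standard deviation, sqrt(2p(1 − p)), before a singular value decomposition. This is a simplified illustration, not smartPCA itself (which, among other differences, uses a slightly different allele-frequency estimate and scaling and offers outlier removal); `genotype_pcs` and the random genotypes are hypothetical.

```python
import numpy as np


def genotype_pcs(G, n_pcs=10):
    """Top principal components of a genotype matrix (individuals x SNPs, coded 0/1/2).

    Each SNP is centred by 2p and scaled by sqrt(2p(1 - p)), where p is its allele
    frequency; monomorphic SNPs are dropped to avoid division by zero.
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    X = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    # Thin SVD: U scaled by the singular values gives each individual's PC coordinates
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]


rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(50, 200))  # hypothetical genotypes
pcs = genotype_pcs(G)
```

Assuming, as the paragraph's order suggests, that UMAP takes these PCs as input, the umap-learn package exposes the same two parameters, e.g. `umap.UMAP(n_neighbors=15, min_dist=0.5).fit_transform(pcs)`; whether this reproduces the gt-dimred script exactly has not been checked.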

For further analyses of HCHS/SOL population structure, we assembled a larger reference panel comprising additional European and African populations from the Human Genome Diversity Project (HGDP) (Rosenberg et al., 2002; Reich et al., 2012) and the 1000 Genomes Project (Auton et al., 2015). For the European reference panel, 24 Basque, 28 French, 12 Italian, 25 Russian, and 28 Sardinian individuals from HGDP and 90 GBR, 107 IBS, 99 FIN, and 107 TSI individuals from 1000 Genomes were included along with the original 104 CEU individuals. For the African reference panel, 9 BantuKenya, 8 BantuSouthAfrica, 22 Mandenka, and 26 Mozabite individuals from HGDP and 99 ESN, 113 GWD, 97 LWK, and 82 MSL individuals from 1000 Genomes were included along with the original 107 YRI individuals. Combined with the 112 Amerindigenous and 10,268 HCHS/SOL individuals, the expanded analyses comprised 11,567 individuals in total.

Admixture mapping

Local ancestry estimates at 211,151 SNPs across the genome were used to perform admixture mapping in HCHS/SOL Mexican Americans, to determine whether younger individuals harbored excess Amerindigenous ancestry in particular regions of the genome. Admixture mapping was performed with two models: (1) a linear regression model with age as the dependent variable, adjusting for global Amerindigenous ancestry, sampling weight, and center; and (2) a logistic regression model dividing the HCHS/SOL Mexican cohort into older and younger generations at 1965, again adjusting for global Amerindigenous ancestry, sampling weight, and center. The genome-wide significance threshold of 1.38 × 10⁻⁴ was calculated using the empirical autoregression framework, with the R package coda used to estimate the total number of ancestral blocks (Sobota et al., 2015; Plummer et al., 2012).
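
The significance threshold depends on how many effectively independent tests the autocorrelated local-ancestry series represent. coda's `effectiveSize()` estimates this by fitting an autoregressive model and using the spectral density at frequency zero. The sketch below simplifies that to an AR(1) fit per chromosome, for which the effective size reduces to n(1 − φ)/(1 + φ), and then divides a family-wise α by the total. It is an illustration under these assumptions, not the authors' code; the function names and simulated series are hypothetical. If α = 0.05 was used (the text does not say), the reported threshold of 1.38 × 10⁻⁴ corresponds to roughly 0.05 / 1.38 × 10⁻⁴ ≈ 362 effective tests.

```python
import numpy as np


def ar1_effective_tests(x):
    """Effective number of independent tests for an autocorrelated series under an AR(1) model.

    Effective size = n * Var(x) / S(0), with S(0) the spectral density at frequency zero
    (normalised so that Var(mean) ~ S(0) / n); for AR(1) this is n * (1 - phi) / (1 + phi).
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    phi = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)  # lag-1 autocorrelation as the AR(1) estimate
    return len(x) * (1 - phi) / (1 + phi)


def genome_wide_threshold(series_by_chrom, alpha=0.05):
    """Bonferroni-style threshold: alpha divided by the summed effective tests across chromosomes."""
    n_eff = sum(ar1_effective_tests(s) for s in series_by_chrom)
    return alpha / n_eff, n_eff


# Hypothetical, strongly autocorrelated series standing in for local ancestry on two chromosomes
rng = np.random.default_rng(0)
series = []
for n_sites in (5000, 3000):
    s = np.empty(n_sites)
    s[0] = rng.normal()
    for i in range(1, n_sites):
        s[i] = 0.99 * s[i - 1] + rng.normal()
    series.append(s)
threshold, n_eff = genome_wide_threshold(series)
```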

Tract lengths

The multiple regression model $\log(f) = \beta_0 + \beta_1 T + \beta_2 A + \beta_3 TA + \varepsilon$, where $f$ is a matrix containing the proportions of lengths of all ancestral tracts across the genome for all 3622 Mexican American individuals, $T$ is the tract length bin, and $A$ is the birth-decade bin, was used to test for an effect of birth decade on the proportion of Amerindigenous ancestral tract lengths. To assess the relationship between the fraction of an individual’s genome in long ancestry tracts and birth year, long-tract cutoffs were chosen based on the separation of tract-length distributions between birth decades in Figure 3B.

Diversity calculations

Subcontinental ancestry was assessed using the diversity measurements π and FST. π was calculated as the average number of pairwise genetic differences among all pairs of overlapping Amerindigenous ancestry tracts across individuals. FST was calculated as:

$F_{ST} = (H_T - H_S)/H_T$, where $H_T$ is the average heterozygosity when all individuals are pooled across decades and $H_S$ is the average heterozygosity within each decade of individuals.
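
Both diversity measures can be written compactly. The sketch below assumes biallelic SNPs, haplotypes already restricted to sites inside overlapping Amerindigenous tracts (for π), and per-decade allele frequencies with expected heterozygosity 2p(1 − p) averaged over SNPs (for F_ST). Weighting decades by sample size when pooling, and taking a ratio of SNP-averaged heterozygosities, are our assumptions; the function names and frequencies are hypothetical.

```python
from itertools import combinations

import numpy as np


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences among haplotypes (rows of 0/1 alleles at shared sites)."""
    H = np.asarray(haplotypes)
    pairs = list(combinations(range(len(H)), 2))
    return sum(int(np.sum(H[i] != H[j])) for i, j in pairs) / len(pairs)


def fst_from_freqs(freqs, sizes):
    """Heterozygosity-based F_ST = (H_T - H_S) / H_T.

    freqs: (n_decades, n_snps) allele frequencies per birth decade
    sizes: (n_decades,) individuals per decade, used as pooling weights
    """
    P = np.asarray(freqs, dtype=float)
    w = np.asarray(sizes, dtype=float)
    w = w / w.sum()
    p_pooled = w @ P                              # frequency with all decades pooled
    h_t = np.mean(2 * p_pooled * (1 - p_pooled))  # pooled heterozygosity, averaged over SNPs
    h_s = np.mean(w @ (2 * P * (1 - P)))          # weighted within-decade heterozygosity
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies at three SNPs in three birth decades
fst = fst_from_freqs([[0.30, 0.50, 0.10],
                      [0.32, 0.48, 0.12],
                      [0.35, 0.47, 0.11]], sizes=[400, 900, 1200])
```

With consistent weights, H_T can never be smaller than H_S (heterozygosity is concave in p), so this version of F_ST is never negative.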

Inference of runs of homozygosity

ROH were called in the Mexican American individuals with GARLIC v1.1.4 (Szpiech et al., 2017) at 211,152 sites. An analysis window size of 50 SNPs and an overlap fraction of 0.25 were chosen using GARLIC’s rule-of-thumb parameter estimation, and GARLIC selected a LOD score cutoff of 0. Using a three-component Gaussian mixture, GARLIC determined the class A/B (short/medium) and class B/C (medium/long) size boundaries as 845,097 bp and 2,501,750 bp, respectively.
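
Once ROH are called, applying the reported class boundaries is a simple lookup. The sketch below sorts ROH into classes A, B, and C using the two GARLIC boundaries reported above and totals the length per class for one individual; the function names and example lengths are hypothetical, and assigning a length that falls exactly on a boundary to the longer class is an arbitrary choice.

```python
from bisect import bisect_right

# Class boundaries (bp) that GARLIC inferred for this dataset, as reported in the text
ROH_BOUNDARIES = [845_097, 2_501_750]


def roh_class(length_bp):
    """Return 'A' (short), 'B' (medium), or 'C' (long); a length equal to a boundary goes to the longer class."""
    return "ABC"[bisect_right(ROH_BOUNDARIES, length_bp)]


def total_length_by_class(lengths_bp):
    """Sum ROH lengths per size class for one individual."""
    totals = {"A": 0, "B": 0, "C": 0}
    for length in lengths_bp:
        totals[roh_class(length)] += length
    return totals


# Hypothetical ROH lengths (bp) for one individual
totals = total_length_by_class([300_000, 900_000, 1_500_000, 4_000_000])
```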

Simulating ancestry proportions over time

Our Moran model (Moran, 1958) simulator includes exponential population growth, migration (with adjustable migration levels and ancestry patterns), ancestry-based assortative mating, and ancestry-based variability in fecundity (see https://github.com/mlspear09/hchs-sol; Spear, 2020). Our simulations are modeled after the data shown in Figure 2. Previous estimates of human generation time are ~26–30 years per generation (Moorjani et al., 2016), so the data analyzed span ~2 human generations. We therefore begin each simulation with a random sample of individual ancestry proportions with mean 0.42 (the initial ancestry proportion in the population) and simulate two generations (corresponding to 2N steps in our simulator). In all simulations, we start with N = 1000. Because the goal is to model population-level parameters (such as the average ancestry proportion in the population), the results are less sensitive to the actual population size used.
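
To make the Moran-model logic concrete, the sketch below is a deliberately stripped-down toy, not the authors' simulator (which is in the repository linked above). Each step, one individual dies and is replaced either by a migrant or by the offspring of two parents, with the second parent chosen assortatively by ancestry; offspring ancestry is taken as the parental mean. Population growth and ancestry-dependent fecundity are omitted, and the function name, the mate-choice rule (closest of `assort_k` random candidates), and all parameter values are hypothetical. It illustrates the point made in the Discussion: assortative mating alone leaves mean ancestry roughly unchanged, whereas migrants with higher ancestry shift it.

```python
import numpy as np


def simulate_moran(n=1000, mean0=0.42, sd0=0.1, generations=2, assort_k=10,
                   migration_rate=0.0, migrant_mean=0.42, migrant_sd=0.1, seed=0):
    """Toy Moran-model simulation of individual global ancestry proportions.

    Each step one random individual dies. With probability `migration_rate` it is
    replaced by a migrant; otherwise by the offspring of two parents, where parent 2
    is the candidate closest in ancestry to parent 1 among `assort_k` random draws
    (ancestry-based assortative mating). Offspring ancestry is the parental mean.
    One generation corresponds to n steps, so two generations are 2n steps.
    """
    rng = np.random.default_rng(seed)
    anc = np.clip(rng.normal(mean0, sd0, n), 0.0, 1.0)
    for _ in range(generations * n):
        dead = rng.integers(n)
        if rng.random() < migration_rate:
            anc[dead] = np.clip(rng.normal(migrant_mean, migrant_sd), 0.0, 1.0)
            continue
        p1 = rng.integers(n)
        candidates = rng.integers(0, n, size=assort_k)
        dist = np.abs(anc[candidates] - anc[p1])
        dist[candidates == p1] = np.inf  # avoid choosing parent 1 as its own mate
        p2 = candidates[np.argmin(dist)]
        anc[dead] = 0.5 * (anc[p1] + anc[p2])
    return anc


baseline = simulate_moran(seed=1)
with_migrants = simulate_moran(seed=1, migration_rate=0.2, migrant_mean=0.8)
```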

Imputation

Imputation for HCHS/SOL was performed locally using IMPUTE2 (Howie et al., 2009) with the 1000 Genomes Project Phase 3 haplotypes (Auton et al., 2015) as the reference panel. After filtering on an info score cutoff of 0.3, 33,041,084 SNPs remained.

Analyzing biomedical traits

We analyzed a total of 69 biomedical traits contained in the HCHS/SOL phenotypic dataset. We used a multiple linear regression model to estimate the effect of global AI ancestry on each trait while controlling for birth year (a proxy for age), center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. In Figure 4A, we show the effect size (β) for AI ancestry after quantile normalizing each trait (Bolstad et al., 2003; Qiu et al., 2013). Quantile normalization maps each phenotype, by rank, onto a standard normal distribution. While quantile normalization is a common way to make data conform to the normality assumption of linear regression (and yields effect sizes that are readily comparable across traits), it can result in a modest reduction in statistical power compared to untransformed data (Qiu et al., 2013). We find that the p-values for the AI effect sizes are highly correlated between untransformed and quantile-normalized phenotypes (Spearman ρ=0.943; p<2.2e-16) (Figure 4—figure supplement 4), with no statistical evidence for a difference in their distributions (Mann-Whitney U test p=0.912).
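
The transformation described above, mapping each trait by rank onto standard normal quantiles, is often called a rank-based inverse normal transformation. A minimal sketch using only the Python standard library follows; it gives tied values their average rank and uses the offset (rank − 0.5)/n by default. The text does not specify the offset or tie handling, so both are assumptions, and `rank_inverse_normal` is a hypothetical helper.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=0.5):
    """Map values to standard-normal quantiles by rank: Phi^-1((rank - c) / (n - 2c + 1)).

    Ties receive the average of their 1-based ranks; c = 0.5 gives (rank - 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


transformed = rank_inverse_normal([172.0, 158.5, 165.2, 180.1, 165.2])  # hypothetical heights (cm)
```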

Polygenic risk score calculations

Polygenic risk scores for height were calculated using the publicly available UK Biobank (UKBB) GWAS Round 2 Summary Statistics retrieved from http://www.nealelab.is/uk-biobank. Briefly, for sample quality control, inclusion was limited to unrelated samples who passed the sex chromosome aneuploidy filter. British ancestry was determined using the first six PCs; individuals more than seven standard deviations away on the first six PCs were excluded. Further filtering limited the sample to self-reported 'white-British', 'Irish', or 'White' individuals, resulting in a QCed sample of 361,194 individuals, as described at https://github.com/Nealelab/UK_Biobank_GWAS#imputed-v3-sample-qc (Neale Lab, 2018). An imputation panel of ~90 million SNPs from HRC, UK10K, and 1KG was used to impute genotypes. In total, 13.7 million autosomal and X-chromosome SNPs passed quality control thresholds, including info score > 0.8, MAF > 0.0001, and HWE p-value > 1e-10. For the phenotype, a linear regression model was run in Hail for all individuals (both sexes), adjusting for the first 20 PCs, sex, age, age², sex × age, and (sex × age)². For height, complete phenotype information was available for 360,388 individuals.

Risk scores were calculated by extracting the genome-wide significant hits discovered in the UKBB GWAS of height that overlapped our dataset, and selecting the SNP with the lowest p-value in each 1 Mb window across the genome. Before extraction, there were 227,794 genome-wide significant SNPs in the UKBB GWAS of height. This procedure yielded 1078 SNPs, present in our dataset of genotyped and imputed SNPs, for the PRS calculation.
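
The SNP selection step can be sketched as keeping, within each window, the significant SNP with the smallest p-value. The version below assumes fixed, non-overlapping 1 Mb bins per chromosome and restricts to SNPs present in the target data before choosing the window minimum; the text does not say whether windows were fixed or sliding, or which step came first. The function name and toy records are hypothetical.

```python
def top_snp_per_window(hits, available, window_bp=1_000_000):
    """Keep the significant SNP with the lowest p-value in each window.

    hits: iterable of (snp_id, chrom, pos, p_value) tuples from the GWAS summary statistics
    available: set of snp_ids present in the target (genotyped + imputed) dataset
    Windows are fixed, non-overlapping bins of `window_bp` on each chromosome (an assumption).
    """
    best = {}
    for snp_id, chrom, pos, p in hits:
        if snp_id not in available:
            continue  # only SNPs that overlap the target data are eligible
        key = (chrom, pos // window_bp)
        if key not in best or p < best[key][3]:
            best[key] = (snp_id, chrom, pos, p)
    return sorted(best.values(), key=lambda h: (h[1], h[2]))


# Hypothetical genome-wide significant hits
hits = [
    ("rs1", 1, 100, 1e-10),
    ("rs2", 1, 500_000, 1e-12),
    ("rs3", 1, 1_200_000, 1e-9),
    ("rs4", 2, 100, 1e-20),
]
selected = top_snp_per_window(hits, available={"rs1", "rs2", "rs3"})
```

Fixed bins are simpler than LD-based clumping but can keep two correlated SNPs that sit on either side of a bin edge; that trade-off is one reason clumping tools use distance and LD thresholds instead.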

Health and Retirement Study (HRS)

For replication, we used genotype data from 705 self-identified Mexican Americans in the Health and Retirement Study (HRS) (Fisher and Ryan, 2018), genotyped on the Illumina Human Omni 2.5M platform. HRS data were made available under IRB Study No. A11-E91-13B (The apportionment of genetic diversity within the United States). Global ancestry proportions for the Mexican American population in the HRS were estimated as in Baharian et al. (2016), who used an alternative reference panel and ancestry inference approach. Briefly, RFMix was used to infer local ancestry across the genome, with CHS, YRI, and CEU individuals from the 1000 Genomes Project as reference populations for Amerindigenous/Asian, African, and European ancestries, respectively. Global ancestry was then estimated by summing the RFMix calls.

Statistical analyses and plots

Statistical analyses and plot generation were performed in RStudio (version 1.1.463) with R version 3.5.3. The ternary and ggridges/ggplot2 packages were used to create the simplex and ridgeline plots, respectively.

For each of the HCHS/SOL populations, we evaluated differences in global ancestry estimates over time while accounting for the sampling method used in the design of the HCHS/SOL study (referred to as ‘sampling weight’; see Study dataset and initial quality control).

To test for changes in each ancestry over time in each HCHS/SOL population, we fit the linear regression model $\text{Ancestry} = \beta_0 + \beta_1\,\mathrm{BY} + \beta_2\,\mathrm{SW} + \varepsilon$, where BY is birth year and SW is log(sampling weight). Within the Mexican Americans, we also ran this model stratified by gender, by recruitment center (Chicago and San Diego), by birthplace (US vs. outside the US), and by educational attainment. Stratification by recruitment center was limited to Chicago and San Diego because sample sizes for the Bronx and Miami were small (124 and 25 individuals, respectively). Educational attainment was categorized as less than a high school diploma or equivalent (<HS), a high school diploma or equivalent (=HS), or post-secondary education (>HS).
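
The ancestry trend model can be fit by ordinary least squares. The sketch below, on simulated data, enters log(sampling weight) as a covariate exactly as in the model above (rather than using the weights for weighted regression) and returns coefficients with conventional standard errors; `fit_ancestry_trend` and the simulated values are hypothetical.

```python
import numpy as np


def fit_ancestry_trend(ancestry, birth_year, sampling_weight):
    """OLS fit of Ancestry = b0 + b1*BY + b2*log(SW) + e; returns coefficients and standard errors."""
    y = np.asarray(ancestry, dtype=float)
    X = np.column_stack([
        np.ones_like(y),
        np.asarray(birth_year, dtype=float),
        np.log(np.asarray(sampling_weight, dtype=float)),
    ])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # classical OLS covariance of the coefficients
    return beta, np.sqrt(np.diag(cov))


# Simulated data (hypothetical): true birth-year slope 0.002 per year
rng = np.random.default_rng(7)
n = 2000
by = rng.uniform(1934, 1993, n)
sw = rng.lognormal(0.0, 0.5, n)
anc = -3.5 + 0.002 * by + 0.01 * np.log(sw) + rng.normal(0, 0.05, n)
beta, se = fit_ancestry_trend(anc, by, sw)
```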

To test for differences in mean Amerindigenous ancestry between groups, we ran t-tests, splitting and comparing the data by gender, recruitment center, birthplace (US vs. outside the US), and educational attainment level.

For the height and polygenic height score analyses, 3604 Mexican Americans with complete information for height, gender, recruitment center, sampling weight, educational attainment, birthplace (US vs. outside the US), and number of US-born parents were included.

Data availability

All data used in this manuscript were downloaded from publicly available sources (dbGaP). No new data were created.

The following previously published data sets were used
    1. Conomos MP
    2. Laurie CA
    3. Stilp AM
    4. Gogarten SM
    5. McHugh CP
    6. Nelson SC
    (2016) phs000810.v1.p1
    ID phs000810.v1.p1. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos.
    1. Fisher GG
    2. Ryan LH
    (2018) A11-E91-13B
    ID A11-E91-13B. Overview of the Health and Retirement Study and Introduction to the Special Issue.

References

    1. Colby SL
    2. Ortman JM
    (2015)
    Projections of the size and composition of the U.S. population: 2014 to 2060
    U.S. Census Bureau, U.S. Department of Commerce.
    1. Fernández-Kelly P
    2. Massey DS
    (2007) Borders for whom? The role of NAFTA in Mexico-U.S. migration
    The ANNALS of the American Academy of Political and Social Science 610:98–118.
    https://doi.org/10.1177/0002716206297449
    1. Lisker R
    2. Ramirez E
    3. Briceno RP
    4. Granados J
    5. Babinsky V
    (1990)
    Gene frequencies and admixture estimates in four Mexican urban centers
    Human Biology 62:791–801.
    1. Moran PAP
    (1958) Random processes in genetics
    Mathematical Proceedings of the Cambridge Philosophical Society 54:60–71.
    https://doi.org/10.1017/S0305004100033193
    1. Plummer M
    2. Best N
    3. Cowles K
    4. Vines K
    (2012)
    CODA: convergence diagnosis and output analysis for MCMC
    R News 6:7–11.
    1. Schumacher FR
    2. Al Olama AA
    3. Berndt SI
    4. Benlloch S
    5. Ahmed M
    6. Saunders EJ
    7. Dadaev T
    8. Leongamornlert D
    9. Anokian E
    10. Cieza-Borrella C
    11. Goh C
    12. Brook MN
    13. Sheng X
    14. Fachal L
    15. Dennis J
    16. Tyrer J
    17. Muir K
    18. Lophatananon A
    19. Stevens VL
    20. Gapstur SM
    21. Carter BD
    22. Tangen CM
    23. Goodman PJ
    24. Thompson IM
    25. Batra J
    26. Chambers S
    27. Moya L
    28. Clements J
    29. Horvath L
    30. Tilley W
    31. Risbridger GP
    32. Gronberg H
    33. Aly M
    34. Nordström T
    35. Pharoah P
    36. Pashayan N
    37. Schleutker J
    38. Tammela TLJ
    39. Sipeky C
    40. Auvinen A
    41. Albanes D
    42. Weinstein S
    43. Wolk A
    44. Håkansson N
    45. West CML
    46. Dunning AM
    47. Burnet N
    48. Mucci LA
    49. Giovannucci E
    50. Andriole GL
    51. Cussenot O
    52. Cancel-Tassin G
    53. Koutros S
    54. Beane Freeman LE
    55. Sorensen KD
    56. Orntoft TF
    57. Borre M
    58. Maehle L
    59. Grindedal EM
    60. Neal DE
    61. Donovan JL
    62. Hamdy FC
    63. Martin RM
    64. Travis RC
    65. Key TJ
    66. Hamilton RJ
    67. Fleshner NE
    68. Finelli A
    69. Ingles SA
    70. Stern MC
    71. Rosenstein BS
    72. Kerns SL
    73. Ostrer H
    74. Lu YJ
    75. Zhang HW
    76. Feng N
    77. Mao X
    78. Guo X
    79. Wang G
    80. Sun Z
    81. Giles GG
    82. Southey MC
    83. MacInnis RJ
    84. FitzGerald LM
    85. Kibel AS
    86. Drake BF
    87. Vega A
    88. Gómez-Caamaño A
    89. Szulkin R
    90. Eklund M
    91. Kogevinas M
    92. Llorca J
    93. Castaño-Vinyals G
    94. Penney KL
    95. Stampfer M
    96. Park JY
    97. Sellers TA
    98. Lin HY
    99. Stanford JL
    100. Cybulski C
    101. Wokolorczyk D
    102. Lubinski J
    103. Ostrander EA
    104. Geybels MS
    105. Nordestgaard BG
    106. Nielsen SF
    107. Weischer M
    108. Bisbjerg R
    109. Røder MA
    110. Iversen P
    111. Brenner H
    112. Cuk K
    113. Holleczek B
    114. Maier C
    115. Luedeke M
    116. Schnoeller T
    117. Kim J
    118. Logothetis CJ
    119. John EM
    120. Teixeira MR
    121. Paulo P
    122. Cardoso M
    123. Neuhausen SL
    124. Steele L
    125. Ding YC
    126. De Ruyck K
    127. De Meerleer G
    128. Ost P
    129. Razack A
    130. Lim J
    131. Teo SH
    132. Lin DW
    133. Newcomb LF
    134. Lessel D
    135. Gamulin M
    136. Kulis T
    137. Kaneva R
    138. Usmani N
    139. Singhal S
    140. Slavov C
    141. Mitev V
    142. Parliament M
    143. Claessens F
    144. Joniau S
    145. Van den Broeck T
    146. Larkin S
    147. Townsend PA
    148. Aukim-Hastie C
    149. Gago-Dominguez M
    150. Castelao JE
    151. Martinez ME
    152. Roobol MJ
    153. Jenster G
    154. van Schaik RHN
    155. Menegaux F
    156. Truong T
    157. Koudou YA
    158. Xu J
    159. Khaw KT
    160. Cannon-Albright L
    161. Pandha H
    162. Michael A
    163. Thibodeau SN
    164. McDonnell SK
    165. Schaid DJ
    166. Lindstrom S
    167. Turman C
    168. Ma J
    169. Hunter DJ
    170. Riboli E
    171. Siddiq A
    172. Canzian F
    173. Kolonel LN
    174. Le Marchand L
    175. Hoover RN
    176. Machiela MJ
    177. Cui Z
    178. Kraft P
    179. Amos CI
    180. Conti DV
    181. Easton DF
    182. Wiklund F
    183. Chanock SJ
    184. Henderson BE
    185. Kote-Jarai Z
    186. Haiman CA
    187. Eeles RA
    188. Profile Study
    189. Australian Prostate Cancer BioResource (APCB)
    190. IMPACT Study
    191. Canary PASS Investigators
    192. Breast and Prostate Cancer Cohort Consortium (BPC3)
    193. PRACTICAL (Prostate Cancer Association Group to Investigate Cancer-Associated Alterations in the Genome) Consortium
    194. Cancer of the Prostate in Sweden (CAPS)
    195. Prostate Cancer Genome-wide Association Study of Uncommon Susceptibility Loci (PEGASUS)
    196. Genetic Associations and Mechanisms in Oncology (GAME-ON)/Elucidating Loci Involved in Prostate Cancer Susceptibility (ELLIPSE) Consortium
    (2018) Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci
    Nature Genetics 50:928–936.
    https://doi.org/10.1038/s41588-018-0142-8

Decision letter

  1. Mashaal Sohail
    Reviewing Editor; Brigham and Women's Hospital and Harvard Medical School, United States
  2. Patricia J Wittkopp
    Senior Editor; University of Michigan, United States
  3. Mashaal Sohail
    Reviewer; Brigham and Women's Hospital and Harvard Medical School, United States
  4. Genevieve L Wojcik
    Reviewer; Stanford University School of Medicine, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This article studies genetic and complex trait variation in individuals in the United States with origins in Mexico. The authors find a pattern of increasing amerindigenous ancestry with birth year, which they investigate using simulations and attribute to the likely combination of several cultural and historical factors. They find amerindigenous ancestry to be associated with trait variation for several complex traits, and highlight the importance of further work in medical and population genomics across human diversity.

Decision letter after peer review:

Thank you for submitting your article "Recent fluctuations in Mexican American genomes have altered the genetic architecture of biomedical traits" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Mashaal Sohail as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Patricia Wittkopp as the Senior Editor. The following individual involved in review of your submission had agreed to reveal their identity: Genevieve L Wojcik.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The editors have judged that your manuscript is of interest, but, as described below, some conclusions and analyses need to be revised in light of our comments before it is published. We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is “in revision at eLife”. Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary:

This paper examines population structure in a Hispanic/Latino (H/L) cohort, first highlighting genetic diversity and fine-scale population substructure via UMAP. The paper highlights an interesting temporal aspect to population substructure in H/L groups, demonstrating increasing proportions of Amerindigenous ancestry, particularly in Mexican Americans, over time. They also provide other analyses that may help interpret or follow from this observation, involving genetic diversity, assortative mating and runs of homozygosity. They show a correlation between amerindigenous ancestry and complex traits, and show that the behavior of UKB height GWAS polygenic scores in Mexican-Americans depends on the proportion of Amerindigenous ancestry.

Essential revisions:

This paper's major finding is an observation of an increase in Amerindigenous ancestry in Mexican Americans in the 1940-1990 period. The manuscript is interesting and in theory suitable for publication in eLife, but after a number of points are addressed. First, the authors need to articulate clearly the factors that may have caused this primary observation, and what may be the most likely explanations outlined below. Second, they need to address that the primary explanation for the ROH increase is likely the amerindigenous ancestry increase, and in that sense, determine and clearly articulate the place of the assortative mating observation in the manuscript. Lastly, they need to clearly admit in the manuscript the importance of the un-modelled socio-economic variable in the correlation between amerindigenous ancestry and complex traits, and only present this analysis after controlling for appropriate covariates. It is a time when we are finally seeing some population genetic studies of understudied populations as they relate to complex trait variation (to which this manuscript can be an important contribution), and so the bar to be as rigorous as possible in considering alternative explanations and model all possible covariates should be set extremely high.

1) The main finding is the increase in amerindigenous ancestry in the 1940 – 1990 period in Mexican-Americans in the United States.

a) The authors state: "In our non-US born individuals (N=2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (𝛽=-0.0009; P=0.0006; SE=0.0003)." If I understand correctly, this regression is amerindigenous ancestry against time and other covariates. It would be helpful if the authors add to this sentence something along the lines of, "implying that individuals who arrived earlier to the US from Mexico had more European ancestry."

b) Given the above analysis, and independent migration analyses (see, for example, https://www.migrationpolicy.org/article/mexican-immigrants-united-states-2), it seems that migration from Mexico to the US shifted over the years from states with less amerindigenous ancestry to states with higher amerindigenous ancestry (Chiapas, Oaxaca, Veracruz) in the South and South-east of the country. This seems a highly likely explanation for the pattern of increasing amerindigenous ancestry that they see, and should be stated as such in the manuscript. This seems especially likely given that the signal comes primarily from non-US born individuals, or US-born individuals with parents born outside the US.

c) The authors state that "It is possible that the increase in global Amerindigenous ancestry over time could be biased by changes in the specific subcontinental Amerindigenous ancestries over time (though such an effect is not visible in our UMAP analysis, Figure 1B)." – It is not clear what is meant by this sentence – please re-phrase and articulate more clearly. If it alludes to the difference in migration sources over time I mention above, I don't think their analyses of Fst and genetic diversity rule that explanation out.

d) Assortative mating (Figure 2D and Supplementary figure 9). This argument is puzzling because if there is assortative mating along indigenous ancestry as they suggest, then would this not mean that there is also assortative mating along the collinear European ancestry? If this is the case, why would amerindigenous ancestry be increasing in particular? The authors do state that assortative mating would not cause an increase in one ancestry. In that case, the paper overall does not provide an explanation for why the amerindigenous ancestry is actually increasing – is the migration sources explanation the most likely explanation? Along with that individuals with higher amerindigenous ancestry must be reproducing more? The likely explanations of the primary result should be made very clear for the reader.

e) The authors state: "and this increase cannot be entirely explained by very recent migration." What is the evidence backing this claim?

f) "sampling weight" is not actually defined in the Materials and methods. Can the authors clarify how this is defined and used to weight the major analysis?

g) Please describe the bootstrap resampling performed – is the bootstrapping performed over individuals or segments of the genome? Please justify the strategy picked. This should be described in the manuscript.

2) The pattern of ROH change over time observed.

a) What is the number of individuals in each decade? Please show this in the manuscript. If there are more individuals in later birth decades (as may be expected), you would see an increase in the ROH summed over each genome with time, simply because you are summing over more individuals at later time periods. It is not clear if the analyses for Figure 2C are normalized by the number of individuals in each decade – if not, this would be important to do, and only the normalized results should be reported.

b) The ROHsum increasing with time could simply be due to the amerindigenous ancestry increasing with time, as amerindigenous ancestry carries more short ROH segments than European ancestry (see for example Ceballos et al. Nature Reviews Genetics 2018). The authors should explicitly describe this, as this simplest explanation would not require assortative mating to be invoked either.

3) Correlation of amerindigenous ancestry and complex traits

Many of the traits studied would also be affected by socioeconomic status (for example, height, cholesterol). Do the authors have this variable available? If yes, it should be included in the multiple regression. If not, it should be clearly mentioned that they are not able to account for this likely important effect, leaving their estimates confounded by socio-economic differences that likely correlate with amerindigenous ancestry. For Figure 3, we don't think it is fair to show tau between only amerindigenous ancestry and traits as this analysis does not account for important covariates, and would like to see Supplementary file 3 instead to replace Figure 3 (in a figure form as the authors prefer) such that only the multiple regression effect sizes are reported in the manuscript that account for covariates.

Why do a first pass of this analysis without covariates included, and then re-run with covariates in the Bonferroni significant subset of traits only? (Given that there will be confounding between Amerindigenous ancestry and socioeconomic, environmental and other non-genetic factors, and especially age). Furthermore, looking in the Materials and methods section, we cannot seem to find the full description of how this analysis was performed i.e. what models were run, and how/if phenotypic measures used were cleaned and normalized etc. Please provide this.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for the revised submission of your manuscript, now called "Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits", for consideration by eLife.

The Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

The manuscript is significantly improved, and the addition of the new simulation analyses greatly helps in the interpretation of the trends that they see. The authors have sufficiently addressed concerns about two of their main results: (1) interpretation of ancestry change over time and (2) interpretation of ROH change over time. I still have the following points I would ask them to consider regarding their third main result, (3) the potential effects of ancestry and ancestry change over time on the genetic architecture of complex traits, and to make appropriate additions/revisions to their analyses and text before submitting a revised manuscript. Beyond that, I see that the new simulation results don't appear in the manuscript until the Discussion. I would consider them a result and would like to see the authors integrate the main simulation results into the Results section, where they report their empirically observed patterns.

Revisions for this paper:

The authors state in the manuscript that, "As illustrated in Figure 4A, 20 of these traits (29%) are significantly correlated after Bonferroni correction (P<0.000145), highlighting the need for increased investigation into the role of AI genetic ancestry and other unmodelled socio-economic variables in admixed populations such as Mexican Americans." As written, it understates the correlation between AI ancestry and unmodelled socio-economic variables in the United States which we know exists on the basis of historical and social science research. Given this, I would like to see the text revised to say that it is not clear what the correlation with AI ancestry implies, and while it could be reflecting genetic effects, it can also be reflecting socio-economic variables that are correlated with AI ancestry and that AI ancestry could be serving as a proxy for. The authors themselves show that AI ancestry is correlated with educational attainment levels which they state is a proxy for socio-economic status. I would like to see their model for testing the effect of ancestry on traits include as covariates: (1) educational attainment as a proxy for socioeconomic status, (2) whether they are US born or not, and whether their parents are US born or not, to help capture the effects of different environments they would have been born in, and that their parents would have created for them on various levels. I would also encourage the authors to make their model as rich as possible by adding other environmental variables they could obtain on their recruitment sites (altitude, latitude, longitude, population density, average obesity rates to name some that are likely relevant). 
Ideally, this kind of analysis would be done in a mixed-model framework as well, correcting for the full genomic relationship matrix and adding a random effect to capture unmeasured environmental factors, but they should at least add to their multiple regression framework the covariates that they have access to or could access. They should also consider how collinearity may affect their estimates, and study and report the Variance Inflation Factors (VIF) of the different variables. They should report in the manuscript the results for the full model, giving coefficients and p-values not only for AI ancestry but also for the other test variables, describe these in the Results, and report and discuss the contribution of the other test variables relative to AI ancestry as estimated from their model.
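For readers less familiar with the diagnostic requested here: the VIF of a covariate is 1/(1 − R²), where R² comes from regressing that covariate on all the other covariates in the model. The minimal NumPy sketch below illustrates the calculation. It is not the authors' code, and the simulated columns in the demo are hypothetical stand-ins for covariates such as AI ancestry, birth year, or educational attainment.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of an (n individuals x p covariates) matrix.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression of column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # A perfectly collinear column has R^2 = 1 and an infinite VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical simulated covariates: column 2 is nearly a copy of column 0.
    a = rng.normal(size=1000)
    b = rng.normal(size=1000)
    c = a + 0.01 * rng.normal(size=1000)
    print(variance_inflation_factors(np.column_stack([a, b, c])))
```

Values near 1 indicate little collinearity; cutoffs such as 5 or 10 are common rules of thumb rather than fixed standards. statsmodels offers a ready-made helper (`variance_inflation_factor` in `statsmodels.stats.outliers_influence`), but the explicit loop keeps the definition visible.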

Further, the authors results show three observations that I would like to see described in the Results and discussed in the Discussion, as the implications are important. The authors observe an increase in height in Mexican Americans (at roughly the same rate in all amerindigenous ancestry stratifications, see note below) with birth year. First, they do not see any trend of the polygenic height score with birth year. This suggests that while the genetic predisposition of the trait remains the same, the trait has changed significantly due to non-genetic environmental factors. Second, even though amerindigenous ancestry is negatively correlated with height, and amerindigenous ancestry is increasing over time, the trait value/height increases rather than decreases over time. This also points to the effects of non-genetic factors playing an important role in values of the trait.

Lastly, if amerindigenous ancestry is negatively correlated with height due to genetic reasons, then shouldn't we expect to see the polygenic height score decrease with birth year, as amerindigenous ancestry increases with birth year? How do the authors interpret this meta pattern across their analyses, and what implications does it have for how temporal change in ancestry can alter the genetic architecture of traits? Overall, this could mean that (1) the correlation of amerindigenous ancestry with height is at least partly due to genetic reasons. Given that, while ancestry trends (AI ancestry increasing), combined with correlations of ancestry with traits, may make you predict one thing with respect to the genetic architecture of traits (height will decrease with birth year), the way heredity interacts with the environment through the randomness of development makes the trait move in the opposite direction than the model would predict. Or (2) the correlation of amerindigenous ancestry with height is fully picking a signal of height being lower in individuals with higher amerindigenous ancestry due to primarily environmental reasons, and therefore, as environment changes, the correlation is not meaningful for prediction. I would like to see the authors consider the above, and state these patterns as they stand across analyses, and discuss their implications for their overall thesis (as it relates to height and traits in general).

Note: They say in the study, "We find a similar trend in the HCHS/SOL Mexican Americans (Figure 4B). Indeed, when we stratified individuals by quartiles of global AI ancestry, we see that all quartiles have increased in height by a similar amount over the period investigated." Can the authors make this statement more specific in the manuscript – what are the rates of change in the different quartiles? Are they higher in quartiles with higher indigenous ancestry? Please integrate this into the Discussion above as well.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits" for further consideration by eLife. Your revised article has been evaluated by Patricia Wittkopp (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

To be able to appreciate the effects of different factors affecting complex trait variation, can the authors add the effect sizes of the important covariates to Figure 4A, or alternatively show them as supplemental figures? This is to put the AI ancestry effect in context and see its magnitude relative to the effect of other non-genetic factors that have been modelled. While the authors have reported these in Supplementary file 3, figures will make it easier to compare and parse the results. Further, I'd like to see a few sentences added to the Results and Discussion to summarize and discuss these results.

The authors have the following sentence in their Discussion "While height increases across all groups at a similar rate, illustrating the effects of non-genetic factors having an important role in the values of the trait, we do see differences based on percentage of AI ancestry." Given their new estimates of the rates of change in different groups the first part of this sentence needs to be revised.

https://doi.org/10.7554/eLife.56029.sa1

Author response

Essential revisions:

This paper's major finding is an observation of an increase in Amerindigenous ancestry in Mexican Americans in the 1940-1990 period. The manuscript is interesting and in theory suitable for publication in eLife, but after a number of points are addressed. First, the authors need to articulate clearly the factors that may have caused this primary observation, and what may be the most likely explanations outlined below.

We developed a simulation framework to investigate the evolutionary forces that can/cannot contribute to changes in ancestry proportions over such a short period of time. More details are included below.

Second, they need to address that the primary explanation for the ROH increase is likely the amerindigenous ancestry increase, and in that sense, determine and clearly articulate the place of the assortative mating observation in the manuscript.

We have redone the ROH analysis to control for global Amerindigenous ancestry as suggested and found that the pattern remains: ROH increases over time at a rate that exceeds the increase in Amerindigenous ancestry. In contrast, we find the opposite pattern when we perform the analogous analysis with European ancestry.

Last, they need to clearly admit in the manuscript the importance of the un-modelled socio-economic variable in the correlation between amerindigenous ancestry and complex traits, and only present this analysis after controlling for appropriate covariates.

We have removed the previous correlative analyses, and replaced them with a more thorough statistical analysis of AI ancestry with biomedical traits while controlling for several covariates. We now further discuss unmodelled socio-economic variables in the Results as well as Discussion sections.

It is a time when we are finally seeing some population genetic studies of understudied populations as they relate to complex trait variation (to which this manuscript can be an important contribution), and so the bar to be as rigorous as possible in considering alternative explanations and model all possible covariates should be set extremely high.

We completely agree that extreme care must be taken when studying marginalized populations, lest more harm may result.

1) The main finding is the increase in amerindigenous ancestry in the 1940 – 1990 period in Mexican-Americans in the United States.

a) The authors state: "In our non-US born individuals (N=2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (𝛽=-0.0009; P=0.0006; SE=0.0003)." If I understand correctly, this regression is amerindigenous ancestry against time and other covariates. It would be helpful if the authors add to this sentence something along the lines of, "implying that individuals who arrived earlier to the US from Mexico had more European ancestry."

The reviewers understand correctly, and we have changed the sentence to reflect this: “In our non-US born individuals (N=2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (𝛽=-0.0009; P=0.0006; SE=0.0003) suggesting that individuals who arrived earlier to the US had less AI ancestry.”

b) Given the above analysis, and independent migration analyses (see, for example, https://www.migrationpolicy.org/article/mexican-immigrants-united-states-2), it seems that migration from Mexico to the US shifted over the years from states with less amerindigenous ancestry to states with higher amerindigenous ancestry (Chiapas, Oaxaca, Veracruz) in the South and South-east of the country. This seems a highly likely explanation for the pattern of increasing amerindigenous ancestry that they see, and should be stated as such in the manuscript. This seems especially likely given that the signal comes primarily from non-US born individuals, or US-born individuals with parents born outside the US.

We have added to the conclusion “Independent analyses have shown that migration from Mexico to the US has shifted over the years from states with less Amerindigenous ancestry to states with higher Amerindigenous ancestry” and have cited the suggested reference.

c) The authors state that "It is possible that the increase in global Amerindigenous ancestry over time could be biased by changes in the specific subcontinental Amerindigenous ancestries over time (though such an effect is not visible in our UMAP analysis, Figure 1B)." – It is not clear what is meant by this sentence – please re-phrase and articulate more clearly. If it alludes to the difference in migration sources over time I mention above, I don't think their analyses of Fst and genetic diversity rule that explanation out.

This sentence has been simplified to “We next explored whether the increase in global Amerindigenous ancestry over time could be biased by local changes in the specific subcontinental Amerindigenous ancestries over time.”

d) Assortative mating (Figure 2D and Supplementary figure 9). This argument is puzzling because if there is assortative mating along indigenous ancestry as they suggest, then would this not mean that there is also assortative mating along the collinear European ancestry? If this is the case, why would amerindigenous ancestry be increasing in particular? The authors do state that assortative mating would not cause an increase in one ancestry. In that case, the paper overall does not provide an explanation for why the amerindigenous ancestry is actually increasing – is the migration sources explanation the most likely explanation? Along with that individuals with higher amerindigenous ancestry must be reproducing more? The likely explanations of the primary result should be made very clear for the reader.

Within Appendix 1, we have added new simulations that demonstrate how different factors such as population growth, migration, fecundity and assortative mating can shape differences in global ancestry patterns. According to the simulations, migration can have a large effect on shaping patterns of Amerindigenous ancestry, but other factors can shape these patterns as well. In particular, while assortative mating does not lead to differences in Amerindigenous ancestry over time on its own, assortative mating can amplify other factors (such as ancestry-related differences in fecundity). Given these simulations and the data we have analyzed, we argue that there is no single cause for the increased Amerindigenous ancestry over time. Rather, this increase is a result of all factors: migration, ancestry-related fecundity differences, ancestry-biased assortative mating, and population growth.
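The qualitative point that assortative mating alone does not shift mean ancestry, whereas migration can, is visible even in a deliberately simple toy model. The sketch below is not the simulation framework in Appendix 1. Its parameters (`migrant_fraction`, `migrant_ancestry`), the fixed even population size, the two children per couple, and the rule that a child's global ancestry equals the mean of its parents' are all hypothetical simplifications.

```python
import random


def next_generation(ancestry, assortative=False, migrant_fraction=0.0,
                    migrant_ancestry=0.0, rng=None):
    """Advance a toy population of global AI-ancestry proportions by one generation.

    Hypothetical assumptions, for illustration only: constant even population
    size, every couple has exactly two children, a child's global ancestry is
    the mean of its parents', and migrants replace a fixed fraction of parents.
    """
    rng = rng or random.Random(0)
    pool = list(ancestry)
    rng.shuffle(pool)
    n_migrants = round(migrant_fraction * len(pool))
    pool[:n_migrants] = [migrant_ancestry] * n_migrants  # migrant parents
    if assortative:
        pool.sort()        # pair each parent with the most similar partner
    else:
        rng.shuffle(pool)  # random mating
    children = []
    for parent1, parent2 in zip(pool[0::2], pool[1::2]):
        child = (parent1 + parent2) / 2
        children.extend([child, child])  # two children keep the size constant
    return children


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    population = [0.3, 0.5, 0.6, 0.8] * 25  # toy ancestry proportions
    print(mean(population))
    print(mean(next_generation(population)))
    print(mean(next_generation(population, assortative=True)))
    print(mean(next_generation(population, migrant_fraction=0.2,
                               migrant_ancestry=0.9)))
```

Because every couple contributes the same number of children, pairing similar parents leaves the population mean unchanged; only the migrant input moves it. Ancestry-related differences in fecundity could be added by letting the number of children depend on parental ancestry, which is where assortative mating can act as an amplifier.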

e) The authors state: "and this increase cannot be entirely explained by very recent migration." What is the evidence backing this claim?

This is based on our analyses discussed in the “Dynamic Global Ancestry Proportions in Mexican Americans” section. Specifically starting with, “In our non-US born individuals (N=2987), we evaluated differences in ancestry estimates over time while accounting for years in the US and sampling weight and identified a significant effect of years in the US (𝛽=-0.0009; P=0.0006; SE=0.0003) suggesting that individuals who arrived earlier to the US had less Amerindigenous ancestry. However, this did not change the effect of birth year on the proportion of global Amerindigenous ancestry (𝛽 = 0.0028; P<2e-16, SE=0.0003).” To clarify the conclusion, we have rephrased the sentence: “and this increase cannot be entirely explained by very recent migration based on our analyses of non-US born individuals.”

f) "Sampling weight" is not actually defined in the Materials and methods. Can the authors clarify how this is defined and used to weight the major analysis?

We have added the definition of the sampling weight to the “Study dataset and initial quality control” section within the Materials and methods and have cited the paper. Specifically, “The sample survey design for HCHS/SOL has been described previously. Briefly, census block groups were selected in defined communities near each of the four recruitment centers, and households were sampled within census block groups. Households with Hispanic/Latino surnames, as well as residents over 45 years old, were oversampled in order to increase representation of the Hispanic/Latino target population and achieve a uniform age distribution. Sampling weights were calculated for each individual to reflect the probability of sampling.”
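The definition above says only that weights reflect each individual's probability of sampling. A common convention, assumed here, is to make each weight proportional to the inverse of that probability and use it in a weighted regression, for example of global AI ancestry on birth year and years in the US. The sketch below shows only the point-estimate side. Design-based analyses (for example, `svyglm` in R's `survey` package) also account for strata and clusters when computing standard errors, which this code does not do, and it does not reproduce the 𝛽 values quoted in the text.

```python
import numpy as np


def weighted_least_squares(X, y, weights):
    """Point estimates for y ~ intercept + X, weighting each individual.

    Solves argmin_b sum_i w_i * (y_i - x_i b)^2 by rescaling each row by
    sqrt(w_i). Returns [intercept, coef_1, ..., coef_p].
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single covariate passed as a 1-D array
    y = np.asarray(y, dtype=float)
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef


if __name__ == "__main__":
    # Toy inputs: equal weights reproduce ordinary least squares, while a
    # heavily weighted first individual pulls the fit toward that point.
    print(weighted_least_squares([0, 1, 2], [0, 1, 0], [1, 1, 1]))
    print(weighted_least_squares([0, 1, 2], [0, 1, 0], [100, 1, 1]))
```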

g) Please describe the bootstrap resampling performed – is the bootstrapping performed over individuals or segments of the genome? Please justify the strategy picked. This should be described in the manuscript.

We performed bootstrap resampling over individuals and this has been further described within the manuscript. Specifically, we now say “Bootstrap analyses (Figure 2B and Figure 2—figure supplement 3) were performed by calculating relevant statistics based on repeated resampling of individuals with replacement. Bootstrap resampling results in an estimate of the variance of the statistics that we are calculating in our data, and allows us to assess the impact of outliers (who are only resampled in a subset of iterations).”
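A minimal Python sketch of resampling individuals with replacement, as the quoted text describes. The function name, the per-person record format, and the use of the standard deviation of the replicates as the bootstrap standard error are illustrative assumptions, not the authors' implementation.

```python
import random
import statistics


def bootstrap_over_individuals(individuals, statistic, n_boot=1000, seed=0):
    """Resample whole individuals with replacement and recompute `statistic`.

    `individuals` holds one record per person (for example a tuple of birth
    year and global ancestry); `statistic` maps such a list to a number.
    Returns the bootstrap replicates and their standard deviation, which
    serves as the bootstrap standard error of the statistic.
    """
    rng = random.Random(seed)
    n = len(individuals)
    replicates = [statistic(rng.choices(individuals, k=n)) for _ in range(n_boot)]
    return replicates, statistics.stdev(replicates)


if __name__ == "__main__":
    toy_ancestry = [0.0, 1.0] * 50  # hypothetical per-person values
    _, se = bootstrap_over_individuals(toy_ancestry, statistics.mean)
    print(se)
```

Resampling people rather than genomic segments keeps each person's genome intact. With n individuals, a given person appears in a replicate with probability 1 − (1 − 1/n)ⁿ, which approaches 1 − 1/e ≈ 63%, so any single outlier is absent from roughly a third of the replicates.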

2) The pattern of ROH change over time observed.

a) What is the number of individuals in each decade? Please show this in the manuscript. If there are more individuals in later birth decades (as may be expected), you would see an increase in the ROH summed over each genome with time, simply because you are summing over more individuals at later time periods. It is not clear if the analyses for Figure 2C are normalized by the number of individuals in each decade – if not, this would be important to do, and only the normalized results should be reported.

The number of individuals in each birth decade is now reported in Supplementary file 2. We have clarified within the figure captions that the ROH sums are ROH sums per person, and, to address the comment below as well, we show the normalized data in Figure 3C.
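The reviewers' concern in point 2a is that a total summed over everyone born in a decade grows with the number of people sampled in that decade, whereas a per-person mean does not. The hypothetical helper below (not from the manuscript) groups individuals by birth decade and reports both quantities so the difference is explicit.

```python
from collections import defaultdict


def per_decade_summary(birth_years, roh_sums):
    """Group individuals by birth decade.

    Returns {decade: (n_individuals, total_roh, mean_roh_per_person)}; the
    total grows with sample size, but the per-person mean does not.
    """
    groups = defaultdict(list)
    for year, roh in zip(birth_years, roh_sums):
        groups[(year // 10) * 10].append(roh)
    return {
        decade: (len(values), sum(values), sum(values) / len(values))
        for decade, values in sorted(groups.items())
    }


if __name__ == "__main__":
    # Toy data: twice as many people born in the 1950s, identical ROH per person.
    years = [1941, 1945, 1952, 1955, 1958, 1959]
    roh = [10.0] * 6
    for decade, (n, total, per_person) in per_decade_summary(years, roh).items():
        print(decade, n, total, per_person)
```

In the toy data the decade totals double from the 1940s to the 1950s purely because of sample size, while the per-person means are identical.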

b) The ROHsum increasing with time could simply be due to the amerindigenous ancestry increasing with time, as amerindigenous ancestry carries more short ROH segments than European ancestry (see for example Ceballos et al. Nature Reviews Genetics 2018). The authors should explicitly describe this, as this simplest explanation would not require assortative mating to be invoked either.

We thank the reviewers for drawing our attention to this point, which we missed in our original analysis. We have redone this analysis after normalizing each person's ROH sum by their global Amerindigenous ancestry; the results are now reflected in the “Increased runs of homozygosity over time” Results section, including Figure 3C and Figure 3—figure supplement 3. Our prior conclusion remains: ROH increases at a rate faster than the increase in Amerindigenous ancestry.
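One way to read “normalizing the ROH sums per person by their global Amerindigenous ancestry” is to divide each person's ROH sum by their AI ancestry proportion and then ask whether that ratio still trends with birth year. The sketch below assumes exactly that division and a plain ordinary least-squares slope; the authors' actual model may differ, and the numbers in the demo are toy values.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def normalized_roh_trend(birth_years, roh_sums, ai_ancestry):
    """Slope of (per-person ROH sum / global AI ancestry) on birth year.

    Assumes every individual has AI ancestry > 0. A slope near zero means ROH
    tracks AI ancestry proportionally; a positive slope means ROH rises faster
    than AI ancestry alone would predict.
    """
    normalized = [roh / anc for roh, anc in zip(roh_sums, ai_ancestry)]
    return ols_slope(birth_years, normalized)


if __name__ == "__main__":
    years = [1940, 1950, 1960, 1970]
    ancestry = [0.40, 0.45, 0.50, 0.55]  # toy values rising with birth year
    proportional = [100 * a for a in ancestry]
    faster = [100 * a * (1 + 0.01 * (y - 1940)) for a, y in zip(ancestry, years)]
    print(normalized_roh_trend(years, proportional, ancestry))  # approximately 0
    print(normalized_roh_trend(years, faster, ancestry))        # positive
```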

3) Correlation between Amerindigenous ancestry and complex traits

Many of the traits studied would also be affected by socioeconomic status (for example, height and cholesterol). Do the authors have this variable available? If so, it should be included in the multiple regression. If not, it should be clearly stated that they are unable to account for this likely important effect, which leaves their estimates confounded by socio-economic differences that likely correlate with Amerindigenous ancestry. For Figure 3, we do not think it is appropriate to show tau between Amerindigenous ancestry and traits alone, as this analysis does not account for important covariates. We would like Figure 3 to be replaced with the results in Supplementary file 3 (in whatever figure form the authors prefer), so that only multiple regression effect sizes that account for covariates are reported in the manuscript.

We agree with the reviewers that our correlation analysis was too simplistic. As suggested, we have replaced it with our multiple regression model, which accounts for birth year, center, sex, and sampling weight (all regression statistics are included in Supplementary file 3). Although the set of traits that are significant after Bonferroni correction changed slightly (we are now controlling for 5*69=345 tests instead of just 69), the overall conclusion remains: nearly 1/3 of the traits are correlated with genomic Amerindigenous ancestry. We agree that socio-economic factors can have a direct impact on biomedical traits (and can also be correlated with Amerindigenous ancestry), but HCHS/SOL did not collect this variable, so we cannot include it in our analysis.
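The Bonferroni threshold follows directly from the test count. Assuming a family-wise α of 0.05 (an assumption, though it reproduces the P<0.000145 threshold quoted in the review below), the arithmetic is:

```python
# Assumed family-wise error rate; not stated explicitly in the response.
alpha = 0.05
n_traits = 69
tests_per_trait = 5  # 5 x 69 = 345 tests, as stated in the response
n_tests = n_traits * tests_per_trait
threshold = alpha / n_tests
print(n_tests, f"{threshold:.6f}")  # prints: 345 0.000145
```

Under the same α, the later thresholds of P<9.1E-5 and P<6.6E-5 imply roughly 550 and 760 tests, respectively, presumably reflecting more model terms once covariates were added.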

Why do a first pass of this analysis without covariates, and then re-run it with covariates only in the Bonferroni-significant subset of traits? (There will be confounding between Amerindigenous ancestry and socioeconomic, environmental, and other non-genetic factors, and especially age.) Furthermore, looking in the Materials and methods section, we cannot find a full description of how this analysis was performed, i.e., what models were run and how (or whether) the phenotypic measures were cleaned and normalized. Please provide this.

The reviewers are entirely correct; the first-pass correlative analysis was unwarranted. As discussed above, we replaced this analysis with the multiple regression model that accounts for birth year (i.e., age), center, sex, and sampling weight. To compare the effect of AI ancestry across traits, we quantile normalized all traits, and we include a justification for this choice in the main text. We also compared the p-values for Amerindigenous ancestry effects across traits between untransformed and quantile-normalized data and found a strong correlation (rho=0.944; p<2.2e-16), with no statistical evidence for a difference in the distributions of p-values (Mann-Whitney U test p-value=0.857).
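The response does not spell out which quantile normalization was used. One common choice for a single trait is a rank-based inverse normal transformation, sketched below under that assumption: values are ranked (ties share their average rank), and each rank is mapped to the matching standard-normal quantile. The trait values are hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map each value to the standard-normal quantile of its (average) rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Group tied values and give each the average of their 1-based ranks.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # (rank - 0.5) / n keeps every probability strictly inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical right-skewed trait values (e.g., triglycerides, mg/dL)
trait = [90, 110, 150, 95, 400, 120, 110]
print([round(z, 2) for z in rank_inverse_normal(trait)])
```

Because the transform depends only on ranks, the outlying value of 400 ends up only one quantile step above 150, which is what puts effect sizes for differently distributed traits on a comparable scale.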

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Summary:

The manuscript is significantly improved, and the addition of the new simulation analyses greatly helps in interpreting the trends that the authors observe. The authors have sufficiently addressed concerns about two of their main results: (1) the interpretation of ancestry change over time and (2) the interpretation of ROH change over time. I still have the following points that I would ask them to consider regarding their third main result, (3) the potential effects of ancestry and ancestry change over time on the genetic architecture of complex traits, and to make appropriate additions and revisions to their analyses and text before submitting a revised manuscript. Beyond that, I see that the new simulation results do not appear in the manuscript until the Discussion. I would consider them a result and would like to see the authors integrate the main simulation results into the Results section, alongside their observed empirical patterns.

We appreciate the overall positive assessment of our revision and the focus on what additional steps would further improve our manuscript. We have now moved the simulations from Appendix 1 to the main Results section. They are now discussed after the “Strong ancestry related assortative mating in HCHS/SOL Mexicans” section and before the “Genetic association of global AI ancestry with biomedical traits” section. We hope that our revisions to the section on the potential effects of ancestry and ancestry change over time on the genetic architecture of complex traits sufficiently address the requested changes, as we describe further below.

Revisions for this paper:

The authors state in the manuscript that, "As illustrated in Figure 4A, 20 of these traits (29%) are significantly correlated after Bonferroni correction (P<0.000145), highlighting the need for increased investigation into the role of AI genetic ancestry and other unmodelled socio-economic variables in admixed populations such as Mexican Americans." As written, it understates the correlation between AI ancestry and unmodelled socio-economic variables in the United States which we know exists on the basis of historical and social science research. Given this, I would like to see the text revised to say that it is not clear what the correlation with AI ancestry implies, and while it could be reflecting genetic effects, it can also be reflecting socio-economic variables that are correlated with AI ancestry and that AI ancestry could be serving as a proxy for. The authors themselves show that AI ancestry is correlated with educational attainment levels which they state is a proxy for socio-economic status. I would like to see their model for testing the effect of ancestry on traits include as covariates: (1) educational attainment as a proxy for socioeconomic status, (2) whether they are US born or not, and whether their parents are US born or not, to help capture the effects of different environments they would have been born in, and that their parents would have created for them on various levels. I would also encourage the authors to make their model as rich as possible by adding other environmental variables they could obtain on their recruitment sites (altitude, latitude, longitude, population density, average obesity rates to name some that are likely relevant). 
Ideally, this kind of analysis would also be done in a mixed-model framework, correcting for the full genomic relationship matrix and adding a random effect to account for unaccounted-for environmental factors, but at a minimum the authors should add to their multiple regression framework the covariates they have access to, or could access. They should also consider how collinearity may affect their estimates, and study and report the Variance Inflation Factors (VIF) of the different variables. They should report the results for the full model in the manuscript, giving coefficients and p-values not only for AI ancestry but also for the other test variables, describe these in the Results, and report and discuss the contribution of the other test variables relative to AI ancestry as estimated from their model.
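The Variance Inflation Factors requested here can be computed from the predictors alone. The sketch below uses NumPy and relies on the identity that VIF_j equals the j-th diagonal element of the inverse correlation matrix of the predictors (equivalently 1 / (1 - R_j^2) from regressing predictor j on all the others). The predictor columns are hypothetical; categorical covariates such as recruitment center would need to be dummy-coded first.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = individuals, columns = predictors).

    Uses VIF_j = [inv(R)]_jj, where R is the predictors' correlation matrix;
    this equals 1 / (1 - R_j^2) from regressing predictor j on the others.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Hypothetical predictors: the first two columns are nearly collinear.
X = np.column_stack([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 1.9, 3.05, 4.0, 4.95],
    [3.0, 1.0, 4.0, 1.0, 5.0],
])
print(np.round(variance_inflation_factors(X), 1))
```

A common rule of thumb treats VIFs above 5 or 10 as a sign that the corresponding coefficient estimates may be unstable.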

We appreciate the reviewers’ focus on improving our statistical model. As suggested, we added educational attainment, US-born status, and number of US-born parents as covariates in our multiple regression, as these were the additional variables we had access to. These were in addition to the original adjustments for birth year, gender, center, and sampling weight. The updated results are shown in Figure 4A, Supplementary file 3, and Figure 4—figure supplement 3; Supplementary file 3 specifically includes the coefficients and p-values for all test variables for each trait. Here we have included a figure (Author response image 1) to illustrate the effect size of AI ancestry before and after the additional adjustments. Even after accounting for these additional variables, the effect sizes of AI ancestry were largely unchanged (Pearson correlation coefficient = 0.984, P<2.2E-16). One trait (% immature granulocytes) changed from a negative to a positive association with AI ancestry, but the AI effect on this trait was not statistically significant either before or after adding the covariates.

We modified the above reviewer-quoted statement to “As illustrated in Figure 4A, 18 of these traits (26%) are significantly associated with AI ancestry (Bonferroni correction P<9.1E-5) after adjusting for several factors including birth year, educational attainment, US-born, and number of US-born parents. While this suggests that genetic ancestry has an effect on several traits, other unmodeled socio-economic variables that are correlated with AI ancestry may also be contributing to these patterns. Regardless, these findings highlight the need for increased investigation into the role of AI genetic ancestry in admixed populations such as Mexican Americans.”

Supplementary file 3 reports results for all phenotypes (raw and quantile normalized), with the effect size, SE, and p-value for each covariate.

Author response image 1

Further, the authors' results show three observations that I would like to see described in the Results and discussed in the Discussion, as the implications are important. The authors observe an increase in height with birth year in Mexican Americans (at roughly the same rate in all Amerindigenous ancestry strata; see note below). First, they do not see any trend in the polygenic height score with birth year. This suggests that while the genetic predisposition for the trait remains the same, the trait has changed significantly due to non-genetic environmental factors.

We have elaborated on this further below.

Second, even though Amerindigenous ancestry is negatively correlated with height and is increasing over time, height increases rather than decreases over time. This also points to non-genetic factors playing an important role in values of the trait.

This is true. Both genetic and non-genetic factors drive variation in height. Height is estimated to have a broad-sense heritability of 80% in Northern European populations, suggesting that non-genetic factors explain ~20% of the variation in height in these populations. It is unclear how these estimates of heritability translate to Mexican Americans. We have elaborated on this in the Discussion, adding: “While height increases across all groups at a similar rate, illustrating the effects of nongenetic factors having an important role in the values of the trait, we do see differences based on percentage of AI ancestry. Individuals with lower percentages of AI ancestry were taller on average than individuals with higher AI ancestry pointing to the role of AI ancestry on the trait.”

Last, if Amerindigenous ancestry is negatively correlated with height for genetic reasons, shouldn't we expect the polygenic height score to decrease with birth year, as Amerindigenous ancestry increases with birth year? How do the authors interpret this meta-pattern across their analyses, and what implications does it have for how temporal change in ancestry can alter the genetic architecture of traits? Overall, this could mean that (1) the correlation of Amerindigenous ancestry with height is at least partly due to genetic reasons. In that case, ancestry trends (AI ancestry increasing) combined with correlations between ancestry and traits may lead one to predict one thing about the genetic architecture of traits (height will decrease with birth year), whereas the way heredity interacts with the environment through the randomness of development makes the trait move in the opposite direction from what the model would predict. Or (2) the correlation of Amerindigenous ancestry with height fully reflects height being lower in individuals with higher Amerindigenous ancestry for primarily environmental reasons, and therefore, as the environment changes, the correlation is not meaningful for prediction. I would like the authors to consider the above, state these patterns as they stand across analyses, and discuss their implications for their overall thesis (as it relates to height and to traits in general).

This is a very complex issue, and we have attempted to be conservative in how we describe these patterns. For example, Figure 4—figure supplement 2 shows that there is indeed a slight negative overall trend between polygenic height score and birth year when we account for additional environmental variables, including educational attainment, US-born status, and number of US-born parents. The slope is not significant (P=0.14), so we do not draw conclusions from it. As a further point of clarification, PHS is correlated with observed height only in the bottom two quartiles of AI ancestry (i.e., only in the Mexican Americans with the highest European ancestry). As AI ancestry increases over time, we expect the performance of PHS to decrease, and such a decrease in accuracy could also manifest as a loss of signal with birth year.

It is possible that we did not see a significant trend because of the way the polygenic height score is calculated. The majority of GWASs have been performed in populations of primarily European ancestry, and these studies have shaped our understanding of the genetic architecture of height. However, because diverse populations have largely been excluded, our understanding of the genetics of height remains incomplete.

A recent study of Peruvians (1) demonstrated the role of population-specific variants in differences in height. As we do not fully understand the genetics of height in Amerindigenous or admixed populations, there may be ancestry-specific variants within the AI component in Mexicans. Such variants could have a significant effect on the trait, but they would not have been captured in the PHS, because the underlying GWAS results were derived from an analysis of European individuals.
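A polygenic score of the kind discussed here is typically a weighted sum of effect-allele dosages, with weights taken from GWAS summary statistics. The sketch below uses hypothetical variant IDs and effect sizes and is not the scoring pipeline used in the manuscript; it simply makes the point concrete that a variant absent from the European-derived weights contributes nothing to the score, whatever its true effect in Mexican Americans.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies).

    dosages: variant ID -> dosage for one individual
    weights: variant ID -> GWAS effect size (hypothetical values here)
    Only variants present in `weights` are scored.
    """
    missing = sorted(set(weights) - set(dosages))
    if missing:
        raise KeyError(f"no genotype for weighted variants: {missing}")
    return sum(beta * dosages[snp] for snp, beta in weights.items())


# Hypothetical European-derived weights; "snp4" stands in for an
# AI-specific variant that the European GWAS never tested.
weights = {"snp1": 0.05, "snp2": -0.02, "snp3": 0.03}
person = {"snp1": 2, "snp2": 1, "snp3": 0, "snp4": 2}
print(round(polygenic_score(person, weights), 3))  # 0.08
```

Changing the dosage of "snp4" leaves the score unchanged, which is one way the score's accuracy can degrade as AI ancestry rises.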

Individuals with higher AI ancestry may harbor variants that contribute to differences in height but would not have been detected in a European GWAS. Even Mexican individuals with higher European ancestry may carry variants within their AI component that have a significant impact on the trait. However, these analyses go beyond what the available data allow and are therefore outside the scope of this manuscript.

Note: The authors state in the manuscript, "We find a similar trend in the HCHS/SOL Mexican Americans (Figure 4B). Indeed, when we stratified individuals by quartiles of global AI ancestry, we see that all quartiles have increased in height by a similar amount over the period investigated." Can the authors make this statement more specific in the manuscript: what are the rates of change in the different quartiles? Are they higher in quartiles with higher indigenous ancestry? Please integrate this into the Discussion above as well.

We have elaborated this further in the manuscript by specifically adding to the results, “The rates of change in height between AI quartiles were all positive and significant (P<5E-6). The largest was for the quartile with the highest AI ancestry, but the rates did not change monotonically with respect to AI ancestry across quartiles. The estimates for the quartiles with their 95% CIs are: 𝛽=0.135 (CI:0.097-0.173) for AI>0.58; 𝛽=0.124 (CI:0.089-0.160) for 0.46<=AI<=0.58; 𝛽=0.083 (CI:0.047-0.119) for 0.37<=AI<=0.46; and 𝛽=0.113 (CI:0.074-0.151) for AI<0.37.”
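Per-quartile rates like those quoted above are slopes of height on birth year within each AI-ancestry stratum. The following is a minimal, unadjusted sketch with a normal-approximation 95% CI; the data are hypothetical, and the manuscript's model may include covariates that this sketch omits.

```python
import math
from statistics import mean


def slope_with_ci(x, y, z=1.96):
    """OLS slope of y on x with a normal-approximation 95% confidence interval."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = my - beta * mx
    # Residual variance with n - 2 degrees of freedom (slope and intercept).
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, (beta - z * se, beta + z * se)


# Hypothetical individuals from one AI-ancestry quartile: birth year, height (cm)
birth_year = [1940, 1950, 1960, 1970, 1980, 1990]
height_cm = [160.3, 160.8, 162.1, 162.6, 164.2, 165.0]
beta, (lo, hi) = slope_with_ci(birth_year, height_cm)
print(f"beta={beta:.3f} (CI:{lo:.3f}-{hi:.3f})")
```

Running the function once per quartile gives one slope and interval per stratum; with quartiles as large as those in HCHS/SOL, the normal approximation is close to a t-based interval, whereas for six points as in this toy example a t-quantile would give a somewhat wider interval.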

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

To be able to appreciate the effects of different factors affecting complex trait variation, could the authors add the effect sizes of the important covariates to Figure 4A, or alternatively show them as supplemental figures? This would put the AI ancestry effect in context and show its magnitude relative to the effects of the other non-genetic factors that have been modelled. While the authors have reported these in Supplementary file 3, figures will make the results easier to compare and parse. Further, I would like to see a few sentences added to the Results and Discussion to summarize and discuss these results.

We agree that a new figure would help readers appreciate the effects of all the different factors, in addition to Supplementary file 3. Instead of adding to the main Figure 4A, we added a supplementary figure (now Figure 4—figure supplement 1) because of the large number of points (traits × covariates). In the Results section we have added, “While this suggests that genetic ancestry has an effect on several traits, other unmodeled socio-economic variables that are correlated with AI ancestry may also be contributing to these patterns (though AI ancestry has among the strongest effects on a range of biomedical traits, comparable to the effects of gender; Figure 4—figure supplement 1).”

For context the paragraph now reads as, “As illustrated in Figure 4A, 18 of these traits (26%) are significantly associated with AI ancestry (Bonferroni correction P<6.6E-5) after adjusting for several factors including birth year, center, gender, sampling weight, educational attainment, US-born status, and number of US-born parents. While this suggests that genetic ancestry has an effect on several traits, other unmodeled socio-economic variables that are correlated with AI ancestry may also be contributing to these patterns (though AI ancestry has among the strongest effects on a range of biomedical traits, comparable to the effects of gender; Figure 4—figure supplement 1). Regardless, these findings highlight the need for increased investigation into the role of AI genetic ancestry in admixed populations such as Mexican Americans.”

Within the Discussion, we have rephrased one of the sentences, “We identify several biomedical traits that are associated with Amerindigenous ancestry, and show that in the case of height, there are both ancestry and temporal effects” to “We identify several biomedical traits that are associated with Amerindigenous ancestry, with effects comparable to the high effects of gender, and show that in the case of height, there are both ancestry and temporal effects.”

We kept the new additions minor and simple, as we believe the sections of the Discussion that we had previously written about the importance of studying diverse populations are solid.

The authors have the following sentence in their Discussion "While height increases across all groups at a similar rate, illustrating the effects of non-genetic factors having an important role in the values of the trait, we do see differences based on percentage of AI ancestry." Given their new estimates of the rates of change in different groups the first part of this sentence needs to be revised.

We have reworded the sentence to “While we do see differences in mean height based on percentage of AI ancestry, height increases over time in all groups at similar rates.” We hope this clarifies that the “differences based on percentage of AI ancestry” were referring to the mean heights for each group rather than the rates.

References

1) Asgari S, Luo Y, Akbari A, Belbin GM, Li X, Harris DN, et al. A positively selected FBN1 missense variant reduces height in Peruvian individuals. Nature. 2020;582(7811):234-9.

https://doi.org/10.7554/eLife.56029.sa2

Article and author information

Author details

  1. Melissa L Spear

    1. Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, United States
    2. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
    3. McGill Genome Centre, McGill University, Montreal, Canada
    4. Department of Human Genetics, McGill University, Montreal, Canada
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing - original draft
    For correspondence
    mlspear09@gmail.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3252-8411
  2. Alex Diaz-Papkovich

    1. McGill Genome Centre, McGill University, Montreal, Canada
    2. Quantitative Life Sciences Program, McGill University, Montreal, Canada
    Contribution
    Resources, Formal analysis, Visualization, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2867-5494
  3. Elad Ziv

    1. Division of General Internal Medicine, University of California, San Francisco, San Francisco, United States
    2. Department of Medicine, University of California, San Francisco, San Francisco, United States
    3. Institute of Human Genetics, University of California, San Francisco, San Francisco, United States
    4. Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, United States
    Contribution
    Conceptualization, Resources, Data curation, Formal analysis, Visualization, Methodology, Writing - original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2324-2884
  4. Joseph M Yracheta

    1. Native BioData Consortium, Eagle Butte, United States
    2. Bloomberg School of Public Health, Johns Hopkins University, Baltimore, United States
    Contribution
    Formal analysis, Visualization, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0691-8504
  5. Simon Gravel

    1. McGill Genome Centre, McGill University, Montreal, Canada
    2. Department of Human Genetics, McGill University, Montreal, Canada
    Contribution
    Resources, Supervision, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9183-964X
  6. Dara G Torgerson

    1. McGill Genome Centre, McGill University, Montreal, Canada
    2. Department of Human Genetics, McGill University, Montreal, Canada
    3. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
    Contribution
    Resources, Supervision, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Ryan D Hernandez

    1. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
    2. McGill Genome Centre, McGill University, Montreal, Canada
    3. Department of Human Genetics, McGill University, Montreal, Canada
    4. Institute of Human Genetics, University of California, San Francisco, San Francisco, United States
    5. Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, United States
    6. Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, United States
    Contribution
    Conceptualization, Resources, Supervision, Methodology, Writing - review and editing
    For correspondence
    ryan.hernandez@me.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5249-504X

Funding

National Institutes of Health (R01HG007644)

  • Ryan D Hernandez

National Institutes of Health (F31HG010104)

  • Melissa L Spear

Canadian Institutes of Health Research (MOP-136855)

  • Simon Gravel

National Cancer Institute (R01184545)

  • Elad Ziv

National Cancer Institute (K24169004)

  • Elad Ziv

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank many colleagues who commented on our preprint prior to submission, particularly Reed Cartwright for suggestions on terminology. MLS was supported through the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) under Award Number F31HG010104. ADP and SG were supported, in part, thanks to funding from the Canada Research Chairs program and CIHR grant MOP-136855. EZ was supported, in part, by NIH grants R0184545 and K24CA169004. RDH was supported, in part, by NHGRI grant R01HG007644 and the Canada Research Chairs program.

Senior Editor

  1. Patricia J Wittkopp, University of Michigan, United States

Reviewing Editor

  1. Mashaal Sohail, Brigham and Women's Hospital and Harvard Medical School, United States

Reviewers

  1. Mashaal Sohail, Brigham and Women's Hospital and Harvard Medical School, United States
  2. Genevieve L Wojcik, Stanford University School of Medicine, United States

Publication history

  1. Received: February 13, 2020
  2. Accepted: December 13, 2020
  3. Version of Record published: December 29, 2020 (version 1)

Copyright

© 2020, Spear et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
