
Differential methylation between ethnic sub-groups reflects the effect of genetic ancestry and environmental exposures

  1. Joshua M Galanter (corresponding author)
  2. Christopher R Gignoux
  3. Sam S Oh
  4. Dara Torgerson
  5. Maria Pino-Yanes
  6. Neeta Thakur
  7. Celeste Eng
  8. Donglei Hu
  9. Scott Huntsman
  10. Harold J Farber
  11. Pedro C Avila
  12. Emerita Brigino-Buenaventura
  13. Michael A LeNoir
  14. Kelly Meade
  15. Denise Serebrisky
  16. William Rodríguez-Cintrón
  17. Rajesh Kumar
  18. Jose R Rodríguez-Santana
  19. Max A Seibold
  20. Luisa N Borrell
  21. Esteban G Burchard (corresponding author)
  22. Noah Zaitlen (corresponding author)
  1. University of California, United States
  2. University of California, San Francisco, United States
  3. Stanford University, United States
  4. Hospital Universitario Nuestra Señora de Candelaria, Spain
  5. Instituto de Salud Carlos III, Spain
  6. Baylor College of Medicine and Texas Children’s Hospital, Texas
  7. Feinberg School of Medicine, Northwestern University, Illinois
  8. Kaiser Permanente-Vallejo Medical Center, United States
  9. Bay Area Pediatrics, United States
  10. Children’s Hospital and Research Center, United States
  11. Jacobi Medical Center, United States
  12. Veterans Caribbean Health System, United States
  13. The Ann and Robert H Lurie Children’s Hospital of Chicago, United States
  14. Centro de Neumología Pediátrica, United States
  15. National Jewish Health, United States
  16. City University of New York, United States
Research Article
Cite as: eLife 2017;6:e20532 doi: 10.7554/eLife.20532

Abstract

Populations are often divided categorically into distinct racial/ethnic groups based on social rather than biological constructs. Genetic ancestry has been suggested as an alternative to this categorization. Herein, we typed over 450,000 CpG sites in whole blood of 573 individuals of diverse Hispanic origin who also had high-density genotype data. We found that self-identified ethnicity and genetically determined ancestry were significantly associated with methylation levels at 916 and 194 CpGs, respectively, and that shared genomic ancestry accounted for a median of 75.7% (IQR 45.8% to 92%) of the variance in methylation associated with ethnicity. There was a significant enrichment (p=4.2×10−64) of ethnicity-associated sites amongst loci previously associated with environmental exposures, particularly maternal smoking during pregnancy. We conclude that differential methylation between ethnic groups is partially explained by shared genetic ancestry but that environmental factors not captured by ancestry significantly contribute to variation in methylation.

https://doi.org/10.7554/eLife.20532.001

eLife digest

Whether a person develops a particular disease can depend on both genetic and environmental factors. Many studies have found that people of different races and ethnicities have different likelihoods of acquiring certain diseases. Race and ethnicity are social constructs; that is, they are not necessarily defined biologically. However, shared ancestry will produce genetic links between members of a group. In addition, members of an ethnic group often share a culture or environment that may influence their risk of disease. For example, the ‘Mediterranean diet’ inspired by the dietary habits of Southern Italians has been shown to reduce the risk of heart disease, diabetes and cancer.

The addition of chemical groups – such as methyl groups – to DNA strands can affect the activity of nearby genes. Methylation is controlled by both genetic and environmental factors, and altered patterns of DNA methylation are seen in some diseases. It is therefore an ideal biological process to study to determine how race/ethnicity and ancestry contribute to a person’s susceptibility to disease.

Galanter et al. have now studied the patterns of methylation found in the blood of 573 people from diverse Latino ethnic sub-groups. The different groups displayed significantly different patterns of methylation at hundreds of locations across the genome. Genetic ancestry explained approximately 75% of the variation in methylation between the sub-groups. In addition, the methylation patterns at DNA locations known to be affected by environmental exposures – for example, by exposure to tobacco while in the womb – were disproportionately likely to be methylated differently in different sub-groups.

Now that more is known about the relative effects of race/ethnicity and genetic ancestry on methylation, the next step is to apply this knowledge to disease processes. This will help us to better understand the source of health disparities across different groups of people.

https://doi.org/10.7554/eLife.20532.002

Introduction

Race, ethnicity, and genetic ancestry have had a complex and often controversial history within biomedical research and clinical practice (Risch et al., 2002; Cooper et al., 2003; Yudell et al., 2016; Burchard et al., 2003; Phimister, 2003). For example, race- and ethnicity-specific clinical reference standards are based on population-based sampling on a given physical trait such as pulmonary function (Hankinson et al., 1999; Quanjer et al., 2012). However, because race and ethnicity are social constructs and poor markers for genetic diversity, they fail to capture the heterogeneity present within racial/ethnic groups and in admixed populations (Borrell, 2005). To account for these heterogeneities and to avoid social and political controversies, the genetics community has grouped individuals by genetic ancestry instead of race and ethnicity (Yudell et al., 2016). Indeed, recent work from our group and others has demonstrated that genetic ancestry improves diagnostic precision compared to racial/ethnic categorizations for specific medical conditions and clinical decisions (Kumar et al., 2010; Udler et al., 2015; Nalls et al., 2008).

However, racial and ethnic categories also reflect the shared experiences and exposures to known risk factors for disease, such as air pollution and tobacco smoke, poverty, and inadequate access to medical services, which have all contributed to worse disease outcomes in certain populations (Nguyen et al., 2014; Evans and Kantrowitz, 2002). Thus, it is unclear whether defining groups through genetic ancestry can capture these shared exposures. In this work we seek to explore the contributions of genetically defined ancestry and social, cultural and environmental factors to understanding differential methylation between ethnic groups.

Epigenetic modification of the genome through methylation plays a key role in the regulation of diverse cellular processes (Smith and Meissner, 2013). Changes in DNA methylation patterns have been associated with complex diseases, including various cancers (Kulis and Esteller, 2010), cardiovascular disease (Udali et al., 2013; Kato et al., 2015), obesity (Bell et al., 2010), diabetes (Chambers et al., 2015), autoimmune and inflammatory diseases (Liu et al., 2013), and neurodegenerative diseases (Lardenoije et al., 2015). Epigenetic changes are thought to reflect influences of both genetic (Bell et al., 2011) and environmental factors (Feil and Fraga, 2011), and have been shown to vary between racial groups (Barfield et al., 2014). The discovery of methylation quantitative trait loci (meQTL’s) across populations by Bell et al. established the influence of genetic factors on methylation levels in a variety of tissue types (Bell et al., 2011), with meQTL’s explaining between 22% and 63% of the variance in methylation levels. Multiple environmental factors have also been shown to affect methylation levels, including endocrine disruptors, tobacco smoke (Joubert et al., 2012, 2016), polycyclic aromatic hydrocarbons, infectious pathogens, particulate matter, diesel exhaust particles (Jiang et al., 2014), allergens, heavy metals, and other indoor and outdoor pollutants (Ho et al., 2012). Psychosocial factors, including measures of traumatic experiences (Chen et al., 2013; Ressler et al., 2011; van der Knaap et al., 2014), socioeconomic status (Lam et al., 2012; Borghol et al., 2012), and general perceived stress (Vidal et al., 2014), also affect methylation levels. Since both genetic factors and environmental exposures affect methylation, it represents an ideal phenotype to explore the contributions of these two factors to differential methylation between ethnic groups.

In this work, we leveraged genome-wide methylation data in 573 Latino children of diverse Latino sub-ethnicities enrolled in the Genes-Environment and Admixture in Latino Americans (GALA II) study (Oh et al., 2012) whose genetic ancestry had been determined from dense genotyping arrays. This allowed us to explore the extent to which the differences in methylation between Latino sub-groups could be explained by their shared genetic ancestry. We found that many of the methylation differences associated with ethnicity could be explained by shared genetic ancestry. However, even after adjusting for ancestry, significant differences in methylation remained between the groups at multiple loci, reflecting social and environmental influences upon methylation.

Our findings have important implications for both the use of ancestry to capture biological changes and of race/ethnicity to account for social and environmental exposures. Epigenome-wide association studies in diverse populations may be susceptible to confounding due to environmental exposures in addition to confounding due to population stratification (Michels et al., 2013). The findings also have implications for the common practice of considering individuals of Latino descent, regardless of origin, as a single ethnic group.

Results

The study included 573 participants, the majority of whom self-identified as being either of Puerto Rican (n = 220) or Mexican origin (n = 276). Table 1 displays baseline characteristics of the GALA II study participants with methylation data included in this study, stratified by ethnic subgroups (Puerto Rican, Mexican, Other Latino, and Mixed Latinos who had grandparents of more than one national origin). Figure 1 shows the distribution of African, European, and Native American ancestry among the 524 participants with genomic ancestry estimates.

Ancestry estimates for GALA II participants, by ethnic group.

Mexicans, on average, had a greater proportion of Native American ancestry than Puerto Ricans; Puerto Ricans had a greater proportion of European and African ancestry. Mixed and other Latinos were intermediate.

https://doi.org/10.7554/eLife.20532.003
Table 1

Baseline characteristics of GALA II participants with methylation data, stratified by ethnicity. Continuous variables are reported with inter-quartile range in brackets.

https://doi.org/10.7554/eLife.20532.004

| Characteristic | Mexican | Puerto Rican | Mixed Latino | Other Latino |
|---|---|---|---|---|
| n | 276 | 220 | 16 | 61 |
| Males (%) | 125 (45.3%) | 127 (57.7%) | 6 (37.5%) | 28 (45.9%) |
| Age | 11.4 [9.3: 14.7] | 12.3 [10.4: 14.2] | 11.8 [10.7: 14.9] | 11.8 [10: 15.7] |
| Asthma cases (%) | 124 (44.9%) | 147 (66.8%) | 9 (56.3%) | 31 (50.8%) |
| Ancestry (n = 524) | | | | |
| African | 4.3% [2.9%: 6.0%] | 22.8% [16.6%: 29.4%] | 8.5% [5.6%: 19.2%] | 12.3% [6.3%: 25.8%] |
| Native American | 55.4% [44.5%: 65.7%] | 11.2% [9.8%: 13%] | 31.5% [20.9%: 45.6%] | 32.8% [10.4%: 49.3%] |
| European | 40.5% [29.9%: 50.2%] | 65.7% [59.2%: 71%] | 50.5% [44.6%: 57.6%] | 48.9% [40%: 58.5%] |
| Recruitment Site | | | | |
| Chicago | 140 (50.7%) | 15 (6.8%) | 11 (68.9%) | 15 (24.6%) |
| New York | 18 (6.5%) | 10 (4.5%) | 1 (6.3%) | 23 (37.7%) |
| Puerto Rico | 0 | 193 (87.7%) | 0 | 0 |
| San Francisco | 78 (28.3%) | 0 | 2 (12.5%) | 23 (37.7%) |
| Houston | 40 (14.5%) | 2 (0.9%) | 2 (12.5%) | 5 (8.2%) |
| Cell Counts (estimated) | | | | |
| Granulocytes | 51.2% [46.0%: 55.7%] | 51.6% [46.8%: 57%] | 51% [43.6%: 57.2%] | 49.1% [43.8%: 55.8%] |
| Lymphocytes | 41.9% [36.9%: 46.6%] | 41.8% [36.9%: 46.5%] | 41.9% [36.1%: 51.6%] | 43.9% [36.8%: 49.6%] |
| Monocytes | 7.1% [5.8%: 8.3%] | 6.74% [5.74%: 8.24%] | 6.6% [5.7%: 7.6%] | 7.4% [6.2%: 8.6%] |

Methylation data used in this study has been previously made publicly available at the Gene Expression Omnibus at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE77716 (Rahmani et al., 2016). Genotyping data has been deposited in dbGaP; link will be activated when the data becomes publicly available (Burchard, http://www.ncbi.nlm.nih.gov/gap/?term=phs001180).

Global patterns of methylation

Differences in ethnicity and ancestry resulted in discernible patterns in the global methylation profile as demonstrated in a multidimensional scaling analysis (Figure 2A). As expected (Houseman et al., 2012; Lam et al., 2012), the first few principal coordinates are strongly correlated with imputed cell composition (Figure 2B–C). There are also significant associations of self-identified sub-ethnicity with PC2 (p-ANOVA = 0.003), PC3 (p-ANOVA = 0.004), PC6 (p-ANOVA = 0.0001), PC7 (p-ANOVA = 0.0003) (Figure 3A), and PC8 (p-ANOVA = 0.0003), after adjusting for age, sex, disease status, cell components, and technical laboratory factors (plate and position). Genetic ancestry was associated with PC3 (p=0.002), PC7 (p=0.0004) (Figure 3B) and PC8 (p=0.001) in a two degree of freedom ANOVA test, adjusting for age, sex, disease status, cell components, technical factors, and ethnicity. Supplementary file 1A summarizes the results of the simple correlation analysis of methylation with ethnicity and ancestry, as well as the adjusted nested ANOVA models described above and the mediation results described below.

Patterns of global methylation.

(A) Distribution of the first 10 principal coordinates of the methylation data. Plots in the diagonal show the univariate distribution; those in the lower left triangle show bivariate relationship between each pair of PCs, while those in the upper right show the bivariate density. (B) Bivariate or ANOVA associations between principal coordinates and technical factors (chip, position), cell counts, genetic ancestry (European, Native American, African), recruitment site (New York, NY, San Francisco, CA, Chicago, IL, Houston, TX, and Puerto Rico), demographic factors (ethnicity, age, sex), and case status. (C) Correlation coefficients between the various factors and principal coordinates.

https://doi.org/10.7554/eLife.20532.005
Associations between ethnicity, ancestry and global methylation.

(A) Association between ethnicity and principal coordinate 7. (B) Association between Native American ancestry proportion and PC7, colored by ethnicity. Native American ancestry explains approximately 81% of the association between PC7 and ethnicity.

https://doi.org/10.7554/eLife.20532.006

A mediation analysis (Tingley et al., 2014) revealed that the associations between ethnicity and PCs 3, 7, and 8 were significantly mediated by Native American ancestry, which explained ~100% (95% CI: 37–100%, p=0.01) of the association with PC3, 83% (95% CI: 37–100%, p<0.001) of the association with PC7 and 66% (95% CI: 25–100%, p<0.001) of the association with PC8. Inclusion of Native American ancestry in the regression model of PCs 3, 7, and 8 caused the ethnicity associations to be non-significant. However, the associations of ethnicity with PCs 2 and 6 were not explained by Native American, African or European ancestry (mediation p>0.05), suggesting that the ethnic differences in these principal components are associated with global methylation patterns not captured by the shared genetic ancestry of each ethnic group. When the methylation data were regressed on genetic ancestry and the principal coordinates were recalculated from the residuals of that regression, there was an association between ethnicity and PC6 (p-ANOVA = 0.003) but no association with any of the other principal coordinates. These observations suggest that while shared genetic ancestry can explain over 50% of the association between ethnicity and global methylation patterns in three PC’s, in two others non-genetic factors associated with ethnicity, such as differences in environmental and social exposures, influence methylation and are not captured by measures of genetic ancestry.
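The idea of a "proportion mediated" can be illustrated with a minimal sketch. The analysis reported here used the mediation approach of Tingley et al. (2014); the code below instead uses the simpler difference-in-coefficients calculation on made-up toy data, so the helper names (`ols`, `proportion_mediated`) and every number in it are illustrative assumptions and not the study's method or results.

```python
def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns (pure Python).

    Returns [intercept, b1, b2, ...]. Solves the normal equations by
    Gauss-Jordan elimination with partial pivoting; adequate for toy data.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


def proportion_mediated(exposure, mediator, outcome):
    """Share of the exposure-outcome association explained by the mediator."""
    total = ols([exposure], outcome)[1]             # outcome ~ exposure
    direct = ols([exposure, mediator], outcome)[1]  # outcome ~ exposure + mediator
    return (total - direct) / total


# Hypothetical toy data: a binary ethnicity indicator and an ancestry proportion.
ethnicity = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
ancestry = [0.10, 0.15, 0.05, 0.20, 0.50, 0.60, 0.55, 0.65]

# Case 1: the outcome depends on ancestry only, so mediation is complete.
pc_a = [1.0 + 2.0 * a for a in ancestry]
fully_mediated = proportion_mediated(ethnicity, ancestry, pc_a)

# Case 2: ethnicity also has an effect of its own, so mediation is partial.
pc_b = [1.0 + 2.0 * a + 0.3 * e for a, e in zip(ancestry, ethnicity)]
partly_mediated = proportion_mediated(ethnicity, ancestry, pc_b)

print(fully_mediated, partly_mediated)
```

In the first toy case the ethnicity coefficient vanishes once ancestry is in the model, which mirrors the pattern described for PCs 3, 7, and 8; in the second, a residual ethnicity effect remains after adjustment.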

Epigenome-wide association of self-identified ethnicity

An epigenome-wide association study of self-identified ethnicity (see Materials and methods for details of ascertainment of ethnicity) and methylation identified a significant difference in methylation M-values between ethnic groups at 916 CpG sites at a Bonferroni-corrected significance level of less than 1.6 × 10−7 (Figure 4A and Supplementary file 1B). The most significant association with ethnicity occurred at cg12321355 in the ABO blood group gene (ABO) on chromosome 9 (p-ANOVA 6.7 × 10−22) (Figure 4B). Genomic ancestry was also significantly associated with methylation level at this site in a two degree of freedom ANOVA test (p=2.3×10−5) (Figure 4C), and when the analysis was stratified by ethnic sub-group, the association was present in both Puerto Ricans and Mexicans (p=0.001 for Puerto Ricans, p=0.003 for Mexicans). Although adjusting for genomic ancestry attenuated the effect of ethnicity, a significant association between ethnicity and methylation remained (p=0.04). Recruitment site, an environmental exposure proxy, was not significantly associated with methylation at this locus (p=0.5), suggesting that environmental differences associated with ethnicity beyond geography and ancestry are driving the association.

Associations between ethnicity and methylation.

(A) Manhattan plot showing the associations between ethnicity and methylation at individual CpG loci. (B) Violin plot showing one such locus, cg19145607. Mexicans are relatively hypermethylated compared to Puerto Ricans (p=1.4×10−19). (C) Plot showing the association between Native American ancestry at the locus and methylation levels at the locus colored by ethnicity; Native American ancestry accounts for 58% of the association between ethnicity and methylation at the locus.

https://doi.org/10.7554/eLife.20532.007

To determine the contribution of shared genetic ancestry and other factors associated with ethnicity, we repeated the analysis adjusting for ancestry. A significant association remained in 314 of the 834 (37.8%, p=1.7×10−183 for enrichment) CpG sites associated with ethnicity (Figure 5A and Supplementary file 1B) (82 sites were excluded because they demonstrated unstable coefficient estimates and inflated standard errors due to strong correlations between ethnicity and ancestry, especially Native American ancestry [see Figure 1]).

Table 2 and Figure 5B show the proportion of variance explained by ethnicity, genomic ancestry, and their joint effect in the 916 CpG’s associated with ethnicity, as well as the 314 CpG’s that remained associated with ethnicity after adjustment for ancestry and the 520 CpG’s whose association with ethnicity was no longer significant when ancestry terms were introduced into the model. Even after adjusting for genomic ancestry, ethnicity explained a median of 1.7% (IQR 0.785% to 3.0%), and as much as 13.4%, of the variance in methylation across these loci. Genomic ancestry explained a median of 4.2% (IQR 1.8% to 8.3%) of the variance in methylation at all loci associated with ethnicity and accounted for a median of 75.7% (IQR 45.8% to 92%) of the total variance in methylation explained jointly by ethnicity and ancestry (median of 6.8%, IQR 4.5% to 10.0%) (Figure 5B).
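One common way to obtain such a partition is to compare the R² of nested linear models: the variance attributed to ethnicity after adjustment is the gain in R² when ethnicity is added to a model that already contains ancestry. The sketch below does this on simulated data. The simulation settings, the helper `fit_r2`, and the definition of ancestry's share as ancestry-only R² divided by joint R² are assumptions made for illustration; the models actually used are those described in Materials and methods.

```python
import random


def fit_r2(columns, y):
    """R-squared of the least-squares fit y ~ 1 + columns (pure Python)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Simulated, purely hypothetical data for a single CpG site.
rng = random.Random(0)
n = 200
ethnicity = [float(i % 2) for i in range(n)]             # two sub-groups
native = [0.1 + 0.4 * e + rng.uniform(0.0, 0.3) for e in ethnicity]
african = [rng.uniform(0.02, 0.18) for _ in range(n)]
# The third ancestry proportion is omitted because the three sum to one.
m_values = [0.5 * e + 2.0 * nat + 1.0 * afr + rng.gauss(0.0, 0.5)
            for e, nat, afr in zip(ethnicity, native, african)]

r2_joint = fit_r2([ethnicity, native, african], m_values)
r2_ancestry = fit_r2([native, african], m_values)
r2_ethnicity_given_ancestry = r2_joint - r2_ancestry     # gain from adding ethnicity
ancestry_share = r2_ancestry / r2_joint                  # assumed definition

print(r2_joint, r2_ancestry, r2_ethnicity_given_ancestry, ancestry_share)
```

Because the models are nested, the joint R² can never fall below the ancestry-only R², so the ethnicity component computed this way is non-negative up to rounding error.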

Relationship between genomic ancestry and the association between ethnicity and methylation.

(A) Venn diagram showing the effect of adjustment for ancestry on the association between ethnicity and methylation. The components of the diagram represent the number of CpG’s that remained associated with ethnicity after adjustment for ancestry and the number of CpG’s that were associated with ancestry. (B) Relative proportion of variance in methylation explained by ethnicity and genomic ancestry across loci significantly associated with ethnicity. Mediation analysis of associations between ethnicity and methylation M-values for (C) Native American ancestry and (D) African ancestry. For simplicity, only significant mediation effects are shown.

https://doi.org/10.7554/eLife.20532.008
Table 2

Proportion of variance in methylation explained by ethnicity and ancestry. Numbers represent the median and interquartile range.

https://doi.org/10.7554/eLife.20532.009

| Component | All CpG’s associated with ethnicity (n = 916) | CpG’s associated with ethnicity after adjusting for ancestry (n = 314) | CpG’s whose association with ethnicity is explained by ancestry (n = 520) |
|---|---|---|---|
| Joint | 6.8% (4.5% to 10%) | 6.2% (4.4% to 8.8%) | 7.8% (5.3% to 11.1%) |
| Ethnicity | 1.7% (0.78% to 3.0%) | 3.5% (2.2% to 5.1%) | <1% |
| Ancestry | 4.2% (1.8% to 8.3%) | 1.8% (0.8% to 4.0%) | 6.6% (4.0% to 10.2%) |

Ethnicity and ancestry jointly explained as much as 38.5% of the variance in methylation in one CpG (cg0966827) and there were 17 CpG’s where ethnicity and ancestry jointly explained more than 25% of the variance. Among the 314 CpG’s that remained associated with ethnicity after adjustment for ancestry, ethnicity accounted for a larger share of the joint variance than genomic ancestry (3.5%, IQR 2.2% to 5.1% versus 1.8%, IQR 0.8% to 4.0%). We saw a moderate amount of correlation between the 314 methylation sites associated with ethnicity after adjusting for ancestry (median R2 of 0.044, IQR 0.01 to 0.13).

Sensitivity tests for departures from linearity, fine scale population substructure and the exclusion of the 16 participants who self-identified as ‘Mixed Latino’ sub-ethnicity did not meaningfully affect our results (see Supplementary file 1B–F). To rule out any residual confounding due to recruitment sites, we conducted an additional analysis on the effect of recruitment site on methylation both for the overall study and for the Mexican participants (the largest study population in this analysis). We observed no significant independent effect of recruitment site, suggesting that confounding due to recruitment region was limited, at least within the United States.

To explore the effect of departures from a linear association between ancestry and methylation, we incorporated both higher order polynomials and cubic splines of ancestry into our models. We observed a significant departure from linearity (p<0.05) in only 26 (for splines) and 25 (for polynomials) of the 314 CpG’s where an association between ethnicity and methylation remained after adjusting for ancestry; however, the association between ethnicity and methylation remained even after adjusting for non-linearity at all sites (Supplementary file 1C,D).

Environmental differences between geographic locations or recruitment sites are a potential non-genetic explanation for ethnic differences in methylation. We investigated the independent effect of recruitment site on methylation by analyzing the associations between recruitment site and individual methylation loci after adjusting for ethnicity. We did not find any loci significantly associated with recruitment site at a significance threshold of 1.6×10−7. We then performed an analysis to assess the effect of recruitment sites on methylation stratified by ethnicity. We did not find any loci significantly associated with recruitment site and methylation among Mexican participants. We were underpowered to perform a similar analysis for Puerto Ricans because there were only 27 Puerto Rican participants recruited outside of Puerto Rico. To ensure that the absence of association in Mexicans was not due to the loss of power from the smaller sample size, we repeated our analysis of the association between ethnicity and ancestry randomly down-sampling to 276 participants to match the sample size in the analysis of geography in Mexicans. While down-sampling the study to this degree resulted in a loss of power, 128 methylation sites were still associated with ancestry. We conclude that recruitment site was unlikely to be a significant confounder of our associations between ethnicity and methylation and was not a significant independent predictor of methylation.

While most population substructure in Latinos would be expected to arise from differences in continental ancestry (Galanter, 2012; Bryc et al., 2010), there is evidence of finer scale (sub-continental) ancestry in Latino populations (Moreno-Estrada et al., 2014). We tested for the effect of fine scale substructure by calculating principal components for all participants with genotyping data using Eigensoft (Patterson et al., 2006). We found significant associations between ethnicity and principal components 3–10 (PC’s 1 and 2 were almost perfectly collinear with ancestry, with an adjusted R2 > 0.998 for all three ancestry proportions, and were therefore excluded). We therefore added these 8 PC’s to models of ethnicity and methylation, and found an association between these genetic PC’s and methylation in 63/314 CpG’s that had remained associated with ethnicity after adjusting for ancestry. Adjusting for higher order substructure in these CpG’s explained the association between ethnicity and methylation in 51 additional loci. This left 263 loci associated with ethnicity after adjustment for ancestry where there was either no association between PC’s 3–10 and methylation or the inclusion of these PC’s did not affect the association between ethnicity and methylation (Supplementary file 1E).

At these 314 loci, the median total variance accounted for by ethnicity, ancestry, and fine-scale substructure was 10.4% (IQR 6.6% to 16.1%), of which ethnicity explained a median of 1.7% (IQR 0.8% to 3.8%), ancestry explained a median of 2.9% (IQR 1.0% to 4.6%) and fine scale substructure explained a median of 3.4% (IQR 2.0% to 4.2%). Among the 263 CpG’s whose association with ethnicity could not be explained by fine-scale substructure, ethnicity explained a median of 1.9% (IQR 1.0% to 4.0%; max 26.7%), ancestry explained 2.8% (IQR 1.0% to 6.2%), and fine scale ancestry explained 3.2% (IQR 1.9% to 4.7%).

As only 16 participants self-identified as ‘Mixed Latino’, we performed a sensitivity analysis to test the effect of excluding these participants from the analysis and only examining Puerto Ricans, Mexicans, and ‘Other Latinos’. We found that excluding self-identified ‘Mixed Latino’ participants from the analysis did not significantly alter the results in most cases (Supplementary file 1F). Of the 916 CpG’s associated with ethnicity at a genome-wide scale (p<1.6×10−7) in models including individuals self-identified as ‘Mixed Ethnicity’, 894 (97.5%) were still significant at a genome-wide scale when ‘Mixed Latinos’ were excluded. All but two of the CpG’s that did not meet genome-wide significance were significant when correcting for 916 tests (p<5×10−5). In addition, 290 CpG loci that did not meet genome-wide significance in the original analysis were significant at a genome-wide scale when self-identified ‘Mixed Latinos’ were excluded. While these loci did not meet genome-wide significance in the original analysis that included Mixed Latinos, they all had p-values lower than 2 × 10−6. Thus we conclude that a sensitivity test excluding individuals of mixed Latino ethnicity did not significantly alter the conclusions.

We conclude that shared genetic ancestry explains much but not all of the association between ethnicity and methylation. Other, non-genetic factors associated with ethnicity likely explain the ethnicity-associated methylation changes that cannot be accounted for by genomic ancestry alone.

Ethnic differences in environmentally-associated methylation sites

We tested whether methylation at CpG loci previously reported to be associated with environmental exposures whose prevalence differs between ethnic groups was also associated with ethnicity in this study. A recent meta-analysis of maternal smoking during pregnancy, an exposure that varies significantly by ethnicity (Oh et al., 2012), identified associations with methylation at over 6000 CpG loci (Joubert et al., 2016). Of the 4404 of these loci that passed QC in our own study, 1341 (30.4%) were nominally associated with ethnicity (p<0.05), which represented a highly significant (p<2×10−16) enrichment. Using a Bonferroni correction for the 4404 loci tested, 126 maternal-smoking related loci were associated with ethnicity (p<1.1×10−5), and 27 loci were among the 916 CpG’s reported above as associated with ethnicity (Supplementary file 1G). Of these, 14 were among the 314 CpG’s whose association with ethnicity could not be explained by ancestry and 12 were among the 263 CpG’s whose association with ethnicity could not be explained by ancestry or fine-scale substructure. We also examined methylation loci from an earlier study of maternal smoking in Norwegian newborns (Joubert et al., 2012) as well as studies of diesel exhaust particles (Jiang et al., 2014) and exposure to violence (Chen et al., 2013). These results are supportive of our hypothesis that environmental exposures may be responsible for the observed differences in methylation between ethnic groups and are presented in Supplementary file 1H.
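The arithmetic behind the enrichment and the Bonferroni threshold can be checked with a short sketch. The counts come from the paragraph above; the choice of a one-sided binomial test, with every locus treated as an independent trial that is nominally significant with probability 0.05 under the null, is an assumption made here for illustration, since this section does not name the test that was used.

```python
import math


def log10_binomial_upper_tail(k_obs, n, p):
    """log10 of P(K >= k_obs) for K ~ Binomial(n, p), computed in log space."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k in range(k_obs, n + 1)
    ]
    top = max(log_terms)
    # Log-sum-exp keeps the sum representable even when each term underflows.
    log_sum = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_sum / math.log(10.0)


n_tested = 4404    # maternal-smoking loci that passed QC (from the text)
n_nominal = 1341   # loci nominally associated with ethnicity (from the text)
alpha = 0.05       # nominal significance threshold (from the text)

expected = alpha * n_tested        # hits expected by chance alone
fold = n_nominal / expected        # observed / expected
bonferroni = alpha / n_tested      # per-locus threshold for 4404 tests
log10_p = log10_binomial_upper_tail(n_nominal, n_tested, alpha)

print(f"expected by chance: {expected:.1f}")
print(f"fold enrichment:    {fold:.2f}")
print(f"Bonferroni cutoff:  {bonferroni:.2e}")
print(f"log10 tail p-value: {log10_p:.0f}")
```

About 220 nominal associations would be expected by chance, so 1341 is roughly a six-fold excess, and the per-locus cutoff of 0.05/4404 matches the 1.1×10−5 quoted above. Neighboring CpG sites are correlated, so an independence-based tail probability of this kind overstates the evidence and is best read as an order-of-magnitude check.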

In an earlier study of maternal smoking in Norwegian newborns (Joubert et al., 2012) that identified 26 loci associated with maternal smoking during pregnancy, 19 passed quality control (QC) in our own analysis, and the association between methylation and ethnicity was found to be nominally significant (p<0.05) at 6 (31.6%) CpG loci. Adjusting for 19 tests (p<0.0026), cg23067299 in the aryl hydrocarbon receptor repressor (AHRR) gene on chromosome 5 remained statistically significant (Supplementary file 1H). These results suggest that ethnic differences in methylation at loci known to be responsive to tobacco smoke exposure in utero may be explained in part by ethnic-specific differences in the prevalence of maternal smoking during pregnancy.

We also found that CpG loci previously reported to be associated with diesel-exhaust particle (DEP) exposure (Jiang et al., 2014) were significantly enriched among the set of loci whose methylation levels varied between ethnic groups. Specifically, of the 101 CpG sites that were significantly associated with exposure to DEP and passed QC in our dataset, 31 were nominally associated with ethnicity (p<0.05), and five were associated with ethnicity after adjusting for 101 comparisons (p<0.0005). Finally, we found that methylation levels at cg11218385 in the pituitary adenylate cyclase-activating polypeptide type I receptor gene (ADCYAP1R1), which had been associated with exposure to violence in Puerto Ricans (Chen et al., 2013) and with heavy trauma exposure in adults (Ressler et al., 2011), were significantly associated with ethnicity (p=0.02).

We also found 194 loci with a significant association between global genetic ancestry and methylation levels (after adjusting for ethnicity) at a Bonferroni corrected association p-value of less than 1.6 × 10−7 (Figure 6 and Supplementary file 1I), including 48 that were associated with ethnicity in our earlier analysis. Of these significant associations, 55 were driven primarily by differences in African ancestry, 94 by differences in Native American ancestry, and 45 by differences in European ancestry. The most significant association between methylation and ancestry occurred at cg04922029 in the Duffy antigen receptor chemokine gene (DARC) on chromosome 1 (ANOVA p-value 3.1 × 10−24) (Figure 6B). This finding was driven by a strong association between methylation level and global African ancestry; each 25 percentage point increase in African ancestry was associated with an increase in M-value of 0.98, which corresponds to an almost doubling in the ratio of methylated to unmethylated DNA at the site (95% CI 0.72 to 1.06 per 25% increase in African ancestry, p=1.1×10−21). There was no significant heterogeneity in the association between genetic ancestry and methylation between Puerto Ricans and Mexicans (p-het = 0.5). Mexicans have a mean unadjusted methylation M-value 0.48 units lower than Puerto Ricans (95% CI 0.35 to 0.62 units, p=1.1×10−11). However, adjusting for African ancestry accounts for the differences in methylation level between the two sub-groups (p-adjusted = 0.4), demonstrating that ethnic differences in methylation at this site are due to differences in African ancestry.

Figure 6. Associations between genomic ancestry and individual methylation loci.

(A) Manhattan plot showing the associations between genomic ancestry and methylation at individual CpG loci. (B) Plot showing one such locus, cg04922029, and genomic African ancestry, showing a strong correlation between African ancestry and hypermethylation at that site.

https://doi.org/10.7554/eLife.20532.010

The distribution of methylation M-values at cg04922029 is tri-modal, raising the possibility that a SNP whose allele frequency differs between African and non-African populations may be driving the association. We therefore examined the association between methylation at cg04922029 and ancestry at that locus. We found almost perfect correlation between methylation and African ancestry at the locus (p=6×10−162) (Figure 7A). Each African haplotype at the CpG site was associated with an increase in methylation M-value of 2.7, corresponding to a 6.5-fold increase in the ratio of methylated to unmethylated DNA per African haplotype at that locus. We then looked for SNPs within 10,000 base pairs of the CpG site that explained the admixture mapping association. We found that methylation at cg04922029 was significantly correlated with SNP rs2814778 (Figure 7B), the Duffy null mutation, 212 base pairs away; each copy of the C allele was associated with an increase in M-value of 1.5, or a 2.9-fold increase in the ratio of methylated to unmethylated DNA (p=3.8×10−90) (Figure 7C).

Figure 7. Association between local ancestry and methylation.

(A) Association between cg04922029 on the DARC locus and African ancestry, color coded by ethnic group. There is near perfect correlation between the two. (B) Association between SNPs located within 1 Mb of cg04922029 and methylation levels at that CpG. (C) Association between rs2814778 (Duffy null) genotype and methylation at cg04922029, color coded by the number of African alleles present. There is near perfect correlation between genotype, ancestry and methylation at the locus. (D) Allele frequency of rs2814778 by 1000 Genomes population. The C allele is nearly ubiquitous in African populations and nearly absent outside of African populations and their descendants.

https://doi.org/10.7554/eLife.20532.011

When we examined the effect of local ancestry at the 194 CpGs, we found that a substantial proportion of the effect of global ancestry on local methylation levels was due to local ancestry acting in cis. Among the 194 CpG sites associated with global ancestry, local ancestry at the CpG site explained a median of 10.4% (IQR 3.0% to 19.4%) of the variance in methylation, accounting for a median of 52.8% (IQR 20.3% to 84.9%) of the total variance explained jointly by local and global ancestry (Figure 8).

Figure 8. Relative proportion of variance in methylation explained by global and local ancestry across loci significantly associated with global ancestry.
https://doi.org/10.7554/eLife.20532.012

Discussion

In a diverse population of Latinos, we have shown that a substantial number of loci are differentially methylated between ethnic sub-groups. While genomic ancestry can explain the association between ethnicity and methylation at 66% of the 916 loci associated with ethnicity, factors other than shared ancestry that correlate with ethnicity, such as social, economic, cultural and environmental exposures, account for the association between ethnicity and methylation at 34% (314/916) of loci.

We conclude that systematic environmental differences between ethnic subgroups likely play an important role in shaping the methylome for both individuals and populations. Loci previously associated with diverse environmental exposures such as in utero exposure to tobacco smoke (Joubert et al., 2012, 2016), as well as diesel exhaust particles (Jiang et al., 2014) and psychosocial stress (Chen et al., 2013) were enriched in our set of loci where methylation was associated with ethnicity. Twenty-seven of the loci associated with maternal smoking during pregnancy in a large consortium meta-analysis (Joubert et al., 2016) were differentially methylated between Latino sub-groups at a genome-wide significance threshold of 1.6 × 10−7. Interestingly, this included both loci whose association persisted after adjustment for ancestry and fine-scale population substructure and are thus presumed to be due to environmental differences between ethnic groups and loci in which the association between ethnicity and methylation could be fully explained by genetically defined ancestry.

There are a number of plausible reasons for overlap between CpGs associated with ancestry and those associated with environmental exposure. It is possible that this represents a gene-environment interaction, and that individuals with certain genetic backgrounds are more susceptible to the effects of environmental exposures such as in utero tobacco smoke than those of other genetic backgrounds. It has been previously reported that Hispanic smokers with high Native American ancestry had reduced risk of methylation across 12 genes, suggesting an ancestry by smoking interaction (Leng et al., 2013). Because most of the studies in the consortium that identified these differentially methylated regions enrolled participants of European descent, such interactions might not have been evident in that analysis.

It is also possible that environmental exposures correlate with ancestry and that participants with certain ancestral backgrounds may have been more exposed to in utero tobacco smoke than those of other backgrounds. Several studies have shown correlations between genetic ancestry and environmental exposures, including socioeconomic status (Florez et al., 2011), overweight and obesity (Ziv et al., 2006), and birth site and country of residence (González Burchard et al., 2005).

Though our analysis of global ancestry showed that a majority of the variance explained jointly by local and global ancestry can be traced to specific loci in the genome acting in cis, a substantial proportion cannot. Some of the residual association between global ancestry and methylation may be due to genetic effects acting in trans; however, the possibility that some of it may be due to environmental exposures correlating with global ancestry cannot be excluded. Thus, it is plausible that genomic ancestry is acting as a proxy for both genetic and environmental effects in our study. If this is the case, our study likely underestimates the degree to which environmental factors explain differential methylation between ethnic groups.

Finally, it is possible that our analysis identified DMRs that are independently modifiable by both genetic and environmental exposures. Thus, regions of the genome that are differentially methylated due to genetic polymorphisms may also be more susceptible to differential methylation due to environmental exposures.

Thus, inclusion of relevant social and environmental exposures in studies of methylation may help elucidate racial/ethnic disparities in disease prevalence, health outcomes and therapeutic response. However, in many cases, a detailed environmental exposure history is unknown, unmeasurable or poorly quantifiable, and race/ethnicity may be a useful, albeit imperfect, proxy. Conversely, if a comprehensive catalog of the effects of exposures can be compiled, it may be possible to use genome-wide methylation analysis as a biomarker of exposure long after the exposure has passed and can no longer be measured.

Our comprehensive analysis of high-density methyl- and genotyping from genomic DNA allowed us to investigate the genetic control of methylation in great detail and without the potential destabilizing effects of EBV transformation and culture in cell lines (Grafodatskaya et al., 2010). The strongest patterns of methylation are associated with cell composition in whole blood (Lam et al., 2012). However, Latino ethnic subgroup (Puerto Rican, Mexican, other, or mixed) is also associated with principal coordinates of genome-wide methylation.

Our approach has some potential limitations. It is possible that fine-scale population structure (sub-continental ancestry) within European, African, and Native American populations may contribute to ethnic differences in methylation, as we had previously reported in the case of lung function (Moreno-Estrada et al., 2014). However, despite the presence of additional substructure among the GALA II participants, PCs 3–10 explained the association between ethnicity and methylation at only 51 loci. PCs from chip-based genotypes will not capture all forms of genetic variation. Clusters of ethnicity-specific rare variants of large effect or strong ethnicity-specific selective sweeps in the last 8–12 generations (Galanter et al., 2012) could also give rise to methylation differences, but these are inconsistent with existing rare variant and selection analyses (Hernandez et al., 2011; Tang et al., 2007). Our models of genetic ancestry assumed a linear effect of ancestry on methylation, whereas a nonlinear association or other model misspecification could have led to incomplete adjustment for genetic ancestry and thus to a residual association between ethnicity and methylation. However, when we added second and third order polynomials or cubic splines to our models, we found evidence for a nonlinear association between ancestry and methylation at only 25 and 26 loci, respectively, and it did not affect the association between ethnicity and methylation. Although it is impossible to account for all types of non-linearity and non-additivity (such as gene by gene or gene by environment interaction), our analysis suggests that non-linear effects are unlikely to be significant. Since our study was geographically diverse, recruiting participants at five recruitment sites in the United States and Puerto Rico, it is possible that systematic differences associated with site of recruitment might have influenced observed methylation differences between ethnic groups. However, when we included recruitment site as a covariate, we found no significant effect on methylation independent of ethnicity.

The presence of a strong association between genetic ancestry and methylation raises the possibility that epigenetic studies can be confounded by population stratification, similar to genetic association studies, and that adjustment for either genetic ancestry or selected principal components is warranted. This possibility was first demonstrated in a previous analysis of the association between self-described race and methylation (Barfield et al., 2014). However, that study only evaluated two distinct racial groups (African Americans and Whites), while the present study demonstrates the possibility of population stratification in an admixed and heterogeneous population with participants from diverse Latino national origins. The tendency to consider Latinos as a homogeneous or monolithic ethnic group makes any analysis of this population particularly challenging. Our finding of loci whose methylation patterns differed between Latino ethnic subgroups, even after adjusting for genetic ancestry, suggests that any analysis of these populations in disease-association studies without adjusting for ethnic heterogeneity is likely to result in spurious associations even after controlling for genomic ancestry. However, the methylation loci identified in this study, as well as in studies of environmental exposures, could be particularly interesting loci for the study of biomedical outcomes, particularly those with disparate prevalence between racial/ethnic groups, such as asthma (Barr et al., 2016). If methylation loci associated with ethnicity or ancestry were shown to be associated with a biomedical outcome, it could help explain racial/ethnic disparities in disease.

In summary, this study provides a framework for understanding how genetic, social and environmental factors can contribute to systematic differences in methylation patterns between ethnic subgroups, even between presumably closely related populations such as Puerto Ricans and Mexicans. Methylation QTLs whose allele frequency varies by ancestry lead to an association between local ancestry and methylation level. This, in turn, leads to systematic variation in methylation patterns by ancestry, which then contributes to ethnic differences in genome-wide patterns of methylation. However, although genetic ancestry has been used to adjust for confounding in genetic studies, and can account for much of the ethnic differences in methylation in this study, ethnic identity is associated with methylation beyond the effects of shared genetic ancestry. This is likely due to social and environmental effects captured by ethnicity. Indeed, we find that CpG sites known to be influenced by social and environmental exposures are also differentially methylated between ethnic subgroups. These findings call attention to the need for a more complete understanding of the effect of social and environmental variables on methylation in the context of race and ethnicity.

Our findings have important implications for the independent and joint effects of race, ethnicity, and genetic ancestry in biomedical research and clinical practice, especially in studies conducted in diverse or admixed populations. Our conclusions may be generalizable to any population that is racially mixed, such as those from South Africa, India, and Brazil, and likely have implications for all studies of diverse populations, though we would encourage further study in such populations. As the National Institutes of Health (NIH) embarks on a precision medicine initiative, this research underscores the importance of including diverse populations and studying factors capturing the influence of social, cultural, and environmental factors, in addition to genetic ones, upon disparities in disease and drug response.

Materials and methods

Participant recruitment

All research on human subjects was approved by the Institutional Review Board at the University of California and each of the recruitment sites (Kaiser Permanente Northern California, Children’s Hospital Oakland, Northwestern University, Children’s Memorial Hospital Chicago, Baylor College of Medicine on behalf of the Texas Children’s Hospital, VA Medical Center in Puerto Rico, the Albert Einstein College of Medicine on behalf of the Jacobi Medical Center in New York and the Western Review Board on behalf of the Centro de Neumologia Pediatrica), and all participants/parents provided age-appropriate written assent/consent. Latino children were enrolled as a part of the ongoing GALA II case-control study (Oh et al., 2012).

A total of 4702 children (2374 participants with asthma and 2328 healthy controls) were recruited from five centers (Chicago, Bronx, Houston, San Francisco Bay Area, and Puerto Rico) using a combination of community- and clinic-based recruitment. Participants were eligible if they were 8–21 years of age, self-identified as a specific Latino ethnicity, and had four Latino grandparents. Asthma cases were defined as participants with a history of physician-diagnosed asthma and the presence of two or more symptoms of coughing, wheezing, or shortness of breath in the two years preceding enrollment. Participants were excluded if they reported any of the following: (1) 10 or more pack-years of smoking; (2) any smoking within 1 year of recruitment date; (3) history of lung diseases other than asthma (cases) or chronic illness (cases and controls); or (4) pregnancy in the third trimester. Further details of recruitment are described elsewhere (Oh et al., 2012). Latino sub-ethnicity was determined by self-identification and the ethnicity of their four grandparents. Due to small numbers, ethnicities other than Puerto Rican and Mexican were collapsed into a single category, ‘other Latino’. Participants whose four grandparents were of discordant ethnicity were considered to be of ‘mixed Latino’ ethnicity.

Trained interviewers, proficient in both English and Spanish, administered questionnaires to gather baseline demographic data, as well as information on general health, asthma status, acculturation, social, and environmental exposures.

Methylation

Genomic DNA (gDNA) was extracted from whole blood using Wizard Genomic DNA Purification Kits (Promega, Fitchburg, WI). A subset of 573 participants (311 cases with asthma and 262 healthy controls) was selected for methylation. Methylation was measured using the Infinium HumanMethylation450 BeadChip (Illumina, Inc., San Diego, CA) following the manufacturer’s instructions.

1 µg of gDNA was bisulfite-converted using the Zymo EZ DNA Methylation Kit (Zymo Research, Irvine, CA) according to the manufacturer’s instructions. Bisulfite-converted DNA was isothermally amplified overnight, enzymatically fragmented, precipitated, and re-suspended in hybridization buffer. The fragmented, re-suspended DNA samples were dispensed onto Infinium HumanMethylation450 BeadChips and incubated overnight in an Illumina hybridization oven. Following hybridization, free DNA was washed away, and the BeadChips were extended through single nucleotide extensions with fluorescent labels. The BeadChips were imaged using an Illumina iScan system, and processed using the Illumina GenomeStudio Software.

Failed probes were identified from detection p-values according to Illumina’s recommendations. Probes on sex chromosomes and those known to contain genetic polymorphisms in the probe sequence were also excluded, leaving 321,503 probes for analysis. Raw data were normalized using Illumina’s control probe scaling procedure. Beta values of methylation (ranging from 0 to 1) were converted to M-values via a logit transformation (Du et al., 2010).
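The conversion from Beta values to M-values can be sketched as below. The sketch assumes the base-2 logit without an offset term, M = log2(Beta / (1 − Beta)). The function names are hypothetical, and the exact handling of values at 0 or 1 in the actual pipeline is not described here.

```python
import math


def beta_to_m(beta: float) -> float:
    """Base-2 logit of a methylation Beta value (assumed form, no offset)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m_value: float) -> float:
    """Inverse transformation, mapping an M-value back to the 0-1 scale."""
    ratio = 2.0 ** m_value
    return ratio / (ratio + 1.0)


# A site that is 80% methylated has a methylated/unmethylated ratio of 4.
example_m = beta_to_m(0.8)
```

On this scale, a Beta value of 0.5 maps to an M-value of 0, and each unit of M corresponds to a doubling of the methylated-to-unmethylated ratio.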

Genotyping

Details of genotyping and quality control procedures for single nucleotide polymorphisms (SNPs) and individuals have been described elsewhere (Galanter et al., 2014). Briefly, participants were genotyped at 818,154 SNPs on the Axiom Genome-Wide LAT 1, World Array 4 (Affymetrix, Santa Clara, CA) (Hoffmann et al., 2011). We removed SNPs with >5% missing data or failing platform-specific SNP quality criteria (n = 63,328), along with those out of Hardy-Weinberg equilibrium (n = 1845; p<10−6) within their respective populations (Puerto Rican, Mexican, and other Latino), as well as non-autosomal SNPs. Subjects were filtered based on 95% call rates, sex discrepancies, identity by descent, and standard Affymetrix Axiom metrics. The total number of participants passing QC was 3804 (1902 asthmatic cases, 1902 healthy controls), and the total number of SNPs passing QC was 747,129. The number of participants with both methylation and genotyping data was 524.

Ancestry and PCA analysis

GALA II participants were combined with ancestral data from 1000 Genomes European (CEU) and African (YRI) populations and 71 Native American (NAM) samples genotyped on the Axiom Genome-Wide LAT 1 array. A final sample of 568,037 autosomal SNPs with relevant ancestral data was used to estimate local and global ancestry. Global ancestry was estimated using the program ADMIXTURE (Alexander et al., 2009), with a three population model. Local ancestry at all positions across the genome was estimated using the program LAMP-LD (Baran et al., 2012), assuming three ancestral populations.

Principal components for the genetic data were determined using the program EIGENSTRAT (Patterson et al., 2006).

Statistical analysis

Using a variance in methylation M-value of 0.2 units, which corresponded approximately to the 90th percentile of the variance in M-value in our pilot data, we determined that a sample size of 251 participants in each group was required to have 80% power to detect a difference in mean methylation of 0.25 units between the two major ethnic groups at a Bonferroni significance threshold of 1.6 × 10−7. The total sample size of 502 participants gave us 80% power to detect correlations between ancestry and methylation of medium effect (Pearson r > 0.25), meaning that we had 80% power to detect loci where ancestry accounted for at least 6.25% of the variance in methylation.

Unless otherwise noted, all regression models were adjusted for case status, age, sex, estimated cell counts, and plate and position. To account for possible heterogeneity in the cell type makeup of whole blood, we inferred white cell counts using the method of Houseman et al. (Houseman et al., 2012). Indicator variables were used to code categorical variables with more than two categories, such as ethnicity. In these cases, a nested analysis of variance (ANOVA) was used to compare models with and without the variables to obtain an omnibus p-value for the association between the categorical variable and the outcome. For analyses of dependent beta-distributed variables (such as African, European, and Native American ancestries, or cell proportions), k-1 variables were included in the analysis, and a nested ANOVA was used to compare models with and without the variables to obtain a k-1 degree of freedom omnibus p-value for the association between the predictor (such as ancestry) and the outcome variable.
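The nested model comparison described above can be sketched in its simplest form: a reduced model with only an intercept against a full model that adds indicator variables for k groups. This sketch omits the covariates listed above, uses made-up toy values, and returns only the F statistic and its degrees of freedom; the omnibus p-value would come from the corresponding F distribution. The function name and the group labels are hypothetical.

```python
from statistics import fmean


def nested_anova_f(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """F statistic comparing an intercept-only model with a model that
    adds group indicator variables (no other covariates in this sketch)."""
    values = [y for ys in groups.values() for y in ys]
    n_obs, n_groups = len(values), len(groups)

    # Residual sum of squares of the reduced model (grand mean only).
    grand_mean = fmean(values)
    rss_reduced = sum((y - grand_mean) ** 2 for y in values)

    # Residual sum of squares of the full model (one mean per group).
    rss_full = sum((y - fmean(ys)) ** 2 for ys in groups.values() for y in ys)

    df_num, df_den = n_groups - 1, n_obs - n_groups
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# Toy M-values for three hypothetical groups (not study data).
toy = {
    "group_a": [1.0, 2.0, 3.0],
    "group_b": [2.0, 3.0, 4.0],
    "group_c": [5.0, 6.0, 7.0],
}
f_stat, df_num, df_den = nested_anova_f(toy)
```

With k groups the numerator has k-1 degrees of freedom, which is why a four-category ethnicity variable yields a three degree of freedom test.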

The Bonferroni method was used to adjust for multiple comparisons. For methylome-wide associations, the significance threshold was adjusted for 321,503 probes, resulting in a Bonferroni threshold of 1.6 × 10−7. Analyses were performed using R version 3.2.1 (The R Foundation for Statistical Computing) (R Core Team) and the Bioconductor package version 2.13.
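The methylome-wide threshold follows from dividing a family-wise error rate by the number of probes tested. The sketch below assumes a family-wise rate of 0.05, which is consistent with the reported threshold; the function name is hypothetical.

```python
def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_alpha / n_tests


# 321,503 probes passed QC; 0.05 is the assumed family-wise error rate.
methylome_wide = bonferroni_threshold(0.05, 321_503)
```

The result rounds to 1.6 × 10−7, the threshold used throughout the analysis.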

Multidimensional scaling of the logit transformed methylation data (M-values) was performed by first calculating the Euclidean distance matrix between each pair of individuals and then calculating the first 10 principal coordinates of the data (Figure 2A). We performed a simple correlation analysis of these principal coordinates against demographic factors (age, sex, ethnicity), estimated cell counts, and technical factors (batch, plate, and position) to identify factors that correlated with global methylation patterns (see Figure 2B). In addition, we performed a multiple regression analysis of methylation principal coordinates by ethnicity and ancestry, adjusting for case status, age, sex, estimated cell counts, and plate and position (Supplementary file 1A).

We also sought to establish the extent to which global differences in methylation between Puerto Ricans and Mexicans could be explained by differences in ancestry between the two groups. For those methylation principal coordinates that demonstrated a significant association with ethnicity, we estimated the proportion of the ethnicity association that was mediated by genomic ancestry using the R package ‘mediation’ (Tingley et al., 2014).

We also sought to correlate ethnicity and methylation at a locus-specific level. We thus performed a linear regression between methylation at each CpG site and self-reported ethnicity (Mexican, Puerto Rican, Mixed Latino, and Other Latino), followed by a three degree of freedom analysis of variance to determine the overall effect of ethnicity on methylation. We repeated the analysis excluding the 16 participants that were self-described as ‘Mixed Latino’, and tested for non-linearity in two ways: by adding second and third order polynomials to the model, and by adding a 3-degree of freedom cubic spline and comparing models with the non-linear terms to those without using a nested ANOVA. At loci where there was evidence for non-linearity, we tested whether ethnicity remained associated with methylation after adjusting for ancestry as well as the deviations from linearity. Finally, we tested for the presence of population sub-structure beyond that conveyed through ancestry by adding the genetic principal components 3–10 (PCs 1 and 2 were co-linear with ancestry with a coefficient of determination R2 > 0.998) and comparing models with those PCs to those without. At loci where there was evidence for association between PCs 3–10 and methylation, we tested whether ethnicity remained associated with methylation after adjusting for ancestry as well as PCs 3–10.

We calculated the proportion of variance in methylation explained by ethnicity and genomic ancestry at each site where ethnicity was significantly associated with methylation. To do this, we fit a model that included both ethnicity and global ancestry as well as the confounders described above, and calculated the proportion of variance explained by each predictor (ethnicity and genomic ancestry) as the square of its effect magnitude (β) multiplied by the ratio of the variance of the predictor to the variance of the outcome (methylation).
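A minimal single-predictor sketch of this variance calculation is shown below. It assumes the proportion of variance explained is the squared regression coefficient multiplied by the ratio of predictor variance to outcome variance, which for one predictor equals the squared Pearson correlation. The toy values and function names are hypothetical, and the covariates and joint modeling used in the study are omitted.

```python
from statistics import fmean, pvariance


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on a single predictor x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def variance_explained(beta: float, x: list[float], y: list[float]) -> float:
    """Proportion of Var(y) attributed to x: beta^2 * Var(x) / Var(y)."""
    return beta**2 * pvariance(x) / pvariance(y)


# Toy ancestry proportions and M-values (not study data).
ancestry = [0.0, 0.1, 0.2, 0.3, 0.4]
m_values = [0.1, 0.3, 0.2, 0.6, 0.5]

beta = ols_slope(ancestry, m_values)
share = variance_explained(beta, ancestry, m_values)
```

The same relationship underlies the power statement in the statistical analysis: a correlation of 0.25 corresponds to 0.25 squared, or 6.25%, of the variance.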

We also examined whether differences in methylation patterns by ethnicity could be associated with known loci that had previously been reported to vary based on common environmental exposures, including maternal smoking during pregnancy (Joubert et al., 2012), diesel exhaust particles (DEP) (Jiang et al., 2014), and exposure to violence (Chen et al., 2013). We have previously shown that these or similar exposures varied by ethnicity within our own GALA II study populations (Oh et al., 2012; Nishimura et al., 2013; Thakur et al., 2013).

In addition, we examined the association between global ancestry and methylation across all CpG loci using a two-degree of freedom likelihood ratio test as well as by examining the association between individual ancestral components (African, European, and Native American) and methylation at each CpG site. At each site where methylation was significantly associated with genomic ancestry proportions, we determined the relative effect of global ancestry (θ, theta) and local ancestry (γ, gamma) in a joint model by calculating the proportion of variance explained as above.

To determine whether ancestry associations with methylation were due to variation in local ancestry, we correlated local ancestry at each CpG site with methylation at the site. Because ancestry LD is much stronger than genotypic LD, it is possible to accurately interpolate ancestry at each CpG site based on the ancestry estimated at the nearest SNPs (Galanter et al., 2014; Rosenberg et al., 2010). Measures of locus-specific ancestry were correlated with local methylation using linear regression. We performed a two-degree of freedom analysis of variance test evaluating the overall effect of all three ancestries as well as single-ancestry associations comparing methylation at a given locus with the number of African, European and Native American chromosomes at that CpG site.
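One simple way to assign local ancestry to a CpG site is to take the call at the closest genotyped SNP, as sketched below. This nearest-SNP rule is an illustrative assumption; the study's exact interpolation scheme is not specified in this section. The function name, positions, and ancestry calls are hypothetical, with ancestry coded as the number of African chromosomes (0, 1, or 2).

```python
from bisect import bisect_left


def nearest_snp_ancestry(
    cpg_pos: int, snp_positions: list[int], snp_ancestry: list[int]
) -> int:
    """Return the local-ancestry call of the SNP closest to cpg_pos.

    snp_positions must be sorted in ascending order and aligned with
    snp_ancestry. Ties are resolved in favor of the upstream (left) SNP.
    """
    if not snp_positions or len(snp_positions) != len(snp_ancestry):
        raise ValueError("need aligned, non-empty SNP positions and calls")

    idx = bisect_left(snp_positions, cpg_pos)
    if idx == 0:
        return snp_ancestry[0]
    if idx == len(snp_positions):
        return snp_ancestry[-1]

    left_gap = cpg_pos - snp_positions[idx - 1]
    right_gap = snp_positions[idx] - cpg_pos
    return snp_ancestry[idx] if right_gap < left_gap else snp_ancestry[idx - 1]


# Toy positions (base pairs) and African-chromosome counts, not study data.
positions = [100, 200, 300]
african_copies = [0, 1, 2]
call_at_cpg = nearest_snp_ancestry(260, positions, african_copies)
```

Because ancestry tracts in recently admixed populations are long relative to SNP spacing, neighboring SNPs usually share the same ancestry call, which is what makes this kind of assignment reasonable.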

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
    Trans-ancestry genome-wide association study identifies 12 genetic loci influencing blood pressure and implicates a role for DNA methylation
    1. N Kato
    2. M Loh
    3. F Takeuchi
    4. N Verweij
    5. X Wang
    6. W Zhang
    7. TN Kelly
    8. D Saleheen
    9. B Lehne
    10. I Mateo Leach
    11. AW Drong
    12. J Abbott
    13. S Wahl
    14. ST Tan
    15. WR Scott
    16. G Campanella
    17. M Chadeau-Hyam
    18. U Afzal
    19. TS Ahluwalia
    20. MJ Bonder
    21. P Chen
    22. A Dehghan
    23. TL Edwards
    24. T Esko
    25. MJ Go
    26. SE Harris
    27. J Hartiala
    28. S Kasela
    29. A Kasturiratne
    30. CC Khor
    31. ME Kleber
    32. H Li
    33. ZY Mok
    34. M Nakatochi
    35. NS Sapari
    36. R Saxena
    37. AF Stewart
    38. L Stolk
    39. Y Tabara
    40. AL Teh
    41. Y Wu
    42. JY Wu
    43. Y Zhang
    44. I Aits
    45. A Da Silva Couto Alves
    46. S Das
    47. R Dorajoo
    48. JC Hopewell
    49. YK Kim
    50. RW Koivula
    51. J Luan
    52. LP Lyytikäinen
    53. QN Nguyen
    54. MA Pereira
    55. I Postmus
    56. OT Raitakari
    57. MS Bryan
    58. RA Scott
    59. R Sorice
    60. V Tragante
    61. M Traglia
    62. J White
    63. K Yamamoto
    64. Y Zhang
    65. LS Adair
    66. A Ahmed
    67. K Akiyama
    68. R Asif
    69. T Aung
    70. I Barroso
    71. A Bjonnes
    72. TR Braun
    73. H Cai
    74. LC Chang
    75. CH Chen
    76. CY Cheng
    77. YS Chong
    78. R Collins
    79. R Courtney
    80. G Davies
    81. G Delgado
    82. LD Do
    83. PA Doevendans
    84. RT Gansevoort
    85. YT Gao
    86. TB Grammer
    87. N Grarup
    88. J Grewal
    89. D Gu
    90. GS Wander
    91. AL Hartikainen
    92. SL Hazen
    93. J He
    94. CK Heng
    95. JE Hixson
    96. A Hofman
    97. C Hsu
    98. W Huang
    99. LL Husemoen
    100. JY Hwang
    101. S Ichihara
    102. M Igase
    103. M Isono
    104. JM Justesen
    105. T Katsuya
    106. MG Kibriya
    107. YJ Kim
    108. M Kishimoto
    109. WP Koh
    110. K Kohara
    111. M Kumari
    112. K Kwek
    113. NR Lee
    114. J Lee
    115. J Liao
    116. W Lieb
    117. DC Liewald
    118. T Matsubara
    119. Y Matsushita
    120. T Meitinger
    121. E Mihailov
    122. L Milani
    123. R Mills
    124. N Mononen
    125. M Müller-Nurasyid
    126. T Nabika
    127. E Nakashima
    128. HK Ng
    129. K Nikus
    130. T Nutile
    131. T Ohkubo
    132. K Ohnaka
    133. S Parish
    134. L Paternoster
    135. H Peng
    136. A Peters
    137. ST Pham
    138. MJ Pinidiyapathirage
    139. M Rahman
    140. H Rakugi
    141. O Rolandsson
    142. MA Rozario
    143. D Ruggiero
    144. CF Sala
    145. R Sarju
    146. K Shimokawa
    147. H Snieder
    148. T Sparsø
    149. W Spiering
    150. JM Starr
    151. DJ Stott
    152. DO Stram
    153. T Sugiyama
    154. S Szymczak
    155. WH Tang
    156. L Tong
    157. S Trompet
    158. V Turjanmaa
    159. H Ueshima
    160. AG Uitterlinden
    161. S Umemura
    162. M Vaarasmaki
    163. RM van Dam
    164. WH van Gilst
    165. DJ van Veldhuisen
    166. JS Viikari
    167. M Waldenberger
    168. Y Wang
    169. A Wang
    170. R Wilson
    171. TY Wong
    172. YB Xiang
    173. S Yamaguchi
    174. X Ye
    175. RD Young
    176. TL Young
    177. JM Yuan
    178. X Zhou
    179. FW Asselbergs
    180. M Ciullo
    181. R Clarke
    182. P Deloukas
    183. A Franke
    184. PW Franks
    185. S Franks
    186. Y Friedlander
    187. MD Gross
    188. Z Guo
    189. T Hansen
    190. MR Jarvelin
    191. T Jørgensen
    192. JW Jukema
    193. M Kähönen
    194. H Kajio
    195. M Kivimaki
    196. JY Lee
    197. T Lehtimäki
    198. A Linneberg
    199. T Miki
    200. O Pedersen
    201. NJ Samani
    202. TI Sørensen
    203. R Takayanagi
    204. D Toniolo
    205. H Ahsan
    206. H Allayee
    207. YT Chen
    208. J Danesh
    209. IJ Deary
    210. OH Franco
    211. L Franke
    212. BT Heijman
    213. JD Holbrook
    214. A Isaacs
    215. BJ Kim
    216. X Lin
    217. J Liu
    218. W März
    219. A Metspalu
    220. KL Mohlke
    221. DK Sanghera
    222. XO Shu
    223. JB van Meurs
    224. E Vithana
    225. AR Wickremasinghe
    226. C Wijmenga
    227. BH Wolffenbuttel
    228. M Yokota
    229. W Zheng
    230. D Zhu
    231. P Vineis
    232. SA Kyrtopoulos
    233. JC Kleinjans
    234. MI McCarthy
    235. R Soong
    236. C Gieger
    237. J Scott
    238. YY Teo
    239. J He
    240. P Elliott
    241. ES Tai
    242. P van der Harst
    243. JS Kooner
    244. JC Chambers
    245. BIOS-consortium
    246. CARDIo GRAMplusCD
    247. LifeLines Cohort Study
    248. InterAct Consortium
    (2015)
    Nature Genetics 47:1282–1293.
    https://doi.org/10.1038/ng.3405
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
    Categorization of humans in biomedical research: genes, race and disease
    1. N Risch
    2. E Burchard
    3. E Ziv
    4. H Tang
    (2002)
    Genome Biology 3:comment2007.
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
    mediation: R package for causal mediation analysis
    1. D Tingley
    2. T Yamamoto
    3. K Hirose
    4. L Keele
    5. K Imai
    (2014)
    Journal of Statistical Software 59(5).
    https://doi.org/10.18637/jss.v059.i05
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60

Decision letter

  1. Magnus Nordborg
    Reviewing Editor; Austrian Academy of Sciences, Austria

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Differential methylation between ethnic sub-groups reflects the effect of genetic ancestry and environmental exposures" for consideration by eLife. Your article has been reviewed by Frank Johannes, Mark Shriver, and Magnus Nordborg (who is a member of our Board of Reviewing Editors) and the evaluation has been overseen by Mark McCarthy as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

A very clear paper that convincingly argues that self-described ethnicity explains methylation variation between individuals even after genetic ancestry has been taken into account, presumably because shared ethnicity implies shared environment. Although there is no demonstration that the findings have clinical importance, our consensus opinion is that the work is fundamentally interesting.

Essential revisions:

1) Principal component analysis of global patterns of methylation showed that when genetic ancestry was adjusted for, self-identified ethnicity remained significant for PC 6, suggesting that this measure contains other, non-genetic, factors. In an effort to identify specific DMRs that are associated with self-identified ethnicity you performed an EWA scan. This scan initially uncovered 916 genome-wide significant CpG sites. However, repeating this analysis after adjusting for genetic ancestry yielded only 314 significant associations. Hence, most of the initial 916 associations were due to genetic effects. Variance component analysis of the remaining 314 sites showed that "genomic ancestry explained a median of 4.2% (IQR 1.8% to 8.3%) of the variance in methylation at these loci and accounts for a median of 75% (IQR 45.8% to 92%) of the variance in methylation explained jointly by ethnicity and ancestry". In the Abstract of the manuscript, you only report that "[…] shared genomic ancestry accounted for a median of 75.7% (IQR 45.8% to 92%) of the variance in methylation associated with ethnicity." Hence, we find the use of these results slightly misleading. Translating the 75% into effect sizes means that genetic ancestry and ethnicity jointly explain a median of 5.6% of the variance in DNA methylation at the 314 identified CpG sites, and that ethnicity explains merely 1.4% (median) of the variation. These effect sizes seem very modest, particularly the effects of ethnicity. It is also not clear to what extent these associations are independent. You should show the extent to which methylation scores at the 314 CpG sites are correlated.

2) You used Eigensoft to detect fine-scale sub-continental genetic ancestry. Eight of the Eigensoft PCs were used in a re-analysis of the 314 CpGs. Interestingly, an additional 51 loci seem to be explained by fine-scale genetic diversity; thus leaving 263 loci where ethnicity had significant marginal effects. You do not report the relative contribution of ethnicity from this re-analysis (i.e. are we still talking about 1.4% (median) variance explained, or is it now less?). This should probably be done.

3) Starting in subsection “Ethnic differences in environmentally-associated methylation sites”, you search for overlaps between the 916 DMRs from your initial EWA scan and CpG sites previously shown to be associated with specific environmental variables. The motivation for this analysis seems to follow from the fact that (at least some of) these CpG sites are associated with ethnicity and likely mediate uncharacterized environmental exposures. You do indeed find a significant enrichment with CpG sites shown to be correlated with maternal smoking, diesel-exhaust particle exposure and heavy trauma exposure in adults. You conclude that "these results are supportive of our hypothesis that environmental exposures may be responsible for the observed differences in methylation between ethnic groups and are presented in Table S8". We don't understand why you go back to the 916 CpG sites for this analysis, considering that you already established that 916-263 = 653 of these are explained by genetic ancestry. Are you suggesting that there is genotype x environmental correlation? Furthermore, which of the 916 show enrichment with the environmentally associated CpG sites from previous studies? Are these mostly the 263 CpGs for which we have evidence that they are not fully explained by genetic ancestry? Or are we talking about many CpG sites that are strongly affected by genetic ancestry? The latter would indeed imply that environmental associations of previous studies are mediated by genetic effects. If this is the case, the conclusions of this manuscript (and probably those of previous studies) would substantially change. This consideration should be thoroughly investigated and discussed.

4) You state in the Introduction that it is desirable to replace self-identified ethnicity (a social construct) with biological constructs in biomedical research, because the latter are potentially better (and less biased) predictors of disease outcomes. Self-identified ethnicity subsumes genetic ancestry as well as complex environmental variables such as socioeconomic status, diet, exposures to toxins, lifestyle choices, etc. Your earlier work already showed that genetic ancestry (inferred from genotype data) can be a better predictor of biomedical outcomes than self-identified ethnicity. The reason is that these self-reports poorly tag true genetic ancestry. Similarly, one can expect that these self-reports do not adequately tag specific environmental factors. If such environmental factors impact biomedical outcomes via their effects on DNA methylation, it would be sensible to try to use DMRs directly as predictors of biomedical outcomes, in addition to genetic ancestry, even if these DMRs are not associated with self-identified ethnicity. Conversely, many of the 916 CpGs you identified in your EWAS analysis should ideally be assessed for their impact on biomedical outcomes. It may well be that their effects are mostly neutral, or that their effect sizes are too small. This also applies to DMRs at CpGs that have been shown to be associated with maternal smoking. If this cannot be done, it should at least be discussed.

https://doi.org/10.7554/eLife.20532.018

Author response

Essential revisions:

1) Principal component analysis of global patterns of methylation showed that when genetic ancestry was adjusted for, self-identified ethnicity remained significant for PC 6, suggesting that this measure contains other, non-genetic, factors. In an effort to identify specific DMRs that are associated with self-identified ethnicity you performed an EWA scan. This scan initially uncovered 916 genome-wide significant CpG sites. However, repeating this analysis after adjusting for genetic ancestry yielded only 314 significant associations. Hence, most of the initial 916 associations were due to genetic effects. Variance component analysis of the remaining 314 sites showed that "genomic ancestry explained a median of 4.2% (IQR 1.8% to 8.3%) of the variance in methylation at these loci and accounts for a median of 75% (IQR 45.8% to 92%) of the variance in methylation explained jointly by ethnicity and ancestry". In the Abstract of the manuscript, you only report that "[…] shared genomic ancestry accounted for a median of 75.7% (IQR 45.8% to 92%) of the variance in methylation associated with ethnicity." Hence, we find the use of these results slightly misleading. Translating the 75% into effect sizes means that genetic ancestry and ethnicity jointly explain a median of 5.6% of the variance in DNA methylation at the 314 identified CpG sites, and that ethnicity explains merely 1.4% (median) of the variation. These effect sizes seem very modest, particularly the effects of ethnicity. It is also not clear to what extent these associations are independent. You should show the extent to which methylation scores at the 314 CpG sites are correlated.

Ancestry and ethnicity jointly explain a median of 6.8% of the variance in methylation (IQR 4.5% to 10.0%); the discrepancy from the reviewers’ calculation arises because the median of a sum differs from the sum of the medians. The observed effect sizes appear modest in many cases because the reported numbers include all DMRs that were statistically associated with ethnicity, regardless of effect size. We note that at one CpG (cg09668627), ethnicity and ancestry jointly explained 38.5% of the variance in methylation, and there were 17 CpGs where ethnicity and ancestry jointly explained more than 25% of the variance.
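The arithmetic behind this point can be sketched as follows. The first part reproduces the reviewers' back-of-envelope figures from the reported medians; the second part uses three hypothetical per-CpG values (invented purely for illustration, not taken from the study) to show that the median of per-site sums need not equal the sum of the medians.

```python
from statistics import median

# Reviewers' calculation, using only the medians quoted in the letter.
ancestry_median = 4.2          # median % variance explained by ancestry
ancestry_share_median = 0.75   # median share of the joint variance due to ancestry
joint_from_medians = ancestry_median / ancestry_share_median    # about 5.6
ethnicity_from_medians = joint_from_medians - ancestry_median   # about 1.4

# Hypothetical per-CpG values (% variance explained); illustration only.
ancestry = [1.0, 4.0, 9.0]
ethnicity = [6.0, 1.0, 2.0]
joint = [a + e for a, e in zip(ancestry, ethnicity)]  # per-site sums

sum_of_medians = median(ancestry) + median(ethnicity)
median_of_sum = median(joint)

print(round(joint_from_medians, 1), round(ethnicity_from_medians, 1))
print(sum_of_medians, median_of_sum)
```

In the toy data the two summaries disagree because the site with the median ancestry value is not the site with the median ethnicity value, which is the situation the response describes.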

Moreover, because there is imperfect correlation between ethnicity and ancestry on the one hand, and the causative meQTLs and presumed environmental factors on the other, the observed effects of ancestry and ethnicity on methylation are reduced. Thus, we find larger effect sizes in our analysis of the variance explained jointly by global and local ancestry. We find that among the 194 CpG sites that were associated with global genetic ancestry, local and global ancestry jointly explain a median of 21.4% of the variance (IQR 16.4% to 27.3%). Over 25% of the variance is explained at 65 of the 194 sites. The proportion of variance in methylation explained was as high as 72.2% at cg04922029 in the DARC gene (as one would expect, almost all of the variance is explained by local ancestry). However, when we examine the variance explained by ethnicity and global ancestry at cg04922029, we are able to explain a smaller proportion of the variance (30.7%, almost all of it captured by global ancestry). We hypothesize that a similar effect would be seen with environmental measures, since ethnicity is an imperfect proxy for environmental exposure.

We would like to clarify that the variance component analysis was reported on all 916 CpGs, not just the 314 that remained significant after adjusting for ancestry. We apologize for any confusion that arose from writing “at these loci”. At the 314 loci that remained associated with ethnicity, the median total variance explained jointly by ethnicity and ancestry was 6.2% (IQR 4.4% to 8.8%); ethnicity accounted for a median of 3.5% of the variance in methylation (IQR 2.2% to 5.1%), while ancestry accounted for a median of 1.8% (IQR 0.8% to 4.0%) and explained 32% (IQR 16.8% to 56.2%) of the total variance accounted for jointly by ethnicity and ancestry. Among the 520 CpGs that were no longer associated with ethnicity when adjusted for ancestry, the median total variance explained jointly by ethnicity and ancestry was 7.8% (IQR 5.3% to 11.1%); ethnicity explained less than 1% of the variance in methylation, while ancestry explained 6.6% of the variance (IQR 4.0% to 10.2%), corresponding to a median of 88.0% of the variance jointly explained (IQR 75.6% to 95.4%). We added a table (Table 2) describing these findings and made reference to them in the Results section.

Generally, there was a moderate amount of correlation between the 314 methylation sites associated with ethnicity after correcting for ancestry. Among the 49,141 pairs of CpGs, the median R² was 0.044 (IQR 0.010 to 0.125). This is more correlation than was seen between 100 random methylation sites, where the median R² was 0.012 (IQR 0.003 to 0.035), though we would expect greater correlation due to the common effect of ethnicity at these sites. We have made reference to this correlation in the Results section.
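As a rough illustration of this pairwise summary, the sketch below confirms that 314 sites yield 49,141 unordered pairs and computes a median pairwise R² on a tiny made-up methylation table. The site names, the values, and the helper `r_squared` are hypothetical; the response does not describe the software used for the actual calculation.

```python
from itertools import combinations
from math import comb
from statistics import median

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Number of unordered pairs among 314 CpG sites.
n_pairs = comb(314, 2)

# Toy methylation values: one list per CpG, one entry per individual.
toy = {
    "cpg_a": [0.10, 0.20, 0.30, 0.40],
    "cpg_b": [0.20, 0.40, 0.60, 0.80],  # exactly proportional to cpg_a
    "cpg_c": [0.50, 0.10, 0.40, 0.20],
}

# R^2 for every unordered pair of sites, then the median across pairs.
pairwise = [r_squared(toy[i], toy[j]) for i, j in combinations(toy, 2)]
median_r2 = median(pairwise)

print(n_pairs, round(median_r2, 3))
```

With real data the same loop would run over all 314 sites; for larger panels a vectorized correlation matrix would be the more practical choice.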

2) You used Eigensoft to detect fine-scale sub-continental genetic ancestry. Eight of the Eigensoft PCs were used in a re-analysis of the 314 CpGs. Interestingly, an additional 51 loci seem to be explained by fine-scale genetic diversity; thus leaving 263 loci where ethnicity had significant marginal effects. You do not report the relative contribution of ethnicity from this re-analysis (i.e. are we still talking about 1.4% (median) variance explained, or is it now less?). This should probably be done.

As suggested by the reviewers, we performed a similar analysis among the 314 CpGs that we examined for higher-order associations. Overall, among the 314 CpGs that remained associated with ethnicity after adjustment for ancestry, the median total variance accounted for by ethnicity, ancestry, and fine-scale substructure was 10.4% (IQR 6.6% to 16.1%), of which ethnicity explained a median of 1.7% (IQR 0.8% to 3.8%), ancestry explained a median of 2.9% (IQR 1.0% to 4.6%) and fine-scale substructure explained a median of 3.4% (IQR 2.0% to 4.2%). Among the CpGs whose ethnicity association was not explained by fine-scale substructure, ethnicity explained a median of 1.9% (IQR 1.0% to 4.0%) and as much as 26.7%, ancestry explained 2.8% (IQR 1.0% to 6.2%), and fine-scale ancestry explained 3.2% (IQR 1.9% to 4.7%). Note, however, that we would expect measures of the proportion of variance explained by fine-scale population structure to be somewhat inflated due to the inclusion of 8 additional terms, one for each of the PCs. This text was added to the Results section.

3) Starting in subsection “Ethnic differences in environmentally-associated methylation sites”, you search for overlaps between the 916 DMRs from your initial EWA scan and CpG sites previously shown to be associated with specific environmental variables. The motivation for this analysis seems to follow from the fact that (at least some of) these CpG sites are associated with ethnicity and likely mediate uncharacterized environmental exposures. You do indeed find a significant enrichment with CpG sites shown to be correlated with maternal smoking, diesel-exhaust particle exposure and heavy trauma exposure in adults. You conclude that "these results are supportive of our hypothesis that environmental exposures may be responsible for the observed differences in methylation between ethnic groups and are presented in Table S8". We don't understand why you go back to the 916 CpG sites for this analysis, considering that you already established that 916-263 = 653 of these are explained by genetic ancestry. Are you suggesting that there is genotype x environmental correlation? Furthermore, which of the 916 show enrichment with the environmentally associated CpG sites from previous studies? Are these mostly the 263 CpGs for which we have evidence that they are not fully explained by genetic ancestry? Or are we talking about many CpG sites that are strongly affected by genetic ancestry? The latter would indeed imply that environmental associations of previous studies are mediated by genetic effects. If this is the case, the conclusions of this manuscript (and probably those of previous studies) would substantially change. This consideration should be thoroughly investigated and discussed.

Our procedure for testing for enrichment of ethnicity-associated loci among the CpGs identified in the Joubert et al. paper differs somewhat from what the reviewers understood. We first looked to see whether there was an enrichment in nominal associations with ethnicity (p < 0.05) among all the CpGs associated with in utero tobacco smoke that passed QC in our study (n = 4404). We found that 1341 of them (30.4%) were at least nominally associated with ethnicity, which represents a strong enrichment (p = 2 × 10⁻¹⁶). We also found that 126 of those CpGs were associated with ethnicity at a Bonferroni-corrected p-value threshold of 1.1 × 10⁻⁵ (corrected for 4404 comparisons). It was only then that we examined whether any were among the CpGs associated with ethnicity at a genome-wide Bonferroni correction, and found that 27 of them were. Since we did not perform the ancestry adjustment of ethnicity genome-wide (we only performed it for the 916 loci that were significantly associated with ethnicity), we felt it would be most appropriate to look at whether the in utero tobacco-associated DMRs were within this larger group.
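The counts in this paragraph can be checked with a few lines of arithmetic. The sketch below reproduces the 30.4% figure and the Bonferroni threshold, and adds a fold-enrichment and a normal-approximation z-score against a 5% null rate. The z-score is an assumption made here for illustration only; the response does not state which test produced the reported p-value.

```python
from math import sqrt

n_tested = 4404    # smoking-associated CpGs that passed QC
n_nominal = 1341   # of those, nominally associated with ethnicity (p < 0.05)
alpha = 0.05

# Observed fraction of nominal hits, versus the 5% expected by chance.
observed_fraction = n_nominal / n_tested
expected_hits = alpha * n_tested
fold_enrichment = n_nominal / expected_hits

# Bonferroni threshold when correcting for 4404 comparisons.
bonferroni_threshold = alpha / n_tested

# Normal approximation to a binomial count (illustrative assumption only).
z_score = (n_nominal - expected_hits) / sqrt(n_tested * alpha * (1 - alpha))

print(f"{observed_fraction:.1%} nominal, {fold_enrichment:.1f}-fold over chance")
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
```

A z-score of this size corresponds to a p-value far below the smallest value most statistical software prints by default, which is consistent with the order of magnitude quoted in the response.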

In light of the reviewers’ question, we examined the 314 DMRs that were associated with ethnicity even after adjusting for ancestry and found that 14 CpGs were associated with in utero tobacco smoke in the Joubert et al. paper. We also examined the 263 CpGs whose association with ethnicity could not be explained by ancestry or fine-scale population structure and found that 12 of them were associated with in utero tobacco smoke in the Joubert et al. paper. Neither of these results is significantly disproportionate to the proportion of CpGs associated with (unadjusted) ethnicity.

There are a number of plausible reasons for overlap between CpGs associated with ancestry and those associated with in utero tobacco smoke (or other environmental exposures). As the reviewers suggest, it is possible that this represents a gene-by-environment interaction, and that individuals with certain genetic backgrounds are more susceptible to the effects of in utero tobacco smoke than those of other genetic backgrounds. Leng et al. (AJRCCM, 2013) have shown that Hispanic smokers with high Native American ancestry had reduced risk of methylation across 12 genes, suggesting an ancestry-by-smoking interaction. Because the majority of studies in the consortium in the Joubert study enrolled participants of European descent, such interactions might not have been evident in their study.

It is also possible that environmental exposures correlate with ancestry and that participants with certain ancestral backgrounds may have been more exposed to in utero tobacco smoke than those of other backgrounds. Several studies have shown correlations between genetic ancestry and environmental exposures, including socioeconomic status (Florez et al., 2011), overweight and obesity (Ziv et al., 2006), and birth site and country of residence (Burchard et al., 2005). Though our analysis of global ancestry showed that a majority of the variance explained jointly by local and global ancestry can be traced to specific loci in the genome acting in cis, a substantial proportion cannot. Although some of the residual association between global ancestry and methylation may be due to genetic effects acting in trans, we cannot exclude the possibility that some of it may be due to environmental exposures correlating with global ancestry. Thus, it is plausible that genomic ancestry is acting as a proxy for both genetic and environmental effects in our study.

Finally, it is possible that our analysis identified DMRs that are independently modifiable by both genetic and environmental exposures. Thus, regions of the genome that are differentially methylated due to genetic polymorphisms may also be more susceptible to differential methylation due to environmental exposures.

At the reviewers’ suggestion, we reported the additional analysis in subsection “Ethnic differences in environmentally-associated methylation sites” and discussed the findings and their implications (Discussion section).

4) You state in the Introduction that it is desirable to replace self-identified ethnicity (a social construct) with biological constructs in biomedical research, because the latter are potentially better (and less biased) predictors of disease outcomes. Self-identified ethnicity subsumes genetic ancestry as well as complex environmental variables such as socioeconomic status, diet, exposures to toxins, lifestyle choices, etc. Your earlier work already showed that genetic ancestry (inferred from genotype data) can be a better predictor of biomedical outcomes than self-identified ethnicity. The reason is that these self-reports poorly tag true genetic ancestry. Similarly, one can expect that these self-reports do not adequately tag specific environmental factors. If such environmental factors impact biomedical outcomes via their effects on DNA methylation, it would be sensible to try to use DMRs directly as predictors of biomedical outcomes, in addition to genetic ancestry, even if these DMRs are not associated with self-identified ethnicity. Conversely, many of the 916 CpGs you identified in your EWAS analysis should ideally be assessed for their impact on biomedical outcomes. It may well be that their effects are mostly neutral, or that their effect sizes are too small. This also applies to DMRs at CpGs that have been shown to be associated with maternal smoking. If this cannot be done, it should at least be discussed.

We agree with the reviewers that one of the great promises for studies of methylation is their ability to tag environmental factors that may play a role in disease. Even in cases where the effect of an environmental exposure on disease is not directly mediated via methylation, the patterns of methylation that result from the exposure could be used as a lasting biomarker of exposure long after the exposure has passed. For example, in the study by Joubert et al. cited in our paper, measures of in utero exposure to tobacco persisted into childhood. Thus, as the reviewers point out, it would be sensible to perform an analysis examining associations between DMRs, particularly those known to be associated with ethnicity, ancestry and environmental exposures, and relevant biomedical outcomes (Discussion section).

Although it was beyond the scope of this paper to look at the association between DMRs and biomedical outcomes, we have performed a preliminary epigenome-wide analysis of asthma in this Latino population. Within that analysis, we find no evidence of enrichment in CpGs associated with asthma among the 916 CpGs associated with ethnicity in the current manuscript (p = 0.06 for enrichment, minimum p = 0.0009 for cg23702046, Bonferroni-adjusted minimum p = 0.8). In the overall EWAS of asthma, we did find one CpG that was significantly associated with asthma, cg02458554 on chromosome 18 (p = 2.9 × 10⁻⁷). This CpG was not significantly associated with ethnicity (p = 0.4), but intriguingly, it was associated with genetic ancestry (p = 4.5 × 10⁻⁴). There did appear to be marginal enrichment in CpGs associated with asthma among the results reported by Joubert et al. (p = 0.01); however, no individual CpG was associated with asthma when corrected for the number of comparisons (minimum p = 3.3 × 10⁻⁵, adjusted minimum p = 0.1). These findings are awaiting replication and are outside the scope of this manuscript, but as recommended by the reviewers, we have expanded our Discussion along the lines noted.
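The Bonferroni-adjusted minimum p-values quoted in this paragraph can be approximated by multiplying each raw p-value by the number of tests and capping the result at 1. In the sketch below, the 916 tests for the ethnicity-associated set come from the text; the 4404 tests for the Joubert et al. set is an assumption taken from the count of QC-passing CpGs given earlier in this response, since the number of comparisons is not stated here. The helper `bonferroni_adjust` is hypothetical.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value: raw p times number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

# Minimum p among the 916 ethnicity-associated CpGs (cg23702046).
adjusted_ethnicity_set = bonferroni_adjust(0.0009, 916)

# Minimum p among the Joubert et al. CpGs; 4404 tests is an assumption.
adjusted_joubert_set = bonferroni_adjust(3.3e-5, 4404)

print(round(adjusted_ethnicity_set, 1), round(adjusted_joubert_set, 1))
```

Both results round to the adjusted values reported in the response, so neither minimum survives correction at the conventional 0.05 level.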

https://doi.org/10.7554/eLife.20532.019

Article and author information

Author details

  1. Joshua M Galanter

    1. Department of Medicine, University of California, San Francisco, United States
    2. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
    3. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
    Present address
    1. Genentech, South San Francisco, United States
    Contribution
    JMG, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    1. galanter@protonmail.com
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-2561-6384
  2. Christopher R Gignoux

    1. Department of Genetics, Stanford University, Stanford, United States
    Contribution
    CRG, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-9728-6567
  3. Sam S Oh

    1. Department of Medicine, University of California, San Francisco, United States
    2. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
    3. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
    Contribution
    SSO, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-2815-6037
  4. Dara Torgerson

    1. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
    Contribution
    DT, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  5. Maria Pino-Yanes

    1. Hospital Universitario Nuestra Señora de Candelaria, Tenerife, Spain
    2. CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
    Contribution
    MP-Y, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-0332-437X
  6. Neeta Thakur

    1. Department of Medicine, University of California, San Francisco, United States
    Contribution
    NT, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-6126-6601
  7. Celeste Eng

    1. Department of Medicine, University of California, San Francisco, United States
    Contribution
    CE, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  8. Donglei Hu

    1. Department of Medicine, University of California, San Francisco, United States
    Contribution
    DH, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  9. Scott Huntsman

    1. Department of Medicine, University of California, San Francisco, United States
    Contribution
    SH, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  10. Harold J Farber

    1. Department of Pediatrics, Baylor College of Medicine and Texas Children’s Hospital, Houston, Texas
    Contribution
    HJF, Recruited participants, Acquisition of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  11. Pedro C Avila

    1. Division of Allergy and Immunology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois
    Contribution
    PCA, Recruited participants, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  12. Emerita Brigino-Buenaventura

    1. Kaiser Permanente-Vallejo Medical Center, Vallejo, United States
    Contribution
    EB-B, Recruited participants, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  13. Michael A LeNoir

    1. Bay Area Pediatrics, Oakland, United States
    Contribution
    MAL, Recruited participants, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  14. Kelly Meade

    1. Department of Pediatrics, Children’s Hospital and Research Center, Oakland, United States
    Contribution
    KM, Recruited participants, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  15. Denise Serebrisky

    1. Jacobi Medical Center, Bronx, United States
    Contribution
    DS, Recruited participants, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  16. William Rodríguez-Cintrón

    1. Veterans Caribbean Health System, San Juan, United States
    Contribution
    WR-C, Recruited participants, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  17. Rajesh Kumar

    1. Division of Allergy and Immunology, The Ann and Robert H Lurie Children’s Hospital of Chicago, Chicago, United States
    Contribution
    RK, Recruited participants, Acquisition of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  18. Jose R Rodríguez-Santana

    1. Centro de Neumología Pediátrica, San Juan, United States
    Contribution
    JRR-S, Recruited participants, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  19. Max A Seibold

    1. Center for Genes, Environment, and Health, Department of Pediatrics, National Jewish Health, Denver, United States
    Contribution
    MAS, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  20. Luisa N Borrell

    1. Graduate School of Public Health and Health Policy, City University of New York, New York, United States
    Contribution
    LNB, Recruited participants, Acquisition of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  21. Esteban G Burchard

    1. Department of Medicine, University of California, San Francisco, United States
    2. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
    Contribution
    EGB, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Noah Zaitlen
    For correspondence
    1. esteban.burchard@ucsf.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-7475-2035
  22. Noah Zaitlen

    1. Department of Medicine, University of California, San Francisco, United States
    Contribution
    NZ, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Esteban G Burchard
    For correspondence
    1. noah.zaitlen@ucsf.edu
    Competing interests
    The authors declare that no competing interests exist.

Funding

National Institutes of Health (multiple)

  • Joshua M Galanter
  • Christopher R Gignoux
  • Neeta Thakur
  • Harold J Farber
  • Rajesh Kumar
  • Max A Seibold
  • Esteban G Burchard
  • Noah Zaitlen

American Asthma Foundation

  • Esteban G Burchard

Sandler Family Foundation

  • Esteban G Burchard

Tobacco-Related Disease Research Program (24RT-0025)

  • Esteban G Burchard

Flight Attendant Medical Research Institute

  • Esteban G Burchard

Hewett Fellowship

  • Joshua M Galanter

Parker B. Francis Fellowship Program

  • Neeta Thakur

American Thoracic Society

  • Neeta Thakur

University of California, San Francisco (Chancellor's Research Fellowship)

  • Christopher R Gignoux

University of California, San Francisco (Dissertation of the Year Fellowship)

  • Christopher R Gignoux

Ernest S. Bazley Grant

  • Pedro C Avila

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors acknowledge the families and patients for their participation and thank the numerous health care providers and community clinics for their support and participation in GALA II. In particular, the authors thank study coordinator Sandra Salazar; the recruiters who obtained the data: Duanny Alva, MD, Gaby Ayala-Rodriguez, Lisa Caine, Elizabeth Castellanos, Jaime Colon, Denise DeJesus, Blanca Lopez, Brenda Lopez, MD, Louis Martos, Vivian Medina, Juana Olivo, Mario Peralta, Esther Pomares, MD, Jihan Quraishi, Johanna Rodriguez, Shahdad Saeedi, Dean Soto, Ana Taveras. We also thank Sasha Gusev for helpful discussion. Computations in this manuscript were performed using the UCSF Biostatistics High Performance Computing System.

Ethics

Human subjects: All research on human subjects was approved by the Institutional Review Board at the University of California and each of the recruitment sites (Kaiser Permanente Northern California, Children's Hospital Oakland, Northwestern University, Children's Memorial Hospital Chicago, Baylor College of Medicine on behalf of the Texas Children's Hospital, VA Medical Center in Puerto Rico, the Albert Einstein College of Medicine on behalf of the Jacobi Medical Center in New York and the Western Review Board on behalf of the Centro de Neumologia Pediatrica), and all participants/parents provided age-appropriate written assent/consent.

Reviewing Editor

  1. Magnus Nordborg, Reviewing Editor, Austrian Academy of Sciences, Austria

Publication history

  1. Received: August 10, 2016
  2. Accepted: November 23, 2016
  3. Version of Record published: January 3, 2017 (version 1)
  4. Version of Record updated: January 4, 2017 (version 2)

Copyright

© 2017, Galanter et al

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,876
    Page views
  • 715
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: PubMed Central, Scopus, Crossref.
