Identification of an emphysema-associated genetic variant near TGFB2 with regulatory effects in lung fibroblasts

  1. Margaret M Parker
  2. Yuan Hao
  3. Feng Guo
  4. Betty Pham
  5. Robert Chase
  6. John Platig
  7. Michael H Cho
  8. Craig P Hersh
  9. Victor J Thannickal
  10. James Crapo
  11. George Washko
  12. Scott H Randell
  13. Edwin K Silverman
  14. Raúl San José Estépar
  15. Xiaobo Zhou  Is a corresponding author
  16. Peter J Castaldi  Is a corresponding author
  1. Brigham and Women’s Hospital, United States
  2. School of Medicine, University of Alabama at Birmingham, United States
  3. National Jewish Health, United States
  4. The University of North Carolina at Chapel Hill, United States
Research Article
Cite this article as: eLife 2019;8:e42720 doi: 10.7554/eLife.42720

Abstract

Murine studies have linked TGF-β signaling to emphysema, and human genome-wide association studies (GWAS) of lung function and COPD have identified associated regions near genes in the TGF-β superfamily. However, the functional regulatory mechanisms at these loci have not been identified. We performed the largest GWAS of emphysema patterns to date, identifying 10 GWAS loci including an association peak spanning a 200 kb region downstream from TGFB2. Integrative analysis of publicly available eQTL, DNaseI, and chromatin conformation data identified a putative functional variant, rs1690789, that may regulate TGFB2 expression in human fibroblasts. Using chromatin conformation capture, we confirmed that the region containing rs1690789 contacts the TGFB2 promoter in fibroblasts, and CRISPR/Cas-9 targeted deletion of a ~100 bp region containing rs1690789 resulted in decreased TGFB2 expression in primary human lung fibroblasts. These data provide novel mechanistic evidence linking genetic variation affecting the TGF-β pathway to emphysema in humans.

https://doi.org/10.7554/eLife.42720.001

eLife digest

It is well known that smoking is bad for the lungs. Not only can smoking cause lung cancer, it can also lead to conditions such as emphysema. This is the gradual damage to lung tissue that occurs when the walls of the tiny air-sacs in the lungs where the blood takes up oxygen, called the alveoli, weaken and break. Emphysema causes shortness of breath and difficulty pushing air out of the lungs, and it is part of chronic obstructive pulmonary disease (also known as COPD).

Genetic differences mean that certain people are more likely to develop emphysema than others. As an example, if someone has genetic mutations that alter the activity of a gene called TGFB2, their risk of developing emphysema increases. However, the specific genetic mutations that modify the activity of TGFB2 were previously unknown.

Parker et al. analyzed the genetic sequences of TGFB2 from patients with emphysema and compared them to those from healthy individuals. This revealed that certain mutations near the TGFB2 gene were more common in patients with emphysema. Next, Parker et al. showed that, in healthy lung cells called fibroblasts, the stretch of DNA that was mutated in patients with emphysema touched the part of TGFB2 that controls when the gene is activated. Deleting that same stretch of DNA in the fibroblasts meant the cells could no longer activate the TGFB2 gene as efficiently. Together, these results reveal a genetic difference that increases the risk for emphysema.

COPD affects approximately 175 million people worldwide, causing over three million deaths each year. The findings of Parker et al. suggest that developing drugs that safely and efficiently target TGFB2 may be a way to help patients with early signs of emphysema.

https://doi.org/10.7554/eLife.42720.002

Introduction

Emphysema, the pathologic destruction of lung parenchyma resulting in airspace enlargement, is one of the major manifestations of chronic obstructive pulmonary disease (COPD). Emphysema occurs in distinct pathologic patterns, but these patterns are not captured by traditional quantitative measures of emphysema from lung computed tomography (CT). To obtain more detailed radiographic measures of emphysema, we developed novel image extraction techniques to quantify the distinct patterns of emphysema based on the analysis of local lung density histograms (Mendoza, 2012). These local histogram emphysema (LHE) measures are more predictive of clinical outcomes than standard CT emphysema quantifications (Castaldi et al., 2013), and in a previous genome-wide association study (GWAS) we identified genome-wide significant associations with these distinct LHE patterns (Castaldi et al., 2014). However, the mechanisms by which these GWAS loci affect emphysema patterns are unknown.

The majority of GWAS-identified loci for genetically complex diseases are located in non-coding DNA and influence gene regulatory elements (Maurano et al., 2012; Nicolae et al., 2010). Thus, for the functional characterization of emphysema GWAS loci, it is necessary to localize causal variants in regulatory elements and identify the gene(s) regulated by that element. Since multiple cell types contribute to emphysema, large-scale functional annotation projects such as the Genotype-Tissue Expression Project (GTEx) (GTEx Consortium, 2015) and the Encyclopedia of Regulatory Elements (ENCODE) (ENCODE Project Consortium, 2012) can be integrated with GWAS signals to identify candidate regulatory regions, tissues, and cell types of interest for more detailed functional characterization.

In this study, we hypothesized that human emphysema is influenced by functional genetic variants that disrupt gene regulatory elements. As a screening approach, we cross-referenced GWAS results against large compendia of gene regulatory data from tissues and cell types to prioritize emphysema-associated loci for further functional study. This analysis identified rs1690789 as a high-probability functional variant in the GWAS-identified region near TGFB2. Using chromatin conformation capture, we confirmed that the region spanning this SNP interacts with the TGFB2 promoter region. Via CRISPR/Cas-9 targeted deletion, we then demonstrated that a ~100 bp segment containing rs1690789 is required for full TGFB2 expression in primary human lung fibroblasts (its deletion decreased expression), providing novel evidence that genetic variation affecting TGF-β signaling contributes to the genetic predisposition to emphysema.

Results

Validation of LHE clinical and genetic associations

In subjects from the COPDGene Study, we have previously demonstrated that LHE measures are associated with COPD-related phenotypes (Castaldi et al., 2013) and with common genetic variants at genome-wide significance (Castaldi et al., 2014). To confirm these associations in an independent cohort and discover new genetic associations, we generated new LHE measures in 1519 subjects from the ECLIPSE Study, and we replicated the previously observed relationships between LHE pattern and GOLD (Global Initiative for Chronic Obstructive Lung Disease) spirometric grade (Figure 1). In the combined GWAS meta-analysis, we identified 10 independent regions with genome-wide significant associations to at least one LHE phenotype, six of which had been previously described (Table 1). One of the four novel associations is rs28929474, the pathogenic Glu→Lys substitution in SERPINA1 which is known to be associated with COPD. There was no evidence of systematic inflation in the QQ-plots of these GWAS (Figure 2). Subject characteristics are shown in Supplementary file 1 Table 1, and complete results by cohort are shown in Supplementary file 1 Table 2.
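A QQ plot is often summarized numerically by the genomic inflation factor (lambda), the ratio of the observed median association chi-square statistic to its expected value under the null. The text above reports only visual inspection of QQ plots, so the following is a minimal sketch of that companion statistic and not the authors' procedure. The function name and the example p-values are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation_factor(p_values):
    """Return lambda = median(observed chi-square) / median(null chi-square)."""
    std_normal = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic via z^2,
    # where z is the normal quantile at p / 2.
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Hypothetical p-values: an evenly spaced grid mimics a well-calibrated null.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
lambda_null = genomic_inflation_factor(null_p)

# Squaring the p-values makes them systematically too small (inflated).
lambda_inflated = genomic_inflation_factor([p ** 2 for p in null_p])
```

A value of lambda close to 1 is consistent with the absence of systematic inflation. Values well above 1 would suggest confounding such as unmodeled population stratification.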

Percentage of each LHE-based emphysema pattern by Global Initiative for Chronic Obstructive Lung Disease (GOLD) stage in ECLIPSE.

NE = Non-emphysematous lung. CL1 = Mild centrilobular. CL2 = Moderate centrilobular. CL3 = Severe centrilobular. PL = Panlobular. PS = Paraseptal. %LAA-950 = emphysema based on −950 Hounsfield unit threshold.

https://doi.org/10.7554/eLife.42720.003
QQ plots for GWAS meta-analyses of non-transformed LHE phenotypes.

(A) Nonemphysematous lung. (B) Mild centrilobular pattern. (C) Moderate centrilobular. (D) Severe centrilobular. (E) Panlobular.

https://doi.org/10.7554/eLife.42720.004
Table 1
Genome-wide significant meta-analysis results for local histogram emphysema phenotypes.
https://doi.org/10.7554/eLife.42720.005
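LHE pattern | Lead SNP | Chromosome | Position (hg19) | EAF | Effect allele | P value meta | Effect meta | SE effect meta
Moderate Centrilobular | rs56077333 | 15 | 78899003 | 0.67 | a | 2.2E-13 | 0.016 | 2.2E-03
Moderate Centrilobular | rs17368582 | 11 | 102738075 | 0.14 | t | 8.1E-12 | 0.024 | 3.5E-03
Moderate Centrilobular | rs56113850 | 19 | 41353107 | 0.59 | t | 1.5E-09 | −0.016 | 2.6E-03
Moderate Centrilobular | rs796395 | 1 | 218681971 | 0.52 | a | 6.1E-09 | 0.013 | 2.2E-03
Moderate Centrilobular | rs138641402 | 4 | 145445779 | 0.38 | a | 3.3E-08 | 0.014 | 2.5E-03
Nonemphysematous (Normal Lung) | rs138641402 | 4 | 145445779 | 0.38 | a | 4.2E-08 | −0.021 | 3.9E-03
Nonemphysematous (Normal Lung) | rs7170068 | 15 | 78912943 | 0.78 | a | 4.8E-12 | 0.028 | 4.0E-03
Nonemphysematous (Normal Lung) | rs17368659 | 11 | 102742761 | 0.86 | t | 6.3E-11 | 0.036 | 5.4E-03
Nonemphysematous (Normal Lung) | rs28929474* | 14 | 94844947 | 0.98 | t | 6.2E-09 | −0.071 | 1.2E-02
Panlobular | rs11852372 | 15 | 78801394 | 0.35 | a | 7.1E-12 | −0.003 | 5.0E-04
Panlobular | rs76756075* | 11 | 112349844 | 0.02 | t | 2.0E-08 | −0.007 | 1.2E-03
Panlobular | rs78070126* | 1 or 11 | 16574608 or 6574608 | 0.97 | t | 2.8E-08 | 0.006 | 1.1E-03
Panlobular | rs145770770 | 2 | 152487808 | 0.99 | a | 3.8E-08 | −0.015 | 2.8E-03
Severe Centrilobular | rs9788721 | 15 | 78802869 | 0.38 | t | 3.4E-15 | −0.005 | 6.0E-04
Severe Centrilobular | rs379123 | 17 | 30891814 | 0.40 | t | 3.7E-08 | −0.004 | 7.0E-04
LHE - local histogram emphysema.
EAF - effect allele frequency in 1000 Genomes CEU population.
*indicates novel association not previously associated in GWAS of COPD or emphysema (rs28929474 was associated to FEV1/FVC in smokers in Li et al., 2018, during preparation of this manuscript).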

Since the more severe emphysema patterns (severe centrilobular and panlobular emphysema) are non-normally distributed, we performed a sensitivity analysis for these top results after performing inverse normal transformation of the LHE pattern phenotypes (Supplementary file 1 Table 3). In this analysis, four loci remained genome-wide significant (loci on chromosomes 15, 14, 11, and 1), two loci had p-values<5×10⁻⁷, and four associations to the panlobular and severe centrilobular patterns were notably less significant, suggesting that these specific associations are driven by extreme phenotype values and should be interpreted with caution.
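An inverse normal transformation replaces each phenotype value with the standard normal quantile of its rank, which limits the leverage of extreme values. The sketch below is one common rank-based variant. The Blom offset of 3/8, the average-rank handling of ties, and the example values are assumptions for illustration, since the exact variant used in the study is not specified here.

```python
from statistics import NormalDist

def inverse_normal_transform(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform (Blom offset c = 3/8 by default)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values and assign each the average 1-based rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Map ranks to probabilities strictly inside (0, 1), then to normal quantiles.
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]

# Hypothetical right-skewed percentages of one emphysema pattern.
skewed = [0.1, 0.4, 0.2, 12.5, 0.9]
transformed = inverse_normal_transform(skewed)
```

The transformed values keep the original ordering, but the outlying value (12.5) ends up no farther from the center than the smallest value does, which is why associations driven by a few extreme subjects tend to weaken after transformation.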

With regard to the association with the SERPINA1 Z-allele (rs28929474), subjects with known alpha-1 antitrypsin deficiency had been excluded from our primary analysis. However, when we examined the imputed genotypes of rs28929474, we identified six individuals in ECLIPSE with imputed PiZZ genotypes. When we repeated the genetic analysis without these subjects, there was an increase in association p-value in ECLIPSE (0.003 versus 0.0004, consistent direction of effect), and the meta-analysis association p-value was 1.6 × 10⁻⁷.

To determine whether these variants were associated with other COPD-related phenotypes, we queried the LHE GWAS significant associations against the results from two recent large GWAS studies for FEV1, FEV1/FVC, and COPD status (Shrine et al., 2019; Sakornsakolpat et al., 2019). Five of the 10 LHE loci (lead variants rs56113850, rs796395, rs17368659, rs145770770, and the 15q25 locus) were associated to at least one of these outcomes at p<0.05 with a consistent direction of effect (Supplementary file 1 Table 4).

Some loci associated with COPD and related phenotypes have also been associated with smoking behavior, raising the question of whether the COPD associations at these loci are mediated through smoking. To determine how many of our associations were also associated to smoking behavior, we queried our results against the UK Biobank Pheweb server GWAS for prior history of smoking, and we observed that the only associations that were nominally associated to smoking were the previously known smoking associations in the 15q25 and 19q13 loci (Supplementary file 1 Table 5).

eQTL colocalization analysis to identify candidate GWAS target genes and tissue enrichment of LHE GWAS signals

To generate functional hypotheses for emphysema-associated loci and prioritize regions for further functional study, we integrated our GWAS results with large-scale genome-wide eQTL and cell type epigenomic data, as shown in Figure 3. To identify emphysema-associated loci that overlap with eQTL signals from multiple tissues, we cross-referenced our LHE GWAS results against eQTL results from 44 GTEx tissues and blood eQTLs from COPDGene. Since overlap between GWAS and eQTL signals can be due to chance, we used a Bayesian colocalization method (Giambartolomei et al., 2014) to quantify the probability that the local GWAS and eQTL signals were attributable to a shared causal variant. Four genome-wide significant LHE regions overlapped with eQTL regions with an estimated >80% probability of a shared causal variant responsible for the GWAS and eQTL associations (Table 2).

Overview of integrative analyses to prioritize genome-wide significant emphysema-associated loci for functional studies.
https://doi.org/10.7554/eLife.42720.006
Table 2
Genomewide significant LHE loci that colocalize with eQTL.
https://doi.org/10.7554/eLife.42720.007
GWAS | SNP | CHR | POS | Tissue | GENE
Moderate centrilobular | rs796395 | 1 | 218681971 | Fibroblasts | TGFB2
Moderate centrilobular | rs56077333 | 15 | 78899003 | Fibroblasts, Testis | PSMA4, CHRNA5
Moderate centrilobular, normal | rs56113850 | 19 | 41353107 | Lung, Testis | CYP2A6, AKT2
Panlobular | rs11852372 | 15 | 78801394 | Testis | CHRNA5
All loci have colocalization probability > 80%, reflecting the estimated probability that the GWAS and eQTL association signals arise from a shared causal variant.
LHE pattern – local histogram emphysema phenotype used for GWAS.
SNP - lead GWAS variant in locus.
Tissue - tissue of origin for gene expression data in eQTL analysis.
Gene – eQTL targeted gene in specified GTEx tissue.

To identify additional candidate colocalization loci that may be present below the stringent genome-wide significance threshold, we studied SNPs with a GWAS p<5×10⁻⁵. At this threshold, the number of GWAS-eQTL overlap loci ranged from 78 (panlobular pattern) to 159 (moderate centrilobular pattern), representing between 15% and 33% of the total number of loci with a GWAS p<5×10⁻⁵. Of these loci, 32 had a >80% estimated probability of having a shared causal GWAS-eQTL variant, and we identified the genes whose expression levels are altered by these loci (Supplementary file 1 Table 6). Full results of this analysis are available at https://cdnm.shinyapps.io/lhemphysema_eqtlcolocalization/.

To test for tissue-specific enrichment of LHE GWAS signals, we quantified the enrichment of LHE GWAS regions associated at p<5×10⁻⁵ in DNaseI peak regions from ENCODE and Roadmap cell types using the Garfield method (Iotchkova et al., 2019). The most commonly enriched cell types were fibroblasts and fetal lung tissue, as can be seen in the enrichment results for moderate centrilobular emphysema (Figure 4). Out of 424 tested cell type annotations, there were 15, 25, and 1 cell type that exceeded the significance threshold for the moderate centrilobular, nonemphysematous, and severe centrilobular LHE phenotypes, respectively (Supplementary file 1 Table 7).

Cell type and tissue enrichment for moderate centrilobular emphysema GWAS signals.

Using Garfield (Iotchkova et al., 2019) for enrichment analysis, we tested the enrichment of moderate centrilobular GWAS loci (harboring associations at p<5×10⁻⁵) in DNaseI peaks from 424 cell lines and cell types in ENCODE and Roadmap. Significant enrichments were observed in fetal lung, fetal stomach, and multiple fibroblast cell types. These significant enrichments are denoted by colored dots located just inside the boundary of the circle of cell types.

https://doi.org/10.7554/eLife.42720.008

Fine mapping identifies a candidate causal variant in the TGFB2 locus

One of the top GWAS-eQTL colocalization signals associated with the moderate centrilobular emphysema pattern spans a 200 kb region that includes the 3’ UTR of TGFB2 and extends 100 kb downstream. Multiple SNPs in this region were significantly associated with TGFB2 expression in human tissues from the GTEx project (Figure 5) with the highest colocalization present with the eQTL signal in cultured fibroblasts. Given the essential roles of TGF-β signaling and fibroblasts in lung repair pathways, we selected this locus for further investigation.

The locus zoom plot of GWAS p-values suggests two independent associations (Panel A), and the GWAS signal colocalizes with an eQTL signal in fibroblasts from GTEx (Panel B).

rs1690789 is located at one of these GWAS association peaks and lies within a context-specific DNaseI peak (Panel C). GM12878, H1hESC, and K562 cell lines are shown for reference, and the remaining cell types are those with DNaseI peaks that overlap rs1690789. Raw DNaseI data from only one experimental replicate are shown. GM12878 = lymphoblastoid cell line. H1hESC = human embryonic stem cell. K562 = leukemia cell line. AG0449-AG10803 refer to fibroblasts from different subjects and sampling sites. HGF = gingival fibroblasts. HMVECLLy = lung-derived microvascular endothelial cells. HPdLF = periodontal ligament fibroblasts. HRPEpiC = retinal pigment epithelial cells. HSMM = skeletal muscle myoblasts. NHDF = dermal fibroblasts. NHLF = lung fibroblasts.

https://doi.org/10.7554/eLife.42720.009

To confirm the colocalization results for TGFB2, we performed a separate colocalization analysis using the same eQTL data but a different colocalization methodology, Sherlock (He et al., 2013). Sherlock analysis for the moderate centrilobular GWAS results and GTEx eQTL data from fibroblasts, lung tissue, and whole blood confirmed TGFB2 as a colocalization target for moderate centrilobular emphysema in fibroblasts, and a total of nine colocalizing genes or transcripts were identified at a p-value<1×10⁻⁴ (Supplementary file 1 Table 8).

The GWAS signal in this region appears to demonstrate two independent peaks of association spanning a recombination hotspot, with the fibroblast eQTL signal appearing to colocalize with only one of these signals. We performed conditional genetic association analysis of this region, confirming the presence of two independent signals (secondary association lead SNP rs3009942 p=4.4×10⁻⁷, Figure 6). To confirm that these are independent signals, we also performed conditional association adjusting for rs3009942, which minimally attenuated the primary association (rs796395 conditional p-value=3.3×10⁻⁷).

Secondary association for moderate centrilobular emphysema at the TGFB2 locus.

Results from genetic association meta-analysis conditioned on the lead SNP (rs796395) at this region.

https://doi.org/10.7554/eLife.42720.010

Focusing on the primary association peak which colocalized with the fibroblast eQTL signal, we estimated the causal probability (i.e. the likelihood that each individual SNP is the causal variant) of each SNP in this region using the PICS method (Farh et al., 2015), identifying seven variants each with a > 5% estimated likelihood to be causal (Supplementary file 1 Table 9). We then queried whether any of these seven SNPs were predicted to alter transcription factor occupancy using the results of a previously published model developed from ENCODE data (Maurano et al., 2015), identifying rs1690789 (minor allele frequency of 0.48 in 1000 Genomes EUR population) as the only variant in this set predicted to have allele-specific effects on transcription factor occupancy.

Analysis of DNaseI accessibility near rs1690789 across various cell types in publicly available data

Using the ENCODE uniformly processed DNaseI hypersensitivity dataset of 125 cell types, we observed that rs1690789 lies within a DNaseI hypersensitivity peak identified in 13 cell types (Figure 5, Panel C). Eight of these 13 cell types were fibroblasts, although this peak was not universally detected in all fibroblast DNaseI experiments, suggesting that this may be a context-specific regulatory element or that DNaseI accessibility may be influenced by genetic variation in these cell types.

Chromatin interaction between GWAS peak regions and the TGFB2 promoter

Since rs1690789 is located ~200 kb from the transcription start site of TGFB2, we hypothesized that this region may regulate TGFB2 expression via a long-range chromatin interaction. Using publicly available 4C-Seq chromatin conformation data from IMR90 human lung fibroblasts (Rao et al., 2014), we observed that the 10 kb region containing rs1690789 contacts multiple upstream and downstream regions around TGFB2 (Figure 7A), suggesting that this region is a hotspot of chromosomal interaction.

Publicly available chromatin conformation capture (4C) results in IMR90 cells show multiple peaks of interaction for the 10 kb region containing the context-specific regulatory element around rs1690789 (Panel A – blue spikes in top figure indicate regions of high interaction frequency and light blue curved lines in lower figure indicate chromosomal interactions with the 10 kb region containing rs1690789).

Newly generated 3C assays in IMR90 fibroblasts verify the interaction between the region containing rs1690789 and TGFB2 promoter (Panel B). Primer sequences are listed in Supplementary file 1 Table 10.

https://doi.org/10.7554/eLife.42720.011

To confirm whether the rs1690789 region indeed interacts with the promoter of TGFB2 in lung fibroblasts, we performed chromatin conformation capture (3C) experiments in human lung fibroblasts (IMR90). Using the TGFB2 promoter region as the anchor region, we detected interaction between the rs1690789-containing region and the TGFB2 promoter in lung fibroblasts (Figure 7B), suggesting long-range regulation of TGFB2 by the region containing rs1690789.

Deletion of the rs1690789 region alters TGFB2 expression in lung fibroblasts

To determine whether the DNA region near rs1690789 has regulatory effects on the expression of TGFB2 in human lung fibroblasts in the endogenous genomic context, we generated CRISPR/Cas-9 constructs containing gRNA pairs targeting the ~100 bp region spanning rs1690789 (Figure 8A) to generate genomic deletions in normal primary human lung fibroblasts. With sufficient deletion efficiency of the region spanning rs1690789, we detected reduced expression of TGFB2 (Figure 8B and C, Supplementary file 1 Table 11), indicating that this distal genomic region has regulatory effects on the expression of TGFB2 in normal primary human lung fibroblasts.

Regional knockout of rs1690789 in primary human lung fibroblasts using CRISPR/Cas9 editing.

Three pairs of sgRNAs were applied to delete a DNA region of ~100 bp spanning rs1690789 (A). The expression of TGFB2 is downregulated in rs1690789 knockout lung fibroblast cells with qPCR quantification, n = 4 (B). The editing efficiency is examined to confirm the effect of CRISPR/Cas9 regional knockout, n = 4 (C). *p<0.05 compared to control by unpaired t test.

https://doi.org/10.7554/eLife.42720.012

Discussion

Previous GWAS have demonstrated that common genetic variation contributes to emphysema (Cho et al., 2015), likely through the perturbation of gene regulatory mechanisms (Castaldi et al., 2014). In order to identify putative causal variants and regulatory mechanisms for these loci, we used a screening approach that leverages large compendia of gene regulatory information in the GTEx and ENCODE projects. Using Bayesian colocalization, we identified 32 emphysema-associated loci at p<5×10⁻⁵ where it is likely that colocalized GWAS and eQTL signals arise from the same causal variant. It should be noted that these are putative and not confirmed disease variants due to our use of a relaxed GWAS significance threshold and the inherent complexities of colocalization, which continues to be an area of active methodological development. For the genome-wide significant locus near TGFB2, multiple sources of publicly available and newly generated experimental data link a functional variant, rs1690789, to TGFB2 expression in fibroblasts. These data suggest that naturally occurring genetic variability in TGF-β signaling plays a causal role in the development of emphysema.

The TGF-β family of proteins constitutes a set of highly conserved signaling pathways that play a key role in human development and many other cellular functions (Huminiecki et al., 2009; Massagué, 2012). With respect to the lung, TGF-β family proteins participate in normal lung development and are dysregulated in COPD, emphysema, asthma, and pulmonary fibrosis (Verhamme et al., 2015; Morris et al., 2003; Thomas et al., 2016). Genetic variants near TGF-β superfamily members TGFB2 (Castaldi et al., 2014; Cho et al., 2014), ACVR1B (Boueiz et al., 2019), LTBP4 (Wain et al., 2017), and BMP6 (Loth et al., 2014) have been identified in GWAS for lung function and COPD, but prior to this study the region near ACVR1B was the only one linked to a gene in the TGF-β pathway through functional studies (Boueiz et al., 2019). Our findings demonstrate that the emphysema-associated variant rs1690789 is located in an active gene regulatory region in human lung fibroblasts that interacts with the promoter region of TGFB2 and regulates TGFB2 expression.

These analyses highlight the genetic and gene regulatory complexity of this region. Conditional association analyses identified two independent associations with moderate centrilobular emphysema near TGFB2, and both associations are in linkage equilibrium (i.e. low linkage disequilibrium) with the lead variant identified in a previous GWAS of severe COPD (Cho et al., 2014). In addition, the region containing rs1690789 has multiple interactions with other DNA regions, including the TGFB2 promoter and other downstream regions, indicating that this is a region of active chromatin interaction in human lung fibroblasts.

While our analyses provide evidence that the emphysema-associated GWAS region downstream from TGFB2 interacts with the promoter of TGFB2 and regulates the expression of TGFB2 in human primary lung fibroblasts, many important questions remain about the function of the emphysema-associated locus near TGFB2. First, the rs1690789 variant appears to be an eQTL for expression of TGFB2 in fibroblasts, but it is also strongly associated with TGFB2 expression in thyroid tissue in GTEx with an opposite direction of effect, suggesting complex and possibly context-dependent activity of this region. This is further supported by the observation that rs1690789 lies within a DNaseI peak in some but not all fibroblasts in the ENCODE and Roadmap projects, suggesting that the regulatory element in this region may be active only in certain fibroblast subsets, under certain conditions, or that the regulatory activity of this region is influenced by common (but unmeasured) genetic variation in these cells. Additional investigations are warranted to examine the context-specific function of this region. Second, our studies do not explain the function of the secondary association signal in this region, and it is also possible that both association regions may have functional effects in other cell types that contribute to COPD susceptibility. Third, it is possible that even within a single, statistically independent association peak, there may be multiple functional variants in tight linkage disequilibrium that contribute to the emphysema-related effects of this region. Future functional screening studies of this region can address this question. Finally, gene-level functional studies will be required to characterize the functional consequences of increased and decreased TGFB2 expression on lung fibroblast function.

In summary, integrative GWAS-eQTL analysis of emphysema patterns identified 32 candidate loci with strong evidence of harboring gene regulatory variants responsible for the GWAS signal, including a locus near TGFB2. Functional investigation of the associated region near TGFB2 confirmed the presence of a functional variant, rs1690789, that likely contributes to the genetic predisposition to emphysema by regulating TGFB2 expression in fibroblasts. This region has multiple independent association signals and an extensive pattern of chromosomal interaction, indicating that additional investigations are required to fully characterize the gene regulatory activity at this locus. In addition to the association near TGFB2, we identified dozens of other high confidence regions in our colocalization analysis, indicating additional functional variants that could be identified by high-throughput functional characterization approaches such as massively parallel reporter assays or CRISPR-mediated mutagenesis.

Materials and methods

Study subjects

COPDGene

The Genetic Epidemiology of COPD Study (COPDGene, NCT00608764, www.copdgene.org) is an ongoing multicenter, longitudinal study designed to investigate the genetic and epidemiologic characteristics of COPD. The protocols for subject recruitment and data collection for the COPDGene study have been previously described (Regan et al., 2010). At baseline, COPDGene enrolled 10,192 Non-Hispanic White (NHW) and African-American (AA) subjects at 21 centers across the United States between the ages of 45 and 80 years with a minimum of 10 pack-years smoking history. Subjects represented the full spectrum of disease severity as defined by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) spirometric grading system. In addition to completing detailed questionnaires, pre- and post-bronchodilator spirometry, and volumetric computed tomography of the chest, participants provided whole blood for DNA genotyping.

Genotyping was performed by Illumina (San Diego, CA) on the HumanOmniExpress array. Subjects were excluded for missingness, heterozygosity, chromosomal aberrations, gender check, population outliers, and cryptic relatedness. Genotyping at the Z and S alleles was performed in all subjects. Subjects known or found to have alpha-1 antitrypsin deficiency were excluded. Markers were excluded based on missingness, Hardy-Weinberg P-values, and low minor allele frequency. Imputation on the COPDGene cohorts was performed using MaCH and minimac (version 2012-10-09) (Li et al., 2010; Howie et al., 2012). Reference panels for the non-Hispanic whites and African-Americans were the 1000 Genomes Phase I v3 European (EUR) and cosmopolitan reference panels, respectively.

ECLIPSE

The Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints Study (ECLIPSE; SCO104960, NCT00292552, www.eclipse-copd.com) is a longitudinal study with three-year follow-up data available for 2501 smoking subjects (2164 subjects with COPD and 337 smoking controls). The detailed study protocol and inclusion criteria have been previously published (Vestbo et al., 2008). For this analysis, 1519 subjects with COPD (defined as GOLD spirometric stages 2–4) and available CT scans were analyzed. COPD was defined by FEV1 <80% of predicted and FEV1/FVC < 0.7.
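The inclusion rule above (FEV1 below 80% of predicted and FEV1/FVC below 0.7, corresponding to GOLD spirometric stages 2 to 4) can be written as a small classifier. This is a minimal sketch: the 50% and 30% cut points separating stages 2, 3, and 4 are the standard GOLD thresholds and are not stated in the text above, and the function names are hypothetical.

```python
def gold_spirometric_grade(fev1_pct_predicted, fev1_fvc_ratio):
    """Return GOLD grade 1-4, or 0 if the fixed-ratio obstruction criterion is not met.

    Assumes the standard GOLD cut points (80%, 50%, 30% of predicted FEV1).
    """
    if fev1_fvc_ratio >= 0.7:
        return 0  # no airflow obstruction by the FEV1/FVC < 0.7 criterion
    if fev1_pct_predicted >= 80:
        return 1
    if fev1_pct_predicted >= 50:
        return 2
    if fev1_pct_predicted >= 30:
        return 3
    return 4

def meets_copd_definition(fev1_pct_predicted, fev1_fvc_ratio):
    """COPD as defined for this analysis: FEV1 < 80% predicted and FEV1/FVC < 0.7."""
    return fev1_pct_predicted < 80 and fev1_fvc_ratio < 0.7

# Hypothetical subject: FEV1 at 45% of predicted with FEV1/FVC of 0.55.
grade = gold_spirometric_grade(45.0, 0.55)
included = meets_copd_definition(45.0, 0.55)
```

Under these assumptions, every subject who meets the COPD definition falls into grade 2, 3, or 4, which matches the stage range given in the text.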

Genotyping was performed using the Illumina HumanHap 550 V3 (Illumina, San Diego, CA). Subjects and markers with a call rate of <95% were excluded. Subjects with alpha-1 antitrypsin deficiency based on serum protein levels were excluded from this analysis. Population stratification and genotype imputation were performed using the same procedures and software as described above for COPDGene. GWAS models were adjusted for age, gender, pack-years of smoking history, and genetic ancestry via principal components.

Common variant genetic association analysis of LHE measures

We performed GWAS analyses of the 5 LHE measurements separately in the three cohorts (COPDGene NHW, COPDGene AAs, and ECLIPSE, total N = 11,282 subjects, 18,383,174 SNPs imputed to the 1000 Genomes reference panel, version 3, hg19). Analysis was limited to imputed SNPs with an imputation r² > 0.3. Imputed genotypes were analyzed using the --dosage command in PLINK v1.9 (Chang et al., 2015), though for SNPs with genotyped data the observed genotypes were used. GWAS models were adjusted for age, gender, pack-years of smoking history, and genetic ancestry via principal components (Price et al., 2006). Results were meta-analyzed using the METAL (Willer et al., 2010) program using fixed effects meta-analysis with inverse variance weighting using SNP effect sizes and standard errors. We analyzed SNPs with a MAF >1%, and we meta-analyzed SNPs with results in at least two of the three cohorts.
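Fixed-effects meta-analysis with inverse variance weighting combines per-cohort effect sizes by weighting each with the reciprocal of its squared standard error. The sketch below shows that calculation for a single SNP, including the requirement of results in at least two cohorts. It illustrates the formula and is not the METAL implementation. The function name and the example estimates are hypothetical.

```python
import math

def fixed_effects_meta(cohort_results, min_cohorts=2):
    """Inverse-variance-weighted fixed-effects meta-analysis for one SNP.

    cohort_results: list of (beta, se) tuples, or None where a cohort has no result.
    Returns (beta, se, two_sided_p), or None if too few cohorts have results.
    """
    available = [r for r in cohort_results if r is not None]
    if len(available) < min_cohorts:
        return None
    weights = [1.0 / se ** 2 for _, se in available]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, available)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided normal p-value; erfc stays accurate for very large |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p

# Hypothetical per-cohort estimates for one SNP (not values from the study);
# the third cohort has no result for this SNP.
example = fixed_effects_meta([(0.018, 0.003), (0.012, 0.006), None])
```

Because the weights are inverse variances, the cohort with the smaller standard error dominates the pooled estimate, and the pooled standard error is always smaller than that of any single contributing cohort.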

CT scan acquisition and generation of LHE measures

The generation of LHE measures in COPDGene has been previously described (Castaldi et al., 2013). For the current studies, additional LHE measures were generated in ECLIPSE CT scans using the same method.

Clinical associations of LHE measures in ECLIPSE

LHE measurements have been previously associated with key COPD-related measures (e.g. spirometry, MMRC) (Castaldi et al., 2013). To test if this relationship was consistent in the measurements generated in ECLIPSE, we visualized the median percentage of each emphysema pattern by GOLD stage.

GWAS lookups of LHE significant variants in GWAS of FEV1, FEV1/FVC, COPD, and smoking behavior

For the 14 lead SNPs associated with one or more of the LHE phenotypes at genome-wide significance, we queried other COPD-related GWAS for these variants or for variants in linkage disequilibrium with them (r² > 0.8 in the 1000 Genomes EUR reference panel). The queried GWAS were published studies of FEV1 and FEV1/FVC (Shrine et al., 2019), COPD status (Sakornsakolpat et al., 2019), or history of smoking. The smoking GWAS results were obtained from the UK Biobank Pheweb server (http://pheweb.sph.umich.edu:5000/) on July 7, 2019 for the phenotype ‘20116_1: Smoking status: Previous.’

eQTL data and colocalization analysis

For colocalization and cell type enrichment analyses, GWAS SNPs significant at p < 5 × 10^-5 were considered. GTEx version 6 full results for 44 tissues were downloaded from the GTEx portal (https://www.gtexportal.org/home/datasets), and eQTLs were calculated from blood RNAseq data in 385 NHW subjects from the COPDGene study using the same methods used in the GTEx v6 analysis. Details on the generation of COPDGene RNAseq data have been previously described (Parker et al., 2017). GWAS-eQTL integrative analysis was performed according to the approach described in Castaldi et al. (2015). Briefly, for each set of eQTL results, SNPs with a significant cis eQTL association at a 10% FDR threshold were extracted from each of the five sets of LHE GWAS results. Q-values were calculated for each subset of GWAS SNPs separately using the qvalue package (Storey et al., 2019), and SNPs demonstrating both significant eQTL and GWAS associations were retained for subsequent analysis (i.e. eQTL-GWAS SNPs). Within each set of eQTL-GWAS SNPs, association regions for colocalization were defined by selecting all SNPs within 250 kilobases (kb) of each independent GWAS association. Colocalization of the GWAS and eQTL signals in these regions was assessed using the Bayesian colocalization method implemented in the R package coloc (Giambartolomei et al., 2014), with the default prior probabilities of a SNP being associated with target gene expression, with the GWAS phenotype, and with both (1 × 10^-4, 1 × 10^-4, and 1 × 10^-5, respectively).
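
To make the role of the three priors concrete, the sketch below re-implements the approximate-Bayes-factor calculation that underlies this style of colocalization (Giambartolomei et al., 2014) in plain Python. It is not the coloc R package and is not the code used in this study: the function names are hypothetical, the effect-size prior standard deviation of 0.15 is an assumption, and the inputs are toy values.

```python
import math

def log_sum(xs):
    """Numerically stable log(sum(exp(x))) over a list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_diff(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))

def wakefield_labf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor for one SNP (prior_sd is an assumption)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for one region.

    H3: both traits associated, distinct causal SNPs.
    H4: both traits associated, one shared causal SNP.
    """
    s1, s2 = log_sum(labf1), log_sum(labf2)
    s12 = log_sum([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                    # H0
        math.log(p1) + s1,                                      # H1
        math.log(p2) + s2,                                      # H2
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),   # H3
        math.log(p12) + s12,                                    # H4
    ]
    total = log_sum(log_h)
    return [math.exp(h - total) for h in log_h]

# Toy region of five SNPs; both traits peak at the third SNP.
se = 0.05
gwas = [wakefield_labf(b, se) for b in (0.0, 0.0, 0.3, 0.0, 0.0)]
eqtl = [wakefield_labf(b, se) for b in (0.0, 0.0, 0.3, 0.0, 0.0)]
print(coloc_posteriors(gwas, eqtl))
```

The H4 posterior is the quantity usually reported as evidence of colocalization. Because p12 is an order of magnitude smaller than p1 and p2, a shared signal needs clear support from both datasets before H4 dominates.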

To confirm the colocalization results for TGFB2, colocalization was also performed for the GWAS results for moderate centrilobular emphysema using the Sherlock method (He et al., 2013). This analysis was performed using all the moderate centrilobular GWAS results referenced against three GTEx v6 eQTL datasets (transformed fibroblasts, lung, and whole blood). The following parameter settings were used: cis eQTL significance threshold p < 0.001, trans eQTL significance threshold p < 1 × 10^-5.

Causal SNP estimation with PICS

To narrow the list of putative causal variants for the primary association near TGFB2, we used the probabilistic inference of causal SNPs algorithm (PICS) (Farh et al., 2015), which infers per-SNP causal probabilities from the strength of association of the lead SNP and linkage disequilibrium information from 1000 Genomes reference populations. The EUR reference population was used for this analysis, which was conducted via the PICS web interface (https://pubs.broadinstitute.org/pubs/finemapping/pics.php).

Identification of variants predicted to effect transcription factor occupancy

SNPs with a PICS causal probability of 5% or greater were queried against the predictions of the Contextual Analysis of Transcription Factor Occupancy (CATO) model (Maurano et al., 2015). This model was trained on deep DNaseI sequencing data from the Roadmap project to predict per-SNP effects on transcription factor (TF) occupancy, based on the predicted effect of each SNP on the binding energy of overlapping TF motifs and on a number of factors related to local genomic sequence content. SNPs exceeding a CATO score of 0.1 were considered likely to alter TF occupancy.
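
The two-step prioritization described above amounts to intersecting two thresholds. The sketch below shows that logic only; the function name `prioritize`, the record layout, and the SNP identifiers and scores are all hypothetical placeholders rather than values from this study.

```python
def prioritize(snps, pics_min=0.05, cato_min=0.1):
    """Keep SNPs with PICS probability >= pics_min and CATO score > cato_min.

    snps: list of dicts with keys "id", "pics" and "cato"; "cato" may be
    None when the model has no prediction for that SNP.
    """
    return [
        s["id"]
        for s in snps
        if s["pics"] >= pics_min
        and s["cato"] is not None
        and s["cato"] > cato_min
    ]

# Made-up example records (not real results).
candidates = [
    {"id": "snp_a", "pics": 0.20, "cato": 0.02},   # fails the CATO filter
    {"id": "snp_b", "pics": 0.08, "cato": 0.35},   # passes both filters
    {"id": "snp_c", "pics": 0.01, "cato": 0.50},   # fails the PICS filter
    {"id": "snp_d", "pics": 0.06, "cato": None},   # no CATO prediction
]
print(prioritize(candidates))
```

Requiring both criteria yields a short list for functional follow-up, at the cost of possibly missing functional variants that fail one of the two filters.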

Cell type and cell line GWAS enrichment analysis with GARFIELD

To determine whether LHE GWAS associations were enriched in gene regulatory annotations from ENCODE and Roadmap Epigenomics data, we performed enrichment analysis for the LHE phenotypes with genome-wide significant results using the GARFIELD program and its pre-processed epigenomic annotations (Iotchkova et al., 2019). The GWAS significance threshold was set at p < 5 × 10^-5, and the default parameters were used for LD pruning (r² > 0.1), LD proxy threshold (r² > 0.8), minor allele frequency binning (five bins), LD tag binning (five bins), and TSS distance binning (five bins). The enrichment significance threshold was set at p < 0.0001, corresponding to Bonferroni adjustment for the effective number of independent annotations.

Overlap of rs1690789 with cell-specific DNaseI peaks

Imputed DNaseI hypersensitivity peaks from Roadmap Epigenomics cell types or cell lines (Ernst and Kellis, 2015) were downloaded from http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidatedImputed/narrowPeak/. The overlap of rs1690789 with DNaseI peaks and enhancer marks was identified using the GoShifter program (Trynka et al., 2015), and the raw DNaseI data for these cell types was visualized using the UCSC Genome browser.

Cell culture

IMR-90 fibroblasts were purchased from ATCC and cultured in Eagle's Minimal Essential Medium (EMEM, #12–611F, Lonza) supplemented with 10% fetal bovine serum, penicillin and streptomycin. The cells tested negative for mycoplasma by MycoAlert Detection Kit (#LT07-418, Lonza). Primary human lung fibroblast cells were isolated from the lung tissue of healthy individuals (Marsico Lung Institute, University of North Carolina at Chapel Hill, North Carolina) as previously described (Fulcher et al., 2005). Briefly, lung tissue samples were cut into small pieces and seeded onto culture dishes supplemented with DMEM/F12 medium, 10% fetal bovine serum, penicillin, streptomycin, amphotericin B and gentamicin. Amphotericin B and gentamicin were removed from the medium after the cells were passaged. The primary human lung fibroblasts were passaged twice and grown to 90% confluence prior to subsequent experiments. Human lung tissue was obtained under protocol #03–1396 approved by the University of North Carolina at Chapel Hill Biomedical Institutional Review Board.

4C data in IMR90 cell lines

4C chromosome conformation interaction results from the paper by Rao et al. (2014) were queried from the Yue Lab public website (http://promoter.bx.psu.edu/) using the following search parameters: Species = human, Assembly = hg19, Tissue = IMR90, Type = Lieberman VC-norm, Resolution = 10 kb, SNP = rs1690789, Extended Region = 500 kb.

Chromatin conformation capture assay (3C)-PCR

IMR-90 human lung fibroblasts were cultured to 80% confluency, cross-linked, lysed, and digested with BglII overnight. DNA fragments were then ligated with T4 ligase (New England Biolabs, #M0202L) for 6 hr at 16°C. After purification, 3C templates were used for PCR detection with unidirectional primers. Specific chromatin interactions were indicated by comparing relative band intensity from targeted regions against negative and positive control regions, with three technical replicates (i.e. same 3C templates, multiple PCR repeats). Primer sequences used for 3C-PCR are listed in Supplementary file 1 Table 10. A detailed description of our methods has been published previously (Zhou et al., 2012).

CRISPR/Cas9 rs1690789 knockout

To generate the rs1690789 CRISPR/Cas9 regional knockout primary human lung fibroblast cells, two guide RNAs located upstream from the SNP (u1 forward: 5’-GATACTCCAGTACATTGAGAAGG-3’; u2 forward: 5’-TGGAGTATCATTTCAGTGTTAGG-3’) and two guide RNAs located downstream from the SNP (d1 forward: 5’-CAGCAGCGAGTTTGGCACTCAGG-3’; d2 forward: 5’-TGTCTCATTGCACACTCATGGGG-3’) were individually cloned into pSpCas9(BB)-2A-Puro (PX459) V2.0 vectors (Addgene plasmid #62988). Plasmids were verified by DNA sequencing. FuGENE HD was used to transfect three pairs of gRNA plasmids (u1 and d1, u1 and d2, u2 and d2) into primary normal human lung fibroblast (NHLF) cells according to the manufacturer’s instructions. PX459 empty vectors were transfected as a control. Forty-eight hours after transfection, cells were selected with 1.2 µg/mL puromycin. After 2–3 weeks of recovery and expansion, cells were collected for DNA and RNA extraction and qPCR. Four biological replicates were performed (i.e. same donor, four different transfections).
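
As a quick consistency check on the guide sequences listed above, the sketch below splits each 23-nucleotide sequence into a 20-nucleotide protospacer and a 3-nucleotide PAM. It assumes, as is conventional for SpCas9 but not stated explicitly in the text, that each listed sequence is written as protospacer followed by an NGG PAM; the function name `split_guide` is hypothetical.

```python
import re

# Guide sequences as listed in the text (5' to 3').
GUIDES = {
    "u1": "GATACTCCAGTACATTGAGAAGG",
    "u2": "TGGAGTATCATTTCAGTGTTAGG",
    "d1": "CAGCAGCGAGTTTGGCACTCAGG",
    "d2": "TGTCTCATTGCACACTCATGGGG",
}

def split_guide(seq):
    """Return (protospacer, pam) for a 23-nt sequence ending in an NGG PAM."""
    seq = seq.strip().upper()
    # 20-nt protospacer followed by N-G-G (assumed SpCas9 layout).
    if not re.fullmatch(r"[ACGT]{20}[ACGT]GG", seq):
        raise ValueError(f"not a 20-nt protospacer followed by NGG: {seq}")
    return seq[:20], seq[20:]

for name, seq in GUIDES.items():
    protospacer, pam = split_guide(seq)
    print(name, protospacer, pam)
```

Under that assumption, all four sequences parse cleanly, with PAMs AGG, AGG, AGG, and GGG.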

Assessment of CRISPR/Cas9 editing efficiency

DNA samples from human lung fibroblast cells were extracted using QuickExtract DNA Extraction solution (#QE0905T, Lucigen, WI) following the manufacturer’s instructions. SYBRGreen dye-based quantitative RT-PCR was performed using the same equipment and analysis method described under ‘Gene expression measurements by RT-PCR’, with the following primers to assess editing efficiency (forward: 5’-GTTACCGATGCTTAAATGCCAC-3’; reverse: 5’-AGAATATCCCCATGAGTGTGC-3’). Cells transfected with PX459 empty vector served as the control.

Gene expression measurements by RT-PCR

Human lung fibroblast cell RNA was extracted using RNeasy Mini Kit (#74106, Qiagen, MD), and reverse transcription was performed using the High-Capacity cDNA Reverse Transcription Kit (#4374966, Applied Biosystems, MA). Quantitative RT-PCR was performed on a QuantStudio 12K Flex Real-Time PCR System (Applied Biosystems) with gene-specific TaqMan probes (Hs.PT.58.24824921) from IDT (Integrated DNA Technologies, IA) for detecting TGFB2 expression. Relative gene expression level was calculated based on the standard 2^(−ΔΔCT) method, using GAPDH as a reference gene. For both the TGFB2 expression and editing efficiency tests, qPCR values were normalized against the mean qPCR value for the control cells for each experiment. Comparisons were performed using unpaired t-tests.
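
The 2^(−ΔΔCT) calculation can be written out as below. This is a generic sketch of the standard formula rather than the analysis script used here; the function name `relative_expression` is hypothetical and the Ct values are illustrative only.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^(-ddCt) method.

    ct_target / ct_ref: Ct of the target gene (e.g. TGFB2) and the reference
    gene (e.g. GAPDH) in the edited sample.
    ct_target_ctrl / ct_ref_ctrl: the same two Ct values in the control sample.
    """
    delta_sample = ct_target - ct_ref             # dCt in the edited sample
    delta_control = ct_target_ctrl - ct_ref_ctrl  # dCt in the control sample
    return 2.0 ** -(delta_sample - delta_control)

# Illustrative Ct values: the target amplifies one cycle later in the
# edited sample, corresponding to half the control expression level.
print(relative_expression(26.0, 18.0, 25.0, 18.0))
```

A value of 1.0 means expression equal to the control, and each additional cycle of delay in the target gene halves the estimated expression.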

Study approval

Written, informed consent was obtained for all participants, and all study and consent forms were approved by the institutional review boards of the participating institutions.

References

  20. DW Loth, M Soler Artigas, SA Gharib, LV Wain, N Franceschini, B Koch, TD Pottinger, AV Smith, Q Duan, C Oldmeadow, MK Lee, DP Strachan, AL James, JE Huffman, V Vitart, A Ramasamy, NJ Wareham, J Kaprio, XQ Wang, H Trochet, M Kähönen, C Flexeder, E Albrecht, LM Lopez, K de Jong, B Thyagarajan, AC Alves, S Enroth, E Omenaas, PK Joshi, T Fall, A Viñuela, LJ Launer, LR Loehr, M Fornage, G Li, JB Wilk, W Tang, A Manichaikul, L Lahousse, TB Harris, KE North, AR Rudnicka, J Hui, X Gu, T Lumley, AF Wright, ND Hastie, S Campbell, R Kumar, I Pin, RA Scott, KH Pietiläinen, I Surakka, Y Liu, EG Holliday, H Schulz, J Heinrich, G Davies, JM Vonk, M Wojczynski, A Pouta, A Johansson, SH Wild, E Ingelsson, F Rivadeneira, H Völzke, PG Hysi, G Eiriksdottir, AC Morrison, JI Rotter, W Gao, DS Postma, WB White, SS Rich, A Hofman, T Aspelund, D Couper, LJ Smith, BM Psaty, K Lohman, EG Burchard, AG Uitterlinden, M Garcia, BR Joubert, WL McArdle, AB Musk, N Hansel, SR Heckbert, L Zgaga, JB van Meurs, P Navarro, I Rudan, YM Oh, S Redline, DL Jarvis, JH Zhao, T Rantanen, GT O'Connor, S Ripatti, RJ Scott, S Karrasch, H Grallert, NC Gaddis, JM Starr, C Wijmenga, RL Minster, DJ Lederer, J Pekkanen, U Gyllensten, H Campbell, AP Morris, S Gläser, CJ Hammond, KM Burkart, J Beilby, SB Kritchevsky, V Gudnason, DB Hancock, OD Williams, O Polasek, T Zemunik, I Kolcic, MF Petrini, M Wjst, WJ Kim, DJ Porteous, G Scotland, BH Smith, A Viljanen, M Heliövaara, JR Attia, I Sayers, R Hampel, C Gieger, IJ Deary, HM Boezen, A Newman, MR Jarvelin, JF Wilson, L Lind, BH Stricker, A Teumer, TD Spector, E Melén, MJ Peters, LA Lange, RG Barr, KR Bracke, FM Verhamme, J Sung, PS Hiemstra, PA Cassano, A Sood, C Hayward, J Dupuis, IP Hall, GG Brusselle, MD Tobin, SJ London (2014) Genome-wide association analysis identifies six new loci associated with forced vital capacity. Nature Genetics 46:669–677. https://doi.org/10.1038/ng.3011
  21. J Massagué (2012) TGFβ signalling in context. Nature Reviews Molecular Cell Biology 13:616–630. https://doi.org/10.1038/nrm3434

Decision letter

  1. Andrew P Morris
    Reviewing Editor; University of Liverpool, United Kingdom
  2. Mark I McCarthy
    Senior Editor; University of Oxford, United Kingdom
  3. Andrew P Morris
    Reviewer; University of Liverpool, United Kingdom
  4. Louise Wain
    Reviewer; University of Leicester, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Identification of an emphysema-associated genetic variant near TGFB2 with regulatory effects in lung fibroblasts" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Andrew P Morris as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Mark McCarthy as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Louise Wain (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The reviewers appreciated the work in bringing together the largest GWAS of emphysema to date, and agreed that the integration of GWAS findings with eQTL resources and additional epigenomic data was a good demonstration of how to move from locus to causal variant and effector gene. However, the reviewers were concerned that the association signal at the TGFB2 locus was driven by just COPDGene NHW, with no evidence of association in COPDGene AA or ECLIPSE.

Essential revisions:

1) Independent replication of the emphysema association signal at the TGFB2 locus is essential. Ideally, this would be in an additional study of emphysema, although supporting evidence from related phenotypes (e.g. COPD or lung function) would be an alternative.

2) More details are needed for the colocalization part of the work. As of now, only the reference for the approach is noted. It would be helpful to know the thresholds used and the results they observed. In addition, there are many different colocalization approaches and they tend to not always agree. We ask that the authors assess whether the results are robust to different colocalization methods.

3) The provision of the full results of the eQTL colocalisation analysis is commendable and a potentially useful resource for the community. However, the authors should provide the caveat that at P < 5 × 10^-5 there are likely to be many false positive emphysema GWAS associations and so replication of the GWAS results should be sought prior to embarking on pursuit of the genes implicated by this analysis.

4) The associations on chromosome 15 (CHRNA3/5 locus) and chromosome 19 (CYP2A6) reflect an effect on lung function via smoking behaviour and thus point primarily to addiction pathways. Accepting that it is difficult to entirely adjust for smoking in these analyses, the authors could comment on this and perhaps report (through comparison with published smoking GWAS data) which of the signals might be driven by smoking.

5) The association with the SERPINA1 Z allele is of interest but suggests that there might be individuals with alpha-1 antitrypsin deficiency amongst the cases. This information should be provided.

6) The authors mentioned that they used PICS to derive "likelihoods" that variants are causal. It would be useful to have a brief description of this approach in the Materials and methods. What is the likelihood of the chosen SNP versus the other six? What is the motivation for the 5% threshold? A more usual approach would be to build a 99% credible set, and then interrogate those variants instead. Posterior probabilities should be provided for variants considered in downstream interrogation.

7) For enrichments, the authors use 125 cell types from the Roadmap Epigenomics segmentations (subsection “Overlap of LHE SNPs with Epigenomic Marks in Roadmap Epigenomics Cell Types”) but then state that enhancers were defined by collapsing states 13-18 from Ernst and Kellis, 2015. However, Ernst and Kellis, 2015, does not describe the Roadmap Epigenomics chromatin states, so the reviewers were confused about how the enhancer states were actually generated.

8) The GoShifter approach has been shown to be suboptimal for calculating enrichments (Iotchkova et al., 2019) and we would instead recommend running GARFIELD, which has built-in Roadmap and ENCODE annotations. This would strengthen the enrichment results.

9) In the Discussion the authors note that the underlying regulatory element is called in some but not all fibroblasts data sets and hypothesize that this could represent restricted activity to some subset of fibroblasts, or some subset of conditions. If this is truly the causal variant, then couldn't the element call be a function of sample genotype too? And, thus if the ENCODE/Roadmap/etc. sample did not have the "active" genotype, a peak call might not be observable. We would encourage the authors to add this as a possible interpretation, unless there is justified motivation to report otherwise.

https://doi.org/10.7554/eLife.42720.022

Author response

Essential revisions:

1) Independent replication of the emphysema association signal at the TGFB2 locus is essential. Ideally, this would be in an additional study of emphysema, although supporting evidence from related phenotypes (e.g. COPD or lung function) would be an alternative.

We have included new results reporting the significance and direction of effect for all of our genome-wide significant associations in relation to FEV1, FEV1/FVC, and COPD status. These results were queried from the two largest GWAS meta-analyses, by Shrine (spirometry measures) and Sakornsakolpat (COPD), which were both published this year in Nature Genetics (Table 4 in Supplementary file 1, Introduction, and subsection “GWAS Lookups of LHE Significant Variants in GWAS of FEV1, FEV1/FVC, COPD, and Smoking Behavior”).

2) More details are needed for the colocalization part of the work. As of now, only the reference for the approach is noted. It would be helpful to know the thresholds used and the results they observed. In addition, there are many different colocalization approaches and they tend to not always agree. We ask that the authors assess whether the results are robust to different colocalization methods.

We have clarified in the Materials and methods section the relevant parameters used for the colocalization analysis (subsection “eQTL Data and Colocalization Analysis”). To address the issue of the robustness of key colocalization results, we performed a separate colocalization analysis for the moderate centrilobular GWAS associations using the Sherlock method (Table 8 in Supplementary file 1, subsection “Validation of LHE Clinical and Genetic Associations”, and, subsection “Identification of Variants Predicted to Effect Transcription Factor Occupancy”). This analysis also identified TGFB2 as a colocalizing gene for this phenotype in GTEx fibroblasts expression data. We agree that colocalization results can often be parameter and method dependent and that functional validation of colocalization results is necessary. We have added text to the Discussion to make this point clear.

3) The provision of the full results of the eQTL colocalisation analysis is commendable and a potentially useful resource for the community. However, the authors should provide the caveat that at P < 5 × 10^-5 there are likely to be many false positive emphysema GWAS associations and so replication of the GWAS results should be sought prior to embarking on pursuit of the genes implicated by this analysis.

We agree and this caveat has been added to the Discussion.

4) The associations on chromosome 15 (CHRNA3/5 locus) and chromosome 19 (CYP2A6) reflect an effect on lung function via smoking behaviour and thus point primarily to addiction pathways. Accepting that it is difficult to entirely adjust for smoking in these analyses, the authors could comment on this and perhaps report (through comparison with published smoking GWAS data) which of the signals might be driven by smoking.

We have collected the p-values for association with smoking status for our genome-wide significant variants from the UK Biobank Pheweb server (Table 5 in Supplementary file 1, subsection “GWAS Lookups of LHE Significant Variants in GWAS of FEV1, FEV1/FVC, COPD, and Smoking Behavior”). Two of the 10 loci, at 15q25 and 19q13, show association with smoking status (subsection “Validation of LHE Clinical and Genetic Associations”, last paragraph).

5) The association with the SERPINA1 Z allele is of interest but suggests that there might be individuals with alpha-1 antitrypsin deficiency amongst the cases. This information should be provided.

Subjects with alpha-1 antitrypsin deficiency were excluded from both COPDGene and ECLIPSE by genotyping and serum protein levels, respectively. However, we then checked the imputed genotype classes in ECLIPSE, and we identified six subjects imputed to have the PiZZ genotype. We accordingly excluded these subjects from the ECLIPSE analyses, which resulted in a higher p-value (0.003 versus 0.0004) with a consistent direction of effect. We repeated the meta-analysis (new p-value 1 × 10^-7) and included these results in the text (subsection “Validation of LHE Clinical and Genetic Associations”, Materials and methods subsection “ECLIPSE” and “Common Variant Genetic Association Analysis of LHE Measures”).

6) The authors mentioned that they used PICS to derive "likelihoods" that variants are causal. It would be useful to have a brief description of this approach in the Materials and methods. What is the likelihood of the chosen SNP versus the other six? What is the motivation for the 5% threshold? A more usual approach would be to build a 99% credible set, and then interrogate those variants instead. Posterior probabilities should be provided for variants considered in downstream interrogation.

We had inadvertently omitted the description of the methods for PICS variant prioritization and the use of the previously published model by Maurano et al. for predicting SNPs likely to cause allelic imbalance. These have now been included (subsection “Identification of Variants Predicted to Effect Transcription Factor Occupancy”), and we have updated the text (subsection “Fine Mapping Identifies a Candidate Causal Variant in the TGFB2 Locus”) to clarify our approach, which is as follows: we wished to use multiple methods to produce as small a set of putative functional variants as possible for functional prioritization. In this case, rs1690789 was the only variant that had a reasonable PICS likelihood (i.e. > 5% causal probability) and was also predicted to cause allelic imbalance based on the Maurano model. Fortunately, this aggressive bioinformatic prioritization seems to have been successful for identifying a functional region near TGFB2, but we have taken care to state in the Discussion that there may well be other disease-causing functional regions near TGFB2 that would be identified through a more comprehensive (and expensive) functional screening approach, such as MPRA.

7) For enrichments, the authors use 125 cell types from the Roadmap Epigenomics segmentations (subsection “Overlap of LHE SNPs with Epigenomic Marks in Roadmap Epigenomics Cell Types”) but then state that enhancers were defined by collapsing states 13-18 from Ernst and Kellis, 2015. However, Ernst and Kellis, 2015, does not describe the Roadmap Epigenomics chromatin states, so the reviewers were confused about how the enhancer states were actually generated.

Ernst and Kellis, 2015, refers to the development of ChromImpute epigenomic marks from Roadmap experiments, which are the marks that we used to identify overlap between rs1690789 and DNaseI peaks in Roadmap cell lines and cell types (subsection “CRISPR/Cas9 rs1690789 knockout”). We have removed reference to the enhancer states (which were generated by collapsing ChromImpute states 13-18), since our analysis of rs1690789 overlap is limited to DNaseI peaks.

8) The GoShifter approach has been shown to be suboptimal for calculating enrichments (Iotchkova et al., 2019) and we would instead recommend running GARFIELD, which has built-in Roadmap and ENCODE annotations. This would strengthen the enrichment results.

We completed the GARFIELD enrichment analysis and have included these results in the manuscript (subsection “eQTL Colocalization Analysis to Identify Candidate GWAS Target Genes and Tissue Enrichment of LHE GWAS signals”, last paragraph, Materials and methods subsection “Overlap of rs1690789 with Cell-Specific DNaseI Peaks”), which confirm enrichment of moderate centrilobular GWAS signals in fibroblast DNaseI peaks. We appreciate this very helpful suggestion.

9) In the Discussion the authors note that the underlying regulatory element is called in some but not all fibroblasts data sets and hypothesize that this could represent restricted activity to some subset of fibroblasts, or some subset of conditions. If this is truly the causal variant, then couldn't the element call be a function of sample genotype too? And, thus if the ENCODE/Roadmap/etc. sample did not have the "active" genotype, a peak call might not be observable. We would encourage the authors to add this as a possible interpretation, unless there is justified motivation to report otherwise.

We agree with this point and have included this in the text (subsection “eQTL Colocalization Analysis to Identify Candidate GWAS Target Genes and Tissue Enrichment of LHE GWAS signals”, third paragraph, and Discussion, fourth paragraph) as a potential explanation for variability in epigenetic marks across fibroblast cell types in Roadmap.

https://doi.org/10.7554/eLife.42720.023

Article and author information

Author details

  1. Margaret M Parker

    Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Formal analysis, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  2. Yuan Hao

    Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Formal analysis, Investigation, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Feng Guo

    Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Formal analysis, Investigation, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Betty Pham

    Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Formal analysis, Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
  5. Robert Chase

    Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Data curation, Formal analysis, Writing—review and editing
    Competing interests
    No competing interests declared
  6. John Platig

    Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Writing—review and editing
    Competing interests
    No competing interests declared
  7. Michael H Cho

    1. Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    2. Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Data curation, Writing—review and editing
    Competing interests
    reports grants from GSK and personal fees from Genentech
  8. Craig P Hersh

    1. Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    2. Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Funding acquisition, Project administration, Writing—review and editing
    Competing interests
    reports personal fees from Mylan, personal fees from AstraZeneca, Concert Pharmaceuticals, 23andMe, grants from Novartis, and Boehringer-Ingelheim
  9. Victor J Thannickal

    Division of Pulmonary, Allergy and Critical Care, Department of Medicine, School of Medicine, University of Alabama at Birmingham, Birmingham, United States
    Contribution
    Writing—review and editing
    Competing interests
    No competing interests declared
  10. James Crapo

    Division of Pulmonary, Critical Care and Sleep Medicine, National Jewish Health, Denver, United States
    Contribution
    Funding acquisition, Writing—review and editing
    Competing interests
    No competing interests declared
  11. George Washko

    Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Data curation, Funding acquisition, Writing—review and editing
    Competing interests
    reports grants and other support from Boehringer Ingelheim, PulmonX, BTG Interventional Medicine, Janssen Pharmaceuticals and GSK
  12. Scott H Randell

    Marsico Lung Institute, The University of North Carolina at Chapel Hill, Chapel Hill, United States
    Contribution
    Resources, Data curation, Funding acquisition, Writing—review and editing
    Competing interests
    reports receiving personal fees from Amgen
  13. Edwin K Silverman

    1. Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    2. Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Funding acquisition, Writing—review and editing
    Competing interests
    received honoraria from Novartis for Continuing Medical Education Seminars and grant and travel support from GlaxoSmithKline
  14. Raúl San José Estépar

    Applied Chest Imaging Laboratory, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Methodology, Writing—review and editing
    Competing interests
    reports personal fees from Boehringer Ingelheim, Eolo Medical and Toshiba
  15. Xiaobo Zhou

    Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    xiaobo.zhou@channing.harvard.edu
    Competing interests
    No competing interests declared
  16. Peter J Castaldi

    1. Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, United States
    2. Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, United States
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    peter.castaldi@channing.harvard.edu
    Competing interests
    has received research support and consulting fees from GSK and Novartis
    ORCID iD: 0000-0001-9920-4713

Funding

National Heart, Lung, and Blood Institute (R01 HL124233)

  • Peter J Castaldi

National Heart, Lung, and Blood Institute (R01 HL126596)

  • Peter J Castaldi

NHLBI (R01HL089897)

  • Edwin K Silverman

NHLBI (R01HL089856)

  • James Crapo

NHLBI (R01HL113264)

  • Edwin K Silverman

NHLBI (P01105339)

  • Edwin K Silverman

NHLBI (P01HL114501)

  • Edwin K Silverman

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by NHLBI U01HL089897, R01HL089897, R01HL089856, R01HL124233, R01HL126596, R01HL113264, P01105339, and P01HL114501. The COPDGene study (NCT00608764) is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprising AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, Pfizer, Siemens and Sunovion. The Norway GenKOLS (Genetics of Chronic Obstructive Lung Disease, GSK code RES11080) and the ECLIPSE studies (NCT00292552; GSK code SCO104960) were funded by GSK. The Marsico Lung Institute is supported by the Cystic Fibrosis Foundation (BOUCHE15R0) and NIH (DK065988). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.

Senior Editor

  1. Mark I McCarthy, University of Oxford, United Kingdom

Reviewing Editor

  1. Andrew P Morris, University of Liverpool, United Kingdom

Reviewers

  1. Andrew P Morris, University of Liverpool, United Kingdom
  2. Louise Wain, University of Leicester, United Kingdom

Publication history

  1. Received: October 9, 2018
  2. Accepted: July 25, 2019
  3. Accepted Manuscript published: July 25, 2019 (version 1)
  4. Version of Record published: August 14, 2019 (version 2)

Copyright

© 2019, Parker et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


