Faroese whole genomes provide insight into ancestry and recent selection

  1. Iman Hamid
  2. Ólavur Mortensen
  3. Alba Refoyo-Martínez
  4. Leivur N Lydersen
  5. Anne-Katrin Emde
  6. Melissa Hendershott
  7. Katrin D Apol
  8. Guðrið Andorsdóttir
  9. Jonas Meisner
  10. Kaja A Wasik
  11. Fernando Racimo (corresponding author)
  12. Stephane E Castel (corresponding author)
  13. Noomi O Gregersen (corresponding author)
  1. Variant Bio Inc., United States
  2. FarGen, Department of Research, National Hospital of the Faroe Islands, Faroe Islands
  3. Centre of Health Science, University of the Faroe Islands, Faroe Islands
  4. Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Denmark
  5. Mental Health Centre Copenhagen, Copenhagen University Hospital, Denmark
  6. Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Denmark
5 figures and 6 additional files

Figures

Figure 1 with 1 supplement
A Faroese whole genome reference.

(A) Map of the Faroe Islands, colored by the six sampling regions. The number of minimally related FarGen participants from each region selected for whole genome sequencing is indicated. (B) Principal component analysis (PCA) of Faroese genomes jointly called with relevant 1000 Genomes reference data shows separation of European groups by PCs 3 and 4 (FARO, Faroese; CEU, Central Europeans; GBR, British; FIN, Finnish; IBS, Iberian; TSI, Tuscan; CHB, Han Chinese; YRI, Yoruban). (C) Faroese-enriched putatively functional alleles visualized by minor allele count, CADD score, and Variant Effect Predictor (VEP) consequence. Variants shown are those with CADD >30, at least two minor alleles observed in Faroese individuals, and no minor alleles observed in Finnish or Northern European reference individuals. (D) HLA-B allele frequencies for alleles detected at least twice in Faroese individuals. In this cohort, one minor allele corresponds to an allele frequency of 1.25%.
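The panel C filter and the 1.25% figure can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `Variant` record and its field names are hypothetical, and only the thresholds (CADD >30, at least two Faroese minor alleles, none in the Finnish or Northern European references) and the cohort size of 40 come from the captions.

```python
from dataclasses import dataclass

N_FAROESE = 40                 # individuals in the Faroese WGS cohort (from the captions)
N_HAPLOTYPES = 2 * N_FAROESE   # diploid genomes, so 80 haplotypes


@dataclass
class Variant:
    # Hypothetical record layout, for illustration only.
    variant_id: str
    cadd: float
    mac_faro: int        # minor allele count in Faroese individuals
    mac_fin: int         # minor allele count in Finnish reference individuals
    mac_north_eur: int   # minor allele count in Northern European reference individuals


def allele_frequency(minor_allele_count: int, n_haplotypes: int = N_HAPLOTYPES) -> float:
    """Convert a minor allele count into an allele frequency."""
    return minor_allele_count / n_haplotypes


def is_faroese_enriched(v: Variant) -> bool:
    """Apply the three conditions stated in the panel C caption."""
    return (
        v.cadd > 30
        and v.mac_faro >= 2
        and v.mac_fin == 0
        and v.mac_north_eur == 0
    )
```

With 80 haplotypes, `allele_frequency(1)` gives 0.0125, which is the 1.25% quoted in the caption.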

Figure 1—figure supplement 1
Quality control of whole genome sequencing data.

(A–D) Distributions of QC metrics across Faroese samples sequenced (mean depth, median insert size, mapping rate, duplicate rate). (E) Mean whole genome sequencing (WGS) depth versus number of genotype calls with GQ >20. Singletons and calls with GQ ≤20 were discarded before imputation. (F) Principal component analysis (PCA) of Faroese genomes jointly called with relevant 1000 Genomes reference data captures African ancestry in the first component and East Asian ancestry in the second component (FARO, Faroese; CEU, Central Europeans; GBR, British; FIN, Finnish; IBS, Iberian; TSI, Tuscan; CHB, Han Chinese; YRI, Yoruban).

Figure 2 with 2 supplements
Runs of homozygosity by group.

Amount of the genome (Mb) contained in runs of homozygosity (ROH) stratified by group. Top panel is the sum total of the genome contained within ROH, with the other panels showing this split by length (short, medium, and long).
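The per-individual quantities plotted here (total Mb in ROH, and the same total split by length class) could be tabulated roughly as follows. The class boundaries below are placeholders chosen only for illustration, since the caption does not state the cutoffs used, and the `(individual_id, length_mb)` input layout is likewise an assumption.

```python
from collections import defaultdict

# Placeholder class boundaries in Mb. The caption does not give the cutoffs,
# so these values are assumptions for illustration only.
SHORT_MAX_MB = 1.0
MEDIUM_MAX_MB = 5.0


def roh_class(length_mb, short_max=SHORT_MAX_MB, medium_max=MEDIUM_MAX_MB):
    """Assign a single ROH segment to a length class."""
    if length_mb < short_max:
        return "short"
    if length_mb < medium_max:
        return "medium"
    return "long"


def summarize_roh(segments):
    """Sum ROH length (Mb) per individual, overall and per length class.

    segments: iterable of (individual_id, length_mb) pairs (hypothetical layout).
    """
    totals = defaultdict(
        lambda: {"short": 0.0, "medium": 0.0, "long": 0.0, "total": 0.0}
    )
    for individual_id, length_mb in segments:
        totals[individual_id][roh_class(length_mb)] += length_mb
        totals[individual_id]["total"] += length_mb
    return dict(totals)
```

Grouping the resulting per-individual totals by population or region label would then give the distributions shown in each panel.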

Figure 2—figure supplement 1
Kinship matrices between modern populations.

(A) Kinship estimated by popkin, including global reference populations from the 1000 Genomes and the Faroese whole genome sequencing (WGS) cohort (Faroese, FARO). (B) Kinship estimated by popkin for the Faroese WGS cohort only for better visualization. The matrix is rescaled after subsetting the individuals, so although the scales are different, the relative kinship within the cohort remains the same for the two plots. The matrices are symmetrical and ordered by population or region label as indicated by the colored bars along the rows and columns. The diagonal of each matrix is the estimated inbreeding coefficient. The Faroes region labels are: VM = Vágar and Mykines; SR = Suðuroy; SM = Suðurstreymoy; SD = Sandoy, Skúvoy, Stóra Dímun; NG = Norðoyggjar; and EN = Eysturoy og Norðstreymoy.

Figure 2—figure supplement 2
ROH length distributions by group.

Length (Mb) of all runs of homozygosity (ROH) stratified by group.

Figure 3 with 2 supplements
Selection scan results for Faroese and British cohorts.

(A) Log-transformed two-tailed p-value of the standardized integrated haplotype score (iHS) in the 40 Faroese genomes (FARO). (B) Log-transformed two-tailed p-value of the standardized iHS for 90 British whole genome sequencing (WGS) samples from 1000 Genomes (British, GBR). (C) Log-transformed two-tailed p-value for the standardized cross-population expected haplotype homozygosity (XPEHH) for FARO vs GBR (only positive values, which indicate selection in FARO, are plotted). Some genes in the top loci are indicated on each plot. The p-value cutoffs that correspond to a false discovery rate (FDR) of 0.01 and 0.001 are indicated by the red dotted line and the blue dashed line, respectively, in each plot. (A) For iHS in FARO, these cutoffs are p=2.72 × 10⁻⁶ (FDR = 0.01) and p=9.20 × 10⁻⁸ (FDR = 0.001). (B) For iHS in GBR, the cutoffs are p=2.78 × 10⁻⁶ (FDR = 0.01) and p=1.75 × 10⁻⁷ (FDR = 0.001). (C) For XPEHH in FARO vs GBR, the cutoffs are p=2.35 × 10⁻⁶ (FDR = 0.01) and p=3.01 × 10⁻⁸ (FDR = 0.001). See Methods for details on p-value and FDR estimation.
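As a rough guide to how a standardized score maps onto the plotted values, the sketch below converts a score to a two-tailed p-value and its negative log10, assuming the standardized statistic is approximately standard normal under neutrality. That null distribution is an assumption made here for illustration; the Methods describe the p-value and FDR estimation actually used.

```python
import math
from statistics import NormalDist


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standardized score, assuming a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def neg_log10_p(z: float) -> float:
    """Value on a log-transformed p-value axis for a standardized score."""
    return -math.log10(two_tailed_p(z))


def z_for_p(p: float) -> float:
    """Absolute standardized score whose two-tailed p-value equals p (0 < p <= 1)."""
    return NormalDist().inv_cdf(1 - p / 2)


if __name__ == "__main__":
    # Cutoff quoted in the caption for iHS in FARO at FDR = 0.01.
    print(z_for_p(2.72e-6))
```

Under this assumption, `z_for_p` gives the absolute standardized score that would sit exactly on a given cutoff line.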

Figure 3—figure supplement 1
Q-Q plots and p-value histograms for selection statistics.

Histograms of p-value distributions for (A) integrated haplotype score (iHS) in the Faroese (FARO) haplotypes, (C) iHS in the British (GBR) haplotypes, and (E) cross-population expected haplotype homozygosity (XPEHH) comparing FARO and GBR. Q-Q plots for observed versus expected log-transformed p-values for (B) iHS in FARO, (D) iHS in GBR, and (F) XPEHH between FARO and GBR. The estimated lambda inflation value is shown in each Q-Q plot.
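One common way to obtain such a lambda is the genomic-control ratio: the median of the chi-square statistics (1 degree of freedom) implied by the observed p-values, divided by the median expected under the null. The sketch below uses that definition as an assumption, since the caption does not say how lambda was estimated.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic-control style lambda from two-tailed p-values.

    Each p-value must lie in (0, 1]; p = 0 would require an infinite quantile
    and raises statistics.StatisticsError.
    """
    nd = NormalDist()
    # A two-tailed p-value p corresponds to a chi-square(1) statistic z**2,
    # where z is the normal quantile at 1 - p/2.
    chi2 = [nd.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A value near 1 indicates that the bulk of the p-value distribution matches the null expectation, while values above 1 indicate inflation.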

Figure 3—figure supplement 2
Selection scan results for Faroese and British cohorts.

(A) Absolute value of the standardized integrated haplotype score (iHS) in the 40 Faroese genomes (FARO). (B) Absolute value of the standardized iHS for 90 British whole genome sequencing (WGS) samples from 1000 Genomes (GBR). (C) The standardized cross-population expected haplotype homozygosity (XPEHH) for FARO vs GBR (positive values only). The top 0.5% of results are indicated by the blue dashed line in each plot, and the top 0.01% of results are indicated by the red dotted line in each plot. Some genes in the top loci are indicated on each plot.

Figure 4 with 2 supplements
Haplotype visualizations for the LCT/MCM6 locus.

(A) Decay in Expected Haplotype Homozygosity (EHH) and (B) haplotype furcation plot for Faroese (FARO) centered on lactase persistence allele rs4988235; chr2_135851076_G_A. (C) Decay in EHH for 1000 Genomes British (GBR) and (D) haplotype furcation for GBR centered on the same allele. (E) Haplostrips visualization of haplotype structure in the region chr2:135677850–135986443. In this panel, columns correspond to segregating alleles, and rows correspond to individuals. In the haplotype furcation plots (panels B and D), the haplotypes for the reference allele (G) are in blue, and those for the alternate allele (A) are in red.

Figure 4—figure supplement 1
Haplotype visualizations for top XPEHH variant in SLC10A1/SRSF5 locus.

(A) Decay in Expected Haplotype Homozygosity per Site (EHHS) for chr14_69775276_C_T, comparing Faroese (FARO, teal) and British (GBR, red) haplotypes. (B) Lengths for distinct haplotypes spanning chr14_69775276_C_T, comparing FARO (teal) and GBR (red). (C) Haplotype furcation plot for FARO centered on chr14_69775276_C_T. (D) Haplotype furcation plot for GBR centered on the same allele. In the haplotype furcation plots (panels C and D), haplotypes for the reference allele (C) are in blue, and those for the alternate allele (T) are in red.

Figure 4—figure supplement 2
Haplotype visualizations for top XPEHH variant in POLQ locus.

(A) Decay in Expected Haplotype Homozygosity per Site (EHHS) for chr3_121526194_G_A, comparing Faroese (FARO, teal) and British (GBR, red) haplotypes. (B) Lengths for distinct haplotypes spanning chr3_121526194_G_A, comparing FARO (teal) and GBR (red). (C) Haplotype furcation plot for FARO centered on chr3_121526194_G_A. (D) Haplotype furcation plot for GBR centered on the same allele. In the haplotype furcation plots (panels C and D), haplotypes for the reference allele (G) are in blue, and those for the alternate allele (A) are in red.

Figure 5 with 3 supplements
Principal component analysis (PCA) of 616 ancient imputed genomes from Europe and 40 present-day Faroese genomes.

Each individual is depicted as a pie chart, showing ancestry proportions estimated using HaploNet. Ancestry proportions for ancient individuals were estimated unsupervised, while those for present-day Faroese individuals were estimated semi-supervised using ancient genomes as references. The five colors represent different ancestral sources: orange for West Europe, green for North Europe, blue for Steppe, purple for the Levant and East Mediterranean, and red for East Europe. The geographical distribution (bottom-right) highlights historical samples (250 years BP) in red, this study’s samples in black, and an 800-year-old individual sample in blue.

Figure 5—figure supplement 1
Admixture plot showing proportions for 616 imputed ancient genomes from Europe together with 40 present-day Faroese genomes from this study.

Ancient individual groups are categorized based on patterns of IBD clustering as inferred in Allentoft et al., 2024. The plot uses five colors to represent different ancestral sources, which are maximized in individuals in different regions of Europe: orange for ‘West Europe’, green for ‘North Europe’, blue for ‘Steppe’, purple for the ‘Levant and East Mediterranean’, and red for ‘Eastern Europe’.

Figure 5—figure supplement 2
Map illustrating the geographical distribution of ancestry proportions for 616 ancient imputed individuals from Europe and 40 modern Faroese genomes from this study.

The map is divided into panels to capture both geographical and temporal variations across Europe. Each pie chart on the map represents admixture proportions with five colors: green, orange, blue, red, and purple maximize Northern European, Celtic, Steppe, Eastern European, and Levant/East Mediterranean ancestries, respectively, reflecting different ancestral sources.

Figure 5—figure supplement 3
DATES estimated ancestry covariance and least squares exponential fit.

Plots were output by the DATES software for (A) 11 historical Faroese individuals (dated to approximately the 18th century) and (B) 40 modern Faroese individuals.

Additional files

Supplementary file 1

Whole genome sequencing quality control metrics.

Unique identifier (subject), sex, region, mean WGS depth (mean_depth), mean depth on X chromosome (mean_depth_chrX), mean depth on Y chromosome (mean_depth_chrY), median insert size (insert_median), mean absolute deviation of insert size (insert_mad), contamination estimate (contamination), and number of genotype calls with GQ >20 (non_imputed_calls). The Faroes region labels are: VM = Vágar and Mykines; SR = Suðuroy; SM = Suðurstreymoy; SD = Sandoy, Skúvoy, Stóra Dímun; NG = Norðoyggjar; and EN = Eysturoy og Norðstreymoy.

https://cdn.elifesciences.org/articles/107428/elife-107428-supp1-v1.xlsx
Supplementary file 2

Faroese putatively functional alleles.

Variants in the joint call set with CADD >30, at least two minor alleles observed in Faroese individuals, and no minor alleles observed in Finnish or Northern European reference individuals. Allele frequencies in Faroese individuals (af_FARO), allele frequencies in 1000 Genomes British (af_GBR), Central European (af_CEU), and Finnish (af_FIN), and allele frequencies in 1000 Genomes Europeans (af_1000G_EUR) and gnomAD (af_GNOMAD) references.

https://cdn.elifesciences.org/articles/107428/elife-107428-supp2-v1.xlsx
Supplementary file 3

HLA-B allele frequencies.

Counts of HLA-B alleles out of the total 80 haplotypes and mean quality scores in the Faroese WGS cohort, as well as HLA-B allele frequencies in three 1000 Genomes European cohorts (British [GBR], Central European [CEU], and Finnish [FIN]). HLA genotypes were determined using HLA*LA.

https://cdn.elifesciences.org/articles/107428/elife-107428-supp3-v1.xlsx
Supplementary file 4

Genomic regions with strong signals of positive selection.

Genomic regions for the top 10 standardized iHS and XPEHH values in the Faroese WGS cohort. We list the number of variants and all variant IDs in these regions that reach the p-value threshold corresponding to FDR <0.01 (iHS: p<2.72 × 10⁻⁶; XPEHH: p<2.35 × 10⁻⁶). The regions are defined by the first and last variant position with a p-value below the cutoff for each particular statistic, as indicated in the ‘statistic’ column. Protein-coding gene name(s), if any, with annotated variants reaching these thresholds are provided. We additionally provide the maximum absolute value of the given statistic (‘max_statistic’), its corresponding p-value and q-value (i.e. the minimum FDR when that particular test is considered significant), and the allele frequencies in the Faroese and British cohorts for the variant(s) with the maximum value. The variant(s) with the maximum value for each region are highlighted in bold font.

https://cdn.elifesciences.org/articles/107428/elife-107428-supp4-v1.xlsx
Supplementary file 5

Admixture and ancient ancestry samples.

Region, country, approximate age, publication source, and group label for ancient ancestry samples included in the ADMIXTURE and HaploNet analyses.

https://cdn.elifesciences.org/articles/107428/elife-107428-supp5-v1.xlsx
MDAR checklist
https://cdn.elifesciences.org/articles/107428/elife-107428-mdarchecklist1-v1.pdf

Cite this article

  1. Iman Hamid
  2. Ólavur Mortensen
  3. Alba Refoyo-Martínez
  4. Leivur N Lydersen
  5. Anne-Katrin Emde
  6. Melissa Hendershott
  7. Katrin D Apol
  8. Guðrið Andorsdóttir
  9. Jonas Meisner
  10. Kaja A Wasik
  11. Fernando Racimo
  12. Stephane E Castel
  13. Noomi O Gregersen
(2026)
Faroese whole genomes provide insight into ancestry and recent selection
eLife 14:RP107428.
https://doi.org/10.7554/eLife.107428.3