Native American genetic ancestry and pigmentation allele contributions to skin color in a Caribbean population

  1. Khai C Ang (corresponding author)
  2. Victor A Canfield
  3. Tiffany C Foster
  4. Thaddeus D Harbaugh
  5. Kathryn A Early
  6. Rachel L Harter
  7. Katherine P Reid
  8. Shou Ling Leong
  9. Yuka Kawasawa
  10. Dajiang Liu
  11. John W Hawley
  12. Keith C Cheng (corresponding author)
  1. Department of Pathology, Penn State College of Medicine, United States
  2. Jake Gittlen Laboratories for Cancer Research, Penn State College of Medicine, United States
  3. Department of Family & Community Medicine, Penn State College of Medicine, United States
  4. Department of Biochemistry and Molecular Biology, Penn State College of Medicine, United States
  5. Department of Pharmacology, Penn State College of Medicine, United States
  6. Institute of Personalized Medicine, Penn State College of Medicine, United States
  7. Department of Public Health Sciences, Penn State College of Medicine, United States
  8. Salybia Mission Project, Dominica
5 figures, 10 tables and 5 additional files


Figure 1 with 3 supplements
Admixture analysis of Kalinago compared with Human Genome Diversity Project populations.

Results are depicted using stacked bar plots, with one column per individual. At K=3, the Kalinago, Native Americans, Oceanians, and East Asians fall into the same green cluster. At K=4, the Native Americans (red cluster) are separated from the East Asians (green cluster). Figure 1—figure supplement 1 shows the expanded admixture plot for K=6 with each population labeled. Figure 1—figure supplement 2 shows the location of the Kalinago Territory, where fieldwork was performed.

Figure 1—source data 1

The source data contains results from Admixture analysis.
Figure 1—figure supplement 1
Admixture plot of Kalinago compared to Human Genome Diversity Project data from K=3 to K=6.

The expanded admixture plot at K=6 labels each of the populations used, in panels (A) to (F).

Figure 1—figure supplement 2
Map showing the location of Kalinago Territory in the Commonwealth of Dominica.

Dominica, also known as Wai’tu kubuli in the Kalinago language, is clustered with the Leeward Islands in the Lesser Antilles archipelago of the Caribbean Sea. Main map situates Dominica within the Eastern Caribbean. Inset shows Dominica, with location of Kalinago Reservation (blue) in relation to parishes and principal towns. (Map modified from SESA CROP Report and Google Maps.)

Figure 1—figure supplement 3
Age distribution of sampled Kalinago individuals.

Histogram shows age in years at last birthday for all sampled individuals for whom this information was collected (n=455).

Figure 2 with 2 supplements
Comparison of Kalinago genetic ancestry with that of other populations in the Western Hemisphere.

Ternary plots of genetic ancestry from our work and the literature show estimated proportions of African (AFR), European (EUR), and Native American (NAM) genetic ancestry. (A) Comparison of individuals (n=452, omitting 6 individuals with EAS >0.1) genotyped in this study to individuals (n=38) from southern Dominica sampled by Benn Torres et al., 2013. (B) Comparison of the Kalinago average genetic ancestry with other Native American populations. Kalinago, this study (n=458); Islands (BT) indicates Caribbean islanders reported in Benn Torres et al., 2013, with Dominica labeled; admixed (adm) AFR (1000 Genomes Project [1KGP]) and admixed NAM (1KGP) represent admixed populations from Auton et al., 2015, with Caribbean samples PUR (Puerto Rico) and ACB (Barbados) labeled; and AMR (Reich) indicates mainland Native American samples reported in Reich et al., 2012. Inset (top left) shows ancestries at vertices.
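
A ternary plot places each individual inside a triangle according to three proportions that sum to 1. The sketch below shows one common way to convert (AFR, EUR, NAM) fractions into 2D plotting coordinates. The function name, the vertex layout (AFR bottom left, EUR bottom right, NAM top), and the renormalization over the three components are assumptions made for illustration; they are not taken from the article.

```python
import math

def ternary_to_xy(afr: float, eur: float, nam: float) -> tuple[float, float]:
    """Map three ancestry fractions to Cartesian coordinates in a unit triangle.

    Assumed (hypothetical) vertex layout:
      AFR = (0, 0), EUR = (1, 0), NAM = (0.5, sqrt(3)/2).
    """
    total = afr + eur + nam
    if total <= 0:
        raise ValueError("ancestry fractions must sum to a positive value")
    # Renormalize so the three components sum to 1 (any residual
    # component, such as EAS, is ignored in this sketch).
    eur, nam = eur / total, nam / total
    x = eur + 0.5 * nam
    y = (math.sqrt(3) / 2) * nam
    return x, y

# Example: an individual with equal thirds lands at the triangle's centroid.
print(ternary_to_xy(1 / 3, 1 / 3, 1 / 3))
```

A point's distance from the bottom edge is proportional to its NAM fraction under this layout, so individuals with more Native American genetic ancestry plot closer to the top vertex.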

Figure 2—source data 1

Source data contains results from PCA for Kalinago versus other Native American populations in the Western Hemisphere.
Figure 2—figure supplement 1
Principal components (PCs) analysis (PCA) of Kalinago and comparison populations.

PCA was performed on the HGDP sample (940 individuals), with 458 Kalinago individuals projected on the same axes. (A) PC1 and PC2; (B) PC1 and PC3. In both panels, HGDP individuals are colored to indicate cluster membership (AFR, African; nAFR/ME, Northern Africa and Middle East; EUR, Europe; CSA, Central and Southern Asia; EAS, East Asia; OCE, Oceania; NAM, Native American). Genetic ancestry was represented by the first 10 PCs because AFR and NAM ancestries are not independent of each other. The first PC correlated strongly with AFR or NAM genetic ancestry (r2=0.94 and 0.97, respectively), but also with EUR genetic ancestry (r2=0.32). Several other PCs displayed considerably lower levels of correlation with genetic ancestry (r2<0.1 for EUR and r2<0.05 for EAS). Individuals homozygous for the albino variant were excluded from association analyses. Association analysis did not reveal any novel variants that reached genome-wide significance after correction for statistical inflation. The inflation factor (lambda) for the full genotyped sample excluding the albinos (n=444) was 1.349. Values of lambda for the nine N=50 subsets ranged from 1.001 to 1.184 (median 1.075), suggesting that the elimination of second-order relatives did not remove all effects of relatedness.

Figure 2—figure supplement 2
Genetic ancestry distribution as a function of community-defined ancestry.

Individual genetic ancestry fraction was estimated using admixture (K=4) as described. Individuals identified as (A) ‘Kalinago’ (n=72) have higher NAM and lower AFR and EUR genetic ancestry than those identified as (B) ‘Mixed’ (n=36). Despite considerable overlap in genetic ancestry proportions between individuals, the distributions are distinctly different. Compared to individuals identified as ‘Mixed,’ those identified as ‘Kalinago’ have on average more Native American genetic ancestry (67% vs 51%), less European genetic ancestry (10% vs 14%), and less African genetic ancestry (23% vs 34%). Similarly, the phenotypic distributions of the two groups differed.

Figure 3
Haplotype analysis for three albino individuals.

The inner two lines indicate NAM (red) or AFR (dark blue) genetic ancestry; no EUR genetic ancestry was found in this genomic region. For this local genetic ancestry analysis, the region shown here consisted of 110 non-overlapping segments with 7–346 SNPs each (mean 65). The deduced extent of shared albino haplotype (dotted light blue lines) is indicated on each chromosome. The common region of overlap indicated by the minimum homozygous region (determined by albino individual 1) shared by all three albino individuals is shown at expanded scale below. Genes in this region are labeled, and the position of the NW273KV polymorphism in OCA2 is indicated by the red arrowhead.

Figure 4 with 2 supplements
Skin color distribution of Kalinago samples according to genotype.

The ‘triple ancestral’ plot shows individuals who are ancestral at three pigmentation loci (SLC24A5111A, SLC45A2374L, and OCA2273NW). In the other plots, heterozygosity or homozygosity is indicated for the variants OCA2NW273KV, SLC24A5A111T, and SLC45A2L374F. Individuals depicted in the second through fourth panels are repeated if they carry variants at more than one locus. The M-index of the Kalinago ranged from 20.7 to 79.7 (Figure 4—figure supplement 1); the histogram of skin color based on community-defined ancestry is shown in Figure 4—figure supplement 2.

Figure 4—source data 1

The source file contains the melanin index distribution as a function of community-described ancestry.
Figure 4—source data 2

The source data contains data of melanin indices according to genotype.
Figure 4—figure supplement 1
Skin color distribution of the Kalinago from Commonwealth of Dominica.

We sampled 462 Kalinago individuals who live in the Kalinago Reservation. Each participant answered a set of questions about their genealogical ancestry, gave a saliva sample, and had their skin color measured under their arm.

Figure 4—figure supplement 2
Melanin index distribution as a function of community-described ancestry.

Individuals described as (A) ‘Kalinago’ (n=72) were slightly lighter and had a narrower melanin index unit (MI) distribution (42.5±5.6, mean ± SD) than those described as (B) ‘Mixed’ (45.8±9.6).

Figure 5 with 2 supplements
Dependence of melanin unit on genetic ancestry for Kalinago.

Only individuals who are ancestral for SLC24A5111A, SLC45A2374L, and OCA2273NW alleles are shown (n=279). The dotted red line represents the best fit (linear regression). Slope is –24.3 (melanin index unit [MI] = –24.3*NAM+61.9); r2=0.2722.
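
As a worked illustration of the fitted line in this caption, the sketch below evaluates MI = –24.3*NAM + 61.9 at a few Native American ancestry fractions and derives the correlation implied by r2=0.2722. Only the slope, intercept, and r2 come from the caption; the function and constant names are hypothetical, and the output describes the fitted trend for triple-ancestral individuals, not a prediction for any one person.

```python
import math

# Values reported in the Figure 5 caption.
SLOPE = -24.3       # change in melanin index (MI) per unit NAM fraction
INTERCEPT = 61.9    # fitted MI at NAM fraction 0
R_SQUARED = 0.2722  # fraction of MI variance explained by NAM ancestry

def fitted_melanin_index(nam_fraction: float) -> float:
    """Evaluate the reported best-fit line at a NAM ancestry fraction in [0, 1]."""
    if not 0.0 <= nam_fraction <= 1.0:
        raise ValueError("nam_fraction must lie between 0 and 1")
    return SLOPE * nam_fraction + INTERCEPT

# The slope is negative, so the implied correlation coefficient is negative.
implied_r = -math.sqrt(R_SQUARED)

for nam in (0.0, 0.5, 1.0):
    print(f"NAM={nam:.1f} -> fitted MI={fitted_melanin_index(nam):.2f}")
print(f"implied correlation r={implied_r:.3f}")
```

Under this fit, moving from 0% to 100% Native American genetic ancestry corresponds to a drop of 24.3 MI units, while roughly 73% of the variance in MI among these individuals remains unexplained by ancestry alone.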

Figure 5—figure supplement 1
Estimated power for GWAS using Kalinago sample.

Simulations were performed as described in Materials and methods, using genotyped SNPs with estimated frequency difference between African and Native American ancestral populations of at least 0.7 and adjusted p-value of at least 0.1.

Figure 5—figure supplement 2
Q-Q plots for association analyses to identify novel SNPs that may contribute towards skin pigmentation in the Kalinago samples.

All estimates (blue dots) were calculated using SLC24A5A111T, SLC45A2L374F, OCA2NW273KV, and sex as covariates. Plotted values are not corrected for statistical inflation. For our dataset, statistical inflation appears to exceed that predicted from the median when using LMM approaches. The red line shows expected values; the dashed red line shows expected values based on statistical inflation (lambda) calculated from the median. (A) Linear regression with 10 principal components (PCs) included as covariates, lambda = 1.342; (B) linear mixed model (LMM) with no PCs, standard genetic relatedness matrix (GRM), lambda = 1.024; (C) LMM with 10 PCs, standard GRM, lambda = 1.031; (D) LMM with 10 PCs, REAP GRM, lambda = 1.068. Results are consistent regardless of the test used. Blue, flat dots on the top right of the charts are SNPs found in the same gene and having the same p-value. While the lowest p-values from the LMM-based methods meet the conventional criterion of 5e-08 for genome-wide significance (Appendix 3—table 1), our interpretation is that none of these variants warrants further investigation. Low observed minor allele frequencies (<2%) are inconsistent with those expected for variants responsible for pigmentation differences between the African and Native American populations.
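
The caption refers to an inflation factor (lambda) "calculated from the median". A minimal sketch of the usual median-based calculation follows, assuming each test yields a two-sided p-value that can be converted to a 1-degree-of-freedom chi-square statistic. The function names are hypothetical, the sketch uses only the Python standard library, and it is not the authors' pipeline.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def p_to_chi2(p: float) -> float:
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; the sign is squared away
    return z * z

def inflation_factor(p_values) -> float:
    """Lambda: median observed chi-square divided by the expected median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN

def genomic_control_adjust(p: float, lam: float) -> float:
    """Rescale one test statistic by lambda and return the adjusted p-value."""
    chi2_adj = p_to_chi2(p) / lam
    # Two-sided normal tail probability for |z| = sqrt(chi2_adj).
    return math.erfc(math.sqrt(chi2_adj / 2.0))

# Evenly spaced p-values mimic a null distribution, so lambda is close to 1.
null_like = [(i - 0.5) / 1001 for i in range(1, 1002)]
print(inflation_factor(null_like))
```

With lambda above 1, `genomic_control_adjust` returns a larger p-value than the raw one, which is the direction of the difference between the raw and adjusted columns reported for linear regression in the appendix tables.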


Table 1
Albinism among NW273KV and R305W genotypes.
Rows: R305W genotype; columns: NW273KV genotype.
R305W genotype | Homozygous ancestral* | Heterozygous | Homozygous derived | Total
Homozygous ancestral | 398 | 0 | 0 | 398
Homozygous derived | 1 | 1 | 3 | 5

  1. * Ancestral = reference allele and derived = alternate allele for both variants.

  2. Albino phenotype. Notably, none of the other genotypic categories includes albino individuals.

Table 2
Effect sizes for covariates in linear regression model with 10 principal components.
Covariate | Effect size (MI) | p-Value
rs1426654 (SLC24A5A111T) | –5.8 | 1.5E-12
rs16891982 (SLC45A2L374F) | –4.4 | 6.7E-05
Albino allele (OCA2NW273KV) | –7.7 | 2.2E-05
Sex (female vs male) | –2.4 | 5.0E-04
  1. a Per-allele effect size, in melanin units, for A111T and L374F; effect of first allele for albino variant.

Appendix 1—table 1
Sample Demographics.
Category | Entire sample (N=461)
mean (SD) | 39 (21.5)
Paternal ancestry
Maternal ancestry
  1. * Community-described ancestry collected.

  2. values from reported genealogy; 75 fathers and 146 mothers as determined by genotyping.

Appendix 1—table 2
Summary of Kalinago ancestry from admixture analysis (n=458).

NAM = Native American, AFR = African, EUR = European, CSA = Central & South Asian, EAS = East Asian, OCE = Oceanian. At K=3, NAM, EAS, and OCE are not distinguishable.

Appendix 1—table 3
Ancestry proportions estimated using different approaches.
Estimation approach | AMR | AFR | EUR | EAS
Admixture (subsets, K=4) | 0.549 | 0.318 | 0.122 | 0.011
Admixture (two stage, K=4) | 0.541 | 0.316 | 0.126 | 0.016
rfmix (4 clusters) | 0.553 | 0.313 | 0.125 | 0.009
rfmix (3 clusters) | 0.557 | 0.326 | 0.117 | ---
Appendix 1—table 4
Summary by locus of albinism candidates identified through exome sequencing.

Candidates are homozygous derived in one albino and heterozygous in one obligate carrier. No nonsense, frameshift, or splice variants were detected. Our initial attempt to identify the albinism variant in the Kalinago involved targeted genotyping of the albino individuals for 28 mutations previously observed (Honychurch, 2012; Honychurch, 1998; Li et al., 2008; Loomis, 1967) in African or Native American albinos; these included the 2.7 kb exon 7 deletion in OCA2 found at high frequency in some African populations. No mutation was detected using this approach.

OCA geneChromosomeVariantsMissense
OCA1 (TYR)110
OCA3 (TYRP1)90
OCA4 (SLC45A2)50
OCA6 (SLC24A5)150
OCA7 (LRMDA)1010
B. Characteristics of individual candidates identified through exome sequencing
Chr | rsID | Ref | Alt | f(AFR)* | Gene | Location/Effect
  1. * Overall frequency for non-reference allele in seven 1KGP African populations.

  2. 1KGP describes this variant as four consecutive SNPs rs549973474, rs569395077, rs538385900 and rs558126113.

Appendix 1—table 5
Effect sizes for covariates in linear regression model with 10 Principal Components.

Effect sizes are per allele for genomic variants (first allele only for albino variant). PC1 variance for included individuals (n=452) is 0.0045. P values adjusted using genomic control (applied to GWAS on the full variant set) are omitted if raw P value is above 0.05.

variableGenestandardBETAP_rawQ (-log P)alt1BETAP_rawQBeta-ratio-std
  1. This table compares three versions of analysis (linear regression only).

  2. P values reported here (and Q = – log P) are not corrected for statistic inflation.

  3. The last column for each non-standard case shows ratio of the effect size to that for the standard model, omitting PCs other than.

  4. alt1 model adds age to standard analysis.

  5. alt2 model adds five additional SNPs to standard analysis.

Appendix 1—table 6
Amplification conditions used for genotyping Kalinago samples for the selected alleles.
Gene & Variant | Primer Sequence | PCR Annealing Temperature (°C)
Appendix 2—table 1
Effect sizes of previously reported variants in Kalinago samples.
CHR | pos (b37) | SNP | REF | ALT | gene | location | CADD_PHRED | Polyphen (main) | SIFT (main) | Freq | GT source | AR2 | BETA_a | P_a_raw | P_a_adj | BETA_b | P_b | BETA_c | P_c | BETA_d | P_d | reference(s)
2 | 25329016 | rs12233134 | C | T | EFR3B near POMC | intronic | 0.348 | - | - | 0.473 | IMP | 1 | 0.61 | 0.197 | 0.2653 | –0.06 | 0.910365 | 0.15 | 0.771545 | 0.23 | 0.657335 | Quillen et al., 2012
6 | 457748 | rs4959270 | C | A | LOC105374875 near IRF4 | intronic | 1.041 | - | - | 0.329 | GT | 1 | 0.07 | 0.8796 | 0.8959 | –0.14 | 0.783747 | –0.08 | 0.880465 | –0.07 | 0.896826 | Sulem et al., 2007
6 | 466033 | rs1540771 | C | T | LOC105374875 near IRF4 | intronic | 0.95 | - | - | 0.305 | GT | 1 | –0.02 | 0.9703 | 0.9744 | –0.15 | 0.765123 | –0.08 | 0.875216 | –0.09 | 0.853243 | Sulem et al., 2007
6 | 154663568 | rs2333857 | A | G | IPCEF1 near OPRM1 | intronic or upstream | 3.27 | - | - | 0.813 | IMP | 1 | 1.33 | 0.02932 | 0.05976 | 0.78 | 0.225651 | 1.16 | 0.0719104 | 1.22 | 0.059139 | Quillen et al., 2012
6 | 154721557 | rs6917661 | C | T | CNKSR3 near OPRM1 | 3'UTR or downstream | 1.824 | - | - | 0.584 | GT | 1 | 1.08 | 0.0117 | 0.02938 | 0.61 | 0.20166 | 0.72 | 0.133223 | 0.76 | 0.111546 | Quillen et al., 2012
7 | 55109177 | rs12668421 | A | T | EGFR | intronic | 0.212 | - | - | 0.494 | IMP | 0.98 | –0.16 | 0.7473 | 0.7809 | –0.39 | 0.46139 | –0.14 | 0.787191 | –0.15 | 0.780554 | Quillen et al., 2012
7 | 55156071 | rs11238349 | G | A | EGFR | intronic | 0.431 | - | - | 0.393 | IMP | 1 | 0.34 | 0.48 | 0.542 | –0.65 | 0.22091 | –0.37 | 0.479127 | –0.34 | 0.523937 | Quillen et al., 2012
7 | 55454267 | rs4948023 | G | A | LANCL2 near EGFR | intronic | 4.667 | - | - | 0.684 | IMP | 1 | 0.16 | 0.7641 | 0.7956 | –0.22 | 0.696131 | 0.00 | 0.996604 | 0.00 | 0.994267 | Quillen et al., 2012
9 | 12682663 | rs10809826 | C | G | TYRP1 | upstream | 1.738 | - | - | 0.117 | IMP | 0.96 | –0.28 | 0.6803 | 0.7221 | –0.55 | 0.472775 | –0.65 | 0.392833 | –0.73 | 0.332409 | Adhikari et al., 2019
9 | 16864521 | rs2153271 | C | T | BNC2 | intronic | 20.5 | - | - | 0.152 | GT | 1 | –2.27 | 0.0002337 | 0.001459 | –2.29 | 0.00110657 | –2.25 | 0.00120438 | –2.37 | 0.00065015 | Ju and Mathieson, 2021
10 | 119564143 | rs11198112 | C | T | near EMX2 | intergenic | 17.79 |  |  | 0.187 | GT | 1 | 0.33 | 0.5664 | 0.6206 | 1.10 | 0.067385 | 0.99 | 0.100263 | 0.96 | 0.11598 | Adhikari et al., 2019
11 | 88511524 | rs7118677 | G | T | GRM5 near TYR | intronic | 2.034 | - | - | 0.144 | IMP | 1 | –2.42 | 0.0001392 | 0.000982 | –1.62 | 0.0152453 | –1.99 | 0.00286815 | –2.10 | 0.00151048 | Adhikari et al., 2019
11 | 88911696 | rs1042602 | C | A | TYR | S192Y | 23.8 | probably_damaging(0.974) | deleterious(0.01) | 0.072 | IMP | 0.74 | –2.90 | 0.0003701 | 0.002074 | –2.16 | 0.0119931 | –2.43 | 0.00469847 | –2.56 | 0.00273999 | Stokowski et al., 2007
11 | 89011046 | rs1393350 | G | A | TYR | intronic | 1.555 | - | - | 0.019 | GT | 1 | –3.51 | 0.01947 | 0.04354 | –2.90 | 0.0869741 | –3.38 | 0.042737 | –3.57 | 0.0261186 | Liu et al., 2015
11 | 89017961 | rs1126809 | G | A | TYR | R402Q | 27.2 | probably_damaging(0.994) | deleterious(0.03) | 0.019 | IMP | 0.97 | –3.51 | 0.01947 | 0.04354 | –2.90 | 0.0869741 | –3.38 | 0.042737 | –3.57 | 0.0261186 | Adhikari et al., 2019; Ju and Mathieson, 2021
12 | 89299746 | rs642742 | C | T | KITLG | upstream | 14.92 | - | - | 0.569 | GT | 1 | –0.34 | 0.4757 | 0.538 | –0.20 | 0.708049 | –0.30 | 0.568757 | –0.28 | 0.602548 | Sturm, 2009
12 | 89328335 | rs12821256 | T | C | KITLG | upstream | 15.74 | - | - | 0.015 | GT | 1 | 1.20 | 0.5327 | 0.5902 | –1.36 | 0.524421 | –1.08 | 0.608744 | –0.95 | 0.648883 | Ju and Mathieson, 2021
14 | 92773663 | rs12896399 | G | T | LOC105370627 near SLC24A4 | intronic | 0.043 | - | - | 0.054 | GT | 1 | –0.16 | 0.8692 | 0.887 | –0.96 | 0.352238 | –0.89 | 0.380027 | –0.98 | 0.337781 | Sulem et al., 2007
15 | 28197037 | rs1800414 | T | C | OCA2 | H615R | 23.3 | benign(0.133) | deleterious(0) | 0.070 | IMP | 0.26 | 0.48 | 0.6116 | 0.6612 | –0.01 | 0.989592 | 0.07 | 0.939843 | 0.08 | 0.938174 | Edwards et al., 2010
15 | 28213850 | rs4778219 | C | T | OCA2 | intronic | 1.527 | - | - | 0.316 | GT | 1 | –0.53 | 0.3074 | 0.3782 | –0.89 | 0.0952479 | –0.93 | 0.083945 | –0.90 | 0.0977717 | Adhikari et al., 2019
15 | 28235773 | rs1800404 | C | T | OCA2 | synonymous coding | 0.321 | - | - | 0.488 | GT | 1 | –1.50 | 0.001446 | 0.005889 | –1.47 | 0.00324612 | –1.30 | 0.00924527 | –1.37 | 0.00639755 | Crawford et al., 2017; Adhikari et al., 2019
15 | 28344238 | rs7495174 | A | G | OCA2 | intronic | 7.622 | - | - | 0.087 | GT | 1 | 1.64 | 0.0326 | 0.0649 | 1.57 | 0.0663265 | 1.60 | 0.0580815 | 1.60 | 0.0567887 | Han et al., 2008
15 | 28365618 | rs12913832 | A | G | HERC2 near OCA2 | intronic | 15.8 | - | - | 0.074 | GT | 1 | –1.86 | 0.03926 | 0.07497 | –1.48 | 0.112546 | –1.53 | 0.100514 | –1.57 | 0.0919142 | Liu et al., 2015; Adhikari et al., 2019
15 | 28380518 | rs4778249 | T | A | HERC2 near OCA2 | intronic | 0.649 | - | - | 0.790 | IMP | 1 | –2.23 | 0.0002214 | 0.0014 | –2.43 | 0.00041828 | –2.14 | 0.00162008 | –2.12 | 0.0016832 | Adhikari et al., 2019
15 | 28530182 | rs1667394 | C | T | HERC2 near OCA2 | intronic | 1.111 | - | - | 0.452 | GT | 1 | –1.42 | 0.002039 | 0.007666 | –1.56 | 0.00151414 | –1.41 | 0.0038335 | –1.43 | 0.00369895 | Sulem et al., 2007
16 | 89986117 | rs1805007 | C | T | MC1R | R151C | 25.2 | probably_damaging(0.996) | deleterious(0.02) | 0.016 | GT | 1 | 1.32 | 0.4775 | 0.5397 | 0.67 | 0.712981 | 0.93 | 0.613105 | 0.86 | 0.64501 | Ju and Mathieson, 2021
16 | 89986154 | rs885479 | G | A | MC1R | R163Q | 10.89 | benign(0.013) | tolerated(0.3) | 0.461 | IMP | 0.92 | –1.31 | 0.008565 | 0.0231 | –1.60 | 0.00241496 | –1.46 | 0.00572561 | –1.48 | 0.00546969 | Liu et al., 2015
19 | 3548231 | rs2240751 | A | G | MFSD12 | Y182H | 27.4 | probably_damaging(0.999) | deleterious(0) | 0.031 | GT |  | –3.03 | 0.03735 | 0.0723 | –1.60 | 0.281792 | –1.46 | 0.33655 | –1.57 | 0.3027 | Adhikari et al., 2019
20 | 3625436 | rs562926 | C | T | ATRN | intronic or downstream | 4.601 | - | - | 0.402 | GT | 1 | 0.85 | 0.08705 | 0.1395 | –0.12 | 0.821567 | 0.18 | 0.730685 | 0.28 | 0.595369 | Quillen et al., 2012
20 | 32856998 | rs6058017 | A | G | ASIP/AHCY | 3'UTR/intron | 7.639 | - | - | 0.342 | IMP | 0.95 | –0.90 | 0.07274 | 0.1212 | –0.85 | 0.126901 | –1.04 | 0.0606301 | –1.04 | 0.0609305 | Stokowski et al., 2007
Appendix 3—table 1
Top novel variants that may contribute towards skin pigmentation from our GWAS analysis.

While the lowest p-values from the LMM-based methods meet the conventional criterion of 5e-08 for genome-wide significance, the low observed minor allele frequencies (<2%) are inconsistent with what would be expected for variants responsible for pigmentation differences between the African and Native American populations.

CHR | pos (b37) | SNP | REF | ALT | gene | location | CADD_PHRED | Freq | GT source | AR2 | BETA_a | P_a_raw | P_a_adj | BETA_b | P_b | BETA_c | P_c | BETA_d | P_d
1 | 114560208 | rs113236485 | A | G | near SYT6 | intergenic | 1.323 | 0.014 | IMP | 0.91 | 11.96 | 1.01E-08 | 6.71E-07 | 12.24 | 1.57E-07 | 12.91 | 4.67E-08 | 12.96 | 2.17E-08
1 | 114576742 | rs145925324 | G | A | near SYT6 | intergenic | 0.648 | 0.013 | IMP | 1 | 12.50 | 1.09E-08 | 7.12E-07 | 12.56 | 1.55E-07 | 13.39 | 3.85E-08 | 13.49 | 1.81E-08
1 | 114581335 | rs141998140 | G | T | near SYT6 | intergenic | 2.099 | 0.013 | IMP | 1 | 12.50 | 1.09E-08 | 7.12E-07 | 12.56 | 1.55E-07 | 13.39 | 3.85E-08 | 13.49 | 1.81E-08
1 | 114582335 | rs187318390 | C | T | near SYT6 | intergenic | 1.165 | 0.013 | IMP | 1 | 12.50 | 1.09E-08 | 7.12E-07 | 12.56 | 1.55E-07 | 13.39 | 3.85E-08 | 13.49 | 1.81E-08
1 | 114586703 | rs149623066 | A | G | near SYT6 | intergenic | 0.052 | 0.013 | IMP | 1 | 12.50 | 1.09E-08 | 7.12E-07 | 12.56 | 1.55E-07 | 13.39 | 3.85E-08 | 13.49 | 1.81E-08
1 | 114595150 | rs78273840 | C | T | near SYT6 | intergenic | 1.805 | 0.017 | IMP | 0.78 | 8.87 | 3.13E-06 | 5.39E-05 | 10.59 | 5.74E-07 | 11.11 | 1.98E-07 | 11.25 | 9.38E-08
1 | 114611620 | rs116218201 | T | G | near SYT6 | intergenic | 2.199 | 0.012 | IMP | 0.83 | 14.09 | 1.12E-09 | 1.24E-07 | 14.18 | 3.11E-08 | 15.52 | 3.16E-09 | 15.69 | 1.23E-09
1 | 114612965 | rs116746819 | G | A | near SYT6 | intergenic | 0.351 | 0.012 | IMP | 0.84 | 14.09 | 1.12E-09 | 1.24E-07 | 14.18 | 3.11E-08 | 15.52 | 3.16E-09 | 15.69 | 1.23E-09
1 | 114614230 | rs549514340 | T | C | near SYT6 | intergenic | 3.308 | 0.012 | IMP | 0.84 | 14.09 | 1.12E-09 | 1.24E-07 | 14.18 | 3.11E-08 | 15.52 | 3.16E-09 | 15.69 | 1.23E-09
  1. Key: a = linear regression, 10 PCs; b = LMM with 0 PCs, std GRM; c = LMM with 10 PCs, std GRM; d = LMM with 10 PCs, REAP GRM; adj = adjusted based on lambda, the inflation factor; beta = effect size.

Cite this article

  1. Khai C Ang
  2. Victor A Canfield
  3. Tiffany C Foster
  4. Thaddeus D Harbaugh
  5. Kathryn A Early
  6. Rachel L Harter
  7. Katherine P Reid
  8. Shou Ling Leong
  9. Yuka Kawasawa
  10. Dajiang Liu
  11. John W Hawley
  12. Keith C Cheng
Native American genetic ancestry and pigmentation allele contributions to skin color in a Caribbean population
eLife 12:e77514.