Introduction

During embryonic development in humans, female cells (46, XX) inactivate a single X-chromosome to balance dosage of X-linked gene expression between females (XX) and males (XY). The process of X-chromosome inactivation (XCI) begins during the peri-implantation stage in humans and is initiated by expression of the long non-coding RNA (lncRNA) XIST from the X inactivation center (XIC) of one X-chromosome in each female cell. XIST RNA coats the X-chromosome which in turn recruits protein complexes including chromatin modifiers which establish a facultative heterochromatin state. Silencing histone modifications and the deposition of DNA methylation ultimately result in the transcriptionally silent inactive X-chromosome (Xi) (Supp Fig. 1A). The selective expression of XIST from only one X-chromosome needs to be tightly regulated. Whereas the exact mechanism by which a given X-chromosome in a female cell is selected for inactivation is unknown, the XIC in humans includes several lncRNAs that have been implicated in the regulation of XIST expression and X-inactivation (Supp Fig. 1A) (1). Although the initial choice of X-chromosome for inactivation is random, the inactivated X-chromosome is stably inherited in a clonal fashion throughout all subsequent cell divisions (2). Interestingly, the process of X-inactivation (XCI) is incomplete; approximately 15% of genes escape XCI and remain expressed from both X-chromosomes (3). These ‘escape’ genes are consequently more highly expressed in females than in males, although the degree of sex-biased expression of escape genes can vary and may be greater for those genes that lack a functional homologue on the Y-chromosome (4). The sex-biased expression of escape genes has been suggested to mediate sex differences in both the frequency and severity of several diseases (5, 6). 
In addition, genes in the pseudoautosomal regions (PAR1 and PAR2) of the X-chromosome also remain expressed from the Xi and are thus always bi-allelically expressed. PAR1 and PAR2 are short regions of homology between the X- and Y-chromosomes that are essential for pairing of the sex chromosomes during meiosis in males (7). In contrast to escape genes, which tend to be more highly expressed in female cells, genes in the PARs have been suggested to show male-biased expression (8). Consequently, the terms PAR and nonPAR are used to distinguish between the biallelic expression (escape) of genes within the PARs and those located elsewhere on the X-chromosome.

Identification of females with non-mosaic X-inactivation

(A) The patterns of X-chromosome inactivation (XCI) in women resulting in mosaic (right female) or non-mosaic XCI (nmXCI). The presence of genetic variants can result in nmXCI females by (i) directly determining which X-chromosome can be inactivated (primary skewing, left female) or by (ii) imparting a selective advantage to a small number of cells (secondary skewing, middle female). Xa, active chr X; Xi, inactive chr X; Xm, maternal chr X; Xp, paternal chr X. (B) Single-tissue median allelic expression (AE) and standard error of all nonPAR genes on chromosome X (chr X) not previously classified as variable in all 285 women in GTEx. (C) Allelic expression per tissue of nonPAR chr X genes not previously classified as variable in mosaic females (median allelic expression < 0.475) and three females identified as non-mosaic, nmXCI-1, nmXCI-2 and UPIC (median allelic expression > 0.475). Boxplot indicating median, 25th and 75th percentile. (D) Copy number as log2 ratio of chromosome 17 (chr 17) for nmXCI females, UPIC, nmXCI-1 and nmXCI-2. Trisomy 17p in UPIC is highlighted.

Human females are typically ‘mosaic’ for X-inactivation; female tissues contain a mixture of cells that have inactivated either the maternal or paternal X-chromosome (Fig. 1A). Mosaicism renders analysis of XCI in primary human tissues highly complex or unfeasible. Consequently, our knowledge of XCI and escape across human tissues is surprisingly sparse and typically based on indirect measures of XCI, such as DNA methylation, sex-biased expression or from observations in cell lines and animal models (4, 9). In a seminal study, Brown and colleagues reported XCI status for 639 genes by integrating data from three published studies that each used different indirect approaches and model systems to infer XCI status (3). This resource continues to serve as a valuable reference for inferred XCI status in humans.

Interestingly, rare cases of humans in which the same parental X-chromosome has been inactivated in all cells (non-mosaic XCI, nmXCI) have been reported (8, 10–12). Organism-wide nmXCI can arise from a constitutional genetic variant that directly interferes with inactivation of a specific parental X-chromosome, such as mutations in the promoter region of the XIST gene (10) and X-autosome translocations (13, 14). Disruption of the XIC locus or of autosomal regions involved in the choice of X-chromosome to be inactivated has similarly been suggested to result in direct nmXCI, so-called ‘primary’ skewing (15). Alternatively, selection against any deleterious constitutive genetic variant would result in clonal expansion of cells and indirectly result in nmXCI; ‘secondary’ skewing (Fig. 1A). As nmXCI females lack the confounding effect of mosaicism, they represent a powerful system to study human XCI by enabling direct determination of XCI status from bulk tissue samples. Indeed, using a single nmXCI female identified in the Genotype-Tissue Expression (GTEx) database (16), Tukiainen and colleagues generated the first comprehensive reference map of XCI status across human tissues based on direct determination of allele-specific expression of X-linked genes (8). However, as the ability to determine allele-specific expression is limited by the need for an informative (heterozygous) genetic variant in the gene of interest, the XCI status of just 186 genes was determined across tissues in this one nmXCI female.

In this study, we identify two additional unrelated nmXCI females in the GTEx database. By combining allele-resolution data for the one previously reported nmXCI female and two newly identified nmXCI females, we directly determined X-inactivation status of 380 X-linked genes across 30 normal tissues, including 15 tissues in which XCI has not previously been characterized. This unique dataset allowed investigation of tissue-specific escape from XCI in humans and generation of one of the most extensive multi-tissue maps of human X-inactivation to date.

Results

Non-mosaic X-inactivation is common in human females

Whereas the frequency of nmXCI females in the general population was originally thought to be less than 1:500 (11), more recent work has suggested a frequency of as high as 1:50 (12). To identify nmXCI females we screened a single tissue of all 285 female donors (Table S1, Table S2) in the GTEx database (v8 release) (16). Briefly, calculating the allelic expression (AE, how much the ratio of Xa/Xi reads deviates from the expected 0.5) of all X-linked genes outside of the PARs (i.e. the nonPAR AE) allows assessment of skewing in female tissues. We specifically exclude PAR genes from this calculation as they are not subject to XCI, and consequently, their Xa/Xi read count ratio is not significantly affected by XCI skewing. Three female samples showed an exceedingly high degree of skewing (median chr X nonPAR allelic expression > 0.475), consistent with expression from a single parental chromosome, nmXCI (Fig. 1B). Extending allelic expression analysis to all available tissues across the three candidate females (Table S3) confirmed their complete nmXCI status (Fig. 1C). These three nmXCI individuals included the single previously identified nmXCI female, UPIC (8), confirming the accuracy of our screening approach (Fig. 1B, Fig. 1C). nmXCI may result from deleterious genetic mutations causing primary (10) or secondary (acquired) skewing (15) or from stochastic non-random XCI (12). Whereas no small-scale mutations in the XIC were identified in any of the nmXCI females, trisomy 17p was observed in UPIC (Fig. 1D, Supp Fig. 1B). How trisomy 17p could result in non-random X-inactivation is unclear, but an unbalanced 17p:X translocation would result in strong selection for silencing of the abnormal X and consequential nmXCI (17). However, expression of genes on 17p was consistently higher than those on 17q across all tissues in UPIC, suggesting the duplicated 17p was not silenced (Supp Fig. 1C). 
Indeed, further investigation of donor metadata in GTEx revealed that UPIC had been diagnosed as mosaic for trisomy 17p by traditional karyotyping but did not provide a detailed karyotype. Interestingly, whereas all three females were nmXCI, females nmXCI-1 (GTEx ID: 13PLJ) and nmXCI-2 (GTEx ID: ZZPU) showed greater variation in allelic expression across tissues than UPIC (Fig. 1C), suggesting that their nmXCI may have resulted from clonal selection driven by constitutive genetic variants (secondary skewing).

A landscape of X-inactivation across normal human tissues

The ability to detect allele-specific expression is dependent on the presence of a heterozygous genetic variant (SNP) in the gene of interest. Using whole exome sequencing (WES) and whole genome sequencing (WGS) for both newly discovered nmXCI donors we identified 389 heterozygous SNPs across 316 X-linked genes (Fig. 2A, Fig. 2B, Table S4). Briefly, allelic expression was determined by calling heterozygous SNPs on the X-chromosome using genomic data (WES or WGS) and subsequently counting the number of each base at every heterozygous SNP (hetSNP) position in RNA-seq data. The allelic expression can then be calculated as abs(0.5 - (reference reads / total reads)), where a value of zero indicates complete biallelic expression and a value of 0.5 indicates complete monoallelic expression. Allelic expression ratios were highly consistent between nmXCI females and allowed us to identify known escape (PAR and nonPAR), inactive and variable (8) XCI genes (Fig. 2C). Next, we applied our analytical workflow to the previously identified nmXCI sample, UPIC, observing that the allelic expression levels seen using our analysis pipeline largely matched the findings of Tukiainen et al (Fig. 2D, Supp Fig. 2A) (8). Notably, our inclusion of additional WGS data of UPIC and identification of 76 further SNPs allowed determination of allelic expression of 36 further genes in this individual (Fig. 2E). The use of updated gene-models, correction of reference alignment bias and more stringent quality filtering resulted in exclusion of 35 genes for which allelic expression was erroneously reported (Fig. 2E, Supp Fig. 2B).
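The allelic expression formula given above can be illustrated with a minimal Python sketch. The function name and the read counts are hypothetical and chosen only for illustration; the study itself obtained allele counts with GATK ASEReadCounter.

```python
def allelic_expression(ref_reads: int, alt_reads: int) -> float:
    """Return abs(0.5 - ref/total) for one hetSNP.

    0.0 means fully biallelic expression, 0.5 means fully monoallelic.
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no RNA-seq reads cover this hetSNP")
    return abs(0.5 - ref_reads / total)


# Hypothetical (reference, alternative) read counts, not data from the study.
examples = {
    "balanced": (50, 50),       # both alleles equally expressed
    "partly_skewed": (80, 20),  # some expression from the second allele
    "monoallelic": (0, 100),    # only one allele expressed
}

for label, (ref, alt) in examples.items():
    print(label, round(allelic_expression(ref, alt), 3))
```

Because the absolute value is taken, the measure does not depend on whether the reference or the alternative allele sits on the active X-chromosome.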

Characterization of two novel human nmXCI females

(A) Overlap of genic heterozygous SNPs (hetSNPs) (upper) and genes with hetSNP (lower) across the two novel nmXCI females (nmXCI-1 and nmXCI-2). Genic hetSNPs were identified using both WES and WGS for each individual. (B) Distribution of assessed genes across the X-chromosome. Genes located in the pseudoautosomal region 1 (PAR1) and PAR2 are highlighted in green. (C) Allelic expression per tissue for well-characterised PAR, (AKAP17A), escape (PUDP, DDX3X), inactive (APOOL) and variable (PRKX) genes (8). Boxplot indicating median, 25th and 75th percentile. (D) Spearman correlation of allelic expression values using our analysis approach (Gylemo) and the Tukiainen et al analysis pipelines for female UPIC. (E) Overlap of genic hetSNPs (left) and genes (right) identified by our analysis (Gylemo) and the Tukiainen et al analysis pipeline in female UPIC.

Next, we integrated data from all three nmXCI individuals (nmXCI-1, nmXCI-2 and UPIC), allowing direct allelic expression detection of 380 X-linked genes across 30 tissues, including 15 tissues for which XCI status has not been directly assessed previously. While 195 genes and 17 tissues could only be examined in a single individual, 185 genes and 13 tissues were examined across multiple nmXCI females, allowing for selected inter-tissue and inter-individual comparisons (Fig. 3A, Fig. 3B). Classification of allelic imbalance (mono-allelic or bi-allelic) of individual alleles was initially determined by the binomial test (adjusted p-value < 0.01), according to best practice (18). However, upon visual inspection we identified several well characterised escape and inactive genes that may have been mis-classified as variable escape (Fig. 3C). Inappropriate classification of allelic imbalance results from the well documented sensitivity of the binomial test to data over-dispersion, which frequently occurs with read count data (18, 19). Consequently, we further manually curated the allelic expression status of all 380 genes using the empirical guideline of allelic ratio > 0.4 as indicating mono-allelic expression and the potential consequences of high and low read counts. Our three criteria for manual re-classification were (i) low power, indicating genes with a low statistical power but consistent escape pattern, (ii) low read count, indicating genes with a consistent escape pattern but with non-significant escape in a single tissue, and (iii) over-estimation, indicating genes in which high read counts inflated the binomial p-value (Fig. 3C, Supp Fig. 3, Table S4, Table S5). This manual curation resulted in the re-classification of allelic expression status of 32 genes (Table S5, Supp Fig. 3). Our classifications of XCI status based on direct determination of allele-specific expression represent one of the most extensive and high-confidence maps of X-inactivation to date (Fig. 3D, Fig. 3E, Supp Fig. 4). Allelic expression for all genes independent of XCI status was highly correlated between individuals (Supp Fig. 5A), and by including three nmXCI individuals in our analysis we were able to determine cross-tissue XCI status for 13 genes which were only covered in one tissue in the previous assessment based on UPIC alone (Fig. 4A). Interestingly, whereas we reveal EGFL6, TSPAN6 and CXorf38 as variable escape genes, these were previously classified (Balaton et al) as inactive, variable and escape, respectively. Finally, we compared our classification with the classification of X-linked escape status from donor UPIC as determined by Tukiainen and colleagues (4, 9, 12). As expected, our classification was largely consistent with that of Tukiainen et al, but also revealed the possible misclassification of XCI status of several genes and further added the XCI status for 198 genes for which XCI status had not previously been directly assessed (Fig. 4B). Our direct assessment of allelic expression further allowed for validation of XCI status previously determined by indirect measures (3, 8). We confirmed XCI status for many previously classified inactive and escape genes whereas most reported variable escape genes are re-classified as inactive based on our analysis (Supp Fig. 5B, Supp Fig. 5C). However, whereas the XCI classifications reported here are largely based on inter-tissue comparisons within individuals, most previous classifications were based on inter-individual comparisons of XCI, confounding direct comparisons between studies.

An extended landscape of X-inactivation in humans

(A) Overlap of genic heterozygous SNPs (hetSNPs) (upper) and genes with hetSNP (lower) across the three females (nmXCI-1, nmXCI-2 and UPIC) with non-mosaic X-inactivation (nmXCI). (B) Tissues covered in each nmXCI female. Tissues not covered in UPIC are indicated in bold. (C) Examples of genes classified by the binomial test alone and after manual curation. Allelic expression across tissues is shown with XCI status based on the binomial test indicated as inactive (grey circle) or escape (red triangle). (D) Allelic expression of X-linked gene categories (lower) and number of genes included in each category (upper). Genes classified as escape and inactive are separated based on whether allelic expression was determined across multiple tissues or in a single sample (data for one single tissue). Boxplot indicating median, 25th and 75th percentile. (E) Heatmap showing the allelic expression of all genes that show constitutive or variable escape from XCI. Black asterisks within the tiles indicate a significant expression from the inactive X-chromosome (i.e. escape, FDR-corrected binomial q-value < 0.01). The ‘consensus call’ tile is the assigned XCI status across tissues and individuals for each gene with genes classified as variable including both intra- and interindividual variation. Red asterisks indicate genes in which manual curation of XCI status was performed. Grey tiles indicate missing data. (B, E) Tissue abbreviations can be found in Table S7.

Classification and novel XCI assessment of X-linked genes

(A) Allelic expression across all available tissues in all three nmXCI females for genes which were only covered in one tissue in the previous assessment based on UPIC alone (8). (B) Alluvial plot showing classification of escape status of X-linked genes based on our analysis (Gylemo) compared to a previous assessment based on UPIC alone (Tukiainen) (8). Genes classified as escape and inactive are separated based on whether allelic expression was determined across multiple tissues or in only one sample (data for one single tissue).

Discussion

Here we describe an extensive multi-tissue map of human X-inactivation directly determined from allele-specific expression of X-linked genes. These data represent a doubling of the number of X-linked genes for which XCI status has been directly determined across multiple normal tissues and individuals. Our results suggest that XCI is stable both within and across individuals and that tissue-specific escape from X-inactivation is rare and often challenging to characterize. Variability in X-linked gene expression is important to characterize, as XCI variability can directly modify expressivity of X-linked pathogenic variants. This provides a mechanism for sex-biased expression differences in certain biological contexts and tissues, which in turn can modify disease risk.

nmXCI females allow for direct determination of allele-specific expression of X-linked genes, even in bulk tissue samples. Our study further highlights the power of such natural genetic systems to shed light on XCI in humans. Whereas the origin of nmXCI in females is typically unknown, nmXCI females likely harbor one or more mutations that (i) directly affect the choice of X-chromosome to be inactivated (primary skewing) or (ii) result in selection against cells carrying an active mutated X-chromosome (secondary/acquired skewing). As such, the X-linked genetics of nmXCI females is clearly atypical and could result in aberrant establishment and maintenance of XCI in these females. Whereas the highly similar patterns of XCI observed across the three genetically independent nmXCI females analyzed here suggest that such effects are subtle, additional multi-tissue studies of XCI in mosaic females are required to confirm the wider applicability of our findings. Finally, our identification of two additional nmXCI females among the 285 females in the GTEx database further substantiates recent claims of extreme skewing of XCI as a frequent and major modifier of trait penetrance and expressivity in the human population (12, 20).

Materials and Methods

Allele-specific expression analysis for initial identification of nmXCI females

For the initial screen to identify nmXCI females in GTEx, we used the pre-processed SNP calls from GTEx WES data (GTEx_Analysis_2017-06-05_v8_WholeExomeSeq_979Indiv_VEP_annot.vcf.gz, see https://gtexportal.org/home/methods for specifics on data processing). Downloaded data were lifted over from hg19 to hg38 with LiftoverVcf (GATK v.4.1.9.0), and bcftools (v1.10.2) (21) was used to retain only hetSNPs on the X-chromosome. One RNA-seq sample per female was downloaded from AnVIL using the Gen3 client, as aligned BAM files (see https://gtexportal.org/home/methods for specifics on data processing). Tissues used for the screen were selected based on abundance in the entire data set (muscle, n=239; skin-lleg, n=21; thyroid, n=1; adipose-subc, n=2; artery-tibi, n=2; esophagus-muco, n=2; nerve, n=2; adipose-visc, n=1; ovary, n=1; uterus, n=1; see Table S7 for full list of abbreviations). Following data download, allele counts at hetSNPs were retrieved from RNA-seq data for allelic expression (AE) analysis using GATK ASEReadCounter (v.4.1.9.0), requiring a minimum base quality of 20 in the RNA-seq data. To remove potentially spurious sites, conservative filters were applied to the data: variant call read depth in WES >= 20 per allele, minor allele read count > 10% of the total read count, and RNA-seq read depth > 10 reads. If a gene carried multiple hetSNPs, only the one with the highest RNA-seq read count was included. AE data from the screen can be found in Table S2 and Table S3. XCI skew of each individual participant was assessed by calculating the single-tissue median AE of all nonPAR chr X genes not detected previously as variable (8) (Table S1) in all 285 women in GTEx. A median chr X nonPAR AE higher than 0.475 (equivalent to < 2.5% of reads coming from the inactive X-chromosome) indicates expression from a single parental chromosome and was used as a cutoff to distinguish between mosaic and non-mosaic XCI (nmXCI) females. As a control, UPIC was included.
UPIC is a female who has previously been shown to have nmXCI (8). As UPIC is not included in GTEx_Analysis_2017-06-05_v8_WholeExomeSeq_979Indiv_VEP_annot.vcf.gz, SNP calling was performed using her WES data, followed by AE analysis, and her AE data were used as a positive control.
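The screening logic described above (site filters, one hetSNP per gene, median AE cutoff) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' pipeline: the dictionary keys, helper names and read counts are hypothetical, and it assumes that the 10% minor-allele filter is applied to the WES read counts.

```python
from statistics import median

AE_CUTOFF = 0.475  # median nonPAR AE above this is treated as non-mosaic XCI


def site(gene, wes_ref, wes_alt, rna_ref, rna_alt):
    """Bundle the read counts of one hetSNP (hypothetical record layout)."""
    return {"gene": gene, "wes_ref": wes_ref, "wes_alt": wes_alt,
            "rna_ref": rna_ref, "rna_alt": rna_alt}


def passes_screen_filters(s):
    """WES depth >= 20 per allele, minor allele > 10% of WES reads, RNA depth > 10."""
    wes_minor = min(s["wes_ref"], s["wes_alt"])
    wes_total = s["wes_ref"] + s["wes_alt"]
    rna_total = s["rna_ref"] + s["rna_alt"]
    return wes_minor >= 20 and wes_minor > 0.10 * wes_total and rna_total > 10


def median_nonpar_ae(sites):
    """Median AE over genes, keeping the hetSNP with the highest RNA depth per gene."""
    best = {}  # gene -> (rna_depth, ae)
    for s in filter(passes_screen_filters, sites):
        depth = s["rna_ref"] + s["rna_alt"]
        if s["gene"] not in best or depth > best[s["gene"]][0]:
            best[s["gene"]] = (depth, abs(0.5 - s["rna_ref"] / depth))
    if not best:
        return None
    return median(ae for _, ae in best.values())


def is_nmxci(sites):
    m = median_nonpar_ae(sites)
    return m is not None and m > AE_CUTOFF


# Hypothetical nonPAR hetSNPs for two donors (not data from the study).
skewed = [site("A", 30, 30, 100, 0), site("A", 30, 30, 10, 10),
          site("B", 30, 30, 0, 60), site("C", 30, 30, 99, 1),
          site("D", 5, 40, 50, 50)]  # gene D fails the WES depth filter
mosaic = [site("A", 30, 30, 60, 40), site("B", 30, 30, 70, 30),
          site("C", 30, 30, 55, 45)]

print(is_nmxci(skewed), is_nmxci(mosaic))
```

In this sketch PAR genes and genes previously classified as variable are assumed to have been removed before the function is called, as in Table S1.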

Detailed analysis of allele-specific expression and XCI status in nmXCI females

For further assessing XCI status of the three females with nmXCI (high median chr X nonPAR AE), the remaining RNA-seq tissue samples available for all three donors were downloaded from GTEx and processed as described above.

We further downloaded WES and whole-genome sequencing (WGS) data available for the three nmXCI females, UPIC, 13PLJ (nmXCI-1) and ZZPU (nmXCI-2) and performed SNP calling. Briefly, WES BAM and WGS CRAM files for each donor were downloaded from AnVIL using the Gen3 client. The files were sorted and converted into fastq files using Samtools (v1.9) (21). Raw fastq reads were quality trimmed using FastP (v.0.20.0) (22) with default settings. BWA-MEM (23) was used for mapping reads to the human genome build 38 (hg38, GCA_000001405.15_GRCh38_no_alt_analysis_set.fna) using default settings. Sambamba v0.8.2 (24) was used to mark duplicate reads. The GATK best practice germline short variant discovery pipeline was used to process the aligned reads, using base quality score recalibration and local realignment at known insertions and deletions (indels) (GATK v.4.2.6.1). Indels and SNPs were called jointly across all samples, for both WGS and WES data. Default filters were applied to indel and SNP calls using the variant quality score recalibration (VQSR) approach of GATK. All RNA-seq samples available for each participant (Table S6) were downloaded from AnVIL as aligned BAM files using the Gen3 client. Samtools (v1.9) (21) was used to sort and convert the BAM files to fastq files and quality trimming was done with FastP (v.0.20.0) (22) with default settings. RNA-seq reads were aligned with STAR 2-pass mode (v2.7.10a) (25) with GCA_000001405.15_GRCh38_no_alt_analysis_set.fna and --sjdbGTFfile gencode.v40.annotation.gtf index. WASP filtering was performed to reduce reference bias (26). Briefly, GATK SelectVariants was used to extract hetSNPs (that passed the VQSR filtering) for each individual from either the WGS or WES VCFs, and these were subsequently passed to STAR via --varVCFfile and --waspOutputMode. Reads failing WASP filtering were removed. Following data pre-processing, allele counts at hetSNPs were retrieved from RNA-seq data using GATK ASEReadCounter for AE analysis.
Heterozygous variants that passed VQSR filtering were first extracted for each sample from WES and WGS VCFs using GATK SelectVariants. Subsequently, sample-specific VCFs and RNA-seq BAMs were input to ASEReadCounter, requiring a minimum base quality of 20 in the RNA-seq data. ASEReadCounter outputs from WGS and WES pipelines were joined. If overlapping sites were identified, the hetSNP read counts were merged and the AE counts from the assay with the highest total read count were kept. To remove potentially spurious sites, conservative filters were applied to the data (variant call read depth >= 10 per allele and minor allele read count > 10% of the total variant read count for WES/WGS data, and an RNA-seq read depth of more than 7 reads). If a gene carried multiple hetSNPs, the hetSNP detected in the highest number of tissues was kept. If two hetSNPs were detected in the same number of tissues, the one with the highest RNA-seq read count was kept. The ‘pituitary’ sample from nmXCI-1 was excluded as it had a lower median chr X nonPAR AE than all other tissues from the same individual. Furthermore, the ‘lymphoblasts’ sample was excluded from donor UPIC since it is unclear how EBV-transformation of lymphocytes affects the X-chromosome. The final ASE table for all X-linked genes in the nmXCI females can be found in Table S4.
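The rule for choosing a single hetSNP per gene (most tissues first, then highest RNA-seq read count) can be sketched as below. The record layout, function name and values are hypothetical, and the sketch assumes that the tie-breaking read count is the RNA-seq depth summed across tissues, which the text does not specify.

```python
from collections import defaultdict


def pick_hetsnp_per_gene(records):
    """Choose one hetSNP per gene.

    records: iterable of (gene, snp_id, tissue, rna_depth) tuples that have
    already passed the read-depth filters.
    """
    tissues = defaultdict(set)  # snp_id -> tissues in which it was detected
    depth = defaultdict(int)    # snp_id -> summed RNA-seq depth (assumption)
    gene_of = {}
    for gene, snp, tissue, rna_depth in records:
        gene_of[snp] = gene
        tissues[snp].add(tissue)
        depth[snp] += rna_depth

    chosen = {}  # gene -> ((n_tissues, total_depth), snp_id)
    for snp, gene in gene_of.items():
        key = (len(tissues[snp]), depth[snp])  # tuple order encodes the priority
        if gene not in chosen or key > chosen[gene][0]:
            chosen[gene] = (key, snp)
    return {gene: snp for gene, (_, snp) in chosen.items()}


# Hypothetical records for illustration only.
records = [
    ("GENE_A", "snp1", "liver", 50), ("GENE_A", "snp1", "lung", 40),
    ("GENE_A", "snp2", "liver", 500),  # deeper, but seen in fewer tissues
    ("GENE_B", "snp3", "liver", 30),
    ("GENE_B", "snp4", "liver", 80),   # same tissue count, higher depth
]
print(pick_hetsnp_per_gene(records))
```

Comparing `(number of tissues, depth)` tuples keeps the two-level priority in a single comparison, so the depth only matters when the tissue counts are equal.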

X-inactivation status categorization

Tissue-specific X-inactivation status categorization was performed as previously described (8). Briefly, XCI status of genes was assessed by comparing the allelic read count ratios at each filtered X-chromosomal hetSNP, in each tissue individually. Whether there was significant expression from the inactive X-chromosome (escape) at each hetSNP was tested with a one-sided binomial test in which the fraction of reads from Xi relative to the total read count was expected to be significantly greater than 0.025 or 2.5% (hypothesized probability of success = 0.025). FDR correction was applied to p values from the binomial test for each tissue separately using the rstatix R package version 0.7.2 (Kassambara A. 2023). q values < 0.01 were considered significant and indicative of XCI escape for the hetSNP and tissue.
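The per-tissue test described above can be sketched in Python using only the standard library. This is not the authors' code (the study used the rstatix R package); the function names and read counts are hypothetical, and the sketch assumes that the FDR correction is the Benjamini-Hochberg procedure and that Xi reads are the reads from the less-expressed allele.

```python
import math


def binom_sf_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))


def benjamini_hochberg(pvals):
    """Return BH-adjusted q values in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q = [0.0] * n
    running = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        q[i] = running
    return q


def escape_calls(counts, p0=0.025, alpha=0.01):
    """counts: list of (xi_reads, total_reads) for the hetSNPs of ONE tissue."""
    pvals = [binom_sf_ge(xi, total, p0) for xi, total in counts]
    return [q < alpha for q in benjamini_hochberg(pvals)]


# Hypothetical counts for three hetSNPs in one tissue.
tissue_counts = [(30, 100), (1, 100), (0, 50)]
print(escape_calls(tissue_counts))
```

Calling `escape_calls` once per tissue mirrors the statement that the correction was applied to each tissue separately.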

To assess across-tissue XCI status of genes, we leveraged the q values obtained above. If a given hetSNP showed significant Xi expression in all tissues in which the gene is expressed, the hetSNP was categorized as ‘escape (across tissues)’, whereas a hetSNP with non-significant Xi expression across all tissues in which the gene is expressed was classified as ‘inactive (across tissues)’. If a gene was significant in more than one but not in all tissues in which it is expressed, the gene was classified as ‘variable’. Genes for which we had AE data in only one tissue were labeled either ‘inactive (data for one single tissue)’ or ‘escape (data for one single tissue)’ based on the binomial test. Finally, genes residing in the pseudo-autosomal region (PAR) were labeled ‘PAR’. Upon inspection of the classification results, several well characterized escape genes and inactive genes were mis-classified as variable escape. Consequently, we manually curated allelic expression status of all 380 genes using the empirical guideline of allelic ratio > 0.4 as indicating mono-allelic expression while also considering the consequences of high and low read counts. Our three criteria for manual re-classification were low power (genes with a low statistical power but consistent escape pattern), low read count (genes with a consistent escape pattern but with non-significant escape in a single tissue) and over-estimation (genes in which high read counts inflated the binomial p-value). The list of genes that were manually curated, their change in XCI status and the criteria for their manual classification can be found in Supp Fig. 3 and Table S5.
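The automated part of this categorization (before manual curation) can be sketched as a small Python function. The function name and inputs are hypothetical; the sketch assumes that any mix of significant and non-significant tissues is treated as ‘variable’, and that the tissues supplied are those in which the gene passed the read-depth filters.

```python
def classify_gene(significant_by_tissue, in_par=False):
    """Assign an across-tissue XCI label from per-tissue escape calls.

    significant_by_tissue: dict mapping tissue -> True if Xi expression was
    significant (q < 0.01) in that tissue, False otherwise.
    """
    if in_par:
        return "PAR"
    calls = list(significant_by_tissue.values())
    if not calls:
        raise ValueError("no tissue with allelic expression data")
    if len(calls) == 1:
        status = "escape" if calls[0] else "inactive"
        return f"{status} (data for one single tissue)"
    if all(calls):
        return "escape (across tissues)"
    if not any(calls):
        return "inactive (across tissues)"
    return "variable"  # assumption: some, but not all, tissues significant


# Hypothetical per-tissue calls for illustration only.
print(classify_gene({"liver": True, "lung": True}))
print(classify_gene({"liver": False, "lung": False, "skin": False}))
print(classify_gene({"liver": True, "lung": False}))
print(classify_gene({"liver": True}))
```

The manual curation step (allelic ratio > 0.4, low power, low read count, over-estimation) is deliberately left out, since it relied on visual inspection rather than a fixed rule.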

Copy number analysis

Copy number analysis was performed using the R package QDNAseq (27). Briefly, whole genome sequencing data for all three nmXCI females were mapped to the human reference genome (hg38) by BWA-MEM (23) using default parameters. Raw copy numbers were then estimated by counting the number of reads mapped to non-overlapping bins of 15 kb unless stated otherwise.
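The idea behind bin-based copy number estimation can be illustrated with a simplified Python sketch. It is not a reimplementation of QDNAseq, which additionally corrects for GC content and mappability; the function names are hypothetical, and the genome-wide median bin count is assumed to represent the diploid baseline.

```python
import math
from collections import Counter
from statistics import median

BIN_SIZE = 15_000  # 15 kb non-overlapping bins, as in the Methods


def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Count reads per (chromosome, bin index) from (chrom, position) pairs."""
    return Counter((chrom, pos // bin_size) for chrom, pos in read_starts)


def log2_ratios(counts):
    """log2 of each bin count relative to the median bin count."""
    baseline = median(counts.values())
    return {b: math.log2(c / baseline) for b, c in counts.items() if c > 0}


# Hypothetical bin counts: three diploid bins and one bin with an extra copy.
counts = Counter({("chr17", 0): 150, ("chr17", 1000): 100,
                  ("chr1", 0): 100, ("chr2", 0): 100})
ratios = log2_ratios(counts)
print(round(ratios[("chr17", 0)], 3))
```

Under these assumptions a region present in three copies instead of two is expected at a log2 ratio of about log2(3/2), which is the kind of shift highlighted for 17p in UPIC.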

Investigating chromosome 17p and 17q arm expression

Transcriptome mapping was performed using Salmon v1.9.0 (--gcBias --seqBias --numBootstraps 100) (28) using known Refseq transcripts (NM_* and NR_*) from the GRCh38.p12 assembly (GCF_000001405.38_GRCh38.p12_rna.fna.gz) as a reference. The library type was set to IU. Raw fastq reads were quality trimmed using FastP (v.0.20.0) with default settings. Abundance estimates were converted to h5 format using Wasabi (https://github.com/COMBINE-lab/wasabi). Sleuth was used for data normalization and to obtain normalized gene abundances (29, 30). The expression matrix (TPM) can be found in Table S8.

Data and code availability

All data and analysis scripts are included in this manuscript or are available on GitHub https://github.com/ColmNestorLab/tissue_XCI.

Author contributions

C.E.N. and B.G. designed research; B.G. performed research; C.E.N., B.G. and M.B. wrote the paper; M.B. provided insights and discussions.

Declarations of interest

The authors declare no competing interests.

Funding sources

The project was funded by the Swedish Research Council (2020-01277_VR) and the European Research Council (XX-Health-101045171).

Acknowledgements

We thank Shadi Jafari and Sandra Hellberg for helpful discussions and input on the manuscript. Work in the lab of Colm Nestor was funded by the Swedish Research Council (2020-01277_VR) and the European Research Council (XX-Health-101045171).

Supplementary figure legends

(A) Expression, cis-binding and spreading of the long non-coding RNA XIST initiates the epigenetic process of remodelling an active X-chromosome (chr X) into an inactive X-chromosome (Xi). After initiation of X-inactivation, activating histone marks (histone 3 lysine 27 acetylation; H3K27ac) are removed while silencing histone modifications (trimethylation of lysine 27 on histone 3; H3K27me3 and ubiquitination of lysine 119 on histone 2A; H2AK119Ub) and DNA methylation are deposited on the future inactive X chromosome. This process leads to transcriptional silencing of all genes on the Xi except for genes residing in the pseudoautosomal regions (PARs, green) as well as a subset of non-PAR genes that ‘escape’ X-inactivation (red) and are continually expressed from the Xi. (B) Copy number as log2 ratio using 500 kb bins of the whole genome for nmXCI females, UPIC, nmXCI-1 and nmXCI-2. Trisomy 17p in UPIC is highlighted. (C) Gene expression from 17p and 17q across all tissues in nmXCI-2 and UPIC. TPM: transcripts per million. 17p: chromosome 17, p arm. 17q: chromosome 17, q arm. Tissue abbreviations can be found in Table S7.

(A) Spearman correlation of allelic expression values using our analysis approach (Gylemo) and Tukiainen et al analysis pipelines for female UPIC in all tissues available. (B) List of genes that were included in Tukiainen’s analysis of allelic expression in UPIC but were excluded from our analysis including reason for exclusion.

Read counts for minor and major alleles for all genes where the XCI status was manually curated. The manual curation result and the reason for manual curation are stated above each gene and summarized in the table in the bottom right of the figure. Asterisks indicate significant escape based on the binomial test (FDR-corrected binomial q value < 0.01).

Heatmap showing the allelic expression of all X-linked genes assayed in this study. Black asterisks within the tiles indicate a significant inactive X-chromosome expression (i.e. escape, FDR-corrected binomial q value < 0.01). The ‘consensus call’ tile is the assigned XCI status across tissues for each gene with genes classified as variable including both intra- and interindividual variation. Red asterisks before gene names indicate genes in which manual curation was performed to assign XCI status. Grey tiles indicate missing data.

(A) Spearman correlation of allelic expression between nmXCI females. (B, C) Alluvial plots showing direct classification of escape status based on our analysis (Gylemo) compared to classification of escape status of X-linked genes as consensus calls for X-inactivation based on data from multiple previous studies employing indirect measures to determine escape; reported by (B) Tukiainen et al (8) (in their Suppl. Table.1, Combined XCI status) and (C) Balaton et al (in their Table S1, Balaton consensus calls) (31).

Supplementary table legends

Table S0: Text file describing each Supplementary Table in detail.

Table S1: Genes used in screening to identify non-mosaic XCI (nmXCI) females. Compiled from Supplementary Table 13 in Tukiainen et al, removing genes with a variable escape status in UPIC (XCI across tissues != “Partial, heterogeneous”) and genes residing in the pseudoautosomal region (Region != “PAR”).

Table S2: Allelic expression of the genes selected in Table S1 for all 285 females analyzed from GTEx.

Table S3: Allelic expression of the genes selected in Table S1 for all tissues available for the nmXCI females: UPIC, nmXCI-1 (13PLJ) and nmXCI-2 (ZZPU).

Table S4: Allelic expression data of 380 genes on the X-chromosome for each of the nmXCI females: UPIC, nmXCI-1 (13PLJ) and nmXCI-2 (ZZPU).

Table S5: List of genes which were manually curated, the directionality of the manual curation and the reasoning behind the manual curation.

Table S6: Sample- and tissue-IDs for all data analyzed for nmXCI females: UPIC, nmXCI-1 (13PLJ) and nmXCI-2 (ZZPU).

Table S7: Tissue abbreviations.

Table S8: Gene expression (TPM) matrix of all tissues for nmXCI females: UPIC, nmXCI-1 (13PLJ) and nmXCI-2 (ZZPU).