Two-dimensional joint histograms comparing pLDDT to protein amino acid length (a) and expression (b) measured in transcripts per million (TPM). For each protein-coding gene, only the isoform found …
Comparison of predicted structures of ASMT, showing the 373aa isoform from Matched Annotation from NCBI and EMBL-EBI (MANE) (CHS.57426.2, RefSeq NM_001171038.2, GENCODE ENST00000381241.9) on the …
(a) Predicted protein structure for the Matched Annotation from NCBI and EMBL-EBI (MANE) isoform (CHS.52273.5, RefSeq NM_144727.3, GENCODE ENST00000337323.3) of gamma-N crystallin (CRYGN), colored …
Comparison of gamma-N crystallin (CRYGN) transcript structures in frog, mouse, and human. Exons 1, 2, and 3 are highly conserved across all species. Exon 4 is missing from the poorly folding Matched …
Predicted protein structures for seven distinct human isoforms of thioredoxin domain-containing protein 8 (TXNDC8), as well as the primary cattle transcript and a novel mouse transcript. Alternate …
Comparison of predicted structures for interleukin 36 beta (IL36B) for the Matched Annotation from NCBI and EMBL-EBI (MANE) isoform (CHS.30565.1, RefSeq NM_014438.5, GENCODE ENST00000259213.9) and …
Comparison of the structure of the Matched Annotation from NCBI and EMBL-EBI (MANE) isoform (CHS.7860.58, RefSeq NM_014489.4, GENCODE ENST00000278243.9) versus the highest scoring alternate isoform …
VEGFB isoforms VEGFB-186 (a) and VEGFB-167 (b). The inclusion of a heparin binding domain in VEGFB-167 results in sequestration to the cell surface while VEGFB-186 remains freely soluble. Relying …
A three-dimensional (3D) animation comparing the predicted protein structure of the Matched Annotation from NCBI and EMBL-EBI (MANE) isoform (CHS.52273.5, RefSeq NM_144727.3, GENCODE …
All isoform summary.
Folding scores from ColabFold for each transcript in a preliminary new build of the Comprehensive Human Expressed SequenceS (CHESS) database whose protein-coding sequence (CDS) is under 1000aa in length. For transcripts already contained in the released CHESS v3.0 database, the identifier from that database is provided. If the transcript maps to a known gene locus X but is a novel isoform, it is shown with the identifier CHS.X.altY. If a transcript occurs at a novel locus X, the identifier is hypothetical.X.Y. In both cases, Y identifies the isoform number. Additional columns show the gene name, the RefSeq ID (release 110), the GENCODE ID (release 40), the predicted local distance difference test (pLDDT) (folding) score, and a flag indicating whether all intron boundaries (for multi-exon genes) are conserved in the mouse genome.
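The identifier scheme described above can be illustrated with a minimal Python sketch that sorts an identifier into one of the three classes. The regular expressions assume that X and Y are plain integers and that fields are separated by periods, as in the identifiers shown in the figure legends. The function name `classify_transcript_id` and the class labels are hypothetical and are not part of CHESS.

```python
import re

# Assumed patterns: X (locus) and Y (isoform) are taken to be plain integers.
# The "alt" pattern is listed first so it is never mistaken for a released ID.
_PATTERNS = [
    ("novel_isoform", re.compile(r"^CHS\.(\d+)\.alt(\d+)$")),
    ("released", re.compile(r"^CHS\.(\d+)\.(\d+)$")),
    ("novel_locus", re.compile(r"^hypothetical\.(\d+)\.(\d+)$")),
]


def classify_transcript_id(identifier):
    """Return (class label, locus number, isoform number) for an identifier.

    Identifiers that match none of the assumed patterns are reported as
    ("unknown", None, None) rather than raising an error.
    """
    for label, pattern in _PATTERNS:
        match = pattern.match(identifier)
        if match:
            return label, int(match.group(1)), int(match.group(2))
    return "unknown", None, None


# CHS.52273.5 appears in the figure legends; the other two IDs are made up.
for example in ("CHS.52273.5", "CHS.52273.alt1", "hypothetical.12.3"):
    print(example, classify_transcript_id(example))
```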
Matched Annotation from NCBI and EMBL-EBI (MANE) comparison summary.
Folding scores and additional data for all Comprehensive Human Expressed SequenceS (CHESS) transcripts that match genes in the MANE v1.0 dataset, limited to protein sequences under 1000aa in length. Transcripts must overlap the annotated CDS of the MANE transcript to be included. Columns include: CHESS_ID_isoform, the CHESS identifier of the alternate isoform transcript; CHESS_ID_MANE, the CHESS identifier of the MANE transcript at the same locus; gene, the gene name; aa_length_isoform, the amino acid length of the alternate isoform’s CDS; aa_length_MANE, the amino acid length of the MANE transcript’s CDS; length_ratio, the ratio of the alternate isoform length to the MANE isoform length; pLDDT_isoform, the predicted folding score of the alternate isoform; pLDDT_MANE, the predicted folding score of the MANE isoform; pLDDT_ratio, the ratio of the alternate isoform folding score to the MANE isoform folding score; GTEx_samples_observed_isoform, the total number of GTEx samples where the alternate isoform was observed at least once; GTEx_samples_observed_MANE, the total number of GTEx samples where the MANE isoform was observed at least once; GTEx_top_tissue_name_isoform, the name of the tissue in which the alternate isoform was observed in the highest number of samples; GTEx_top_tissue_name_MANE, the name of the tissue in which the MANE isoform was observed in the highest number of samples; GTEx_top_tissue_TPM_isoform, the average transcripts per million (TPM) of the alternate isoform in the named tissue; GTEx_top_tissue_TPM_MANE, the observed TPM of the MANE isoform in the named tissue; introns_conserved_in_mouse_isoform, an indicator of whether introns are conserved between the alternate human isoform and any annotated isoform in the GRCm38 mouse reference genome; introns_conserved_in_mouse_MANE, an indicator of whether introns are conserved between the MANE human isoform and any annotated isoform in the GRCm38 mouse reference genome.
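As an illustration of how the two ratio columns relate to the columns they are derived from, the following Python sketch recomputes length_ratio and pLDDT_ratio for a single row. The column names are taken from the description above; the helper name `add_ratios` and the numeric values in the example row are hypothetical and do not come from the supplementary file.

```python
def add_ratios(row):
    """Return a copy of `row` with length_ratio and pLDDT_ratio filled in.

    Both ratios put the alternate isoform in the numerator and the MANE
    isoform in the denominator, matching the column definitions.
    """
    out = dict(row)
    out["length_ratio"] = row["aa_length_isoform"] / row["aa_length_MANE"]
    out["pLDDT_ratio"] = row["pLDDT_isoform"] / row["pLDDT_MANE"]
    return out


# Made-up row for illustration only; these are not values from the dataset.
example_row = {
    "aa_length_isoform": 300,
    "aa_length_MANE": 150,
    "pLDDT_isoform": 90.0,
    "pLDDT_MANE": 60.0,
}

result = add_ratios(example_row)
print(result["length_ratio"], result["pLDDT_ratio"])
```

Under these definitions, a pLDDT_ratio above 1 indicates that the alternate isoform has a higher predicted folding score than the MANE isoform at the same locus.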
Matched Annotation from NCBI and EMBL-EBI (MANE) comparison summary, filtered subset.
A filtered set of Comprehensive Human Expressed SequenceS (CHESS) transcripts compared to MANE according to the criteria detailed in the ‘Filtering MANE comparisons’ section of the Materials and methods. This file uses the same column names as Supplementary file 2.