BID-seq identifies precise pseudouridine (Ψ) modification sites in ribosomal RNA and reveals dynamic Ψ modification patterns in transfer RNA.

a,b Ψ modifications detected on 16S and 23S ribosomal RNA with baBID-seq of E. coli (a) and P. aeruginosa (b) total RNA during exponential and stationary growth phases. All Ψ sites in panels a and b were identified using filtration criteria of deletion fraction > 0.02 and p-value <= 1 × 10−4 c Pearson correlation analysis of Ψ modification fractions at individual sites between biological replicates. d Ψ modification fraction alteration pattern observed on specific sites in K. pneumoniae different tRNA regions. The left panel depicts the general tRNA secondary structure in K. pneumoniae. Structural regions are indicated in the legend and highlighted with corresponding colors. The right scatter plot’s y-axis depicts the Ψ fraction difference, calculated as (exponential phase Ψ fraction) - (stationary phase Ψ fraction). e Heatmap displaying the tRNA Ψ fraction alteration features detected across different conditions. Color represents the average Ψ fraction values (range from 2% to 100%) at specific sites within tRNA isoacceptors for each strain. The tRNA tags (labeled in y-axis) comprise the amino acids transferred by each tRNA, the corresponding anticodons, and the Ψ position on the tRNA molecules. f Venn plot shows the overlap of detected Ψ sites between biological replicates for P. aeruginosa and P. syringae during each growth phase.

baBID-seq uncovers Ψ modification in bacterial mRNA CDS and UTRs.

a Density plot depicting the distribution of Ψ modifications in mRNA across different growth phases and conditions. b Distribution of mRNA Ψ fraction showing strain and growth phase-specific patterns of Ψ distribution (right Ψ density plot across each strain’s mRNA). c mRNA Ψ faction and counts under different conditions. d Right pie charts show the proportion of Ψ sites in intergenic versus coding regions, and violine plots compare Ψ fraction values between untranslated regions (upstream and downstream UTRs) and coding regions. Statistical significance was determined using the Wilcoxon Signed-Rank test; ns, p-value ≥ 0.05; *p- value < 0.05; **p-value < 0.01; ***p-value< 0.001 and ****p-value< 0.0001. e The pie charts show the Ψ-modified gene overlap in two growth phases across four strains. f Density plot shows the Ψ site numbers per transcript.

Comparative analysis of Ψ modification motif across strains.

a Comparison of overall mRNA 5-mer Ψ motif ratios across four strains. Motif ratios are calculated by dividing the count of each specific 5-mer motif centered on Ψ by the total number of motifs detected in each individual strain mRNA. Ψ-modified sites with a fraction above 2% are used here. b Distribution of Ψ fraction (ranging from 2% to 100%) for each motif detected. c Scatter pie chart shows the proportional distribution of top 10 (ranked by motif abundance) Ψ-containing motif counts categorized by RNA types and bacterial strains. d The scatter plot illustrates the relationship between the average modification fraction and abundance of motifs in P. aeruginosa in exponential (P. aeruginosa_exp) and stationary (P. aeruginosa_stat) growth phases. The average Ψ fraction was calculated as the sum of Ψ fractions for each individual motif divided by its frequency.

Evolutionary conservation of clustered Ψ modifications in orthologous genes.

a The bar plot depicts the Ψ modified homologous genes among bacterial species (green box) and strain-specific genes (orange box). Functional networks generated by clustering KEGG pathway enrichment results of Ψ-modified homologous genes present across two or more strains. The dot size in the network indicates the gene number contained in specific KEGG pathways. The p-value for each pathway was calculated with Fisher’s Exact test. b Regions with Ψ enrichment across the atp and eno operons in four strains. Dark red dots represent highly modified sites with Ψ fraction values exceeding 50%.

Growth state-dependent dynamics of Ψ modification fraction.

a,b Heatmap showing the mRNA Ψ fractions alteration of each site in P. aeruginosa (a) and P. syringae (b). The color intensity reflects Ψ fraction at each site. Only sites with >10% absolute difference in Ψ fraction between exponential and stationary phases are displayed. Blank boxes signify either unmodified sites or those with Ψ fractions below 2%. The annotation combining position label and gene name indicates the precise location of Ψ modification within genes. c Scatter plot shows Ψ intensity alteration between two growth phases in P. aeruginosa. For each mRNA, Ψ intensity is calculated as the sum of all Ψ fractions throughout the transcript. d Similar to c, the scatter plot shows Ψ intensity alteration in P. syringae under two growth phases. e scatter plot shows Ψ intensity alteration in P. syringae under two growth phases in MM medium condition. f-h Box plot shows the modified and unmodified mRNA transcripts per kilobase million (TPM) changing between exponential and stationary growth phases under different conditions: P. aeruginosa cultured in LB medium (f), P. syringae cultured in KB medium (g) and P. syringae cultured in MM medium (h). Y-axis shows log2(TPM at exponential phase / TPM at stationary phase) of each mRNA. The red color presents Ψ-mRNA, and the blue color indicates no-Ψ-mRNA. Wilcoxon Signed-Rank test; ns, p-value ≥ 0.05; *p-value < 0.05; **p-value < 0.01; ***p-value< 0.001 and ****p-value< 0.0001. i The scatter plot illustrates the correlation between translation efficiency (TE) alteration (log2Foldchange of (TE_MM/TE_KB)) and Ψ intensity difference (Ψ intensity of MM-Ψ intensity of KB) in P. syringae cultured under MM medium (TE_MM) versus KB conditions (TE_KB). j The proportion of Hfq-bound Ψ-mRNA (red color for exponential growth phase mRNA and blue color for stationary phase mRNA) versus those non-Ψ-mRNA (grey color) across two growth conditions in P. aeruginosa. k The scatter plot shows a correlation between mRNA Ψ intensity and Hfq binding score, where the Hfq binding score is calculated as the sum of each mRNA peak binding strength (log2FoldChange value of each peak).

Structure-dependent patterns of Ψ modifications and transformer-GNN-based deep learning network for Ψ prediction.

a Predicted RNA secondary structures containing the GUΨCG motif with corresponding Ψ fraction values and gene identifiers annotated. MXfold2 is employed to model these structures using 20nt flanking sequences extending from each modification site. b Sequence and structure clustering of 41-nucleotide RNA segments centered by Ψ sites with fraction values greater than 0.1 and containing either GCΨCG (black branch color) or GUΨC (red branch color) motifs. The circular visualization features three concentric layers: the inner layer displays the Ψ fraction value, the middle layer indicates RNA type, and the outer layer represents the GC ratio (%) of each 41nt RNA segment. The red dot around the circle marked the position of RNA displayed in (a). c Architecture of pseU_NN. The model integrates sequence and structural information through two parallel pathways: 1) A sequence analysis branch with one-hot embedding followed by a multi-head transformer module, and 2) A structure analysis branch that processes RNA secondary structure adjacent matrices through a graph convolution module to extract structural features. The features extracted by the two modules are further weighted and merged as input for residual blocks (fully connected layers). d Bar plots summarize model performance with input sequences of 41 nt, 61 nt, and 81 nt, evaluated by PR-AUC, accuracy (ACC), F1 score, Matthews correlation coefficient (MCC), precision, recall, and ROC-AUC. Overall, the three sequence lengths show comparable performance across metrics, with the 41 nt model achieving slightly higher PR-AUC and ROC-AUC, indicating that shorter sequence contexts are sufficient for robust pseudouridine prediction. e Multi-metric assessment showing precision-recall curve (AUC 0.906), F1 curve (AUC 0.821), and ROC curve (AUC 0.89) of the pseU_NN model on 41nt validation datasets, achieving a peak F1 score of 0.804. f Distribution of pseU_NN prediction scores on 41nt test datasets.

baBID-seq workflow and detection of Ψ sites on 16S rRNA.

a Schematic overview of the baBID-seq workflow optimized for bacterial RNA, adapted from the original BID-seq protocol. b Distribution of detected Ψ sites on 16S rRNA across different bacterial strains, with the y-axis showing the deletion ratios in input and bisulfite-treated samples. c Integrative Genomics Viewer (IGV) snapshot illustrating the detected Ψ781 site in P. aeruginosa 16S rRNA.

Reproducibility of Ψ detection and dynamic Ψ modification in tRNA.

a Pearson correlation of Ψ modification fractions between two independent biological replicates for each strain. b-c Changes in tRNA Ψ fractions in B. cereus (c) and P. syringae (d) between exponential and stationary growth phases. The y-axis represents the Ψ fraction difference (exponential phase minus stationary phase). d Overlap of identified mRNA Ψ sites between biological replicates for each strain. e Experimental validation of Ψ sites in P. aeruginosa PAO1 using the pseU-TRACE method. From right to left, the panels show validation of the Ψ site at position 944 on 23S rRNA, a negative control site from the guaA gene, a Ψ site in clpV1 (chromosomal position 109,883), and a Ψ site located in the intergenic region between guaA and guaB (chromosomal position 4,227,284).

Distribution and fraction of mRNA Ψ modifications across transcript regions.

a Distribution of mRNA Ψ fraction values in K. pneumoniae and P. syringae under minimal medium (MM) conditions. b Distribution of Ψ modifications between untranslated regions (UTRs) and coding sequences (CDSs) during the stationary growth phase and (c) exponential growth phase. d Distribution of Ψ fraction values in upstream and downstream UTRs of mRNAs across exponential and stationary growth phases.

Motif pattern of Ψ modification.

a The Ψ motif ratio in tRNA across four strains. Motif ratios are calculated by dividing the count of each specific 5-mer motif centered on Ψ by the total number of motifs detected in each individual strain tRNA. b The Ψ motif distribution ratio in ribosomal RNA (rRNA) across four bacterial strains. Motif ratios are calculated by dividing the count of each specific 5-mer motif centered on Ψ by the total number of motifs detected in each individual strain rRNA. c Scatter plot displaying the average Ψ fraction values for different sequence motifs in ribosomal RNA. Each point represents a specific sequence motif, with its position on the y-axis indicating the average Ψ fraction value across all instances of that motif in rRNA. d Similar to c, each point represents a specific motif, with its position on the y-axis indicating the average Ψ fraction value across all instances of that motif in tRNA. e-h Pseudouridylation motifs identified on mRNA transcripts from four bacterial strains using MEME. i Density plot displaying GC composition within 10-base pairs flanking Ψ sites in mRNA. j GC composition within 10-base pairs flanking Ψ sites in mRNA normalized to the background GC ratio. k Box plot comparing the GC ratio (normalized to genome background) around Ψ sites across different bacterial strains. l Hierarchical clustering plot showing gene orthology analysis of all pseudouridine synthases across bacterial strains. The colour intensity represents clustered gene counts across genomes.

Comprehensive analysis showing functional enrichment of Ψ-containing transcripts and Ψ enriched region.

a The bar plot illustrates the top 20 enriched KEGG pathways, ranked by p-value, derived from Ψ modified homologous genes identified in at least two strains. b The Gene Ontology analysis of Ψ-mRNA in exponential and stationary phases of P. aeruginosa. c Comparison of Ψ intensity patterns between strain-specific and homologous gene sets during two different growth phases. d Analysis of pseudouridylation density showing the number of Ψ sites occurring within 500nt windows centered on each modified position. e Ψ modification-enriched regions detected across bacterial strains. The x-axis represents positions on the bacterial chromosome. Dark red indicates highly modified Ψ sites, defined as positions where the Ψ fraction exceeds 50%.

Growth state-dependent dynamics of Ψ modification fraction.

a-c Paired visualization showing Ψ dynamics across growth phases, with a violin plot depicting the distribution of Ψ fractions in mRNAs during two distinct growth phases (left), and a corresponding heatmap displaying the specific Ψ modification sites dynamic detected across bacterial strains at these same growth phases (right). d The Ψ fraction distribution on mRNA in P. aeruginosa (left) and P. syringae (right) at two growth phases. e Violin plot quantifying the differential pseudouridylation intensity (exponential phase Ψ intensity minus stationary phase Ψ intensity) of specific genes at two growth phases. f Distribution plot illustrating the spatial relationship between Ψ modification sites and Hfq binding peak centres during two distinct growth phases. The violin plot reveals that the distribution of minimum distances between Ψ sites and Hfq binding centres varies with statistical significance (p-value = 0.0014) between growth phases, as determined by an F-test. g The scatter plot depicts the relationship between Ψ modification intensity and gene expression levels (log10(TPM)) under two distinct growth conditions.

The structure and sequence clustering analysis of RNA molecules containing Ψ modifications.

a-d Sequence and structure clustering of 41-nucleotide RNA segments centered by Ψ modification sites with fraction values greater than 0.1. The circular visualization comprises three concentric layers: the inner layer displays the Ψ fraction value, the middle layer designates RNA type, and the outer layer represents the GC ratio of each 41-nucleotide RNA segment. e Multi-metric assessment showing the precision– recall curve (AUC = 0.899), F1 score curve, and ROC curve (AUC = 0.879) of the pseU_NN model on the 61-nt validation dataset. The right panel f shows the distribution of pseU_NN prediction scores on the 61-nt validation dataset. j,h Similar to panels e,f, multi-metric assessment showing the precision–recall curve (AUC = 0.888), F1 score curve, and ROC curve (AUC = 0.869) of the pseU_NN model on the 81-nt validation dataset, and the right panel shows the distribution of pseU_NN prediction scores on the 81-nt validation dataset.