Production of most eukaryotic mRNAs requires splicing of introns from pre-mRNA. The splicing reaction requires definition of splice sites, which are initially recognized in either intron-spanning ('intron definition') or exon-spanning ('exon definition') pairs. To understand how exon and intron length and splice site recognition mode impact splicing, we measured splicing rates genome-wide in Drosophila, using metabolic labeling/RNA sequencing and new mathematical models to estimate rates. We found that the modal intron length range of 60-70 nt represents a local maximum of splicing rates, but that much longer exon-defined introns are spliced even faster and more accurately. Surprisingly, we observed low variation in splicing rates across introns in the same gene, suggesting the presence of gene-level influences, and we identified multiple gene level variables associated with splicing rate. Together our data suggest that developmental and stress response genes may have preferentially evolved exon definition in order to enhance rates of splicing.
Drosophila S2 cell 4sU RNA-seq dataGene Expression Omnibus (GEO) accession GSE93763.
- Telmo Henriques
- Adam Burkholder
- Karen Adelman
- Athma A Pai
- Christopher B Burge
- Athma A Pai
- Kayla McCue
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
- Timothy W Nilsen, Case Western Reserve University, United States
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
The locus coeruleus (LC) houses the vast majority of noradrenergic neurons in the brain and regulates many fundamental functions including fight and flight response, attention control, and sleep/wake cycles. While efferent projections of the LC have been extensively investigated, little is known about its local circuit organization. Here, we performed large-scale multi-patch recordings of noradrenergic neurons in adult mouse LC to profile their morpho-electric properties while simultaneously examining their interactions. LC noradrenergic neurons are diverse and could be classified into two major morpho-electric types. While fast excitatory synaptic transmission among LC noradrenergic neurons was not observed in our preparation, these mature LC neurons connected via gap junction at a rate similar to their early developmental stage and comparable to other brain regions. Most electrical connections form between dendrites and are restricted to narrowly spaced pairs or small clusters of neurons of the same type. In addition, more than two electrically coupled cell pairs were often identified across a cohort of neurons from individual multi-cell recording sets that followed a chain-like organizational pattern. The assembly of LC noradrenergic neurons thus follows a spatial and cell type-specific wiring principle that may be imposed by a unique chain-like rule.
Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated to protein structure and function. They thus open the possibility for generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly employs the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences score as well as natural sequences, for homology, coevolution and structure-based measures. For large protein families, our synthetic sequences have similar or better properties compared to sequences generated by Potts models, including experimentally-validated ones. Moreover, for small protein families, our generation method based on MSA Transformer outperforms Potts models. Our method also more accurately reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data than Potts models. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.