Schematic of the BiG Mito-Split collection creation and microscopy screening.

To create a whole-proteome fusion library, the acceptor C-SWAT library, in which each open reading frame (ORF) under its native promoter (NATIVEpr) is tagged with a cassette (acceptor module) containing a URA3 marker and homology linkers (dark green), is crossed with a donor strain carrying a plasmid with the desired tag (3×GFP11), a genomically integrated mitochondrial marker MTSSu9-mCherry, and a NAT selection marker. Following sporulation and haploid selection, recombination is initiated and the acceptor module is swapped with the donor tag, followed by negative selection for loss of the URA3 marker. To assay mitochondrial localization, the resulting haploid C’- collection is deprived of its mtDNA on ethidium bromide and crossed with the BiG Mito-Split strain carrying GFP1-10 in its mtDNA, and diploids are selected on respiratory media supplemented with NAT. The diploid collection is imaged using an automated fluorescence microscope; only proteins with their C-termini localized in the matrix complement split-GFP and can be detected.

Visualization of mitochondrial proteins with the BiG Mito-Split collection.

(A) Scatter plots showing the correlation between normalized fluorescence intensities of each strain imaged in different conditions; the line and value of the linear regression are shown. (B) Examples of fluorescence micrographs of strictly mitochondrial and dually localized proteins visualized by full-length fluorescent protein tags (C-SWAT mNeonGreen (mNG) library, left) or using the BiG Mito-Split collection (right). Quantification of the GFP signal in mitochondria marked by MTSSu9-mCherry in BiG Mito-Split strains relative to pooled controls without GFP11 is shown on the far right. (C) Comparison of the proteins visualized in this study with previous microscopy studies, and the breakdown of newly visualized proteins into those previously found in another location and those never studied with high-throughput microscopy before. Scale bars 10 µm.

Comparison of the proteins visualized in the BiG Mito-Split collection with previous studies of the mitochondrial proteome.

(A) A Venn diagram comparing the proteins identified in this work with manual annotations in the SGD and high-throughput proteomics reveals eight proteins previously not associated with mitochondria. (B) Mitochondrial proteins visualized in our screen (green) compared to all other mitochondrial proteins (grey, list compiled as in panel A), ranked by unified protein abundance (Ho et al. 2018). (C) Non-mitochondrial proteins ranked by unified protein abundance and split into a group with a high mitochondrial targeting signal prediction score (>0.7) and a group with a low score (according to Monteuuis et al, 2019), with the eight non-mitochondrial proteins visualized in this work (panel A) marked with circles. Only proteins with known unified abundance were analyzed (Ho et al. 2018).

Dually localized proteins and their targeting signals.

(A) Proteins visualized in this work and previously found only by high-throughput proteomics tend to have a high whole-cell-to-mitochondria ratio (Vögtle et al, 2017), indicating that they are dually localized; only proteins for which the ratio is known are plotted. (B) Three types of potential dually localized proteins can be found: those predicted to have an N-terminal targeting signal and no alternative start codon; those with neither a targeting signal nor an alternative start prediction; and those where an alternative start generates an echoform with a high-scoring prediction. For each type, one example is shown with its graph of the i-MTSL-score, canonical start codon (green dashed line), and alternative start codon (magenta dashed line); other similar proteins are listed on the plot. (C) Western blot verifying the expression of GFP11-tagged Cha1 and Arc1 cloned into an expression plasmid with their canonical start codons under a heterologous promoter and transformed into a haploid BiG Mito-Split strain; three different clones from each transformation are shown; the primary antibody used for decoration is indicated on the right. (D) Confocal fluorescence microscopy of the haploid BiG Mito-Split strains analyzed in (C); one clone is shown for each. (E) i-MTSL start codon predictions for Gpp1 and Gpp2 (top) and schematics of the generated constructs with mutated promoters and N-termini (bottom). (F) Fluorescence microscopy of strains with mtDNA-encoded GFP1-10 in which Gpp1 and Gpp2 were tagged with 3×GFP11 at the C-terminus (NATIVEpr) and the native promoter was then replaced with TEF2pr, or TEF2pr followed by a 3×HA tag, or MTSSu9-3×HA. (G) Left: sequence of the GPP1 promoter region upstream of the canonical start codon (ATG, +1), showing the most upstream non-canonical start (ATC, -39) and an additional ATG (-21); right: fluorescence micrographs of cells without GFP and of a haploid BiG Mito-Split strain in which genomically tagged GPP1-3×GFP11 has its native promoter or a TEF2pr integrated before the two non-canonical starts shown on the left. Scale bars are 10 µm (F,G) and 5 µm (D).

Protein topology visualized by BiG Mito-Split.

(A) Schematic drawing of the topology and approximate sizes of Complex IV and Complex III subunits as seen in a supercomplex structure (PDB:6ymx). mtDNA-encoded subunits are shown in purple; for all other subunits, the number and position of the first and last amino acids occurring in the structure are shown. The C-terminal label is highlighted in bold and enclosed in a circle; proteins that we observe in the dataset are colored green, and the C-terminal label is colored red if the observation does not agree with the known topology. (B) All membrane proteins found in our dataset sorted by the number of TMDs and topology. Those whose topology does not agree with our data are marked in red, those with agreeing topology in green, and the two poorly studied proteins in grey. (C) Mitochondrial GFP fluorescence intensity for Nat2-3×GFP11 and Spg1-3×GFP11 shown beside the proteomic data, where peptide position and fractionation ratios S1/M and S2/M are shown in grey and black (high S1/M and S2/M mean that the peptide is in the matrix; high S1/M and low S2/M mean that the peptide is in the IMS); TMD prediction is shown in green, and AlphaFold2-predicted structures are shown with TMD regions highlighted by dashed lines. (D) Other proteins found in the matrix: Mcr1 is a transmembrane protein alternatively sorted into the OM and the IM, Osm1 is a soluble IMS protein sorted via a stop-transfer mechanism, and Mix17 is a possible substrate of Mia40 that is subsequently imported into the matrix.

Quantitative analysis and normalization of GFP signal in different growth conditions.

(A) Median fluorescence intensity in mitochondrial regions was measured for each control strain; all values were then normalized by subtracting the average of all measurements (N=706) and dividing by the standard deviation (SD), yielding a normal distribution (top) in which less than 1% of measurements deviated by more than 3 SD from the control average. The same normalization was applied to all other imaged strains using the average and SD of the control measurements, producing a normalized fluorescence score; a score between 3 and 5 was designated as ambiguous and a score above 5 as confident. (B) Example images of two strains with confident and ambiguous scores, along with histograms of the raw intensities measured for each mitochondrial object in these strains (green or yellow shaded region) compared to the measurements in pooled controls (grey line). (C) Distribution of strains by the number of confident and ambiguous scores in six conditions: the majority of strains had a confident score in all conditions, the second largest group had none, and relatively few strains had other combinations of confident and ambiguous scores. To be added to the list of observed proteins for comparison with the other datasets, a strain had to show at least one confident and one ambiguous score, or to be imaged one more time to confirm an ambiguous score. (D) The number of observed proteins differed between conditions, possibly reflecting differences in fluorescence background and mitochondrial activity and, to a lesser extent, specific regulation of individual protein expression and import. Scale bar 10 µm.
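The normalization and score thresholds described in panel (A) can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function names are hypothetical, the use of the sample SD (rather than the population SD) is an assumption, and so is the handling of scores falling exactly on 3 or 5, which the legend does not specify.

```python
from statistics import mean, stdev


def normalized_scores(control_values, strain_values):
    """Score each strain value against the control distribution.

    Subtracts the control average and divides by the control SD.
    Assumption: sample SD (statistics.stdev); the legend does not say which.
    """
    mu = mean(control_values)
    sd = stdev(control_values)
    return [(value - mu) / sd for value in strain_values]


def classify(score, ambiguous_min=3.0, confident_min=5.0):
    """Label a normalized fluorescence score.

    Thresholds 3 and 5 come from the legend; treating exactly 3 and
    exactly 5 as 'ambiguous' is an assumption made for this sketch.
    """
    if score > confident_min:
        return "confident"
    if score >= ambiguous_min:
        return "ambiguous"
    return "not detected"


if __name__ == "__main__":
    # Made-up intensities for illustration only, not data from the study.
    controls = [10.0, 12.0, 11.0, 9.0, 13.0]
    strains = [11.0, 17.0, 20.0]
    for raw, score in zip(strains, normalized_scores(controls, strains)):
        print(raw, round(score, 2), classify(score))
```

Because every strain is scored against the same control average and SD, scores are expressed in units of control SD and can be compared across strains imaged in the same condition.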

Proteins observed in this study compared to other works.

(A) The list of soluble matrix proteins was extracted from Vögtle et al. 2017, and the numbers of proteins from that list observed in our study and in Mark et al. 2023 are shown. (B) For each protein observed in our study, its normalized fluorescence score in glucose is plotted against the log2 protein abundance measured by proteomics in purified mitochondria (cells grown on glucose); certain functional groups that might have a disproportionately higher or lower fluorescence signal are highlighted.

Comparison of observed inner membrane protein topology with other works and predictions.

(A) The available proteomic data from Morgenstern et al. 2017 for the proteins depicted in Fig. 5B, showing S1/M and S2/M ratios for each detected peptide (high S1/M with low S2/M indicates IMS location; both ratios high indicate matrix location), and two TMD predictions from the TopologYeast web server (which compiles TMD predictions by seven different algorithms); the X-axis corresponds to the amino acid number. (B) Output of the TopologYeast web server for the Nat2 and Spg1 proteins.
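The rule used in panel (A) to read the fractionation ratios can be sketched in Python. This is an illustration only: the function name and the numeric cutoff separating "high" from "low" ratios are hypothetical placeholders, since the legend gives no cutoff, and peptides with a low S1/M ratio are left unassigned because the legend does not interpret them.

```python
def peptide_location(s1_m, s2_m, high_cutoff=1.0):
    """Assign a peptide to a compartment from its S1/M and S2/M ratios.

    high_cutoff is a hypothetical placeholder, not a value from the
    source; choose it from the actual distribution of ratios.
    """
    s1_high = s1_m >= high_cutoff
    s2_high = s2_m >= high_cutoff
    if s1_high and s2_high:
        return "matrix"  # both ratios high
    if s1_high and not s2_high:
        return "IMS"  # high S1/M with low S2/M
    return "undetermined"  # low S1/M is not interpreted in the legend


if __name__ == "__main__":
    # Made-up ratios for illustration only.
    for ratios in [(2.0, 2.0), (2.0, 0.2), (0.2, 0.2)]:
        print(ratios, peptide_location(*ratios))
```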