Introduction

Intrinsically disordered regions (IDRs) in proteins participate in key cellular processes via interactions with globular protein domains (Dyson and Wright, 2005). Short linear motifs (SLiMs) mediate these interactions using contiguous stretches of 3 – 10 amino acids that undergo disorder-to-order transitions upon binding, typically resulting in interactions with equilibrium dissociation constants (KD) in the micromolar range (Tompa et al., 2014; Wright and Dyson 2015; Diella 2008; Davey et al., 2012; Van Roey et al., 2014). For most SLiMs, the sequence features necessary and sufficient for binding are incompletely understood. Cataloged regular expressions, such as those in the Eukaryotic Linear Motif (Kumar et al., 2022) and Motif Map of the Proteome (https://slim.icr.ac.uk/momap/) databases, capture the features that best typify known binders but also highlight the limitations of existing motif specifications, which are satisfied by many sequences that fail to bind (Bugge et al., 2020). Although the number of experimentally defined SLiMs has expanded in recent years through the targeted assessment of individual binding partners, more comprehensive, high-throughput approaches are required to refine these recognition patterns and to support the development of prediction tools (Davey et al., 2023; Kumar et al., 2024).

SLiMs mediate interactions in the conserved proteostasis pathway macroautophagy (Popelka, 2020), hereafter referred to as autophagy. This pathway involves the de novo formation of a double-membrane organelle known as the autophagosome, which transports cellular contents to the lysosome for degradation (Mizushima et al., 2008). Microtubule-associated protein 1A/1B light chain 3B (LC3B) and its homologs are critical at multiple stages of the autophagic process (Rogov et al., 2023) and participate via interaction with many partners that contain a SLiM termed the LC3-interaction region (LIR) (Birgisdottir et al., 2013). During autophagy, hundreds of LC3B proteins are covalently linked to the growing autophagosome through reversible conjugation to phosphatidylethanolamine embedded in the autophagosomal membrane (Kabeya et al., 2004). LC3B•LIR complexes, enhanced by multivalency and avidity at the membrane surface (Sawa-Makarska et al., 2014; Wurzer et al., 2015; Lee and Davis, 2024), tether cellular components to the autophagosome either directly as cargo or indirectly using selective autophagy receptors to ensure efficient degradation (Gubas and Dikic, 2022). LC3B•LIR interactions are also critical for autophagosome biogenesis, trafficking, and lysosomal fusion (Johansen and Lamark, 2020). Mutations that alter LC3B•LIR interactions have been linked to the development of many diseases, including neurodegenerative disorders, aging, and cancer (Ramesh Babu et al., 2008; Park et al., 2020; Fas et al., 2021; Brennan et al., 2022). LC3B also has important roles in LC3-associated forms of phagocytosis (Florey and Overholtzer, 2012), endocytosis (Heckmann et al., 2019), and macropinocytosis (Sønder et al., 2021).

LC3B is one of six hAtg8 proteins that are homologous to the yeast Atg8 protein and that share a common architecture consisting of a ubiquitin-like β-grasp fold with two additional N-terminal α-helices (Sugawara et al., 2004). The 14 kDa LC3B and its paralogs bind to the information-poor consensus LIR [FWY]0-X1-X2-[LVI]3 (Chatzichristofi et al., 2023), where X can be any amino acid, via two conserved hydrophobic pockets that form the LIR docking site (LDS). In its canonical binding mode, the aromatic residue [FWY]0 engages the first hydrophobic pocket (HP1), and the aliphatic residue [LVI]3 engages the second hydrophobic pocket (HP2) (Rogov et al., 2023). This degenerate LIR motif occurs ∼170,000 times in the human proteome, with ∼19,000 occurrences falling in regions of predicted disorder. It is unlikely that all of these sequences represent functional interaction sites; indeed, several proteins containing canonical LIRs fail to bind LC3B with measurable affinity (Chatzichristofi et al., 2023). Moreover, recent discoveries of binders that deviate from the canonical LIR motif have expanded the repertoire of hAtg8 binding modes, and thus increased the number of potential interactors. For example, Li et al. reported that LC3B binds to ankyrin-G using the LIR sequence 1991WIEF1994, wherein an aromatic Phe residue engages HP2 (Li et al., 2018), and Knævelsrud et al. found that sorting nexin-18 (SNX18) binds hAtg8 paralogs using an expanded five-residue motif (154WDDEW158) (Knævelsrud et al., 2013). The proteome-wide prevalence of binders that lack the canonical [FWY]0-X1-X2-[LVI]3 motif remains unknown.
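The degenerate consensus [FWY]0-X1-X2-[LVI]3 maps directly onto a regular expression. The minimal Python sketch below is an illustration only, not the pipeline used in this study: it scans a peptide for every instance of the canonical motif, using a lookahead so that overlapping instances are reported. The function name find_lir_motifs is hypothetical, and the two example sequences are the FYCO1 and BLM peptides as listed in the figure legends.

```python
import re

# Canonical LIR: aromatic residue, any two residues, aliphatic residue.
# The pattern is wrapped in a lookahead so overlapping instances are all found.
CANONICAL_LIR = re.compile(r"(?=([FWY][ACDEFGHIKLMNPQRSTVWY]{2}[LVI]))")


def find_lir_motifs(sequence):
    """Return (0-based start, 4-residue match) for each canonical LIR instance."""
    return [(m.start(), m.group(1)) for m in CANONICAL_LIR.finditer(sequence)]


# Peptide sequences copied from the figure legends.
fyco1 = "DAVFDIITDEELCQIQESGSSLPETPTETDSLDPNA"  # FYCO1 1277-1312
blm = "DIDNFDIDDFDDDDDWEDIMHNLAASKSSTAAYQPI"    # BLM 552-587

fyco1_hits = find_lir_motifs(fyco1)  # [(3, 'FDII')]
blm_hits = find_lir_motifs(blm)      # [(15, 'WEDI'), (32, 'YQPI')]
```

Adding the peptide start (552) to the 0-based offset of the WEDI hit gives residue 567, the position of the Trp discussed in the structural analysis. A match only indicates that a sequence satisfies the motif; as noted above, many such sequences do not bind.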

Given that not all instances of the LIR motif bind to LC3B, and that not all LC3B interactors contain a LIR motif, we sought to understand determinants beyond the core LIR that contribute to binding. To discover these sequence determinants, we used a bacterial display assay to screen ∼500,000 peptides derived from the human proteome for those capable of binding to LC3B. We determined the binding affinity of 51 peptides from the screen and elucidated binding mechanisms that support SLiM-LC3B interactions for known and novel binders through structural and biochemical analysis of identified peptides and mutated variants. Our results expand the number and types of residues that can engage HP2 on LC3B and strongly support the role of N-terminal acidic residues in enhancing LIR•LC3B binding affinity, allowing us to design a synthetic LIR peptide with affinity comparable to that of the tightest known native LC3B binder.

Results

A high-throughput bacterial display screen identifies thousands of LC3B-binding peptides

To rapidly identify peptides capable of binding LC3B, we used bacterial surface display in combination with fluorescence-activated cell sorting (FACS). Following Hwang et al., we expressed ∼500,000 36-mer peptides spanning the human proteome on the cell surface of Escherichia coli via fusion to circularly permuted OmpX (Rice et al., 2006; Larman et al., 2011; Hwang et al., 2022).

To compensate for the low micromolar affinity of typical SLiM interactions, and to better mimic the avid interactions enabled by the high local concentration of LC3B conjugated to autophagosomal membranes in cells (Zaffagnini and Martens, 2016), we bound N-terminally biotinylated LC3B to tetravalent streptavidin-phycoerythrin (SA-PE). During sorting, we quantified peptide expression levels using an allophycocyanin (APC)-conjugated antibody against an encoded N-terminal FLAG tag. Cells displaying peptides that bound to LC3B were identified and isolated based on their high SA-PE-to-APC ratio (Figure 1A). We validated our screening approach using well-characterized LC3B-binding peptides of various affinities (Supplementary Figure 1). Following five rounds of positive selection and one round of counterselection against non-specific binding to SA-PE (Figure 1B), cells in the binding gate were enriched ∼80-fold, from ∼0.7% of the naïve library to ∼57% of all peptide-expressing cells (Supplementary Figure 2).

Human peptidome library screening enriches known and novel LC3B-binding peptides.

(A) Schematic of FACS-based bacterial display enrichment screening. Peptides, 36 amino acids in length, that tile the human proteome were expressed as FLAG-tagged fusions to eCPX on the surface of E. coli. Expression was detected using the fluorescence signal from an allophycocyanin (APC)-conjugated anti-FLAG antibody. Binding to LC3B (grey), which was tetramerized through binding to streptavidin conjugated to phycoerythrin (SA-PE, pink), was detected using PE fluorescence. Peptide-expressing cells were sorted based on APC and PE fluorescence and sequenced. Panel created using BioRender.com. (B) Diverging bar chart plotting the number of unique sequences detected in each round of sorting that lacked (grey, left axis) or contained (blue, right axis) a LIR motif. Sort 2, a negative sort used to eliminate peptides nonspecifically bound to SA-PE, was not sequenced. (C) Peptide enrichment profiles across enrichment sorts. Black lines show trajectories for a random subset (0.1%) of all peptides that persisted to sort 6. Overlaid are the enrichment profiles for the best-enriching peptide (BLM552-570*), positive controls (FYCO11277-1312 and ATG4A363-398), and the worst-enriching peptide that reached sort 6 (EXTL3871-906), colored green, pink, yellow, and blue, respectively. Peptide sequences are detailed in Supplementary Table 1. (D) Cumulative density function plot of the mean z-scores across sorts 4, 5, and 6. Mean z-scores for the 427 of 12,158 peptides that surpassed the threshold of 1.70 and were used to define the HC-set are colored in black, and select peptides are colored as in panel C. (E) Peptide enrichment profiles for the 427 peptides in the HC-set (black), with select peptides colored as in panels C-D.

In all, 12,158 peptides derived from 5,578 unique proteins were identified in the final round of library sorting, with individual peptides displaying a wide range of behaviors as assessed by their calculated enrichment ratio (ER) (Rubin et al., 2017) across successive sorts (Figure 1C, see Methods). To define a high-confidence set (HC-set) of LC3B binders for further analysis, we selected peptides with a mean ER at least 1.70 standard deviations (z-score ≥ 1.70, n = 427 peptides) above the mean ER calculated over the final three rounds of sorting (Figure 1D-E). This criterion captured four of the six peptides in our input library that were reported to bind LC3B with dissociation constants below 5 µM, as compiled in the LIRcentral database (Chatzichristofi et al., 2023) (Supplementary Figure 3A). In contrast, peptides in our input library annotated by LIRcentral as non-binding (KD > 50 µM) were strictly excluded from this group and were progressively depleted during sorting (Supplementary Figure 3B). Beyond these known interactions, this HC-set included peptides from sixteen proteins previously reported to co-immunoprecipitate with LC3B (Stark et al., 2006), most of which lacked mapped interaction sites (Figure 2A). We tested four such peptides (MAP1B823-858, ATG4A363-398, SCYL1640-675, and HEAT3377-412) for binding to monomeric LC3B by bio-layer interferometry (BLI) and observed robust binding with dissociation constants ranging from approximately 0.5 to 10 µM (Figure 2B, Supplementary Table 1).
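The z-score criterion used to define the HC-set can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used here: the enrichment ratio itself is defined in the Methods and is taken as given, the population standard deviation is assumed (the study may have used the sample estimate), and the peptide names and ER values are hypothetical.

```python
from statistics import mean, pstdev


def high_confidence_set(mean_er, z_cutoff=1.70):
    """Return {peptide: z-score} for peptides whose mean ER z-score >= z_cutoff.

    mean_er maps each peptide to its mean enrichment ratio over the final sorts.
    """
    values = list(mean_er.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return {}
    z_scores = {name: (er - mu) / sigma for name, er in mean_er.items()}
    return {name: z for name, z in z_scores.items() if z >= z_cutoff}


# Hypothetical mean enrichment ratios (not data from the screen).
toy_er = {
    "pep_A": 0.1, "pep_B": -0.2, "pep_C": 0.0, "pep_D": 0.3, "pep_E": -0.1,
    "pep_F": 0.2, "pep_G": -0.3, "pep_H": 0.1, "pep_I": 0.0, "pep_J": 4.0,
}
hc_set = high_confidence_set(toy_er)  # only pep_J clears the cutoff
```

A fixed z-score cutoff trades sensitivity for specificity: weak but authentic binders fall below it, which is consistent with the criterion capturing four of the six known sub-5 µM binders.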

Highly enriched peptides share annotations with LC3B and bind LC3B tightly.

(A) Peptides in the HC-set, annotated with various lines of evidence of binding to LC3B. For each protein, circles indicate: LIR - experimentally validated LIR motifs annotated in LIRcentral (Chatzichristofi et al., 2023); IP - proteins co-purified with LC3B, from BioGrid (Stark et al., 2006); GO - peptides with GO-annotations shared with LC3B, from UniProt (The UniProt Consortium, 2023), with heatmap of enrichment z-score plotted at bottom. (B) Binding to monomeric LC3B assayed by BLI for peptides ATG4A363-398 (VPPAKPEVTTTGAEFIDSTEQLEEFDLEEDFEILSV), SCYL1640-675 (TADRWDDEDWGSLEQEAESVLAQQDDWSTGGQVSRA), HEAT3377-412 (EDPSDDEWEELSSSDESDAFMENSFSECGGQLFSPL), and BLM552-587 (DIDNFDIDDFDDDDDWEDIMHNLAASKSSTAAYQPI). Assay performed in triplicate, with error bars indicating standard deviation. Data were fit to a standard binding isotherm. (C) Relationship between mean z-score over rounds 4-6 and affinity for monomeric LC3B as determined by BLI. Four peptides are colored as in Figure 1C. Shapes indicate five classes of peptides, as annotated in panel A. Dotted lines mark thresholds delineating high vs. low z-scores and binding vs. non-binding, with the number of peptides in each quadrant listed. Enrichment of binders among high z-score peptides was found to be statistically significant, as assessed using Fisher’s exact test (p < 0.001).

Strongly enriched peptides include novel candidate LC3B interactors

To identify peptide hits from our screen most likely to represent biologically relevant LC3B interaction partners, we selected proteins with shared Gene Ontology (GO) annotations (The Gene Ontology Consortium, 2023) with LC3B that contained peptides in our HC-set (Figure 2A, Supplementary Figure 4). From this list, we tested LC3B for binding to two candidate peptides derived from HERC13075-3110 and LRRK2858-893, noting that these proteins are reported to impact autophagy (Montes-Fernández et al., 2020; Pérez-Villegas et al., 2020; Park et al., 2016; Roosen and Cookson 2016; Madureira et al., 2020; Boecker and Holzbaur 2021), but have not been shown to directly interact with LC3B. Each peptide bound with a dissociation constant in the low micromolar range (Supplementary Table 1).

We additionally identified peptide hits without reported links to autophagy or LC3B. Many of these peptides bound LC3B with moderate affinity (KD ≤ 50 µM) and thus represent candidates for new LC3B-interacting proteins (Supplementary Table 1). In total, we determined monomeric LC3B binding affinities for 45 peptides spanning varying degrees of enrichment in our sort and found that the z-score cutoff of 1.70 used to define our HC-set effectively distinguished binders (KD < 50 µM) from non-binders (KD > 60 µM) (Figure 2C, Supplementary Table 2).
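The association between high z-score and measurable binding can be assessed with Fisher's exact test on the 2×2 table of quadrant counts. The sketch below is an assumption-laden illustration using only the Python standard library: it computes a one-sided p-value (the sidedness used in this study is not stated in the text), and the example counts are hypothetical placeholders rather than the quadrant counts reported in Figure 2C.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: high z-score / low z-score. Columns: binder / non-binder.
    Returns P(top-left cell >= a) under the hypergeometric null.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / denominator


# Hypothetical quadrant counts for 45 measured peptides (not the reported values):
# high z & binder, high z & non-binder, low z & binder, low z & non-binder.
p_value = fisher_exact_greater(20, 3, 4, 18)
```

Summing exact hypergeometric terms avoids the large-sample approximation of a chi-squared test, which matters when some quadrants hold only a handful of peptides.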

Enriched LIR-containing sequences exhibit a preference for Trp in the X0 position, preceded by acidic residues

To identify sequence features associated with binding, we generated a sequence logo (Schneider and Stephens, 1990; O’Shea et al., 2013) for peptides in the HC-set that contained a LIR motif (see Methods). This logo highlighted enrichment of tryptophan in the first position (X0), over tyrosine and phenylalanine, and a preference for acidic residues in positions X-1, X-2, and X-3, as well as for glutamate in the X1 position of the core LIR motif and at the C-terminal X7 position (Figure 3A). Consistent with this logo capturing features related to binding affinity, a synthetic peptide defined by the consensus sequence observed in the logo, pCONSLIR, bound to LC3B with a dissociation constant of ∼60 nM (Figure 3B). This affinity is ∼25-fold tighter than a well-studied peptide from FYCO11277-1312 (Olsvik et al., 2015; Cheng et al., 2016), ∼25-fold tighter than a chimeric peptide designed to target α-synuclein for autophagic degradation (Tong et al., 2023), and is comparable in affinity to the tightest known LC3B-binding peptide, ANK21578-1613 (Li et al., 2018).

Highly enriched peptides feature W-type core LIR motif flanked by acidic residues.

(A) Sequence logo derived from analysis of the HC-set peptides (see Methods), and plotted using logomaker (Tareen and Kinney, 2019). Acidic residues are colored maroon and those colored red are significantly enriched compared to the input library. Other residues are colored gray with [FWY]0 and [LVI]3 in black. The core LIR is highlighted in yellow. (B) BLI measurements of binding affinity between LC3B and peptides pCONSLIR (EEEVEEKEEEDDDEEWEILDIEEGSDSEQKLISE), ANK21578-1613 (VQSSRSERGLVEEEWVIVSDEEIEEARQKAPLEITE), and FYCO11277-1312 (DAVFDIITDEELCQIQESGSSLPETPTETDSLDPNA). Error bars report the standard deviation of two or more technical replicates. Data were fit to a standard binding isotherm, as in Figure 2. (C) Bar chart plotting binding affinities (KD) of sequential truncations of pCONSW measured via BLI, with error bars reporting standard error of the mean. Brackets report statistical significance of pairwise t-tests (****p ≤ 0.0001; ns not significant). (D) Structure of BLM552-571* bound to LC3B, resolved to 2.2 Å (see Supplementary Table 4). Here, and in panels E-G, the BLM peptide is displayed in stick representation (green backbone) and the LC3B surface is colored by hydrophobicity, as computed by ChimeraX (Meng et al., 2023). Hydrophobic pockets HP1 and HP2, and the N-terminal flanking residues, are indicated. (E) Inset highlighting contact between BLM E568 and LC3B R70, and BLM D569 and LC3B K30, near HP2 (black dashed lines). LC3B residues that form HP2 are annotated and shown in stick representation. (F) Inset highlighting docking of BLM W567 in LC3B HP1, with residues forming that pocket annotated and shown in stick representation. (G) Inset highlighting interactions between BLM N-terminal acidic residues D564 and D566 with R11, K49, and K51 of LC3B. Putative contacts within 4.5 Å are marked with dashed black lines.

To test the role of the hydrophobic residue in position X0, we partitioned the sequence alignments based on the W/F/Y residue identity and generated three new peptides (pCONSW, pCONSF, and pCONSY) based on the consensus sequences (Supplementary Figure 5). Because Y-type LIR sequences were frequently located near the C-terminus in our HC-set, we lacked sufficient sequence diversity to generate a logo with C-terminal flanking sequences for this region, and the LIR motif of pCONSY was placed at the C-terminus of the peptide to reflect this positional bias. Peptide pCONSW bound to LC3B with an affinity (KD∼90 nM) similar to that of peptide pCONSLIR (Supplementary Table 3). The affinities of pCONSF and pCONSY were both weaker (KD∼400 nM and KD∼10 µM, respectively) (Supplementary Table 3), consistent with tryptophan in the first position (X0) favoring binding.

LIR+ motifs support binding to LC3B.

(A) Affinities of LC3B for peptides from PAR118-37, CTSL2233-262*, and DYH12422-445*, as measured by BLI, plotted as a bar chart (mean ± s.e.m, n≥3). Candidate binding sites that were mutated to Ala are red in the mutated sequence. No detectable binding, up to 40 µM LC3B, is indicated with N.B. (B) AlphaFold3 (Abramson et al., 2024) structural prediction of CTSL2233-262* (colored by pLDDT) in complex with LC3B (colored by hydrophobicity). Insets: canonical LIR YYFI258-261 is not predicted to engage the LDS (left), in contrast to WEVF252-255 (right).

Mutations at the LC3B LIR docking site alter LIR-peptide binding specificity.

(A) Structure of LC3B bound to FYCO11276-1288 LIR peptide (PDB 5d94) (Olsvik et al., 2015) shown as a surface representation colored by hydrophobicity. Insets depict hydrophobic pockets formed by residues F52 and L53 (stick representation) in wild-type LC3B (left), or a predicted surface of the LDS* mutant (right), with each alanine substitution colored red. (B) Enrichment profiles across three rounds of the LC3B LDS* enrichment sort, with nine peptides spanning the range of observed profiles colored and sequences provided. Four peptides that were enriched and bound with measurable affinity to LC3B LDS* are in blue. Known, canonical LC3B LDS binders that were depleted in the LDS* sort are colored purple and red. Black lines indicate a random sample of 10% of peptides that persisted through LC3B LDS* sort 3. The histogram shows the frequencies of peptides with a given enrichment ratio in LDS* sort 3, with the dashed red line marking an enrichment ratio of 0. (C) Affinities of LC3B (blue) and LC3B LDS* (light red) for peptides FYCO11277-1312 and ATG4A363-398 and for four peptides found to enrich across the LDS* sorts: PPM1H436-471, OSBL71-36, EFC13262-278*, and TRIM5288-120. Error bars indicate standard error of the mean across three or more replicates. Hashed bars mark peptides without detectable binding at the highest measured LC3B concentration (40 µM).

We next tested the contributions of the highly enriched N- and C-terminal acidic residues in the context of the tightly binding pCONSW peptide. Truncation of C-terminal residues beyond position X5 had modest effects on binding, but removing residues N-terminal to position X-2 weakened affinity ∼7-fold (Figure 3C, Supplementary Table 3). Most of this reduced affinity could be restored by reintroducing either a single N-terminal Asp or a 6-residue EDDDDA sequence that lacked this Asp (Figure 3C, Supplementary Table 3). These data support an important role for flanking N-terminal acidic residues in facilitating high-affinity binding to LC3B, without a strict requirement for a specific residue at a specific site. Consistent with the key determinants of binding being proximal to the LIR, removing 7 residues from either the N- or C-terminus of the 36-residue peptide identified in the screen did not significantly impact affinity (Figure 3C, Supplementary Table 3).

Given the efficacy of this approach in predicting tightly binding peptides, we next evaluated whether a related method could distinguish LC3B-binding from non-binding peptides that contained LIR motifs. Briefly, we assembled a test set of 79 binders and 56 non-binders as annotated in LIRcentral (Chatzichristofi et al., 2023) and compared the discriminatory performance of the previously reported iLIR model (Kalvari et al., 2014) with new PSSM-based models derived from our screening data (see Methods). We also constructed higher-order random forest classifiers using our screening data that aimed to capture potential residue–residue coupling effects. On this test set (see Methods), no model performed better than the iLIR model (Supplementary Figure 6). Evaluating the models using held-out screening data gave auROC = 0.87 for recognizing LIR peptides with z-score > 2.3, but poor performance for less enriched peptides (Supplementary Figure 7), suggesting that the test set and/or screening data may be too limited, sparse, or noisy, or that key binding determinants depend on higher-order structure that our models fail to capture.
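For readers unfamiliar with position-specific scoring matrices, the following Python sketch shows the general technique of building a log-odds PSSM from aligned motif windows and scoring a candidate window. It is a generic illustration and reproduces neither the iLIR model nor the models derived from our screening data (see Methods): the uniform background, the pseudocount of 1, the six-residue window, and the toy aligned sequences are all assumptions made for the example.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, background=None, pseudocount=1.0):
    """Build a log2-odds PSSM (list of per-position dicts) from equal-length windows."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned windows must have the same length")
    if background is None:
        # Assumption: uniform background; proteome frequencies could be used instead.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)
        column = {
            aa: log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds scores for a window of matching length."""
    if len(window) != len(pssm):
        raise ValueError("window length must match PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, window))


# Hypothetical six-residue windows spanning positions X-2 to X3 (toy data).
toy_windows = ["DDWEEL", "EDWEDI", "DEFEVL", "EEWDEL"]
toy_pssm = build_pssm(toy_windows)
motif_like = score_window(toy_pssm, "DDWEEL")
unrelated = score_window(toy_pssm, "KKAGGK")
```

Because a PSSM scores each position independently, it cannot represent coupling between residues, which is one motivation for the higher-order classifiers described above.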

Acidic residues N-terminal of the core LIR directly contribute to LC3B engagement and binding affinity

Many of our top-scoring peptides exhibited features common to both pCONSLIR and pCONSW, namely the presence of Trp in X0, Glu in X1, and multiple N-terminal acidic residues. Indeed, the top-scoring peptide contained these features, and bound tightly to LC3B (KD∼0.7 µM; Supplementary Table 1). Many previously reported high-affinity LC3B binders, including FYCO11276-1288, ANK21588-1613, and ANK31985-2010, use an acidic residue in position X7 of the C-terminal extension to make an affinity-enhancing contact to LC3B residue R70 (Olsvik et al., 2015; Cheng et al., 2016; Li et al., 2018) (Supplementary Figure 8), yet our top scoring peptide, derived from the DNA helicase BLM, lacks such a residue. This observation prompted us to investigate how it achieved comparable binding.

We determined the structure of LC3B fused to BLM552-571 at 2.2 Å resolution using X-ray crystallography (Figure 3D-G, Supplementary Table 4, PDB 9p3e). BLM552-571* engaged the LDS of LC3B as expected, and the interactions between the peptide and LC3B were consistent with those observed for other canonical LIR motifs: the aromatic W567 (X0) and hydrophobic residue I570 (X3) docked in HP1 and HP2, respectively (Figure 3E-F). Additionally, we observed an intermolecular β-sheet between LC3B residues 51-53 and BLM residues 568-570, which occupy positions 1-3 of the [FWY]0-X1-X2-[LVI]3 LIR motif. Notably, the sidechain of BLM E568 (X1) contacted LC3B R70 (Figure 3E), seemingly substituting for the contacts often found between R70 and glutamates C-terminal to the core LIR (Supplementary Figure 8). Acidic residues N-terminal to the LIR, D564-D566 (X-3 – X-1), were in proximity to positively charged residues R10, R11, K49, and K51 of LC3B (Figure 3G), though they adopted multiple conformations as analyzed by ensemble refinement (Burnley et al., 2012) (Supplementary Figure 9, see Methods). Together, these observations indicated that acidic residues N-terminal to the core LIR contribute directly to LC3B engagement and binding affinity.

Highly enriched peptides lacking a canonical LIR can bind LC3B

We next asked whether sequences from our HC-set that lacked a canonical LIR could also bind LC3B, focusing on candidates with residues we predicted could engage the HP1 and HP2 pockets. We specifically examined peptides containing a motif we termed LIR+ ([FWY]0-X1-X2-[FWY]3), as Li et al. had previously reported an X-ray crystal structure showing a peptide bound to LC3B with aromatic residues W and F engaged with the hydrophobic pockets (Li et al., 2018). To this end, we assessed LC3B binding for three peptides in our HC-set bearing LIR+ sequences: (1) DYH12422-445*, which contains two LIR+ sequences; (2) PAR118-37, which contains a single LIR+ sequence; and (3) CTSL2233-262*, which contains both a canonical LIR and a LIR+ motif. We found that each bound LC3B with sub-20 µM affinity as measured by BLI. We then introduced alanine substitutions into each LIR+ sequence and observed reduced affinity for LC3B in each peptide, consistent with a role for the LIR+ motif in supporting LC3B association (Figure 4A). Notably, in CTSL2233-262*, mutation of the LIR+ sequence 246WEVF249 to AAAA strongly reduced binding, whereas analogous alanine substitutions in the canonical 252YYFI255 LIR motif had a modest effect, consistent with the LIR+ site serving as the dominant binding determinant. Supporting this interpretation, an AlphaFold3 (Abramson et al., 2024) prediction of a CTSL2233-262*•LC3B complex structure showed the LIR+ motif engaging the LC3B LDS (Figure 4B).

Considering the apparent binding efficacy of such LIR+ sequences, we assessed the relative frequency of canonical LIR, LIR+, and related motifs in our input library and in our HC-set. We found strong evidence for enrichment of LIR sequences, but not for these closely related motifs (Supplementary Figure 10), indicating that whereas such LIR-related motifs may support binding in some sequence contexts, they are not sufficiently strong binding determinants to be enriched in our screen.
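The comparison of motif frequencies between the input library and a selected set can be sketched as a simple fold-enrichment calculation. The Python example below is illustrative only and is not the analysis behind Supplementary Figure 10: the function names are hypothetical, the peptide lists are toy sequences, and no statistical test is applied.

```python
import re

# Motif definitions taken from the text: canonical LIR and LIR+.
MOTIFS = {
    "LIR": re.compile(r"[FWY][A-Z]{2}[LVI]"),
    "LIR+": re.compile(r"[FWY][A-Z]{2}[FWY]"),
}


def fraction_with_motif(peptides, pattern):
    """Fraction of peptides containing at least one instance of the motif."""
    if not peptides:
        return 0.0
    return sum(1 for pep in peptides if pattern.search(pep)) / len(peptides)


def fold_enrichment(selected, library, pattern):
    """Ratio of motif-containing fractions (selected / library); None if undefined."""
    library_fraction = fraction_with_motif(library, pattern)
    if library_fraction == 0:
        return None
    return fraction_with_motif(selected, pattern) / library_fraction


# Toy peptide sets (hypothetical sequences, not from the screen).
toy_library = ["GGWEELGG", "GGWEVFGG", "GGGGGGGG", "GGSGSGSG"]
toy_selected = ["GGWEELGG", "DDWDDIGG"]

lir_fold = fold_enrichment(toy_selected, toy_library, MOTIFS["LIR"])        # 4.0
lir_plus_fold = fold_enrichment(toy_selected, toy_library, MOTIFS["LIR+"])  # 0.0
```

Counting peptides rather than motif instances keeps the measure insensitive to peptides that carry several copies of the same motif, which are common among strongly enriched sequences.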

Common mutations to the LIR docking site of LC3B modulate but do not ablate peptide binding

LC3B with mutations F52A and L53A in the LDS, here called LC3B LDS*, is commonly used to assess binding at the LDS, with these substitutions in HP2 and HP1 presumed to disrupt peptide binding at this interface (Figure 5A) (Behrends et al., 2010; Kraft et al., 2014; Qiu et al., 2017; Skytte Rasmussen et al., 2017; Marshall et al., 2019). Using LC3B LDS*, we performed three additional bacterial-display sorts with the round-5-enriched library as input (Supplementary Figure 11, see Methods). The well-studied peptide FYCO11277-1312 was depleted over the three rounds of LDS* sorting, consistent with its canonical LDS-dependent binding mechanism. Similar behavior was observed for other known LC3B-binding peptides, including KBTB7639-674, FUND11-36, NEDD4681-716, and ATG4A363-398 (Figure 5B). Unexpectedly, several other peptides were further enriched over these sorts (Figure 5B), consistent with increased affinity for the altered LC3B LDS* interface.

To investigate this further, we selected six peptides spanning the enrichment patterns observed in the LDS* sorts and quantified their binding to wild-type LC3B or LC3B LDS* proteins using BLI. The results revealed diverse binding behaviors: some peptides bound more tightly to wild-type LC3B (e.g., ATG4A363-398), whereas others exhibited strong selectivity towards the LDS* mutant (e.g., TRIM5288-120) (Figure 5C). Notably, structural modeling indicated that although the LDS* mutations alter the binding interface, the hydrophobic pockets persist (Figure 5A). Therefore, we propose that, rather than abolishing LDS-mediated interactions entirely, the F52A/L53A substitutions subtly remodel the docking surface, creating an alternative hydrophobic interface that modulates peptide binding in a sequence-dependent manner.

Discussion

LC3B and other hAtg8 paralogs often interact with binding partners via the LIR motif [FWY]0-X1-X2-[LVI]3, an information-poor SLiM that can mediate critical interactions in all major stages of bulk and selective autophagy (Rogov et al., 2023), as well as in non-autophagic processes such as LC3-associated phagocytosis (Florey and Overholtzer 2012) and LC3-associated endocytosis (Heckmann et al., 2019). Here, we used high-throughput bacterial-surface-display screening to broadly assess the LC3B-binding potential of peptides derived from the human proteome, and to define sequence determinants of LC3B recognition. We focused our analysis on a set of strongly enriched peptides (i.e., the HC-set), recognizing that we would miss many authentic but weakly interacting partners. Within the HC-set, we observed roughly threefold enrichment of peptides bearing traditional LIR sequences. Many peptides contained multiple LIR motifs, which may enhance apparent affinity through avid interactions with LC3B oligomerized through the tetravalent streptavidin linkage used in our assay, or through membrane conjugation in cells.

Overall, the vast majority (∼90%) of peptides in the HC-set contained a canonical LIR or a related sequence bearing hydrophobic or aromatic residues positioned to plausibly interact with hydrophobic pockets HP1 and HP2 on LC3B. We biochemically characterized one such motif [WFY]0-X1-X2-[WFY]3, which we termed “LIR+”, finding that it supported robust LC3B binding. Although a small number of LC3B-binding peptides bearing this LIR+ motif have been reported (∼1% of cataloged hAtg8-binding peptides) (Li et al., 2018; Chatzichristofi et al., 2023), their prevalence appears underappreciated given their relatively high frequency in the HC-set (Supplementary Figure 10). Further, the binding of LIR+ peptides observed here suggests that other motif variants, such as [ILV]0-X1-X2-[WFY]3 or [ILV]0-X1-X2-[ILV]3, may also engage LC3B and, consistent with this notion, such sequences were prevalent in our HC-set (Supplementary Figure 10).

The remaining peptides lacking any LIR-like motifs (∼10%) partitioned between those in which the two hydrophobic residues were separated by either 3 or 4 acidic amino acids (3 peptides), similar to the LC3B-binding “WDDEW” motif in SNX18 (Knævelsrud et al., 2013), those bearing the sequence “HPQ” (3 peptides), which is reported to bind to streptavidin (Weber et al., 1992; Katz 1995), and a set of peptides lacking linear motifs identifiable through expert-guided inspection or the automated motif discovery tool XSTREME (Grant and Bailey, 2021).

A sequence logo-based analysis of LIR-bearing peptides in our HC-set revealed enrichment of acidic residues, and biochemical measurements confirmed that acidic residues N-terminal to the core LIR enhance binding affinity without strict positional dependence. These observations complement and expand on previous studies of individual peptides derived from FYCO1 and p62/SQSTM1, for which LC3B binding is supported by N-terminal acidic residues (Pankiv et al., 2007). They are also aligned with a reported affinity-enhancing role for phosphorylated serine and threonine residues in the N-terminal flanking sequence of the core LIR motif (Richter et al., 2016; Wirth et al., 2021; Kliche et al., 2023). Indeed, given the impact of N-terminal acidic residues, it is plausible that phosphorylation of target proteins could dynamically modulate LIR•LC3B binding affinity, effectively linking kinase-dependent environmental signaling cascades to the regulation of autophagic cargo selection (Birgisdottir et al., 2013; Kliche and Ivarsson 2022; Rogov et al., 2023).

LIR databases catalog experimentally validated motifs (Chatzichristofi et al., 2023) and have been used to define a six-residue PSSM model for predicting binders to hAtg8 proteins and other Atg8 orthologs (Kalvari et al., 2014). Guided by sequence information from our screening data, we designed a 34-residue synthetic peptide with an affinity comparable to the strongest known interaction partners (Figure 3B, Supplementary Table 1). The high-affinity pCONSLIR peptide is characterized by an extended negatively charged N-terminal flanking sequence. Notably, the peptides exhibiting the strongest enrichment in our screen (i.e., z-score ≥ 2.3) and those described in the iLIR set of known binders shared this prominent enrichment of acidic N-terminal residues, as evidenced by these sequences co-clustering in UMAP space (Supplementary Figure 6D). However, peptides lacking these specific features can also bind LC3B, and others with similar sequence features do not bind LC3B, underscoring the complex relationship between peptide sequence and LC3B recognition that was not fully captured by the PSSM-based classifiers or by the random forest classifiers we deployed (Supplementary Figures 6-7).

Recently, the Vierstra group reported an alternate binding interface on Atg8 homologs (hAtg8) that they termed the Ubiquitin-interaction-motif Docking Site (UDS). This interface, which occurs on the face opposite that of the LIR docking site, is reported to interact with an alternative SLiM known as the Ubiquitin interacting motif (UIM), with the sequence Ψ-ζ-X-A-Ψ-X-X-S, where Ψ, ζ, and X denote small hydrophobic, hydrophilic, and any amino acids, respectively (Marshall et al., 2019). Peptides bearing this motif were present in our input library (2,452 peptides) yet only 4 were found in our HC-set, indicating that if an LC3B UDS exists, it was either inaccessible in our assay format or did not support sufficiently high-affinity binding.

Notably, we did not observe enrichment for UIM-bearing peptides even when screening in the context of LC3B constructs bearing mutations at the LIR docking site (LDS*). Instead, screening against LC3B LDS* revealed that peptides bearing canonical LIR sequences exhibited altered affinity for the mutant LC3B domain (Figure 5), consistent with these often-utilized mutations changing specificity but not ablating binding, contrary to what has been assumed (Behrends et al., 2010; Kraft et al., 2014; Qiu et al., 2017; Skytte Rasmussen et al., 2017; Marshall et al., 2019).

Together, the LC3B and LC3B LDS* bacterial display datasets presented here constitute a rich resource for the study of SLiMs and autophagy. These datasets reveal preferred residues flanking the canonical LIR motif, expand the repertoire of LC3B-binding sequences, and highlight important caveats in using the LC3B F52A L53A LDS* mutant to disrupt LDS-dependent interactions. Although peptides in our bacterial display system cannot fully capture the complexity of LC3B binding in cells, where SLiM accessibility, subcellular localization, and post-translational modifications may play key roles, our results provide a quantitative and structural framework for understanding LC3B recognition. Moreover, this work nominates new candidate LC3B partners for validation in cellular and physiological contexts and establishes a generalizable framework for high-throughput analysis of SLiM-peptide interactions with Atg8-family proteins.

Materials and methods

Expression and bacterial display vector construction

LC3B expression vectors

Coding sequences for LC3B and the LDS* mutant (F52A, L53A) were cloned into plasmid pDW363 (Tsao et al., 1996) downstream of an N-terminal Biotin Acceptor Peptide (BAP) and 6×His tag (GLNDIFEAQKIEWHE-DTGGSS-HHHHHH-GSGSG-[coding sequence]). The resulting plasmids (pl_JD239, pl_JD362) also expressed the BirA biotin ligase for in vivo biotinylation of the BAP-fused LC3B. For BLI experiments, the BAP tag was removed to yield 6×His-tagged, non-biotinylated variants pl_JD361 (LC3B wild-type) and pl_JD363 (LC3B LDS*) (M-GSS-HHHHHH-GSGSG-[coding sequence]). For crystallization of LC3B bound to the BLM552-571 peptide, DNA constructs were designed to fuse the sequence DNFDIDDFDDDDDWEDIM N-terminal to LC3B, separated by a two-residue GS linker, and cloned into a pGEX vector containing an N-terminal GST and 3C cleavage site using Gibson assembly (Gibson et al., 2009), resulting in plasmid pl_JD371.

Peptide expression vectors

The coding sequence of each peptide was synthesized either as a gene block (IDT) or was encoded on primers (Eton Bio), each used to insert this coding sequence into a modified pDW363 vector using Gibson (Gibson et al., 2009) or ‘round-the-horn’ insertional cloning (Liu and Naismith, 2008). The resulting products fused each peptide to the C-terminus of a BAP–6×His–SUMO tag via a short (SG)2 linker.

Bacterial display vectors

The human peptidome bacterial display library is described by Hwang et al., and the peptide-encoding DNA was originally provided by the Elledge laboratory based on the work of Larman et al. Briefly, each peptide is fused to the C-terminus of circularly permuted E. coli OmpX (eCPX) (Rice et al., 2006), with the peptide flanked by an N-terminal FLAG tag and a C-terminal cMyc tag. The resulting amino-acid sequence is:

Individual peptides analyzed as shown in Supplementary Figure 1 were cloned using Gibson assembly (Gibson et al., 2009) to replace the variable peptide sequence described above.

Protein expression and purification

Expression of LC3B proteins

LC3B proteins were expressed in E. coli Rosetta2(DE3) cells (Novagen) grown at 37 °C with aeration via shaking in 2×YT medium supplemented with ampicillin (100 µg/mL) and D-(+)-biotin (0.05 mM). When cultures reached OD600 ≈ 0.8, protein expression was induced by addition of IPTG (1 mM), and cells were shifted to 25 °C and incubated overnight with aeration before harvesting by centrifugation (5000×g, 15 minutes, 4 °C). Cell pellets were flash-frozen and stored at -80 °C until purification. The GST-3C-BLM552-571-GS-LC3B fusion protein was expressed as above using E. coli BL21(DE3) cells.

Expression of peptides

Peptide constructs were expressed in E. coli BL21(DE3) cells (New England Biolabs) cultured with aeration at 37 °C in 20 mL Luria Broth (LB) supplemented with ampicillin (100 µg/mL) and D-(+)-biotin (0.05 mM). Expression was induced at OD600 ≈ 0.8 using IPTG (1 mM) and, after 4-5 hours of expression at 37 °C, cells were harvested and frozen as above.

Purification of LC3B proteins

Cell pellets were resuspended in buffer NB (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole, 0.5 mM TCEP) supplemented with 0.2 mM phenylmethylsulfonyl fluoride (PMSF). Cells were lysed by 20 passes through a Dounce homogenizer followed by sonication on ice (4×2.5min cycles, each alternating 20 sec at 30% amplitude, 10 sec off). The lysate was clarified by centrifugation (15,000×g, 15 min, 4 °C) and ultracentrifugation in a 60Ti rotor (50,000 rpm, 1 hr). The supernatant was loaded onto a 5 mL Bio-Scale Mini Nuvia IMAC Ni-charged cartridge (BioRad 780-0811) equilibrated in buffer NB and eluted with a 0-100% linear gradient of buffer EB (20 mM Tris-HCl pH 7.5, 500 mM NaCl, 300 mM imidazole). Fractions containing LC3B, as identified by SDS-PAGE, were pooled, concentrated (3 kDa cutoff spin concentrator; Millipore), and further purified by size-exclusion chromatography on a Superdex 75 16/600 (Cytiva) column equilibrated with buffer GFB (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.5 mM TCEP, 10 % glycerol). Purified protein was concentrated to ∼1.3 mM, flash-frozen in liquid nitrogen, and stored at -80 °C.

Cell pellets bearing BLM552-571-GS-LC3B were resuspended in buffer GLB (20 mM Na2HPO4, 1.8 mM KH2PO4, 140 mM NaCl, 2.7 mM KCl, 0.2 mM PMSF; pH 7.3) and lysed and clarified as above. Supernatant was then loaded onto a GSTPrep FF 16/10 10 mL column (Cytiva) pre-equilibrated with buffer GLB lacking PMSF, washed, and eluted with buffer GEB (50 mM Tris-HCl pH 8.0, 150 mM NaCl, 30 mM reduced glutathione). Fractions were analyzed by SDS-PAGE, and those containing the correct molecular-weight species were pooled. Cleavage was performed overnight at 4 °C using a 1:100 (w/w) ratio of 3C protease (gift of Dr. Andrew V. Grassetti) to substrate. The cleavage product was then purified via size-exclusion chromatography as above, concentrated to ∼1.4 mM, flash-frozen in liquid nitrogen, and stored at -80 °C in 50 μL aliquots.

Purification of peptides for BLI assays

Cell pellets were thawed on ice and lysed using 4 mL B-PER reagent (ThermoFisher) per gram of cells, supplemented with 0.2 mM PMSF. Lysed cells (∼1 mL) were rotated at 25 °C for 15 min before pelleting by centrifugation at 15,000 rpm for 15 min. Supernatant was added to 250 μL Ni-Sepharose High Performance resin (Cytiva) pre-equilibrated with buffer NB in a gravity flow column (Bio-Rad). Resin was washed three times with 1 mL buffer NB before elution with 2 mL of buffer EB. Purified products were flash-frozen and stored at -80 °C.

FACS sample preparation

FACS samples were prepared as previously described (Hwang et al., 2022) with slight modifications detailed below.

Bacterial display sample preparation

Briefly, MC1061 E. coli cells (Casadaban and Cohen, 1980) were transformed either with a control plasmid by chemical transformation or with the human peptidome library by electroporation. Transformed cells were cultured overnight with aeration at 37 °C in LB (5 mL) supplemented with chloramphenicol (25 µg/mL) and glucose (0.2% w/v). The following morning, ∼1×10⁷ cells per sample to be analyzed (as estimated using OD600, assuming 5×10⁸ cells/mL/OD600) were pelleted (3,000×g, 5 min), and resuspended in Terrific Broth supplemented with chloramphenicol (25 µg/mL) to an OD600 ≈ 0.1. Cultures were grown with aeration at 37 °C until reaching an OD600 ≈ 0.5, at which point peptide expression was induced with 0.04 % (w/v) arabinose for 1.5 hr at 37 °C with aeration. Following induction, cells (∼1×10⁷ per sample) were centrifuged (3,000×g, 5 min), washed once with phosphate-buffered saline (PBS; 10 mM Na2HPO4, 1.8 mM KH2PO4, 2.7 mM KCl, 138 mM NaCl; pH 7.4) containing bovine serum albumin (BSA; 0.1% w/v), and resuspended in PBS + BSA (0.1 %) to a final cell density of 4×10⁸ cells/mL.

An anti-FLAG antibody conjugated to allophycocyanin (anti-FLAG-APC; PerkinElmer) was diluted 1:100 in PBS + BSA (0.1%), and 30 µL of solution was added per 1×10⁷ cells (25 µL/sample) for each sample to be analyzed. Samples were rotated at 4 °C for 30 min in foil-wrapped 1.5 mL tubes, pelleted (3,000×g, 5 min) and washed with 500 µL PBS + BSA (0.1 %) to remove unbound antibody, and resuspended in 25 µL PBS per sample to a final cell density of ∼4×10⁸ cells/mL.

LC3B sample preparation

Biotinylated BAP-6×His-LC3B proteins were bound to streptavidin-conjugated-phycoerythrin (SA-PE, ThermoFisher Scientific) at a 4.2:1 molar ratio in PBS+BSA (1.0 %) containing 4 mM dithiothreitol (DTT). Complexes were incubated in the dark with rotation for 15 min at 4 °C and prepared at 2× the desired final concentration. For all sorting experiments, LC3B-SA-PE was prepared to yield a final concentration of 1.68 µM monomeric LC3B (0.42 µM tetramerized LC3B•SA-PE).

FACS sample preparation

To prepare final samples, 25 µL of cells (1×10⁷) in PBS was mixed 1:1 with 25 µL of pre-tetramerized LC3B (2× stock concentration) for a final volume of 50 µL, and incubated with rotation in the dark at 4 °C for 1 hr. Note that for the initial library, 200 µL of cells (∼1×10⁸) were mixed 1:1 and treated as above. Samples were transferred to a 0.22 µm 96-well Multi-Screen HTS GV sterile filtration plate (Millipore) pre-washed with 500 µL PBS + BSA (0.5%) per well. Cell suspensions were vacuum filtered, washed with 200 µL PBS + BSA (0.1%) and resuspended in 400 µL PBS + BSA (0.1%) for control experiments shown in Supplementary Figure 1 or 4.2 mL PBS + BSA (0.1%) for library samples prior to FACS analysis or sorting.

Bacterial display sorting

Bacterial display libraries were analyzed and sorted on the BD FACSCanto II (analysis) and BD FACSAria 4 (sorting) instruments. To establish gating parameters for enrichment of LC3B-binding peptides, two positive controls (FYCO1(1264-1299) and ATG4B(372-407)) and a negative control (lacking the variable peptide segment) were analyzed. The selection gate was set to maximize recovery of positive cells while maintaining ≤ 0.4 % background from the negative population. Following fluorescence compensation to correct for spectral overlap between channels, the gate was held constant throughout all positive enrichment sorts.

In the initial sort, ∼8.6×10⁷ cells were sorted, sufficient to oversample the input library by ∼200-fold. Sorted cells were collected into Super Optimal broth with Catabolite repression (SOC) and incubated for 1 hr at 37 °C with rotation before being transferred to 8 mL LB medium containing chloramphenicol (25 μg/mL) and glucose (0.2 % w/v). Cultures were grown at 37 °C with aeration until OD600 ≈ 0.4–0.8. Plasmid DNA was purified from these cells using a Monarch Miniprep Kit (New England Biolabs).

For each subsequent round of sorting, E. coli MC1061 cells were transformed via electroporation with 50 ng of the purified plasmid library DNA and analyzed as above, with sufficient cells collected to oversample the binding pool by ∼100-fold. For the second round (sort 2), which served as a counter-selection against nonspecific peptide binding to the assay components, D-(+)-biotin was incubated with SA-PE at a 4.2:1 molar ratio as described above, and a gate was selected that minimized the capture of positive-control peptides (Supplementary Figure 1).

LC3B LDS* samples were prepared in a similar manner as wild-type LC3B. To enable direct comparison, the binding pool from sort 5 was split and sorted in parallel on the same day against wild-type LC3B (sort 6) and LC3B LDS* (LDS* 1). Both binding and nonbinding populations from each experiment were collected for sequencing. Two additional LC3B LDS* enrichment sorting rounds were then performed to further enrich for LDS*-binding peptides, and both the binding and nonbinding populations were collected and miniprepped to isolate the plasmid DNA.

Next-generation sequencing (NGS) sample preparation

Miniprepped plasmid libraries from each round of sorting were assigned a unique 6-nucleotide index. The variable region of each library was PCR-amplified using a forward primer (ol_JD806) bearing the 5′ Illumina adaptor sequence (AATGATACGGCGACCACCGAGATCTACAC) and a sequence (GTGGCTCGGGAATTCCGCTGCGC) designed to anneal immediately 5′ of the variable peptide region of our bacterial display vector, and a reverse primer (ol_JD807-813) designed to anneal immediately 3′ of the variable peptide region that also contained one of the fourteen 6-nucleotide index sequences and a 3′ Illumina adaptor sequence. The amplicon preparation scheme and all primers are listed in Supplementary File 1.

Eleven cycles of amplification were performed using Phusion polymerase (NEB) with 1.25 µM primers under the following thermocycling conditions: initial denaturation at 98 °C for 45 s; denaturation at 98 °C for 15 s; annealing at 68 °C for 30 s; and extension at 72 °C for 60 s, followed by a final extension at 72 °C for 5 min. To minimize heteroduplex formation, two additional cycles of reconditioning PCR were performed with 1 µM primers using the same thermocycling conditions. PCR products were purified using a double-sided AMPure XP (Beckman Coulter) bead size selection (0.85X/0.5X) to exclude large amplicons and small adapter or primer dimers, retaining fragments within the desired library size range for Illumina sequencing. Samples were eluted in 11 μL of buffer NGS (10 mM Tris-HCl, pH 8.0).

DNA amplicon library purity, size, and quantity were assessed using a Bioanalyzer (Agilent Technologies). Indexed samples were pooled to generate a multiplexed library bearing samples from all wild-type LC3B and LC3B LDS* sorts, which was submitted to the MIT BioMicro Center for sequencing on a NextSeq500 (Illumina) platform using 150 bp paired-end reads.

NGS data processing and analysis

Demultiplexed sequencing data were merged with BBMerge (Bushnell et al., 2017) using an average quality score threshold of 20. Merged reads were used to quantify the number of clonal cells (i.e., cells displaying the same peptide) recovered across consecutive sorts. The frequency of each clone $i$ collected in sort $x$, denoted $f_{i,x}$, was calculated as $f_{i,x} = c_{i,x} / N_x$, where $c_{i,x}$ is the number of sequencing reads mapped to sequence $i$ in sort $x$ and $N_x$ is the total number of sequencing reads mapped in sort $x$, i.e., $N_x = \sum_i c_{i,x}$. The enrichment ratio (ER) for each clone across the sorting trajectory was then calculated as $\mathrm{ER}_{i,x} = \log_2(f_{i,x} / f_{i,\mathrm{input}})$, where $f_{i,\mathrm{input}}$ represents the frequency of the same sequence in the naïve input library (Rubin et al., 2017).
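The frequency and enrichment-ratio definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the processing scripts used in this study; the function names and the toy read counts are hypothetical, and sequences absent from either sample are simply skipped (an assumption).

```python
import math

def frequencies(counts):
    """Convert raw read counts {sequence: reads} into frequencies f = c / N."""
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()}

def enrichment_ratios(sort_counts, input_counts):
    """Return log2(f_sort / f_input) for sequences with reads in both samples."""
    f_sort = frequencies(sort_counts)
    f_input = frequencies(input_counts)
    return {
        seq: math.log2(f_sort[seq] / f_input[seq])
        for seq in f_sort
        if f_sort[seq] > 0 and f_input.get(seq, 0) > 0
    }

# Toy read counts (hypothetical, for illustration only)
input_counts = {"pepA": 100, "pepB": 100, "pepC": 200}
sort_counts = {"pepA": 300, "pepB": 50, "pepC": 50}
er = enrichment_ratios(sort_counts, input_counts)
```

In this toy example pepA rises from a frequency of 0.25 to 0.75 and therefore has a positive ER, whereas pepB and pepC are depleted and have negative ER values.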

The T7-pep library contains many apparent point mutations and frameshifts. To group related variants, sequences were clustered using ALFAT-Clust (Chiu and Ong, 2022) with default parameters. Within each cluster, the variant that persisted to the latest sort was selected as the representative sequence; when multiple variants persisted, the sequence with the highest read count in that sort was chosen. Clones with fewer than 10 reads in the input library were excluded from all analyses. Sequences were translated using Biopython (Cock et al., 2009) and mapped to the human proteome using NCBI BLASTP (Camacho et al., 2009).
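The representative-selection rule described above can be illustrated as follows. This is a hedged sketch: the data layout (a list of read counts per variant, input first, then each sort), the function names, and the interpretation of "persisted" as "has at least one read" are assumptions made for the example, and the clustering step itself (ALFAT-Clust) is not reproduced.

```python
def last_sort_seen(counts):
    """Index of the latest sample with at least one read (0 = input library)."""
    seen = [i for i, c in enumerate(counts) if c > 0]
    return max(seen) if seen else -1

def pick_representative(cluster):
    """cluster: {sequence: [reads_input, reads_sort1, ..., reads_sortN]}.

    Choose the variant that persists to the latest sort; break ties by the
    highest read count in that sort.
    """
    latest = max(last_sort_seen(c) for c in cluster.values())
    survivors = {s: c for s, c in cluster.items() if last_sort_seen(c) == latest}
    return max(survivors, key=lambda s: survivors[s][latest])

def passes_input_filter(counts, min_reads=10):
    """Keep only clones with at least `min_reads` reads in the input library."""
    return counts[0] >= min_reads

# Toy cluster of related variants (hypothetical sequences and counts)
cluster = {
    "AAA": [50, 20, 5, 0],   # lost after sort 2
    "AAB": [30, 10, 8, 3],   # persists to sort 3
    "AAC": [12, 4, 2, 7],    # persists to sort 3 with more reads
}
representative = pick_representative(cluster)
```

Here two variants persist to the final sort, so the tie is resolved by read count in that sort.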

All read count data for each sequence in each sort are available at the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA1276872. Supplementary File 2 contains the pre-collapsed data, and Supplementary File 3 contains the post-collapsed data used in this study, including amino acid sequences, raw counts, ER values, BLAST results, and z-scores. Supplementary File 4 is filtered for peptides that enriched through sort 6 and includes AlphaFold2 (Jumper et al., 2021) pLDDT scores estimating structural disorder within each peptide region.

LC3B and LC3B LDS* scoring metrics

Following NGS data processing described above, 487,021 unique peptide sequences were identified in the input library, with 12,158 remaining in the terminal LC3B sort 6. Among these, clones were placed in the HC-set if the mean z-score across sorts 4-6 was ≥ 1.70. The z-score for clone $i$ in sort $x$ was defined as $z_{i,x} = (\mathrm{ER}_{i,x} - \mu_x) / \sigma_x$, where $\mathrm{ER}_{i,x}$ is the enrichment ratio of peptide $i$ in sort $x$ and $\mu_x$ and $\sigma_x$ are the mean and standard deviation of the ER values for all peptides in that sort. The threshold of 1.70 was chosen based on BLI affinity measurements to minimize false positives (Figure 2C). Z-scores for all clones that reached sort 6 are provided (Supplementary File 4).
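A minimal sketch of the z-score calculation and HC-set assignment is shown below. The function names and toy values are hypothetical, and two details are assumptions not stated in the text: the population (rather than sample) standard deviation is used, and a clone must have a z-score in every one of sorts 4-6 to be considered.

```python
from statistics import mean, pstdev

def z_scores(er_by_clone):
    """z = (ER - mean) / sd over all clones in one sort (population sd assumed)."""
    values = list(er_by_clone.values())
    mu, sigma = mean(values), pstdev(values)
    return {clone: (er - mu) / sigma for clone, er in er_by_clone.items()}

def hc_set(z_by_sort, sorts=(4, 5, 6), threshold=1.70):
    """Clones whose mean z-score across the given sorts meets the threshold."""
    shared = set.intersection(*(set(z_by_sort[s]) for s in sorts))
    return {
        clone for clone in shared
        if mean(z_by_sort[s][clone] for s in sorts) >= threshold
    }

# Toy data (hypothetical)
z_example = z_scores({"a": 1.0, "b": 2.0, "c": 3.0})
z_by_sort = {
    4: {"p1": 1.5, "p2": 0.2},
    5: {"p1": 1.8, "p2": 0.5},
    6: {"p1": 2.0, "p2": 1.9},
}
high_confidence = hc_set(z_by_sort)
```

In the toy data only p1 qualifies: a single high z-score in sort 6 is not enough for p2 because the criterion is the mean across the three sorts.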

To evaluate peptide behavior in the LC3B LDS* sorts, three complementary metrics were applied. First, enrichment trajectories were examined to identify clones that had enriched through the canonical LC3B sort 5, and that continued to enrich through the three LDS* sorts despite disruption of the canonical LIR-docking site. Second, the enrichment ratio was calculated using sort 5 as the input library to compare peptide enrichment between the first LDS* sort ($\mathrm{ER}^{\mathrm{LDS*}}_{i,1} = f_{i,\mathrm{LDS*\,1}} / f_{i,\mathrm{sort\,5}}$) and the canonical sort 6 ($\mathrm{ER}_{i,6} = f_{i,\mathrm{sort\,6}} / f_{i,\mathrm{sort\,5}}$). Promising clones were those that were enriched in the LDS* condition and depleted in the canonical condition ($\mathrm{ER}^{\mathrm{LDS*}}_{i,1} / \mathrm{ER}_{i,6} > 1$), consistent with binding in the absence of the wild-type LDS pocket. Finally, the ratio of clone frequencies in the binding versus non-binding gates (B/NB) was calculated for each LDS* sort and for the canonical LC3B sort 6. Clones showing progressive increase in the B/NB ratio across the LDS* sorts were considered selectively enriched for binding to LC3B LDS*. Top-scoring clones across these metrics were manually selected and tested for binding to LC3B and LC3B LDS* by BLI, as detailed below. Peptides that bound LC3B LDS* scored highly across all three metrics. All LDS* enrichment data and associated scoring metrics are provided (Supplementary File 5).
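The second and third metrics can be expressed compactly, as in the following sketch. Function names and numerical values are hypothetical; the code follows the ratios as written above (simple frequency ratios relative to sort 5) and treats "progressive increase" as a strictly increasing B/NB series, which is an assumption.

```python
def lds_vs_wt_ratio(freq_sort5, freq_lds1, freq_sort6):
    """Ratio of LDS* enrichment to wild-type enrichment, both relative to sort 5."""
    er_lds = freq_lds1 / freq_sort5   # enrichment in the first LDS* sort
    er_wt = freq_sort6 / freq_sort5   # enrichment in canonical sort 6
    return er_lds / er_wt

def bnb_increasing(bnb_ratios):
    """True if the binding/non-binding ratio rises at every successive sort."""
    return all(b > a for a, b in zip(bnb_ratios, bnb_ratios[1:]))

# Toy clone (hypothetical frequencies)
ratio = lds_vs_wt_ratio(freq_sort5=0.01, freq_lds1=0.03, freq_sort6=0.005)
rising = bnb_increasing([0.8, 1.5, 3.2])
not_rising = bnb_increasing([1.0, 0.9, 2.0])
```

Because both enrichment ratios share the sort 5 frequency as their denominator, the combined ratio reduces to the clone's frequency in LDS* sort 1 divided by its frequency in sort 6.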

Identification of reported interaction partners of LC3B

To assemble a set of high-confidence LC3B-binding peptides and their corresponding human proteins, the LIRcentral database (Chatzichristofi et al., 2023) was used to extract entries (n=115) experimentally verified to bind one of the hAtg8 proteins. Of these, 27 were reported to bind LC3B, with 10 such peptides reported to bind with KD ≤ 50 µM being present in our input library. The six peptides from this subset that appeared in our HC-set are indicated in the “LIR row” of Figure 2A.

In parallel, human proteins reported to co-immunoprecipitate with LC3B (n=550) were retrieved from the BioGrid database (Stark et al., 2006), and peptides related to those proteins that were present in our HC-set are annotated in the “IP row” in Figure 2A. To assess whether candidates in the HC-set may have biologically relevant interactions with LC3B, Gene Ontology (GO) terms for proteins corresponding to each peptide present in sort 6 were extracted from UniProt (The UniProt Consortium, 2025) and cross-referenced with GO terms associated with LC3B. Those proteins sharing a GO term with LC3B are noted in the GO row of Figure 2A, with the GO terms shown in Supplementary Figure 4.

Biolayer interferometry (BLI)

BLI experiments were performed on an Octet Red96 instrument (ForteBio). Purified, biotinylated SUMO-fused peptides were immobilized on Octet SA Biosensors (Sartorius) and loaded until a response level of at least 0.3 nm was reached. The ligand-loaded tips were then incubated with increasing concentrations of target protein (e.g., LC3B) in buffer BLI (25 mM Tris, pH 7.5, 150 mM NaCl, 0.5 mM TCEP, 0.05 % Tween-20, 0.5 mg/mL BSA) in 200 μL reactions conducted in 96-well flat-bottom polypropylene microplates (Greiner, #655209). Each measurement was repeated 2 – 4 times at 25 °C at an agitation speed of 1000 RPM. Equilibrium dissociation constants (KD) for the measured interactions were determined by plotting the equilibrated signal, after subtracting the negative control (biotinylated SUMO-only), as a function of protein concentration. The data were fit by nonlinear regression to a one-site binding model, according to the equation:
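As an illustration of such a fit, the sketch below assumes the standard one-site form, response = Rmax·[P]/(KD + [P]); the exact parameterization used in this work may differ (for example, by including an offset term). It is not the fitting procedure used here: to stay dependency-free it scans KD on a logarithmic grid and solves for Rmax in closed form at each grid point, whereas a nonlinear least-squares routine would normally be used. All names and data are hypothetical.

```python
def one_site(conc, rmax, kd):
    """Equilibrium response for a one-site binding model (assumed form)."""
    return rmax * conc / (kd + conc)

def fit_one_site(concs, responses, log10_kd_range=(-3.0, 3.0), steps=6000):
    """Least-squares fit of (KD, Rmax) by scanning KD on a log10 grid.

    For a fixed KD the model is linear in Rmax, so the best Rmax is
    sum(y * g) / sum(g * g) with g = c / (KD + c).
    """
    lo, hi = log10_kd_range
    best = None
    for k in range(steps + 1):
        kd = 10 ** (lo + (hi - lo) * k / steps)
        g = [c / (kd + c) for c in concs]
        rmax = sum(y * x for y, x in zip(responses, g)) / sum(x * x for x in g)
        sse = sum((y - rmax * x) ** 2 for y, x in zip(responses, g))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]

# Synthetic, noise-free data (hypothetical): KD = 5 and Rmax = 1.2, arbitrary units
concs = [0.5, 1, 2, 5, 10, 20, 50]
responses = [one_site(c, 1.2, 5.0) for c in concs]
kd_fit, rmax_fit = fit_one_site(concs, responses)
```

KD is returned in the same units as the concentrations supplied, and the grid spacing (0.001 log10 units here) sets the resolution of the estimate.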

Structure determination by X-ray crystallography

Crystals of the purified BLM552-571-GS-LC3B fusion protein were grown in hanging drops containing buffer BXC (0.1 M HEPES, pH 7.3, 30 % w/v PEG-3350, and 0.32 M MgCl2) by mixing 1 μL of protein with an equal volume of reservoir solution; crystals appeared within 7 days. X-ray diffraction data were collected on a Rigaku Micromax-007 rotating anode equipped with Osmic VariMax-HF mirrors and a Rigaku Saturn 944 detector. Diffraction data were processed with the XDS suite (Kabsch, 2010).

The structure was solved by molecular replacement in Phaser (McCoy et al., 2007) using LC3B (pdb 3vtu) (Rogov et al., 2013) as the search model. The molecular replacement solution was refined in PHENIX (Liebschner et al., 2019) with manual fitting in Coot (Emsley et al., 2010), and further geometry optimization in Rosetta (DiMaio et al., 2013). The model was refined to Rwork/Rfree = 0.216/0.262. The X-ray data collection and refinement statistics are summarized in Supplementary Table 4, and the coordinates have been deposited in the Protein Data Bank (pdb 9pe3). Ensemble refinement as implemented in PHENIX (Burnley et al., 2012) was carried out using pTLS values (fraction of atoms included in TLS fitting) ranging from 1.0 to 0.5, with 0.6 yielding the lowest Rfree value.

Sequence logo-based analysis

We used pLogo (O’Shea et al., 2013) to visualize the enrichment of residues in peptide binders, compared to the input library background, and designed a series of consensus-sequence peptides based on this analysis. pLogo estimates the statistical significance of residue enrichments using foreground and background distributions. To define the background, we identified input-library sequences matching the LIR motif and included 15 residues on either side of the motif. For peptides that lacked 15 flanking residues around the LIR, we appended the N- or C-terminal flanking sequence that was present in the experimental display construct. To define a set of strongly enriching binders for the foreground, peptides that reached sort 6 were clustered (k=10), based on their enrichment profiles, using Clust (Abu-Jamous and Kelly, 2018). The sequences in the highest-enriching cluster were used as the foreground, using the LIR plus 15 flanking residues on either side, as above. The foreground and background sequences were aligned on the LIR and used as inputs to pLogo. We repeated this analysis four times, for the canonical [FWY]0-X1-X2-[LVI]3 LIR motif and for LIR motifs with position X0 restricted to F, W, or Y, generating the images in Supplementary Figure 5. For consensus peptide design, the residue with the highest log-odds of the binomial probability at each position of the corresponding 34-residue logo was selected, generating peptides pCONSLIR, pCONSF, pCONSW, and pCONSY, which are listed in Supplementary Table 3.
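The motif-centered windows used as pLogo inputs can be sketched as follows. This is an illustrative reconstruction, not the script used in this study: the function name, the `n_pad` and `c_pad` arguments (standing in for display-construct sequence that is not given here), and the use of "-" to fill any remaining gap are all assumptions. The example peptide is made up.

```python
import re

# Canonical LIR core, [FWY]-X-X-[LVI]; the lookahead also finds overlapping matches.
LIR = re.compile(r"(?=([FWY]..[LVI]))")

def lir_windows(peptide, flank=15, n_pad="", c_pad=""):
    """Return (motif_start, window) for every LIR match in `peptide`.

    Each window holds `flank` residues on either side of the 4-residue core,
    so all windows are aligned on the LIR. Construct sequence (n_pad, c_pad)
    is used when the peptide is too short; leftover gaps are filled with "-".
    `flank` is assumed to be at least 1.
    """
    windows = []
    for match in LIR.finditer(peptide):
        start = match.start(1)
        end = start + 4
        left = (n_pad + peptide[:start])[-flank:].rjust(flank, "-")
        right = (peptide[end:] + c_pad)[:flank].ljust(flank, "-")
        windows.append((start, left + peptide[start:end] + right))
    return windows

# Hypothetical peptide with a single FEEL core
example = "AAAADDDFEELGGGG"
short_windows = lir_windows(example, flank=3)
full_windows = lir_windows(example)
```

With the default 15-residue flanks each window is 34 residues long, matching the length of the logos described above.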

LIR sequence sets for PSSM and random-forest model building

To build and assess models for classifying LC3B-binding and non-binding peptides, five different sequence sets were defined, each matching the 7-residue motif X-3X-2X-1[FWY]0X1X2[FWYILV]3: screen_binder and screen_nonbinder from our screening data, test_binder and test_nonbinder sequences from LIRcentral (test sets) (Chatzichristofi et al., 2023), and iLIR_binder sequences from iLIR (Kalvari et al., 2014). To define the iLIR_binder set, LIR-containing sequences were downloaded from the iLIR webserver. Because the iLIR motifs are only 6 residues long, we extended the 6-mers to 7-mers using full-length sequences from the UniProt reference proteome (The UniProt Consortium, 2025). To define the test_binder set, we extracted 76 motif-matching 7-mer sequences annotated to bind LC3B in LIRcentral and that were not present in the iLIR_binder set, then added three peptides from our screen that we measured to bind LC3B with KD ≤ 14 µM using BLI (Supplementary Table 1). To define the test_nonbinder set, we used 49 motif-matching 7-mers reported by LIRcentral to not bind LC3B (also absent from the iLIR_binder set), then added seven peptides from our screen with no detectable binding to LC3B by BLI (Supplementary Table 2). In total, the test set contained 79 binder peptides and 56 non-binder peptides.

To define the screen_binder and screen_nonbinder sets, we selected input-library sequences bearing exactly one LIR motif (as above), appended “PLR” to the N-terminus, which was present in the experimental display construct, extracted 7-mers centered on the LIR motif, and removed any sequences also present in the iLIR_binder, test_binder, or test_nonbinder sets. Remaining 7-mers in the HC-set formed the screen_binder set; those that were depleted in sorts 1 and 3 and then dropped out of the screen during any subsequent sort formed the screen_nonbinder set. Note that any duplicate 7-mers found within or between the screen_binder and screen_nonbinder sets were removed from both sets. The final sequence set from the screening data consisted of 151 screen_binder sequences and 2,844 screen_nonbinder sequences. For modelling, we additionally used ranges of z-scores to define screen_binder subsets: screen_binder_all (z-score ≥ 1.7, 151 sequences), screen_binder_low (1.7 ≤ z-score < 2.3, 101 sequences), and screen_binder_high (z-score ≥ 2.3, 50 sequences). Logos for the screening sets are shown in Supplementary Figure 6C.
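The 7-mer extraction and z-score binning described above might look like the sketch below. The function names and example peptides are hypothetical. One detail is an assumption: motifs are counted on the peptide before the "PLR" prefix is added, which gives the same count as counting afterwards because the prefix contains no F, W, or Y and so cannot start a motif.

```python
import re

# Core of the 7-residue motif: [FWY]-X-X-[FWYILV], preceded by three flanking residues.
CORE = re.compile(r"(?=([FWY]..[FWYILV]))")

def extract_7mer(peptide, n_prefix="PLR"):
    """Return X(-3)..X(3) around the core if exactly one motif is present, else None."""
    hits = [m.start(1) for m in CORE.finditer(peptide)]
    if len(hits) != 1:
        return None
    padded = n_prefix + peptide          # display-construct residues precede the peptide
    start = hits[0] + len(n_prefix)      # position 0 of the motif in the padded string
    return padded[start - 3:start + 4]

def z_subset(z):
    """Assign a binder subset label from the z-score ranges given in the text."""
    if z >= 2.3:
        return "screen_binder_high"
    if z >= 1.7:
        return "screen_binder_low"
    return None

# Hypothetical peptides
at_terminus = extract_7mer("FEELAAAA")     # motif at the N-terminus borrows "PLR"
internal = extract_7mer("GGDDWTHLGG")      # motif with native flanking residues
two_motifs = extract_7mer("AAFEELWAAL")    # two motifs, so the peptide is skipped
```

The first example shows why the prefix matters: without it, a motif within three residues of the peptide N-terminus would not yield a full 7-mer.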

For the model building and evaluation, screening data were repeatedly (100×) split into train/test partitions. Each iteration selected 7-mers at random without replacement to form test sets, with 15 from the screen_binder_high test subset, and 30 from the screen_binder_low test subset; we designated the combined 45 sequences as the “all” screen_binder test set. An equal number of nonbinder sequences from the screen_nonbinder set was selected to complete the test sets. For models built from the screening data, the sequences that were not selected for the test sets were used to train models or build PSSMs. Thus, for each train-test iteration, three completely non-overlapping test and training sets were defined, which were used to train and evaluate models.
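One way to implement the repeated partitioning is sketched below using placeholder sequence identifiers in place of real 7-mers. The function name, the seed, and the data are hypothetical. The text does not specify how nonbinders were matched to each of the three test sets; this sketch assumes that 45 nonbinders are drawn per iteration to pair with the combined "all" test set.

```python
import random

def split_once(high, low, nonbinders, rng, n_high=15, n_low=30):
    """Draw one test partition without replacement; the remainder is for training."""
    test_high = rng.sample(high, n_high)
    test_low = rng.sample(low, n_low)
    test_non = rng.sample(nonbinders, n_high + n_low)
    held_out = set(test_high) | set(test_low) | set(test_non)
    train = {
        "binders": [s for s in high + low if s not in held_out],
        "nonbinders": [s for s in nonbinders if s not in held_out],
    }
    test = {"high": test_high, "low": test_low, "nonbinders": test_non}
    return train, test

# Placeholder identifiers sized like the sets in the text (50 / 101 / 2,844)
high = [f"H{i}" for i in range(50)]
low = [f"L{i}" for i in range(101)]
nonbinders = [f"N{i}" for i in range(2844)]

rng = random.Random(0)  # fixed seed so the partitions are reproducible
splits = [split_once(high, low, nonbinders, rng) for _ in range(100)]
```

Each iteration leaves 106 binders and 2,799 nonbinders for training, none of which appear in that iteration's test sets.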

PSSM classifier model construction

PSSM classifier models were generated using a 7-residue window spanning the three N-terminal flanking residues and the core LIR motif (i.e., X-3X-2X-1[FWY]0X1X2[FWYILV]3). Each model used one of the binder sets (defined above) as the foreground and the amino-acid frequencies from the Homo sapiens reference proteome (UP000005640) as the background distribution. Position-specific scores were calculated as log-odds values:

$$w_{i,c} = \log\left(\frac{p_{i,c}}{b_c}\right)$$

where $w_{i,c}$ is the position-specific score for residue $c$ at position $i$, $p_{i,c}$ is the frequency of residue $c$ at position $i$ in the set of foreground sequences, and $b_c$ is the background frequency of residue $c$ in the proteome. The total PSSM score for a sequence was the sum of position-specific values across the 7-residue window. An analogous model (iLIR27) was generated using the 27 binders that formed the basis of the published iLIR model (Kalvari et al., 2014) as the foreground sequences.
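A compact sketch of log-odds PSSM construction and scoring is given below. It is not the implementation used in this study, and several choices are assumptions made for the example: base-2 logarithms, a pseudocount of 1 per residue (so that residues absent from the foreground do not give log(0)), a uniform background in place of the proteome frequencies, and a made-up foreground of three 7-mers.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(foreground, background, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per position in the window."""
    length = len(foreground[0])
    total = len(foreground) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = Counter(seq[i] for seq in foreground)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm

def pssm_score(seq, pssm):
    """Total score: the sum of position-specific values across the window."""
    return sum(pssm[i][aa] for i, aa in enumerate(seq))

# Hypothetical inputs: uniform background and a tiny acidic foreground
background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
foreground = ["DDDWEEL", "EDDFEEL", "DEDWEEV"]
pssm = build_pssm(foreground, background)

acidic_score = pssm_score("DDDWEEL", pssm)
basic_score = pssm_score("KKKWKKL", pssm)
```

Because the toy foreground is rich in D and E at the flanking positions, the acidic 7-mer scores higher than the lysine-substituted one even though both share the same W and L anchor residues.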

Random-forest classifier model construction

Random-forest and balanced random-forest classifiers were trained using RandomForestClassifier from scikit-learn (Pedregosa et al., 2011) and BalancedRandomForestClassifier from imbalanced-learn (Lemaître et al., 2017). Unlike the PSSM models described above, these models have the potential to capture higher-order effects, including coupling between sites. The training set consisted of the 2,844 nonbinder and 151 binder sequences defined above. We leveraged the PSSM matrix detailed above to encode each sequence. A PSSM-encoded sequence (X) is generated by replacing the one-letter amino acid code in a sequence with its corresponding PSSM score in the matrix. Mathematically, the ith encoded position of a sequence ($X_i$) is:

$$X_i = \mathrm{PSSM}_{i,j}$$

where $i$ denotes the ith position in an encoded peptide sequence and $j$ represents the column in the PSSM matrix that corresponds to the amino acid appearing at the ith position in the sequence. To handle data imbalances between screening binders and nonbinders, a downsampling technique was used to randomly select nonbinders to match the total number of binders. Cross-validation (5-fold) was applied to select the best hyperparameters from the parameter space for random forest (n_estimators: 4-200, criterion: [gini, entropy], ccp_alpha: 0.004-0.014) and balanced random forest models (n_estimators: 4-200, criterion: [gini, entropy], ccp_alpha: 0.004-0.014, sampling_strategy: all, replacement: True, bootstrap: False). The best model was selected based on the mean ROC-AUC and retrained with all training data.
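The encoding and downsampling steps that precede model training can be sketched as below. The toy three-position PSSM, the sequences, the seed, and the function names are hypothetical; the resulting feature vectors and labels are the kind of input that would then be passed to a classifier such as scikit-learn's RandomForestClassifier, which is not called here.

```python
import random

def encode(seq, pssm):
    """Replace each residue with its PSSM score at that position."""
    return [pssm[i][aa] for i, aa in enumerate(seq)]

def downsample(binders, nonbinders, rng):
    """Randomly keep as many nonbinders as there are binders."""
    return binders, rng.sample(nonbinders, len(binders))

# Toy PSSM: one {residue: score} dict per position (hypothetical values)
toy_pssm = [
    {"D": 1.0, "K": -1.0},
    {"W": 2.0, "A": -0.5},
    {"L": 1.5, "G": -1.5},
]

binders = ["DWL", "DWG", "KWL"]
nonbinders = ["KAG", "KAL", "DAG", "KWG", "DAL", "KAG", "DAG", "KAL"]

rng = random.Random(0)  # fixed seed for a reproducible subsample
kept_binders, kept_nonbinders = downsample(binders, nonbinders, rng)

features = [encode(s, toy_pssm) for s in kept_binders + kept_nonbinders]
labels = [1] * len(kept_binders) + [0] * len(kept_nonbinders)
```

Each peptide becomes a fixed-length numeric vector, one value per window position, so a tree-based model can split on individual positions and combine them.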

Acknowledgements

We thank A. Ghanbarpour for help with crystallographic structure refinement. We thank L. Kinman, D. Cui, and B. Powell for helpful discussions. Research reported in this publication was supported by the National Institutes of Health under Award Numbers R35GM149227 (AK), R01GM144542 (JHD), R00AG050749 (JHD) and the Smith Family Odyssey Award (JHD).

Additional information

Data availability

Datasets generated during this study have been deposited in the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA). Raw reads can be found under BioProject ID PRJNA1276872. The structure of LC3B bound to BLM552-571 is deposited in the Protein Data Bank with accession number 9p3e. Supplementary Files 1-5 are available at Zenodo (10.5281/zenodo.17943925). Python processing scripts are available upon request.

Author contributions

J.E.K., J.H.D., and A.E.K. designed research; J.E.K. performed screening, data analysis, and biochemical experiments, and contributed new reagents and analytic tools; C.L., J.E.K., and J.C.H. developed and tested models; D.L. collected and refined crystallography data; J.E.K., J.H.D., and A.E.K. wrote the paper.

Funding

HHS | National Institutes of Health (NIH) (R35GM149227)

  • Amy E Keating

HHS | National Institutes of Health (NIH) (R01GM144542)

  • Joseph H Davis

HHS | National Institutes of Health (NIH) (R00AG050749)

  • Joseph H Davis

Smith Family (Odyssey Award)

  • Joseph H Davis

Additional files

Supplementary Information