Figures and data

Evolutionary ecology of insect OR repertoires.
(A). Phylogeny of 115 insect species (see Methods and Data S4 for details). Genome size and the number of OR genes for each species are shown as bar plots. Ecological parameters, including larval and adult diets, habitat, and circadian rhythm, are displayed as color-coded labels corresponding to the legend below. Different geological periods are separated by shaded intervals. The tree was visualized using iTOL (Letunic and Bork 2024). (B). Boxplot showing the number of OR genes for each insect order. (C). Relationship between genome size and OR gene number across 114 insect species. Each point represents one species. The gray shading indicates the 95% confidence interval. (D-G). Results of two-sided pGLS tests assessing the relationship between the number of intact OR genes and various ecological parameters: adult diet (D), larval diet (E), circadian rhythm (F), and habitat (G). Dots represent the OR count of each species; N indicates the number of species. Animal silhouettes were obtained from PhyloPic.org.
© 2014, Descouens and Keesey. The Rhyacophila dorsalis silhouette is reproduced from PhyloPic and available under a CC BY-SA 3.0 license.
© 2013, Stemonitis and Keesey. The Petrobius maritimus silhouette is reproduced from PhyloPic and available under a CC BY-SA 3.0 license.

Subfamily classification of ORs based on SSN.
(A). Schematic representation of the “trunk-branch” strategy pipeline for constructing the similarity network and classifying subfamilies. First, all ORs were subjected to greedy clustering based on sequence similarity to generate nodes. The average similarity between nodes was then calculated, and nodes were treated as units to identify core regions (trunks) using a strict similarity threshold. As the threshold was gradually relaxed, new core regions and additional nodes (branches) emerged. After a final merging step based on defined criteria, the optimal community structure was determined using modularity as an index (see Methods). (B). Distribution of pairwise sequence similarity among all ORs. The red line indicates the mean similarity. (C). SSN of ORs constructed using the “trunk-branch” strategy. Each node represents a group of ORs sharing >60% sequence identity. Edges connect node pairs whose average sequence alignment E-value meets a threshold of 1 × 10⁻²⁵. Node colors indicate SeqC assignments. (D). Heatmap depicting the number of ORs within different SeqCs across insect orders. Because OR numbers differ substantially among orders, the values are ln-transformed. The bar plot above the heatmap shows the number of SeqCs present in each order, and the bar plot on the right indicates the proportion of ORs within each SeqC generated by tandem duplication. Two ORs located within 5 kb of each other in the genome were defined as tandem duplicates. TD, tandem duplication.
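The tandem-duplication criterion in panel D (two ORs within 5 kb of each other) can be illustrated with a short Python sketch. It is not the authors' pipeline: the `Gene` record, the helper names, and the choice to measure distance as the intergenic gap between consecutive, non-overlapping ORs on the same scaffold are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical record: genomic coordinates of one OR gene (1-based, inclusive)."""
    name: str
    scaffold: str
    start: int
    end: int


def tandem_duplicates(genes: list[Gene], max_gap: int = 5_000) -> set[str]:
    """Return names of ORs lying within max_gap bp of a neighbouring OR.

    Assumption: distance is the intergenic gap between consecutive ORs on
    the same scaffold (next.start - prev.end - 1), not centre-to-centre.
    """
    by_scaffold: dict[str, list[Gene]] = defaultdict(list)
    for g in genes:
        by_scaffold[g.scaffold].append(g)

    tandem: set[str] = set()
    for scaffold_genes in by_scaffold.values():
        scaffold_genes.sort(key=lambda g: g.start)
        # Compare each OR only with the next OR along the scaffold.
        for prev, nxt in zip(scaffold_genes, scaffold_genes[1:]):
            gap = nxt.start - prev.end - 1
            if gap <= max_gap:
                tandem.update((prev.name, nxt.name))
    return tandem


def td_proportion(members: list[str], tandem: set[str]) -> float:
    """Fraction of a community's ORs that were flagged as tandem duplicates."""
    if not members:
        return 0.0
    return sum(m in tandem for m in members) / len(members)
```

Applied to each SeqC's member list, `td_proportion` yields a per-community value of the kind shown in the right-hand bar plot; how the original analysis handled overlapping gene models or scaffold ends is not stated in the legend.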

StSN and structural community diversity of ORs.
(A). Sequence similarity versus structural similarity of insect ORs, assessed by pairwise comparisons with Dali. Each dot represents a comparison between a pair of ORs, with color indicating point density. (B). StSN of ORs constructed using the “trunk-branch” strategy. Each node represents a group of ORs with pairwise Dali Z-scores >50. Edges connect nodes with an average pairwise Z-score of at least 42.9. Node colors denote StrC assignments. (C). Structural comparison of ORs and Orco. The structures were aligned using US-align, and multiple structural alignments were performed on annotated Orco and OR structures (visualized in PyMOL). Colors represent the pLDDT scores provided by AlphaFold2, indicating the confidence of structural modeling. (D). Visualization of differences among StrCs. The heatmap shows the number of ORs within each StrC across insect orders, with OR counts ln-transformed because of large variation among orders. The bar plot above the heatmap indicates the number of StrCs in each order. On the right, a ridge plot illustrates the flexibility rates of different structural regions across StrCs. Flexibility rate = 1 - conservation rate. Light gray represents transmembrane regions, white denotes loop regions, and dark gray marks the anchor domain of ORs. Colors correspond to StrCs, with gray indicating StrCs without interactions. (E). Two types of ORs in StrC17, long-IL3 ORs and general ORs, with the IL3 regions highlighted in color. (F). Phylogenetic analysis of StrC17 ORs based on amino acid sequences. Red circles indicate bootstrap values, and most ORs are collapsed according to bootstrap support. Gray branches represent the clade of long-IL3 ORs, and dN/dS values are shown in blue on the corresponding branches.
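The SSN and StSN legends both describe nodes formed by thresholded greedy clustering and edges defined by the average pairwise score between nodes. The Python sketch below covers only that node-and-edge step, using the StSN thresholds from this legend (node: Dali Z-score > 50; edge: average Z-score ≥ 42.9) as defaults. The representative-based greedy assignment, the function names, and the generic `score` callable are assumptions; the threshold relaxation, merging, and modularity optimization of the full trunk-branch strategy are not reproduced.

```python
from collections.abc import Callable, Hashable, Sequence
from itertools import combinations
from statistics import mean

# Hypothetical pairwise scorer: higher values mean more similar
# (e.g. a Dali Z-score). E-value-style scores would need the comparisons flipped.
Score = Callable[[Hashable, Hashable], float]


def greedy_nodes(items: Sequence[Hashable], score: Score,
                 node_cut: float = 50.0) -> list[list[Hashable]]:
    """Greedily group items into nodes.

    Each item joins the first existing node whose representative (its first
    member) scores strictly above node_cut against it; otherwise it seeds a
    new node. Input order therefore matters, as in any greedy clustering.
    """
    nodes: list[list[Hashable]] = []
    for item in items:
        for node in nodes:
            if score(node[0], item) > node_cut:
                node.append(item)
                break
        else:
            nodes.append([item])
    return nodes


def node_edges(nodes: list[list[Hashable]], score: Score,
               edge_cut: float = 42.9) -> list[tuple[int, int, float]]:
    """Link two nodes when the mean score over all cross-node member pairs is >= edge_cut."""
    edges = []
    for i, j in combinations(range(len(nodes)), 2):
        avg = mean(score(a, b) for a in nodes[i] for b in nodes[j])
        if avg >= edge_cut:
            edges.append((i, j, avg))
    return edges
```

Averaging over all cross-node member pairs, rather than comparing representatives only, makes an edge reflect the overall resemblance of two nodes, which is how the legend phrases the edge criterion.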

Drivers promoting the early origin of Orco.
(A). Differences in EL2, IL2, and binding pocket properties between M. hrabei and T. domestica ORs. Only ORs with closed binding pockets were analyzed, and all scores were normalized. Circles and triangles represent the minimum and maximum scores after column normalization, respectively. A check mark indicates the presence of an EL2 β-sheet. (B). Differences in amino acid properties of the EL2 region between ORs and Orco. Hydrophilic amino acids are shown in blue, neutral amino acids in green, and hydrophobic amino acids in black (visualized with WebLogo (Crooks et al. 2004)). Numbers on the x-axis indicate amino acid positions in the multiple sequence alignment, with highly gapped positions removed. (C). MD simulation-based analysis of the movement trajectory of a VOC (geranyl acetate) near Orco and an OR. The red protein represents ApisOR5, and the blue protein represents ApisOrco. The green area indicates the probability of the VOC remaining above the binding pocket, while the pink area represents its probability of occupying other regions of the protein (see Methods). Statistical significance was evaluated using a chi-squared test. The red pentagram indicates the initial position of geranyl acetate in the simulation. (D). Changes in relative calcium fluorescence intensity (ΔF/F0) in cells. Cells were stimulated with 100 μM of the OR47a agonist pentyl acetate (PA) at 50 s and with 100 μM of the Orco agonist VUAA1 at 180 s. el2.DmOR47a: OR47a mutant in which the EL2 region of OR47a was replaced with that of Orco. mut.DmOr47a: DmOr47a mutant in which a short peptide was inserted into the EL2 region of the wild-type sequence. The total number of cells (n = 30) was pooled from three independent replicate experiments. The shading indicates the 95% confidence interval. (E). Relative calcium fluorescence intensity in cells following stimulation with PA and VUAA1, respectively (n = 30). Statistical significance was determined using Tukey's HSD test. (F). Schematic illustration of the mechanism by which the β-sheet structure in the EL2 region of Orco influences VOC proximity to the binding pocket. (G). Comparative analysis of binding pocket parameters (volume, hydrophobicity score, and polarity score) between Orco and ORs across all studied insect species. Statistical significance was determined using a two-sided t-test. ****P < 0.0001; n.s., not significant.
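ΔF/F0 in panels D and E is conventionally computed as (F - F0)/F0, where F0 is a pre-stimulus baseline. The Python sketch below assumes F0 is the mean of all frames recorded before the 50 s PA application and scores the response to each agonist as the peak ΔF/F0 within a time window. Both choices, and the function names, are assumptions for illustration, since the legend does not state how F0 or the panel E values were derived.

```python
from statistics import mean


def delta_f_over_f0(times: list[float], intensities: list[float],
                    baseline_end: float = 50.0) -> list[float]:
    """Convert a raw fluorescence trace F(t) into ΔF/F0 = (F - F0) / F0.

    Assumption: F0 is the mean intensity of all frames recorded before
    baseline_end (here the 50 s PA application time).
    """
    baseline = [f for t, f in zip(times, intensities) if t < baseline_end]
    if not baseline:
        raise ValueError("no frames recorded before baseline_end")
    f0 = mean(baseline)
    return [(f - f0) / f0 for f in intensities]


def peak_response(times: list[float], dff: list[float],
                  start: float, stop: float) -> float:
    """Maximum ΔF/F0 within the half-open time window [start, stop)."""
    window = [v for t, v in zip(times, dff) if start <= t < stop]
    if not window:
        raise ValueError("no frames in window")
    return max(window)
```

Under these assumptions, the PA response of one cell would be `peak_response(times, dff, 50, 180)` and the VUAA1 response a window starting at 180 s; per-cell values would then be pooled across the 30 cells before statistical testing.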

Classification of insect OR functional communities and the relationship between ecological parameters and the VOC recognition capability of ORs.
(A). 2D odor space based on the physicochemical properties of molecules. Each dot represents an odor molecule, with colors indicating functional groups. (B). Relationship between hit rate and docking score, inferred using Bayesian statistics. The black curve shows the distribution of hit rates and docking scores across all OR-VOC docking results. The top plateau has a hit rate of 22%, and the bottom plateau has a hit rate of 0%. Cyan dots represent the mean hit rate ± s.e.m. for each docking score interval. The orange curve indicates the peak hit rate at a docking score of -8 kcal/mol. The gray curves represent posterior distributions from Bayesian inference (n = 500). (C). FSN and FunCs of ORs. Each node represents a group of ORs with pairwise Pearson correlation coefficients (PCCs) > 0.6 based on docking scores. Edges connect node pairs with an average PCC ≥ 0.48. (D). Functional labels for the recognition of small molecules with different functional groups by each FunC. Red indicates a tendency to recognize more odor molecules from a particular functional group, and blue indicates a tendency to recognize fewer. Circle size represents the strength of this tendency. (E). Correlation between four ecological parameters and species' BBI for different VOC functional groups. Significance was assessed using pGLS tests. Red circles represent significant correlations, with size indicating the degree of significance. Gray triangles denote non-significant results. (F). Distribution of BBI values for different VOC functional groups across species with different larval diets. (G). BBI values for amine VOCs recognized by species with different larval diets. (H). 2D distribution of BBI values across species with different life habit combinations. The density distributions of different clusters along PCA1 and PCA2 are shown at the top and right. (I). Summed BBI values for all functional groups across three life habit combinations. Significance was analyzed using one-way ANOVA followed by Bonferroni post-hoc tests. (J). Binary logistic regression coefficients for different life habit combinations. ****P < 0.0001.
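Panel C defines functional similarity through PCCs between ORs' docking-score profiles. A minimal Python sketch of that pairwise step follows, assuming each OR is represented by a vector of docking scores over the same ordered VOC panel. The function names and toy scores are hypothetical, and the subsequent grouping of correlated ORs into nodes and FunCs is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant profile
    return sxy / sqrt(sxx * syy)


def functional_links(profiles: dict[str, list[float]],
                     cut: float = 0.6) -> list[tuple[str, str, float]]:
    """Return OR pairs whose docking-score profiles have PCC > cut."""
    links = []
    for a, b in combinations(profiles, 2):
        r = pearson(profiles[a], profiles[b])
        if r > cut:  # NaN compares False, so constant profiles are skipped
            links.append((a, b, r))
    return links
```

Because PCC is unchanged by shifting or positively scaling a profile, two ORs can be linked even if one docks every VOC more strongly overall, as long as their relative preferences across VOCs agree.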

Relationships between OR sequence structure and functional communities across insect orders.
The phylogenetic tree follows the structure from Fig. 1. On the right, the mapping relationships among the SeqCs, StrCs, and FunCs of all insect ORs are shown, considering only ORs assigned to communities. Time nodes for different geological periods are based on a previous study (Misof et al. 2014) and are separated by gray shading. The timeline beneath the tree marks the origin of the various insect orders. Animal silhouettes were obtained from PhyloPic.org.
© 2014, Descouens and Keesey. The Rhyacophila dorsalis silhouette is reproduced from PhyloPic and available under a CC BY-SA 3.0 license.
© 2013, Stemonitis and Keesey. The Petrobius maritimus silhouette is reproduced from PhyloPic and available under a CC BY-SA 3.0 license.