Processing of strawberry and fermentation data before low-dimensional embedding.

(A) The strawberry data comprise under-ripe, ripe, and overripe datasets that share 13 common odors. The three datasets were integrated by retaining only these common odors. (B) The fermentation data are sparse, so we omitted odors with fewer than 20 non-zero values. The concentration values in both datasets were log-transformed before low-dimensional embedding.
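The preprocessing steps in (A) and (B) can be sketched as follows. This is a minimal illustration rather than the authors' code: the dictionary layout (odor name mapped to a list of per-sample concentrations), the function names, and the use of log(1 + x) to cope with zero concentrations are all assumptions, since the caption does not state how zeros were handled in the log transform.

```python
import math


def common_odors(datasets):
    """Return the odors present in every dataset, in sorted order."""
    shared = set(datasets[0])
    for data in datasets[1:]:
        shared &= set(data)
    return sorted(shared)


def integrate(datasets):
    """Concatenate the samples of all datasets, keeping only the common odors."""
    return {
        odor: [value for data in datasets for value in data[odor]]
        for odor in common_odors(datasets)
    }


def drop_sparse_odors(data, min_nonzero=20):
    """Omit odors with fewer than `min_nonzero` non-zero values."""
    return {
        odor: values
        for odor, values in data.items()
        if sum(1 for value in values if value != 0) >= min_nonzero
    }


def log_transform(data):
    """Apply log(1 + x) to each concentration (assumed treatment of zeros)."""
    return {
        odor: [math.log1p(value) for value in values]
        for odor, values in data.items()
    }
```

Under these assumptions, the strawberry data would pass through `integrate` and then `log_transform`, and the fermentation data through `drop_sparse_odors` and then `log_transform`.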

Hyperbolic geometry describes strawberry and fermentation data.

We performed hyperbolic multi-dimensional scaling (HMDS) with different values of Rmodel for the strawberry (A) and fermentation (B) datasets and plotted the curvature of the Shepard diagram fit versus Rmodel. The violin plots show the curvature statistics from 100 repetitions of the embedding. The details of the method were previously described (ref. 43). (C) Pairwise distance plot of the 3D Euclidean embedding of strawberry data; the curvature index of the fitting function y = αx^(k+1) is indicated in the panel. (D) 3D hyperbolic embedding of strawberry data with Rmodel = Rdata = 2.11. (E) 3D Euclidean embedding of fermentation data. (F) 3D hyperbolic embedding of fermentation data with Rmodel = Rdata = 2.77. (G) Visualization of strawberry data using the first two components of PCA. The pink, red, and dark red colors represent under-ripe, ripe, and overripe strawberry samples, respectively. (H) 2D h-SNE embedding of strawberry data. (I) Distance plot of the PCA embedding. (J) Distance plot of the h-SNE embedding.
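To illustrate how a curvature index could be extracted from a distance plot such as (C), the sketch below fits a power law to pairs of distances. It assumes the fitting function reads y = α·x^(k+1), so that k = 0 corresponds to a straight line, and it uses ordinary least squares in log-log space, which is not necessarily the fitting procedure used for the figure. The function name is hypothetical.

```python
import math


def fit_curvature_index(x_dist, y_dist):
    """Fit y = alpha * x**(k + 1) and return (alpha, k).

    Assumption: the fit is done by linear regression of log(y) on log(x),
    so every distance must be strictly positive.
    """
    log_x = [math.log(x) for x in x_dist]
    log_y = [math.log(y) for y in y_dist]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    slope = sxy / sxx                      # slope equals k + 1
    alpha = math.exp(mean_y - slope * mean_x)
    return alpha, slope - 1.0
```

A fitted k close to zero would indicate a nearly linear relation between the two sets of distances; a k far from zero indicates a curved relation.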

h-SNE embedding and spiral fitting of strawberry data.

(A) h-SNE embedding of strawberry samples. The pink, red, and dark red points represent under-ripe, ripe, and overripe strawberries, respectively. (B) The points in (A) after a distance-preserving reflection and rotation, chosen so that the points are best fitted by an Archimedean spiral (blue line): r = αϕ. (C) Plot of the radial coordinate of the points in (B) as a function of the angular coordinate; the blue line is the linear fit. (D) 3D h-SNE embedding of strawberry data with a 2D intersection plane shown in gray, onto which the projected points have the optimal spiral fit after reflection and rotation. (E) Spiral fit of the projected points in the 2D disk shown in (D). (F) Plot of the radial coordinate of the points in (E) as a function of the angular coordinate. Point sizes represent the point-to-plane distances of the points in (D), and the inset shows the point-to-plane distances versus the corresponding spiral lengths of the points in (E).
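The linear relation r = αϕ in (C) and (F) can be fitted with a one-parameter least-squares line through the origin. The sketch below is an illustration under stated assumptions: the angular coordinates are taken to be already unwrapped (allowed to exceed 2π), and the spiral length is computed as the Euclidean arc length of the Archimedean spiral, which may differ from the length measure used for the figures. The function names are hypothetical.

```python
import math


def fit_archimedean_spiral(phis, radii):
    """Least-squares estimate of alpha in r = alpha * phi (line through the origin)."""
    return sum(r * p for r, p in zip(radii, phis)) / sum(p * p for p in phis)


def spiral_length(alpha, phi):
    """Euclidean arc length of r = alpha * phi from angle 0 to angle phi.

    Uses ds = alpha * sqrt(1 + phi**2) d(phi), integrated in closed form.
    Assumption: Euclidean geometry in the plane of the spiral.
    """
    return 0.5 * alpha * (phi * math.sqrt(1.0 + phi * phi) + math.asinh(phi))
```

Given a fitted α, each sample's angular coordinate can then be mapped to a position along the spiral with `spiral_length(alpha, phi)`.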

h-SNE embedding and spiral fitting of fermentation data.

(A) h-SNE embedding of integrated fruit fermentation datasets from three runs (left) and batch correction of the three runs by Procrustes alignment (right). (B) The batch-corrected samples are color-coded according to fermentation day and symbol-coded according to fruit type (left). The optimal Archimedean spiral fit after distance-invariant reflection, rotation, and translation operations (right). (C) Plot of the radial coordinate of the points in (B) as a function of the angular coordinate. The blue line shows the linear fit, and the color scale is the same as in (B). (D) Plot of the mean spiral length of samples as a function of fermentation day. Samples at fermentation days 10 and 17 were removed as outliers (see text). The points are fit by a logistic function with a manually assigned amplitude (Eq. 4).
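As an illustration of the kind of alignment used for batch correction in (A), the sketch below rigidly aligns one 2D point set to another with reflection, rotation, and translation but no rescaling. It is a simplified stand-in, not the procedure used for the figure: it assumes the two runs contain matched samples in the same order, and it treats the embedding coordinates as Euclidean, whereas in a hyperbolic embedding the translation step in particular would need to be replaced by the corresponding hyperbolic isometry. The function names are hypothetical.

```python
import math


def _center(points):
    """Return the points shifted to zero mean, together with the original centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points], (cx, cy)


def procrustes_align(source, target, allow_reflection=True):
    """Rigidly align 2D `source` points to matched `target` points.

    Assumption: source[i] corresponds to target[i]. No scaling is applied,
    so pairwise Euclidean distances within `source` are preserved.
    """
    src, _ = _center(source)
    tgt, (tx, ty) = _center(target)
    best_err, best_pts = None, None
    for flip in ((1.0, -1.0) if allow_reflection else (1.0,)):
        pts = [(x, flip * y) for x, y in src]          # optional mirror image
        a = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(pts, tgt))
        b = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(pts, tgt))
        theta = math.atan2(b, a)                       # optimal rotation angle
        c, s = math.cos(theta), math.sin(theta)
        rotated = [(c * x - s * y, s * x + c * y) for x, y in pts]
        err = sum((rx - qx) ** 2 + (ry - qy) ** 2
                  for (rx, ry), (qx, qy) in zip(rotated, tgt))
        if best_err is None or err < best_err:
            best_err, best_pts = err, rotated
    return [(x + tx, y + ty) for x, y in best_pts]
```

With three runs, one run could serve as the reference and the other two be aligned to it in turn.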

Comparison of yeast and uncontrolled fermentation odor mixtures.

(A-B) Headspace profiles of uncontrolled (A) and yeast (B) fermentation were fit separately by Eq. 4. The 95% confidence interval of b is shown in square brackets after the fitting equation. (C) Overlay of the two fits in (A-B). The red arrow indicates the direction in which yeast fermentation deviates from uncontrolled fermentation. (D-E) The same logistic fit as in Fig. 4D for uncontrolled (D) and yeast (E) fermentation. Points are colored by fermentation day. (F) Overlay of the two fits in (D-E).
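Eq. 4 is not reproduced in these captions, so the following sketch assumes a standard logistic form, y = A / (1 + exp(-b (t - t0))), with the amplitude A fixed by hand. Under that assumption the model becomes linear after a logit transform, and b and t0 follow from ordinary linear regression. The parameterization, the function names, and the fitting method are all assumptions; the sketch does not compute the confidence interval of b.

```python
import math


def logistic(t, amplitude, b, t0):
    """Assumed logistic form: amplitude / (1 + exp(-b * (t - t0)))."""
    return amplitude / (1.0 + math.exp(-b * (t - t0)))


def fit_logistic_fixed_amplitude(days, values, amplitude):
    """Estimate (b, t0) with the amplitude held fixed.

    Uses log(y / (A - y)) = b * t - b * t0, which is linear in t.
    Requires 0 < y < amplitude for every value.
    """
    z = [math.log(y / (amplitude - y)) for y in values]
    n = len(days)
    mean_t = sum(days) / n
    mean_z = sum(z) / n
    stt = sum((t - mean_t) ** 2 for t in days)
    stz = sum((t - mean_t) * (u - mean_z) for t, u in zip(days, z))
    b = stz / stt
    intercept = mean_z - b * mean_t        # intercept equals -b * t0
    return b, -intercept / b
```

Fixing the amplitude is what makes this linearization possible; with a free amplitude, an iterative nonlinear least-squares fit would be needed instead.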

Identifying odors that drive spiral progression.

(A) Odors visualized together with natural mixtures from samples in the strawberry dataset in the same space (see Methods). The gray dots represent strawberry samples, and the colored dots represent six odors that are shared between the strawberry and fermentation data. The color scale is determined by the Pearson correlation coefficient between odor concentration and spiral length in the strawberry data. (B) The same six odors as in (A), visualized together with natural mixtures from samples in the fermentation data. Points representing odors are color-coded the same as the corresponding odors in (A). (C) Plots of the pairwise distances of the odors in (A-B). (D) Abundance profiles of two odors, styrene (left) and 1-hexanol (right), in the strawberry (top) and fermentation (bottom) datasets. The color scales are determined by the normalized odor concentrations in each dataset.
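The color scale in (A) is based on the Pearson correlation coefficient between each odor's concentration and the spiral length of the samples. A minimal sketch of that computation is given below; the data layout (odor name mapped to per-sample concentrations, in the same sample order as the spiral lengths) and the function names are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def odor_correlations(concentrations, spiral_lengths):
    """Correlate each odor's concentration profile with spiral length.

    Assumption: concentrations[odor][i] and spiral_lengths[i] refer to
    the same sample.
    """
    return {
        odor: pearson_r(values, spiral_lengths)
        for odor, values in concentrations.items()
    }
```

Odors with coefficients near +1 or -1 change most consistently along the spiral, in increasing or decreasing direction respectively.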

Estimation of odor source phenotypes for odor profiles in strawberry and fermentation datasets.

(A) Hierarchical clustering of strawberry phenotypes based on concentration measurements of strawberry headspace volatiles from Schwieterman et al. (ref. 28). The rows and columns represent samples and phenotypes, respectively. The red boxes highlight three distinct phenotypes: sourness, malic acid, and sweetness. (B) Estimated sweetness, malic acidity, and sourness values of strawberry samples (top) and fermentation fruit samples (bottom) shown on the h-SNE map. (C) Plots of estimated phenotype values versus the spiral length of points from strawberry samples (top) and fermentation fruit samples (bottom). The points are colored according to ripeness stage for the strawberry data and fermentation day for the fermentation data. The blue shaded areas in the top panels correspond to the transition from ripe to rotting/fermenting in the strawberry dataset, which may best correspond to the fermentation process in the fermentation dataset (bottom). Sweetness decreases, whereas sourness and malic acidity both increase, during fermentation that starts from ripe fruit.

Compounds identified by dynamic headspace sampling from individual strawberry fruit in white (under-ripe), ripe, or brown/fermented states. Rows correspond to the chemicals listed in the rows of Supplemental Table 1. Columns correspond to different individual samples. The color scale corresponds to relative abundance inferred from quantification of GC peak area.

Compounds identified by static sampling (SPME/GC-MS) of the headspace of banana, kiwi, mango, peach and strawberry fruit mash undergoing uncontrolled or yeast-dominated fermentation. Fruits were sampled at 0, 1, 2, 3, 5, 7, 11, and 14 days of fermentation; some fermentation runs were additionally sampled at 21 and 28 days. Each row block corresponds to a different compound, specified in the rows of Supplemental Table 2. Each row block comprises 1-3 rows, corresponding to measurements from independent instances of the fermentation runs. Three independent banana, kiwi, and mango fermentation runs were measured, whereas only two independent runs of strawberry and one run of peach were measured. Columns correspond to different time points (in days) at which the samples were measured; the ordering of columns is the same for all five fruits. Color scale indicates the area under the curve for the peak corresponding to each compound in the gas chromatograph and indicates relative abundance.

Embedding of strawberry samples by 3D PCA and 3D h-SNE. (A-B) 3D PCA (A) and h-SNE (B) embeddings of strawberry samples with 13 common odors. (C-D) Distance preservation of the PCA (C) and h-SNE (D) embeddings.

h-SNE embedding of strawberry samples by taking the union of all available odors.

Missing values are padded with 100000. (A) 3D h-SNE embedding shown in a 2D stereographic plane. (B) Plot of embedding distances versus data distances. (C) Distribution of the radii of points in the three stages.

Effect of yeast fermentation in the radius vs. angle plane.

Uncontrolled fermentation results in faster degradation, represented by a larger radius of the fermentation spiral in the uncontrolled condition.

Analogous to Figure 6, but with styrene removed from both datasets.

The results are similar to the case where styrene is included. In particular, the samples still follow a spiral trajectory (A, B), and the correlation between fermentation day and spiral position is significant (B). (D) Distribution of two example odorants in fermentation segments. Instead of styrene, we now highlight ethyl butyrate.