Learning a stimulus-response strategy for turbulent navigation. (A) Representation of the search problem with turbulent odor cues obtained from Direct Numerical Simulations of fluid turbulence (grey scale: odor snapshot from the simulations). The discrete position s is hidden; the odor concentration zT = z(s(t′), t′), with t − T ≤ t′ ≤ t, is observed along the trajectory s(t′), where T is the sensing memory. (B) Odor traces from direct numerical simulations at different (fixed) points within the plume. Odor is noisy and sparse; information about the source is hidden in the temporal dynamics. (C) Contour maps of olfactory states with nearly infinite memory (T = 2598): on average, olfactory states map to different locations within the plume, and the void state lies outside the plume. (D) Performance of stimulus-response strategies obtained during training, averaged over 500 episodes. We train using realistic turbulent data with memory T = 20 and backtracking recovery.
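For intuition, the observation model in (A) can be sketched as follows: the agent only sees the concentration sampled along its own trajectory over the last T steps, and a window with no detections maps to the void state. The sketch below is a minimal Python illustration under that reading; the detection threshold, bin edges and function names are placeholders, not the paper's exact implementation.

```python
import numpy as np

def olfactory_state(z_window, detection_threshold=3e-4,
                    intensity_bins=(1e-3, 1e-2, 1e-1),
                    intermittency_bins=(0.25, 0.5, 0.75)):
    """Map the concentrations sensed over the last T steps to a discrete
    olfactory state. Returns None for the void state (no detection in the
    window); otherwise an (intensity bin, intermittency bin) pair.
    Thresholds and bin edges are illustrative placeholders."""
    z_window = np.asarray(z_window)
    detections = z_window > detection_threshold
    if not detections.any():
        return None                          # void state: blank over the whole memory
    intensity = z_window[detections].mean()  # mean odor when detected
    intermittency = detections.mean()        # fraction of the window with odor
    i = int(np.digitize(intensity, intensity_bins))
    j = int(np.digitize(intermittency, intermittency_bins))
    return (i, j)

# Usage: slide a memory window of length T over the concentration trace z(t).
T = 20
z_trace = np.abs(np.random.randn(1000)) * 1e-3   # stand-in for a DNS odor trace
states = [olfactory_state(z_trace[max(0, t - T):t + 1]) for t in range(len(z_trace))]
```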

The optimal memory T*. (A) Four measures of performance as a function of memory with backtracking recovery (solid line) show that the optimal memory T* = 20 maximizes average performance and minimizes the standard deviation, except for the normalized time. Top: averages computed over 10 realizations of test trajectories starting from 43000 initial positions (dashed: results with adaptive memory). Bottom: standard deviation of the mean performance metrics for each initial condition (see Materials and Methods). (B) Average number of times agents encounter the void state along their path as a function of memory (top); the cumulative average reward is inversely correlated with this number (bottom), hence the optimal memory minimizes encounters with the void. (C) Colormaps: probability that agents at different spatial locations are in the void state at any point in time, starting the search from anywhere in the plume, and representative trajectory of a successful searcher (green solid line) with memory T = 1, T = 20, T = 50 (left to right). At the optimal memory, agents in the void state are concentrated near the edge of the plume. Agents with shorter memories encounter voids throughout the plume; agents with longer memories encounter more voids outside of the plume as they delay recovery. In all panels, shades are ± standard deviation.
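The two rows of (A) correspond to two levels of aggregation: performance is first averaged over the test realizations launched from each initial position, and the mean and spread of those per-position averages are then reported. A minimal sketch of that aggregation, assuming a performance array indexed by (initial position, realization); the array name, shape and random values are purely illustrative.

```python
import numpy as np

# performance[i, r]: one metric (e.g. cumulative reward) for initial
# position i and test realization r; shape and values are placeholders.
performance = np.random.rand(43000, 10)

per_position_mean = performance.mean(axis=1)   # average over the 10 realizations
top_panel = per_position_mean.mean()           # mean over initial positions (top row)
bottom_panel = per_position_mean.std()         # spread across initial positions (bottom row)
```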

The adaptive memory approximates the duration of the blank dictated by physics and is an efficient heuristic, especially when coupled with a learned recovery strategy. (A) Colormaps of the Eulerian blank times τb (top) and the sensing memory T (bottom). Left: averages; right: standard deviations. The sensing memory statistics are computed over all agents located at each discrete cell, at any point in time. (B) Probability distribution of τb across all spatial locations and times (black) and of T across all agents at all times (gray). (C) Performance with the adaptive memory nears the performance of the optimal fixed memory, shown here for backtracking; similar results hold for the Brownian recovery (Supplementary Figure 2). (D) Comparison of the three recovery strategies with adaptive memory: the learned recovery with adaptive memory outperforms all fixed- and adaptive-memory agents. In (C) and (D), dark squares mark the mean and light rectangles mark ± standard deviation. No standard deviation is shown for the f+ measure in the learned case, as this strategy is deterministic (see Materials and Methods).
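Panel (A) compares two related quantities: the Eulerian blank time, i.e. the duration of odor-free intervals measured at a fixed point, and the agent's adaptive sensing memory, which in spirit tracks the time elapsed since the last detection along the trajectory. A minimal sketch of both, assuming a binary detection time series; the reset rule and names are illustrative, not the paper's exact definition.

```python
import numpy as np

def blank_durations(detections):
    """Durations of consecutive no-detection (blank) intervals in a binary
    detection series recorded at a fixed point (Eulerian blank times)."""
    blanks, run = [], 0
    for d in detections:
        if d:
            if run > 0:
                blanks.append(run)
            run = 0
        else:
            run += 1
    if run > 0:
        blanks.append(run)
    return np.array(blanks)

def adaptive_memory(detections, t, t_min=1):
    """Illustrative adaptive memory at time t: the time elapsed since the
    last detection along the agent's own trajectory (at least t_min)."""
    past = np.nonzero(np.asarray(detections[:t + 1]))[0]
    return max(t_min, t - past[-1]) if past.size else t + 1
```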

Optimal policies with adaptive memory for different recovery strategies: backtracking (green), Brownian (red) and learned (blue). For each recovery, we show the spatial distribution of the olfactory states (top), the policy (center) and the state occupancy (bottom), for non-void states (left) vs the void state π*(a|∅) (right). Spatial distribution: probability that an agent at a given position is in any non-void olfactory state (left) or in the void state (right), color-coded from yellow to blue. Policy: actions learned in the non-void states, weighted by their occupancy no (left, arrows proportional to the frequency of the corresponding action), and schematic view of the recovery policy in the void state (right). State occupancy: fraction of agents in any of the 15 non-void states (left) or in the void state (right) at any point in space and time. Occupancy is proportional to the radius of the corresponding circle. The position of the circle identifies the olfactory state (rows and columns indicate the discrete intensity and intermittency, respectively). All statistics are computed over 43000 trajectories, starting from any location within the plume.
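The policy and occupancy panels both derive from counts of visits to the discrete olfactory states along the test trajectories. A minimal tabular sketch of how the circle sizes (occupancy) and occupancy-weighted arrows (action frequencies) could be tallied, assuming 15 non-void states plus the void state and a generic action set; shapes and names are placeholders, not the paper's data structures.

```python
import numpy as np

ACTIONS = ["up", "down", "left", "right"]   # illustrative action set
N_STATES = 16                               # 15 non-void states + void

visits = np.zeros((N_STATES, len(ACTIONS)))  # (state, action) counts along trajectories

def record(state_idx, action_idx):
    """Tally one decision taken in a given olfactory state."""
    visits[state_idx, action_idx] += 1

# Occupancy of each state (circle radius in the bottom row) and
# occupancy-weighted action frequencies (length of the policy arrows).
occupancy = visits.sum(axis=1) / max(visits.sum(), 1)
action_freq = visits / np.maximum(visits.sum(axis=1, keepdims=True), 1)
```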

Generalization to statistically different environments. (A) Snapshots of odor concentration normalized by the concentration at the source, color-coded from blue (0) to yellow (1), for environments 1 to 6 from top to bottom. Environment 1* is the native environment where all agents are trained. (B) Performance of the three recovery strategies, backtracking (green), Brownian (red) and learned (blue), with adaptive memory, trained on the native environment and tested across all environments. The four measures of performance defined in the main text are shown. Dark squares mark the mean, and empty rectangles mark ± standard deviation. No standard deviation is shown for the f+ measure in the learned case, as this strategy is deterministic (see Materials and Methods).
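The test in (B) is a zero-shot evaluation: the policy trained in environment 1* is frozen and replayed, unchanged, in every other plume. A minimal evaluation-loop sketch under that reading; run_episode is a stub standing in for the actual plume simulation, and all names and values are illustrative.

```python
import random

def run_episode(policy, env, max_steps=5000):
    """Stub for one search episode; a real episode would step the plume
    simulation and query the (frozen) policy at every step."""
    return (random.random() < 0.5, random.randint(1, max_steps))

def evaluate(policy, environments, n_episodes=500):
    """Replay the same trained policy across all test environments and
    collect summary metrics per environment."""
    results = {}
    for name in environments:
        episodes = [run_episode(policy, name) for _ in range(n_episodes)]
        times = [t for ok, t in episodes if ok]
        results[name] = {
            "success_rate": len(times) / n_episodes,
            "mean_time_to_source": sum(times) / max(len(times), 1),
        }
    return results

print(evaluate(policy=None, environments=["env1*", "env2", "env3"]))
```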

Data information

The role of temporal memory with the Brownian recovery strategy (same as main Figure 2A). Total cumulative reward (top left) and standard deviation (top right) as a function of memory, showing an optimal memory T* = 3 for the Brownian agent. Other measures of performance with their standard deviations show the same optimal memory (bottom). The tradeoff between long and short memories discussed in the main text holds, but here exiting the plume is much more detrimental because regaining a position within the plume by Brownian motion takes much longer.
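The gap between the Brownian and backtracking agents is easier to see against a sketch of the two recovery behaviours: in the void state, a Brownian agent steps in a uniformly random direction, whereas a backtracking agent retraces the path that led it out of the plume. A minimal sketch under that reading; the action names and history format are illustrative, not the paper's implementation.

```python
import random

ACTIONS = ["up", "down", "left", "right"]
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}

def brownian_recovery(path_actions):
    """In the void state, move in a uniformly random direction."""
    return random.choice(ACTIONS)

def backtracking_recovery(path_actions):
    """In the void state, undo the most recent action, retracing the path
    that led out of the plume (random step if no history is left)."""
    if path_actions:
        return OPPOSITE[path_actions.pop()]
    return random.choice(ACTIONS)
```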

All four measures of performance, across agents with fixed and adaptive memory, and across the three recovery strategies with adaptive memory.

Optimal policies for different recovery strategies and adaptive memory. From left to right: results for the backtracking (green), Brownian (red) and learned (blue) recovery strategies. Top: probability that an agent in a given olfactory state is at a specific spatial location, color-coded from yellow to blue. Rows and columns indicate the olfactory state; the void state is in the lower right corner. Arrows indicate the optimal action from each state. Bottom: circles represent the occupancy of each state; olfactory states are arranged as in the top panel. All statistics are computed over 43000 trajectories, starting from any location within the plume.