Alignment of motion capture data to environmental coordinates. The motion capture coordinate system (A) is aligned with the Meshroom coordinate system (B) via a single rotation and translation that minimizes the error between the mocap’s camera axes and Meshroom’s camera axes (C). The motion capture skeleton is then scaled to minimize the distance from the footfall locations to the closest point on the mesh, evaluated at each footfall frame. This scale factor is then applied to the motion capture data at every frame.
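The rigid alignment step described above, a single rotation and translation minimizing error between corresponding axes, can be sketched with the standard Kabsch/Procrustes solution. This is a minimal sketch assuming the camera axes are supplied as corresponding 3D point sets; the function name and interface are illustrative, not the authors’ implementation.

```python
import numpy as np

def align_rigid(src, dst):
    """Find rotation R and translation t minimizing ||R @ src_i + t - dst_i||
    over corresponding 3D points (Kabsch algorithm). src, dst: (N, 3) arrays."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    # Cross-covariance of the centered point sets
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    # Guard against a reflection (det = -1) in the least-squares solution
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

The same machinery applies whether the correspondences are camera optical axes or camera center positions across frames; only the point sets change.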

Rendered image of the textured mesh from Meshroom (right) alongside the original RGB video frame (left). Meshroom outputs estimated camera positions and orientations for each video frame, relative to an estimated environmental structure represented as a textured 3D triangle mesh.

Examples of path convergence and divergence. The colors indicate different subjects. In (A), subjects diverge by choosing two different routes around a root, but then converge again. In (B), subjects’ paths converge to avoid a large outcrop. In (C), subjects’ paths converge around a mossy section of a large rock.

Overhead view of the Austin data. Subjects walk from left to right (A) or right to left (B). Different colors correspond to different subjects, each traversing the terrain three times in each direction.

Gaze is used to select paths. Here we show a representative excerpt of data where gaze is directed further along the path, in this case at locations that are not subsequently traveled to. Gaze is apparently used to evaluate the viability of paths ahead of time, since fixations further ahead in straight directions often precede turns that deviate from the fixated locations. Other gaze locations, shown in green, fall close to the foothold locations shown in pink.

Step parameter distributions. The histograms show the distributions of (A) step slopes, defined as height change divided by the length of the step along the ground plane, (B) step lengths, and (C) direction changes. These distributions define the set of feasible next steps from a given foothold, allowing the calculation of feasible alternative paths to the one actually chosen by the subject. The figure shows histograms of these quantities pooled over subjects, although calculations of viable paths were done separately for individual subjects.
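Given an ordered sequence of foothold positions, the three step parameters above can be computed along the following lines. This is a sketch assuming z is height and x–y is the ground plane, which may differ from the actual coordinate conventions in the data.

```python
import numpy as np

def step_parameters(footholds):
    """Compute step slope, ground-plane length, and direction change from an
    ordered (N, 3) array of foothold positions (x, y = ground plane, z = height).
    Returns slope (N-1,), length (N-1,), and direction change (N-2,) in radians."""
    steps = np.diff(np.asarray(footholds, dtype=float), axis=0)
    length = np.linalg.norm(steps[:, :2], axis=1)   # length along ground plane
    slope = steps[:, 2] / length                    # height change / ground length
    heading = np.arctan2(steps[:, 1], steps[:, 0])  # heading of each step
    dtheta = np.diff(heading)
    # Wrap direction changes into (-pi, pi]
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi
    return slope, length, dtheta
```

Pooling these arrays over a subject’s traversals yields the histograms shown in panels (A)–(C).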

Chosen vs. random path mean slope. Using the previously described method, we randomly sample available paths in order to compare them to the chosen path. (A) shows a subject’s chosen path (magenta) along with a subset of randomly sampled paths. (B) shows histograms of the mean step slope for chosen paths and for randomly sampled paths. The chosen-path distribution is shifted to the left, with far less rightward skew.
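One way to sample random alternative paths is to draw each step’s length, direction change, and slope from the empirical step distributions. The sketch below illustrates that resampling scheme; it omits the constraint, present in the actual analysis, that sampled footholds must correspond to reachable locations on the terrain mesh.

```python
import numpy as np

def sample_path(start, slopes, lengths, dthetas, n_steps, rng):
    """Sample one random path by drawing each step's parameters from
    empirical distributions (1D arrays of observed values).
    start: (x, y, z) start foothold. Returns an (n_steps + 1, 3) array."""
    pos = np.array(start, dtype=float)
    heading = 0.0
    path = [pos.copy()]
    for _ in range(n_steps):
        L = rng.choice(lengths)            # ground-plane step length
        heading += rng.choice(dthetas)     # cumulative direction change
        dz = rng.choice(slopes) * L        # height change = slope * length
        pos = pos + np.array([L * np.cos(heading), L * np.sin(heading), dz])
        path.append(pos.copy())
    return np.array(path)
```

Repeating this many times from a chosen path’s start foothold gives a distribution of mean step slopes to compare against the chosen path.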

Turn probability vs. straight path slope for 5-step sequences. For each sequence, the length of the straight line connecting the first and last footplants is computed, as well as the length of the actual path; these are used to compute the tortuosity of the chosen path. In addition, 10,000 paths are simulated over locations that are reachable from the start and end locations. The straightest of these paths (those with tortuosity less than the median tortuosity of the chosen paths) are used to compute an average straight path step slope, ΔH. This average straight path step slope is then compared to the tortuosity of the chosen paths. Correlations are indicated at the top of each panel.
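Assuming tortuosity is the usual ratio of traveled path length to straight-line distance (consistent with the two quantities named above), it can be computed directly; the function below is a sketch of that definition, not the authors’ code.

```python
import numpy as np

def tortuosity(footplants):
    """Tortuosity of a footplant sequence: total path length along successive
    footplants divided by the straight-line distance from first to last.
    footplants: (N, D) array. Equals 1.0 for a perfectly straight path."""
    footplants = np.asarray(footplants, dtype=float)
    path_length = np.linalg.norm(np.diff(footplants, axis=0), axis=1).sum()
    straight_length = np.linalg.norm(footplants[-1] - footplants[0])
    return path_length / straight_length
```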

Relationship between leg length and the correlation between straight path step slope and path tortuosity. Subject leg length (in millimeters) is plotted on the horizontal axis against the correlation coefficients for each of the plots in Figure 8 on the vertical axis.

CNN-based foothold location prediction. A CNN was trained to predict foothold locations in depth images rendered from the subject’s viewpoint. Depth images are acquired using Blender, where a virtual camera follows the same trajectory and orientation as the subject’s eye. Foothold locations on the mesh are then projected back onto the retinal image plane. The CNN is a convolutional-deconvolutional architecture whose output is a probability map of foothold locations. The CNN is trained with target outputs generated by placing Gaussians with standard deviation σ at the calculated foothold locations, with the corresponding depth image used as the input. Performance is evaluated by computing the mean and median percentiles of the foothold locations in the output probability map.
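The Gaussian target maps used for training can be sketched as follows. The image shape, pixel-coordinate convention, and the clipping of overlapping Gaussians are assumptions for illustration only.

```python
import numpy as np

def gaussian_target(shape, footholds_px, sigma):
    """Build a CNN training target: 2D Gaussians (std sigma, in pixels)
    centered at projected foothold image locations.
    shape: (H, W); footholds_px: iterable of (row, col) pixel coordinates."""
    H, W = shape
    rows, cols = np.mgrid[0:H, 0:W]
    target = np.zeros(shape)
    for r, c in footholds_px:
        target += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
    # Clip so overlapping Gaussians still yield values in [0, 1] (an assumption)
    return np.clip(target, 0.0, 1.0)
```

Each depth image then pairs with its target map as one training example, and the network learns to reproduce the map from depth alone.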

Foothold localization error. A. Distribution of between-mesh errors of foothold location estimates for the same subject traversal data. Foothold locations are estimated by the process described above, but with the terrain data interchanged, and the resulting corresponding foothold locations are compared. B. Distribution of foothold estimate errors relative to ‘ground truth’ foothold locations, obtained by manual annotation in the image frame, followed by projection of the manually marked locations out onto the mesh using Meshroom’s estimated camera pose.