Efficient coding of natural scene statistics predicts discrimination thresholds for grayscale textures
Abstract
Previously, in (Hermundstad et al., 2014), we showed that when sampling is limiting, the efficient coding principle leads to a 'variance is salience' hypothesis, and that this hypothesis accounts for visual sensitivity to binary image statistics. Here, using extensive new psychophysical data and image analysis, we show that this hypothesis accounts for visual sensitivity to a large set of grayscale image statistics at a striking level of detail, and also identify the limits of the prediction. We define a 66-dimensional space of local grayscale light-intensity correlations, and measure the relevance of each direction to natural scenes. The 'variance is salience' hypothesis predicts that two-point correlations are most salient, and predicts their relative salience. We tested these predictions in a texture-segregation task using un-natural, synthetic textures. As predicted, correlations beyond second order are not salient, and predicted thresholds for over 300 second-order correlations match psychophysical thresholds closely (median fractional error < 0:13).
Data availability
All the code and data necessary to reproduce the results from the manuscript are available at https://github.com/ttesileanu/TextureAnalysis.
-
Natural Images from the Birthplace of the Human Eyehttps://doi.org/10.1371/journal.pone.0020409.
-
Independent Component Filters of Natural Images Compared with Simple Cells in Primary Visual Cortexhttps://doi.org/10.1098/rspb.1998.0303.
Article and author information
Author details
Funding
US-Israel Binational Science Foundation (2011058)
- Vijay Balasubramanian
National Eye Institute (EY07977)
- Mary M Conte
- Jonathan D Victor
- Vijay Balasubramanian
Swartz Foundation
- Tiberiu Tesileanu
Howard Hughes Medical Institute
- John J Briguglio
- Ann M Hermundstad
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Ethics
Human subjects: This work was carried out with the subjects' informed consent, and in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki) and the approval of the Institutional Review Board of Weill Cornell. The IRB protocol number is 0904010359.
Copyright
© 2020, Tesileanu et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 1,506
- views
-
- 177
- downloads
-
- 20
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Biochemistry and Chemical Biology
- Neuroscience
The buildup of knot-like RNA structures in brain cells may be the key to understanding how uncontrolled protein aggregation drives Alzheimer’s disease.
-
- Neuroscience
Evidence accumulation models (EAMs) are the dominant framework for modeling response time (RT) data from speeded decision-making tasks. While providing a good quantitative description of RT data in terms of abstract perceptual representations, EAMs do not explain how the visual system extracts these representations in the first place. To address this limitation, we introduce the visual accumulator model (VAM), in which convolutional neural network models of visual processing and traditional EAMs are jointly fitted to trial-level RTs and raw (pixel-space) visual stimuli from individual subjects in a unified Bayesian framework. Models fitted to large-scale cognitive training data from a stylized flanker task captured individual differences in congruency effects, RTs, and accuracy. We find evidence that the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations, demonstrating how our framework can be used to relate visual representations to behavioral outputs. Together, our work provides a probabilistic framework for both constraining neural network models of vision with behavioral data and studying how the visual system extracts representations that guide decisions.