Visual perception

Visual perception is the ability to interpret the surrounding environment through photopic vision (daytime vision), color vision, scotopic vision (night vision), and mesopic vision (twilight vision), using light in the visible spectrum reflected by objects in the environment. This is different from visual acuity, which refers to how clearly a person sees (for example "20/20 vision"). A person can have problems with visual perceptual processing even if they have 20/20 vision.


The resulting perception is also known as vision, sight, or eyesight (adjectives visual, optical, and ocular, respectively). The various physiological components involved in vision are referred to collectively as the visual system, and are the focus of much research in linguistics, psychology, cognitive science, neuroscience, and molecular biology, collectively referred to as vision science.

light comes from above;

objects are normally not viewed from below;

faces are seen (and recognized) upright;^[14]

closer objects can block the view of more distant objects, but not vice versa; and

figures (i.e., foreground objects) tend to have convex borders.

A 2D or primal sketch of the scene, based on feature extraction of fundamental components of the scene, including edges, regions, etc. Note the similarity in concept to a pencil sketch drawn quickly by an artist as an impression.

A 2½D sketch of the scene, where textures are acknowledged, etc. Note the similarity in concept to the stage in drawing where an artist highlights or shades areas of a scene, to provide depth.

A 3D model, where the scene is visualized in a continuous, 3-dimensional map.^[36]
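The first of the stages listed above, the primal sketch, rests on extracting features such as edges. As a rough illustration only, the following Python sketch marks edges by thresholding intensity differences between neighbouring pixels. This is a deliberately simple stand-in and not Marr's own algorithm; the function name `edge_map`, the nested-list image format, and the `threshold` value are all assumptions made for the example.

```python
from typing import List


def edge_map(image: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Return a binary map with 1 wherever a pixel differs from its right or
    lower neighbour by more than `threshold` (a hypothetical cut-off)."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Finite differences towards the right and lower neighbours;
            # pixels on the last column or row have no such neighbour.
            right = abs(image[r][c + 1] - image[r][c]) if c + 1 < cols else 0.0
            down = abs(image[r + 1][c] - image[r][c]) if r + 1 < rows else 0.0
            if max(right, down) > threshold:
                edges[r][c] = 1
    return edges


# A dark region on the left and a bright region on the right.
step_image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
print(edge_map(step_image))  # [[0, 1, 0, 0], [0, 1, 0, 0]]
```

The output marks only the column where the dark and bright regions meet, which is the kind of boundary information a primal sketch is meant to capture.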

In the 1970s, David Marr developed a multi-level theory of vision, which analyzed the process of vision at different levels of abstraction. In order to focus on the understanding of specific problems in vision, he identified three levels of analysis: the computational, algorithmic and implementational levels. Many vision scientists, including Tomaso Poggio, have embraced these levels of analysis and employed them to further characterize vision from a computational perspective.^[35]

The computational level addresses, at a high level of abstraction, the problems that the visual system must overcome. The algorithmic level attempts to identify the strategy that may be used to solve these problems. Finally, the implementational level attempts to explain how solutions to these problems are realized in neural circuitry.

Marr suggested that it is possible to investigate vision at any of these levels independently. Marr described vision as proceeding from a two-dimensional visual array (on the retina) to a three-dimensional description of the world as output. His stages of vision include:

Marr's 2½D sketch assumes that a depth map is constructed, and that this map is the basis of 3D shape perception. However, both stereoscopic and pictorial perception, as well as monocular viewing, make clear that the perception of 3D shape precedes, and does not rely on, the perception of the depth of points. It is not clear how a preliminary depth map could, in principle, be constructed, nor how this would address the question of figure-ground organization, or grouping. The role of perceptual organizing constraints, overlooked by Marr, in the production of 3D shape percepts from binocularly viewed 3D objects has been demonstrated empirically, e.g., for the case of 3D wire objects.^[37]^[38] For a more detailed discussion, see Pizlo (2008).^[39]

A more recent, alternative framework proposes that vision is instead composed of three stages: encoding, selection, and decoding.^[40] Encoding samples and represents visual inputs (e.g., as neural activity in the retina). Selection, or attentional selection, picks out a tiny fraction of the input information for further processing, e.g., by shifting gaze to an object or visual location to better process the visual signals at that location. Decoding infers or recognizes the selected input signals, e.g., recognizing the object at the center of gaze as somebody's face. In this framework,^[41] attentional selection starts at the primary visual cortex along the visual pathway, and the attentional constraints impose a dichotomy between the central and peripheral visual fields for visual recognition or decoding.
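The three stages can be pictured as a small processing pipeline. The Python sketch below is a toy illustration under loud assumptions: the image is a nested list of intensities, "selection" is reduced to picking the window with the largest summed activity, and "decoding" is a nearest-template match. None of these choices comes from the framework itself, and the function names `encode`, `select`, and `decode` are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]


def encode(image: Image) -> Image:
    """Encoding: rescale raw intensities to activities in [0, 1]."""
    peak = max(max(row) for row in image) or 1.0  # avoid dividing by zero
    return [[value / peak for value in row] for row in image]


def select(activity: Image, size: int = 2) -> Tuple[int, int]:
    """Selection: return the top-left corner of the size x size window with
    the largest summed activity (an assumed stand-in for a gaze shift)."""
    best_total, best_pos = -1.0, (0, 0)
    for r in range(len(activity) - size + 1):
        for c in range(len(activity[0]) - size + 1):
            total = sum(activity[r + i][c + j]
                        for i in range(size) for j in range(size))
            if total > best_total:
                best_total, best_pos = total, (r, c)
    return best_pos


def decode(activity: Image, pos: Tuple[int, int],
           templates: Dict[str, List[float]], size: int = 2) -> str:
    """Decoding: label the selected window with the closest template."""
    r, c = pos
    patch = [activity[r + i][c + j] for i in range(size) for j in range(size)]

    def distance(template: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(patch, template))

    return min(templates, key=lambda name: distance(templates[name]))


scene = [
    [0, 0, 0, 0],
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [0, 0, 0, 0],
]
templates = {"bright square": [1, 1, 1, 1], "dark patch": [0, 0, 0, 0]}
activity = encode(scene)
gaze = select(activity)
print(gaze, decode(activity, gaze, templates))  # (1, 2) bright square
```

Only the selected window reaches `decode`, which mirrors the idea that a small fraction of the encoded input is passed on for recognition.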

Opponent process

Transduction involves chemical messages sent from the photoreceptors to the bipolar cells and then to the ganglion cells. Several photoreceptors may send their information to one ganglion cell. There are two types of ganglion cells: red/green and yellow/blue. These neurons fire constantly, even when not stimulated. The brain interprets different colors (and, with enough information, an image) when the firing rate of these neurons changes. Red light stimulates the red cone, which in turn stimulates the red/green ganglion cell. Likewise, green light stimulates the green cone, which stimulates the green/red ganglion cell, and blue light stimulates the blue cone, which stimulates the blue/yellow ganglion cell. A ganglion cell's firing rate increases when it is signaled by one cone and decreases (is inhibited) when it is signaled by the other. The first color in the name of the ganglion cell is the color that excites it, and the second is the color that inhibits it. For example, a red cone excites the red/green ganglion cell and a green cone inhibits it. This is an opponent process. If the firing rate of a red/green ganglion cell increases, the brain registers the light as red; if the rate decreases, the brain registers the light as green.^[44]
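The excite/inhibit scheme described above can be written as a small numerical model. The following Python sketch is illustrative only: the class name `OpponentGanglionCell`, the baseline rate of 10 spikes per second, and the linear `gain` are assumptions chosen for the example, not measured physiological values.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OpponentGanglionCell:
    """Toy opponent cell: one cone type excites it, the other inhibits it."""
    excited_by: str          # first color in the cell's name
    inhibited_by: str        # second color in the cell's name
    baseline: float = 10.0   # assumed resting firing rate (spikes/s)
    gain: float = 5.0        # assumed rate change per unit of cone signal

    def firing_rate(self, cone_signals: Mapping[str, float]) -> float:
        """Baseline rate, raised by the exciting cone and lowered by the
        inhibiting cone; a firing rate cannot drop below zero."""
        drive = (cone_signals.get(self.excited_by, 0.0)
                 - cone_signals.get(self.inhibited_by, 0.0))
        return max(0.0, self.baseline + self.gain * drive)

    def interpret(self, cone_signals: Mapping[str, float]) -> str:
        """Read the color from the direction of the change in firing rate."""
        rate = self.firing_rate(cone_signals)
        if rate > self.baseline:
            return self.excited_by
        if rate < self.baseline:
            return self.inhibited_by
        return "neutral"


red_green = OpponentGanglionCell(excited_by="red", inhibited_by="green")
print(red_green.firing_rate({"red": 1.0}))    # 15.0
print(red_green.firing_rate({"green": 1.0}))  # 5.0
print(red_green.interpret({"green": 1.0}))    # green
```

Because the cell fires at a nonzero baseline, a single cell can signal two colors: an increase in rate stands for one and a decrease for the other.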

Artificial visual perception

Theories and observations of visual perception have been the main source of inspiration for computer vision (also called machine vision, or computational vision). Special hardware structures and software algorithms provide machines with the capability to interpret the images coming from a camera or a sensor.

For instance, the 2022 Toyota 86 uses the Subaru EyeSight system for driver-assist technology.^[45]

Von Helmholtz, Hermann (1867). Handbuch der physiologischen Optik. Vol. 3. Leipzig: Voss. Quotations are from the English translation produced by the Optical Society of America (1924–25): Treatise on Physiological Optics.
