Computer vision

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions.[1][2][3][4] Understanding in this context means the transformation of visual images (the input to the retina, in the human analog) into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

The scientific discipline of computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, multi-dimensional data from a 3D scanner, 3D point clouds from LiDAR sensors, or medical scanning devices. The technological discipline of computer vision seeks to apply its theories and models to the construction of computer vision systems.


Sub-domains of computer vision include scene reconstruction, object detection, event detection, activity recognition, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, visual servoing, 3D scene modeling, and image restoration.


Adopting computer vision technology can be painstaking for organizations, as there is no single turnkey solution. Very few companies provide a unified, distributed platform or operating system on which computer vision applications can be easily deployed and managed.

Definition[edit]

Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.[5][6][7] "Computer vision is concerned with the automatic extraction, analysis, and understanding of useful information from a single image or a sequence of images. It involves the development of a theoretical and algorithmic basis to achieve automatic visual understanding."[8] As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner.[9] As a technological discipline, computer vision seeks to apply its theories and models for the construction of computer vision systems. Machine vision refers to a systems engineering discipline, especially in the context of factory automation. In more recent times, the terms computer vision and machine vision have converged to a greater degree.[10]: 13 

History[edit]

In the late 1960s, computer vision began at universities that were pioneering artificial intelligence. It was meant to mimic the human visual system as a stepping stone to endowing robots with intelligent behavior.[11] In 1966, it was believed that this could be achieved through an undergraduate summer project,[12] by attaching a camera to a computer and having it "describe what it saw".[13][14]


What distinguished computer vision from the prevalent field of digital image processing at that time was a desire to extract three-dimensional structure from images with the goal of achieving full scene understanding. Studies in the 1970s formed the early foundations for many of the computer vision algorithms that exist today, including extraction of edges from images, labeling of lines, non-polyhedral and polyhedral modeling, representation of objects as interconnections of smaller structures, optical flow, and motion estimation.[11]


The next decade saw studies based on more rigorous mathematical analysis and quantitative aspects of computer vision. These include the concept of scale-space, the inference of shape from various cues such as shading, texture and focus, and contour models known as snakes. Researchers also realized that many of these mathematical concepts could be treated within the same optimization framework as regularization and Markov random fields.[15] By the 1990s, some of the previous research topics became more active than others. Research in projective 3-D reconstructions led to better understanding of camera calibration. With the advent of optimization methods for camera calibration, it was realized that a lot of the ideas were already explored in bundle adjustment theory from the field of photogrammetry. This led to methods for sparse 3-D reconstructions of scenes from multiple images. Progress was made on the dense stereo correspondence problem and further multi-view stereo techniques. At the same time, variations of graph cut were used to solve image segmentation. This decade also marked the first time statistical learning techniques were used in practice to recognize faces in images (see Eigenface). Toward the end of the 1990s, a significant change came about with the increased interaction between the fields of computer graphics and computer vision. This included image-based rendering, image morphing, view interpolation, panoramic image stitching and early light-field rendering.[11]


Recent work has seen the resurgence of feature-based methods used in conjunction with machine learning techniques and complex optimization frameworks.[16][17] The advancement of deep learning techniques has brought further life to the field of computer vision. The accuracy of deep learning algorithms on several benchmark computer vision data sets, for tasks ranging from classification[18] and segmentation to optical flow, has surpassed prior methods.[19]

Related fields[edit]

Image processing and image analysis tend to focus on 2D images, how to transform one image to another, e.g., by pixel-wise operations such as contrast enhancement, local operations such as edge extraction or noise removal, or geometrical transformations such as rotating the image. This characterization implies that image processing/analysis neither requires assumptions nor produces interpretations about the image content.
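The kinds of 2D operations listed above can be sketched in a few lines of NumPy; the function names and the edge threshold below are illustrative assumptions, not standard algorithms:

```python
import numpy as np

def stretch_contrast(img):
    """Pixel-wise operation: linearly map the image's min..max range
    to 0..255. Purely 2D; no assumptions about scene content."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=np.uint8)
    return ((img - lo) / (hi - lo) * 255).astype(np.uint8)

def gradient_edges(img, thresh=30.0):
    """Local operation: a crude edge map from the finite-difference
    gradient magnitude, thresholded at an arbitrary value."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)          # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    return mag > thresh
```

Both functions map one image to another without interpreting its content, which is exactly the characterization given above.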

Computer vision includes 3D analysis from 2D images. This analyzes the 3D scene projected onto one or several images, e.g., how to reconstruct structure or other information about the 3D scene from one or several images. Computer vision often relies on more or less complex assumptions about the scene depicted in an image.

Machine vision is the process of applying a range of technologies and methods to provide imaging-based automatic inspection, process control, and robot guidance[25] in industrial applications.[21] Machine vision tends to focus on applications, mainly in manufacturing, e.g., vision-based robots and systems for vision-based inspection, measurement, or picking (such as bin picking[26]). This implies that image sensor technologies and control theory often are integrated with the processing of image data to control a robot and that real-time processing is emphasized by means of efficient implementations in hardware and software. It also implies that external conditions such as lighting can be and are often more controlled in machine vision than they are in general computer vision, which can enable the use of different algorithms.

There is also a field called imaging which primarily focuses on the process of producing images, but sometimes also deals with the processing and analysis of images. For example, medical imaging includes substantial work on the analysis of image data in medical applications.

Finally, pattern recognition is a field that uses various methods to extract information from signals in general, mainly based on statistical approaches and artificial neural networks.[27] A significant part of this field is devoted to applying these methods to image data.
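A minimal sketch of the statistical flavour of pattern recognition, assuming nothing beyond NumPy: a nearest-centroid classifier (my choice of example, not something the article specifies) labels a feature vector by the class whose training-set mean is closest.

```python
import numpy as np

class NearestCentroid:
    """Label a feature vector by the nearest class mean, one of the
    simplest statistical pattern-recognition methods."""

    def fit(self, X, y):
        y = np.asarray(y)
        self.labels_ = sorted(set(y.tolist()))
        # One centroid (mean feature vector) per class.
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        # Euclidean distance from every sample to every centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return [self.labels_[i] for i in d.argmin(axis=1)]
```

Applied to image data, the feature vectors would typically be descriptors extracted from pixels rather than raw intensities.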

Applications[edit]

Automatic inspection, e.g., in manufacturing applications;

Assisting humans in identification tasks, e.g., a species identification system;[28]

Controlling processes, e.g., an industrial robot;

Detecting events, e.g., for visual surveillance or people counting, e.g., in the restaurant industry;

Interaction, e.g., as the input to a device for computer-human interaction;

Modeling objects or environments, e.g., medical image analysis or topographical modeling;

Navigation, e.g., by an autonomous vehicle or mobile robot;

Organizing information, e.g., for indexing databases of images and image sequences;

Tracking surfaces or planes in 3D coordinates for allowing Augmented Reality experiences.

Recognition[edit]

Object recognition (also called object classification) – one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene. Blippar, Google Goggles, and LikeThat provide stand-alone programs that illustrate this functionality.

Identification – an individual instance of an object is recognized. Examples include identification of a specific person's face or fingerprint, identification of handwritten digits, or the identification of a specific vehicle.

Detection – the image data are scanned for specific objects along with their locations. Examples include the detection of an obstacle in the car's field of view, possible abnormal cells or tissues in medical images, or the detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used for finding smaller regions of interesting image data, which can be further analyzed by more computationally demanding techniques to produce a correct interpretation.
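The two-stage idea just described, a cheap first pass that narrows the search before expensive analysis, can be sketched as follows; the `candidate_windows` helper and its variance threshold are illustrative assumptions, not a standard algorithm:

```python
import numpy as np

def candidate_windows(img, win=8, var_thresh=25.0):
    """Cheap first pass: flag non-overlapping windows whose intensity
    variance is high enough to possibly contain an object. Only these
    candidates would be handed to a slower, more accurate classifier."""
    h, w = img.shape
    candidates = []
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            patch = img[y:y + win, x:x + win]
            if patch.var() > var_thresh:      # flat windows are skipped
                candidates.append((y, x))
    return candidates
```

On a mostly uniform image, this prunes the vast majority of windows, which is the point of cascading a fast test before a demanding one.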

System methods[edit]

Image acquisition – A digital image is produced by one or several image sensors, which, besides various types of light-sensitive cameras, include range sensors, tomography devices, radar, ultra-sonic cameras, etc. Depending on the type of sensor, the resulting image data is an ordinary 2D image, a 3D volume, or an image sequence. The pixel values typically correspond to light intensity in one or several spectral bands (gray images or colour images) but can also be related to various physical measures, such as depth, absorption or reflectance of sonic or electromagnetic waves, or magnetic resonance imaging.[29]
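As a small illustration of pixel values in one versus several spectral bands, the sketch below collapses a three-band (RGB) image to a single intensity band using the Rec. 601 luma weights; the helper name is hypothetical:

```python
import numpy as np

# Rec. 601 luma weights for the red, green, and blue spectral bands.
REC601 = np.array([0.299, 0.587, 0.114])

def to_gray(rgb):
    """Collapse an HxWx3 colour image to an HxW gray image by a
    weighted sum over the three spectral bands."""
    return np.rint(rgb @ REC601).astype(np.uint8)
```

A gray image is thus an HxW array of intensities, while a colour image carries one such array per spectral band.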

Pre-processing

Scale space

Feature extraction

See also[edit]

Outline of computer vision

List of emerging technologies

Outline of artificial intelligence

Further reading[edit]

James E. Dobson (2023). The Birth of Computer Vision. University of Minnesota Press. ISBN 978-1-5179-1421-9.

David Marr (1982). Vision. W. H. Freeman and Company. ISBN 978-0-7167-1284-8.

Azriel Rosenfeld; Avinash Kak (1982). Digital Picture Processing. Academic Press. ISBN 978-0-12-597301-4.

Barghout, Lauren; Lawrence W. Lee (2003). Perceptual information processing system. U.S. Patent Application 10/618,543.

Berthold K.P. Horn (1986). Robot Vision. MIT Press. ISBN 978-0-262-08159-7.

Michael C. Fairhurst (1988). Computer Vision for Robotic Systems. Prentice Hall. ISBN 978-0-13-166919-2.

Olivier Faugeras (1993). Three-Dimensional Computer Vision, A Geometric Viewpoint. MIT Press. ISBN 978-0-262-06158-2.

Tony Lindeberg (1994). Scale-Space Theory in Computer Vision. Springer. ISBN 978-0-7923-9418-1.

James L. Crowley; Henrik I. Christensen, eds. (1995). Vision as Process. Springer-Verlag. ISBN 978-3-540-58143-7.

Gösta H. Granlund; Hans Knutsson (1995). Signal Processing for Computer Vision. Kluwer Academic Publisher. ISBN 978-0-7923-9530-0.

Reinhard Klette; Karsten Schluens; Andreas Koschan (1998). Computer Vision – Three-Dimensional Data from Images. Springer, Singapore. ISBN 978-981-3083-71-4.

Emanuele Trucco; Alessandro Verri (1998). Introductory Techniques for 3-D Computer Vision. Prentice Hall. ISBN 978-0-13-261108-4.

Bernd Jähne (2002). Digital Image Processing. Springer. ISBN 978-3-540-67754-3.

Richard Hartley; Andrew Zisserman (2003). Multiple View Geometry in Computer Vision. Cambridge University Press. ISBN 978-0-521-54051-3.

Gérard Medioni; Sing Bing Kang (2004). Emerging Topics in Computer Vision. Prentice Hall. ISBN 978-0-13-101366-7.

R. Fisher; K. Dawson-Howe; A. Fitzgibbon; C. Robertson; E. Trucco (2005). Dictionary of Computer Vision and Image Processing. John Wiley. ISBN 978-0-470-01526-1.

Nikos Paragios; Yunmei Chen; Olivier Faugeras (2005). Handbook of Mathematical Models in Computer Vision. Springer. ISBN 978-0-387-26371-7.

Wilhelm Burger; Mark J. Burge (2007). Digital Image Processing: An Algorithmic Approach Using Java. Springer. ISBN 978-1-84628-379-6. Archived from the original on 2014-05-17. Retrieved 2007-06-13.

Pedram Azad; Tilo Gockel; Rüdiger Dillmann (2008). Computer Vision – Principles and Practice. Elektor International Media BV. ISBN 978-0-905705-71-2.

Richard Szeliski (2010). Computer Vision: Algorithms and Applications. Springer-Verlag. ISBN 978-1848829343.

J. R. Parker (2011). Algorithms for Image Processing and Computer Vision (2nd ed.). Wiley. ISBN 978-0470643853.

Richard J. Radke (2013). Computer Vision for Visual Effects. Cambridge University Press. ISBN 978-0-521-76687-6.

Nixon, Mark; Aguado, Alberto (2019). Feature Extraction and Image Processing for Computer Vision (4th ed.). Academic Press. ISBN 978-0128149768.

External links[edit]

USC Iris computer vision conference list

Computer vision papers on the web – a complete list of papers of the most relevant computer vision conferences.

Computer Vision Online Archived 2011-11-30 at the Wayback Machine – news, source code, datasets and job offers related to computer vision.

CVonline – Bob Fisher's Compendium of Computer Vision.

British Machine Vision Association – supporting computer vision research within the UK via the BMVC and MIUA conferences, Annals of the BMVA (open-source journal), BMVA Summer School and one-day meetings.

Computer Vision Container, Joe Hoeller GitHub: widely adopted open-source container for GPU-accelerated computer vision applications. Used by researchers, universities, private companies, as well as the U.S. government.