
2D to 3D conversion

2D to 3D video conversion (also called 2D to stereo 3D conversion and stereo conversion) is the process of transforming 2D ("flat") film to 3D form, which in almost all cases is stereo; it is thus the process of creating imagery for each eye from one 2D image.

Process type: digital and print
Industrial sectors: Film and television, print production
Main technologies: Computer software
Products: Movies, television shows, social media, printed images

With the increase of films released in 3D, 2D to 3D conversion has become more common. The majority of non-CGI stereo 3D blockbusters are converted fully or at least partially from 2D footage. Even Avatar contains several scenes shot in 2D and converted to stereo in post-production.[3] The reasons for shooting in 2D instead of stereo are financial, technical and sometimes artistic:[1][4]

Stereo post-production workflow is much more complex and not as well-established as the 2D workflow, requiring more work and rendering.

Professional stereoscopic rigs are much more expensive and bulky than customary monocular cameras. Some shots, particularly action scenes, can only be shot with relatively small 2D cameras.

Stereo cameras can introduce various mismatches in the stereo image (such as vertical parallax, tilt, color shift, and reflections and glares in different positions) that have to be fixed in post-production anyway because they ruin the 3D effect. This correction can sometimes be as complex as stereo conversion itself.

Stereo cameras can betray practical effects used during filming. For example, some scenes in the Lord of the Rings film trilogy were filmed using forced perspective to allow two actors to appear to be different physical sizes. The same scene filmed in stereo would reveal that the actors were not the same distance from the camera.

By their very nature, stereo cameras have restrictions on how far the camera can be from the filmed subject and still provide acceptable stereo separation. For example, the simplest way to film a scene set on the side of a building might be to use a camera rig from across the street on a neighboring building, using a zoom lens. However, while the zoom lens would provide acceptable image quality, the stereo separation would be virtually nil over such a distance.

Even in the case of stereo shooting, conversion can frequently be necessary. Besides the hard-to-shoot scenes mentioned above, there are situations when mismatches between the stereo views are too large to adjust, and it is simpler to perform a 2D to stereo conversion, treating one of the views as the original 2D source.

Regardless of the particular algorithm, all conversion workflows should solve the following tasks:[4][5]

High-quality conversion methods should also deal with many typical problems, including:

Translucent objects

Reflections

Fuzzy semi-transparent object borders – such as hair, fur, out-of-focus foreground objects, thin objects

Film grain (real or artificial) and similar noise effects

Scenes with fast, erratic motion

Small particles – rain, snow, explosions and so on.

Quality semiautomatic conversion

Depth-based conversion

Most semiautomatic methods of stereo conversion use depth maps and depth-image-based rendering.[4][5]


The idea is that a separate auxiliary picture known as the "depth map" is created for each frame or for a series of homogeneous frames to indicate the depths of objects present in the scene. The depth map is a separate grayscale image with the same dimensions as the original 2D image, with shades of gray indicating the depth of every part of the frame. While depth mapping can produce a fairly potent illusion of 3D objects in the video, it inherently does not support semi-transparent objects or areas, nor does it represent occluded surfaces; to emphasize this limitation, depth-based 3D representations are often explicitly referred to as 2.5D.[6][7] These and other similar issues should be dealt with via a separate method.[6][8][9]
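The core of depth-image-based rendering can be illustrated with a minimal sketch: shift each pixel horizontally in proportion to its depth value and fill the resulting disocclusion holes. This is an illustrative toy (function names, the disparity scale and the left-neighbor hole filling are assumptions for demonstration), not a production DIBR implementation.

```python
import numpy as np

def render_right_view(image, depth, max_disparity=8):
    """Toy DIBR sketch: synthesize a right-eye view from a 2D image
    and a depth map in [0, 1], where 1 means nearest to the camera."""
    h, w = depth.shape
    right = np.zeros_like(image)
    written = np.zeros((h, w), dtype=bool)
    disparity = (depth * max_disparity).astype(int)
    # Painter's algorithm: far pixels (small disparity) are written first,
    # so nearer pixels overwrite them and occlusions resolve correctly.
    for d in range(max_disparity + 1):
        ys, xs = np.nonzero(disparity == d)
        xt = xs - d                       # shift left for the right-eye view
        ok = xt >= 0
        right[ys[ok], xt[ok]] = image[ys[ok], xs[ok]]
        written[ys[ok], xt[ok]] = True
    # Naive hole filling: propagate the nearest written pixel from the left.
    for y in range(h):
        for x in range(1, w):
            if not written[y, x]:
                right[y, x] = right[y, x - 1]
    return right
```

The hole-filling step is exactly where the "occluded surfaces are not represented" limitation shows up: the pixels revealed by the shift were never photographed and must be invented.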

Automatic conversion

Depth from motion

It is possible to automatically estimate depth using different types of motion. In the case of camera motion, a depth map of the entire scene can be calculated. Object motion can also be detected, and moving areas can be assigned smaller depth values than the background. Occlusions provide information on the relative position of moving surfaces.[15][16]
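The simplest form of the "moving areas get smaller depth values" heuristic can be sketched with plain frame differencing; the block size, threshold and the two depth constants below are arbitrary illustrative choices, not values from any published method.

```python
import numpy as np

def depth_from_motion(prev_frame, next_frame, threshold=10, block=8):
    """Naive motion-based depth cue: image blocks with a large
    inter-frame difference are assumed to be moving foreground and are
    assigned a smaller depth value (closer) than the static background."""
    diff = np.abs(next_frame.astype(int) - prev_frame.astype(int))
    h, w = diff.shape
    depth = np.full((h, w), 1.0)          # static background: far
    # Aggregate the difference per block to suppress pixel-level noise.
    for y in range(0, h, block):
        for x in range(0, w, block):
            if diff[y:y + block, x:x + block].mean() > threshold:
                depth[y:y + block, x:x + block] = 0.2   # moving: near
    return depth
```

Real systems would use dense optical flow and occlusion reasoning instead of raw differencing, but the output has the same shape: a per-pixel depth map driven by motion.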

Depth from focus

Approaches of this type are also called "depth from defocus" and "depth from blur".[15][17] In "depth from defocus" (DFD) approaches, the depth information is estimated from the amount of blur of the object in question, whereas "depth from focus" (DFF) approaches compare the sharpness of an object over a range of images taken at different focus distances in order to find its distance to the camera. DFD needs only two or three images at different focus settings to work properly, whereas DFF needs at least 10 to 15 images but is more accurate.
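The DFF idea can be sketched as follows: for each image block, find the frame of the focus stack in which it is sharpest; if each frame's focus distance is known, that index maps directly to depth. The Laplacian-variance sharpness measure and the block size are common but illustrative choices, not a specific published algorithm.

```python
import numpy as np

def depth_from_focus(stack, block=8):
    """Depth-from-focus sketch over a focus stack of shape (n, h, w).
    Returns, per block, the index of the frame in which the block is
    sharpest; sharpness is the variance of a discrete Laplacian."""
    n, h, w = stack.shape
    sharp = np.zeros((n, h // block, w // block))
    for i, img in enumerate(stack.astype(float)):
        # 4-neighbor discrete Laplacian (wrap-around borders, fine for a sketch).
        lap = (-4 * img
               + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
               + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))
        for by in range(h // block):
            for bx in range(w // block):
                sharp[i, by, bx] = lap[by * block:(by + 1) * block,
                                       bx * block:(bx + 1) * block].var()
    return sharp.argmax(axis=0)   # per-block index of the sharpest frame
```

A DFD variant would instead compare the blur between just two or three frames, trading the dense focus stack for lower accuracy, as noted above.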


If the sky is detected in the processed image, it can also be taken into account that more distant objects, besides being hazy, should be more desaturated and more bluish because of a thick air layer.[17]
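The atmospheric cue above (distant objects are more desaturated and more bluish) can be turned into a rough per-pixel depth estimate. The weights below are arbitrary illustrative values, and a real system would combine this cue with others rather than use it alone.

```python
import numpy as np

def haze_depth_cue(rgb):
    """Atmospheric-perspective sketch: pixels that are less saturated
    and more bluish are assumed to be farther away. Returns a rough
    depth cue in [0, 1], where 1 means far."""
    rgb = rgb.astype(float) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    # HSV-style saturation; guard against division by zero on black pixels.
    saturation = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-6), 0)
    # Excess of blue over the red/green average, clipped to [0, 1].
    blueness = np.clip(b - (r + g) / 2, 0, 1)
    # Illustrative weighting of the two cues.
    return np.clip(0.7 * (1 - saturation) + 0.3 * blueness, 0, 1)
```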

Depth from perspective

The idea of the method is based on the fact that parallel lines, such as railroad tracks and roadsides, appear to converge with distance, eventually reaching a vanishing point at the horizon. Finding this vanishing point gives the farthest point of the whole image.[15][17]


The more the lines converge, the farther away they appear to be. So, for the depth map, the area between two neighboring vanishing lines can be approximated with a gradient plane.
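A minimal version of this gradient-plane idea, assuming the vanishing point has already been found: treat the vanishing point as the farthest point of the image and let depth fall off linearly with distance from it. The single radial gradient below is a simplification of the per-region planes described above.

```python
import numpy as np

def perspective_depth(h, w, vp_x, vp_y):
    """Depth-from-perspective sketch: the vanishing point (vp_x, vp_y)
    is the farthest point; depth decreases linearly with a pixel's
    distance from it. Returns depth in [0, 1], where 1 means far."""
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - vp_x, ys - vp_y)
    # Farthest pixel from the vanishing point becomes the nearest to the viewer.
    return 1.0 - dist / dist.max()
```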

The cardboard effect is a phenomenon in which 3D objects located at different depths appear flat to the audience, as if they were made of cardboard, even though the relative depth between the objects is preserved.

Edge sharpness mismatch

3D quality metrics

PQM

PQM[18] mimics the human visual system (HVS), as the results obtained align very closely with the Mean Opinion Score (MOS) obtained from subjective tests. The PQM quantifies the luminance and contrast distortion using an approximation (variances) weighted by the mean of each pixel block to obtain the distortion in an image. This distortion is subtracted from 1 to obtain the objective quality score.
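The general shape of such a metric, as a loose sketch in the spirit of the description above (this is not the published PQM; the specific distortion terms and normalizations are illustrative assumptions):

```python
import numpy as np

def pqm_like_score(reference, distorted, block=8):
    """Illustrative block-based quality score, NOT the published PQM:
    per-block luminance and contrast distortion, weighted by the block
    mean, subtracted from 1 so that identical images score 1.0."""
    ref = reference.astype(float)
    dst = distorted.astype(float)
    h, w = ref.shape
    distortions, weights = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            rb = ref[y:y + block, x:x + block]
            db = dst[y:y + block, x:x + block]
            lum = abs(rb.mean() - db.mean()) / 255.0            # luminance term
            con = abs(rb.var() - db.var()) / (rb.var() + db.var() + 1e-6)
            distortions.append(lum + con)
            weights.append(rb.mean() + 1e-6)   # brighter blocks weigh more
    d = np.average(distortions, weights=weights)
    return max(0.0, 1.0 - d)
```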

HV3D

The HV3D[19] quality metric has been designed with human 3D visual perception in mind. It takes into account the quality of the individual right and left views, the quality of the cyclopean view (the fusion of the right and left views, i.e. what the viewer perceives), as well as the quality of the depth information.

VQMT3D

The VQMT3D project[20] includes several metrics developed for evaluating the quality of 2D to 3D conversion.

See also

Autostereoscopy

Crosstalk (electronics)

Digital 3D

Film colorization – many of the issues involved in 3D conversion, such as object edge identification/recognition, are also encountered in colorization

Legend3D

List of 3D films

Stereoscopic video game – many S-3D video games do not actually render two images but employ 2D + depth rendering conversion techniques too

Structure from motion

2D-plus-depth

3D display

3D film

3D reconstruction from multiple images

Mansi Sharma; Santanu Chaudhury; Brejesh Lall (2014). Kinect-Variety Fusion: A Novel Hybrid Approach for Artifacts-Free 3DTV Content Generation. In 22nd International Conference on Pattern Recognition (ICPR), Stockholm, 2014. doi:10.1109/ICPR.2014.395.