Noise-driven Concurrent Stereo Matching

Theoretical and experimental analysis of some of the best-performing current stereo matching techniques led to the development of an alternative approach to 3D stereo reconstruction, called Noise-driven Concurrent Stereo Matching (NCSM). The framework reduces the drawbacks of more conventional approaches by using more general image noise models and less restrictive matching goals. It gives promising results because it separates the initial ill-posed problem into two well-posed problems that are solved sequentially. The NCSM separates 3D reconstruction into two independent stages:

the image noise estimation, which outlines spatial candidate volumes that are equivalent from the standpoint of image matching under the noise, and
the selection of one or more surfaces that closely fit these volumes with due account of partial occlusions of background objects by foreground ones.

This framework circumvents the "best match" or "closest similarity" criteria exploited in almost all existing matching strategies in favour of a likely match criterion based on a local model of signal noise.
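The first stage can be illustrated on a single pair of conjugate scanlines. The following is a minimal sketch, not the NCSM implementation: it assumes a fixed absolute-difference tolerance as a stand-in for the estimated noise model, and the names `candidate_volume`, `noise_tolerance` and `max_disparity` are hypothetical. For every pixel it keeps all disparities whose signal difference is explainable by the assumed noise, rather than a single winner.

```python
from typing import List, Set


def candidate_volume(left: List[float], right: List[float],
                     max_disparity: int, noise_tolerance: float) -> List[Set[int]]:
    """Return, for each left pixel x, every disparity d such that
    |left[x] - right[x - d]| does not exceed the assumed noise tolerance."""
    volume: List[Set[int]] = []
    for x, value in enumerate(left):
        admissible: Set[int] = set()
        for d in range(max_disparity + 1):
            if x - d < 0:  # the match would fall outside the right scanline
                break
            if abs(value - right[x - d]) <= noise_tolerance:
                admissible.add(d)
        volume.append(admissible)
    return volume


# Toy scanlines (made-up values): the left line is the right line shifted by one pixel.
right_line = [10, 10, 50, 50, 10, 10]
left_line = [10, 10, 10, 50, 50, 10]
volume = candidate_volume(left_line, right_line, max_disparity=2, noise_tolerance=5)
for x, admissible in enumerate(volume):
    print(x, sorted(admissible))
```

In this toy case the true disparity of 1 is admissible at every pixel from x = 1 onwards, but so are other disparities; choosing among them is left to the second stage.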

A family of NCSM-based algorithms presented here demonstrates high-quality 3D reconstruction from various stereo pairs. Detailed analyses and comparisons show that the NCSM framework yields results competitive with those from the best-performing conventional algorithms on test stereo pairs with no contrast deviations, but notably outperforms these algorithms in the presence of large contrast deviations. At the same time, the linear computational complexity of the NCSM-based techniques (both NCSM-SDPS and NCSM-ITER) is notably lower than that of the best-performing conventional algorithms. For example, the minimum-cut algorithms are of complexity $O(n^{2}\log n)$, where $n$ is the image size, and thus are very slow in practice even on a moderate-size stereo pair.

Contributions

The NCSM framework rules out two unrealistic assumptions that appear explicitly or implicitly in most of the known approaches to the computational binocular stereo problem, namely, (i) that a single continuous opaque surface is to be reconstructed and (ii) that the solution must be based on the "best match" or "closest similarity" between corresponding areas in images that represent the same binocularly visible parts of the surface.

Multiple surfaces in a 3D scene:

Almost all conventional binocular stereo algorithms search for a single optical surface that yields the best correspondence between the images of a stereo pair under constraints on surface continuity, smoothness and visibility. However, as underscored in an earlier chapter, almost all real 3D scenes contain multiple disjoint optical surfaces. Thus the assumption of a single surface is too restrictive for stereo matching.

Under this assumption, conventional algorithms cannot account for violations of the ordering constraint and must resort to heuristic penalties to handle the large disparity jumps that produce surface discontinuities and partial occlusions. The empirically chosen weights of each penalising term strongly influence the reconstruction accuracy [11]. The NCSM framework is based on a more realistic multilayered model of an observed 3D scene and need not penalise discontinuities due to transitions from one candidate volume to the next. The surface fitting process is restricted to each continuous volume and proceeds from foreground to background with due account of possible occlusions. After each foreground surface is found, the corresponding background volumes are enlarged at the expense of their occluded portions, so that mutually consistent optical surfaces yielding high point-wise signal similarity are selected.
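The foreground-to-background order of surface fitting can be sketched on one scanline. This is a heavily simplified illustration under stated assumptions, not the fitting procedure of the NCSM algorithms: it labels pixels greedily level by level, accepts a level only over runs of at least `min_run` pixels (a hypothetical continuity proxy), and omits the enlargement of background volumes by occluded portions. The function name and the input volume are made up for the example.

```python
from typing import List, Optional, Set


def fit_levels_front_to_back(volume: List[Set[int]], max_disparity: int,
                             min_run: int = 2) -> List[Optional[int]]:
    """Greedy level-by-level labelling of one scanline.

    Levels are visited from the largest disparity (foreground) to the smallest
    (background). At each level, a run of still-unlabelled pixels that admit
    the level is accepted only if it is at least min_run pixels long.
    """
    labels: List[Optional[int]] = [None] * len(volume)
    for d in range(max_disparity, -1, -1):
        x = 0
        while x < len(volume):
            start = x
            while x < len(volume) and labels[x] is None and d in volume[x]:
                x += 1
            if x - start >= min_run:
                for i in range(start, x):
                    labels[i] = d
            if x == start:  # no run started here, move to the next pixel
                x += 1
    return labels


# Hypothetical candidate volume: admissible disparities per pixel.
toy_volume = [{0}, {0, 1}, {1, 2}, {0, 1}, {1, 2}, {0, 1}]
labels = fit_levels_front_to_back(toy_volume, max_disparity=2)
print(labels)  # [None, 1, 1, 1, 1, 1]
```

No discontinuity penalty appears anywhere: isolated admissible pixels at level 2 are simply too short to form a surface, and the pixel left unlabelled is a candidate for occlusion handling.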

Noise-driven vs. "best" matching:

Experiments with real stereo pairs presented in an earlier chapter have shown that "best" matching does not always correspond to the ground truth. Conventional stereo algorithms rely on "best" matching because their noise models are too simplistic. Optimal statistical decision rules based on these models lead to various energy minimisation schemes with different energy functions that quantitatively specify signal dissimilarity and surface imperfection. In particular, dynamic programming algorithms produce the "best matching" epipolar profile (1D surface cross-section) modelled as a Markov chain of successive heights along the profile. The graph minimum-cut and belief propagation algorithms provide a close approximation of a "best matching" 2D surface under shape constraints, the surface being modelled as a 2D Markov random field of heights. Although all of them assume that the "best" matching of stereo images is the ultimate goal of computational binocular stereo, each real stereo pair contains a great many equally admissible matches. Thus the selection of only the "best" matches may lead to many incorrect decisions.

The computationally "best" match is therefore not always the best selection, especially for occluded regions where no "best" match exists at all. The NCSM framework circumvents the "best matching" criteria in favour of the more realistic selection of all the likely matches that follow from a detailed image noise model.
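The difference between the two criteria can be shown with a handful of numbers. The sketch below is illustrative only: the dissimilarity values and the tolerance are made up, and `best_match` and `likely_matches` are hypothetical helper names rather than parts of any NCSM algorithm.

```python
from typing import List


def best_match(costs: List[float]) -> int:
    """Disparity with the smallest dissimilarity (the conventional criterion)."""
    return min(range(len(costs)), key=costs.__getitem__)


def likely_matches(costs: List[float], noise_tolerance: float) -> List[int]:
    """All disparities whose dissimilarity is explainable by the assumed noise."""
    return [d for d, cost in enumerate(costs) if cost <= noise_tolerance]


# Made-up dissimilarities of one pixel for disparities 0..3; the true disparity is 1.
costs = [3.0, 2.5, 2.0, 9.0]
true_disparity = 1

print(best_match(costs))           # 2: the smallest cost is not the true match
print(likely_matches(costs, 4.0))  # [0, 1, 2]: the true match is retained
```

With noisy signals the minimum-cost disparity can be wrong by a small margin, while the set of likely matches still contains the true one and defers the decision to surface fitting.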

Noise in stereo images:

The umbrella term "noise" relates to all deviations between the corresponding signals in stereo images. The noise arises from multiple sources, including random variations in the sensitivity of optical sensors, non-Lambertian surface reflection, and specific effects of the geometry of stereo observation (e.g. occlusions). Although stereo matching criteria and strategies obviously depend on all the noise components, most conventional stereo algorithms account only for very simple and thus unrealistic models of random pixel noise, such as statistically independent normal or uniform deviations. This is why these algorithms fail under more realistic noise models that include spatially constant or variant contrast and offset deviations. The NCSM algorithms use two schemes for noise estimation at the first stage. The NCSM-SDPS algorithm takes account of possible contrast and offset distortions combined with independent random intensity deviations and occlusions along each epipolar 2D profile represented by the conjugate epipolar lines. However, it does not exploit the inter-dependence of these distortions across the set of profiles forming a 3D surface, i.e. the inter-dependence across the scan-lines in the images. The second algorithm, NCSM-ITER, uses a more realistic spatial noise model with uniform contrast and offset distortions for all the scene points at the same depth level, the distortions being independent across different levels. Experiments reported in an earlier chapter confirm that the latter algorithm outperforms more conventional ones if stereo pairs have contrast and offset distortions.
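The effect of contrast and offset deviations on matching can be made concrete. The sketch below is an assumption-laden illustration, not the estimator used by NCSM-SDPS or NCSM-ITER: it fits a single contrast `a` and offset `b` to a set of corresponding signals by ordinary least squares and compares the residual before and after compensation. All names and values are hypothetical.

```python
from typing import List, Tuple


def fit_contrast_offset(reference: List[float],
                        observed: List[float]) -> Tuple[float, float]:
    """Least-squares estimate of (a, b) in the model observed = a * reference + b."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_o = sum(observed) / n
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_r == 0:  # flat reference signal: only an offset can be estimated
        return 1.0, mean_o - mean_r
    cov = sum((r - mean_r) * (o - mean_o) for r, o in zip(reference, observed))
    a = cov / var_r
    return a, mean_o - a * mean_r


def mean_abs_residual(reference: List[float], observed: List[float],
                      a: float = 1.0, b: float = 0.0) -> float:
    """Mean absolute deviation of the observed signals from a * reference + b."""
    return sum(abs(o - (a * r + b))
               for r, o in zip(reference, observed)) / len(reference)


# Made-up corresponding signals: the second image has contrast 1.5 and offset 5.
reference = [10.0, 20.0, 30.0, 40.0]
observed = [20.0, 35.0, 50.0, 65.0]

a, b = fit_contrast_offset(reference, observed)
print(a, b)                                         # 1.5 5.0
print(mean_abs_residual(reference, observed))       # 17.5 without compensation
print(mean_abs_residual(reference, observed, a, b)) # 0.0 after compensation
```

A matcher that assumes only independent pixel noise sees a large residual for these perfectly corresponding signals and may reject the true match, whereas a model with contrast and offset terms explains the deviation entirely.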

Directions for Future Work

The proposed NCSM framework could be refined by developing more versatile models of image noise (e.g., spatially variant Markov random field models of corresponding image signals to account for spatial interdependence of the noise components), better detection of the likely occluded areas in stereo images, and more powerful surface fitting techniques suitable for slanted and curvilinear surfaces.

The current NCSM algorithms either build empirical probability models of signals in occluded areas using the likely occlusions detected by the symmetric DP stereo (NCSM-SDPS), or assume such signals are uniformly distributed at each disparity level (NCSM-ITER). More accurate modelling of geometric noise caused by occlusions and more theoretically justified comparisons of these models to other noise components will reduce errors in selection of the candidate volumes.

The current surface fitting selects only horizontal surfaces at fixed depth levels by comparing planar cross-sections of each volume in a level-by-level mode. This oversimplified process works well for a number of stereo pairs with mainly fronto-parallel surfaces (e.g. the 'Tsukuba' pair) but fails when inter-relations between the disparity levels in the candidate volumes cannot be ignored. Obviously, the surfaces to be reconstructed are not always parallel to the image plane and are not always planar. Increasing the disparity resolution may lead to a better approximation of a curvilinear surface by horizontal planes, but the open problem is how to extend this range for a given stereo pair. A more general surface fitting procedure has to be developed in order to properly handle slanted planar or curvilinear surfaces.
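The trade-off between disparity resolution and approximation quality can be checked numerically. The following sketch assumes an idealised slanted surface whose true disparity grows linearly along a scanline and measures the worst error when it is approximated by horizontal levels spaced `step` apart; the function name and all values are hypothetical.

```python
def staircase_error(slope: float, width: int, step: float) -> float:
    """Largest absolute error when the ramp d(x) = slope * x, x = 0..width-1,
    is replaced by the nearest multiple of step (a horizontal disparity level)."""
    worst = 0.0
    for x in range(width):
        true_disparity = slope * x
        level = round(true_disparity / step) * step
        worst = max(worst, abs(true_disparity - level))
    return worst


# A ramp rising by a quarter of a pixel of disparity per image column.
for step in (1.0, 0.5, 0.25):
    print(step, staircase_error(slope=0.25, width=16, step=step))
```

The worst-case error never exceeds half the level spacing, so finer levels approximate the slanted surface better, at the cost of more levels to process and smaller regions supporting each level.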

The NCSM framework has a high degree of inherent parallelism with the potential for high resolution, accurate and real-time 3D scene reconstruction. Indeed, this framework meets requirements for efficient hardware implementations because no complex optimisation is involved. This would allow the NCSM framework to be extended to stereo videos of moving 3D scenes both for 3D scene reconstruction and motion tracking.