# Basics of Computational Stereo Vision

These notes are under continuous revision. Most of the presented material is taken from the textbooks and other sources indicated in the references. Lectures will generally follow the topics enumerated in CONTENTS, but will not necessarily cover all of them. Additional information and examples will be presented during lectures.


### Introduction

Stereo vision infers the spatial structure and 3D distances of the visible points of a scene from two or more images taken from different viewpoints:

*Left and right images of a horizontal stereo pair*

In so doing, a stereo system acquires stereo images, determines which points of the images correspond, i.e. are projections of the same scene point (the correspondence problem), and reconstructs the 3D location and structure of the observed objects from a number of corresponding points and the available information on the geometry of stereo acquisition (the reconstruction problem). To solve the correspondence problem means to determine which parts of the left and right images of a stereo pair represent the same scene element. To solve the reconstruction problem means to determine the observed 3D surfaces from the corresponding parts of the images and the known geometry of image acquisition.

Disparity, the difference in image position between corresponding points, relates to the 3D location of the scene point. The disparities of all the image points form the so-called disparity map, which can be displayed as an image using greyscale coding of the disparities. If the geometry of image acquisition is known, the disparity map can be converted into a 3D map of the viewed scene:

*Upper and bottom images of a vertical stereo pair, and the resulting disparity map*

A simple stereo system below consists of two pinhole cameras with coplanar image planes and parallel optical axes:

*3D reconstruction depends on the solution of the correspondence problem; depth is evaluated from the disparity of the corresponding points*

The left and right image planes are represented by the segments Ileft and Iright respectively, and Ol and Or are the centres of projection, or optical centres, of the cameras. Because the optical axes are parallel, their point of intersection, called the fixation point, lies infinitely far from the cameras. Generally, stereo systems may have verging optical axes, with the fixation point at a finite distance from the cameras.

The position of an observed spatial point, e.g. P in the above figure, is determined by triangulation, i.e. by intersecting the rays backtraced from the images, pl and pr, of P formed by the two cameras through the centres of projection. If (pl, pr) is a pair of corresponding points, intersecting the rays plOl and prOr gives the spatial point P having projections pl and pr. Triangulation depends crucially on which pairs of image points are chosen as corresponding points. In the above figure, triangulation of the corresponding pairs (pl, pr) and (ql, qr) leads to interpreting the image points as projections of P and Q, respectively; but triangulation of the corresponding pairs (pl, qr) and (ql, pr) returns P′ and Q′. The two interpretations are totally different but equally justified once the respective correspondences are accepted.

Triangulation in the above simple stereo system reconstructs the 3D position of each observed point P from its projections, pl and pr, using the known distance, T, between the centres of projection Ol and Or and the common focal length, f, of the cameras. The distance T is called the baseline of the stereo system. Let (xl, y) and (xr, y) be the 2D coordinates of the image points pl and pr with respect to the principal points (traces of the optical axes) cl and cr as the origins of the (x,y)-image coordinates, the x-axis being parallel to the baseline. The 3D (X,Y,Z)-coordinate frame has the distance (depth) Z-axis parallel to the optical axes, the X-axis parallel to the baseline and the image x-axes, and the Y-axis parallel to the image y-axes. Assuming the 3D coordinate frame has its origin at the optical centre of the left camera, the image coordinates of the spatial point P = (XP, YP, ZP) are as follows:

xl = f XP / ZP;  xr = f (XP − T) / ZP;  y = f YP / ZP.

The disparity d = xl − xr = f T / ZP measures the difference in image position between the corresponding points in the two images, so that the depth ZP = f T / d. The depth is inversely proportional to the disparity, the infinitely far points (i.e. ZP = ∞) having zero disparity in this simple stereo system.
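The triangulation formulas above can be sketched in plain Python; the numeric values below are illustrative assumptions, not data from the notes:

```python
def reconstruct_point(xl, xr, y, f, T):
    """Triangulate a 3D point from its projections (xl, y) and (xr, y)
    in a simple parallel-axis stereo system with baseline T and common
    focal length f; the 3D frame has its origin at the left optical centre."""
    d = xl - xr                 # disparity of the corresponding points
    Z = f * T / d               # depth is inversely proportional to disparity
    X = xl * Z / f              # back-project the left image point
    Y = y * Z / f
    return X, Y, Z

# Illustrative values: f = 1, T = 0.5; a disparity of 0.05 gives depth 10
X, Y, Z = reconstruct_point(xl=0.25, xr=0.20, y=0.1, f=1.0, T=0.5)
```

Note how halving the disparity would double the reconstructed depth, in line with ZP = f T / d.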

### References

These lecture notes follow Chapter 7 "Stereopsis" of the textbook of E. Trucco and A. Verri, "Introductory Techniques for 3-D Computer Vision", Prentice Hall, NJ, 1998, with extra examples and materials taken mostly from the Web (with corresponding references) and from the following publications:

• Boykov, Yu., and Kolmogorov, V., An experimental comparison of min-cut / max-flow algorithms for energy minimization in vision, IEEE Trans. Pattern Analysis Machine Intell., Vol.26:9, pp.1124 - 1137, 2004.
• Boykov, Yu., Veksler, O., and Zabih, R., Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Analysis Machine Intell., Vol.23:11, pp.1222 - 1239, 2001.
• Brown, M. Z. , Burschka, D., and Hager, G. D., Advances in computational stereo, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.25, no.8, pp.993 - 1008, 2003.
• Förstner W., Image matching. In: Haralick, R. W., and Shapiro, L. G., Computer and Robot Vision, Vol.II, Chapter 16. Addison-Wesley: Reading, pp.289 - 378, 1993.
• Ford, L. R., and Fulkerson, D. R., Flows in Networks. Princeton Univ. Press, 1962.
• Geman, S., and Geman, D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Analysis Machine Intell., Vol.6, pp.721 - 741, 1984.
• Gimel'farb, G., Stereo terrain reconstruction by dynamic programming, in: Handbook of Computer Vision and Applications (Jaehne, B., Haussecker, H., and Geissler P., Eds.), Vol. 2, Chapter 18. Academic Press: San Diego, pp.505 - 530, 1999.
• Goldberg, A. V., and Tarjan, R. E., A new approach to the maximum-flow problem, J. Assoc. Computing Machinery, Vol.35:4, pp.921 - 940, 1988.
• Greig, D. M., Porteous, B. T., and Seheult, A. H., Exact maximum a posteriori estimation for binary images, J. Royal Statistic. Society, Ser.B., Vol.51, pp.271 - 279, 1989.
• Ishikawa, H., Exact optimization for Markov random fields with convex priors, IEEE Trans. Pattern Analysis Machine Intell., Vol.25, pp.1333 - 1336, 2001.
• Gruen, A. W., Adaptive least squares correlation: A powerful image matching technique, S. Afr. J. of Photogrammetry, Remote Sensing and Cartography, vol.14, no.3, pp.175 - 187, 1985.
• Lu, Y., Reeves R., and Kubik K., Image matching and the compound techniques in terrain reconstruction. In: Proc. 1998 IEEE Int. Conf. on Systems, Man, and Cybernetics, 11-14 Oct 1998, San Diego, CA, USA, Vol.5, IEEE CS Press: Los Alamitos, pp.4459 - 4464, 1998.
• Papadimitriou, C., and Steiglitz, K., An Algorithmic Theory of Numbers, Graphs, and Convexity, SIAM Press, 1986.
• Vazirani, V. V., Approximation Algorithms. Springer: Berlin, 2003.

## Appendix 1: Matrix Diagonalisation Using Eigenvectors

#### Eigenvalues and Eigenvectors

Eigenvectors and eigenvalues of an n×n matrix A are solutions of the matrix-vector equation Ae = λe, or (A − λI)e = 0, where I is the n×n identity matrix with the components Iij = 1 if i = j and 0 otherwise. The latter equation has a nontrivial solution e only if the determinant of the matrix A − λI equals zero, so the eigenvalues are the roots of the characteristic equation |A − λI| = 0.

If A is a real-valued symmetric n×n matrix, Aij = Aji, it has n linearly independent, mutually orthogonal eigenvectors e1, e2, …, en. Typically, the eigenvectors are normalised so that their dot products ei•ej = 1 if i = j and 0 otherwise. Such eigenvectors are called orthonormal.

Each eigenvector ei has its own eigenvalue λi such that Aei = λiei. For example, let A be an arbitrary 2×2 symmetric real matrix with the components a, b, c: A = [a b; b c], so that |A − λI| = (a − λ)(c − λ) − b2 = 0.
Then its eigenvalues are the two solutions of this quadratic equation: λ1,2 = ((a + c) ± √((a − c)2 + 4b2)) / 2.
For example, if a = c = 1 and b = 0 (the 2×2 identity matrix), then both eigenvalues are equal to 1: λ1 = λ2 = 1.
Actually, in this case the matrix equations Ae = λe themselves do not constrain the eigenvectors (they give only the identities ej,i = ej,i for i, j = 1, 2), so only the orthonormality constraints hold. There exist infinitely many possible pairs of eigenvectors of the form e1 = [cos θ  sin θ]T and e2 = [−sin θ  cos θ]T, where θ is an arbitrary angle, e.g. θ = 0 for the pair e1 = [1 0]T and e2 = [0 1]T.

If a = c = 0 and b = 1, then λ1 = 1 and λ2 = −1. The corresponding eigenvectors are obtained from (A − λI)e = 0: for λ1 = 1 the components satisfy e1,2 = e1,1, and for λ2 = −1 they satisfy e2,2 = −e2,1, so the orthonormal eigenvectors are e1 = (1/√2)[1 1]T and e2 = (1/√2)[1 −1]T.
Eigenvalues and eigenvectors of particular 3×3 symmetric real matrices are computed in the same way.
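As a quick numerical check of the 2×2 example with a = c = 0 and b = 1, one can use NumPy (an illustrative sketch, not part of the original notes):

```python
import numpy as np

# The 2x2 symmetric matrix with a = c = 0, b = 1 from the example above
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# eigh is specialised for symmetric matrices; it returns the eigenvalues
# in ascending order and the orthonormal eigenvectors as columns
lams, E = np.linalg.eigh(A)
# the eigenvalues are -1 and 1; the columns of E are, up to sign,
# [1, -1]/sqrt(2) and [1, 1]/sqrt(2)

# Verify A e_i = lambda_i e_i for each eigenpair
for lam, e in zip(lams, E.T):
    assert np.allclose(A @ e, lam * e)
```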

Additional information about eigenvalues and eigenvectors can be found in:

• Eigenvalue by Eric W. Weisstein (from MathWorld - A Wolfram Web Resource; Wolfram Research)
• Eigenvector by Eric W. Weisstein (from MathWorld - A Wolfram Web Resource; Wolfram Research)
• Eigenvalues and Eigenvectors of a Matrix by Dr. T. Lepikult (Tallinn Technical University, Estonia)
• PCA - Principal Component Analysis and Dataset Transformation by R. B. Fisher (UK)

#### Eigendecomposition

Matrix diagonalisation (also called eigendecomposition) is defined for a square n×n matrix A with n orthogonal eigenvectors, such as a real symmetric one. Let E = [e1 e2 … en] be the n×n matrix with the orthonormal eigenvectors of A as the columns, i.e. Eij = ej,i, where ej,i is the i-th component of the j-th eigenvector. Then AE = [λ1e1 λ2e2 … λnen] is the matrix having as the columns the eigenvectors scaled by their eigenvalues, i.e. the matrix with the components (AE)ij = λjej,i; i, j = 1, …, n.

In the transpose, ET, the same eigenvectors form the rows. Therefore, ETE = I, so that the matrix E is orthogonal, that is, its transpose, ET, is the (left) inverse, E−1.

Hence ETAE = Λ, where Λ = diag{λ1, λ2, …, λn} is the n×n diagonal matrix with the components Λij = λi if i = j and 0 otherwise. Because every real symmetric matrix A is diagonalised by the transformation ETAE = Λ, it is decomposed into a product of orthogonal and diagonal matrices as follows: A = EΛET.

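The diagonalisation A = EΛET is easy to verify numerically; the 3×3 symmetric matrix below is an assumed example, not the one from the original figures:

```python
import numpy as np

# An assumed symmetric matrix to illustrate A = E Lambda E^T
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

lams, E = np.linalg.eigh(A)      # orthonormal eigenvectors as columns of E
Lam = np.diag(lams)

# E is orthogonal: E^T E = I, so E^T is the inverse of E
assert np.allclose(E.T @ E, np.eye(3))

# Diagonalisation and the reverse decomposition
assert np.allclose(E.T @ A @ E, Lam)     # E^T A E = Lambda
assert np.allclose(E @ Lam @ E.T, A)     # A = E Lambda E^T
```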

Additional information about eigendecomposition can be found in:

• Eigen Decomposition by Eric W. Weisstein (from MathWorld - A Wolfram Web Resource; Wolfram Research)

## Appendix 2: Singular Value Decomposition (SVD)

SVD is defined for a generic rectangular matrix: any m×n matrix A can be written as the product of three matrices, A = UDVT. The columns of the m×m matrix U are mutually orthogonal unit vectors, as are the columns of the n×n matrix V. The m×n matrix D is diagonal; its diagonal elements σi, called singular values, are such that σ1 ≥ σ2 ≥ … ≥ σp ≥ 0, where p = min(m, n). The matrices U and V are not unique, but the singular values are fully determined by the matrix A.

The SVD has the following properties:

1. A square n×n matrix A is nonsingular if and only if all its singular values are different from zero. The values σi also show how close the matrix is to being singular: the ratio C = σ1/σn, called the condition number, measures the degree of singularity of A. If 1/C is comparable with the arithmetic precision of the computer, the matrix A is ill-conditioned and for all practical purposes should be considered singular.
2. If A is a rectangular matrix, the number of nonzero singular values σi equals the rank of A. Given a fixed tolerance ε, typically of the order of 10−6, the number of singular values greater than ε equals the effective rank of A.
3. If A is a square, nonsingular matrix, its inverse can be written as A−1 = VD−1UT.
Whether A is singular or not, its pseudoinverse, A+, is A+ = VD0−1UT, with D0−1 equal to D−1 for all nonzero singular values and zero otherwise. If A is nonsingular, then D0−1 = D−1 and A+ = A−1.
4. The columns of U corresponding to the nonzero singular values span the range of A; the columns of V corresponding to the zero singular values span the null space of A (see e.g. reference materials on the null space in Wikipedia or the lecture notes on matrices by Dr. T. Lepikult, Tallinn Technical University, Estonia).
5. The columns of U are eigenvectors of AAT. The columns of V are eigenvectors of ATA. The squares of the nonzero singular values are the nonzero eigenvalues of both the n×n matrix ATA and m×m matrix AAT. Moreover, Avi = σiui and ATui = σivi, where ui and vi are the columns of U and V corresponding to σi.
6. The squared Frobenius norm of a matrix A is the sum of the squares of its entries aij: ||A||F2 = ΣiΣj aij2. It follows from the SVD A = UDVT that ||A||F2 = σ12 + σ22 + … + σp2.

As examples, one can compute the SVD of the symmetric singular square 3×3 matrix used earlier in the eigendecomposition, or of a simple 3×2 rectangular matrix.

Note that there is another definition of the SVD, with an m×n matrix U and n×n matrices D and V, which is typically used in computations (see e.g. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in FORTRAN: The Art of Scientific Computing, Cambridge University Press, UK, 2nd ed., 1992: Section 2.6 "Singular Value Decomposition", pp. 51-63) because the matrices need less memory: mn + 2n2 rather than m2 + mn + n2 for the initial definition, as typically n << m. The latter SVD definition is used e.g. in the C subroutine svd(), a very straightforward translation into C of the Fortran program developed at the Argonne National Laboratory, USA, in 1983.
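Both definitions of the SVD can be compared with NumPy's full and "economy" modes; the 3×2 matrix below is an assumed example:

```python
import numpy as np

# An assumed 3x2 rectangular matrix
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Full SVD: U is m x m, Vt holds V^T (n x n), s holds the singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)
D = np.zeros(A.shape)
np.fill_diagonal(D, s)
assert np.allclose(U @ D @ Vt, A)

# "Economy" SVD (the memory-saving definition above): U is m x n
U2, s2, Vt2 = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U2 @ np.diag(s2) @ Vt2, A)

# Squared singular values are the eigenvalues of A^T A (property 5 above)
assert np.allclose(s**2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1])
```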


#### Least squares and pseudoinverse

Let an over-determined system of m linear equations, Ax = b, have to be solved for an unknown n-dimensional vector x. The m×n matrix A contains the coefficients of the equations, and the m-dimensional vector b contains the data. If not all the components of b are zero, the optimal solution in the least-squares sense, and the one of the shortest length, is given by x = (ATA)+ATb. The derivation is as follows: the squared error ||Ax − b||2 = (Ax − b)T(Ax − b) is minimised where its gradient with respect to x vanishes, i.e. 2ATAx − 2ATb = 0, giving the normal equations ATAx = ATb and hence x = (ATA)+ATb.

If the inverse of ATA exists, then the matrix A+ = (ATA)−1AT is the pseudoinverse, or Moore-Penrose inverse, of A. If there are more equations than unknowns, ATA is likely to be invertible, so that (ATA)+ coincides with (ATA)−1; nonetheless, it is better to compute the pseudoinverse of ATA through the SVD to account for the condition number of this matrix.

While only a square full-rank matrix A has the conventional inverse A−1, the pseudoinverse A+ exists for any m×n matrix, m > n. Generally, the SVD is the most effective way of getting the pseudoinverse: if A = UDVT, where the m×m matrix U and the n×n matrix V are orthogonal and the m×n matrix D is diagonal with real, non-negative singular values σ1 ≥ σ2 ≥ … ≥ σn ≥ 0, then A+ = V(DTD)−1DTUT, with the zero singular values excluded as in the definition of D0−1 above.
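A minimal least-squares sketch with NumPy; the system Ax = b below is an assumed example (numpy.linalg.pinv computes the pseudoinverse through the SVD, zeroing the reciprocals of singular values below a tolerance):

```python
import numpy as np

# Over-determined system: 3 equations, 2 unknowns (assumed data)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 2.5])

# Least-squares solution through the SVD-based pseudoinverse
x = np.linalg.pinv(A) @ b

# The same solution via the dedicated least-squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```

Both routes minimise ||Ax − b||2; pinv additionally returns the shortest-length solution when A is rank deficient.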


#### Homogeneous Systems

Let one have to solve a homogeneous system of m linear equations in n unknowns, Ax = 0, with m ≥ n − 1 and rank(A) = n − 1. A nontrivial solution, unique up to a scale factor, is easily found through the SVD: this solution is proportional to the eigenvector corresponding to the only zero eigenvalue of ATA (all other eigenvalues being strictly positive because the rank of A equals n − 1). This is proved as follows. Since the solution may have an arbitrary norm, a solution of unit norm in the least-squares sense has to minimise the squared norm ||Ax||2 = (Ax)TAx = xTATAx subject to the constraint xTx = 1. This is equivalent to minimising the Lagrangian L(x) = xTATAx − λ(xTx − 1). Equating to zero the derivative of the Lagrangian with respect to x gives ATAx − λx = 0, so that λ is an eigenvalue of the matrix ATA, and the solution, x = eλ, is the corresponding eigenvector. The solution makes the Lagrangian L(eλ) = λ; therefore, the minimum is reached at λ = 0, the least eigenvalue of ATA.

According to the properties of the SVD, this solution can equivalently be obtained as the column of V corresponding to the only null singular value of A (spanning the null space, or kernel, of A); hence the two seemingly different solutions of the same problem coincide.
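A sketch of solving such a homogeneous system through the SVD, with an assumed rank-deficient 3×3 coefficient matrix:

```python
import numpy as np

# Assumed 3x3 coefficient matrix of rank 2, so Ax = 0 has a
# nontrivial solution unique up to scale
A = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0],
              [1.0, 1.0, -2.0]])   # third row = sum of the first two

# The solution is the right singular vector (row of Vt) corresponding
# to the smallest (here zero) singular value of A
U, s, Vt = np.linalg.svd(A)
x = Vt[-1]                         # unit-norm least-squares solution

assert np.isclose(s[-1], 0.0)      # rank(A) = n - 1
assert np.allclose(A @ x, 0.0)     # x lies in the null space of A
assert np.isclose(np.linalg.norm(x), 1.0)
```

For this matrix the null space is spanned by [1 1 1]T, so x is that direction normalised to unit length (up to sign).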

#### Matrix Estimates with Constraints

Let the entries of a matrix A to be estimated satisfy some algebraic constraints (e.g., A is an orthogonal matrix or a fundamental matrix). Due to errors introduced by noise and numerical computations, the estimated matrix, say Aest, may not satisfy the given constraints. This may cause serious problems if subsequent algorithms assume that Aest satisfies the constraints exactly.

SVD allows us to find the matrix closest to Aest, in the sense of the Frobenius norm, that satisfies the constraints exactly. The SVD of the estimated matrix, Aest = UDVT, is computed, and the diagonal matrix D is replaced with D′ obtained by changing the singular values of D to those expected when the constraints are satisfied exactly (if Aest is a good numerical estimate, its singular values should not be too far from the expected ones). Then, the entries of the new estimate, A = UD′VT, satisfy the desired constraints by construction.
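This constraint-enforcement step can be sketched as follows for an assumed noisy 3×3 estimate that should be rank 2 (as a fundamental matrix must be); the entries are illustrative, not from the notes:

```python
import numpy as np

# Assumed noisy estimate of a matrix that should have rank 2
A_est = np.array([[ 0.0, -1.0,  0.2 ],
                  [ 1.0,  0.0, -0.5 ],
                  [-0.1,  0.4,  0.01]])

U, s, Vt = np.linalg.svd(A_est)
s[-1] = 0.0                        # force the expected (zero) singular value
A = U @ np.diag(s) @ Vt            # closest rank-2 matrix in Frobenius norm

assert np.linalg.matrix_rank(A) == 2
```

Zeroing the smallest singular value is the standard Eckart-Young step: among all rank-2 matrices, A minimises ||A − Aest||F.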