# Basics of Computational Stereo Vision

These notes are under continuous revision. Most of the presented material is taken from the textbooks and other sources indicated in the references. Lectures will generally follow the topics enumerated in CONTENTS, but will not necessarily cover all of them. Additional information and examples will be presented during lectures.


### Introduction

Stereo vision infers the spatial structure and 3D distances of the visible points of a scene from two or more images taken from different viewpoints:

*Left and right images of a horizontal stereo pair*

In so doing, a stereo system acquires stereo images, determines which points of the images correspond, i.e. are projections of the same scene point (the correspondence problem), and reconstructs the 3D location and structure of the observed objects from a number of corresponding points and the available information on the geometry of stereo acquisition (the reconstruction problem). To solve the correspondence problem means to determine which parts of the left and right images of a stereo pair represent the same scene element. To solve the reconstruction problem means to determine the observed 3D surfaces from the corresponding parts of the images and the known geometry of image acquisition.

Disparity, the difference in image position between corresponding points, relates to the 3D location of the scene point. The disparities of all the image points form the so-called disparity map, which can be displayed as an image using greyscale coding of the disparities. If the geometry of image acquisition is known, the disparity map can be converted into a 3D map of the viewed scene:

*Upper and bottom images of a vertical stereo pair, and the resulting disparity map*

A simple stereo system below consists of two pinhole cameras with coplanar image planes and parallel optical axes:

*3D reconstruction depends on the solution of the correspondence problem; depth is evaluated from the disparity of the corresponding points*

The left and right image planes are represented by the segments Ileft and Iright respectively, and Ol and Or are the centres of projection, or optical centres, of the cameras. Because the optical axes are parallel, their point of intersection, called the fixation point, lies infinitely far from the cameras. Generally, stereo systems may have verging optical axes, with the fixation point at a finite distance from the cameras.

The position of an observed spatial point, e.g. P in the above figure, is determined by triangulation, i.e. by intersecting the rays backtraced from the images, pl and pr, of P formed by the two cameras through the centres of projection. If (pl, pr) is a pair of corresponding points, intersecting the rays plOl and prOr gives the spatial point P having projections pl and pr. Triangulation depends crucially on which pairs of image points are chosen as corresponding points. In the above figure, triangulation of the corresponding pairs (pl, pr) and (ql, qr) leads to interpreting the image points as projections of P and Q, respectively; but triangulation of the corresponding pairs (pl, qr) and (ql, pr) returns P′ and Q′. The two interpretations are totally different but equally justified once the respective correspondences are accepted.

Triangulation in the above simple stereo system reconstructs the 3D position of each observed point P from its projections, pl and pr, using the known distance, T, between the centres of projection Ol and Or and the common focal length, f, of the cameras. The distance T is called the baseline of the stereo system. Let (xl, y) and (xr, y) be the 2D coordinates of the image points pl and pr with respect to the principal points (traces of the optical axes) cl and cr as the origins of the (x,y)-image coordinates, the x-axis being parallel to the baseline. The 3D (X,Y,Z)-coordinate frame has the distance (depth) Z-axis parallel to the optical axes, the X-axis parallel to the baseline and the image x-axes, and the Y-axis parallel to the image y-axes. Assuming the 3D coordinate frame has its origin at the optical centre of the left camera, the image coordinates of the spatial point P = (XP, YP, ZP) are as follows:

xl = f XP / ZP;  xr = f (XP − T) / ZP;  y = f YP / ZP.

The disparity d = xl − xr = f T / ZP measures the difference in image position between the corresponding points in the two images, so that the depth ZP = f T / d. The depth is inversely proportional to the disparity, the infinitely far points (i.e. ZP = ∞) having zero disparity in this simple stereo system.
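The triangulation formulas above can be sketched in plain Python; the numeric values below are illustrative assumptions, not data from the notes:

```python
def reconstruct_point(xl, xr, y, f, T):
    """Triangulate a 3D point from its projections (xl, y) and (xr, y)
    in a simple parallel-axis stereo system with baseline T and common
    focal length f; the 3D frame has its origin at the left optical centre."""
    d = xl - xr                 # disparity of the corresponding points
    Z = f * T / d               # depth is inversely proportional to disparity
    X = xl * Z / f              # back-project the left image point
    Y = y * Z / f
    return X, Y, Z

# Illustrative values: f = 1, T = 0.5; a disparity of 0.05 gives depth 10
X, Y, Z = reconstruct_point(xl=0.25, xr=0.20, y=0.1, f=1.0, T=0.5)
```

Note how halving the disparity would double the reconstructed depth, in line with ZP = f T / d.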

### References

These lecture notes follow Chapter 7 "Stereopsis" of the textbook of E. Trucco and A. Verri, "Introductory Techniques for 3-D Computer Vision", Prentice Hall, NJ, 1998, with extra examples and materials taken mostly from the Web (with corresponding references) and from the following publications:

• Boykov, Yu., and Kolmogorov, V., An experimental comparison of min-cut / max-flow algorithms for energy minimization in vision, IEEE Trans. Pattern Analysis Machine Intell., Vol.26:9, pp.1124 - 1137, 2004.
• Boykov, Yu., Veksler, O., and Zabih, R., Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Analysis Machine Intell., Vol.23:11, pp.1222 - 1239, 2001.
• Brown, M. Z. , Burschka, D., and Hager, G. D., Advances in computational stereo, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.25, no.8, pp.993 - 1008, 2003.
• Förstner W., Image matching. In: Haralick, R. W., and Shapiro, L. G., Computer and Robot Vision, Vol.II, Chapter 16. Addison-Wesley: Reading, pp.289 - 378, 1993.
• Ford, L. R., and Fulkerson, D. R., Flows in Networks. Princeton Univ. Press, 1962.
• Geman, S., and Geman, D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Analysis Machine Intell., Vol.6, pp.721 - 741, 1984.
• Gimel'farb, G., Stereo terrain reconstruction by dynamic programming, in: Handbook of Computer Vision and Applications (Jaehne, B., Haussecker, H., and Geissler P., Eds.), Vol. 2, Chapter 18. Academic Press: San Diego, pp.505 - 530, 1999.
• Goldberg, A. V., and Tarjan, R. E., A new approach to the maximum-flow problem, J. Assoc. Computing Machinery, Vol.35:4, pp.921 - 940, 1988.
• Greig, D. M., Porteous, B. T., and Seheult, A. H., Exact maximum a posteriori estimation for binary images, J. Royal Statistic. Society, Ser.B., Vol.51, pp.271 - 279, 1989.
• Ishikawa, H., Exact optimization for Markov random fields with convex priors, IEEE Trans. Pattern Analysis Machine Intell., Vol.25, pp.1333 - 1336, 2001.
• Gruen, A. W., Adaptive least squares correlation: A powerful image matching technique, S. Afr. J. of Photogrammetry, Remote Sensing and Cartography, vol.14, no.3, pp.175 - 187, 1985.
• Lu, Y., Reeves R., and Kubik K., Image matching and the compound techniques in terrain reconstruction. In: Proc. 1998 IEEE Int. Conf. on Systems, Man, and Cybernetics, 11-14 Oct 1998, San Diego, CA, USA, Vol.5, IEEE CS Press: Los Alamitos, pp.4459 - 4464, 1998.
• Papadimitriou, C., and Steiglitz, K., An Algorithmic Theory of Numbers, Graphs, and Convexity, SIAM Press, 1986.
• Vazirani, V. V., Approximation Algorithms. Springer: Berlin, 2003.

## Appendix 1: Matrix Diagonalisation Using Eigenvectors

#### Eigenvalues and Eigenvectors

Eigenvectors and eigenvalues of an n×n matrix A are solutions of the matrix-vector equation Ae = λe, or (A − λI)e = 0, where I is the n×n identity matrix with the components Iij = 1 if i = j and 0 otherwise. The latter equation has a nontrivial solution e only if the determinant of the matrix A − λI equals zero, so the eigenvalues are the roots of the characteristic equation |A − λI| = 0.

If A is a real-valued symmetric n×n matrix, Aij = Aji, it has n linearly independent, mutually orthogonal eigenvectors e1, e2, …, en. Typically, the eigenvectors are normalised so that their dot products ei•ej = 1 if i = j and 0 otherwise. Such eigenvectors are called orthonormal.

Each eigenvector ei has its own eigenvalue λi such that Aei = λiei. For example, let A be an arbitrary 2×2 symmetric real matrix with the components a, b, c: A = [a b; b c], so that |A − λI| = (a − λ)(c − λ) − b2 = 0.
Then its eigenvalues are the two solutions of this quadratic equation: λ1,2 = ((a + c) ± √((a − c)2 + 4b2)) / 2.
For example, if a = c = 1 and b = 0 (the 2×2 identity matrix), then both eigenvalues are equal to 1: λ1 = λ2 = 1.
Actually, in this case the matrix equations Ae = λe themselves do not constrain the eigenvectors (they give only the identities ej,i = ej,i for i, j = 1, 2), so only the orthonormality constraints hold. There exist infinitely many possible pairs of eigenvectors of the form e1 = [cos θ  sin θ]T and e2 = [−sin θ  cos θ]T, where θ is an arbitrary angle, e.g. θ = 0 for the pair e1 = [1 0]T and e2 = [0 1]T.

If a = c = 0 and b = 1, then λ1 = 1 and λ2 = −1. The corresponding eigenvectors are obtained from (A − λI)e = 0: for λ1 = 1 the components satisfy e1,2 = e1,1, and for λ2 = −1 they satisfy e2,2 = −e2,1, so the orthonormal eigenvectors are e1 = (1/√2)[1 1]T and e2 = (1/√2)[1 −1]T.
Eigenvalues and eigenvectors of particular 3×3 symmetric real matrices are computed in the same way.
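As a quick numerical check of the 2×2 example with a = c = 0 and b = 1, one can use NumPy (an illustrative sketch, not part of the original notes):

```python
import numpy as np

# The 2x2 symmetric matrix with a = c = 0, b = 1 from the example above
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# eigh is specialised for symmetric matrices; it returns the eigenvalues
# in ascending order and the orthonormal eigenvectors as columns
lams, E = np.linalg.eigh(A)
# the eigenvalues are -1 and 1; the columns of E are, up to sign,
# [1, -1]/sqrt(2) and [1, 1]/sqrt(2)

# Verify A e_i = lambda_i e_i for each eigenpair
for lam, e in zip(lams, E.T):
    assert np.allclose(A @ e, lam * e)
```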

Additional information about eigenvalues and eigenvectors can be found in:

• Eigenvalue by Eric W. Weisstein (from MathWorld - A Wolfram Web Resource; Wolfram Research)
• Eigenvector by Eric W. Weisstein (from MathWorld - A Wolfram Web Resource; Wolfram Research)
• Eigenvalues and Eigenvectors of a Matrix by Dr. T. Lepikult (Tallinn Technical University, Estonia)
• PCA - Principal Component Analysis and Dataset Transformation by R. B. Fisher (UK)

#### Eigendecomposition

Matrix diagonalisation (also called eigendecomposition) is defined for a square n×n matrix A with n orthogonal eigenvectors, such as a real symmetric one. Let E = [e1 e2 … en] be the n×n matrix with the orthonormal eigenvectors of A as the columns, i.e. Eij = ej,i, where ej,i is the i-th component of the j-th eigenvector. Then AE = [λ1e1 λ2e2 … λnen] is the matrix having as the columns the eigenvectors scaled by their eigenvalues, i.e. the matrix with the components (AE)ij = λjej,i; i, j = 1, …, n.

In the transpose, ET, the same eigenvectors form the rows. Therefore, ETE = I, so that the matrix E is orthogonal, that is, its transpose, ET, is the (left) inverse, E−1.

Hence ETAE = Λ, where Λ = diag{λ1, λ2, …, λn} is the n×n diagonal matrix with the components Λij = λi if i = j and 0 otherwise. Because every real symmetric matrix A is diagonalised by the transformation ETAE = Λ, it is decomposed into a product of orthogonal and diagonal matrices as follows: A = EΛET.

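The diagonalisation A = EΛET is easy to verify numerically; the 3×3 symmetric matrix below is an assumed example, not the one from the original figures:

```python
import numpy as np

# An assumed symmetric matrix to illustrate A = E Lambda E^T
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

lams, E = np.linalg.eigh(A)      # orthonormal eigenvectors as columns of E
Lam = np.diag(lams)

# E is orthogonal: E^T E = I, so E^T is the inverse of E
assert np.allclose(E.T @ E, np.eye(3))

# Diagonalisation and the reverse decomposition
assert np.allclose(E.T @ A @ E, Lam)     # E^T A E = Lambda
assert np.allclose(E @ Lam @ E.T, A)     # A = E Lambda E^T
```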

Additional information about eigendecomposition can be found in:

• Eigen Decomposition by Eric W. Weisstein (from MathWorld - A Wolfram Web Resource; Wolfram Research)

## Appendix 2: Singular Value Decomposition (SVD)

SVD is defined for a generic rectangular matrix: any m×n matrix A can be written as the product of three matrices, A = UDVT. The columns of the m×m matrix U are mutually orthogonal unit vectors, as are the columns of the n×n matrix V. The m×n matrix D is diagonal; its diagonal elements σi, called singular values, are such that σ1 ≥ σ2 ≥ … ≥ σp ≥ 0, where p = min(m, n). The matrices U and V are not unique, but the singular values are fully determined by the matrix A.

The SVD has the following properties:

1. A square n×n matrix A is nonsingular if and only if all its singular values are different from zero. The values σi also show how close the matrix is to being singular: the ratio C = σ1/σn, called the condition number, measures the degree of singularity of A. If 1/C is comparable with the arithmetic precision of the computer, the matrix A is ill-conditioned and for all practical purposes should be considered singular.
2. If A is a rectangular matrix, the number of nonzero singular values σi equals the rank of A. Given a fixed tolerance ε, typically of the order of 10−6, the number of singular values greater than ε equals the effective rank of A.
3. If A is a square, nonsingular matrix, its inverse can be written as A−1 = VD−1UT.
Whether A is singular or not, its pseudoinverse, A+, is A+ = VD0−1UT, with D0−1 equal to D−1 for all nonzero singular values and zero otherwise. If A is nonsingular, then D0−1 = D−1 and A+ = A−1.
4. The columns of U corresponding to the nonzero singular values span the range of A; the columns of V corresponding to the zero singular values span the null space of A (see e.g. reference materials on the null space in Wikipedia or the lecture notes on matrices by Dr. T. Lepikult, Tallinn Technical University, Estonia).
5. The columns of U are eigenvectors of AAT. The columns of V are eigenvectors of ATA. The squares of the nonzero singular values are the nonzero eigenvalues of both the n×n matrix ATA and m×m matrix AAT. Moreover, Avi = σiui and ATui = σivi, where ui and vi are the columns of U and V corresponding to σi.
6. The squared Frobenius norm of a matrix A is the sum of the squares of its entries aij: ||A||F2 = ΣiΣj aij2. It follows from the SVD A = UDVT that ||A||F2 = σ12 + σ22 + … + σp2.

As examples, one can compute the SVD of the symmetric singular square 3×3 matrix used earlier in the eigendecomposition, or of a simple 3×2 rectangular matrix.

Note that there is another definition of the SVD, with an m×n matrix U and n×n matrices D and V, which is typically used in computations (see e.g. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in FORTRAN: The Art of Scientific Computing, Cambridge University Press, UK, 2nd ed., 1992: Section 2.6 "Singular Value Decomposition", pp. 51-63) because the matrices need less memory: mn + 2n2 rather than m2 + mn + n2 for the initial definition, as typically n << m. The latter SVD definition is used e.g. in the C subroutine svd(), a very straightforward translation into C of the Fortran program developed at the Argonne National Laboratory, USA, in 1983.
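Both definitions of the SVD can be compared with NumPy's full and "economy" modes; the 3×2 matrix below is an assumed example:

```python
import numpy as np

# An assumed 3x2 rectangular matrix
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Full SVD: U is m x m, Vt holds V^T (n x n), s holds the singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)
D = np.zeros(A.shape)
np.fill_diagonal(D, s)
assert np.allclose(U @ D @ Vt, A)

# "Economy" SVD (the memory-saving definition above): U is m x n
U2, s2, Vt2 = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U2 @ np.diag(s2) @ Vt2, A)

# Squared singular values are the eigenvalues of A^T A (property 5 above)
assert np.allclose(s**2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1])
```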


#### Least squares and pseudoinverse

Let an over-determined system of m linear equations, Ax = b, have to be solved for an unknown n-dimensional vector x. The m×n matrix A contains the coefficients of the equations, and the m-dimensional vector b contains the data. If not all the components of b are zero, the optimal solution in the least-squares sense, and the one of the shortest length, is given by x = (ATA)+ATb. The derivation is as follows: the squared error ||Ax − b||2 = (Ax − b)T(Ax − b) is minimised where its gradient with respect to x vanishes, i.e. 2ATAx − 2ATb = 0, giving the normal equations ATAx = ATb and hence x = (ATA)+ATb.

If the inverse of ATA exists, then the matrix A+ = (ATA)−1AT is the pseudoinverse, or Moore-Penrose inverse, of A. If there are more equations than unknowns, ATA is likely to be invertible, so that (ATA)+ coincides with (ATA)−1; nonetheless, it is better to compute the pseudoinverse of ATA through the SVD to account for the condition number of this matrix.

While only a square full-rank matrix A has the conventional inverse A−1, the pseudoinverse A+ exists for any m×n matrix, m > n. Generally, the SVD is the most effective way of getting the pseudoinverse: if A = UDVT, where the m×m matrix U and the n×n matrix V are orthogonal and the m×n matrix D is diagonal with real, non-negative singular values σ1 ≥ σ2 ≥ … ≥ σn ≥ 0, then A+ = V(DTD)−1DTUT, with the zero singular values excluded as in the definition of D0−1 above.
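A minimal least-squares sketch with NumPy; the system Ax = b below is an assumed example (numpy.linalg.pinv computes the pseudoinverse through the SVD, zeroing the reciprocals of singular values below a tolerance):

```python
import numpy as np

# Over-determined system: 3 equations, 2 unknowns (assumed data)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 2.5])

# Least-squares solution through the SVD-based pseudoinverse
x = np.linalg.pinv(A) @ b

# The same solution via the dedicated least-squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```

Both routes minimise ||Ax − b||2; pinv additionally returns the shortest-length solution when A is rank deficient.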


#### Homogeneous Systems

Let one have to solve a homogeneous system of m linear equations in n unknowns, Ax = 0, with m ≥ n − 1 and rank(A) = n − 1. A nontrivial solution, unique up to a scale factor, is easily found through the SVD: this solution is proportional to the eigenvector corresponding to the only zero eigenvalue of ATA (all other eigenvalues being strictly positive because the rank of A equals n − 1). This is proved as follows. Since the solution may have an arbitrary norm, a solution of unit norm in the least-squares sense has to minimise the squared norm ||Ax||2 = (Ax)TAx = xTATAx subject to the constraint xTx = 1. This is equivalent to minimising the Lagrangian L(x) = xTATAx − λ(xTx − 1). Equating to zero the derivative of the Lagrangian with respect to x gives ATAx − λx = 0, so that λ is an eigenvalue of the matrix ATA, and the solution, x = eλ, is the corresponding eigenvector. The solution makes the Lagrangian L(eλ) = λ; therefore, the minimum is reached at λ = 0, the least eigenvalue of ATA.

According to the properties of the SVD, this solution can equivalently be obtained as the column of V corresponding to the only null singular value of A (spanning the null space, or kernel, of A); hence the two seemingly different solutions of the same problem coincide.
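A sketch of solving such a homogeneous system through the SVD, with an assumed rank-deficient 3×3 coefficient matrix:

```python
import numpy as np

# Assumed 3x3 coefficient matrix of rank 2, so Ax = 0 has a
# nontrivial solution unique up to scale
A = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0],
              [1.0, 1.0, -2.0]])   # third row = sum of the first two

# The solution is the right singular vector (row of Vt) corresponding
# to the smallest (here zero) singular value of A
U, s, Vt = np.linalg.svd(A)
x = Vt[-1]                         # unit-norm least-squares solution

assert np.isclose(s[-1], 0.0)      # rank(A) = n - 1
assert np.allclose(A @ x, 0.0)     # x lies in the null space of A
assert np.isclose(np.linalg.norm(x), 1.0)
```

For this matrix the null space is spanned by [1 1 1]T, so x is that direction normalised to unit length (up to sign).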

#### Matrix Estimates with Constraints

Let the entries of a matrix A to be estimated satisfy some algebraic constraints (e.g., A is an orthogonal matrix or a fundamental matrix). Due to errors introduced by noise and numerical computations, the estimated matrix, say Aest, may not satisfy the given constraints. This may cause serious problems if subsequent algorithms assume that Aest satisfies the constraints exactly.

SVD allows us to find the matrix closest to Aest, in the sense of the Frobenius norm, that satisfies the constraints exactly. The SVD of the estimated matrix, Aest = UDVT, is computed, and the diagonal matrix D is replaced with D′ obtained by changing the singular values of D to those expected when the constraints are satisfied exactly (if Aest is a good numerical estimate, its singular values should not be too far from the expected ones). Then, the entries of the new estimate, A = UD′VT, satisfy the desired constraints by construction.
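This constraint-enforcement step can be sketched as follows for an assumed noisy 3×3 estimate that should be rank 2 (as a fundamental matrix must be); the entries are illustrative, not from the notes:

```python
import numpy as np

# Assumed noisy estimate of a matrix that should have rank 2
A_est = np.array([[ 0.0, -1.0,  0.2 ],
                  [ 1.0,  0.0, -0.5 ],
                  [-0.1,  0.4,  0.01]])

U, s, Vt = np.linalg.svd(A_est)
s[-1] = 0.0                        # force the expected (zero) singular value
A = U @ np.diag(s) @ Vt            # closest rank-2 matrix in Frobenius norm

assert np.linalg.matrix_rank(A) == 2
```

Zeroing the smallest singular value is the standard Eckart-Young step: among all rank-2 matrices, A minimises ||A − Aest||F.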