Stereo Matching by Graph-cut Techniques

The search for a whole continuous 3-D surface that minimises the total dissimilarity (or maximises the similarity) between corresponding points of a stereo pair can be posed as combinatorial optimisation on graphs. The graphs describe binary relationships between the neighbouring disparities and signals that constrain the solution. Typically, the graph specifies neighbourhoods, i.e. subsets of mutually dependent pixels or points, and the weights of its nodes and edges characterise possible image signals and/or disparities. Candidate surfaces are evaluated in terms of local and global energies accumulating the weights of nodes and edges. Generally, under dense 2D neighbourhoods, finding a minimum-energy solution is NP-hard, but it can at least be solved approximately, and in some particular cases it reduces to an exactly solvable problem.

Maximum flow problem

Let G = [N;E] denote a directed (linear) graph with a collection of nodes (vertices, points) N = {xa, xb, xc, …} and a subset E ⊆ N × N of ordered pairs (edges, arcs) (xα, xβ) of elements from N. A chain is a sequence of nodes x1, x2, … such that (xi, xi+1) ∈ E. A path is a sequence x1, x2, … such that (xi, xi+1) ∈ E or (xi+1, xi) ∈ E. Let A(x) = {y | y ∈ N; (x,y) ∈ E} and B(x) = {y | y ∈ N; (y,x) ∈ E} be the subsets of the nodes immediately after x and immediately before x, respectively.

A network is a graph with two special nodes s and t called source and sink, respectively, and with a non-negative capacity c(x,y) ≥ 0 assigned to every edge (x,y) ∈ E. The function c: E → R≥0, where R≥0 is the set of non-negative real numbers, is called the capacity function.

A static flow of value v from s to t in a network G = [N;E] with a capacity function c is a function f: E → R≥0 that satisfies the linear constraints:

  1. The flow through every edge does not exceed the edge capacity: ∀(x,y) ∈ E   f(x,y) ≤ c(x,y)
  2. Every node other than the source and the sink has equal total out- and in-flows, while the net out-flow of the source and the net in-flow of the sink are equal to v:
     ∀x ∈ N\{s,t}   f(x, A(x)) = f(B(x), x);   f(s, A(s)) − f(B(s), s) = v;   f(B(t), t) − f(t, A(t)) = v
A simple example below shows a network flow of value 3, assuming that f(x,y) ≤ c(x,y) for all edges.

The static maximum flow problem is to maximize the variable v subject to the above flow constraints. To simplify the notation, let (X,Y); X ⊆ N, Y ⊆ N, denote the set of all edges from nodes x ∈ X to nodes y ∈ Y. For any function g: E → R, let g(X,Y) denote the sum of its values over this set of edges:

g(X,Y) = Σ(x,y)∈(X,Y) g(x,y).

It is easily shown that g(X, Y ∪ Z) = g(X,Y) + g(X,Z) − g(X, Y ∩ Z), so that for disjoint sets Y and Z, g(X, Y ∪ Z) = g(X,Y) + g(X,Z). In particular, (B(x),x) = (N,x) and (x,A(x)) = (x,N), so that f(B(x),x) = f(N,x) and f(x,A(x)) = f(x,N), and the flow constraints can be written as f(s,N) − f(N,s) = v, f(x,N) − f(N,x) = 0 for x ∈ N\{s,t}, and f(t,N) − f(N,t) = −v.

A cut C in G = [N;E] separating the source s and the sink t is a set of edges (X, N\X) such that s ∈ X and t ∈ N\X. The capacity of a cut (X, N\X) is c(X, N\X). A simple example below shows a set of edges C = {(s,y), (x,y), (x,t)} with X = {s,x} and N\X = {y,t}: the set C is a cut separating s and t, and its capacity is c(s,y) + c(x,y) + c(x,t) = 3 + 1 + 3 = 7.
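As a quick numerical illustration, the cut capacity can be computed directly from an edge-capacity table (a minimal sketch in Python; the capacities c(s,x) and c(y,t) are not specified above and are assumed here purely for the example):

```python
# Capacities of the example network: c(s,y), c(x,y), c(x,t) come from the text above,
# while c(s,x) and c(y,t) are assumed values used only for this illustration.
c = {('s', 'x'): 4, ('s', 'y'): 3, ('x', 'y'): 1, ('x', 't'): 3, ('y', 't'): 4}

X = {'s', 'x'}                                   # the source side of the cut
N = {'s', 'x', 'y', 't'}
cut = [(u, v) for (u, v) in c if u in X and v in N - X]
print(cut, sum(c[e] for e in cut))               # [('s', 'y'), ('x', 'y'), ('x', 't')] 7
```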

Lemma 1 [Ford, Fulkerson;1956]: Let f be a flow from s to t in a network [N; E], and let f have value v. If (X, N\X) is a cut, separating s and t, then

v = f(X, N\X) − f(N\X, X) ≤ c(X, N\X)

Proof: Since f is a flow, f(s,N) − f(N,s) = v; f(x,N) − f(N,x) = 0 for all x ∈ N\{s,t}; and f(t,N) − f(N,t) = −v. Let us sum these equations over x ∈ X. Since s ∈ X and t ∈ N\X, the result is

f(X, N) − f(N, X) = v.

Because of the obvious relationships f(X,N) = f(X,X) + f(X, N\X) and f(N,X) = f(X,X) + f(N\X, X), the above equality yields

f(X, N\X) − f(N\X, X) = v,

thus verifying the equality in Lemma 1. Since f(N\X, X) ≥ 0 and f(X, N\X) ≤ c(X, N\X) due to the flow constraints, the inequality in Lemma 1 follows immediately.

The equality in Lemma 1 states that the value of a flow from s to t is equal to the net flow across any cut separating s and t. The fundamental result concerning the maximal network flow is given by
MAX-FLOW MIN-CUT Theorem [Ford, Fulkerson; 1956]: For any network the maximum flow value from s to t is equal to the minimal cut capacity of all cuts separating s and t.


Ford-Fulkerson max-flow algorithm

Let a flow augmenting path with respect to a flow f be defined as a path from s to t such that f < c on forward edges of the path and f > 0 on reverse edges of the path. The following corollary is of fundamental importance in searching for the maximal network flows:
Corollary 1 [Ford, Fulkerson; 1956]: A flow f is maximal if and only if there is no flow augmenting path with respect to f.

This corollary states that in order to increase the value of a flow, it is sufficient to search for improvements of a very restricted kind. Let an edge (x,y) be called saturated with respect to a flow f if f(x,y) = c(x,y) and called flowless with respect to f if f(x,y) = 0. An edge being both saturated and flowless has zero capacity. A minimal cut is characterized in terms of these notions by
Corollary 2 [Ford, Fulkerson; 1956]: A cut (X, N\X) is minimal if and only if every maximal flow f saturates all edges of (X, N\X), whereas all edges of (N\X, X) are flowless with respect to f.

Ford-Fulkerson labelling algorithm: Under mild restrictions on the capacity function, the proof of the max-flow min-cut theorem provides an algorithm for constructing a maximal flow and a minimal cut in a network. To ensure termination, all the capacities should have integer values. Initialisation is with the zero flow. Then a sequence of "labellings" is performed (Routine A below). Each labelling either results in a flow of higher value (Routine B below) or terminates with the conclusion that the present flow is maximal. Given an integral flow f, labels are assigned to nodes of the network according to Routine A. A node can be in one of three states: unlabelled (UL), labelled and scanned (LS), or labelled and unscanned (LUS). Each label has one of the forms (x+, ε) or (x−, ε), where x ∈ N and ε is a positive integer or ∞. Initially all nodes are UL.


Routine A: labeling


The labeling process searches systematically for a flow augmenting path from s to t (see Corollary 1). Information about the paths is carried along in the labels, so that if the sink is labelled, the resulting flow change along the path is readily made. On the other hand, if Routine A ends and the sink is not labelled, the flow is maximal and the set of edges leading from labelled (LUS,LS) nodes to unlabelled (UL) nodes is a minimal cut.

The labelling process is computationally efficient because once a node is labelled and scanned, it is ignored for the remainder of the process. Labelling a node x locates a path from s to x that can be the initial segment of a flow augmenting path. There may be many such paths from s to x, but it is sufficient to find one.
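A minimal Python sketch of this labelling scheme is given below: Routine A scans labelled nodes to search for a flow augmenting path, and Routine B augments the flow along it. The network is a toy dict of integer edge capacities (partly assumed), and the code favours clarity over efficiency.

```python
# A hedged sketch of the Ford-Fulkerson labelling algorithm for integer capacities.
# cap: dict {(x, y): capacity} of a directed network; returns (flow value, flow dict).
def ford_fulkerson(cap, s, t):
    nodes = {x for e in cap for x in e}
    flow = {e: 0 for e in cap}
    value = 0
    while True:
        # Routine A: label nodes with (predecessor, direction, eps); s gets eps = infinity.
        labels = {s: (None, None, float('inf'))}
        unscanned = [s]
        while unscanned and t not in labels:
            x = unscanned.pop()
            eps_x = labels[x][2]
            for y in nodes - labels.keys():
                if cap.get((x, y), 0) - flow.get((x, y), 0) > 0:      # forward edge, f < c
                    labels[y] = (x, '+', min(eps_x, cap[(x, y)] - flow[(x, y)]))
                    unscanned.append(y)
                elif flow.get((y, x), 0) > 0:                          # reverse edge, f > 0
                    labels[y] = (x, '-', min(eps_x, flow[(y, x)]))
                    unscanned.append(y)
        if t not in labels:          # no flow augmenting path: the current flow is maximal
            return value, flow
        # Routine B: augment the flow by eps along the labelled path from t back to s.
        eps = labels[t][2]
        y = t
        while y != s:
            x, sign, _ = labels[y]
            if sign == '+':
                flow[(x, y)] += eps
            else:
                flow[(y, x)] -= eps
            y = x
        value += eps

# Toy usage on the 4-node example network (capacities partly assumed).
cap = {('s', 'x'): 4, ('s', 'y'): 3, ('x', 'y'): 1, ('x', 't'): 3, ('y', 't'): 4}
print(ford_fulkerson(cap, 's', 't')[0])   # 7
```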

Computational complexity: Both the initial Ford-Fulkerson max-flow min-cut algorithm and the subsequent solutions of the same problem have polynomial time complexity (below, n = |N| is the size, or cardinality of the set of nodes N and m = |E| is the size of the set of edges E):
  Algorithm                Principle                               Complexity
  Ford-Fulkerson, 1956     Finding flow augmenting paths           O(nm²)
  Dinic, 1970              Shortest augmenting paths in one step   O(n²m); O(n³) in a dense graph; O(nm log n) in a sparse graph
  Goldberg-Tarjan, 1985    Pushing a pre-flow                      O(nm log(n²/m))

Time complexity of the Ford-Fulkerson algorithm depends on how the flow augmenting paths are determined. The O(nm²) Edmonds-Karp algorithm implements the Ford-Fulkerson scheme by using breadth-first search to find an augmenting path that is a shortest path from s to t in the residual graph (network), where each edge has unit length, or weight. For a given network G = [N; E] and a flow f, the residual network Gf induced by f consists of the edges, Ef, that allow for increasing the flow. The additional flow permitted by an edge's capacity is called the residual capacity of the edge:

rf(u,v) = c(u,v) − f(u,v); (u,v) ∈ E.
An edge (x,y) with non-zero residual capacity, rf(x,y) > 0, is called a residual edge. The residual graph Gf = [N; Ef] for a flow f is the graph with the same set of nodes N but with only the residual edges: Ef = {(x,y): (x,y) ∈ E; rf(x,y) > 0}. The maximum increment of the flow along an augmenting path is called the residual capacity of the path and is equal to
rf(path p) = min(u,v)∈p{rf(u,v)}.
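For instance, the set of residual edges and the residual capacity of a path can be computed directly from these definitions (a small sketch; the flow and the capacities are assumed toy values):

```python
# Residual capacities r_f(u,v) = c(u,v) - f(u,v) for the edges of E (toy values).
cap  = {('s', 'x'): 4, ('s', 'y'): 3, ('x', 'y'): 1, ('x', 't'): 3, ('y', 't'): 4}
flow = {('s', 'x'): 3, ('s', 'y'): 0, ('x', 'y'): 0, ('x', 't'): 3, ('y', 't'): 0}

residual = {e: cap[e] - flow[e] for e in cap if cap[e] - flow[e] > 0}   # edges of E_f
print(residual)

# The residual capacity of an augmenting path is the bottleneck over its edges.
path = [('s', 'y'), ('y', 't')]
print(min(residual[e] for e in path))    # 3
```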


Goldberg-Tarjan push-relabel max-flow algorithm

Search strategies based on flow augmenting paths keep a valid flow at each step, although the maximal flow is needed only at the very end. This is why these strategies turned out to be less efficient than the subsequent ones based on the notion of a pre-flow coined by A. V. Karzanov in 1974. Karzanov's pre-flow violates the flow conditions in that the in-flow and out-flow at a node need not be equal, the difference at node x being called the excess at x. If f is a pre-flow in a graph G = [N; E], then the excess e(x) at x is:

e(x) = f(B(x),x) − f(x,A(x)).
The "push-label" search process pushes as much as possible into the graph, then trims the excess to zero. Until the very end (when the excess at each node goes eventually to zero), no flow exists in the graph. To describe the process, the residual capacity rf(x,y) of an edge (x,y) is defined as:
rf(x,y) = c(x,y) − f(x,y) + f(y,x).
The distance dG(x,y) from a node x to a node y in G is defined as the minimum number of edges on a path from x to y in G. If there is no such path, dG(x,y) = ∞. To estimate the distance from a node to s or t, a valid labelling d is defined as a function d from N to the non-negative integers such that d(s) = n, d(t) = 0, and d(x) ≤ d(y) + 1 for every residual edge (x,y). If d(x) < n, then d(x) is a lower bound on the actual distance from x to t in the residual graph Gf, and if d(x) ≥ n, then d(x) − n is a lower bound on the actual distance from x to s in the residual graph (it can be proven that in the latter case t is not reachable from x in Gf). A node x is called active if x ∈ N\{s,t}, d(x) < ∞, and e(x) > 0 (the source and sink are never active). The algorithm below iteratively selects active nodes, pushes as much flow as possible out of them, and relabels them in order to convert the pre-flow into a maximal flow:
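A minimal Python sketch of such a generic push-relabel scheme is given below (integer capacities in a toy dict; the active node is selected arbitrarily, so the sketch illustrates the idea rather than the O(nm log(n²/m)) implementation):

```python
from collections import defaultdict

# A hedged sketch of the generic push-relabel (preflow-push) max-flow scheme.
# cap: dict {(x, y): integer capacity}; returns the maximum flow value.
def push_relabel(cap, s, t):
    nodes = {x for e in cap for x in e}
    n = len(nodes)
    r = defaultdict(int)                 # residual capacities r_f(x, y)
    for (x, y), c in cap.items():
        r[(x, y)] += c
    excess = defaultdict(int)
    d = {x: 0 for x in nodes}            # valid labelling: d(s) = n, d(t) = 0
    d[s] = n
    for (x, y), c in cap.items():        # initial pre-flow: saturate all edges out of s
        if x == s and c > 0:
            r[(s, y)] -= c
            r[(y, s)] += c
            excess[y] += c
    active = [x for x in nodes if x not in (s, t) and excess[x] > 0]
    while active:
        x = active.pop()
        if excess[x] == 0:
            continue
        for y in nodes:                  # push along admissible residual edges
            if r[(x, y)] > 0 and d[x] == d[y] + 1:
                delta = min(excess[x], r[(x, y)])
                r[(x, y)] -= delta
                r[(y, x)] += delta
                excess[x] -= delta
                excess[y] += delta
                if y not in (s, t):
                    active.append(y)
                if excess[x] == 0:
                    break
        if excess[x] > 0:                # no admissible edge left: relabel x and retry
            d[x] = 1 + min(d[y] for y in nodes if r[(x, y)] > 0)
            active.append(x)
    return excess[t]                     # all remaining excess has reached t (or gone back to s)

cap = {('s', 'x'): 4, ('s', 'y'): 3, ('x', 'y'): 1, ('x', 't'): 3, ('y', 't'): 4}
print(push_relabel(cap, 's', 't'))       # 7
```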



Energy minimization via graph cuts: a binary case

Greig et al. (see References) were the first to show how the Bayesian denoising (restoration) of binary images can be reformulated as a minimum cut problem in a certain network and solved exactly with a max-flow / min-cut technique such as the aforementioned Ford-Fulkerson algorithm. Let x = (x1, …, xn) and y = (y1, …, yn) be a hidden noiseless and a measured noisy image, respectively, such that the measurements yi; i = 1,…,n, are conditionally independent given a noiseless image x. Each measured signal yi has a known conditional probability density function p(yi|xi), depending on x only through xi. Therefore,

p(y|x) = Πi=1,…,n p(yi|xi).
The hidden noiseless images are modelled as samples of a Markov random field with pairwise interactions having a Gibbs probability distribution with interaction coefficients βij, where βii = 0 and βij = βji ≥ 0. The inequality βij = βji > 0 holds for neighbours in the lattice, e.g. for the nearest 4- or 8-neighbours. Apart from an additive constant, the posterior log-likelihood is the sum of the pixel-wise terms weighted by the log-likelihood ratios λi = log[ p(yi|xi=1) / p(yi|xi=0) ] and of the pairwise interaction terms with the coefficients βij. The Bayesian MAP estimate of the image maximises this posterior log-likelihood.

A capacitated network model for this optimisation problem contains n + 2 nodes: a source s, a sink t, and the n pixels. If the log-likelihood ratio λi is positive, there is a directed edge (s,i) from s to pixel i with capacity c(s,i) = λi; otherwise, there is a directed edge (i,t) from i to t with capacity c(i,t) = −λi. There is an undirected edge (i,j) between two pixels i and j with capacity c(i,j) = βij if these pixels are neighbours. The figure below exemplifies such a network model for restoring a binary image under a Markov-Gibbs prior model with the 4-neighbourhood interactions between pixels (here, the signals y1,…,yn of an observed noisy image that result in positive, λi > 0, and non-positive, λi ≤ 0, log-likelihood ratios are shown by the black and white pixel nodes, respectively):

Any binary image x = (x1,…,xn) produces a cut (S, T), where S = {s} ∪ {i : xi = 1} and T = {i : xi = 0} ∪ {t}. The capacity of this cut differs from the negative posterior log-likelihood only by a term which does not depend on x. Therefore, maximising the posterior log-likelihood is equivalent to finding the minimum cut of the above network: in the MAP estimate, xi = 1 if pixel i is on the source side S of the minimum cut and xi = 0 otherwise. Experiments conducted by Greig et al. (see References) have underscored the importance of an adequate prior distribution, because its global properties very rapidly begin to dominate the likelihood contribution to the posterior distribution. At the same time, stochastic optimisation such as simulated annealing applied with a practicable "cooling" schedule does not necessarily yield a good approximation to the MAP estimate. According to the "annealing theorem" of S. Geman and D. Geman, the inverse logarithmic decrease with time of the temperature term in a Gibbs distribution that controls the annealing process guarantees that the MAP solution will eventually be reached, but unfortunately this may take infinite time.
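A sketch of this construction in Python, assuming the networkx library; the toy 2×2 image, the log-likelihood ratios λi and the constant interaction coefficient β below are purely illustrative:

```python
# A minimal sketch of the Greig-Porteous-Seheult network construction for binary
# MAP restoration, assuming networkx; lam, nbrs and beta are toy values.
import networkx as nx

def map_binary_restoration(lam, neighbours, beta):
    """lam: {pixel: log-likelihood ratio}; neighbours: iterable of pixel pairs."""
    G = nx.DiGraph()
    for i, l in lam.items():
        if l > 0:
            G.add_edge('s', i, capacity=l)        # edge (s, i) with capacity lambda_i
        else:
            G.add_edge(i, 't', capacity=-l)       # edge (i, t) with capacity -lambda_i
    for i, j in neighbours:                       # undirected edge -> two directed arcs
        G.add_edge(i, j, capacity=beta)
        G.add_edge(j, i, capacity=beta)
    cut_value, (S, T) = nx.minimum_cut(G, 's', 't')
    return {i: (1 if i in S else 0) for i in lam}     # x_i = 1 on the source side

# Toy 2x2 image with a 4-neighbourhood (all numbers are illustrative).
lam = {(0, 0): 1.5, (0, 1): -0.5, (1, 0): 0.7, (1, 1): -1.2}
nbrs = [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1)), ((1, 0), (1, 1))]
print(map_binary_restoration(lam, nbrs, beta=0.8))
```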

Accelerated solution: A ten-fold acceleration of the basic Ford-Fulkerson algorithm is obtained for this network model by partitioning the input image into 2^K × 2^K connected sub-images. The MAP estimate is calculated for each sub-image separately; then the sub-images are amalgamated to form a set of 2^(K−1) × 2^(K−1) larger sub-images. The MAP estimate is formed for each of them, and this process is continued until the MAP estimate of the complete image is obtained.


Energies being minimized by graph cuts

Theorem FD [Friedman, Drineas; 2005] (generally, it is considered part of combinatorial optimisation folklore): Let xi ∈ {0,1}; i = 1,…,n, and let

E(x1,…,xn) = Σi,j βij xi xj + L

where L represents terms that are linear in the xi plus any constants, i.e.

L = Σi γi xi + σ.

Then E can be minimised via graph cut techniques if and only if βij ≤ 0 for all i,j.

Proof: To prove the "if" direction, note that βij xi xj = −αij xi xj = αij xi (1 − xj) − αij xi, so the energy can be rewritten as

E(x1,…,xn) = Σi,j αij xi (1 − xj) + Λ

where αij = −βij ≥ 0 and the linear term L is altered to Λ accordingly. The minimum energy E(…) over the binary variables xi is given by a minimum cut in a complete graph with n nodes and edge weights wij = αij: the cut splits the nodes with xi = 1 from those with xj = 0, because only the pair xi = 1 and xj = 0 adds the value αij to the energy. As follows from combinatorial optimisation theory, a polynomial-time minimum cut is possible if and only if wij ≥ 0, which implies βij ≤ 0.

For the altered linear terms Λ = Σi γi xi + σ there is an edge (s,i) with the weight wsi = γi if γi ≥ 0 and an edge (i,t) with wit = |γi| if γi < 0, so that all weights are non-negative.

The proof of "only if" is only slightly more complicated.

The MAP estimates based on the Markov-Gibbs posteriors with pairwise interactions result in the class of energy functions

E(x1,…,xn) = Σi Vi(xi) + Σ(i,j) Vij(xi, xj);   xi ∈ {0,1}.

Each pairwise energy term Vij(xi,xj) has only four values and can be equivalently rewritten in terms of these four values as

Vij(xi,xj) = Vij(0,0) + [Vij(1,0) − Vij(0,0)] xi + [Vij(0,1) − Vij(0,0)] xj + [Vij(1,1) + Vij(0,0) − Vij(0,1) − Vij(1,0)] xi xj,

i.e. with the quadratic coefficient βij = Vij(1,1) + Vij(0,0) − Vij(0,1) − Vij(1,0). Therefore, in accord with Theorem FD, such energies can be minimised with the graph min-cut techniques if and only if the regularity condition

Vij(0,0) + Vij(1,1) ≤ Vij(0,1) + Vij(1,0)

holds for all neighbouring pixel pairs i,j forming edges of the graph model.
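A quick sketch of checking this regularity condition for a pairwise term given as a 2×2 table of its four values (the Potts-like and "anti-Potts" potentials below are toy examples):

```python
# Regularity check for a binary pairwise term V given as a 2x2 table V[xi][xj].
def is_regular(V):
    return V[0][0] + V[1][1] <= V[0][1] + V[1][0]

potts      = [[0.0, 1.0], [1.0, 0.0]]   # penalizes different labels: regular
anti_potts = [[1.0, 0.0], [0.0, 1.0]]   # penalizes equal labels: not regular
print(is_regular(potts), is_regular(anti_potts))   # True False
```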

More sophisticated sufficient conditions for energy minimization using the min-cut techniques exist also for energy functions with k-wise pixel interaction, k > 2. But natural generalisations of the minimum cut problem to the case of more than two terminals, such as a multiway cut and minimum k-cut, are NP-hard apart from a few special cases with very restrictive conditions on energy functions:

The problem of finding a minimum weight multiway cut is NP-hard for any fixed k ≥ 3 (the case k = 2 is the minimum s-t cut problem). The minimum k-cut problem has polynomial time complexity for fixed k and is NP-hard if k is one of the input variables. Both problems have approximation algorithms that guarantee a solution within a factor 2 − 2/k of the exact optimum.

When the energy functions involve multiple labels per pixel, the pixel-wise stochastic global minimisation with simulated annealing (SA) or the deterministic pixel-wise local minimisation with the "greedy" iterated conditional modes (ICM) algorithm typically produce very poor results. Even in the simplest case of binary labelling considered above, the simulated annealing and ICM algorithms converge to stable points that are too far from the global minimum. Although simulated annealing provably converges to the global minimum of the energy, this guarantee holds only in the limit of exponential running time; this is why no practical implementation can closely approach the goal.

The main drawback of the simulated annealing or ICM algorithms is that they are pixel-wise, i.e. each iteration changes only one label in a single pixel in accord with the neighbouring labels. Therefore each iteration results in an extremely small move in the space of possible labellings. Obviously, the convergence rate should become faster under larger moves that simultaneously change labels in a large number of pixels.


Approximate minimum multiway cut and k-cut

As was already mentioned, the minimum multiway cut problem of finding the minimum capacity set of edges whose removal separates a given set of k terminals is NP-hard. But it has a provably good approximate solution that can be obtained with the exact min-cut / max-flow techniques. Let an isolating cut for a terminal si in a given set S = {s1,…,sk} be defined as a subset of edges whose removal separates si from the rest of the terminals. Then the following algorithm gives an approximate solution to the multiway cut problem:


Algorithm AMC ("Approximate Multiway Cut") [Vazirani, 2003]

Step 1: For each i = 1,…,k, compute a minimum capacity isolating cut Ci for the terminal si.
Step 2: Discard the heaviest of the cuts Ci and output the union C of the remaining k − 1 cuts.
Step 1 exploits k separate max-flow / min-cut computations. Because the removal of C from the graph disconnects every pair of terminals, it is a multiway cut.
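A sketch of this procedure, assuming the networkx library: each minimum isolating cut is found by merging the remaining terminals into a single auxiliary sink and computing a min s-t cut, and the heaviest of the k isolating cuts is then discarded. The graph, terminal names, and capacities below are illustrative only.

```python
import networkx as nx

# A hedged sketch of Algorithm AMC for an undirected graph given as {(u, v): capacity}.
def approx_multiway_cut(edges, terminals):
    def as_digraph():
        G = nx.DiGraph()
        for (u, v), c in edges.items():        # undirected edge -> two directed arcs
            G.add_edge(u, v, capacity=c)
            G.add_edge(v, u, capacity=c)
        return G

    cuts = []
    for s in terminals:
        G = as_digraph()
        for other in terminals:                # merge the other terminals into one sink
            if other != s:
                G.add_edge(other, '_SINK_')    # no capacity attribute = infinite capacity
        _, (S, _) = nx.minimum_cut(G, s, '_SINK_')
        cut_edges = [(u, v) for (u, v) in edges if (u in S) != (v in S)]
        weight = sum(edges[e] for e in cut_edges)
        cuts.append((weight, cut_edges))

    cuts.sort(key=lambda wc: wc[0])            # discard the heaviest isolating cut
    return {e for _, ce in cuts[:-1] for e in ce}

edges = {('s1', 'a'): 2, ('s2', 'a'): 2, ('s3', 'a'): 2,
         ('s1', 's2'): 1, ('s2', 's3'): 1}
print(approx_multiway_cut(edges, ['s1', 's2', 's3']))
```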

Theorem [Vazirani, 2003]: Algorithm AMC guarantees a solution within a factor 2 − 2/k of the optimal solution.

Proof: Let Copt be an optimal multiway cut in a graph G = [N; E]. The removal of Copt splits the graph into connected components, each terminal lying in its own component. Let Copt,i denote the set of edges of Copt incident to the component containing si, so that Copt,i is an isolating cut for si and Copt can be considered as the union of the k cuts Copt,i.

Since each edge in Copt is incident at two of these components, each edge will be in two of the cuts Copt,i. Hence,

Σi=1,…,k c(Copt,i) = 2 c(Copt).

Because Copt,i is an isolating cut for si and Ci is a minimum capacity isolating cut for si, it holds that c(Ci) ≤ c(Copt,i). This already gives a factor 2 algorithm, by taking the union of all k cuts Ci. But since C is obtained by discarding the heaviest of the cuts Ci,

c(C) ≤ (1 − 1/k) Σi=1,…,k c(Ci) ≤ (1 − 1/k) Σi=1,…,k c(Copt,i) = (2 − 2/k) c(Copt).

The minimum k-cut problem of finding a minimum-capacity set of edges whose removal leaves k connected components in a graph G has a natural factor 2 −2/k approximate solution:

  1. starting from G, compute a minimum cut in each connected component and remove the lightest one;
  2. repeat until there are k connected components.
This algorithm does guarantee a solution within the factor 2 − 2/k of the optimal; however, the proof is rather involved. There exist simpler algorithms that achieve the same guaranteed approximation and have simpler proofs.
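A sketch of this greedy procedure, assuming the networkx library (its stoer_wagner routine computes a global minimum cut of a connected component); the graph and weights are toy values:

```python
import networkx as nx

# A hedged sketch of the greedy factor-(2 - 2/k) k-cut heuristic described above.
def greedy_k_cut(G, k):
    """Repeatedly remove the lightest minimum cut among the connected components
    until the graph splits into k components; returns the removed edge set."""
    G = G.copy()
    removed = set()
    while nx.number_connected_components(G) < k:
        best = None
        for comp in nx.connected_components(G):
            if len(comp) < 2:
                continue
            H = G.subgraph(comp).copy()
            value, (A, B) = nx.stoer_wagner(H, weight='weight')
            cut_edges = [(u, v) for u in A for v in B if G.has_edge(u, v)]
            if best is None or value < best[0]:
                best = (value, cut_edges)
        if best is None:                 # only singleton components left: cannot split further
            break
        removed.update(best[1])
        G.remove_edges_from(best[1])
    return removed

G = nx.Graph()
G.add_weighted_edges_from([('a', 'b', 1), ('b', 'c', 3), ('c', 'd', 1),
                           ('d', 'e', 3), ('e', 'f', 1)])
print(greedy_k_cut(G, 3))
```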


Large moves using min-cut / max-flow techniques

Two fast approximate algorithms for energy minimization developed by Boykov, Veksler, and Zabih (see References) improve the poor convergence of simulated annealing by replacing pixel-wise changes with specific large moves. The resulting process converges to a solution that is provably within a known factor of the global energy minimum.

The energy function to be minimized is

(E1)                      E(x) = Σi=1,…,n Vi(xi) + Σ(i,j)∈N Vij(xi, xj)

where L = {1,2,…,L} is an arbitrary finite set of labels, N ⊂ {1,…,n}² denotes a set of neighbouring, or interacting, pairs of pixels, Vi: L → R, where R denotes the set of real numbers, is a pixel-wise potential function specifying pixel-wise energies in a pixel i under different labels, and Vij: L² → R is a pairwise potential function specifying pairwise interaction energies for different labels in a pair (i,j) of neighbours. The pixel-wise energies Vi(…) can be arbitrary, but the pairwise interaction energies have to be either semimetric, i.e. satisfy the constraints
∀α,β ∈ L   Vij(α,α) = 0;   Vij(α,β) = Vij(β,α) ≥ 0
or metric, i.e. satisfy the same constraints plus the triangle inequality
∀α,β,γ ∈ L   Vij(α,β) ≤ Vij(α,γ) + Vij(γ,β)
Each pixel labelling x with a finite set of indices L = {1,…,L} partitions the set of pixels R = {1,…,n} into L disjoint subsets Rλ = {i | i ∈ R; xi = λ ∈ L} (some of them may be empty), i.e. creates a partition P = {Rλ : λ = 1,…,L}. Each change of a labelling x changes the corresponding partition P.

The approximate Boykov-Veksler-Zabih minimization algorithms work with any semimetric or metric Vij by using large α-β-swap or α-expansion moves respectively. The conditionally optimal moves are found with a min-cut / max-flow technique.

The α-β-swap for an arbitrary pair of labels α,β ∈ L is a move from a partition P for a current labelling x to a new partition P′ for a new labelling x′ such that R′λ = Rλ for any label λ ≠ α, β. In other words, this move changes only the labels α and β in their current region Rαβ = Rα ∪ Rβ, whereas all other labels in R\Rαβ remain fixed. In the general case, after an α-β-swap some pixels change their labels from α to β and some others from β to α. A special variant is when the label α is assigned to some pixels previously labelled β.

The α-expansion of an arbitrary label α is a move from a partition P for a current labelling x to a new partition P′ for a new labelling x′ such that Rα ⊂ R′α and R′λ ⊆ Rλ for every label λ ≠ α, i.e. R\R′α = ∪λ∈L; λ≠α R′λ ⊆ R\Rα = ∪λ∈L; λ≠α Rλ. In other words, after this move any subset of pixels can change their labels to α.

Energy minimization algorithms: The SA and ICM algorithms use standard pixel-wise relaxation moves changing one label at a time. Such a move is both an α-β-swap and an α-expansion, so that the latter moves generalise the standard relaxation scheme. The algorithms based on these generalisations are sketched below.


Swap algorithm for semimetric interaction potentials

  1. Start with an arbitrary labelling x.
  2. Set success := 0; for each pair of labels {α,β} ⊂ L:
     2.1. Find x̂ = arg min E(x′) among all labellings x′ within one α-β-swap of x.
     2.2. If E(x̂) < E(x), set x := x̂ and success := 1.
  3. If success = 1, go to Step 2.
  4. Return x.

Expansion algorithm for metric interaction potentials

  1. Start with an arbitrary labelling x.
  2. Set success := 0; for each label α ∈ L:
     2.1. Find x̂ = arg min E(x′) among all labellings x′ within one α-expansion of x.
     2.2. If E(x̂) < E(x), set x := x̂ and success := 1.
  3. If success = 1, go to Step 2.
  4. Return x.

An iteration at Step 2 performs L individual α-expansion moves in the expansion algorithm and L² individual α-β-swap moves in the swap algorithm. It is possible to prove that the minimisation terminates in a finite number of iterations, which is of the order of the image size n. Actually, image segmentation and stereo reconstruction experiments conducted by Boykov, Kolmogorov, Veksler, and Zabih (see References) have shown that these algorithms converge to a local energy minimum in just a few iterations.
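The outer loop common to both algorithms can be sketched as follows; energy and best_move are hypothetical user-supplied callables, the latter solving Step 2.1, i.e. returning the minimum-energy labelling within one α-β-swap or one α-expansion of the current labelling via a min-cut computation (such as the sketches given further below):

```python
from itertools import combinations

# A hedged sketch of the Boykov-Veksler-Zabih outer loop. `energy(labels)` and
# `best_move(labels, move)` are hypothetical callables supplied by the user.
def minimize_by_moves(labels, label_set, energy, best_move, use_expansion=True):
    if use_expansion:
        moves = [(a,) for a in label_set]              # one alpha-expansion per label
    else:
        moves = list(combinations(label_set, 2))       # one swap per pair {alpha, beta}
    success = True
    while success:                                     # Step 3: repeat while the energy drops
        success = False
        for move in moves:                             # Step 2: a cycle over all moves
            candidate = best_move(labels, move)        # Step 2.1: optimal move via min cut
            if energy(candidate) < energy(labels):     # Step 2.2
                labels, success = candidate, True
    return labels
```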

Given a current labelling x (partition P) and a pair of labels (α,β) or a label α, the swap or expansion moves, respectively, at Step 2.1 in the above algorithms use the min-cut / max-flow optimisation technique to find a better labelling x̂. This labelling minimises the energy over all labellings within one α-β-swap (the swap algorithm) or one α-expansion (the expansion algorithm) of x and corresponds to a minimum cut of a specific graph having O(n) nodes associated with the pixels. The swap and expansion graphs are different, and the exact number of their nodes, their topology, and their edge weights vary from step to step in accord with the current partition.

Swap algorithm: finding the optimal move: The figure below exemplifies a graph Gαβ = [Nαβ; Eαβ] for finding the optimal swap move for a set of pixels Rαβ = Rα ∪ Rβ with the labels α and β:

The graph is built only on the pixels i ∈ Rαβ having the labels α and β in the partition P corresponding to the current labelling x. The set of nodes Nαβ includes the two terminals, denoted α and β, and all the pixels in Rαβ. Each pixel i ∈ Rαβ is connected to the terminals α and β by edges tα,i and tβ,i, respectively, called t-links (terminal links). Each pair of the nodes (i,j) ⊂ Rαβ which are neighbours, i.e. (i,j) ∈ N, is connected with an edge ei,j called an n-link (neighbour link). Therefore, the set of edges Eαβ consists of the t- and n-links.

If the edges have the following weights:

then each cut C on Gαβ must include exactly one t-link for any pixel i ∈ Rαβ; otherwise either there would be a path between the terminals if both the links were included, or a proper subset of C would already be a cut if both the links were excluded. Therefore, any cut C provides a natural labelling xC such that every pixel i ∈ Rαβ is labelled with α or β if the cut C separates i from the terminal α or β, respectively, and the other pixels keep their initial labels:

xC,i = α if tα,i ∈ C;   xC,i = β if tβ,i ∈ C (i ∈ Rαβ);   xC,i = xi for i ∉ Rαβ,

as illustrated below:

A cut C on Gαβ for two pixels i,j ∈ Nαβ connected by an n-link ei,j (dashed edges are cut by C).

Each labelling xC corresponding to a cut C on the graph Gαβ is one α-β-swap away from the initial labelling x.

Because a cut separates a subset of the pixels in Rαβ associated with one terminal from the complementary subset associated with the other terminal, it includes (i.e. severs in the graph) an n-link ei,j between neighbouring pixels in Rαβ if and only if the pixels i and j are connected to different terminals under this cut, i.e. ei,j ∈ C if and only if xC,i ≠ xC,j.

By taking into account that the function Vij(xC,i, xC,j) is a semimetric and by considering the possible cuts involving the t-links of i and j and the n-link between them, together with the corresponding labellings, it is possible to prove

Theorem BVZ-T1 [Boykov, Veksler, Zabih, 2001]: There is a one-to-one correspondence between cuts C on Gαβ and labellings xC that are one α-β-swap from x. The capacity of a cut C on Gαβ is c(C) = E(xC) plus a constant, where E(…) is the energy function in Eq.(E1).

Corollary BVZ-C1 [Boykov, Veksler, Zabih, 2001]: The lowest energy labelling within a single α-β-swap move from a current labelling x is the labelling x̂ = xC° corresponding to the minimum cut C° on Gαβ.
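A sketch of this construction and of finding the optimal swap move, assuming the networkx library. The edge weights follow the standard Boykov-Veksler-Zabih assignment (the data term plus the interactions with the fixed neighbours outside Rαβ on the t-links, and Vij(α,β) on the n-links); V_data, V_pair, and neighbours are hypothetical user-supplied functions and structures.

```python
import networkx as nx

# A hedged sketch of the alpha-beta-swap graph G_ab and the optimal swap move.
# labels: {pixel: label}; neighbours: {pixel: list of neighbouring pixels} (symmetric);
# V_data(i, l) and V_pair(i, j, li, lj) are assumed non-negative potential functions.
def best_swap_move(labels, neighbours, V_data, V_pair, a, b):
    R_ab = [i for i, l in labels.items() if l in (a, b)]
    G = nx.DiGraph()
    for i in R_ab:
        fixed = [j for j in neighbours.get(i, []) if labels[j] not in (a, b)]
        w_a = V_data(i, a) + sum(V_pair(i, j, a, labels[j]) for j in fixed)  # t-link to terminal A
        w_b = V_data(i, b) + sum(V_pair(i, j, b, labels[j]) for j in fixed)  # t-link to terminal B
        G.add_edge('A', i, capacity=w_a)
        G.add_edge(i, 'B', capacity=w_b)
    seen = set()
    for i in R_ab:
        for j in neighbours.get(i, []):
            if labels[j] in (a, b) and (j, i) not in seen:
                seen.add((i, j))
                w = V_pair(i, j, a, b)                  # n-link weight V_ij(a, b)
                G.add_edge(i, j, capacity=w)
                G.add_edge(j, i, capacity=w)
    _, (S, T) = nx.minimum_cut(G, 'A', 'B')
    new_labels = dict(labels)
    for i in R_ab:
        new_labels[i] = a if i in T else b              # separated from terminal A -> label a
    return new_labels

# Toy usage: a 3-pixel chain with Potts-like potentials (all values are illustrative).
labels = {0: 'a', 1: 'b', 2: 'c'}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
V_data = lambda i, l: 0.0 if l == 'a' else 0.4
V_pair = lambda i, j, li, lj: 0.0 if li == lj else 0.5
print(best_swap_move(labels, nbrs, V_data, V_pair, 'a', 'b'))
```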

Expansion algorithm: finding the optimal move: The set of nodes Nα of the graph Gα = [Nα; Eα] for finding an optimal expansion move includes the two terminals, denoted α and ᾱ, all the pixels i ∈ R, and an auxiliary node ai,j for each pair of the neighbouring pixels (i,j) ∈ N that have different labels xi ≠ xj in the current partition P. The auxiliary nodes are on the boundaries between the partition sets Rλ; λ ∈ L. Thus the set of nodes is

Nα = {α, ᾱ} ∪ R ∪ {ai,j : (i,j) ∈ N; xi ≠ xj}.
A simple 1D graph Gα below gives an example of finding the optimal expansion move for the set of pixels in the image:
Here, the set of pixels is R = {i,j,k,l,m}, and the current partition is P = {Rα, Rβ,Rγ} where Rα = {i}, Rβ = {j,k}, and Rγ = {l,m}. Two auxiliary nodes ai,j and ak,l are added between the neighboring pixels with different labels in the current partition, i.e. at the boundaries of the subsets Rλ.

Each pixel i ∈ R is connected to the terminals α and ᾱ by t-links tα,i and tᾱ,i, respectively. Each pair of the neighbouring nodes (i,j) ∈ N that are not separated in the current partition, i.e. have the same labels xi = xj in the current labelling, is connected with an n-link ei,j. For each pair of the separated neighbouring pixels (i,j) ∈ N such that xi ≠ xj, the introduced auxiliary node ai,j results in a triplet of edges Ei,j = {ei,ai,j, eai,j,j, tᾱ,ai,j}, where the first pair of n-links connects the pixels i and j to the auxiliary node ai,j, and the t-link connects the auxiliary node ai,j to the terminal ᾱ. Therefore, the set of all edges Eα is

Eα = {tα,i, tᾱ,i : i ∈ R} ∪ {ei,j : (i,j) ∈ N; xi = xj} ∪ {Ei,j : (i,j) ∈ N; xi ≠ xj}.
The edges have the following weights:
That any cut C on Gα must include exactly one t-link for any pixel i ∈ R provides a natural labelling xC corresponding to the cut C:

xC,i = α if tα,i ∈ C;   xC,i = xi if tᾱ,i ∈ C,

as illustrated below:

Figure MC: A minimum cut C on Gα for two neighbouring pixels i,j ∈ R such that xi ≠ xj (ai,j is an auxiliary node between the neighbouring pixels i and j; dashed edges are cut by C).

Each labelling xC corresponding to a cut C on the graph Gα is one α-expansion away from the initial labelling x.

Because a cut separates a subset of the pixels in R associated with one terminal from the complementary subset associated with the other terminal, it severs an n-link ei,j between the neighbouring pixels (i,j) ∈ N with xi = xj if and only if the pixels i and j are connected to different terminals under this cut, or in formal terms:

(P1)               For (i,j) ∈ N with xi = xj:   ei,j ∈ C   if and only if   xC,i ≠ xC,j.

The triplet of edges Ei,j corresponding to a pair of neighbouring pixels (i,j) ∈ N such that xi ≠ xj may be cut in different ways even when the pair of severed t-links at i and j is fixed. However, a minimum cut determines uniquely which edges of Ei,j to sever, due to the minimality of the cut and the metric properties of the potentials associated with the edges of Ei,j = {ei,ai,j, eai,j,j, tᾱ,ai,j}: the triangle inequality implies that it is always better to cut any one of them rather than the other two together. This property of a minimum cut C, illustrated in the above Figure MC, has the following formal representation: if (i,j) ∈ N and xi ≠ xj, then the minimum cut C satisfies the conditions

(P2)               if tα,i, tα,j ∈ C, then C ∩ Ei,j = ∅;   if tᾱ,i, tᾱ,j ∈ C, then C ∩ Ei,j = {tᾱ,ai,j};
                   if tα,i, tᾱ,j ∈ C, then C ∩ Ei,j = {eai,j,j};   if tᾱ,i, tα,j ∈ C, then C ∩ Ei,j = {ei,ai,j}.

These properties may hold for non-minimal cuts, too. If an elementary cut is defined as a cut satisfying the above conditions P1 and P2, then it is possible to prove

Theorem BVZ-T2 [Boykov, Veksler, Zabih, 2001]: Let a graph Gα be constructed as above, given a labelling x and a label α. Then there is a one-to-one correspondence between elementary cuts on Gα and labellings within one α-expansion from x. The capacity of any elementary cut C is c(C) = E(xC), where E(…) is the energy of Eq.(E1).

Corollary BVZ-C2 [Boykov, Veksler, Zabih, 2001]: The lowest energy labelling within a single α-expansion move from x is the labelling x̂ = xC° corresponding to the minimum cut C° on Gα.
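A corresponding sketch of the expansion graph Gα and the optimal expansion move, again assuming the networkx library and following the standard Boykov-Veksler-Zabih construction (an infinite-capacity t-link tᾱ,i for pixels already labelled α, data terms on the other t-links, and a triplet of edges through an auxiliary node for each separated pair); V_data, V_pair, and neighbours are hypothetical user-supplied objects.

```python
import networkx as nx

# A hedged sketch of the alpha-expansion graph G_a and the optimal expansion move.
# An edge added without a 'capacity' attribute is treated by networkx as having
# infinite capacity, which encodes the uncuttable t-links for pixels already labelled a.
def best_expansion_move(labels, neighbours, V_data, V_pair, a):
    G = nx.DiGraph()
    for i in labels:
        G.add_edge('A', i, capacity=V_data(i, a))               # t-link t_{a,i}
        if labels[i] == a:
            G.add_edge(i, 'NOT_A')                              # infinite: i must keep label a
        else:
            G.add_edge(i, 'NOT_A', capacity=V_data(i, labels[i]))
    seen = set()
    for i in labels:
        for j in neighbours.get(i, []):
            if (j, i) in seen:
                continue
            seen.add((i, j))
            if labels[i] == labels[j]:                          # plain n-link
                w = V_pair(i, j, labels[i], a)
                G.add_edge(i, j, capacity=w)
                G.add_edge(j, i, capacity=w)
            else:                                               # auxiliary node a_{i,j}
                aux = ('aux', i, j)
                for u, v, w in [(i, aux, V_pair(i, j, labels[i], a)),
                                (aux, j, V_pair(i, j, a, labels[j]))]:
                    G.add_edge(u, v, capacity=w)
                    G.add_edge(v, u, capacity=w)
                G.add_edge(aux, 'NOT_A', capacity=V_pair(i, j, labels[i], labels[j]))
    _, (S, T) = nx.minimum_cut(G, 'A', 'NOT_A')
    return {i: (a if i in T else labels[i]) for i in labels}    # cut t_{a,i} -> label a
```

Such a helper can play the role of best_move in the outer loop sketched earlier.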


Optimality of large moves

Although the swap move algorithm has a wider application area due to the weaker, only semimetric, requirements on the potentials Vij(…), generally it possesses no proven optimality properties. But a local minimum obtained with the expansion move algorithm is within a fixed factor of the global minimum, according to

Theorem BVZ-T3 [Boykov, Veksler, Zabih, 2001]: Let x̂ be a labelling for a local energy minimum when the expansion moves are allowed, and let x* be the globally optimal solution. Then E(x̂) ≤ 2c E(x*), where

c = max(i,j)∈N ( maxα≠β∈L Vij(α,β) / minα≠β∈L Vij(α,β) ).
Proof: Let some α ∈ L be fixed and let R*α = {i ∈ R | x*i = α}. Let xα be a labelling within one α-expansion move from x̂ such that

xα,i = α for i ∈ R*α;   xα,i = x̂i otherwise.

Since x̂ is a local minimum when expansion moves are allowed,

(E2)                                                  E(x̂) ≤ E(xα).

Let S = Spix ∪ Spair be a union of an arbitrary subset Spix of pixels in R; Spix ⊆ R, and of an arbitrary subset Spair of neighbouring pixel pairs in N; Spair ⊆ N. A restriction of the energy of a labelling x to S is defined as

ES(x) = Σi∈Spix Vi(xi) + Σ(i,j)∈Spair Vij(xi, xj).

Let I*α, B*α, and O*α denote the unions of the pixels and pairs of neighbouring pixels contained inside, on the boundary, and outside of R*α, respectively:

I*α = R*α ∪ {(i,j) ∈ N : i ∈ R*α; j ∈ R*α};   B*α = {(i,j) ∈ N : exactly one of i, j is in R*α};   O*α = (R\R*α) ∪ {(i,j) ∈ N : i ∉ R*α; j ∉ R*α}.

The following three relationships hold:

(a) EI*α(xα) = EI*α(x*);   (b) EB*α(xα) ≤ c EB*α(x*);   (c) EO*α(xα) = EO*α(x̂).

The relationships (a) and (c) follow directly from the definitions of R*α and xα. The relationship (b) holds because Vij(xα,i, xα,j) ≤ c Vij(x*i, x*j) for any (i,j) ∈ B*α (note that Vij(x*i, x*j) ≠ 0 there, since x*i ≠ x*j on the boundary pairs).

The union I*α ∪ B*α ∪ O*α includes all the pixels in R and all the neighbouring pairs of pixels in N. Therefore, Eq.(E2) can be rewritten as

EI*α(x̂) + EB*α(x̂) + EO*α(x̂) ≤ EI*α(xα) + EB*α(xα) + EO*α(xα).

By substituting the above relationships (a)-(c), one can obtain:

EI*α(x̂) + EB*α(x̂) ≤ EI*α(x*) + c EB*α(x*).
To get the bound on the total energy, this relationship has to be summed over all the labels α ∈ L:

(E3)                         Σα∈L ( EI*α(x̂) + EB*α(x̂) ) ≤ Σα∈L ( EI*α(x*) + c EB*α(x*) ).

For every (i,j) ∈ B = ∪α∈L B*α, the term Vij(x̂i, x̂j) appears twice on the left side of Eq.(E3): once in EB*α(x̂) for α = x*i, and once in EB*α(x̂) for α = x*j. Similarly, every term Vij(x*i, x*j) appears 2c times on the right side of Eq.(E3). Therefore, Eq.(E3) can be rewritten as

E(x̂) + EB(x̂) ≤ E(x*) + (2c − 1) EB(x*),

where EB(·) denotes the restriction of the energy to the set B of boundary pairs. Since EB(x̂) ≥ 0 and EB(x*) ≤ E(x*), this yields E(x̂) ≤ 2c E(x*),

to give the bound of 2c for the factor of the global minimum.

The pictures below, taken from the Middlebury Stereo Vision Page http://www.middlebury.edu/stereo (see: D. Scharstein and R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms", Int. J. Computer Vision, vol. 47 (1/2/3), pp. 7-42, April-June 2002), show that the graph-cut stereo algorithm notably outperforms both the dynamic programming and the SSD-based stereo algorithms (this graph-cut algorithm of V. Kolmogorov and R. Zabih takes account of both matches and occlusions; see "Computing Visual Correspondence with Occlusions using Graph Cuts", Proc. 8th IEEE Int. Conf. on Computer Vision, Vancouver, Canada, July 9-12, 2001, vol. 2, pp. 508-515, 2001):
Stereo pair "Tsukuba" True disparity map
SSD stereo (window 21×21) Dynamic programming stereo Graph-cut stereo
Grey-coded reconstructed disparity maps
Grey-coded signed disparity errors w.r.t. the true disparity map
