next up previous
Next: Summary Up: Generic Markov-Gibbs Model and Previous: Model Identification

Subsections

Structural Identification of a Generic MGRF Model

A generic MGRF model of spatially homogeneous textures is specified by the explicit spatial geometry (the characteristic neighbourhood) and quantitative strengths of statistical dependencies (clique potentials). Conventional model identification involves estimating both the characteristic neighbourhood and the Gibbs potential of each selected family, from a given texture. The process, in particular potential refinement via stochastic approximation, involves exponential time complexity [38].

In order to simplify the identification, a structural approach is proposed which focuses on identifying the geometric structure of texels (short for TEXture ELement [43]) and placement rules of their spatial arrangement. In the resulting texture description, a texture is constructed by a group of texels that repeats many times over the image plane by certain regular or stochastic spatial arrangement. This description is in line with structural approach to texture analysis [43], which suggests texels and their spatial organisation constitute a ``two-layered structure'' in a texture, specifying local and global properties, respectively,

In the proposed method, each texel is a micro geometric element, consisting of a group of image pixels (not necessarily continuous) with a certain spatial configuration. Each individual texel is distinguished by both the geometric structure and the combination of image signals over the structure. Usually, a texel involves a rather simple spatial configuration of only a small number of pixels. For example, Figure 6.4 shows a simple texel with hexagonal structure.

Figure 6.1: A simple hexagonal texel of six-pixels.
\includegraphics[width=2.2in]{texel-simple.eps}

The structural identification of a generic MGRF model in Eq. (6.2.5) for a given training image $ {g}^{\circ}$ involves the following steps:

  1. Construct the MBIM using the approximate relative partial energy given by Eq. (6.3.4).
  2. Select a geometric structure of the texels by taking account of the higher-energy clique families in the MBIM.
  3. Derive the placement rules for these texels.

Figure 6.2: Brodatz textures [8] and their MBIMs. The textures and their MBIMs are of size $ 128\times 128$ and $ 125\times 125$ respectively. On the left-hand side are stochastic textures, and on the right-hand side are regular textures.
\includegraphics[scale=0.7]{d4.bmp.eps}    \includegraphics[scale=0.7]{d4.m.bmp.eps}    \includegraphics[scale=0.7]{d6.bmp.eps}    \includegraphics[scale=0.7]{d6.m.bmp.eps}
D4   D6  
\includegraphics[scale=0.7]{d12.bmp.eps}    \includegraphics[scale=0.7]{d12.m.bmp.eps}    \includegraphics[scale=0.7]{d20.bmp.eps}    \includegraphics[scale=0.7]{d20.m.bmp.eps}
D12   D20  
\includegraphics[scale=0.7]{d24.bmp.eps}    \includegraphics[scale=0.7]{d24.m.bmp.eps}    \includegraphics[scale=0.7]{d52.bmp.eps}    \includegraphics[scale=0.7]{d52.m.bmp.eps}
D24   D52  
\includegraphics[scale=0.7]{d66.bmp.eps}    \includegraphics[scale=0.7]{d66.m.bmp.eps}    \includegraphics[scale=0.7]{d76.bmp.eps}    \includegraphics[scale=0.7]{d76.m.bmp.eps}
D66   D76  
\includegraphics[scale=0.7]{d105.bmp.eps}    \includegraphics[scale=0.7]{d105.m.bmp.eps}    \includegraphics[scale=0.7]{d77.bmp.eps}    \includegraphics[scale=0.7]{d77.m.bmp.eps}
D105   D77  

Figure 6.3: Stochastic textures D29 [8] and Bark0009 [82], their MBIMs and recovered texels.
\includegraphics[scale=0.7]{d29.bmp.eps}    \includegraphics[scale=0.7]{bark0009.bmp.eps}
D29 Bark0009
\includegraphics[scale=0.35]{d29MBIM.eps}    \includegraphics[scale=0.35]{bark09MBIM.eps}
D29 MBIM Bark0009 MBIM
\framebox{
\includegraphics[scale=0.42]{d29texel.eps}
}    \framebox{
\includegraphics[scale=0.42]{bark09texel.eps}}
D29 Texels Bark0009 Texels

Figure 6.4: Regular textures D34 and D101 [8], their MBIMs and recovered texels.
\includegraphics[scale=0.7]{d34.bmp.eps}    \includegraphics[scale=0.7]{d101.bmp.eps}
D34 D101
   
\includegraphics[scale=0.35]{d34MBIM.eps}    \includegraphics[scale=0.35]{d101MBIM.eps}
D34 MBIM D101 MBIM
\framebox{
\includegraphics[scale=0.42]{d34texel.eps}
}    \framebox{
\includegraphics[scale=0.42]{d101texel.eps}}
D34 Texel D101 Texel

Model-Based Interaction Map

An MBIM jointly depicts the partial energies of all clique families in a texture, from which a characteristic pixel neighbourhood $ \mathcal{N}$ can be estimated.

A generic MGRF model allows an arbitrary structure of pairwise pixel interactions, but the longest interaction to recover depends on the size of each individual training image in practice. Without loss of generality, Markov property is assumed such that all the neighbours of a pixel $ i$ are limited into a large search window $ \mathcal{W}_i$ centred on the pixel. So, $ \mathcal{N}_i \in
\mathcal{W}_i$. In order to capture the periodicity structure of a texture, the search window must be large enough to cover at least a few repetitions. Empirically, the window size is chosen between $ 1/3$ and $ /1/2$ of the size of the training texture. The empirical rule accounts for the need of enough samples for each clique family to reliably collect GLCHs and estimate energies.

Let the width and the height of a search window $ \mathcal{W}$ be denoted $ \vert\mathcal{W}\vert _x$ and $ \vert\mathcal{W}\vert _y$ respectively. A clique family $ \mathbf{C}_a$ in a texture has the maximum displacement of $ \frac{\vert\mathcal{W}\vert _x}{2}$ and $ \frac{\vert\mathcal{W}\vert _y}{2}$ along $ x$ and $ y$ axes respectively, so that,

$\displaystyle \mathbf{C}_a=\left\{(i,j):i,j \in \mathbf{R};
i-j=(\delta{x_a},\d...
...W}\vert _x}{2},\vert\delta{y_a}\vert<\frac{\vert\mathcal{W}\vert _y}{2}\right\}$

Formally, an MBIM is a 2D function, $ f: \mathcal{W} \rightarrow
\mathbf{E}$, which maps a search window $ \mathcal{W}$ onto real-valued partial energies $ \mathbf{E}$. A two-step procedure is involved in computing the partial energy for each clique family. First, a GLCH matrix is collected for the family, by convolving the displacement vector with the image. Second, the partial energy is computed by Eq. (6.3.4) from the collected statistics. A generic MGRF model only counts on the relative frequency, $ F_a(q,s;q,s\in \mathbf{Q} \mid {g}^{\circ})$, of co-occurrence of every two grey levels $ q$ and $ s$, $ q,s \in
\mathbf{Q}$, so the resulting matrix has a dimensionality of $ \vert\mathbf{Q}\vert \times \vert\mathbf{Q}\vert$. To have statistically meaningful estimates for small training images (e.g., $ 128\times 128-256 \times
256$) and also due to computational restrictions, a texture is typically converted into a 16-level gray-scale image before further processing [38].


\begin{algorithm}
% latex2html id marker 3758\caption{Computing an MBIM}
\beg...
...\varepsilon}_{a}^{\circ}$}
\par
\ENDFOR \ENDFOR
\end{algorithmic}\end{algorithm}

Since the relative energy is sufficient for ranking clique families, the first approximation of relative partial energy $ \mathsf{\varepsilon}_{a}^{\circ}$ in Eq. (6.3.4) is used to approximate the `real' partial energy $ E_a$ in forming an MBIM. The procedure of computing the MBIM is outlined by Algorithm 5.

An MBIM is symmetric about the origin $ (0,0)$, because a clique family $ \mathbf{C}_{\delta{x_a},\delta{y_a}}$ has the same pixel interaction as the clique family $ \mathbf{C}_{-\delta{x_a},-\delta{y_a}}$. It also should be noted that there is no such a clique family with a zero displacement vector $ (0,0)$.

The major computation in constructing an MBIM comes from convolution operation in collecting the GLCH statistics for each clique family of the image. The process involves quadratic time complexity of $ O(\vert\mathbf{R}\vert\cdot \vert\mathcal{W}\vert)$, which depends on the sizes of both the searching window and the training image.

An MBIM can be represented graphically either by a 2D bitmap or on a 3D surface, in which each coordinate $ (\delta{x_a},\delta{y_a})$ represents a clique family $ \mathbf{C}_{\delta{x_a},\delta{y_a}}$, and the corresponding scalar value represents the partial energy $ E_a$ of that family. Figure 6.2 shows a few examples of 2D representation of MBIMs by grey-level images, with the partial energy encoded by intensity, i.e. dark intensities correspond to higher-energy clique families, light intensities correspond to lower-energy clique families. In Figs 6.3 and 6.4, the MBIMs are rendered on a 3D surface, where the $ z$ values represent real-valued partial energies.

An MBIM reveals several important properties of the clique families in a texture, which helps to recover a characteristic neighbourhood and identify texels from a texture. Experimental comparisons have shown that the more accurate potential estimates in Appendix A produce quite similar MBIMs to those with the potentials of Eq. (6.3.3). For simplicity, below, only the latter estimates are used.

Energy Distribution.

The vast majority of the clique families in the MBIM have low energies indicating almost independent pixels and only a small characteristic group with higher energies actually impacts upon the spatial pattern of the texture. Energy histograms in Fig 6.6 show the non-characteristic majority of the clique families form a dominant peak whereas a relatively small characteristic families form a high-energy "tail". The latter has to be separated from the dominant part in order to estimate $ \mathcal{N}$.

Figure 6.5: The spatial patterns in an MBIM are related to a underlying lattice of the periodicity structure of a texture. (a) Texture D06. (b) The MBIM for the texture. (c) Underlying lattice formed by the spatial distribution of characteristic clique families in the MBIM. (d) Superimpose the lattice over the original texture, showing that the lattice is in line with texture's periodicity structure.
\includegraphics[scale=0.9]{mbim-grid.eps}

Structural Property.

An MBIM displays clusters formed by the small group of high-energy clique families. The clusters are distributed in line with particular spatial structures with obvious correspondence to periodicity structure of related textures. Stochastic textures, like D4, D12, D24, and D66 in Fig 6.2, have only one dominant central cluster indicating that only short-range pixel interactions dominate these textures. In contrast, regular textures, like D6, D20, D52 and D76 in Fig 6.2, have also a number of prominent, peripheral clusters spatially distributed in a structured and repetitive manner, reflecting the underlying translation lattice of pixel interaction. As further illustrated in Fig 6.5, for regular texture D34, the spatial distribution of clusters in the MBIM is closely related to the periodicity structure appeared in the image.

Characteristic Neighbourhood

The characteristic neighbourhood for a generic MGRF model consists of a group of the most energetic clique families and represents most probably pixel interactions in a texture.

The simplest approach to estimating the structure is to select most energetic clique families by using a threshold of partial energy [37,36]. But the selection of threshold is largely heuristic and the statistical interplay between clique families have not been taken into account.

An alternative sequential approach in [107] selects each next characteristic clique family by comparing the training GLCHs to those for an image sampled from the MGRF with the currently estimated neighbourhood. Since each step involves image sampling (generation) and re-collection of the GLCHs, the process is very computationally complex, i.e. to find a neighbourhood structure $ \mathcal{N}$, the time complexity is $ O(\max\{\vert\mathcal{R}\vert\vert\mathcal{N}\vert T,
\vert\mathcal{R}\vert\vert\mathcal{W}\vert\})$ where $ T$ is the expected number of steps, which, at least, theoretically $ T$ grows exponentially with $ \vert\mathcal{R}\vert$ although it is limited to a few hundred steps empirically to generate a single MGRF sample using the MCMC process of pixel-wise stochastic relaxation.

The structural identification simplifies selection of the characteristic neighbourhood due to the observation that most energetic clique families form isolated clusters (or blobs) in the MBIMs shown in Figs 6.3 and 6.4. From the statistical point of view, the clique family having maximum energy in each cluster is related to locally the most probable pixel interaction. Pixel interactions represented by other clique families in the vicinity are also likely but with lower probability. This indicates variations of neighbourhood structure at different locations of an image. The size and shape of the clusters reflect the degree of local variations and could be used to measure texture homogeneity, i.e. a strict homogeneous texture should have a smaller cluster.

It is reasonable to use the local maximum of each cluster as a representative clique family for a local region. A two-step approach is proposed to find these local maxima. First, all the significant clusters are identified from the MBIM, and second, the clique family having the maximal partial energy in each cluster is added into the characteristic neighbourhood.

Most MBIMs have their corresponding histograms of partial energies as a unimodal distribution, i.e. each histogram is positively skewed and has a single peak at the lower energy end, as in Fig 6.6. This allows to apply a unimodal thresholding algorithm [87] to determine a threshold that separates high-energy clique families from the majority of low-energy ones. Consequently, clusters formed by high-energy families are isolated and can be easily identified by grouping connected clique families using the classic connected-component labelling algorithm [86]. A local maximum is then identified by as the clique family with maximal energy from each cluster.

Figure 6.6: Energy histograms for textures in Figs 6.3 and 6.4.
\framebox{
\includegraphics[scale = 0.35]{d29_hist.eps}
}    \framebox{
\includegraphics[scale = 0.35]{bark09_hist.eps}}
D29 bark0009
\framebox{
\includegraphics[scale = 0.35]{d34_hist.eps}
}    \framebox{
\includegraphics[scale = 0.35]{d101_hist.eps}}
D34 D101

Figure 6.7: Unimodal thresholding [87]. The algorithm computes the distance between every point on the histogram curve to the straight line. The threshold is at the point on the histogram curve where the maximum distance occurs.
\framebox{
\includegraphics[scale=0.5]{unimodal.eps}
}

Unimodal Thresholding

[87] yields an optimal threshold for an adequate separation of a unimodal distribution. No heuristic parameter is involved in the process. In the structural identification, this algorithm is applied to separate a small number of decisive, high energy clique families from the rest ones.

The thresholding algorithm draws a straight line from the peak of the histogram to the last non-empty bin. The threshold is at the point on the histogram curve having the maximal distance to the straight line. The algorithm is illustrated in Fig 6.7.

Connected-Component Labelling

[86] is applied after unimodal thresholding to identify all the clusters of spatially connected clique families. The unimodal thresholding algorithm converts an MBIM into a binary image, e.g., by assigning the clique families with the partial energy above the threshold with `1's, and with `0's otherwise. With a rather aggressive threshold, the regions with `1's are expected to be well isolated from the background with `0's. This allows the labelling algorithm to identify the clusters by sequentially scanning the binary image. The labelling algorithm scans a binary image twice in a raster scanline order, e.g, from left to right and from top to bottom of the image. In the first pass, the algorithm tags each `1' pixel with either an old label if it is connected to any pixel with the label or a new label otherwise. In the second pass, the algorithm merges equivalent labels, i.e. different labels assigned to connected pixels, and discovers possible new connections. As the result of the algorithm, each image region for a cluster is identified by a distinguishing label. The connectivity could be 8- or 4-nearest neighbourhood in the simplest case. This two-pass labelling algorithm might be neither space or time efficient especially for large images [70], but it is proved to be sufficient for usually small sized MBIMs. Smoothing operations might be applied before and after clustering in order to remove noisy or spurious clusters. The pseudo code for a single step of the labelling algorithm is outlined in Algorithm 6.

Figure 6.8: Detecting connected components by sequential scans.
\includegraphics[scale=0.6]{connectedcomp.eps}


\begin{algorithm}
% latex2html id marker 3886\caption{Connected Component Labe...
...,y)$ and $(x,y-1)$ are not
identical.} \ENDIF
\end{algorithmic}\end{algorithm}

Geometric Structure of Texels

For simplicity, the structural identification assumes a uniform geometric structure for all texels in a texture, i.e. the same pixel interactions are applicable at every location in the image. The geometric structure is statistically the most probable one, i.e. the central tendency if consider a distribution of all possible texel's structures. This simplification is implied by the assumption of translational-invariant pixel interaction in a generic MGRF model.

The geometric structure of texels in a texture is derived from the characteristic neighbourhood consisting of local maxima of clusters. Structures for regular and stochastic textures are specified in separate ways, due to the different nature of two texture classes.

Geometric Structure of Texels for a Regular Texture.

In the MBIM for a regular texture, the clusters closest to the MBIM's origin relate to the ``primary'' pixel interactions. The group of these ``primary'' clusters is related to a single repetition of the periodicity structure of a texture. The other clusters relate to ``secondary'' interactions that appear due to statistical interplay of the ``primary" ones. Therefore, the group of primary clusters specifies desired geometric structure of texels for a regular texture. The resulting texel has a rather simple structure with only a few pixels, e.g., six pixels for the texture D34. Such a simple texel is not particularly meaningful by itself and even is not noticeable in a texture. However, multiple occurrences of these simple structures in accord with the guiding grid produces various textures similar to the training image.

Geometric Structure of Texels for a Stochastic Texture.

In this case, the shape of the single central cluster defines the desired structure. Each such texel behaves like a seed, and a texture is generated by randomly scattering the seeds over the image lattice.

Figures 6.3 and 6.4 schematically illustrate several texels for stochastic and regular textures, respectively, where each texture has a distinguishing geometric structure of texels corresponding to the spatial distribution of clusters in the MBIM.

The geometric structure can be used as a mask for sampling (retrieving) texels from a texture. For example, one can superimpose a mask at an arbitrary location on an image, the group of signals covered by the mask are related to a particular texel for the texture. In a same way, texels of the same structure but different signals are retrieved at different locations from the image. Since in our experiments most texels look like a bunch of image signals, this sampling method is also called in [40] bunch sampling.

Figure 6.9: Parameters, $ \Psi=(\theta_x,\theta_y, m, n)$, of the placement grid for regular texture D34.
\includegraphics[scale=0.4]{d34_grid.eps}

Figure 6.10: Tessellation of regular texture D34 by the placement grid and relative positions of bunches (texels), e.g., $ (0,0)$ for Bunch a and $ (\delta_{x},\delta_{y})$ for Bunch b.
\includegraphics[scale=0.3]{d34_tess.eps}

Placement Rule for Texels

Repetitions of various texels create periodicity, structure and symmetry in a texture. A special placement rule is derived to model spatial relationship between texels, i.e. how the image locations of multiple occurrences of a single texel or different texels are related in forming a texture pattern.

In a stochastic texture, texels appear with no explicit placement rules and have mostly weak spatial interrelations. As shown later, a synthetic texture of this class can be formed by sampling, repeating and randomly placing of the distinct training texels.

In contrast, a regular texture involves strict placement rules reflecting the underlying strong periodicity. Due to the assumption of translation-invariant pixel interaction, only translation symmetry is taken into account in estimating the placement rule for texels for textures in this class.

In a regular texture, there exists a underlying placement grid guiding the repetitions of texels. Each grid cell is a compact bounding parallelogram around a texel mask, specified by parameters $ \Psi=(\theta_x,\theta_y, m, n)$, where $ \theta_x$ and $ \theta_y$ are guiding angles that give orientations of cell sides with respect to the image coordinate axes, and $ m$ and $ n$ are the side lengths, i.e. the mask's spans along the orientational directions. The bounding parallelogram could be decided using an invariant fitting algorithm, which is related to a canonical frame of the texel mask [103]. Figure 6.9 shows a hexagonal texel mask of texture D34, its bounding parallelogram and related parameters $ \Psi$. In general, the grid with parallelogram cells represents any five possible types of translation lattice known in the theory of wallpaper groups for repeated patterns [89].

The placement grid is related to a coordinate system with non-orthogonal basis vectors, which leads to a hexagonal tessellation of the image plane [94]. Figure 6.10 shows a tessellation of texture D34 in line with a placement grid defined by the cell in Fig 6.9.

Figure 6.10 also shows that, with the tessellation, each texel is associated with a relative shift, $ (\delta_{x},\delta_{y})$, of its centre $ (x,y)$, with respect to the origin made coincident with the centre of the closest grid cell:

\begin{displaymath}\begin{array}{ccc} \delta_{x}& = &(y \cdot \sin \theta_y + x ...
...\cdot \cos \theta_y - x \cdot \sin \theta_x)\bmod n \end{array}\end{displaymath} (6.4.1)

The placement rule is to repeat each training texel at arbitrary locations having the same relative shift with respect to the placement grid. Due to an infinite number of absolute image locations related to a same relative shift, the rule reflects translation symmetry of regular textures. It also suggests that an arbitrarily large texture can be synthesised by expanding the image lattice in line with the infinite placement grid.

Comparison and Discussion

The structural identification yields a texel-based description that characterises a texture by texels and placement rules of their arrangement. Recently, several similar works describing a texture by its primitive elements have been proposed. However, instead of `texel', the majority of the known works use an alternative term `texton', suggested by Julesz, to refer to small objects or characteristic regions that comprise a texture. Research shows that only the difference in textons or in their density can be detected pre-attentively by human early visual system [54].

Motivated by Julesz's texton theory, a few recent works [63,71,100,101,102] employ a filter-based spectral analysis technique to relate textons to the centres of clusters in filter responses of a stack of training images. Conceptually, each texton, as a feature descriptor in the spectral domain, represents a particular statistical spectral feature describing repetitive patterns of a texture in the spatial domain. All the textons form a global texton dictionary, or a feature space, allowing to characterise a texture by an empirical probability distribution of the textons, i.e. the frequency with which each texton in the dictionary occurs in the texture. A nearest-neighbour classifier with similarity metrics based on chi-square distance between texton distributions, can classify textures into different categories. But since only the occurrences of a texton are taken into account, spatial information about relationships between the textons is completely lost in the description.

A texton-based generative model of image in [112] contains local constructs at three levels: pixels, image bases, and textons. An image base is a group of pixels forming a micro geometric element like a circle or a line. In this case, a texton is actually a mini-template consisting of a number of image bases of some geometric and photometric configurations. In an image, these textons are meaningful objects like stars or birds that could be observed. The probability model with parameters $ \Theta=\{\Psi,\Pi,\kappa\}$ is specified as follows:

$\displaystyle \Pr(g^\circ;\Theta)=\int{\Pr(g^\circ\vert\mathbf{B};\Psi)\Pr(\mathbf{B}\vert\mathbf{T};\Pi)\Pr(\mathbf{T};\kappa)d\mathbf{B}d\mathbf{T}}$ (6.4.2)

where $ \Psi$ and $ \Pi$ denote global base and texton maps containing all the image bases or textons, respectively, in the entire configuration space of images, $ \mathbf{B}$ and $ \mathbf{T}$ are the base and textons maps, respectively, specific to a particular image $ g^\circ$, and the probability distribution $ \Pr(\mathbf{T};\kappa)$ accounts for the textons and their spatial relationships in the image $ g^\circ$.

The MLE of the model parameters $ \Theta$ or the estimates minimising the Kullback-Leibler divergence are learned using a data-driven MCMC algorithm. Due to the complex likelihood function, the experiments were limited in several aspects in order to keep the problem tractable: (i) only a small number of image bases in the global base map, e.g., a few Laplacian-of-Gaussian and Gabor base functions; (ii) the spatially independent textons for simplicity, and (iii) only very simple textures with a priori obvious image bases and textons.

In both above mentioned and most of other known texton-based approaches, spatial relationship among the textons is either neglected or otherwise too difficult to represent. In contrast, the proposed structure identification provides a much simpler but yet more complete texture description.


next up previous
Next: Summary Up: Generic Markov-Gibbs Model and Previous: Model Identification
dzho002 2006-02-22