
Markov-Gibbs Random Field Models of Textures

A wide range of MGRF models have been proposed over the last several decades [6,45,20,34,13,7,75,37,113]. Essentially, an MGRF model considers an image as a realisation of a Markov random field (MRF). A major advantage of MGRF models is that they make it possible to derive a global texture description by specifying local properties of textures. In this section, the fundamental theory of MRFs, Gibbs distributions, and MGRF-based texture modelling is presented.

Images and Probabilities

Consider a two-dimensional $ M\times N$ lattice $ \mathbf{R}=\{i=(x,y):0\leqslant x \leqslant M-1;\;0\leqslant y \leqslant N-1\}$. If each site $ i \in \mathbf{R}$ is associated with a random variable $ \mathbf{S}_{i}\in\mathbf{Q}$, one obtains a random field, $ \mathbf{S}=\{\mathbf{S}_{i}, i \in \mathbf{R}\}$ [34]. A digital image $ {g}=\{{g}_i: i \in \mathbf{R}\}$ can be considered a realisation, or random sample, of the random field $ \mathbf{S}$, i.e. a particular assignment of a set of signal values to all sites of the lattice $ \mathbf{R}$ [41].

Since the random variables in a random field are considered statistically dependent, the joint probability distribution of the random variables for an image $ {g}$ defines a probability model of the image from the statistical viewpoint. The joint distribution is denoted by $ Pr(\mathbf{S}={g})$, or simply $ Pr({g})$, and represents the chance of observing the image $ {g}$ within the configuration space $ \mathcal{S}$ of all images associated with the random field.

A desired joint probability distribution for an image model should yield nearly zero probability for all images except a subset of interest around a given training image. Formally, the distribution is expected to resemble a $ \delta$-function having its maximum mode on, or in a close vicinity of, the training image.

If no prior knowledge is available, a probability model must enumerate all possible labellings of the lattice in order to derive a joint distribution. This is an intractable task, because the configuration space $ \mathcal{S}$ has combinatorial cardinality, $ \vert\mathcal{S}\vert = {Q}^{\vert\mathbf{R}\vert}$. If each site is assumed to be independent of all others, i.e. an independent random field (IRF), the probability model is specified by the uniform joint distribution, $ Pr({g})=\frac{1}{{Q}^{\vert\mathbf{R}\vert}}$. This probability is uninformative, and the resulting image model is not particularly useful.
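
A minimal arithmetic check of this cardinality, assuming a hypothetical $16\times 16$ lattice with $Q=8$ grey levels: even such a tiny configuration space is far too large to enumerate, and the uniform IRF probability of any single image is vanishingly small.

    M, N, Q = 16, 16, 8                 # assumed lattice size and number of grey levels
    num_sites = M * N                   # |R|
    cardinality = Q ** num_sites        # |S| = Q^|R|

    print(f"|R| = {num_sites}; |S| = Q^|R| has {len(str(cardinality))} decimal digits")
    print(f"uniform IRF probability of any single image: {Q ** -float(num_sites):.3e}")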

Most real-life probability models assume certain local dependencies among the random variables/image pixels. The conditional probability of a pixel $ {g}_i$ given all other pixels $ {g}^i$, denoted by $ Pr({g}_i \mid {g}^i)$, is commonly exploited to represent local dependency. In this case, a random field is represented by an undirected graph $ \mathbf{G}$ in which each vertex represents a random variable, while an edge represents the statistical dependency between the two connected vertices. The absence of an edge between two vertices in the graph implies that the associated random variables are conditionally independent given all the other random variables.

Within such a graphical structure, the probability model of a random field can be factorised into a normalised product of strictly positive, real-valued potential functions defined on subsets of connected random variables. Each subset forms a local neighbourhood with a particular spatial configuration, and the image corresponds to a global configuration of all random variables. A potential function has no probabilistic meaning other than representing constraints on the local configuration on which it is defined [60]. This graphical representation indicates that the probability distribution of an image can be specified in terms of potential functions over local neighbourhoods, which reduces the complexity of the global image modelling problem. This is the main idea behind applying MRF models to texture modelling.

Markov-Gibbs Random Fields

MGRF-based texture models have been successfully applied to texture analysis [6,45,20,34,13,7,75,37,113], texture classification and segmentation [23,24,73], and texture synthesis [29,104,65,59]. In this section, definitions and general theory related to MGRFs are introduced.

Markov Random Field: the Definition

Define a neighbourhood $ \mathcal{N}_i \subset \mathbf{R}^i$ as the set of all neighbouring sites of a site $ i \in \mathbf{R}$. A random field is an MRF, if
  1. The joint probability distribution $ Pr({g})$ is strictly positive for each sample in the configuration space $ \mathcal{S}$, $ Pr({g})>0$, and
  2. For each site $ i \in \mathbf{R}$,

    $\displaystyle Pr({g}_i \mid {g}^i)= Pr({g}_i \mid {g}_{j}:j \in \mathcal{N}_i)$ (5.2.1)

Here, the first condition is known as the positivity of an MRF, which is a trivial requirement. The second condition is known as the Markov property, or Markovianity. The equivalence between the conditional probability $ Pr({g}_i \mid {g}^i)$ of the signal value $ {g}_i$ given all the other signal values $ {g}^i=\{{g}_j :j \in \mathbf{R}^i= \mathbf{R}-\{i\} \}$ and the conditional probability $ Pr({g}_i \mid {g}_{j}:j \in \mathcal{N}_i)$ of the signal value $ {g}_i$ given only the signal values of the neighbouring sites $ \mathcal{N}_i$ implies that the pixel value at a site depends only on the values of its neighbouring sites.

The Markov property describes the conditional dependence of image signals within a local neighbourhood system, which has proved useful in capturing and modelling the usually highly localised and coherent features of textures. Another appealing property of an MRF is that, by the Hammersley-Clifford theorem [42], an MRF can be characterised by a global Gibbs distribution.

Gibbs Distribution

In general, a Gibbs distribution takes the following form,

$\displaystyle Pr({g})=\frac{1}{Z} \exp\left \{-E({g})\right \},$ (5.2.2)

$\displaystyle Z=\sum_{{g} \in \mathcal{S}}\exp\left \{-E({g})\right \}$ (5.2.3)

where $ Z$ and $ E({g})$ are the partition function and the energy function, respectively.

The partition function is also known as the normalising constant. Because it is defined over the entire configuration space, the partition function is intractable, which raises difficulties in estimating the parameters of the density function of a Gibbs distribution. A random field having a Gibbs distribution is called a Gibbs random field (GRF).

A Gibbs distribution is usually defined with respect to cliques, i.e. proper subgraphs of a neighbourhood graph on the lattice. A clique is a particular spatial configuration of pixels in which all members are statistically dependent on each other. All the statistical links between pixels in an image constitute a neighbourhood graph, or system, denoted by $ \mathcal{N}$. Cliques and the neighbourhood system model the spatial geometry of pixel interactions, which is particularly useful in texture modelling. All spatial configurations of cliques related to the 8-nearest neighbourhood system are shown in Fig. 5.2. Formally, a clique is defined as follows,

Definition 5.2.1   Given a neighbourhood system $ \mathcal{N}$, a clique $ C \subset \mathbf{R}$ is either a single site or a subset of sites in which every pair of distinct sites are neighbours of each other.

Figure 5.2: Cliques in an 8-nearest neighbourhood system.

The number of pixels in a clique specifies the statistical order of a Gibbs distribution and of the related texture model. For texture modelling, first-order and second-order cliques are used most frequently, because higher-order cliques cause too much computational complexity. The first-order and second-order cliques over the image lattice $ \mathbf{R}$ are denoted by $ \mathbf{C}_1$ and $ \mathbf{C}_2$, respectively:

$\displaystyle \mathbf{C}_1\equiv\mathbf{R} =\{i:i\in \mathbf{R}\}$

$\displaystyle \mathbf{C}_2=\{(i,j):i,j\in \mathbf{R},\; i\neq j,\; j \in \mathcal{N}_i,\; i \in \mathcal{N}_j\}$

Statistical dependence between pixels within a clique is also referred to as pixel interaction. Let a non-constant function $ \emph{V}_C({g}_i:i\in C)$ denote the strength of the pixel interaction in a clique $ C$. The function $ \emph{V}$ is also known as a clique potential. The energy function $ E({g})$ in Eq. (5.2.2) can be factorised into the sum of clique potentials $ \emph{V}_C({g}_i:i\in C)$ over the set of cliques $ \mathbf{C}$, so that

$\displaystyle E({g})=\sum_{C \in \mathbf{C}}V_C({g}_i: i \in C)$

Hence, Eq. (5.2.2) can be rewritten in terms of clique potentials [41],

$\displaystyle Pr({g})=\frac{1}{Z} \exp \left \{-\sum_{C \in \mathbf{C}}V_C({g}_i: i \in C) \right\}$ (5.2.4)
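
As a toy illustration of Eqs. (5.2.2)-(5.2.4), the following sketch (with assumed potentials, not a model from the literature) evaluates the Gibbs energy of a small image by summing an assumed first-order potential over single sites and an assumed Potts-like second-order potential over horizontal and vertical pixel pairs, and then forms the unnormalised probability $\exp\{-E({g})\}$.

    import numpy as np

    def gibbs_energy(g, alpha=0.1, beta=0.5):
        """Toy Gibbs energy E(g) = sum_i V1(g_i) + sum_{(i,j)} V2(g_i, g_j) with assumed
        potentials V1(q) = alpha*q and V2(q, q') = beta*[q != q'] over the second-order
        cliques of the 4-nearest neighbourhood (horizontal and vertical pairs)."""
        first_order = alpha * g.sum()
        horizontal = beta * np.count_nonzero(g[:, :-1] != g[:, 1:])
        vertical = beta * np.count_nonzero(g[:-1, :] != g[1:, :])
        return first_order + horizontal + vertical

    rng = np.random.default_rng(0)
    g = rng.integers(0, 4, size=(8, 8))      # a random 8x8 image with Q = 4 grey levels
    E = gibbs_energy(g)
    print("E(g) =", E, " unnormalised Pr(g) ~ exp(-E(g)) =", np.exp(-E))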

Markov-Gibbs Equivalence

The Hammersley-Clifford theorem asserts that a random field is a Markov random field if and only if the corresponding joint probability distribution is a Gibbs distribution [42]. The theorem, also known as the MRF-Gibbs distribution equivalence, has been proved by Grimmett [41], Besag [6], and Geman and Geman [34], respectively.

The theorem establishes the equivalence between an MRF and a GRF, which provides a factorisation of the global probability distribution $ Pr({g})$ in terms of local interaction functions defined with respect to cliques. In particular, the joint probability $ Pr({g})$ and the conditional probabilities $ Pr({g}_i \mid {g}^i)$ specify each other.

By the Hammersley-Clifford theorem, the conditional probability of an MRF, $ Pr({g}_i \mid {g}^i)$, can be specified in the following form [34], in terms of the clique potentials $ V_c$.

$\displaystyle Pr({g}_i \mid {g}^i) = \frac{1}{Z_i} \exp \left \{\sum_{\emph{C} \in \mathbf{C}} V_c({g}_i,{g}_j:i,j \in \emph{C};i\neq j) \right \}$ (5.2.5)

$\displaystyle Z_i=\sum_{{g}_i \in \mathbf{Q}} {\exp \left \{ \sum_{\emph{C} \in \mathbf{C}} V_c({g}_i,{g}_j:i,j \in \emph{C};i\neq j) \right \}}$ (5.2.6)

Similarly, the clique potential $ V$ of the Gibbs distribution can also be derived from the conditional probability $ Pr({g}_i \mid
{g}^i)$ of an MRF but in a more complicated form. An MRF is uniquely related to a normalised potential, called the canonical potential [64].
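
Following Eqs. (5.2.5)-(5.2.6), the sketch below (using the same assumed Potts-like pairwise potential as the previous sketch, and the $\exp\{-E\}$ sign convention of Eq. (5.2.2)) computes the univariate conditional distribution $Pr({g}_i \mid {g}^i)$ at one site by normalising over the $Q$ grey levels; only the cliques containing the site contribute.

    import numpy as np

    def local_conditional(g, i, j, Q=4, beta=0.5):
        """Pr(g_{ij} = q | all other pixels) for an assumed pairwise potential
        V2(q, q') = beta*[q != q'] over the 4-nearest neighbours; the local
        partition function Z_i normalises over the Q candidate grey levels."""
        H, W = g.shape
        neighbours = [g[i + di, j + dj]
                      for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= i + di < H and 0 <= j + dj < W]
        energies = np.array([beta * sum(q != n for n in neighbours) for q in range(Q)])
        p = np.exp(-energies)
        return p / p.sum()

    rng = np.random.default_rng(1)
    g = rng.integers(0, 4, size=(8, 8))
    print("Pr(g_{3,3} = q | neighbours) =", np.round(local_conditional(g, 3, 3), 3))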

In summary, Markov-Gibbs equivalence provides a theoretical basis for texture modelling based on MGRFs, which makes it possible to employ global characteristics in modelling a field while exploiting local characteristics in processing it [34].

Texture Modelling Based on MGRFs

Texture modelling based on MGRFs is mainly about choosing ``appropriate potential functions for desired system behaviour'' [64]. In a typical modelling scenario, an MRF is defined over the image lattice to encapsulate and represent domain knowledge and observations about the texture to be modelled, e.g., by specifying a neighbourhood system and the related conditional probabilities. Then, by Markov-Gibbs equivalence, a Gibbs distribution of the MRF is derived as a texture model. The resulting MGRF model is often parameterised by a function consisting of two basic components, namely, observable variables and model parameters. The quality of a model depends on how well the two components are represented and integrated.

The parametric form of the Gibbs distribution in Eq. (5.2.2) with parameters $ \Lambda$ is given as follows:

$\displaystyle Pr({g} \mid \Lambda)=\frac{1}{Z} \exp \left \{-\sum_{C \in \mathbf{C}}V_C({g}_i:i \in C \mid \Lambda) \right\}$ (5.2.7)

Here, $ Pr({g} \mid \Lambda)$ is called the likelihood of $ \Lambda$ with respect to the samples.

Observable Variables

are quantitative measures of meaningful observations, i.e. texture features, from an image. In most cases, the observable variables are assumed to be sufficient statistics of an MGRF model, which summarise all the model-relevant information retrieved from a training texture. Textures having the same sufficient statistics are indistinguishable with respect to the model. A relatively small set of sufficient statistics, among the many statistics that could be computed from the sample, enables a compact texture model, but it also highlights the importance of selecting and validating appropriate sufficient statistics in modelling.
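
As a concrete example of observable variables, the following sketch (assuming a single, arbitrarily chosen inter-pixel offset) accumulates a grey-level co-occurrence histogram, the kind of statistic used as the sufficient statistics of the generic MGRF model mentioned later in this section.

    import numpy as np

    def cooccurrence_histogram(g, offset=(0, 1), Q=8):
        """Empirical frequency of grey-level pairs (g[p], g[p + offset]) over all
        valid sites -- one possible set of observable variables for a texture."""
        dy, dx = offset
        a = g[max(0, -dy):g.shape[0] - max(0, dy), max(0, -dx):g.shape[1] - max(0, dx)]
        b = g[max(0, dy):g.shape[0] + min(0, dy), max(0, dx):g.shape[1] + min(0, dx)]
        hist = np.zeros((Q, Q))
        np.add.at(hist, (a.ravel(), b.ravel()), 1)   # count each co-occurring pair
        return hist / hist.sum()                     # normalise to frequencies

    rng = np.random.default_rng(2)
    g = rng.integers(0, 8, size=(64, 64))
    F = cooccurrence_histogram(g)
    print("co-occurrence histogram:", F.shape, " total frequency =", F.sum())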

Model Parameters

characterise the distribution of the observed variables from an image and are to be estimated in order to determine the posterior probability distribution for the texture. The process of parameter estimation fits the model to a sample texture and is also known as model identification. Recalling the Gibbs distribution and the partition function in Eqs. (5.2.2) and (5.2.3), the mathematical structure of the model function is non-trivial because, first, the Gibbs distribution involves a multivariate probability density and, second, the partition function $ Z$, defined over the entire configuration space of the distribution, is intractable both analytically and numerically. Therefore, it is inevitably difficult to estimate the parameters of an MGRF model directly. In statistics, such a problem of learning the parameters of an unknown probability distribution from observed data generated by that distribution is solved in a framework of probabilistic inference by methods like Gibbs sampling [34].

Model Identification

Model identification is usually formulated as a global optimisation problem that maximises either the entropy or the likelihood of probability distributions.

Maximum Entropy Principle

If domain knowledge and observed statistics are represented as a set of constraints, the maximum entropy (MaxEnt) principle can be applied to estimate the parameters of model distributions.

The MaxEnt principle states that the desired probability distribution should be the one that maximises the entropy subject to a set of constraints [52,60]. The entropy of a probability distribution measures its uncertainty [92] and is defined as:

$\displaystyle H(Pr) = - \sum_{i}{Pr_i \log (Pr_i)}$

The MaxEnt principle is formulated as an optimisation problem that maximises $ H(Pr)$,

$\displaystyle Pr^*=\displaystyle\operatornamewithlimits{arg max}_{Pr\in \mathcal{P}}{H(Pr)}$ (5.2.8)

subject to the constraints,

$\displaystyle \mathds{E}_{Pr}{(f_i)}=\mathds{E}_{{Pr}^{\ast}}(f_i):\forall{f_i}\in {F=\{f_i:i=1..n\}}$ (5.2.9)

Here, $ \mathcal{P}$ denotes a set of allowed probability distributions, $ \mathds{E}_{Pr}(f_i)$ is the mathematical expectation of a feature $ f_i$ with respect to the model distribution $ Pr$, and $ \mathds{E}_{{Pr}^{\ast}}(f_i)$ is the empirical feature value computed from the training sample that can be considered also as the expectation of a feature $ f_i$ with respect to the empirical distribution $ {Pr}^{\ast}$.

This constrained optimisation problem can be solved by maximising the following Lagrangian,

$\displaystyle \Psi(Pr,\Lambda)= H(Pr)+\sum_{i}\lambda_i({\mathds{E}_{Pr}(f_i)-\mathds{E}_{{Pr}^{\ast}}(f_i)})$ (5.2.10)

where $ \Lambda=\{\lambda_i:i=1..n\}$ are the Lagrange factors.

The resulting MaxEnt distribution is as follows,

$\displaystyle Pr^*({g})=\frac{1}{Z(\Lambda^*)}\exp\left(\sum_{i}{\lambda^*_i\cdot f_i({g})}\right)$     (5.2.11)
$\displaystyle Z({\Lambda}^*)=\sum_{{g} \in \mathcal{S}}\exp\left(\sum_{i}{\lambda^*_i\cdot f_i({g})}\right)$     (5.2.12)

Here, $ Z(\Lambda^*)$ is the normalising constant. The obtained parameters $ \Lambda^*=\{\lambda^*_i:i=1..n\}$ maximise the Lagrangian $ \Psi(Pr,\Lambda)$ in Eq. (5.2.10) and therefore the entropy $ H(Pr)$. Without any constraints, MaxEnt yields a uniform distribution giving equal probabilities to all possible images; otherwise it yields the distribution closest to the uniform one that satisfies the constraints. Since images are modelled in terms of a set of sufficient statistics, the constraints force the model to reproduce the empirical feature values of the training data [60].
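
A minimal numerical illustration of Eqs. (5.2.8)-(5.2.12), assuming a hypothetical four-state space and two arbitrary binary features with prescribed empirical expectations: gradient ascent on the Lagrange multipliers drives the model expectations towards the constraints, and the result has the exponential-family (Gibbs) form of Eq. (5.2.11).

    import numpy as np

    # Assumed toy setting: 4 states, 2 binary features f_i(x), and target expectations
    # playing the role of the empirical feature values of a training sample.
    features = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
    target = np.array([0.7, 0.4])

    lam = np.zeros(2)                          # Lagrange multipliers lambda_i
    for _ in range(2000):
        unnorm = np.exp(features @ lam)        # exp(sum_i lambda_i f_i(x)), Eq. (5.2.11)
        pr = unnorm / unnorm.sum()             # MaxEnt distribution over the 4 states
        lam += 0.5 * (target - pr @ features)  # raise lambda_i if f_i is under-represented
    print("lambda* =", np.round(lam, 3), " E_Pr[f] =", np.round(pr @ features, 3))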

The similar forms of the probability distributions in Eqs. (5.2.7) and (5.2.11) suggest that the Gibbs distribution of an MGRF belongs to the exponential families and is a MaxEnt distribution. In fact, all distributions belonging to the so-called exponential families [2] possess the MaxEnt property. In particular, a generic MGRF model is specified by a MaxEnt distribution constrained by the training sufficient statistics, i.e. co-occurrence frequency distributions.

The MaxEnt principle suggests an alternative approach to texture modelling. One can first select sufficient statistics to characterise textures and then infer the MaxEnt distribution subject to the constraints that the marginal probability of the sufficient statistics with respect to the model should match the empirical probability of the same statistics in the training data. As introduced later, both the generic MGRF [37,36] and the FRAME [113] models use this strategy to derive their MaxEnt distributions, but the former derivation refers to the exponential families, while the latter does not mention the exponential families at all and derives its sufficient statistics from scratch.

Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is one of the most commonly used procedures for estimating the unknown parameters of a distribution. MLE is based on the likelihood principle, which suggests that the model parameters should maximise the probability of the observation, e.g., a training image, over the configuration space.

Given the parametric form of a Gibbs distribution $ Pr({g}\vert\Lambda)$ in Eq. (5.2.7), the likelihood function for $ \Lambda$ is given by,

$\displaystyle \mathcal{L}(\Lambda \mid {g})\equiv Pr({g}\vert\Lambda)$

The maximum likelihood estimator is defined by,

$\displaystyle {\Lambda}^{*}=\displaystyle\operatornamewithlimits{arg max}_{\Lambda}{\mathcal{L}(\Lambda \mid {g})}$

If $ \mathcal{L}(\Lambda\mid{g})$ is differentiable, the parameters $ {\Lambda}^{*}$ of the MLE, given a training image $ {g}^{\circ}$, are computed by solving

$\displaystyle \frac{\partial{\mathcal{L}(\Lambda \mid {{g}^{\circ}})}}{\partial {\Lambda}}=0$

Usually, the following log-likelihood function, corresponding to Eq. (5.2.7), is used in the differentiation,

$\displaystyle \mathcal{L}(\Lambda \mid {g}^{\circ} )=-\log{Z(\Lambda)}-\sum_{C \in \mathbf{C}}V_C({g}_i^{\circ}:i \in C \mid \Lambda)$ (5.2.13)

However, due to the intractable normalisation constant $ Z(\Lambda)$ in Eq. (5.2.13), numerical techniques such as Markov Chain Monte Carlo (MCMC) algorithms have to be used for estimating $ {\Lambda}^{*}$. As shown in [4], the MaxEnt method is equivalent to the MLE in estimating distributions in exponential families and therefore Gibbs distributions.

Parameter Estimation via MCMC

An MCMC algorithm makes it possible to simulate a probability distribution by constructing a Markov chain that has the desired distribution as its stationary distribution. A Markov chain, defined over a set of sequential states, is a one-dimensional case of an MRF. Each state of the chain corresponds to a particular assignment of all the involved random variables on which the stationary distribution is defined. The transition from state $ t-1$ to state $ t$ is controlled by the transition probability, $ Pr({g}^{[t]} \mid {g}^{[t-1]})$, reflecting the Markov property of the chain,

$\displaystyle Pr({g}^{[t]} \mid {g}^{[u]}, u \neq t)=Pr({g}^{[t]} \mid {g}^{[t-1]})$ (5.2.14)

The MCMC algorithm iteratively updates the Markov chain according to the transition probability from one state to another. Eventually, the chain attains equilibrium, when the joint probability distribution of the current state approaches the stationary distribution. The parameters that lead to the stationary distribution are taken as the model parameters learnt for the particular training image.

In the case of texture modelling, each state of a Markov chain corresponds to an image regarded as a random sample drawn from the joint distribution. The image obtained at equilibrium is expected to resemble the training image. In other words, parameter estimation via MCMC algorithms simulates the generative process of a texture and therefore synthesises textures.

Each state of a Markov chain is obtained by sampling a probability distribution. Among the various sampling techniques, the Metropolis algorithm [77] and the Gibbs sampler [34] are the two best known. The Metropolis algorithm explores the entire configuration space by a random walk. At each step, the algorithm performs a random modification of the current state to obtain a new state. The new state is either accepted or rejected with a probability computed from the energy change of the transition. The states generated by the algorithm form a Markov chain. Algorithm 1 outlines a basic Metropolis algorithm.


Algorithm 1 (Metropolis algorithm): repeat { randomly modify the current image $ {g}$ to obtain a candidate $ {g}^{\ast}$; accept or reject $ {g}^{\ast}$ according to the acceptance probability determined by the energy change } until equilibrium is attained.
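
The sketch below is one possible reading of Algorithm 1: a minimal single-site Metropolis sampler for the assumed Potts-like energy used in the earlier sketches (not the thesis' exact implementation). A random modification of one pixel is accepted with probability $\min(1, \exp\{-\Delta E\})$.

    import numpy as np

    def metropolis_step(g, rng, Q=4, beta=0.5):
        """One Metropolis update for an assumed Potts-like energy: propose a new grey
        level at a random site and accept it with probability min(1, exp(-delta_E))."""
        H, W = g.shape
        i, j = rng.integers(H), rng.integers(W)
        old, new = g[i, j], rng.integers(Q)
        nbrs = [g[i + di, j + dj] for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < H and 0 <= j + dj < W]
        delta_E = beta * (sum(new != n for n in nbrs) - sum(old != n for n in nbrs))
        if rng.random() < np.exp(-delta_E):    # always accepted when delta_E <= 0
            g[i, j] = new

    rng = np.random.default_rng(3)
    g = rng.integers(0, 4, size=(32, 32))
    for _ in range(20000):                     # iterate towards (approximate) equilibrium
        metropolis_step(g, rng)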

The Gibbs sampler is a special case of the Metropolis algorithm that generates new states by using univariate conditional probabilities [64]. Because direct sampling from the complex joint distribution of all random variables is difficult, the Gibbs sampler instead simulates the random variables one by one from their univariate conditional distributions. A univariate conditional distribution involves only one random variable, conditioned on the remaining variables whose values are held fixed, and usually has a simple mathematical form. For instance, the conditional probability $ Pr({g}_i\vert{{g}^i})$ of an MRF is a univariate conditional distribution. A typical Gibbs sampling algorithm [34] is outlined in Algorithm 2.


Algorithm 2 (Gibbs sampling algorithm): repeat { for each site $ i \in \mathbf{R}$, draw a new value $ {g}_i$ from the conditional distribution $ Pr({g}_i\vert{{g}^i})$ } until equilibrium is attained.
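
Correspondingly, a minimal sketch of Algorithm 2 for the same assumed Potts-like conditional: each site is visited in turn and its grey level is redrawn from the univariate conditional distribution $Pr({g}_i \mid {g}^i)$.

    import numpy as np

    def gibbs_sweep(g, rng, Q=4, beta=0.5):
        """One full sweep of the Gibbs sampler: every pixel is resampled from its
        univariate conditional distribution given the current neighbouring values."""
        H, W = g.shape
        for i in range(H):
            for j in range(W):
                nbrs = [g[i + di, j + dj] for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= i + di < H and 0 <= j + dj < W]
                energies = np.array([beta * sum(q != n for n in nbrs) for q in range(Q)])
                p = np.exp(-energies)
                g[i, j] = rng.choice(Q, p=p / p.sum())

    rng = np.random.default_rng(4)
    g = rng.integers(0, 4, size=(32, 32))
    for _ in range(50):                        # repeat sweeps until equilibrium is attained
        gibbs_sweep(g, rng)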

In summary, an MCMC method learns a probability distribution $ Pr({g})$ by simulating the distribution via a Markov chain. The Markov chain is specified by three probabilities, an initial probability $ Pr({g}^{[0]})$, the transition probability $ Pr({g}^{[t]} \mid {g}^{[t-1]})$, and the stationary probability $ Pr({g})$. The MCMC process is specified by the following equation,

$\displaystyle Pr({g}^{[n]}) = Pr({g}^{[0]})\prod_{t=1}^{n} Pr({g}^{[t]} \mid {g}^{[t-1]})$ (5.2.15)

Auto-models

Auto-models [6] are the simplest MGRF models used in texture analysis; they are defined over rather simple neighbourhood systems with ``contextual constraints on two labels'' [64]. The Gibbs energy function of an auto-model involves only first- and second-order cliques and has the following general form [25],

$\displaystyle E({g})=\sum_{i \in \mathbf{R}}{\mathbf{\alpha}_i({g}_i) \cdot {g}_i}+ \sum_{(i,j) \in C}\beta_{ij} \cdot {{g}_i}{{g}_j}$ (5.2.16)

where $ \mathbf{\alpha}_i$ is an arbitrary function, $ \beta_{ij}$ specifies the interaction between sites $ i$ and $ j$, and $ C$ is the set of all second-order cliques. The model parameters to be estimated are $ \{\mathbf{\alpha}_i({g}_i)\}$ and $ \{\beta_{ij}\}$. Due to their simplicity, auto-models involve a relatively low computational cost in model creation and identification.

There are several variations of auto-models, such as the auto-binomial model [20] and the auto-normal model [13,18], each of which assumes a slightly different model of pair-wise pixel interaction.

Auto-binomial Model

assumes that the conditional probability $ Pr({g}_i \vert {g}^i)$ of a pixel follows a binomial distribution. That is, a pixel $ {g}_i$, taking values from the set $ \mathbf{Q}$, is binomially distributed over $ Q$ independent random trials, with the probability of success $ q$ given by

$\displaystyle q=\frac{\exp\{\alpha_i+\sum_{j \in \mathcal{N}_i}\beta_{ij}{{g}_j}\}}{1+\exp\{\alpha_i+\sum_{j \in \mathcal{N}_i}\beta_{ij}{{g}_j}\}}.$

Hence, the conditional distribution is given by,

$\displaystyle Pr({g}_i \mid {g}^i)=\mathcal{C}_{Q}^{{g}_i}q^{{g}_i}(1-q)^{Q-{g}_i}$ (5.2.17)

By Markov-Gibbs equivalence, the Gibbs distribution is specified by the energy function,

$\displaystyle E({g})=\sum_{i \in \mathbf{R}} \ln \mathcal{C}_{Q}^{{g}_i} + \sum_{i \in \mathbf{R}} {\alpha}_i{{g}_i}+ \sum_{\{i,j\} \in C}\beta_{ij}{{g}_i}{{g}_j}$ (5.2.18)

In an auto-binomial model, the model parameters are the sets $ \{\alpha_i\}$ and $ \{\beta_{ij}\}$. The set $ \{\alpha_i\}$ has a fixed size, but the size of the set $ \{\beta_{ij}\}$ depends on the order of the selected neighbourhood system. To simplify parameter estimation, one can reduce the number of parameters, for instance, by assuming a uniform value $ {\alpha}$ for all the sites and using only the simplest neighbourhood system, i.e. the 4- or 8-nearest neighbours.
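
A minimal sketch of the auto-binomial conditional in Eq. (5.2.17), assuming a uniform $\alpha$, a single $\beta$ shared by the 4-nearest neighbours, and $Q = 7$; the logistic expression for $q$ and the binomial probabilities over the levels $0,\dots,Q$ are evaluated directly.

    import numpy as np
    from math import comb, exp

    def autobinomial_conditional(neighbours, Q=7, alpha=-2.0, beta=0.02):
        """Pr(g_i = q | g^i) for the auto-binomial model: the success probability
        comes from the logistic form, then a Binomial(Q, q_success) distribution
        is evaluated over the candidate levels 0..Q (cf. Eq. (5.2.17))."""
        s = alpha + beta * sum(neighbours)
        q_success = exp(s) / (1.0 + exp(s))
        return np.array([comb(Q, q) * q_success**q * (1.0 - q_success)**(Q - q)
                         for q in range(Q + 1)])

    print(np.round(autobinomial_conditional([3, 5, 4, 6]), 3))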

Auto-normal Models

[13,18] assume that the conditional probability $ Pr({g}_i \vert {g}^i)$ of a pixel given its neighbourhood is a normal distribution, specified as follows,

$\displaystyle Pr({g}_i \mid {g}^i)=\frac{1}{\sqrt{2\pi{\sigma}^2}}\exp\{-\frac{1}{2{\sigma}^2}[{g}_i-\mu({g}_i \mid {g}^i)]^2\}$ (5.2.19)

Here, $ \mu({g}_i \mid {g}^i)=\mu_i-\sum_{j \in \mathcal{N}_i}\beta_{ij}({g}_j-\mu_j)$ is the conditional mean and $ \sigma^2$ is the conditional variance.

By Markov-Gibbs equivalence, an auto-normal model is specified by the following joint probability density:

$\displaystyle Pr({g})=\frac{\sqrt{\vert\mathbf{B}\vert}}{\sqrt{(2\pi{\sigma}^2)^{N}}}\exp \left\{-\frac{1}{2{\sigma}^2}({g}-\mathbf{\mu})^{\mathrm{T}}\mathbf{B}({g}-\mathbf{\mu}) \right\}$ (5.2.20)

where $ \mathbf{\mu}$ is an $ N \times 1$ vector of the conditional means, and $ \mathbf{B}$ is an $ N \times N$ interaction matrix with unit diagonal elements whose off-diagonal element at position $ (i,j)$ is $ -\beta_{ij}$. Since the model distribution in Eq. (5.2.20) is both Gibbsian and Gaussian, an auto-normal model is also known as a Gaussian-Markov random field model.

An auto-normal model defines a continuous Gaussian MRF. The model parameters, $ \mathbf{B}$, $ \mathbf{\mu}$, and $ \sigma$, can be estimated using either MLE or pseudo-MLE methods [7]. In addition, a technique representing the model in the spatial frequency domain [13] has also been proposed for parameter estimation.
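
A minimal sketch of the auto-normal conditional density in Eq. (5.2.19), assuming a common mean $\mu$, a single interaction weight $\beta$ shared by the 4-nearest neighbours, and a given conditional variance $\sigma^2$; the sign convention of the conditional mean follows the formula above.

    import numpy as np

    def autonormal_conditional(g_i, neighbours, mu=128.0, beta=-0.2, sigma=10.0):
        """Conditional density Pr(g_i | g^i) of the auto-normal model: a normal
        distribution around mu_cond = mu - sum_j beta*(g_j - mu), variance sigma^2."""
        mu_cond = mu - beta * sum(n - mu for n in neighbours)
        return np.exp(-((g_i - mu_cond) ** 2) / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)

    print(f"Pr(g_i = 130 | neighbours) = {autonormal_conditional(130, [120, 125, 135, 140]):.4f}")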

FRAME model

FRAME (short for Filters, Random Fields, And Maximum Entropy) [113] is an MGRF model constructed from the empirical marginal distributions of filter responses on the basis of the MaxEnt principle. The FRAME model is derived from Theorem 5.2.1, which asserts that the joint probability distribution of an image is decomposable into the marginal distributions of the related filter responses. The proof of the theorem is given in [113].

Theorem 5.2.1   Let $ Pr({g})$ be the $ \vert\mathbf{R}\vert$-dimensional continuous joint probability distribution of a texture. Then $ Pr({g})$ is a linear combination of $ Pr^{(\xi)}$, the latter being the marginal distributions of the linear filter responses $ F^{(\xi)}*{{g}}$.

The theorem suggests that the joint probability distribution $ Pr({g})$ can be inferred by constructing a probability $ Pr^{\ast}({g})$ whose marginal distributions $ Pr^{(\xi)}$ match those of $ Pr({g})$. Although, theoretically, an infinite number of marginal distributions (filters) are involved in the decomposition of a joint distribution, the FRAME model assumes that a relatively small number of important filters (a filter bank), $ \mathbf{F}=\{F^{(\alpha)}:\alpha= 1,...,k\}$, is sufficient to specify the model distribution $ Pr({g})$. Within the MaxEnt framework, by Eq. (5.2.8), the modelling is formulated as the following optimisation problem,

$\displaystyle Pr^{*}({g})= \operatornamewithlimits{arg max}_{Pr} \left \{- \int Pr({g})\log Pr({g})d{g}\right\}$ (5.2.21)

subject to constraints:

$\displaystyle {\mathcal{E}}_{Pr}[\delta(z-{g}^{(\alpha)}(v))]=Pr^{(\alpha)}(z): \quad \forall z\in \mathbf{Q},\quad \forall \alpha = 1,...,k,\quad \forall v \in \mathbf{R}$ (5.2.22)

and

$\displaystyle \int Pr({g})d{g}=1$ (5.2.23)

In Eq. (5.2.22), $ Pr^{(\alpha)}(z)$ denotes the marginal distribution of $ Pr({g})$ with respect to the filter $ F^{(\alpha)}$ at location $ v$, and by definition,

$\displaystyle Pr^{(\alpha)}_{v}(z)=\int\;\int_{{g}^{(\alpha)}(v)=z}{Pr({g})d{g}}={\mathcal{E}}_{Pr}[\delta(z-{g}^{(\alpha)}(v))]$ (5.2.24)

where $ \delta(.)$ is the Dirac delta function and $ {\mathcal{E}}_{Pr}[.]$ denotes the mathematical expectation of a particular function under the probability $ Pr({g})$. By the assumption of translation invariance, the marginal distribution $ Pr^{(\alpha)}_{v}(z)$ is independent of the location $ v$. Therefore, the first constraints are obtained by replacing $ Pr^{(\alpha)}_{v}(z)$ with $ Pr^{(\alpha)}(z)$ in Eq. (5.2.24).

The second constraint in Eq. (5.2.23) is the normalising condition of the joint probability distribution $ Pr({g})$.

The MaxEnt distribution $ Pr^*({g})$ is found by maximising the entropy using Lagrange multipliers:

  1. Construct a Lagrange function $ \mathcal{L}(Pr({g}))$ as follows:

    $\displaystyle \mathcal{L}(Pr({g})) = -\int Pr({g}) \log Pr({g})\:d{{g}} + \sum_{\alpha =1}^{k} \lambda^{(\alpha)} \cdot\left( {\mathcal{E}}_{Pr}[\delta(z-{g}^{(\alpha)}(v))]-Pr^{(\alpha)}(z)\right) + \lambda^{(k+1)} \cdot \left(\int Pr({g})\:d{{g}} -1\right)$

  2. Differentiate $ \mathcal{L}(Pr({g}))$ with respect to $ Pr({g})$:

    $\displaystyle \frac {\partial \mathcal{L}(Pr({g}))}{\partial Pr({g})} = \frac {\partial \mathcal{L}(Pr({g}))}{\partial {g}} \cdot \frac{1}{\frac{dPr({g})}{d{g}}} = \frac{Pr({g})}{Pr^{'}({g})} \cdot \left( - \log Pr({g}) + \sum_{\alpha =1}^{k} \lambda^{(\alpha)} \cdot \delta(z-{g}^{(\alpha)}(v))+ \lambda^{(k+1)}\right)$

  3. By solving the equation $ \frac {\partial \mathcal{L}(Pr({g}))}{\partial Pr({g})}=0$ with respect to $ Pr({g})$, the resulting MaxEnt distribution is:

    $\displaystyle Pr^*({g}) = \exp\left\{ \sum_{\alpha =1}^{k} \lambda^{(\alpha)} \cdot \delta(z-{g}^{(\alpha)}(v))+ \lambda^{(k+1)}\right\} = \frac{1}{Z} \cdot \exp\left\{ \sum_{\alpha =1}^{k} \lambda^{(\alpha)} \cdot \delta(z-{g}^{(\alpha)}(v))\right\}$

    where $ \Lambda_k=\{\lambda^{(1)},\lambda^{(2)},...,\lambda^{(k)}\}$ are the Lagrange factors. The partition function $ Z$ is derived from the second constraint, Eq. (5.2.23), as follows:

    $\displaystyle \int {Pr^*({g})\:d{g}}=1 \;\Rightarrow\; Z(\Lambda_k)= \int \exp\left\{ \sum_{\alpha =1}^{k} \lambda^{(\alpha)} \cdot \delta\left(z-{g}^{(\alpha)}(v)\right)\right\}\:d{{g}}$

The discrete form of $ Pr^*({g}\mid \Lambda_k)$ is derived by the following transformations,


$\displaystyle Pr^*({g}\mid \Lambda_k) = \frac{1}{Z(\Lambda_k)} \cdot \exp\left\{- \sum_{\vec{v}} \sum_{\alpha =1}^{k} \int \lambda^{(\alpha)}(z) \cdot \delta(z-{g}^{(\alpha)}(\vec{v}))\,dz \right\}$

$\displaystyle = \frac{1}{Z(\Lambda_k)} \cdot \exp\left\{- \sum_{\vec{v}} \sum_{\alpha =1}^{k} \sum_{z=0}^{Q-1} \lambda_z^{(\alpha)} \cdot \delta(z-{g}^{(\alpha)}(\vec{v}))\right\}$

$\displaystyle = \frac{1}{Z(\Lambda_k)} \cdot \exp\left\{- \sum_{\alpha =1}^{k} \sum_{z=0}^{Q-1} \lambda_z^{(\alpha)} \cdot \sum_{\vec{v}} \delta(z-{g}^{(\alpha)}(\vec{v}))\right\}$

$\displaystyle = \frac{1}{Z(\Lambda_k)} \cdot \exp\left\{- \sum_{\alpha =1}^{k} \sum_{z=0}^{Q-1} \lambda_z^{(\alpha)} \cdot H_z^{(\alpha)}\right\}$

$\displaystyle = \frac{1}{Z(\Lambda_k)} \cdot \exp\left\{- \sum_{\alpha =1}^{k} \langle \lambda^{(\alpha)} \cdot H^{(\alpha)}\rangle\right\}$ (5.2.25)

Here, the vectors $ H^{(\alpha)}=\{H_0^{(\alpha)},H_1^{(\alpha)},...,H_{Q-1}^{(\alpha)}\}$ and $ \lambda^{(\alpha)}=\{\lambda_0^{(\alpha)},\lambda_1^{(\alpha)},...,\lambda_{Q-1}^{(\alpha)}\}$ represent the histogram of the filtered image $ {g}^{(\alpha)}$ and the corresponding Lagrange parameters, respectively.

As shown in Eq. (5.2.25), the FRAME model is specified by a Gibbs distribution with the empirical marginal distributions (histograms) of the filter responses as its sufficient statistics. The Lagrange multipliers $ \Lambda_k$ are the model parameters to be estimated for each particular texture. Typically, the parameters are learnt via stochastic approximation, which updates the estimates iteratively according to the following equation,

$\displaystyle \lambda_{[t]}^{(\alpha)}=\lambda_{[t-1]}^{(\alpha)}+c\,\left(H^{(\alpha)}({g}^{[t]})-H^{(\alpha)}({g}^{[0]})\right)$

where $ {g}^{[t]}$ is an image drawn at random from the distribution $ Pr({g};\Lambda_{[t-1]})$ using, e.g., a Gibbs sampler. Since the FRAME model involves $ k \times Q$ parameters, the computational complexity of parameter estimation depends on both the number of selected filters and the number of signal levels in the image.
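
A schematic sketch of this stochastic-approximation step, assuming a trivial two-filter bank (the identity filter and a horizontal finite-difference filter), interpreting $g^{[0]}$ as the observed training image, and using placeholder images in place of a real sampler; only the histogram bookkeeping and the update $\lambda \leftarrow \lambda + c\,(H_{\mathrm{syn}} - H_{\mathrm{obs}})$ are shown.

    import numpy as np

    def filter_histograms(g, Q=8):
        """Empirical marginal histograms H^(alpha) for an assumed toy filter bank:
        the identity filter and the absolute horizontal finite-difference filter."""
        identity = g.astype(int)
        horiz_diff = np.clip(np.abs(np.diff(g, axis=1)).astype(int), 0, Q - 1)
        return [np.bincount(r.ravel(), minlength=Q) / r.size for r in (identity, horiz_diff)]

    def frame_update(lambdas, g_observed, g_synthesised, c=0.1):
        """One stochastic-approximation step: lambda <- lambda + c*(H_syn - H_obs),
        with one lambda vector of length Q per filter."""
        H_obs = filter_histograms(g_observed)
        H_syn = filter_histograms(g_synthesised)
        return [lam + c * (hs - ho) for lam, hs, ho in zip(lambdas, H_syn, H_obs)]

    rng = np.random.default_rng(5)
    g_obs = rng.integers(0, 8, size=(64, 64))   # stand-in for the training image g^[0]
    g_syn = rng.integers(0, 8, size=(64, 64))   # stand-in for a sample from Pr(g; Lambda_[t-1])
    lambdas = frame_update([np.zeros(8), np.zeros(8)], g_obs, g_syn)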

