CBIR: Texture Features

QBE Using Texture Features

"Texture - ...(in extended use) the constitution, structure, or substance of anything with regard to its constituents or formative elements." (The Oxford Dictionary, 1971; 1989).

"Texture - ...a basic scheme or structure; the overall structure of something incorporating all of most of parts." (Webster's Dictionary, 1959; 1986).

Texture is a very general notion that can be attributed to almost everything in nature. For a human, the texture relates mostly to a specific, spatially repetitive (micro)structure of surfaces formed by repeating a particular element or several elements in different relative spatial positions. Generally, the repetition involves local variations of scale, orientation, or other geometric and optical features of the elements.

Image textures are defined as images of natural textured surfaces and artificially created visual patterns, which approach, within certain limits, these natural objects. Image sensors introduce additional geometric and optical transformations of the perceived surfaces, and these transformations should not affect the class of textures to which the surface belongs.

[Figure: examples of natural image textures Flowers0001, Flowers0002, Leaves0011, and Metal0002 (VisTex database, MIT Media Lab., USA) and textures synthesised using a generic Gibbs random field (GGRF) model with multiple pairwise pixel interactions.]

It is almost impossible to describe textures in words, although each human definition involves various informal qualitative structural features, such as fineness - coarseness, smoothness, granularity, lineation, directionality, roughness, regularity - randomness, and so on. These features, which define a spatial arrangement of texture constituents, help to single out the desired texture types, e.g. fine or coarse, close or loose, plain or twilled or ribbed textile fabrics. It is difficult to use human classifications as a basis for formal definitions of image textures, because there are no obvious ways of associating these features, easily perceived by human vision, with computational models that aim to describe the textures. Nonetheless, after several decades of research and development in texture analysis and synthesis, a variety of computational characteristics and properties for indexing and retrieving textures have been found. The textural features describe local arrangements of image signals in the spatial domain or in the domain of Fourier or other spectral transforms. In many cases, the textural features follow from a particular random field model of textured images (Castelli & Bergman, 2002).

Texture Features and Co-occurrence Matrices

Many statistical texture features are based on co-occurrence matrices representing second-order statistics of grey levels in pairs of pixels in an image. The matrices are sufficient statistics of a Markov/Gibbs random field with multiple pairwise pixel interactions.

A co-occurrence matrix shows how frequently each particular pair of grey levels occurs in the pixel pairs separated by a certain distance d along a certain direction a.

Let $g = (g_{x,y} : x = 1, \dots, M;\ y = 1, \dots, N)$ be a digital image. Let $Q = \{0, \dots, q_{\max}\}$ be the set of grey levels. The co-occurrence matrix for a given inter-pixel distance $d$ and directional angle $a$ is defined as

$$\mathrm{COOC}_{m,n}(g) = \left[\mathrm{COOC}_{m,n}(q, s \mid g) : q, s = 0, \dots, q_{\max}\right], \qquad m = d\cos a,\ n = d\sin a,$$

where $\mathrm{COOC}_{m,n}(q, s \mid g)$ is the cardinality of the set $C_{m,n}$ of pixel pairs $[(x,y), (x+m, y+n)]$ such that $g_{x,y} = q$ and $g_{x+m,y+n} = s$.
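
The definition above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name cooccurrence is hypothetical, the offsets m and n are taken to be integers, array rows are indexed by y and columns by x (both starting from 0 rather than 1), and pairs that fall outside the image are simply ignored.

```python
import numpy as np

def cooccurrence(image, m, n, q_max):
    """Count grey-level pairs (q, s) over pixel pairs (x, y), (x + m, y + n)."""
    g = np.asarray(image)
    rows, cols = g.shape                      # rows index y, columns index x
    # Crop two views so that both members of every pair lie inside the image.
    first = g[max(0, -n):rows - max(0, n), max(0, -m):cols - max(0, m)]
    second = g[max(0, n):rows - max(0, -n), max(0, m):cols - max(0, -m)]
    cooc = np.zeros((q_max + 1, q_max + 1), dtype=int)
    # Unbuffered accumulation: repeated (q, s) index pairs are all counted.
    np.add.at(cooc, (first.ravel(), second.ravel()), 1)
    return cooc

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
cooc = cooccurrence(image, m=1, n=0, q_max=3)   # d = 1, a = 0 (horizontal pairs)
```

Dividing the matrix by its sum gives the relative frequencies of the grey level pairs, which is the form usually needed when the matrix is used as a feature.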

Various statistical and information theoretic properties of the co-occurrence matrices can serve as textural features (e.g., such features as homogeneity, coarseness, or periodicity introduced by Haralick). But these features are expensive to compute, and they were not very efficient for image classification and retrieval (Castelli & Bergman, 2002).

Tamura's Texture Features

Today's CBIR systems in most cases use the set of six visual features, namely, coarseness, contrast, directionality, linelikeness, regularity, and roughness, selected by Tamura, Mori, and Yamawaki (Tamura e.a., 1977, Castelli & Bergman, 2002) on the basis of psychological experiments.

Coarseness relates to distances of notable spatial variations of grey levels, that is, implicitly, to the size of the primitive elements (texels) forming the texture. The proposed computational procedure accounts for differences between the average signals for the non-overlapping windows of different size:

  1. At each pixel (x,y), compute six averages over windows of size $2^k \times 2^k$, k = 0, 1, ..., 5, around the pixel.
  2. At each pixel, compute the absolute differences $E_k(x,y)$ between the pairs of non-overlapping averages in the horizontal and vertical directions.
  3. At each pixel, find the value of k that maximises the difference $E_k(x,y)$ in either direction and set the best size $S_{best}(x,y) = 2^k$.
  4. Compute the coarseness feature $F_{crs}$ by averaging $S_{best}(x,y)$ over the entire image.

Instead of the average of Sbest(x,y), an improved coarseness feature that copes with textures having multiple coarseness properties is a histogram characterising the whole distribution of the best sizes over the image (Castelli & Bergman, 2002).
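
The four steps can be sketched as follows. This is a minimal illustration, not a reference implementation: the name tamura_coarseness is hypothetical, the two windows compared in each direction are assumed to be the adjacent windows on either side of the pixel, borders are handled by edge padding, and ties between window sizes are resolved in favour of the smallest size.

```python
import numpy as np

def tamura_coarseness(image, k_max=5):
    """Return (F_crs, S_best map) for window sizes 2**k, k = 0..k_max."""
    g = np.asarray(image, dtype=float)
    rows, cols = g.shape
    pad = 2 ** k_max
    padded = np.pad(g, pad, mode="edge")      # assumption: replicate border pixels
    # Integral image with a leading zero row and column gives O(1) box sums.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)

    def box_mean(top, left, size):
        """Mean over a size x size window whose corner is offset (top, left) from each pixel."""
        r0 = np.arange(rows)[:, None] + pad + top
        c0 = np.arange(cols)[None, :] + pad + left
        total = ii[r0 + size, c0 + size] - ii[r0, c0 + size] - ii[r0 + size, c0] + ii[r0, c0]
        return total / size ** 2

    best_e = np.full((rows, cols), -1.0)
    s_best = np.ones((rows, cols))
    for k in range(k_max + 1):
        size, half = 2 ** k, 2 ** k // 2
        # Step 2: differences between adjacent, non-overlapping windows.
        e_h = np.abs(box_mean(-half, 0, size) - box_mean(-half, -size, size))
        e_v = np.abs(box_mean(0, -half, size) - box_mean(-size, -half, size))
        e_k = np.maximum(e_h, e_v)
        # Step 3: keep the window size with the largest difference so far.
        better = e_k > best_e
        best_e[better] = e_k[better]
        s_best[better] = size
    return s_best.mean(), s_best              # step 4: average best size

f_crs, s_best = tamura_coarseness(np.indices((64, 64)).sum(axis=0) % 2)
```

The returned map of best sizes can be turned into the histogram variant of the feature, for example with np.unique(s_best, return_counts=True).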

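Contrast measures how the grey levels q, q = 0, 1, ..., qmax, vary in the image g and to what extent their distribution is biased towards black or white. The second-order and normalised fourth-order central moments of the grey level histogram (empirical probability distribution), that is, the variance $\sigma^2$ and the kurtosis $\alpha_4$, are used to define the contrast:

$$F_{con} = \frac{\sigma}{(\alpha_4)^n}, \qquad \alpha_4 = \frac{\mu_4}{\sigma^4}, \qquad \sigma^2 = \sum_{q=0}^{q_{\max}} (q - m)^2 \Pr(q \mid g), \qquad \mu_4 = \sum_{q=0}^{q_{\max}} (q - m)^4 \Pr(q \mid g),$$

where $\Pr(q \mid g)$ is the relative frequency of the grey level q in the image g and m is the mean grey level, i.e. the first-order moment of the grey level probability distribution. The value n = 0.25 is recommended as the best for discriminating the textures.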
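
A minimal sketch of the contrast computation, assuming the usual form of Tamura's measure, Fcon = σ / (α4)^n with α4 = μ4 / σ^4, where μ4 is the fourth central moment of the grey levels. The function name is hypothetical, and returning 0 for a constant image is an added convention.

```python
import numpy as np

def tamura_contrast(image, n=0.25):
    """Contrast as sigma / kurtosis**n; moments are taken over all pixels."""
    g = np.asarray(image, dtype=float)
    m = g.mean()                               # mean grey level
    variance = ((g - m) ** 2).mean()           # second central moment
    if variance == 0.0:
        return 0.0                             # assumption: constant image has no contrast
    mu4 = ((g - m) ** 4).mean()                # fourth central moment
    alpha4 = mu4 / variance ** 2               # kurtosis
    return float(np.sqrt(variance) / alpha4 ** n)

half_black_half_white = np.zeros((8, 8))
half_black_half_white[:, 4:] = 255.0
f_con = tamura_contrast(half_black_half_white)
```

Averaging over the pixels is equivalent to summing over the grey level histogram, so no explicit histogram is built here.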

Degree of directionality is measured using the frequency distribution of oriented local edges against their directional angles. The edge strength e(x,y) and the directional angle a(x,y) are computed using a Prewitt-type edge detector (the 3 × 3 masks given below) approximating the pixel-wise x- and y-derivatives of the image:

$e(x,y) = 0.5\,(|\Delta_x(x,y)| + |\Delta_y(x,y)|)$
$a(x,y) = \tan^{-1}(\Delta_y(x,y) / \Delta_x(x,y))$

where Δx(x,y) and Δy(x,y) are the horizontal and vertical grey level differences between the neighbouring pixels, respectively. The differences are measured using the following 3 × 3 moving window operators:

Δx:  −1  0  1        Δy:   1   1   1
     −1  0  1              0   0   0
     −1  0  1             −1  −1  −1

A histogram Hdir(a) of quantised direction values a is constructed by counting numbers of the edge pixels with the corresponding directional angles and the edge strength greater than a predefined threshold. The histogram is relatively uniform for images without strong orientation and exhibits peaks for highly directional images. The degree of directionality relates to the sharpness of the peaks:

$$F_{dir} = 1 - r\, n_p \sum_{p=1}^{n_p} \sum_{a \in w_p} (a - a_p)^2 H_{dir}(a),$$

where $n_p$ is the number of peaks, $a_p$ is the position of the pth peak, $w_p$ is the range of the angles attributed to the pth peak (that is, the range between the valleys around the peak), r denotes a normalising factor related to the quantising levels of the angles a, and a is the quantised directional angle (cyclic, modulo 180°).
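
The edge histogram and a simplified sharpness measure can be sketched as below. Several choices here are assumptions made only for illustration: the function names, the number of bins (16), the edge strength threshold (12), and above all the restriction to a single peak whose range covers the whole histogram, with the normalising factor r set to the inverse of the largest squared bin offset. A full implementation would detect all peaks and the valleys between them.

```python
import numpy as np

def direction_histogram(image, bins=16, threshold=12.0):
    """Normalised histogram of edge directions (modulo 180 degrees)."""
    g = np.asarray(image, dtype=float)
    # 3 x 3 differences on interior pixels: right minus left column, top minus bottom row.
    dx = (g[:-2, 2:] + g[1:-1, 2:] + g[2:, 2:]) - (g[:-2, :-2] + g[1:-1, :-2] + g[2:, :-2])
    dy = (g[:-2, :-2] + g[:-2, 1:-1] + g[:-2, 2:]) - (g[2:, :-2] + g[2:, 1:-1] + g[2:, 2:])
    e = 0.5 * (np.abs(dx) + np.abs(dy))        # edge strength
    a = np.arctan2(dy, dx) % np.pi             # directional angle in [0, pi)
    strong = e > threshold
    idx = np.minimum((a[strong] / np.pi * bins).astype(int), bins - 1)
    hist = np.bincount(idx, minlength=bins).astype(float)
    return hist / hist.sum() if hist.sum() > 0 else hist

def directionality_single_peak(hist):
    """Simplified F_dir: one peak at the histogram maximum, cyclic bin offsets."""
    bins = len(hist)
    peak = int(np.argmax(hist))
    offsets = (np.arange(bins) - peak + bins // 2) % bins - bins // 2
    r = 1.0 / (bins / 2) ** 2                  # assumed normalising factor
    return float(1.0 - r * np.sum(offsets ** 2 * hist))

x = np.arange(32)
vertical_stripes = np.tile(255.0 * ((x // 4) % 2), (32, 1))
h_dir = direction_histogram(vertical_stripes)
f_dir = directionality_single_peak(h_dir)
```

For the striped test image all strong edges share one direction, so the histogram has a single occupied bin and the simplified measure reaches its maximum.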

The three other features are highly correlated with the above three features and do not add much to the effectiveness of the texture description. The linelikeness feature Flin is defined as an average coincidence of the edge directions (more precisely, coded directional angles) that co-occur in pairs of pixels separated by a distance d along the edge direction in every pixel. The edge strength is expected to be greater than a given threshold, which eliminates trivial "weak" edges. The coincidence is measured by the cosine of the difference between the coded angles, scaled so that the 180° range of directions spans a full period of the cosine; as a result, co-occurrences in the same direction are measured by +1 and those in perpendicular directions by −1.

The regularity feature is defined as $F_{reg} = 1 - r\,(s_{crs} + s_{con} + s_{dir} + s_{lin})$, where r is a normalising factor and each $s_{\dots}$ is the standard deviation of the corresponding feature $F_{\dots}$ over the subimages into which the texture is partitioned. The roughness feature is given by simply summing the coarseness and contrast measures: $F_{rgh} = F_{crs} + F_{con}$.

In most cases, only the first three of Tamura's features are used for CBIR. These features capture the high-level perceptual attributes of a texture well and are useful for image browsing. However, they are not very effective for finer texture discrimination (Castelli & Bergman, 2002).

Markov Random Field Texture Models

Random field models consider an image as a 2D array of random scalars (grey values) or vectors (colours). In other words, the signal at each pixel location is a random variable. Each type of texture is characterised by a joint probability distribution of signals that accounts for spatial inter-dependence, or interaction, among the signals. The interacting pixel pairs are usually called neighbours, and a random field texture model is characterised by the geometric structure and quantitative strength of interactions among the neighbours.

If pixel interactions are assumed translation invariant, the interaction structure is given by a set N of characteristic neighbours of each pixel. This results in the Markov random field model where the conditional probability of signals in each pixel (x,y) depends only on the signals in the neighbourhood {(x+m,y+n): (m,n) from the set N}.

In a special case of the simultaneous autoregressive (SAR) Gauss-Markov model, the texture is represented by a set of parameters of the autoregression:

$$g(x,y) = \sum_{(m,n) \in N} a(m,n)\, g(x+m, y+n) + s\, w(x,y)$$

Here, w(x,y) is independent (white) noise with zero mean and unit variance, and the parameters a(m,n) and s specify the SAR model. The basic problem is how to find an adequate neighbourhood, and this nontrivial problem has no general solution.
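
For a fixed neighbourhood, the SAR parameters are often estimated by least squares. The sketch below assumes the model g(x,y) = Σ a(m,n) g(x+m, y+n) + s w(x,y) with the sum over the neighbours (m,n) in N; the function name fit_sar is hypothetical, and least squares is only one possible estimator, chosen here for brevity. The demonstration image is generated noise-free from known coefficients, so the fit should return them with s close to zero.

```python
import numpy as np

def fit_sar(image, neighbours):
    """Least-squares estimate of a(m, n) and s for neighbour offsets (m, n)."""
    g = np.asarray(image, dtype=float)
    rows, cols = g.shape
    ms = [m for m, _ in neighbours]
    ns = [n for _, n in neighbours]
    # Pixels whose whole neighbourhood lies inside the image.
    x0, x1 = max(0, -min(ms)), cols - max(0, max(ms))
    y0, y1 = max(0, -min(ns)), rows - max(0, max(ns))
    target = g[y0:y1, x0:x1].ravel()
    design = np.column_stack(
        [g[y0 + n:y1 + n, x0 + m:x1 + m].ravel() for m, n in neighbours]
    )
    a, *_ = np.linalg.lstsq(design, target, rcond=None)
    s = float(np.sqrt(np.mean((target - design @ a) ** 2)))   # residual spread
    return a, s

# Synthetic check: g(x,y) = 0.5 g(x-1,y) + 0.3 g(x,y-1) inside a random border.
rng = np.random.default_rng(0)
g = rng.normal(size=(16, 16))
for y in range(1, 16):
    for x in range(1, 16):
        g[y, x] = 0.5 * g[y, x - 1] + 0.3 * g[y - 1, x]
a, s = fit_sar(g, [(-1, 0), (0, -1)])
```

The estimated coefficients, together with s, would form the texture feature vector for the chosen neighbourhood.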

More general generic Gibbs random field models with multiple pairwise pixel interactions make it possible to relate the desired neighbourhood to a set of the most "energetic" pairs of neighbours. Then the interaction structure itself and the relative frequency distributions of signal co-occurrences in the chosen pixel pairs can serve as the texture features (Gimel'farb & Jain, 1996).

[Figure: MIT "Fabrics0008" texture and its interaction structure with 35 neighbours.]

Similarity Measures for Texture Features

Texture features are usually compared on the basis of dissimilarity between the two feature vectors. The dissimilarity is given by the Euclidean, Mahalanobis, or city-block distance. In some cases, weighted distances are used where the weight of each vector component is inversely proportional to the standard deviation of this feature in the database.
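
As an illustration of the weighted variant, the sketch below computes a city-block distance in which each component is divided by its standard deviation over a database. The function name and the three-vector toy database are invented for the example; the population standard deviation is an assumed choice.

```python
import statistics

def weighted_city_block(f1, f2, stdevs):
    """City-block distance with each component weighted by 1 / standard deviation."""
    return sum(abs(a - b) / s for a, b, s in zip(f1, f2, stdevs))

# Toy database of two-component feature vectors (illustrative values only).
database = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
stdevs = [statistics.pstdev(column) for column in zip(*database)]
d = weighted_city_block(database[0], database[1], stdevs)
```

Without the weights the second component, which has the larger numeric range, would dominate the distance.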

If the feature vector represents a relative frequency distribution (e.g., a normalised grey level co-occurrence histogram), the dissimilarity can also be measured by the relative entropy, or Kullback-Leibler (K-L) divergence. Let D(g,q) denote the divergence between two distributions, $f_g = (f_{g,t} : t = 1, \dots, T)$ and $f_q = (f_{q,t} : t = 1, \dots, T)$. Then

$$D(g,q) = \sum_{t=1}^{T} f_{g,t} \log \frac{f_{g,t}}{f_{q,t}}.$$

This dissimilarity measure is asymmetric and does not represent a distance because the triangle inequality is not satisfied. A symmetric measure is obtained by averaging D(g,q) and D(q,g). It should be noted that no single similarity measure achieves the best overall performance for retrieval of different textures (Castelli & Bergman, 2002).
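
A short sketch of the divergence and its symmetrised average, assuming the standard definition D(g,q) = Σ f_g,t log(f_g,t / f_q,t). The function names are hypothetical, and the small constant eps that guards against empty bins in the second distribution is an added assumption, not part of the definition.

```python
import math

def kl_divergence(f_g, f_q, eps=1e-12):
    """D(g, q) for two relative frequency distributions of equal length."""
    return sum(p * math.log(p / max(q, eps)) for p, q in zip(f_g, f_q) if p > 0)

def symmetric_kl(f_g, f_q):
    """Average of D(g, q) and D(q, g)."""
    return 0.5 * (kl_divergence(f_g, f_q) + kl_divergence(f_q, f_g))

f_g = [0.5, 0.5]
f_q = [0.9, 0.1]
d_gq = kl_divergence(f_g, f_q)
d_qg = kl_divergence(f_q, f_g)
d_sym = symmetric_kl(f_g, f_q)
```

For these two distributions the two directed divergences differ, which illustrates the asymmetry noted above.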

Wold Decomposition Based and Gabor Texture Features

If a texture is modelled as a sample of a 2D stationary random field, the Wold decomposition can also be used for similarity-based retrieval (Liu & Picard, 1996, Castelli & Bergman, 2002). In the Wold model, a spatially homogeneous random field is decomposed into three mutually orthogonal components, which approximately represent periodicity, directionality, and a purely random part of the field.

The deterministic periodicity of the image is analysed using the autocorrelation function. The corresponding Wold feature set consists of the frequencies and the magnitudes of the harmonic spectral peaks (e.g., the K largest peaks). The indeterministic (random) components of the image are modelled with the multiresolution simultaneous autoregressive (MR-SAR) process. The retrieval uses matching of the harmonic peaks and the distances between the MR-SAR parameters. The similarity measure involves a weighted ordering based on the confidence in the query pattern regularity. Experiments with some natural texture databases have shown that the Wold model provides perceptually better retrieval quality than the MR-SAR model or Tamura's features (Castelli & Bergman, 2002).

An alternative to the spatial domain for computing the texture features is to use domains of specific transforms, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or the discrete wavelet transforms (DWT). Global power spectra computed from the DFT have not been effective in texture classification and retrieval compared to local features in small windows. At present, the most promising features for texture retrieval are multiresolution features obtained with orthogonal wavelet transforms or with Gabor filtering. The features describe spatial distributions of oriented edges in the image at multiple scales.

A 2D Gabor function γ(x,y) and its Fourier transform Γ(u,v) are as follows (Manjunath & Ma, 1996):

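$$\gamma(x,y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j W x\right], \qquad \Gamma(u,v) = \exp\left\{-\frac{1}{2}\left[\frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\},$$

where $\sigma_u = 1/(2\pi\sigma_x)$, $\sigma_v = 1/(2\pi\sigma_y)$, $j = \sqrt{-1}$, and W is the centre frequency of the modulating plane wave. The Gabor function is a product of an elliptical Gaussian and a complex plane wave, and it minimises the joint 2D uncertainty in both the spatial and frequency domains. Appropriate dilations and rotations of this function yield a class of self-similar Gabor filters for orientation- and scale-tunable edge and line detection. The filters form a complete but non-orthogonal basis set for expanding an image and getting its localised spatial frequency description. The total number of Gabor filters is equal to the product of the numbers of scales and orientations.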

A class of self-similar Gabor wavelets is produced from the "mother" wavelet γ(x,y) by the dilations and rotations specified with a generating function that depends on the integer parameters m and k and the scale factor a > 1 as follows: $\gamma_{mk}(x,y) = a^{-m}\gamma(x'_k, y'_k)$, where $x'_k = a^{-m}(x\cos\theta_k + y\sin\theta_k)$, $y'_k = a^{-m}(-x\sin\theta_k + y\cos\theta_k)$, $\theta_k = k\pi/K$, and K is the total number of orientations. The amplitude factor $a^{-m}$ compensates for the scaling of $x'_k$ and $y'_k$ and makes the filter energy independent of m. To exclude sensitivity of the filters to absolute intensity values, the real (even) components of the 2D Gabor filters are usually biased to make them zero mean. An ensemble of grey-coded, differently oriented odd (a) and even (b) 2D Gabor filters is exemplified below (see Lee, 1996 for more detail):

These generalised Gabor functions are of the following 2D form:

where (x0, y0) is the spatial location of the filter centre in the image, and (u0, v0) is the spatial frequency of the filter in the frequency domain.

The Gabor wavelets are not orthogonal. Therefore, to reduce the informational redundancy of the filter outputs, the typical filter design strategy makes the half-peak magnitude supports of the filter responses in the spatial frequency domain touch each other. For example (Manjunath & Ma, 1996), the contours below correspond to the half-peak magnitude of the filter responses in the set of Gabor filters with the upper centre frequency of interest, uh = 0.4, the lower centre frequency of interest, ul = 0.05, six orientations (K = 6), and four scales (S = 4):

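In such a design, the scale factor a and the filter parameters σu and σv (and thus σx and σy) are specified in terms of uh = W, ul, K, and S (Manjunath & Ma, 1996, Vajihollahi & Farahbod, 2002):

$$a = \left(\frac{u_h}{u_l}\right)^{\frac{1}{S-1}}, \qquad \sigma_u = \frac{(a - 1)\,u_h}{(a + 1)\sqrt{2\ln 2}}, \qquad \sigma_v = \tan\left(\frac{\pi}{2K}\right)\left[u_h - 2\ln 2\,\frac{\sigma_u^2}{u_h}\right]\left[2\ln 2 - \frac{(2\ln 2)^2\,\sigma_u^2}{u_h^2}\right]^{-1/2}.$$

The exponent of a is positive so that a > 1 whenever uh > ul; for the example above, a = 8^(1/3) = 2.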
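
The design parameters can be evaluated numerically as sketched below. The formulas for σu and σv are an assumption here: they follow the commonly quoted form of the Manjunath & Ma design, and the exponent of the scale factor is taken as +1/(S−1) so that a > 1 when uh > ul, as the definition of the wavelets requires. The function name gabor_design is hypothetical.

```python
import math

def gabor_design(u_h, u_l, K, S):
    """Scale factor and frequency-domain spreads for touching half-peak supports."""
    two_ln2 = 2.0 * math.log(2.0)
    a = (u_h / u_l) ** (1.0 / (S - 1))                 # ratio of adjacent centre frequencies
    sigma_u = (a - 1.0) * u_h / ((a + 1.0) * math.sqrt(two_ln2))
    sigma_v = (
        math.tan(math.pi / (2 * K))
        * (u_h - two_ln2 * sigma_u ** 2 / u_h)
        / math.sqrt(two_ln2 - two_ln2 ** 2 * sigma_u ** 2 / u_h ** 2)
    )
    return a, sigma_u, sigma_v

# Parameters of the example in the text: u_h = 0.4, u_l = 0.05, K = 6, S = 4.
a, sigma_u, sigma_v = gabor_design(0.4, 0.05, K=6, S=4)
sigma_x = 1.0 / (2.0 * math.pi * sigma_u)              # spatial spreads
sigma_y = 1.0 / (2.0 * math.pi * sigma_v)
```

With these inputs the scale factor comes out as 2, so the four centre frequencies are 0.4, 0.2, 0.1, and 0.05.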

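The Gabor texture features include the mean and the standard deviation of the magnitude of the Gabor wavelet transform coefficients. Given an image g = (g(x,y): x = 0, 1, ..., X−1; y = 0, 1, ..., Y−1), the Gabor wavelet transform is defined as

$$W_{mk}(x,y) = \sum_{\xi=0}^{X-1} \sum_{\eta=0}^{Y-1} g(\xi,\eta)\, \gamma^{*}_{mk}(x - \xi, y - \eta),$$

where the asterisk (*) indicates the complex conjugate, and the mean $\mu_{mk}$ and the standard deviation $\sigma_{mk}$ of the magnitude of the transformed image describe each local texture region under an assumption of its spatial homogeneity:

$$\mu_{mk} = \frac{1}{XY} \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} |W_{mk}(x,y)|, \qquad \sigma_{mk} = \sqrt{\frac{1}{XY} \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} \left(|W_{mk}(x,y)| - \mu_{mk}\right)^2}.$$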

A feature vector contains these pairs for all the scales and orientations of the wavelets; e.g., for six orientations (K = 6) and four scales (S = 4) the feature vector contains 24 pairs: $f = (\mu_{00}, \sigma_{00}, \dots, \mu_{35}, \sigma_{35})$.
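
The feature vector can be sketched with FFT-based filtering as below. The assumptions are considerable and the code is illustrative only: the function name gabor_features is hypothetical; each filter is built directly in the frequency domain as a rotated Gaussian with unit peak response (amplitude normalisation across scales is omitted); the S centre frequencies are taken as u_h, u_h/a, ..., descending towards the lower frequency; the response at zero frequency is set to 0 to remove sensitivity to absolute intensity; and filtering is circular. The values a = 2, sigma_u = 0.1132, and sigma_v = 0.0858 are rounded design values assumed for u_h = 0.4, u_l = 0.05, K = 6, S = 4.

```python
import numpy as np

def gabor_features(image, a, sigma_u, sigma_v, u_h=0.4, K=6, S=4):
    """Return (mu_00, sigma_00, ..., mu_{S-1,K-1}, sigma_{S-1,K-1})."""
    g = np.asarray(image, dtype=float)
    rows, cols = g.shape
    # Frequency grids in cycles per pixel: v along rows, u along columns.
    v, u = np.meshgrid(np.fft.fftfreq(rows), np.fft.fftfreq(cols), indexing="ij")
    spectrum = np.fft.fft2(g)
    features = []
    for m in range(S):
        scale = a ** m                         # centre frequency u_h / scale
        for k in range(K):
            theta = k * np.pi / K
            u_rot = u * np.cos(theta) + v * np.sin(theta)
            v_rot = -u * np.sin(theta) + v * np.cos(theta)
            response = np.exp(-0.5 * (((u_rot - u_h / scale) * scale / sigma_u) ** 2
                                      + (v_rot * scale / sigma_v) ** 2))
            response[0, 0] = 0.0               # zero-mean filter
            # For a real image the magnitude is the same for the filter and its conjugate.
            magnitude = np.abs(np.fft.ifft2(spectrum * response))
            features += [magnitude.mean(), magnitude.std()]
    return np.array(features)

# Vertical stripes with 0.1 cycles per pixel along x.
x = np.arange(40)
stripes = np.tile(np.cos(2 * np.pi * 0.1 * x), (40, 1))
f = gabor_features(stripes, a=2.0, sigma_u=0.1132, sigma_v=0.0858)
```

In this ordering the pair for scale m and orientation k starts at index 2(mK + k), so the filter tuned to 0.1 cycles per pixel at orientation 0 is found at index 24 and should dominate the perpendicular filter at index 30 for the striped image.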

MPEG-7 Texture Descriptors

The MPEG-7 multimedia content description interface involves three texture descriptors for representing texture regions in images (Manjunath e.a., 2001; Sikora, 2001), namely,