! A Gentle Introduction to Haskell: Arrays The Gentle Introduction To Haskell
back next

11  Arrays

Ideally, arrays in a functional language would be regarded simply as functions from indices to values, but pragmatically, in order to assure efficient access to array elements, we need to be sure we can take advantage of the special properties of the domains of these functions, which are isomorphic to finite contiguous subsets of the integers. Haskell, therefore, does not treat arrays as general functions with an application operation, but as abstract data types with a subscript operation.

Two main approaches to functional arrays may be discerned: incremental and monolithic definition. In the incremental case, we have a function that produces an empty array of a given size and another that takes an array, an index, and a value, producing a new array that differs from the old one only at the given index. Obviously, a naive implementation of such an array semantics would be intolerably inefficient, requiring a new copy of an array for each incremental redefinition; thus, serious attempts at using this approach employ sophisticated static analysis and clever run-time devices to avoid excessive copying. The monolithic approach, on the other hand, constructs an array all at once, without reference to intermediate array values. Although Haskell has an incremental array update operator, the main thrust of the array facility is monolithic.

Arrays are not part of the Standard Prelude -- the standard library contains the array operators. Any module using arrays must import the Array module.

11.1  Index types

The Ix library defines a type class of array indices:

class  (Ord a) => Ix a  where
    range       :: (a,a) -> [a]
    index       :: (a,a) a -> Int
    inRange     :: (a,a) -> a -> Bool

Instance declarations are provided for Int, Integer, Char, Bool, and tuples of Ix types; in addition, instances may be automatically derived for enumerated and tuple types. We regard the primitive types as vector indices and tuples as indices of multidimensional rectangular arrays. Note that the first argument of each of the operations of class Ix is a pair of indices; these are typically the bounds (first and last indices) of an array. For example, the bounds of a 10-element, zero-origin vector with Int indices would be (0,9), while a 100 by 100 1-origin matrix might have the bounds ((1,1),(100,100)). (In many other languages, such bounds would be written in a form like 1:100, 1:100, but the present form fits the type system better, since each bound is of the same type as a general index.)

The range operation takes a bounds pair and produces the list of indices lying between those bounds, in index order. For example,

range (0,4) => [0,1,2,3,4]

range ((0,0),(1,2)) => [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2)]

The inRange predicate determines whether an index lies between a given pair of bounds. (For a tuple type, this test is performed componentwise.) Finally, the index operation is what is needed to address a particular element of an array: Given a bounds pair and an in-range index, the operation yields the zero-origin ordinal of the index within the range; for example:

index (1,9) 2 => 1

index ((0,0),(1,2)) (1,1) => 4

11.2  Array Creation

Haskell's monolithic array creation function forms an array from a pair of bounds and a list of index-value pairs (an association list):

array                   :: (Ix a) => (a,a) -> [(a,b)] -> Array a b

Here, for example, is a definition of an array of the squares of numbers from 1 to 100:

squares                 =  array (1,100) [(i, i*i) | i <- [1..100]]

This array expression is typical in using a list comprehension for the association list; in fact, this usage results in array expressions much like the array comprehensions of the language Id.

Array subscripting is performed with the infix operator !, and the bounds of an array can be extracted with the function bounds:

squares!7 => 49

bounds squares => (1,100)

We might generalize this example by parameterizing the bounds and the function to be applied to each index:

mkArray                 :: (Ix a) => (a -> b) -> (a,a) -> Array a b
mkArray f bnds          =  array bnds [(i, f i) | i <- range bnds]

Thus, we could define squares as mkArray (\i -> i * i) (1,100).

Many arrays are defined recursively; that is, with the values of some elements depending on the values of others. Here, for example, we have a function returning an array of Fibonacci numbers:

fibs    :: Int -> Array Int Int
fibs n  =  a  where a = array (0,n) ([(0, 1), (1, 1)] ++ 
                                     [(i, a!(i-2) + a!(i-1)) | i <- [2..n]])

Another example of such a recurrence is the n by n wavefront matrix, in which elements of the first row and first column all have the value 1 and other elements are sums of their neighbors to the west, northwest, and north:

wavefront       :: Int -> Array (Int,Int) Int
wavefront n     =  a  where
                   a = array ((1,1),(n,n))
                        ([((1,j), 1) | j <- [1..n]] ++
                         [((i,1), 1) | i <- [2..n]] ++
                         [((i,j), a!(i,j-1) + a!(i-1,j-1) + a!(i-1,j))
                                     | i <- [2..n], j <- [2..n]])

The wavefront matrix is so called because in a parallel implementation, the recurrence dictates that the computation can begin with the first row and column in parallel and proceed as a wedge-shaped wave, traveling from northwest to southeast. It is important to note, however, that no order of computation is specified by the association list.

In each of our examples so far, we have given a unique association for each index of the array and only for the indices within the bounds of the array, and indeed, we must do this in general for an array be fully defined. An association with an out-of-bounds index results in an error; if an index is missing or appears more than once, however, there is no immediate error, but the value of the array at that index is then undefined, so that subscripting the array with such an index yields an error.

11.3  Accumulation

We can relax the restriction that an index appear at most once in the association list by specifying how to combine multiple values associated with a single index; the result is called an accumulated array:

accumArray :: (Ix a) -> (b -> c -> b) -> b -> (a,a) -> [Assoc a c] -> Array a b

The first argument of accumArray is the accumulating function, the second is an initial value (the same for each element of the array), and the remaining arguments are bounds and an association list, as with the array function. Typically, the accumulating function is (+), and the initial value, zero; for example, this function takes a pair of bounds and a list of values (of an index type) and yields a histogram; that is, a table of the number of occurrences of each value within the bounds:

hist            :: (Ix a, Integral b) => (a,a) -> [a] -> Array a b
hist bnds is    =  accumArray (+) 0 bnds [(i, 1) | i <- is, inRange bnds i]

Suppose we have a collection of measurements on the interval [a,b), and we want to divide the interval into decades and count the number of measurements within each:

decades         :: (RealFrac a) => a -> a -> [a] -> Array Int Int
decades a b     =  hist (0,9) . map decade
                   where decade x = floor ((x - a) * s)
                         s        = 10 / (b - a)

11.4  Incremental updates

In addition to the monolithic array creation functions, Haskell also has an incremental array update function, written as the infix operator //; the simplest case, an array a with element i updated to v, is written a // [(i, v)]. The reason for the square brackets is that the left argument of (//) is an association list, usually containing a proper subset of the indices of the array:

(//)            :: (Ix a) => Array a b -> [(a,b)] -> Array a b

As with the array function, the indices in the association list must be unique for the values to be defined. For example, here is a function to interchange two rows of a matrix:

swapRows :: (Ix a, Ix b, Enum b) => a -> a -> Array (a,b) c -> Array (a,b) c
swapRows i i' a =  a // ([((i ,j), a!(i',j)) | j <- [jLo..jHi]] ++
                         [((i',j), a!(i ,j)) | j <- [jLo..jHi]])
                   where ((iLo,jLo),(iHi,jHi)) = bounds a

The concatenation here of two separate list comprehensions over the same list of j indices is, however, a slight inefficiency; it's like writing two loops where one will do in an imperative language. Never fear, we can perform the equivalent of a loop fusion optimization in Haskell:

swapRows i i' a =  a // [assoc | j <- [jLo..jHi],
                                 assoc <- [((i ,j), a!(i',j)),
                                           ((i',j), a!(i, j))] ]
                   where ((iLo,jLo),(iHi,jHi)) = bounds a

11.5  An example: Matrix Multiplication

We complete our introduction to Haskell arrays with the familiar example of matrix multiplication, taking advantage of overloading to define a fairly general function. Since only multiplication and addition on the element type of the matrices is involved, we get a function that multiplies matrices of any numeric type unless we try hard not to. Additionally, if we are careful to apply only (!) and the operations of Ix to indices, we get genericity over index types, and in fact, the four row and column index types need not all be the same. For simplicity, however, we require that the left column indices and right row indices be of the same type, and moreover, that the bounds be equal:

matMult         :: (Ix a, Ix b, Ix c, Num d) =>
                   Array (a,b) d -> Array (b,c) d -> Array (a,c) d
matMult x y     =  array resultBounds
                         [((i,j), sum [x!(i,k) * y!(k,j) | k <- range (lj,uj)])
                                       | i <- range (li,ui),
                                         j <- range (lj',uj') ]
        where ((li,lj),(ui,uj))         =  bounds x
              ((li',lj'),(ui',uj'))     =  bounds y
                | (lj,uj)==(li',ui')    =  ((li,lj'),(ui,uj'))
                | otherwise             = error "matMult: incompatible bounds"

As an aside, we can also define matMult using accumArray, resulting in a presentation that more closely resembles the usual formulation in an imperative language:

matMult x y     =  accumArray (+) 0 resultBounds
                              [((i,j), x!(i,k) * y!(k,j))
                                      | i <- range (li,ui),
                                        j <- range (lj',uj')
                                        k <- range (lj,uj)  ]
        where ((li,lj),(ui,uj))         =  bounds x
              ((li',lj'),(ui',uj'))     =  bounds y
                | (lj,uj)==(li',ui')    =  ((li,lj'),(ui,uj'))
                | otherwise             = error "matMult: incompatible bounds"

We can generalize further by making the function higher-order, simply replacing sum and (*) by functional parameters:

genMatMult      :: (Ix a, Ix b, Ix c) =>
                   ([f] -> g) -> (d -> e -> f) ->
                   Array (a,b) d -> Array (b,c) e -> Array (a,c) g
genMatMult f g x y  =  array resultBounds
                        [((i,j), f [x!(i,k) `g` y!(k,j) | k <- range (lj,uj)])
                                 | i <- range (li,ui),
                                   j <- range (lj',uj') ]
        where ((li,lj),(ui,uj))         =  bounds x
              ((li',lj'),(ui',uj'))     =  bounds y
                | (lj,uj)==(li',ui')    =  ((li,lj'),(ui,uj'))
                | otherwise             = error "matMult: incompatible bounds"

APL fans will recognize the usefulness of functions like the following:

genMatMult maximum (-)
genMatMult and (==)

With the first of these, the arguments are numeric matrices, and the (i,j)-th element of the result is the maximum difference between corresponding elements of the i-th row and j-th column of the inputs. In the second case, the arguments are matrices of any equality type, and the result is a Boolean matrix in which element (i,j) is True if and only if the i-th row of the first argument and j-th column of the second are equal as vectors.

Notice that the element types of genMatMult need not be the same, but merely appropriate for the function parameter g. We could generalize still further by dropping the requirement that the first column index and second row index types be the same; clearly, two matrices could be considered conformable as long as the lengths of the columns of the first and the rows of the second are equal. The reader may wish to derive this still more general version. (Hint: Use the index operation to determine the lengths.)

The Gentle Introduction To Haskell
back next