The Gentle Introduction To Haskell
back next


8  Standard Haskell Classes

In this section we introduce the predefined standard type classes in Haskell. We have simplified these classes somewhat by omitting some of the less interesting methods in these classes; the Haskell report contains a more complete description. Also, some of the standard classes are part of the standard Haskell libraries; these are described in the Haskell Library Report.

Standard Haskell Classes

Figure 3

8.1  Equality and Ordered Classes

Haskell's standard classes form the somewhat imposing inclusion structure shown in Figure 3. At the top of the figure, we see Eq with its subclass Ord below it. These were defined in the previous section.

8.2  The Enumeration Class

Class Enum has a set of operations that underlie the syntactic sugar of arithmetic sequences; for example, the arithmetic sequence expression [1,3..] stands for enumFromThen 1 3 (see §3.10 for the formal translation). We can now see that arithmetic sequence expressions can be used to generate lists of any type that is an instance of Enum. This includes not only most numeric types, but also Char, so that, for instance, ['a'..'z'] denotes the list of lower-case letters in alphabetical order. Furthermore, user-defined enumerated types like Color can easily be given Enum instance declarations. If so:

[Red..Violet] => [Red, Green, Blue, Indigo, Violet]

Note that such a sequence is arithmetic in the sense that the increment between values is constant, even though the values are not numbers. Most types in Enum can be mapped onto integers; for these, the fromEnum and toEnum convert between Int and a type in Enum.

8.3  The Read and Show Classes

The instances of class Show are those types that can be converted to character strings (typically for I/O). The class Read provides operations for parsing character strings to obtain the values they may represent. As the primitive operations in these classes are somewhat esoteric, let's begin with one of the higher-level functions that is defined in terms of them:

show                    :: (Show a) => a -> String

Naturally enough, show takes any value of an appropriate type and returns its representation as a character string (list of characters), as in show (2+2), which results in "4". This is fine as far as it goes, but we typically need to produce more complex strings that may have the representations of many values in them, as in

"The sum of " ++ show x ++ " and " ++ show y ++ " is " ++ show (x+y) ++ "."

and after a while, all that concatenation gets to be a bit inefficient. Specifically, let's consider a function to represent the binary trees of Section 2.2.1 as a string, with suitable markings to show the nesting of subtrees and the separation of left and right branches (provided the element type is representable as a string):

showTree                :: (Show a) => Tree a -> String
showTree (Leaf x)       =  show x
showTree (Branch l r)   =  "<" ++ showTree l ++ "|" ++ showTree r ++ ">"

Because (++) has time complexity linear in the length of its left argument, showTree is quadratic in the size of the tree.

To restore linear complexity, the function shows is provided:

shows                   :: (Show a) => a -> String -> String

shows
takes a printable value and a string and returns that string with the value's representation concatenated at the front. The second argument serves as a sort of string accumulator, and show can now be defined as shows with the null accumulator:

show x                  =  shows x ""

We can use shows to define a more efficient version of showTree, which also has a string accumulator argument:

showsTree               :: (Show a) => Tree a -> String -> String
showsTree (Leaf x) s    =  shows x s
showsTree (Branch l r) s=  '<' : showsTree l ('|' : showsTree r ('>' : s))

This solves our efficiency problem (showsTree has linear complexity), but the presentation of this function (and others like it) can be improved. First, let's create a type synonym:

type ShowS              =  String -> String

This is the type of a function that returns a string representation of something followed by an accumulator string. Second, we can avoid carrying accumulators around, and also avoid amassing parentheses at the right end of long constructions, by using functional composition:

showsTree               :: (Show a) => Tree a -> ShowS
showsTree (Leaf x)      =  shows x
showsTree (Branch l r)  =  ('<':) . showsTree l . ('|':) . showsTree r . ('>':)

Something more important than just tidying up the code has come about by this transformation: We have raised the presentation from an object level (in this case, strings) to a function level. We can think of the typing as saying that showsTree maps a tree into a showing function. Functions like ('<' :) or ("a string" ++) are primitive showing functions, and we build up more complex functions by function composition.

Now that we can turn trees into strings, let's turn to the inverse problem. The basic idea is a parser for a type a, which is a function that takes a string and returns a list of (a, String) pairs. The Prelude provides a type synonym for such functions:

type ReadS a            =  String -> [(a,String)]

Normally, a parser returns a singleton list, containing a value of type a that was read from the input string and the remaining string that follows what was parsed. If no parse was possible, however, the result is the empty list, and if there is more than one possible parse (an ambiguity), the resulting list contains more than one pair. The standard function reads is a parser for any instance of Read:

reads                   :: (Read a) => ReadS a

We can use this function to define a parsing function for the string representation of binary trees produced by showsTree. List comprehensions give us a convenient idiom for constructing such parsers (An even more elegant approach to parsing uses monads and parser combinators. These are part of a standard parsing library distributed with most Haskell systems.):

readsTree               :: (Read a) => ReadS (Tree a)
readsTree ('<':s)       =  [(Branch l r, u) | (l, '|':t) <- readsTree s,
                                              (r, '>':u) <- readsTree t ]
readsTree s             =  [(Leaf x, t)     | (x,t)      <- reads s]

Let's take a moment to examine this function definition in detail. There are two main cases to consider: If the first character of the string to be parsed is '<', we should have the representation of a branch; otherwise, we have a leaf. In the first case, calling the rest of the input string following the opening angle bracket s, any possible parse must be a tree Branch l r with remaining string u, subject to the following conditions:

  1. The tree l can be parsed from the beginning of the string s.
  2. The string remaining (following the representation of l) begins with '|'. Call the tail of this string t.
  3. The tree r can be parsed from the beginning of t.
  4. The string remaining from that parse begins with '>', and u is the tail.
Notice the expressive power we get from the combination of pattern matching with list comprehension: The form of a resulting parse is given by the main expression of the list comprehension, the first two conditions above are expressed by the first generator ("(l, '|':t) is drawn from the list of parses of s."), and the remaining conditions are expressed by the second generator.

The second defining equation above just says that to parse the representation of a leaf, we parse a representation of the element type of the tree and apply the constructor Leaf to the value thus obtained.

We'll accept on faith for the moment that there is a Read (and Show) instance of Int (among many other types), providing a reads that behaves as one would expect, e.g.:

(reads "5 golden rings") :: [(Int,String)] => [(5, " golden rings")]

With this understanding, the reader should verify the following evaluations:

readsTree "<1|<2|3>>" => [(Branch (Leaf 1) (Branch (Leaf 2) (Leaf 3)), "")]
readsTree "<1|2" => []

There are a couple of shortcomings in our definition of readsTree. One is that the parser is quite rigid, allowing no white space before or between the elements of the tree representation; the other is that the way we parse our punctuation symbols is quite different from the way we parse leaf values and subtrees, this lack of uniformity making the function definition harder to read. We can address both of these problems by using the lexical analyzer provided by the Prelude:

lex                     :: ReadS String

lex
normally returns a singleton list containing a pair of strings: the first lexeme in the input string and the remainder of the input. The lexical rules are those of Haskell programs, including comments, which lex skips, along with whitespace. If the input string is empty or contains only whitespace and comments, lex returns [("","")]; if the input is not empty in this sense, but also does not begin with a valid lexeme after any leading whitespace and comments, lex returns [].

Using the lexical analyzer, our tree parser now looks like this:

readsTree               :: (Read a) => ReadS (Tree a)
readsTree s             =  [(Branch l r, x) | ("<", t) <- lex s,
                                              (l,   u) <- readsTree t,
                                              ("|", v) <- lex u,
                                              (r,   w) <- readsTree v,
                                              (">", x) <- lex w         ]
                           ++
                           [(Leaf x, t)     | (x,   t) <- reads s       ]

We may now wish to use readsTree and showsTree to declare (Read a) => Tree a an instance of Read and (Show a) => Tree a an instance of Show. This would allow us to use the generic overloaded functions from the Prelude to parse and display trees. Moreover, we would automatically then be able to parse and display many other types containing trees as components, for example, [Tree Int]. As it turns out, readsTree and showsTree are of almost the right types to be Show and Read methods, needing only the addition of an extra parameter each that has do do with parenthesization of forms with infix constructors. We refer the interested reader to §D for details.

We can test the Read and Show instances by applying (read . show) (which should be the identity) to some trees, where read is a specialization of reads:

read                    :: (Read a) => String -> a

This function fails if there is not a unique parse or if the input contains anything more than a representation of one value of type a (and possibly, comments and whitespace).

8.4  Derived Instances

Recall the Eq instance for trees we presented in Section 5; such a declaration is simple---and boring---to produce: We require that the element type in the leaves be an equality type; then, two leaves are equal iff they contain equal elements, and two branches are equal iff their left and right subtrees are equal, respectively. Any other two trees are unequal:

instance  (Eq a) => Eq (Tree a)  where
    (Leaf x)     == (Leaf y)        =  x == y
    (Branch l r) == (Branch l' r')  =  l == l' && r == r'
    _            == _               =  False

Fortunately, we don't need to go through this tedium every time we need equality operators for a new type; the Eq instance can be derived automatically from the data declaration if we so specify:

data  Tree a            =  Leaf a | Branch (Tree a) (Tree a)  deriving Eq

The deriving clause implicitly produces an Eq instance declaration just like the one in Section 5. Instances of Ord, Enum, Ix, Read, and Show can also be generated by the deriving clause. [More than one class name can be specified, in which case the list of names must be parenthesized and the names separated by commas.]

The derived Ord instance for Tree is slightly more complicated than the Eq instance:

instance  (Ord a) => Ord (Tree a)  where
    (Leaf _)     <= (Branch _)      =  True
    (Leaf x)     <= (Leaf y)        =  x <= y
    (Branch _)   <= (Leaf _)        =  False
    (Branch l r) <= (Branch l' r')  =  l == l' && r <= r' || l <= l'

This specifies a lexicographic order: Constructors are ordered by the order of their appearance in the data declaration, and the arguments of a constructor are compared from left to right. Recall that the built-in list type is semantically equivalent to an ordinary two-constructor type. In fact, this is the full declaration:

data [a]        = [] | a : [a] deriving (Eq, Ord)     -- pseudo-code

(Lists also have a Text instance, which is not derived.) The derived Eq and Ord instances for lists are the usual ones; in particular, character strings, as lists of characters, are ordered as determined by the underlying Char type, with an initial substring comparing less than a longer string; for example, "cat" < "catalog".

In practice, Eq and Ord instances are almost always derived, rather than user-defined. In fact, we should provide our own definitions of equality and ordering predicates only with some trepidation, being careful to maintain the expected algebraic properties of equivalence relations and total orders. An intransitive (==) predicate, for example, could be disastrous, confusing readers of the program and confounding manual or automatic program transformations that rely on the (==) predicate's being an approximation to definitional equality. Nevertheless, it is sometimes necessary to provide Eq or Ord instances different from those that would be derived; probably the most important example is that of an abstract data type in which different concrete values may represent the same abstract value.

An enumerated type can have a derived Enum instance, and here again, the ordering is that of the constructors in the data declaration. For example:

data Day = Sunday | Monday | Tuesday | Wednesday
         | Thursday | Friday | Saturday         deriving (Enum)

Here are some simple examples using the derived instances for this type:

[Wednesday..Friday] => [Wednesday, Thursday, Friday]
[Monday, Wednesday ..] => [Monday, Wednesday, Friday]

Derived Read (Show) instances are possible for all types whose component types also have Read (Show) instances. (Read and Show instances for most of the standard types are provided by the Prelude. Some types, such as the function type (->), have a Show instance but not a corresponding Read.) The textual representation defined by a derived Show instance is consistent with the appearance of constant Haskell expressions of the type in question. For example, if we add Show and Read to the deriving clause for type Day, above, we obtain

show [Monday..Wednesday] => "[Monday,Tuesday,Wednesday]"

8.5  Monadic Classes

The Prelude contains a number of classes which are related to the mathematical notion of a "monad." While the terminology used is derived from category theory, it is not necessary to delve into abstract mathematics to get an intuitive understanding of what these classes do. Indeed, although the names of the classes and methods may seem unusual, these monadic operations are very useful for general programming.

Monads are built on polymorphic types such as lists. The monad is defined by a set of instances in the monadic classes for this underlying polymorphic type.

Mathematically, monads are defined by set of laws which govern the monadic operations. Haskell includes other overloaded operations that are governed by laws, at least informally. For example, x /= y and not (x == y) ought to be the same. However, since both == and /= are class methods in the Eq class, there is no way to assure that == and =/ are related in this manner for arbitrary instances of Eq. In the same sense, the monadic laws presented here are not enforced by Haskell, but should be obeyed by all instances of a monadic class.

There are four classes associated with monads: Functor, Monad, MonadZero, and MonadPlus; none of them are derivable. In addition to IO, two other types in the Prelude are members of the monadic classes: lists ([]) and exceptions (Maybe). We will discuss only the instances for lists. The Maybe type behaves as a truncated list in monadic operations: Nothing is the same as [] and Just x is the same as [x].

The Functor class, already discussed in section 5, defines a single operation: map. The map function applies an operation to the objects inside a container (polymorphic types can be thought of as containers for values of another type), returning a container of the same shape. These laws apply to map in the class Functor:

map id=id
map (f . g)=map f . map g

These laws ensure that the container shape is unchanged by map and that the contents of the container are not re-arranged by the mapping operation.

The Monad class defines two basic operators: >>= (bind) and return.

class  Monad m  where
    (>>=)            :: m a -> (a -> m b) -> m b
    (>>)             :: m a -> m b -> m b
    return           :: a -> m a

    m >> k           =  m >>= \_ -> k

The bind operations, >> and >>=, combine two monadic values while the return operation injects a value into the monad (container). The signature of >>=, Monad m => m a -> (a -> m b) -> m b, helps us to understand this operation: ma >>= \v -> mb combines a monadic value ma containing values of type a and a function which operates on a value v of type a returns the monadic value mb. The result is to combine ma and mb into a monadic value containing b. The >> function is used when the function does not need the value produced by the first monadic operator. The exact meaning of binding depends of the particular monad being used. For example, in the IO monad, x >>= y performs two actions sequentially, passing the result of the first into the second. For lists, you can think of the monadic functions as dealing with multiple values. Monadic binding takes a set (list) of values and applies a function to each of them, collecting all generated values together. The return function creates a singleton list.

Monadic list operations should already be familiar: list comprehensions (as well as do expressions) are actually syntactic sugar for monadic operators. In fact, in spite of their name, list comprehensions can be used with arbitrary monads. These following three expressions are all different syntax for the same thing:

[(x,y) | x <- [1,2,3] , y <- [4,5,6]]

do x <- [1,2,3]
   y <- [4,5,6]
   return (x,y)

[1,2,3] >>= (\ x -> [4,5,6] >>= (\y -> return (x,y)))

The laws which govern >>= and return are:

return a >>= k=k a
m >>= return=m
map f xs=xs >>= return . f
m >>= (\x -> k x >>= h))=(m >>= k) >>= h

The class MonadZero is used for monads that have a zero element:

class  (Monad m) => MonadZero m  where
    zero             :: m a

The zero element obeys the following laws:

m >>= \x -> zero=zero
zero >>= m=zero

For lists, the zero value is [], the empty list. The I/O monad has no zero element and is not a member of this class.

The class MonadPlus contains monads which have a plus operation which combines the contents of two containers:

class  (MonadZero m) => MonadPlus m  where
    (++)             :: m a -> m a -> m a

Instances of MonadPlus should obey the following laws:

m ++ zero=m
zero ++ m=m

The ++ operator is ordinary list concatenation in the list monad.


The Gentle Introduction To Haskell
back next