Dr Peter Fenwick


Department of Computer Science,
The University of Auckland,
Private Bag 92019,
Auckland,
New Zealand.

Telephone :

- none -

Fax :

+ 64 9 373 7453

email :

p.fenwick@auckland.ac.nz


Room 303 567 (very occasionally),
Mathematical and Physical Sciences Building,
38 Princes St,
Auckland


Current Position

After about 20 years in the Department of Computer Science (and 12 years before that in the Computer Centre) at the University of Auckland, I have taken early retirement from Jan 2005.

My retiring rank as Associate Professor in the Department of Computer Science continues with a position as Honorary Associate Professor until the end of 2007, allowing me to maintain a relationship with the University.
My University email and other addresses (but not telephone) should continue to be available, but will be handled less often ....

Important notice -- I am NOT available for supervising student projects or similar activities.

1 Nov 2007


Interests and publications


Research Interests

My primary interests are in computer architecture (including arithmetic), communications and networks, and text compression.

Recently I have been concentrating on the new "block sorting" text compressor (or block reduction, or Burrows-Wheeler Transform), first described by Burrows and Wheeler in April 1994. Early results on this are reported in Tech Report 111 and Tech Report 120, with a final report in Technical Report 130. Following from that is other work in "symbol ranking" text compression.

The complete source code of a fast constant-order symbol ranking compressor is available as srank.c.

Other recent related work has involved fast text searching and universal or variable-length integer codes based on Fibonacci numbers and prime numbers.

And even more recently I have returned to the work that started me in computing 40 years ago (!) on the solution of the potential distribution in a Hall effect plate for a magnetometer.


Special Recognition

A recent survey identified me as a "Top Scholar in Systems and Software Engineering", based on a publications count in important international journals.
See "An assessment of systems and software engineering scholars and institutions (1998 - 2002)",
R.L. Glass, T.Y. Chen, The Journal of Systems and Software Vol 68 (2003) pp 77 - 84


Some recent book chapters

I have written four chapters in the "Lossless Data Compression Handbook" (ed Khalid Sayood), Academic Press, 2003. ISBN 0-12-620861-1

Return to top


Some recent Burrows Wheeler ideas.

Some of my recent ideas on Burrows Wheeler compression were presented at a workshop at Rutgers University in August 2004, celebrating the 10th anniversary of the Burrows Wheeler Transform. Here are a preliminary report and the presentation slides which include later work and go somewhat beyond the report.

The essence of this material has now been published in
Burrows–Wheeler compression : Principles and reflections” Theoretical Computer Science Vol 387(2007) pp 200–219 (see my publications list below for an abstract). You can read a preprint of this paper; the "preliminary report" above contains material that was omitted to save space when two earlier papers were merged into the final version.

The ideas fall into three groups, in order of decreasing work-done.

  1. Use a BW transform to derive all contexts of some fixed order, say 4, and use these to drive an escape-free PPM compressor/decompressor. This is the main substance of the report and presentation; it worked, but not nearly as well as hoped.

  2. "Break open" the reverse BW transform to generate partial contexts during both compression and decompression, so aiding the operation of the coder and decoder.
    The report starts by developing analogies between the Fourier Transform and its associated spectra, autocorrelations, etc, on the one hand and the the Burrows Wheeler transform and PPM on the other. Invoking ideas of symmetry leads to a significant new understanding of the Burrows Wheeler operation.
    (I have long thought that we did not really understand the Burrows-Wheeler Transform. But the forward transform is essentially simple, at least in concept, and showed little scope for improvement. With these ideas the reverse transform looks the more likely for further study and deeper understanding. At least I hope so.)

  3. Is there some relation between data compression and error correction codes? Compression can be thought of as a very noisy channel (encoding a single symbol into a range of possible symbols) and then generation of the codes to select the correct symbol from the "noise" codes; the encoder and decoder track each other and the "corrections" are what the compressor emits. (The ideas follow most readily from my earlier work on "Symbol Ranking" compression which uses an explicit symbol predictor/corrector mechanism).
    This third idea may provide a useful starting point for somebody with more imagination than me. It leads to two other ideas relating to the possible relationship between Error Correction and compression.

  4. Can we omit some symbols (giving an erasure code) and then use a Viterbi-like trellis to recover feasible decodings. (Whew!)

  5. The relatively new error-correcting "turbo-codes" use two interacting but complementary error-correcting codes to give superlative performance. Can something similar be done for compression? (It also ties in with the previous point.)
    But do we have suitably different compressors? The two compressors must work with the same symbol order, probably precluding a BW and PPM combination. But of those which retain the original "text" order, PPM and LZ-77 were shown to be equivalent by Bell, Cleary & Witten (in their book "Text Compression"). So until somebody invents say a symbol-by-symbol compressor which does not depend on contexts, there may be no suitable complementary candidates!
    If you have ideas, please try them!

Return to top


Some recent papers.

(The links for the electronic journal J.UCS, the "Journal for Universal Computer Science" are to the J.UCS Page for the appropriate month. The journal contains compressed Postscript files which may be copied and printed.)

Most of my papers and reports are available by anonymous ftp from ftp.cs.auckland.ac.nz/pub/staff/peter-f

  • Burrows–Wheeler compression : Principles and reflections” Theoretical Computer Science Vol 387(2007) pp 200–219
    Abstract After a general description of the Burrows Wheeler Transform and a brief survey of recent work on processing its output, the paper examines the coding of the zero-runs from the MTF recoding stage, an aspect with little prior treatment. It is concluded that the original scheme proposed by Wheeler is extremely efficient and unlikely to be much improved.
    The paper then proposes some new interpretations and uses of the Burrows Wheeler transform, with new insights and approaches to lossless compression, perhaps including techniques from error correction.
    A preprint of this paper is available from here

  • "Burrows Wheeler -- Alternatives to Move to Front",
    Authors - Dr Peter Fenwick, Dr Mark Titchener, Michelle Lorenz. Presented at Data Compression Conference DCC-2003.
    Abstract The Burrows Wheeler Transform, first published in 1994, is a relatively new approach to text compression and has already proven to produce excellent results. However, there has been much research directed towards improving the efficiency of the Move to Front algorithm with varying degrees of complexity. This paper examines a relatively simple technique using a cached modeling system proposed by Fenwick and achieves very promising results.
    Information loss during the Burrows Wheeler Transform is also examined using Deterministic Information Theory measuring techniques developed by Titchener. This provides an interesting insight into what is actually occurring during the complete Burrows Wheeler Transform and can confirm whether manipulating the MTF algorithm is the correct approach to refining an already successful compression scheme.
    Full papers in PDF and in PostScript. (Please tell me if you have problems with these files. On my computer the Acrobat Reader has some trouble with fonts, but the hard-copy print is fine.)

  • "Burrows Wheeler Compression with Variable Length Integer Codes", Software - Practice and Experience, Vol 32, No 13, Nov 2002, pp 1307 - 1316.
    Abstract The final coder in Burrows-Wheeler compression is usually either an adaptive Huffman coder (for speed) or a complex of arithmetic coders for better compression. This article describes the use of conventional pre-defined variable length codes or universal codes and shows that they too can give excellent compression. The paper also describes a "sticky Move-to-Front" modification which gives a useful improvement in compression for most files.

  • "Variable-Length Integer Codes Based on the Goldbach Conjecture, and Other Additive Codes", IEEE Trans. Inform. Theory, Vol 48, No 8, pp 2412 - 2417,Aug 2002.
    Abstract This paper introduces a new family of variable length codes for the integers, initially based on the Goldbach conjecture that every even integer is the sum of two primes. For an even integer we decompose the value into its two constituent primes and encode the ordinal numbers of those primes with an Elias gamma code. The method is then elaborated to handle odd integers. The paper then develops a more general method of encoding any integer as the sum of two integers and developing suitable basis sets of integers. Although the codes which are generated by these methods are characterised by widely-varying and unpredictable lengths, they are over some ranges shorter than most other variable length codes.

  • "Zeckendorf Integer Arithmetic", Fibonacci Quarterly Nov 2003, pp 405 - 413.
    For something quite different.... Fibonacci numbers are the basis of the "Zeckendorf" number representation which looks like a binary number but has no consecutive 1 bits. (By sending the Zeckendorf representation least-significant bit first and following its most significant 1-bit with another 1-bit the illegal pair 11 acts as terminator. The result is a very good variable-length code for integers.)
    However, can we perform arithmetic on these Zeckendorf representations? This paper shows that we can, giving procedures for addition, subtraction (and complementing), multiplication and division. "(preprint in PDF format)"

  • "Some Perils of Performance Prediction : a case study on Pattern Matching", Software - Practice and Experience, Vol 31, pp 835-843 April 2001.
    Abstract Recent experience with string searching or pattern matching algorithms revealed wide discrepancies between predicted and observed performance. Further observations on a variety of computers revealed even greater variations between those algorithms on different computers. These observations are collected here as an example of problems in real-world comparison of algorithms.
    This is a companion to the following paper.

  • "Fast string matching for multiple searches", Software - Practice and Experience, Vol 31, pp 815-833 April 2001.
    Abstract We present a string matching or pattern matching method which is especially useful when a single block of text must be searched repeatedly for different patterns. The method combines linking the text according to digrams, searching on the least-frequent digram, and probing selected characters as a preliminary filter before full pattern comparison. Tests on real alphabetic data show that the number of character comparisons may be decreased by two orders of magnitude compared with Knuth-Morris-Pratt and similar searching, but with an initialisation overhead comparable to 5 to 10 conventional searches.
    But see also the previous paper..

  • "Symbol Ranking Text Compressors: Review and Implementation", Software - Practice and Experience Vol 28 No 5 pp 547-559 April 1998.
    This paper reviews several previous text compressors with 'symbol ranking' as a unifying theme. It includes the compressor described in Tech Rep 145.

  • "Compression of Unicode Files", DCC-98 Data Compression Conference, Snowbird, Utah, March 1998.
    [A poster was presented at DCC-98 on the compression of Unicode files, showing that they can behave rather differently from conventional byte-oriented data. Experiments with a compressor using 16-bit data show that it often works better than standard byte compressors for Unicode data.] A pdf version of the paper is also available.

  • "Symbol Ranking Text Compression with Shannon Recodings", J.UCS Vol 3, No 2 pp 70 85 Feb 1997.
    [A general introduction to "symbol ranking" text compression and a description of a proof-of-concept compressor. An extended version of Tech Rep 132]

  • "The Burrows-Wheeler Transform for Block Sorting Text Compression -- Principles and Improvements", The Computer Journal, Vol 39, No 9, pp 731-740, 1996.
    [This is the paper which was written as the definitive summary of my experience with block sorting compression.]

  • " Queue Prediction: an efficient scheduler for fast ATM cell transmission ", ACSC'97, Sydney, Australia Feb 1997.
    [This paper describes a new technique for controlling ATM cell transmission.]

  • "Block Sorting Text Compression", ACSC'96, Melbourne, Australia Jan 1996.
    [This paper is the general overview of block sorting compression.]

  • "Differential Ziv-Lempel Text Compression", J.UCS Vol 1, No 8 pp 587 598 Aug 1995.
    [ An account of an attempt to combine Ziv-Lempel (lossless) compression with Vector Quamtisation (lossy) compression. It includes some predictions on the limits of LZ compression.]

  • "Huffman code efficiencies for extensions of sources", IEEE Trans. Comm., Vol 43 No 2 pp163 165, Feb 1995.
    [Athough coding the extensions of sources is a fundamental concept in Information Theory, there seems to be no work describing what actually happens in practice. This paper arose from a student assignment which showed that increasing the code extension can lead to a poorer code! Also available as Tech Rep 89.]

  • "High-radix Division with Approximate Quotient-digit Estimation", Journal for Universal Computer Science (J.UCS) Vol 1, No 1, pp 2 22 Jan 1995.
    [A major problem in high-radix division is estimating the correct digit (and divisor multiple). While it is easy to get the right answer most of the time, it is much harder to get it right all of the time. This paper shows that allowing a bad estimate to be corrected can lead to much simpler control logic, with very little speed penalty.]

  • "A new data structure for cumulative frequency tables", Software Practice and Experience, Vol 24, No 3, pp 327 336 Mar 1994.
    [This paper describes a new data structure which is especially designed to handle the cumulative frequency tables of arithmetic compression. See also Tech Rep 88 & 110.]

Return to top


And some Technical Reports

  • " Reflections on Burrows Wheeler compression", The University of Auckland, Department of Computer Science Report No 172, Jul. 2004.
    [This report presents some speculations on the nature of the Burrows Wheeler transform, from analogies with the better-known techniques of signal processing. These analogies, plus considerations of symmetry and inverse operations, suggest some possible developments to the Burrows Wheeler transform for lossless compression. ]

  • " A fast, constant-order, symbol ranking text compressor", The University of Auckland, Department of Computer Science Report No 145, Apr. 1997.
    [A symbol ranking text compressor which is designed for speed rather than good compression. A software implementation 3.56 bits/byte on the Calgary Corpus, running at 1 Mbyte/s on a 275MHz DEC Alpha. A version more suited to hardware implementation compresses to 3.95 bits/byte and should be able to run at hundreds of megabytes per second.]

  • " Punctured Elias codes for variable-length coding of the integers",The University of Auckland, Department of Computer Science Report No 137, Dec. 1996.
    [A general discussion of all of the variable length codings I have managed to discover, followed by a variant of the Elias gamma code which is useful for some text compression codings.]
    Later note - these codes are not really that good....

  • "Symbol Ranking Text Compression", The University of Auckland, Department of Computer Science Report No 132, Jun. 1996.
    [This describes an introductory implementation of a text compressor which follows directly from Shannon's 1951 paper on the information content of English text. For each context, the compressor prepares a list of symbols, ranked from most-likely to least likely, and emits the position of the actual symbol in the ordered list.]

  • " Block Sorting Text Compression -- Final Report", The University of Auckland, Department of Computer Science Report No 130, Apr. 1996.
    [The final report on block sorting compression, extending somewhat the results of the ACSC paper.]

  • " Improvements to the Block Sorting Text Compression Algorithm" , The University of Auckland, Department of Computer Science Report No 120, Aug. 1995.
    [Further work on the block sorting compression, intermediate between TR111 and the ACSC paper.]

  • "Experiments with a Block Sorting Text Compression Algorithm", The University of Auckland, Department of Computer Science Report No 111, May 1995. [Initial results and miscellaneous thoughts, on the block sorting algorithm. It includes many experimental results which are not found elsewhere.]

  • " A New Data Structure for Cumulative Probability Tables : an Improved Frequency-to-Symbol Algorithm", The University of Auckland, Department of Computer Science Report No 110, Feb. 1995.
    [Correction to the S P & E paper above, and to Tech Rep 88]

  • "Fast LZ-77 string matching", (with P. Gutmann) The University of Auckland, Department of Computer Science Report No 102, Sep. 1994.

  • "Huffman code efficiencies for extensions of sources", The University of Auckland, Department of Computer Science Report No 89, Feb. 1994.

  • "A new data structure for cumulative frequency tables", The University of Auckland, Department of Computer Science Report No 88, May 1993. [Essentially the S P & E paper.]


    My Hall Effect Project

    For this I simply reproduce the abstract and introduction from my report. Read further if you are interested!!

          Abstract

    This report describes the solution of a problem which arose about 40 years ago and seemed eminently suitable for computer solution. But numerical limitations made it quite unsuitable for complete solution on the computers then available. It is only now that a solution is feasible on generally-available machines.

    The problem itself involves the solution of potentials over a Hall-effect plate driven with a constant voltage instead of the more usual constant current. The change from current-drive to voltage-drive greatly reduces the temperature sensitivity of the Hall output voltage, but at the cost of non-linear response at high magnetic fields. The problem is to find the shape of the response.

          Background

    This is the problem that introduced me to computing. For reasons that will become apparent, its solution languished and it is only now, 40 years later, that I have at last completed it.

    In the summer of 1963-4 I was working as a casual laboratory technician in the University of Auckland Physics Department. Much of my work seemed to be with magnets for Advanced Laboratory experiments - I remember working with two large electromagnets, both weighing 10s of kilogrammes. One produced a high field of about 1 Tesla (10,000 gauss for the old-fashioned) for studying the Zeeman effect. The other produced a lower field, about 0.3 T, but very uniform, for demonstrating Nuclear Magnetic Resonance.

    Also working with things magnetic I built a Hall-effect fluxmeter to measure magnetic fields. The Hall element was driven by a constant voltage, using AC at about 1 kHz, with a switched gain amplifier and synchronous rectification. Its maximum full-scale sensitivity was about 100 microTesla, or 1 gauss, comparable to the Earth's field.

    But I knew that the output saturated at high fields (say B>0.1T) and wondered just how the output varied with a high magnetic field. The University had then just got its first computer, an IBM 1620 with all of 20k decimal digits of storage and a FP multiply time of 10 ms. Murray Johns suggested that I solve my problem on the computer as it seemed a good one for numerical solution. So I taught myself Fortran (with Format).

    But you may read the complete report as a PDF file....

    Return to top