To an extent that would have been astonishing a century ago, DNA copying and transcription turn out to be digital processes. The copying mechanism allows tiny amounts of DNA to be amplified; the base-pairing of the double helix lets us read out a sequence, usually by adding fluorescent tags to the DNA bases. With three billion letters of genome, much of which is poorly understood, computers need to do nearly all the work. I will talk about three areas of IT involved.
- DNA sequencing is a massively parallel operation: the current standard technology produces results in snippets as short as 36 letters. Assembly is the process of combining billions of short reads into a single genome: it would be computationally difficult even if there were no errors, and there are errors.
- Annotation, looking up what is known or can be guessed about a stretch of genome, is a traditional database task, complicated by the lack of centralised control and the variable quality of information.
- Association studies relate differences in the DNA sequence to differences in biology and health. These require DNA and medical data on tens or hundreds of thousands of people, usually divided across multiple research groups and countries. They require careful design of the statistical computing in order to use data and analyst time efficiently.
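To give a flavour of the assembly problem in the first item, here is a minimal sketch of greedy overlap merging on a toy example. It is an illustration under strong assumptions, not how production assemblers work: the reads are error-free, all come from the same strand, and are few enough for an all-pairs comparison. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are invented for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Toy assumption: reads are error-free and on the same strand.
    Returns the list of contigs left when no overlaps remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        # All-pairs search: quadratic, fine for a toy, hopeless for billions of reads.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTGCA", "TGCAATCG"]
    print(greedy_assemble(reads))  # ['ATGGCGTGCAATCG']
```

Even in this idealised setting the search compares every read with every other read, which hints at why the real problem, with billions of reads and sequencing errors, needs much more careful algorithms.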
Thomas Lumley is Professor of Biostatistics at the University of Auckland. He has a PhD in Biostatistics from the University of Washington, an MSc in Applied Statistics from the University of Oxford, and a BSc(Hons) in Pure Mathematics from Monash University. He worked at the University of Washington for 12 years and moved to Auckland in 2010. At the University of Washington, Professor Lumley became closely involved with the CHARGE Consortium of large-scale genetic cohort studies, which has published over 400 papers on genetic associations in humans. He is a Fellow of the American Statistical Association and of the Royal Society of New Zealand, and a member of the R Core Development Team.
NOTE: Drinks and nibbles will be served from 6pm on Level 1 of the Owen G Glenn building.