
Australasian Biometrics and New Zealand Statistical Association Joint Conference 2001
December 10-13, 2001
Park Royal Hotel
Christchurch, New Zealand

Organizers
David Baird, Dave Saville, Harold Henderson, Peter Johnstone, Marco Reale, Irene Hudson, Julian Visch, Roger Littlejohn


A strategy for sampling cDNA libraries for detecting 'novel' EST sequences
by
H. Nihal De Silva
The Horticulture and Food Research Institute of New Zealand Ltd.
Coauthors: Alistair J. Hall, William Laing

Scanning EST databases for sequence homology has proved to be an extremely useful approach for discovering novel genes. Data on Expressed Sequence Tags (ESTs) are generated by sequencing a sample of colonies from a cDNA library. Most often there is a high degree of redundancy in the raw EST data. Usually, the data are processed further into datasets of singletons or contigs before BLAST searching against public databases. We define a novel (or rare) EST as one that matches no other sequence in a given list of databases. Initially, the proportion of ‘novel’ ESTs in a library will drop sharply as more libraries are sequenced. Given that some libraries are more ‘novel’ than others, the question is: what criterion should we use to decide whether to start a new library or to continue sequencing the same one?

A natural approach to the problem is to minimise the expected cost per novel sequence found. We denote the proportion of novel ESTs in any library by the random variable \Pi, which we assume is beta distributed. For a sampled library with a specific value \Pi = \pi, we assume that the number of novel ESTs in a sample of size n is a random variable X with a binomial distribution. Given that x novel ESTs are found in a sequential sample of size n_1, we use Bayes’ theorem to calculate the posterior distribution of \Pi. We let C_new be the cost of creating a new library and C_1 the marginal cost of continued sequencing. Based on these costs and the expected number of novel ESTs at any given time of sampling a library, we provide a decision rule for whether to continue sequencing the old library or to create a new one.
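The beta-binomial update described above can be illustrated with a short Python sketch. The abstract does not state the authors' decision rule, so the comparison below is only one simple, myopic possibility and should not be read as their method. The prior parameters alpha and beta, the planned sample size m for a new library, and all function names are assumptions introduced for illustration. The sketch also ignores any decline in novelty within a library as sequencing continues.

```python
def posterior_params(alpha, beta, x, n1):
    """Conjugate update: a Beta(alpha, beta) prior on the novel proportion
    and x novel ESTs in n1 sequences give a Beta(alpha + x, beta + n1 - x)
    posterior."""
    if not 0 <= x <= n1:
        raise ValueError("x must lie between 0 and n1")
    return alpha + x, beta + n1 - x


def posterior_mean(alpha, beta, x, n1):
    """Posterior expected proportion of novel ESTs in the current library."""
    a, b = posterior_params(alpha, beta, x, n1)
    return a / (a + b)


def cost_per_novel_new_library(alpha, beta, c_new, c1, m):
    """Expected cost divided by expected novel count when a new library is
    created and m colonies are sequenced (m is a hypothetical planning
    parameter, not taken from the abstract)."""
    prior_mean = alpha / (alpha + beta)
    return (c_new + c1 * m) / (m * prior_mean)


def continue_old_library(alpha, beta, x, n1, c_new, c1, m):
    """Illustrative rule (an assumption): keep sequencing the old library
    while its expected cost per novel EST does not exceed that of starting
    a new library."""
    cost_old = c1 / posterior_mean(alpha, beta, x, n1)
    cost_new = cost_per_novel_new_library(alpha, beta, c_new, c1, m)
    return cost_old <= cost_new


if __name__ == "__main__":
    # Made-up numbers: Beta(1, 9) prior, 5 novel ESTs in 50 sequences,
    # C_new = 1000, C_1 = 10, and 100 colonies planned for a new library.
    print(posterior_params(1, 9, 5, 50))
    print(continue_old_library(1, 9, 5, 50, c_new=1000, c1=10, m=100))
```

Under this rule a library that has yielded few novel ESTs so far has a low posterior mean, which raises its cost per novel sequence and favours starting a new library.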

Date received: August 21, 2001


Copyright © 2001 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # cahg-20.