Two-stage imputation of missing genotype-by-environment data
by
A. Jonathan R. Godfrey
Massey University, Palmerston North
Large horticultural trials programmes that develop over seasons lead to incomplete genotype-by-environment (G×E) yield matrices. Standard G×E analyses require complete data, leaving the analyst with a choice between:
1. Removing incomplete rows or columns from the matrix (this is clearly undesirable if it reduces the matrix to a small, and therefore ineffectual, size), and
2. Finding a suitable means of imputing missing values in the G×E matrix, thus allowing standard G×E analyses to proceed.
In this talk we begin by briefly discussing how cluster analysis can be used to impute missing G×E data. Clustering, in turn, relies on distance measures that are capable of handling incomplete data. We demonstrate how this can be done by partitioning Euclidean distance into two component distance measures: one reflects the difference in the level of genotype performance, while the other reflects the difference in G×E interaction profiles. Clustering based on similarity of G×E interaction profiles is then used to adjust the available data from similar genotypes to give imputed values.
This two-stage imputation method is easily implemented; furthermore, it does not rely on selecting an appropriate model for the existing G×E interaction. Nor does it depend on the convergence of an iterative algorithm, as is the case when the EM algorithm is used.
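As an illustration only, the two stages described above might be sketched as follows. The abstract gives no formulas, so everything here is an assumption: the partition uses the standard identity that the mean squared difference between two rows equals the squared mean difference (level) plus the variance of the differences (profile), computed over the environments both genotypes share. A single nearest neighbour stands in for the clustering step, and the function names `partitioned_distance` and `impute` are hypothetical.

```python
import numpy as np

def partitioned_distance(a, b):
    """Split the mean squared Euclidean distance between two genotype rows
    into a level part and a profile (interaction) part, using only the
    environments where both rows are observed (non-NaN).
    Assumed decomposition: mean(d^2) = mean(d)^2 + mean((d - mean(d))^2)."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    if shared.sum() < 2:
        return np.nan, np.nan  # too little overlap to compare profiles
    d = a[shared] - b[shared]
    level = d.mean() ** 2                    # difference in overall performance
    profile = ((d - d.mean()) ** 2).mean()   # difference in response pattern
    return level, profile

def impute(Y):
    """Fill each missing cell from the genotype with the most similar profile,
    shifted by the mean level difference over shared environments.
    Nearest neighbour is a stand-in for the cluster step (an assumption)."""
    Y = np.asarray(Y, dtype=float)
    out = Y.copy()
    n_genotypes = Y.shape[0]
    for i in range(n_genotypes):
        for k in np.flatnonzero(np.isnan(Y[i])):
            best, best_dist = None, np.inf
            for j in range(n_genotypes):
                if j == i or np.isnan(Y[j, k]):
                    continue  # a donor must be observed in environment k
                _, profile = partitioned_distance(Y[i], Y[j])
                if profile < best_dist:  # NaN compares False and is skipped
                    best, best_dist = j, profile
            if best is not None:
                shared = ~np.isnan(Y[i]) & ~np.isnan(Y[best])
                shift = (Y[i, shared] - Y[best, shared]).mean()
                out[i, k] = Y[best, k] + shift
    return out

# Toy yield matrix: rows are genotypes, columns are environments.
Y = np.array([[10.0, 12.0, 14.0, np.nan],
              [20.0, 22.0, 24.0, 30.0],
              [10.0, 15.0, 11.0, 5.0]])
filled = impute(Y)
```

In this toy matrix the first genotype has the same profile as the second but sits 10 units lower, so the missing cell is filled with 30 - 10 = 20, even though the third genotype is closer in level. Cells with no usable donor are left missing.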
Simulation results will be discussed. These show promise when compared to alternative imputation methods based on clustering of, or proximity to, other observations in the data.
Date received: August 30, 2001
Copyright © 2001 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # cahg-58.