Challenges of indexing into large datasets with the aim of similarity search
by
Aleksandar Stojmirovic
Victoria University of Wellington
Coauthors: Vladimir Pestov
We survey existing schemes for indexing large datasets for similarity-based information retrieval and analyse the curse of dimensionality in terms of the interplay between the concentration of measure phenomenon and complexity. In particular, we stress the difference between 'inner' similarity search (when all query points belong to the indexed dataset) and 'outer' search (when the space of potential query points is so large that it defies attempts to create a precomputed index structure). We illustrate the main points using our ongoing attempt to index datasets of protein fragments.
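As a rough illustration of the concentration phenomenon mentioned above, the following minimal sketch measures how the distances from a query point to random points become relatively more uniform as the dimension grows. It assumes points drawn uniformly from the unit cube and the Euclidean metric; the function name `relative_spread` and all parameters are hypothetical choices for this example and are not taken from the authors' work or their protein fragment datasets.

```python
import math
import random
import statistics


def relative_spread(dim, n_points=200, seed=0):
    """Return std/mean of Euclidean distances from one random query
    point to n_points random points in the unit cube [0, 1]^dim.

    Assumption for illustration only: uniform data, Euclidean metric.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return statistics.pstdev(dists) / statistics.mean(dists)


# Compare a low, a moderate, and a high dimension.
spreads = {d: relative_spread(d) for d in (2, 20, 200)}
for d, s in spreads.items():
    print(f"dim={d:4d}  relative spread of distances = {s:.3f}")
```

When the relative spread is small, the nearest and farthest neighbours of a query lie at nearly the same distance, so pruning rules based on distance bounds can discard few candidates. This is one common way the curse of dimensionality shows up in index performance.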
http://www.mcs.vuw.ac.nz/~aleksand
Date received: October 10, 2001
Copyright © 2001 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # cahf-23.