© Maria del Rosario Girardi
Ph.D. Thesis No. 2782
Computer Science Department
University of Geneva, Switzerland
The main contributions of this thesis are: a mechanism for automatically classifying software from its natural-language descriptions; a mechanism for effective retrieval of software through free-text queries; a knowledge-based internal representation of software components and their associated indexing information; and a similarity model that computes the closeness between user queries and the components in a software base in order to rank the retrieved candidates. A browsing mechanism based on the similarity model and on a clustering technique is also proposed for exploratory search.
The classification mechanism is supported by several linguistic strategies based on a software case formalism, which interprets each sentence of a software description as a set of semantic and nominal cases in a frame-based internal representation. The similarity model consists of a set of measures based on the lexical, syntactic and semantic information available in the internal representations of queries and software components, and on the conceptual distance between simple terms.
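As a rough illustration of how a frame-based representation and a conceptual distance between terms could combine into a ranking, the following Python sketch may help. It is not the thesis's similarity model: the case names ("action", "object"), the small term hierarchy, the 1 / (1 + distance) scoring rule, and the plain average over cases are all assumptions made for this example only.

```python
# Hypothetical is-a hierarchy of simple terms (child -> parent).
# Invented for illustration; not taken from the thesis.
PARENT = {
    "file": "data_object",
    "directory": "data_object",
    "data_object": "entity",
    "user": "entity",
}


def ancestors(term):
    """Return the chain [term, parent, ..., root] in the hierarchy."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def conceptual_distance(a, b):
    """Number of is-a links between a and b via their nearest common
    ancestor, or None when the two terms share no ancestor."""
    depth_in_b = {t: i for i, t in enumerate(ancestors(b))}
    for i, t in enumerate(ancestors(a)):
        if t in depth_in_b:
            return i + depth_in_b[t]
    return None


def term_similarity(a, b):
    """Assumed scoring rule: 1 for identical terms, decreasing with distance."""
    d = conceptual_distance(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def frame_similarity(query, component):
    """Average term similarity over the cases filled in the query frame.
    A case missing from the component frame contributes 0."""
    if not query:
        return 0.0
    total = 0.0
    for case, term in query.items():
        if case in component:
            total += term_similarity(term, component[case])
    return total / len(query)


def rank(query, components):
    """Return component names ordered from most to least similar."""
    return sorted(
        components,
        key=lambda name: frame_similarity(query, components[name]),
        reverse=True,
    )


# Frames map a case name to a simple term (hypothetical case names).
QUERY = {"action": "copy", "object": "file"}
COMPONENTS = {
    "whoami": {"action": "show", "object": "user"},
    "cpdir": {"action": "copy", "object": "directory"},
    "cp": {"action": "copy", "object": "file"},
}

if __name__ == "__main__":
    for name in rank(QUERY, COMPONENTS):
        print(name, round(frame_similarity(QUERY, COMPONENTS[name]), 3))
```

In this toy setting a component whose frame matches the query exactly ranks first, one that differs only by a sibling term in the hierarchy ranks next, and an unrelated component ranks last. The actual model described above also draws on lexical and syntactic information, which this sketch leaves out.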
Using a prototype implemented in Prolog by BIM, four case studies were carried out on different software collections to evaluate the classification strategies, the retrieval effectiveness, and the usefulness of the browsing approach for identifying reuse opportunities.
For readers who cannot print PostScript documents in A4 format, a letter-size version is also provided.