Searching for information on the Web can be frustrating, and one of the reasons is the ambiguity of words. The work presented in this thesis concentrates on how the effectiveness of standard information retrieval systems can be enhanced with semantic technologies such as ontologies. Ontologies are knowledge models that can represent the knowledge of any universe of discourse by describing how the concepts of a domain are related. Creating and maintaining ontologies can be tedious and costly. However, we focus on reusing ontologies rather than engineering them, and on their applicability to improving the retrieval effectiveness of existing search systems.
The aim of this work is to find an effective approach for applying ontologies to existing search systems. The premise is that these ontologies can be used to tackle the problem of ambiguous words and hence improve retrieval effectiveness. Our approach to semantic search builds on feature vectors (FVs). The core idea is to connect the (standardised) domain terminology encoded in an ontology to the actual terminology used in a text corpus. To this end, we propose to associate every ontology entity (classes and individuals are called entities in this work) with an FV that is tailored to the actual terminology used in a text corpus such as the Web. These FVs are created off-line and later used on-line to filter (i.e. to disambiguate the search) and re-rank the search results from an underlying search system. This pragmatic approach is applicable to existing search systems because it only depends on extending the query and presentation components. In other words, there is no need to alter either the indexing or the ranking components of the existing systems.
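The on-line step described above can be illustrated with a minimal sketch. It is not the method of this thesis: it assumes that an FV is a mapping from terms to weights, that documents are compared to it by cosine similarity over whitespace-tokenised term counts, and that filtering uses a fixed similarity threshold. The function names, the example entity, its weights, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Term frequencies of a lowercased, whitespace-tokenised text (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_and_rerank(results, entity_fv, threshold=0.4):
    """Drop results that are dissimilar to the entity's FV, then sort the rest by similarity.

    `results` is the list of texts returned by the underlying search system;
    its indexing and ranking components are left untouched.
    """
    scored = [(cosine(entity_fv, term_vector(doc)), doc) for doc in results]
    kept = [(score, doc) for score, doc in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in kept]


# Hypothetical FV for an ontology entity meaning "jaguar, the animal".
jaguar_animal_fv = {"jaguar": 1.0, "cat": 0.8, "habitat": 0.6, "prey": 0.5}

# Hypothetical result list from an existing search system for the query "jaguar".
results = [
    "jaguar car dealer price",
    "jaguar habitat and prey in the rainforest",
    "stock market news",
    "jaguar cat prey",
]

reranked = filter_and_rerank(results, jaguar_animal_fv)
print(reranked)
```

In this sketch the result about the car shares only the ambiguous word with the FV and falls below the threshold, while the two results about the animal are kept and reordered by their similarity to the FV.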
A set of experiments has been carried out, and the results report an improvement of more than 10%. Furthermore, we have shown that the approach depends neither on highly specific queries nor on a collection consisting only of relevant documents. In addition, we have shown that the FVs are relatively persistent, i.e. little maintenance of the FVs is required.
In this work, we focus on the creation and evaluation of these feature vectors. Part of the contribution of this work is therefore a framework for the construction of FVs. Furthermore, we have proposed a set of metrics to measure the quality of the created FVs. We have also provided a set of guidelines for the optimal construction of FVs for different categories of ontologies.