Change search
ReferencesLink to record
Permanent link

Direct link
Efficient Top-K Fuzzy Interactive Query Expansion While Formulating a Query: From a Performance Perspective
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2013 (English)MasteroppgaveStudent thesis
Abstract [en]

Interactive query expansion and fuzzy search are two efficient techniques for assisting a user in an information retrieval process. Interactive query expansion helps the user refine a query by giving suggestions on how a query might be extended to further specify the actual information need of the user. Fuzzy search, on the other hand, supports the user by including results for terms that approximately equals the query string. This avoids reformulating queries with slight misspellings and will retrieve results for indexed terms not spelled as expected. This study will look at the performance aspects of combining these concepts to give the user real time suggestions on how to complete query as the query is formulated letter by letter. These suggestions will be a set of terms from the index that are fuzzy matches of the query string terms, and are chosen based on the individual rank of the term, the semantic correlation between the individual term and the edit distance between the query and the suggestion. The combination of these techniques is challenging from a performance aspect because each of them requires a lot of computation, and their relationship is such that these computations will be multiplicative when combined. Giving suggestions letter by letter as the user types requires a lookup for each letter and fuzzy search will expand each of these lookups with the fuzzy matches of the prefix to match against the index. For each of these different completions of the fuzzy matched prefixes, we will need to calculate the semantic correlation it has to the previous matched terms. This study will present three algorithms to give top-k suggestions for the single term case and then extend these in three ways to handle multi term queries. These algorithms will use a trie based term index with some extensions to enable fast lookup of top-k terms that match a given prefix and to assess the semantic correlation between the terms in the suggestion. The performance review will demonstrate that our approach will be viable to use for presenting the user with suggestions in real time even with a fairly large number of terms.

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap , 2013. , 83 p.
URN: urn:nbn:no:ntnu:diva-23010Local ID: ntnudaim:9966OAI: diva2:655644
Available from: 2013-10-12 Created: 2013-10-12 Last updated: 2013-10-12Bibliographically approved

Open Access in DiVA

fulltext(1331 kB)212 downloads
File information
File name FULLTEXT01.pdfFile size 1331 kBChecksum SHA-512
Type fulltextMimetype application/pdf
cover(184 kB)7 downloads
File information
File name COVER01.pdfFile size 184 kBChecksum SHA-512
Type coverMimetype application/pdf

By organisation
Department of Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 212 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 48 hits
ReferencesLink to record
Permanent link

Direct link