Change search
ReferencesLink to record
Permanent link

Direct link
Modeling Static Caching in Web Search Engines
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2012 (English)In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, ISSN ISSN 0302-9743, EISSN 1611-3349, Vol. 7224, 436-446 p.Article in journal (Refereed) Published
Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2012. Vol. 7224, 436-446 p.
URN: urn:nbn:no:ntnu:diva-20054DOI: 10.1007/978-3-642-28997-2_37OAI: diva2:602215

The original publication is available at

Available from: 2013-01-31 Created: 2013-01-31 Last updated: 2013-03-05Bibliographically approved
In thesis
1. Efficient Query Processing in Distributed Search Engines
Open this publication in new window or tab >>Efficient Query Processing in Distributed Search Engines
2013 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Web search engines have to deal with a rapidly increasing amount of information, high query loads and tight performance constraints. The success of a search engine depends on the speed with which it answers queries (efficiency) and the quality of its answers (effectiveness). These two metrics have a large impact on the operational costs of the search engine and the overall user satisfaction, which determine the revenue of the search engine. In this context, any improvement in query processing efficiency can reduce the operational costs and improve user satisfaction, hence improve the overall benefit. In this thesis, we elaborate on query processing efficiency, address several problems within partitioned query processing, pruning and caching and propose several novel techniques: First, we look at term-wise partitioned indexes and address the main limitations of the state-of-the-art query processing methods. Our first approach combines the advantage of pipelined and traditional (non-pipelined) query processing. This approach assumes one disk access per posting list and traditional term-at-a-time processing. For the second approach, we follow an alternative direction and look at document-at-a-time processing of sub-queries and skipping. Subsequently, we present several skipping extensions to pipelined query processing, which as we show can improve the query processing performance and/or the quality of results. Then, we extend one of these methods with intra-query parallelism, which as we show can improve the performance at low query loads. Second, we look at skipping and pruning optimizations designed for a monolithic index. We present an efficient self-skipping inverted index designed for modern index compression methods and several query processing optimizations. We show that these optimizations can provide a significant speed-up compared to a full (non-pruned) evaluation and reduce the performance gap between disjunctive (OR) and conjunctive (AND) queries. We also propose a linear programming optimization that can further improve the I/O, decompression and computation efficiency of Max-Score. Third, we elaborate on caching in Web search engines in two independent contributions. First, we present an analytical model that finds the optimal split in a static memory-based two-level cache. Second, we present several strategies for selecting, ordering and scheduling prefetch queries and demonstrate that these can improve the efficiency and effectiveness of Web search engines. We carefully evaluate our ideas either using a real implementation or by simulation using real-world text collections and query logs. Most of the proposed techniques are found to improve the state-of-the-art in the conducted empirical studies. However, the implications and applicability of these techniques in practice need further evaluation in real-life settings.

Place, publisher, year, edition, pages
NTNU: NTNU-trykk, 2013
Doctoral theses at NTNU, ISSN 1503-8181 ; 2013:22
National Category
Information and communication technology
urn:nbn:no:ntnu:diva-20206 (URN)978-82-471-4134-2 (printed ver.) (ISBN)978-82-471-4135-9 (electronic ver.) (ISBN)
Public defence
2013-02-05, 00:00
Available from: 2013-02-19 Created: 2013-02-19 Last updated: 2013-02-20Bibliographically approved

Open Access in DiVA

fulltekst(288 kB)290 downloads
File information
File name FULLTEXT01.pdfFile size 288 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Other links

Publisher's full text
By organisation
Department of Computer and Information Science
In the same journal
Lecture Notes in Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 290 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 55 hits
ReferencesLink to record
Permanent link

Direct link