CLIRch, an extensible open source framework for query translation: evaluated for use on the Norwegian/Spanish language pair.
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2012 (English). Master's thesis (Student thesis)
Abstract [en]

CLIR (Cross-Lingual Information Retrieval) is a field of research that can be highly useful in web search and several other applications. Extensive research has been done on possible CLIR implementations, but as yet no open source frameworks or applications are readily available. This thesis builds such a framework and evaluates it on the Norwegian/Spanish language pair. The framework uses query translation to submit queries to existing information retrieval (IR) implementations, and contains no low-level IR algorithms of its own. Experiments were performed on a small parallel corpus of Norwegian and Spanish texts, using the Xapian and PostgreSQL IR implementations. A comprehensive comparison of possible configurations was carried out, and certain measures were shown to be effective when searching for documents in either language. The framework has a modular architecture, so the suggested additions and amendments can be implemented as add-on components. This modularity is the main intent of the framework, and it also eases the process of adding support for further languages. Additional components and data may make the framework easier to adopt. Some improvements are also possible for the tested language pair, by obtaining larger data sets or implementing certain language-specific algorithms; of particular interest are effective decompounding of Norwegian compound words and support for phrase translation. Suggestions are also made for how the system can be used to perform CLIR tasks in other languages.

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2012, 64 p.
Keywords [no]
ntnudaim:5665, MIT informatikk (Informatics), Kunstig intelligens og læring (Artificial Intelligence and Learning)
URN: urn:nbn:no:ntnu:diva-18357
Local ID: ntnudaim:5665
OAI: diva2:565861
Available from: 2012-11-08
Created: 2012-11-08

Open Access in DiVA
Full text: FULLTEXT01.pdf (645 kB, application/pdf)
Cover: COVER01.pdf (184 kB, application/pdf)
Attachment: ATTACHMENT01.zip (32 kB, application/zip)