Resource Lenient Approaches to Cross Language Information Retrieval: Using Amharic
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2011 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Information Retrieval (IR) deals with finding and presenting information, from a collection of documents or data, that is relevant to an information need (a query) expressed by a user. Cross Language Information Retrieval (CLIR) is a subfield of IR in which queries are posed in a language different from that of the document collection. Computational linguistic tools and resources are essential to CLIR tasks, and to date CLIR research has been dominated by the very limited number of languages for which such tools and resources are available. To facilitate global information sharing, it is important to enable access to information in as many languages as possible. This requires investigating the feasibility of CLIR for languages with limited computational linguistic resources.

Amharic is a well-studied language with a rich history and culture, but it has very limited computational linguistic tools and resources. This dissertation provides an in-depth investigation of a CLIR system for Amharic (against English and French document collections). Scalable techniques were developed to accomplish Amharic CLIR tasks, and each task was evaluated individually as a standalone experiment. Large-scale IR experiments were then conducted to evaluate the effect of three parameters on overall IR performance: transliteration, word sense discrimination, and term selection based on part-of-speech tags. The effects were measured by tuning each parameter individually through a series of benchmarking experiments geared towards optimizing retrieval precision as well as recall. The results give insight into the performance of the chosen approaches, the challenges, and their impact on overall IR performance.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2011. 152 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 11-003
Keywords [en]
Cross language information retrieval, Amharic, stemming, MRD based query translation, transliteration, named entity detection, translation term selection, sense discrimination, POS tagging
National subject category
Systems science
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-57267
ISBN: 978-91-7447-236-3 (print)
OAI: oai:DiVA.org:su-57267
DiVA: diva2:414936
Public defence
2011-06-13, sal C, Forum 100, Isafjordsgatan 39, Kista, 13:00 (English)
Opponent
Supervisor
Available from: 2011-05-12 Created: 2011-05-05 Last updated: 2011-05-13 Bibliographically approved

Author
Argaw, Atelach Alemu