• 51. Hansen, Preben
Effects of Foreign Language and Task Scenario on Relevance Assessment. 2005. In: Journal of Documentation, ISSN 0022-0418, E-ISSN 1758-7379, Vol. 61, no. 5, pp. 623-639. Journal article (Peer-reviewed)

Purpose – This paper aims to investigate how readers assess relevance of retrieved documents in a foreign language they know well compared with their native language, and whether work-task scenario descriptions have an effect on the assessment process. Design/methodology/approach – Queries, test collections, and relevance assessments were used from the 2002 Interactive CLEF. Swedish first-language speakers, fluent in English, were given simulated information-seeking scenarios and presented with retrieval results in both languages. Twenty-eight subjects in four groups were asked to rate the retrieved text documents by relevance. A two-level work-task scenario description framework was developed and applied to facilitate the study of context effects on the assessment process. Findings – Relevance assessment takes longer in a foreign language than in the user's first language. The quality of assessments, judged by comparison with pre-assessed results, is inferior to that of assessments made in the users' first language. Work-task scenario descriptions had an effect on the assessment process, both by measured access time and by self-report by subjects. However, effects on results by traditional relevance ranking were detectable. This may be an argument for extending the traditional IR experimental topical relevance measures to cater for context effects. Originality/value – An extended two-level work-task scenario description framework was developed and applied. Contextual aspects had an effect on the relevance assessment process. English texts took longer to assess than Swedish and were assessed less well, especially for the most difficult queries. The IR research field needs to close this gap and to design information access systems with users' language competence in mind.

Interactivity and interaction. 1998. Conference paper (Peer-reviewed)
Texts and Language – Interactivity and Context. 2009. Report (Other academic)

This technical report collects three years of experimentation in interactive cross-language information retrieval by SICS in the annual Cross-language Evaluation Forum (CLEF) evaluation campaigns 2003, 2004, and 2005. We varied simulated task context and measured user performance in a document assessment task, and found that choice of language and task context do indeed affect the amount of effort users need to expend to complete the task.

Cooperation, bookmarking, and thesaurus in interactive bilingual question answering. 2004. In: Multilingual Information Access for Text, Speech and Images (5th Workshop of the Cross-Language Evaluation Forum, CLEF 2004, Bath, UK, September 15-17, 2004, Revised Selected Papers), Springer, 2004, 1, 5 p., pp. 343-347. Book chapter, part of anthology (Peer-reviewed)

The study presented involves several different contextual aspects and is the latest in a continuing series of exploratory experiments on information access behaviour in a multi-lingual context [1, 2]. This year’s interactive cross-lingual information access experiment was designed to measure three parameters we expected would affect the performance of users in cross-lingual tasks in languages in which the users are less than fluent. Firstly, introducing new technology, we measure the effect of topic-tailored term expansion on query formulation. Secondly, introducing a new component in the interactive interface, we investigate - without measuring by using a control group - the effect of a bookmark panel on user confidence in the reported result. Thirdly, we ran subjects pair-wise and allowed them to communicate verbally, to investigate how people may cooperate and collaborate with a partner during a search session performing a similar but non-identical search task.

User-Centered Interface Design for Cross-Language Information Retrieval. 2002. In: Proceedings of the Twenty-fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002, 1, 2 p. Conference paper (Peer-reviewed)

This paper reports on the user-centered design methodology and techniques used for the elicitation of user requirements and how these requirements informed the first phase of the user interface design for a Cross-Language Information Retrieval System. We describe a set of factors involved in the analysis of the data collected and, finally, discuss the implications for user interface design based on the findings.

Creating Bilingual Lexica Using Reference Wordlists for Alignment of Monolingual Semantic Vector Spaces. 2005. Conference paper (Peer-reviewed)

This paper proposes a novel method for automatically acquiring multi-lingual lexica from non-parallel data and reports some initial experiments to prove the viability of the approach. Using established techniques for building mono-lingual vector spaces, two independent semantic vector spaces are built from textual data. These vector spaces are related to each other using a small reference word list of manually chosen reference points taken from available bi-lingual dictionaries. Other words can then be related to these reference points first in the one language and then in the other. In the present experiments, we apply the proposed method to comparable but non-parallel English-German data. The resulting bi-lingual lexicon is evaluated using an online English-German lexicon as gold standard. The results clearly demonstrate the viability of the proposed methodology.
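The idea of relating words to shared reference points can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cosine similarity as the relatedness measure and describes each word by its similarities to the reference words of its own language. The toy vectors, the word choices, and the function names (`reference_profile`, `translation_candidates`) are invented for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reference_profile(word, space, reference_words):
    """Describe a word by its similarity to each reference word in its own space."""
    return [cosine(space[word], space[ref]) for ref in reference_words]


def translation_candidates(word, src_space, tgt_space, reference_pairs, top_n=1):
    """Rank target-language words by how closely their reference profile
    matches the reference profile of the source-language word."""
    src_refs = [s for s, _ in reference_pairs]
    tgt_refs = [t for _, t in reference_pairs]
    src_profile = reference_profile(word, src_space, src_refs)
    scored = [
        (cosine(src_profile, reference_profile(cand, tgt_space, tgt_refs)), cand)
        for cand in tgt_space
    ]
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:top_n]]


# Hypothetical toy spaces. The two spaces are built independently, so their
# axes do not correspond (here they are deliberately swapped).
english = {
    "dog": [1.0, 0.0], "cat": [0.0, 1.0],
    "puppy": [0.9, 0.1], "kitten": [0.1, 0.9],
}
german = {
    "Hund": [0.0, 1.0], "Katze": [1.0, 0.0],
    "Welpe": [0.1, 0.9], "Katzenbaby": [0.9, 0.1],
}
# The small, manually chosen reference word list (assumed for this example).
reference_pairs = [("dog", "Hund"), ("cat", "Katze")]

print(translation_candidates("puppy", english, german, reference_pairs))
```

Because words are compared only through their similarities to the reference points, the two spaces never need to share dimensions, which is what allows non-parallel data to be used.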

• 58. Hulth, Anette
RISE, Swedish ICT, SICS.
Merging classifiers for improved information retrieval. 1999. Conference paper (Peer-reviewed)
SICS.
Automatic Keyword Extraction Using Domain Knowledge. 2008. In: Computational Linguistics and Intelligent Text Processing, Berlin / Heidelberg: Springer, 2008, 1, 10 p. Book chapter, part of anthology (Peer-reviewed)

Documents can be assigned keywords by frequency analysis of the terms found in the document text, which arguably is the primary source of knowledge about the document itself. By including a hierarchically organised domain-specific thesaurus as a second knowledge source, the quality of such keywords was improved considerably, as measured by match to previously manually assigned keywords. In the presented experiment, the combination of the evidence from frequency analysis and the hierarchically organised thesaurus was done using inductive logic programming.

Spatial or narrative: a study of the Agneta and Frida system. 1999. Conference paper (Peer-reviewed)

We propose that analysing, from a metaphorical point of view, interviews with subjects who have been exposed to anthropomorphic characters can provide insights into how characters in the interface are perceived. In a study of the Agneta & Frida system (two characters that comment on the contents of web pages in an ironic, humorous manner) we found that subjects who used Agneta & Frida used more narrative verbs and adverbs than users who only browsed the web pages. In the latter case, more spatial verbs and adverbs were used. This may imply that normal web browsing is perceived as navigation through a space, while Agneta & Frida provides for a more narrative experience.

Some Principles for Route Descriptions Derived from Human Advisers. 1991. In: Proceedings of the 13th Annual Meeting of the Cognitive Science Society, 1991. Conference paper (Peer-reviewed)
Some principles for route descriptions derived from human advisers. 1991. Report (Other academic)

There is a need to make the interface of Route Guidance systems more flexible, so that they can adapt to the specific driver's needs. Today's systems are primarily aimed at tourists, and interfaces for drivers who have more experience of a city have not been investigated. In this paper we describe a study with very experienced driver-navigators, from which we have deduced principles as to how route descriptions are constructed and expressed by humans. Some of these principles are implementable, and a rough outline of a program is presented. Given a plan of how to go from A to B in a city, the program produces a verbal description of that plan. The goal is to incorporate verbal descriptions in Route Guidance systems, primarily aimed at driver-navigators with some knowledge of the city.

Inferring complex plans. 1993. In: Proceedings of the 1st International Workshop on Intelligent User Interfaces, 1993, 1. Conference paper (Peer-reviewed)

We examine the need for plan inference in intelligent help mechanisms. We argue that previous approaches have drawbacks that need to be overcome to make plan inference useful. Firstly, plans have to be inferred - not extracted from the users’ help requests. Secondly, the plans inferred must be more than a single goal or solitary user command.

A glass box approach to adaptive hypermedia. 1996. In: User Modeling and User-Adapted Interaction, ISSN 0924-1868, E-ISSN 1573-1391, Vol. 6, pp. 157-184. Journal article (Peer-reviewed)

Utilising adaptive interface techniques in the design of systems introduces certain risks. An adaptive interface is not static, but will actively adapt to the perceived needs of the user. Unless carefully designed, these changes may lead to an unpredictable, obscure and uncontrollable interface. Therefore the design of adaptive interfaces must ensure that users can inspect the adaptivity mechanisms, and control their results. One way to do this is to rely on the user's understanding of the application and the domain, and relate the adaptivity mechanisms to domain-specific concepts. We present an example of an adaptive hypertext help system, POP, which is being built according to these principles, and discuss the design considerations and empirical findings that led to this design.

The language of smell: Connecting linguistic and psychophysical properties of odor descriptors. 2018. In: Cognition, ISSN 0010-0277, E-ISSN 1873-7838, Vol. 178, pp. 37-49. Journal article (Peer-reviewed)

The olfactory sense is a particularly challenging domain for cognitive science investigations of perception, memory, and language. Although many studies show that odors often are difficult to describe verbally, little is known about the associations between olfactory percepts and the words that describe them. Quantitative models of how odor experiences are described in natural language are therefore needed to understand how odors are perceived and communicated. In this study, we develop a computational method to characterize the olfaction-related semantic content of words in a large text corpus of internet sites in English. We introduce two new metrics: olfactory association index (OAI, how strongly a word is associated with olfaction) and olfactory specificity index (OSI, how specific a word is in its description of odors). We validate the OAI and OSI metrics using psychophysical datasets by showing that terms with high OAI have high ratings of perceived olfactory association and are used to describe highly familiar odors. In contrast, terms with high OSI have high inter-individual consistency in how they are applied to odors. Finally, we analyze Dravnieks's (1985) dataset of odor ratings in terms of OAI and OSI. This analysis reveals that terms that are used broadly (applied often but with moderate ratings) tend to be olfaction-unrelated and abstract (e.g., “heavy” or “light”; low OAI and low OSI) while descriptors that are used selectively (applied seldom but with high ratings) tend to be olfaction-related (e.g., “vanilla” or “licorice”; high OAI). Thus, OAI and OSI provide behaviorally meaningful information about olfactory language. These statistical tools are useful for future studies of olfactory perception and cognition, and might help integrate research on odor perception, neuroimaging, and corpus-based linguistic models of semantic organization.

Report on the Fifth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR’12): CIKM WORKSHOP REPORT. 2013. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 47, no. 1, pp. 38-45. Journal article (Peer-reviewed)

There is an increasing amount of structure on the web as a result of modern web languages, user tagging and annotation, emerging robust NLP tools, and an ever growing volume of linked data. These meaningful, semantic annotations hold the promise to significantly enhance information access, by enhancing the depth of analysis of today’s systems. Currently, we have only started exploring the possibilities and have only begun to understand how these valuable semantic cues can be put to fruitful use. To complicate matters, standard text search excels at shallow information needs expressed by short keyword queries, and here semantic annotation contributes very little, if anything. The main questions for the workshop are how to leverage the rich context currently available, especially in a mobile search scenario, giving powerful new handles to exploit semantic annotations, and how to fruitfully combine information retrieval and knowledge intensive approaches and, for the first time, work actively toward a unified view on exploiting semantic annotations.

There was a strong feeling that we made substantial progress. Specifically, each of the breakout groups contributed to our understanding of the way forward. First, there is a need for further integration of symbolic and statistical methods with each adopting parts of the other’s strengths, by focusing on types of annotations that are informed by and meaningful for the task at hand, and relying on automatic information extraction and annotation based on web scale observations. Second, the discussion contributed to the creation of a concrete shared corpus with state of the art semantic annotation—in particular a web crawl annotated with Freebase concepts—that will benefit research in this area for years to come.

Report on the Third Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR), Toronto, Canada. 2011. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 45, no. 1, pp. 33-41. Journal article (Peer-reviewed)

There is an increasing amount of structure on the Web as a result of modern Web languages, user tagging and annotation, and emerging robust NLP tools. These meaningful, semantic annotations hold the promise to significantly enhance information access, by enhancing the depth of analysis of today's systems. Currently, we have only started exploring the possibilities and have only begun to understand how these valuable semantic cues can be put to fruitful use. The workshop had an interactive format consisting of keynotes, boasters and posters, breakout groups and reports, and a final discussion, which was prolonged into the evening. There was a strong feeling that we made substantial progress. Specifically, each of the breakout groups contributed to our understanding of the way forward. First, annotations and use cases come in many different shapes and forms depending on the domain at hand, but at a higher level there are commonalities in annotation tools, indexing methods, user interfaces, and general methodology. Second, there is a framework emerging to view annotation as (1) a linking procedure, connecting (2) an analysis of information objects with (3) a semantic model of some sort, expressing relations that contribute to (4) a task of interest to end users. Third, we should look at complex tasks that cannot be comprehensively articulated in a few keywords, and embrace interaction both to incrementally refine the search request and to explore the results at various stages, guided by the semantic structure.

Third workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR). 2010. Conference paper (Peer-reviewed)

There is an increasing amount of structure on the Web as a result of modern Web languages, user tagging and annotation, and emerging robust NLP tools. These meaningful, semantic annotations hold the promise to significantly enhance information access, by enhancing the depth of analysis of today’s systems. Currently, we have only started exploring the possibilities and have only begun to understand how these valuable semantic cues can be put to fruitful use. Unleashing the potential of semantic annotations requires us to think outside the box, by combining the insights of natural language processing (NLP) to go beyond bags of words, the insights of databases (DB) to use structure efficiently even when aggregating over millions of records, the insights of information retrieval (IR) in effective goal-directed search and evaluation, and the insights of knowledge management (KM) to get to grips with the greater whole. The Workshop aims to bring together researchers from these different disciplines to work together on one of the greatest challenges in the years to come. The desired result of the workshop will be concrete insight into the potential of semantic annotations, and into concrete steps to take this research forward; to synchronize related research happening in NLP, DB, IR, and KM, in ways that combine the strengths of each discipline; and to have a lively, interactive workshop where everyone contributes and that inspires attendees to think “outside the box.”

Computing with large random patterns. 2001. In: Foundations of Real-World Intelligence, Stanford, California: CSLI Publications, 2001, 1, pp. 251-311. Book chapter, part of anthology (Peer-reviewed)

We describe a style of computing that differs from traditional numeric and symbolic computing and is suited for modeling neural networks. We focus on one aspect of "neurocomputing," namely, computing with large random patterns, or high-dimensional random vectors, and ask what kind of computing they perform and whether they can help us understand how the brain processes information and how the mind works. Rapidly developing hardware technology will soon be able to produce the massive circuits that this style of computing requires. This chapter develops a theory on which the computing could be based.

En rekommenderad svensk språkteknologisk terminologi [A recommended Swedish language technology terminology]. 2016. In: Proc. Sixth Swedish Language Technology Conference, Umeå: Svenska språkteknologitermgruppen, 2016. Conference paper (Peer-reviewed)

In 2014 the Swedish Language Technology Terminology Group was created, with representatives from different parts of the language technology community: higher education and research, industry, and governmental agencies. In 2016 we recommended Swedish terms for the 270 language technological concepts in the Bank of Finnish Terminology in Arts and Sciences. The language technology terms are published on folkets-lexikon.csc.kth.se/LTterminology, where anyone can look up Swedish and English terms interactively and read the full list of terms. We also try to enter the most important Swedish terminology into the Swedish Wikipedia. We encourage use of these Swedish terms and welcome suggestions for improvements of the Swedish terminology.

Practical Issues in Information Access System Evaluation. 2017. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 51, no. 1, pp. 67-72. Journal article (Peer-reviewed)

This paper is a report from a workshop on Evaluation of Information Systems in Commercial Settings, inspired by the industrial day at SIGIR 2016. Small and medium size enterprises often lack the resources needed to develop proper evaluation infrastructures, as well as to follow research developments in the field of evaluation. Similarly, academics lag behind in (a) understanding the real practical issues raised by the evaluation of real systems - e.g. even depth-k pooling is often infeasible when an SME has a single ranking algorithm developed, and (b) sensing the breadth of applications and tasks on which systems require evaluation, and the challenges they pose. Large enterprises with the necessary resources and the data sets and flows to work with are hesitant to make their tests public, for both commercial and legal reasons. This workshop brought together representatives from technology companies, large and small, media houses, industrial consultants and academic research in information access for a discussion on practical issues and solutions to these issues.

Dilemma - An Instant Lexicographer. 1994. In: Proceedings of the 15th International Conference on Computational Linguistics, 1994, 1, Vol. 1, pp. 82-84. Conference paper (Peer-reviewed)

Dilemma is a lexicography component in a tool kit for translation. At the request of a text writer, Dilemma presents relevant lexical information extracted by statistical processing from previously translated parallel texts. Dilemma is currently used in the ongoing translation of EC legislation into the languages of candidate member countries.

A Computer Program for Recognizing Blazons. 1988. Independent thesis, basic level (degree of Bachelor), 20 credits. Student thesis

This candidate of philosophy thesis describes a computer program which analyzes so-called blazons, i.e., classic descriptions of heraldic coats-of-arms. If an expression is recognized as an acceptable blazon, the program produces a graphic representation of the coat-of-arms in question on screen.

Affect, appeal, and sentiment as factors influencing interaction with multimedia information. 2009. In: Proceedings of Theseus/ImageCLEF workshop on visual information retrieval evaluation, 2009, 1, 4 p., pp. 8-11. Conference paper (Peer-reviewed)
An algebra for recommendations: Using reader data as a basis for measuring document proximity. 1990. Report (Other academic)

A measure for proximity between documents is defined, based on data from readers. This proximity measure can be further investigated as a tool for document retrieval, and as a source of data for concept formation experiments.

Assessed Relevance and Stylistic Variation. 1996. Conference paper (Peer-reviewed)

Texts exhibit considerable stylistic variation. This paper reports an experiment where a large corpus of documents is analyzed using various simple stylistic metrics. A subset of the corpus has been previously assessed to be relevant for answering given information retrieval queries. The experiment shows that this subset differs significantly from the rest of the corpus in terms of the stylistic metrics studied.

Changing the subject; one way of measuring trust in information. 2008. Conference paper (Peer-reviewed)

For the purposes of two recent student projects hosted at SICS, we defined a target notion based on trust in lieu of topical relevance. Given controversial search task topics that interested them, subjects performed experiments with enthusiasm and reported that the experiment had influenced their state of mind. This forms an implicit test of trust in the retrieved material. While the respondents reported a medium to low-medium range of trust in the materials, and did not believe they had found all pertinent facets of opinion pertaining to the topic, they still adjusted their opinions on the matter to some extent and reported having learned about the topic.

CHORUS Deliverable 4.1: Report of the 1st CHORUS Workshop. 2007. Other (Other academic)

Minutes of Rocquencourt Workshop – INRIA March 13, 2007

CHORUS Deliverable 4.3c: Affect, appeal, and sentiment as factors influencing interaction with multi media information. 2009. Other (Other academic)

The 7th CHORUS workshop on “Affect, Appeal, and Sentiment as Factors Influencing Interaction with Multimedia Information” was held on May 28, 2009, in Brussels, immediately following the Third CHORUS Conference, hosted by the European Commission at their Avenue Beaulieu premises. Participation was limited to invited speakers, and comprised sixteen researchers from fourteen research institutes in eight countries.

CHORUS Deliverable 4.4: Report of the 2nd CHORUS Conference. 2008. Other (Other academic)

The Second CHORUS Conference and third Yahoo! Research Workshop on the Future of Web Search were held on April 4-5, 2008, in Granvalira, Andorra, to discuss future directions in multimedia information access and other specialised topics in the near future of retrieval. Attendance was at capacity, with 97 participants from 11 countries and 3 continents.

Compound terms and their constituent elements in information retrieval. 2005. Conference paper (Refereed)

Compounds, especially in languages where compounds are formed by concatenation without intervening whitespace between elements, pose challenges to simple text retrieval algorithms. Search queries that include compounds may not retrieve texts where elements of those compounds occur in uncompounded form; search queries that lack compounds will not retrieve texts where the salient elements are buried inside compounds. This study explores the distributional characteristics of compounds and their constituent elements using Swedish, a compounding language, as a test case. The compounds studied are taken from experimental search topics given for CLEF, the Cross-Language Evaluation Forum. Their distributions are related to relevance assessments made on the collection under study and evaluated in terms of divergence from the expected random distribution over documents. The observations have direct ramifications for, e.g., query analysis and term weighting approaches in information retrieval system design.

Conventions and mutual expectations - understanding sources for web genres. 2010. In: Genres on the Web: Computational Models and Empirical Studies, Springer Verlag, 2010, 8. Chapter in book, part of anthology (Refereed)

Genres can be understood in many different ways. They are often perceived as a primarily sociological construction, or, alternatively, as a stylostatistically observable objective characteristic of texts. The latter view is more common in the research field of information and language technology. These two views can be quite compatible and can inform each other; this present investigation discusses knowledge sources for studying genre variation and change by observing reader and author behaviour rather than performing analyses on the information objects themselves.

From boxes and arrows to conversation and negotiation: or how research should be amusing, awful, and artificial. 2006. In: ICT for People: 40 Years of Academic Development in Stockholm, Stockholm, Sweden: Department of Computer and Systems Sciences; Stockholm University and the Royal Institute of Technology, 2006, 2, p. 6. Chapter in book, part of anthology (Refereed)

The story of how a graduate student went from formalism to data, a brief tale of how engineering without tradition can lead thought in the right direction, and a mild caution of how intellectual skepticism is worth little without a corresponding dose of intellectual enthusiasm.

Geoblockering lätt att kringgå [Geoblocking is easy to circumvent]. 2015. In: Medieormen - Journalistisk utveckling och mediedebatt. Article in journal (Other (popular science, discussion, etc.))

On Wednesday, May 6, the European Commission is expected to present an action plan for the digital market, and one issue that has come up for debate is geoblocking, which Commission Vice-President Andrus Ansip has clearly taken a stand against. Jussi Karlgren, professor of language technology at KTH, explains what the debate is about.

Hur kan informationsystem fås att söka på flera språk? [How can information systems be made to search in several languages?] 2001. In: Språkbitar, Stockholm, Sweden: Svenska förlaget, 2001, 3, p. 185. Chapter in book, part of anthology (Refereed)

Information Retrieval Systems: Statistics and Linguistics. 2005. In: Legal management of information systems: incorporating law in e-solutions, Lund: Studentlitteratur, 2005, 1, p. 530, pp. 295-336. Chapter in book, part of anthology (Refereed)

Organizing a document collection so that documents can be found easily is difficult, especially if more than one reader is expected to be able to use the collection. This text gives a brief overview of existing automatic methods for indexing and retrieval of text documents and identifies some directions for future research.

Informationsåtkomst på flera språk [Information access in several languages]. 1999. In: Språk i Norden / [ed] Lindgren, Birgitta, Svenska språknämnden, 1999. Chapter in book, part of anthology (Other academic)

Finding information can be tricky. It may be that those seeking information know exactly what they want to retrieve but are not quite clear about where it is; it may also be that the seeker does not really know what is available but has a feeling that some sort of help is to be had, if only the question is put correctly. Over the past millennia people have stored information on external storage media of various kinds: there is more and more information available, but of varying quality, with unclear ownership and uncertain provenance, and it is less and less clear whom the reader can ask for advice in order to find the right material. There are many different techniques for helping people find information. Shelves and properly labelled book spines are a good first step, alphabetical or some other systematic shelf order a further one, and card catalogues, for example subject indexes with handwritten keywords that provide sorting criteria other than those of the shelves, a third. The more kinds of index, the easier it is to find things, and the more laborious to administer and maintain. This is of course where computers come in. Libraries today work with technical aids for catalogue management, and information technology is used for exactly what it is best at: administering large amounts of information and distributing it at very low marginal cost, all of which is usually considered a good thing.

Informationsåtkomst på flera språk [Information access in several languages]. 1999. In: Språk i Norden, Stockholm: Svenska språknämnden, 1999, 4, p. 174. Chapter in book, part of anthology (Refereed)

Finding information can be tricky. It may be that those seeking information know exactly what they want to retrieve but are not quite clear about where it is; it may also be that the seeker does not really know what is available but has a feeling that some sort of help is to be had, if only the question is put correctly. Over the past millennia people have stored information on external storage media of various kinds: there is more and more information available, but of varying quality, with unclear ownership and uncertain provenance, and it is less and less clear whom the reader can ask for advice in order to find the right material. There are many different techniques for helping people find information. Shelves and properly labelled book spines are a good first step, alphabetical or some other systematic shelf order a further one, and card catalogues, for example subject indexes with handwritten keywords that provide sorting criteria other than those of the shelves, a third. The more kinds of index, the easier it is to find things, and the more laborious to administer and maintain. This is of course where computers come in. Libraries today work with technical aids for catalogue management, and information technology is used for exactly what it is best at: administering large amounts of information and distributing it at very low marginal cost, all of which is usually considered a good thing.

Meaningful models for information access systems. 2005. In: Inquiries into Words, Constraints and Contexts: Festschrift in the Honour of Kimmo Koskenniemi on his 60th Birthday, Stanford, California: CSLI Publications, 2005, 1, pp. 241-248. Chapter in book, part of anthology (Refereed)

Mumbling - User-Driven Cooperative Interaction. 1994. Report (Other academic)

This paper suggests a scheme for raising the cooperativeness of natural language interfaces without changing either modality or system linguistic competence, but by heightening the level of interactivity and by aiding the user in maintaining the responsibility for the discourse. In short: hands-off-pragmatics at the computer interface.

New Measures to Investigate Term Typology by Distributional Data. 2013. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22–24, 2013, Oslo University, Norway. NEALT Proceedings Series 16 / [ed] Stephan Oepen, Kristin Hagen, Janne Bondi Johannessen, Linköping: Linköping University Electronic Press, 2013. Conference paper (Refereed)

This report describes a series of exploratory experiments to establish whether terms of different semantic type can be distinguished in useful ways in a semantic space constructed from distributional data. The hypotheses explored in this paper are that some words are more variant in their distribution than others; that the varying semantic character of words will be reflected in their distribution; and that this distributional difference is encoded in current distributional models, but that the information is not accessible through the methods typically used in applying them. This paper proposes some new measures to explore variation that is encoded in distributional models but not usually put to use in understanding the character of the words represented in them. These exploratory findings indicate that some of the proposed measures vary widely across words of various types.

New Text - New Conversations in the Media Landscape. 2006. In: ERCIM News, Vol. 66. Article in journal (Refereed)

New text - that is, new forms of textual communication - such as blogs, instant messages, and Wikis contrast with traditional textual genres in some respects and remain true to them in others. This calls for new research methodologies and provides new challenges for text research.

Newsgroup Clustering Based On User Behavior - A Recommendation Algebra. 1994. Report (Other academic)

User models are a tool for guiding system behavior in interactive systems, and their utility and properties, desirable and undesirable, have been investigated in this context. There are several ways of utilizing information about the user that have NOT been implemented, however. In this paper a scheme for users to peek at other users' user models to extract information is proposed, in an information retrieval or information filtering domain. The material used for the study is a set of .newsrc files.

Non-topical factors in information access. 1999. In: Proceedings of the 4th World Conference on the WWW and Internet, 1999, 1. Conference paper (Refereed)

Research in information retrieval has traditionally concentrated on making assumptions about the content of documents based on very shallow semantic analysis through word occurrence statistics of various kinds. But texts are more than bags of words, and the semantic analysis that information retrieval systems typically use is overly simple. There is ample reason to try to broaden the view of what text is and why. Better content analysis alone will not be enough. Texts are more than their meaning. Texts have structure, they have context, they are written in a style conformant or discordant to a genre they are to be understood in, they may be carefully written or hastily thrown together, they are written by various types of agent for various reasons. Besides information to be found in the text or from the author, texts are used by readers of various backgrounds, for various reasons, and with varying degrees of satisfaction. This paper outlines a framework within which to find more knowledge from texts than an approximation of their topic, and gives examples of how to use this knowledge to design useful tools for information access.

Open Research Questions for Linguistics in Information Access. 2000. In: Text- and Speech-Triggered Information Access, 8th ELSNET Summer School, Chios Island, Greece, July 15-30, 2000, Revised Lectures, Springer, 2000, 1, Vol. 2705, pp. 182-191. Chapter in book, part of anthology (Refereed)

Information access systems based on standard mechanisms can be improved. Not because of any obvious drawbacks in the mechanisms themselves: they provide consistent and stable results, with variation from system to system surprisingly small; the reason to continue work is that the stable results are not only consistent but consistently mediocre. This paper claims linguistic research has an important role to play in the future of information access.

