Graph-Based Representations for Textual Case-Based Reasoning
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2011 (English). Master's thesis, student thesis
Abstract [en]

This thesis presents a graph-based approach to the problem of text representation. The work is motivated by the need for better representations for use in textual Case-Based Reasoning (CBR). In CBR, new problems are solved by reasoning from similar past problem cases. When the cases are represented as free text, measuring the similarity between a new problem and previously solved problems becomes a challenging task: the case documents must be re-represented before they can be compared or matched. Textual CBR (TCBR) addresses this issue. We investigate automatic re-representation of textual cases, in particular measuring the salience of features (entities in the text) towards this end. We use the classical vector space model from Information Retrieval (IR), but investigate whether graph representations and salience inference over graphs can improve on Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting, the bag-of-words approaches that predominate in IR. Our particular focus is whether, and possibly how, co-occurrence and syntactic dependency relations between terms affect feature weighting. We measure salience through the notion of graph centrality. We experiment with two application tasks: classification and case retrieval. Although classification is not a typical TCBR task, datasets for it are easier to find, and the centrality measures we study are not specific to TCBR; the classification experiments are therefore relevant to case retrieval, which is our ultimate target. We test various centrality metrics described in the literature, distinguish between local and global weighting measures, and compare them on both application tasks. In general, our graph-based salience inference methods perform better than TF and TF-IDF.
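The abstract contrasts bag-of-words weights (TF, TF-IDF) with salience weights derived from graph centrality over term co-occurrence graphs, used inside a vector space model. The sketch below illustrates that contrast under assumptions of our own, not the thesis's implementation: whitespace tokenisation with no stop-word removal, a co-occurrence graph that links adjacent tokens only (window of 2), normalised degree centrality as the centrality metric, and cosine similarity for case retrieval. The degree weight is computed per document (a local measure), while the idf factor in TF-IDF is corpus-wide (a global component). All function names and the toy cases are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: bag-of-words weights vs. a graph-centrality weight.
# Tokenisation is plain whitespace splitting, with no stop-word removal.


def tf_weights(tokens):
    """Term Frequency: raw count of each term in one document."""
    return dict(Counter(tokens))


def tfidf_weights(tokens, corpus):
    """TF-IDF using idf = log(N / df), one of several common variants.
    Assumes `tokens` is itself a member of `corpus`, so every df is >= 1."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf}


def cooccurrence_graph(tokens, window=2):
    """Undirected co-occurrence graph as an adjacency map: an edge joins two
    distinct terms that occur within `window` consecutive positions
    (window=2 links adjacent tokens only)."""
    neighbours = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != term:
                neighbours[term].add(other)
                neighbours[other].add(term)
    return neighbours


def degree_centrality_weights(tokens, window=2):
    """Local weight: degree of each term node divided by (|V| - 1)."""
    graph = cooccurrence_graph(tokens, window)
    nodes = set(tokens)
    denom = max(len(nodes) - 1, 1)
    # .get avoids inserting isolated terms into the defaultdict
    return {t: len(graph.get(t, ())) / denom for t in nodes}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_cases(query, cases, weigh=degree_centrality_weights):
    """Indices of `cases`, most similar to `query` first."""
    q = weigh(query)
    scores = [cosine(q, weigh(case)) for case in cases]
    return sorted(range(len(cases)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy case base and query.
cases = [
    "battery drained after cold night".split(),
    "engine overheats on long trip".split(),
    "engine fails to start after cold night".split(),
]
query = "engine fails on cold start".split()

print(rank_cases(query, cases))                    # → [2, 1, 0]
print(rank_cases(query, cases, weigh=tf_weights))  # → [2, 1, 0]
```

Degree centrality is used here only because it is the simplest centrality to state; the thesis tests various metrics from the literature, and the window size and normalisation above are arbitrary choices. To rank with TF-IDF instead, the query must be part of the corpus, e.g. `weigh=lambda d: tfidf_weights(d, cases + [query])`.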

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap (Department of Computer and Information Science), 2011, 143 p.
Keywords [no]
ntnudaim:5757, MTDT datateknikk (Computer Science), Intelligente systemer (Intelligent Systems)
URN: urn:nbn:no:ntnu:diva-13575
Local ID: ntnudaim:5757
OAI: diva2:440510
Available from: 2011-09-13 Created: 2011-09-13