Using Information Extraction and Text Classification in an Effort to Support Systematic Literature Reviews
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2012 (English), Master's thesis (student thesis)
Abstract [en]

Systematic literature reviews are an important tool in Evidence-based Software Engineering, but they require a large amount of time and effort from researchers. Data extraction is an important step in these reviews, yet current practice requires researchers to extract large amounts of data manually. This thesis investigates the possibility of developing a prototype for automatic extraction in order to reduce the time spent on this manual work. After reviewing related research and experimenting with different features and machine learning models, two models were implemented in the prototype: Conditional Random Fields for information extraction and Maximum Entropy for text classification. The models achieved average F1 scores of 67.02% and 73.82%, respectively. These results are characterized as good, and they show that it is possible to automate the data extraction process by annotating a small part of the dataset and training machine learning models to perform the extraction.

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2012. 90 p.
Keyword [no]
ntnudaim:6040, MIT informatikk, Informasjonsforvaltning
URN: urn:nbn:no:ntnu:diva-18415 Local ID: ntnudaim:6040 OAI: diva2:565909
Available from: 2012-11-08 Created: 2012-11-08

Open Access in DiVA

fulltext (1893 kB), 944 downloads
File information
File name: FULLTEXT01.pdf File size: 1893 kB Checksum: SHA-512
Type: fulltext Mimetype: application/pdf
cover (184 kB), 14 downloads
File information
File name: COVER01.pdf File size: 184 kB Checksum: SHA-512
Type: cover Mimetype: application/pdf
attachment (20991 kB), 12 downloads
File information
File name: ATTACHMENT01.zip File size: 20991 kB Checksum: SHA-512
Type: attachment Mimetype: application/zip
