Using Information Extraction and Text Classification in an Effort to Support Systematic Literature Reviews
Systematic literature reviews are an important tool in Evidence-based
Software Engineering, but they require a large amount of effort and time from the
researchers. Data extraction is an important step in these reviews, but current
practice requires the researchers to extract large amounts of data manually.
This thesis investigates the possibility of developing a prototype for
automatic extraction, so as to reduce the time spent on manually extracting this
data. Based on a review of related research and on experiments with different
features and machine learning models, two models were implemented in the
prototype: Conditional Random Fields for information extraction and Maximum
Entropy for text classification. The models achieved average F1 scores of
67.02% and 73.82%, respectively.
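The F1 score is the harmonic mean of precision and recall. The abstract does not say how the per-label scores were averaged. The minimal sketch below assumes macro-averaging, meaning the unweighted mean of per-label F1 computed over token or document labels. It also shows the common practice of excluding the "outside" label `O` when scoring extraction. The function names and the label names are hypothetical and for illustration only. They are not taken from the thesis.

```python
from collections import Counter


def f1_per_label(gold, predicted):
    """Per-label F1 from two equal-length sequences of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label was wrong
            fn[g] += 1  # the gold label was missed
    scores = {}
    for label in set(gold) | set(predicted):
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted, exclude=()):
    """Unweighted mean of per-label F1, skipping any labels listed in `exclude`."""
    scores = [s for label, s in f1_per_label(gold, predicted).items()
              if label not in exclude]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical token labels; "O" marks tokens outside any extracted field.
gold = ["METHOD", "RESULT", "RESULT", "O"]
pred = ["METHOD", "RESULT", "O", "O"]
print(round(macro_f1(gold, pred), 4))                 # → 0.7778
print(round(macro_f1(gold, pred, exclude={"O"}), 4))  # → 0.8333
```

Micro-averaging, which pools true positives, false positives and false negatives across labels before computing F1, can give a different number on the same predictions. A reported "average F1" should therefore be read with the averaging scheme in mind.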
These F1 scores can be considered good, and they show that it is possible to
automate the data extraction process by annotating a small part of the dataset
and training machine learning models to perform the extraction.
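A Maximum Entropy classifier is the same model as multinomial logistic regression. It scores each label with a weighted sum of features and normalizes the scores with a softmax. The minimal pure-Python sketch below trains such a classifier on a handful of annotated sentences, in the spirit of the "annotate a small part of the dataset, then train" workflow described above. The feature function `features`, the class `MaxEntClassifier`, the labels and the training sentences are all hypothetical. They are not the thesis's feature set, data, or implementation. Plain batch gradient ascent is used here for brevity, which is an assumption and not necessarily the optimizer the thesis used.

```python
import math
from collections import defaultdict


def features(text):
    """Hypothetical feature function: a bias feature plus a lowercased bag of words."""
    return ["bias"] + ["w=" + tok for tok in text.lower().split()]


class MaxEntClassifier:
    """Multinomial logistic regression (maximum entropy) over sparse features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # keyed by (feature, label); unseen pairs are 0

    def probabilities(self, text):
        feats = features(text)
        scores = {y: sum(self.weights[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exp = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}

    def predict(self, text):
        probs = self.probabilities(text)
        return max(probs, key=probs.get)

    def fit(self, examples, epochs=100, lr=0.5, l2=0.01):
        """Batch gradient ascent on the L2-regularised conditional log-likelihood."""
        for _ in range(epochs):
            grad = defaultdict(float)
            for text, gold in examples:
                probs = self.probabilities(text)
                for f in features(text):
                    for y in self.labels:
                        # observed feature count minus the count expected by the model
                        grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
            # L2 penalty is applied to every (feature, label) pair seen in training
            for key, g in grad.items():
                self.weights[key] += lr * (g / len(examples) - l2 * self.weights[key])
        return self


# Hypothetical annotated sentences standing in for a small labelled subset.
train = [
    ("we conducted a controlled experiment with students", "experiment"),
    ("a survey of practitioners was distributed online", "survey"),
    ("the experiment compared two testing techniques", "experiment"),
    ("survey responses were collected from companies", "survey"),
]
clf = MaxEntClassifier(["experiment", "survey"]).fit(train)
print(clf.predict("an experiment with students"))  # → experiment
print(clf.predict("an online survey"))             # → survey
```

A linear-chain Conditional Random Field, the model used here for information extraction, extends the same log-linear idea from single labels to whole label sequences. It adds features that link adjacent labels and normalizes over all possible sequences.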
Place, publisher, year, edition, pages: Institutt for datateknikk og informasjonsvitenskap (Department of Computer and Information Science), 2012, 90 p.
ntnudaim:6040, MIT informatikk (Informatics), Informasjonsforvaltning (Information Management)
Identifiers:
URN: urn:nbn:no:ntnu:diva-18415
Local ID: ntnudaim:6040
OAI: oai:DiVA.org:ntnu-18415
DiVA: diva2:565909
Ramampiaro, Herindrasana, Førsteamanuensis (Associate Professor)