Enhancing Relevant Region Classifying
Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
In this thesis we present a new way of extracting relevant data from texts. We build on the method presented by Patwardhan and Riloff (2007), with improvements of our own.
Our approach modifies the input to the support vector machine in order to construct a self-trained relevant sentence classifier. This classifier is used to identify relevant sentences in the MUC-4 terrorism corpus.
We modify the input by removing stopwords, reducing each word to its stem, and keeping only words that occur at least three times in the corpus. We also change how each word is weighted, using TF × IDF as the weighting function.
By using the relevant sentence classifier together with domain-relevant extraction patterns, we achieved higher performance on the MUC-4 terrorism corpus than the original model.
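The input preparation described in the abstract (stopword removal, stemming, a minimum corpus frequency of three, and TF × IDF weighting) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the tiny stopword list, the crude `toy_stem` suffix stripper (a stand-in for a real stemmer), the function names, and the particular TF × IDF variant (raw term count times log(N / document frequency), with each sentence treated as a document).

```python
import math
from collections import Counter

# Illustrative stopword list (assumption; the thesis does not list its stopwords).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "was", "were", "by"}


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (hypothetical)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(sentence):
    """Lowercase, strip surrounding punctuation, drop stopwords, then stem."""
    words = [w.strip(".,;:!?\"'()").lower() for w in sentence.split()]
    return [toy_stem(w) for w in words if w and w not in STOPWORDS]


def tfidf_vectors(sentences, min_count=3):
    """Return one sparse {term: weight} dict per sentence."""
    docs = [tokenize(s) for s in sentences]

    # Keep only terms that occur at least `min_count` times in the whole corpus.
    corpus_counts = Counter(t for d in docs for t in d)
    vocab = {t for t, c in corpus_counts.items() if c >= min_count}

    # Document frequency: number of sentences containing each kept term.
    doc_freq = Counter(t for d in docs for t in set(d) if t in vocab)
    n = len(docs)

    vectors = []
    for d in docs:
        tf = Counter(t for t in d if t in vocab)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors.append({t: c * math.log(n / doc_freq[t]) for t, c in tf.items()})
    return vectors
```

With this variant, a term that appears in every sentence receives weight zero; the resulting sparse vectors would then be passed to the support vector machine as features.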
Place, publisher, year, edition, pages
2011. 54 p.
Natural Language Processing, Information Extraction, Support Vector Machine, Pattern Extraction
Identifiers
URN: urn:nbn:se:kth:diva-32661
OAI: oai:DiVA.org:kth-32661
DiVA: diva2:411324
Ayani, Rassul, Professor