Extracting Keyphrases from Individual News Articles
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2011 (English), Master's thesis (student thesis)
Abstract [en]

Extraction of keyphrases from individual documents is a research area in which one tries to extract a small set of keyphrases that describe the content of a single document. The advantage of this form of extraction is that it retains most of the semantic context from the document. In this thesis we focus on the news article domain and use the structure of a news article to improve the quality of the extracted keyphrases. An existing keyphrase extraction algorithm for individual documents is used as the basis. This algorithm is enhanced by implementing a weighting system based on the structure of news articles. In addition, some other common methods for keyword extraction are applied. The effects of these changes are tested extensively in the evaluation phase. In the evaluation of the implemented prototype we find that the introduction of a weight-based system yields results equal to those of the basic algorithm, and that few improvements can be made. We do, however, find that an automatically generated stopword list based on the corpus improves the results by 1-2%.
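The abstract does not say how the corpus-based stopword list is generated. As an illustration only, the sketch below assumes one common approach: treat a word as a stopword when it occurs in at least a given fraction of the documents in the corpus. The function names, the tokenizer, and the `min_doc_fraction` parameter are hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def generate_stopwords(corpus, min_doc_fraction=0.8):
    """Return the words that occur in at least min_doc_fraction of the documents.

    Assumption: document frequency is the selection criterion. The thesis
    may use a different one.
    """
    doc_freq = Counter()
    for doc in corpus:
        # Count each word once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    threshold = min_doc_fraction * len(corpus)
    return {word for word, df in doc_freq.items() if df >= threshold}


# Toy corpus, invented for this example.
corpus = [
    "The council approved the budget on Monday.",
    "The minister said the budget was on track.",
    "Heavy rain closed the road on Tuesday.",
]
stopwords = generate_stopwords(corpus, min_doc_fraction=1.0)
```

With this toy corpus and a fraction of 1.0, only words present in every document are selected. A lower fraction yields a longer list, and a real corpus would need the threshold tuned so that topical words are not removed.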

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2011. 80 p.
Keyword [no]
ntnudaim:6047, MTDT datateknikk, Program- og informasjonssystemer
URN: urn:nbn:no:ntnu:diva-14225
Local ID: ntnudaim:6047
OAI: diva2:449016
Available from: 2011-10-19 Created: 2011-10-19

Open Access in DiVA

fulltext (1087 kB), 245 downloads
File information
File name: FULLTEXT01.pdf
File size: 1087 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

cover (47 kB), 18 downloads
File information
File name: COVER01.pdf
File size: 47 kB
Checksum: SHA-512
Type: cover
Mimetype: application/pdf

attachment (82578 kB), 13 downloads
File information
File name: ATTACHMENT01.zip
File size: 82578 kB
Checksum: SHA-512
Type: attachment
Mimetype: application/zip
