Automatic web page categorization using text classification methods
KTH, School of Computer Science and Communication (CSC).
2013 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Automatisk kategorisering av webbsidor med textklassificeringsmetoder (Swedish)
Abstract [en]

Over the last few years, the Web has virtually exploded with an enormous number of web pages covering different types of content. At the Web's current size, manually indexing and categorizing all of its content has become impractical. Evidently, there is a need for automatic web page categorization.

This study explores the use of automatic text classification methods for categorizing web pages. The results are comparable to those reported in other papers on automatic web page categorization, but not as good as results reported for pure text classification.

Abstract [sv]

Under de senaste åren så har Webben exploderat i storlek, med miljontals webbsidor av vitt skilda innehåll. Den enorma storleken av Webben gör att det blir ohanterligt att manuellt indexera och kategorisera allt detta innehåll. Uppenbarligen behövs det automatiska metoder för att kategorisera webbsidor.

Denna studie undersöker hur metoder för automatisk textklassificering kan användas för kategorisering av hemsidor. De uppnådda resultaten i denna rapport är jämförbara med resultat i annan litteratur på samma område, men når ej upp till resultaten i studier på ren textklassificering.

National Category
Computer Science
URN: urn:nbn:se:kth:diva-142424
OAI: diva2:700316
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2014-03-11
Created: 2014-03-04
Last updated: 2014-03-11
Bibliographically approved

Open Access in DiVA

fulltext (911 kB), 452 downloads
File information
File name: FULLTEXT01.pdf
File size: 911 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

Total: 452 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 134 hits