Prediction of the Type for Web Page: A Practical Application in Classification
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media, Information Systems.
2013 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

As more and more data are generated in daily life, traditional data analysis methods reach their limits and often fail to discover unknown factors hidden deep in the data, which has led to the adoption of data mining. Classification, the mapping of data into known groups, is one of the primary tasks of data mining. This thesis seeks an automated way to predict whether a web page is evergreen, that is, whether its value holds up as time goes on. When recommending web pages to users according to their interests, it is valuable to know which pages are evergreen. The study clearly belongs to the area of classification. Solving the problem requires knowledge and techniques from machine learning and web text mining. A number of models, or classifiers, are built during the implementation based on different features and optimizations, and they are evaluated by cross-validation. The best solution in this thesis is an ensemble of several simple models, which achieves the highest prediction accuracy. Limitations of the solution are also presented, and future improvements are suggested.
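
The evaluation approach named in the abstract (an ensemble of simple models scored by cross-validation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not taken from the thesis: the three numeric features, the toy data, the use of one-feature threshold classifiers ("stumps") as the simple models, and majority voting as the way to combine them.

```python
from statistics import mean


def fit_stump(rows, labels, feature):
    """Pick the threshold and direction on one feature that best fit the training labels."""
    best = (0.0, -1.0, 1)  # (threshold, training accuracy, polarity)
    for threshold in sorted({row[feature] for row in rows}):
        for polarity in (1, -1):
            preds = [int(polarity * (row[feature] - threshold) >= 0) for row in rows]
            acc = mean(int(p == y) for p, y in zip(preds, labels))
            if acc > best[1]:
                best = (threshold, acc, polarity)
    threshold, _, polarity = best
    return lambda row: int(polarity * (row[feature] - threshold) >= 0)


def fit_ensemble(rows, labels):
    """Train one stump per feature and combine them by majority vote."""
    stumps = [fit_stump(rows, labels, f) for f in range(len(rows[0]))]
    return lambda row: int(sum(s(row) for s in stumps) * 2 > len(stumps))


def cross_validate(rows, labels, k=4):
    """Return the mean held-out accuracy over k folds (every k-th row forms a fold)."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(rows), k))
        train = [(r, y) for i, (r, y) in enumerate(zip(rows, labels)) if i not in test_idx]
        model = fit_ensemble([r for r, _ in train], [y for _, y in train])
        scores.append(mean(int(model(rows[i]) == labels[i]) for i in test_idx))
    return mean(scores)


# Hypothetical features per page: (text share, link share, date-in-URL flag).
rows = [
    (0.9, 0.1, 0.0), (0.8, 0.2, 0.0), (0.7, 0.3, 0.0), (0.9, 0.2, 1.0),
    (0.2, 0.8, 1.0), (0.3, 0.7, 1.0), (0.1, 0.9, 0.0), (0.2, 0.6, 1.0),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = evergreen, 0 = not evergreen

model = fit_ensemble(rows, labels)
score = cross_validate(rows, labels, k=4)
print("cross-validated accuracy:", score)
```

Each fold is held out once while the ensemble is trained on the remaining rows, so the reported accuracy reflects pages the models have not seen. A real study would use far more data and richer text features than this toy set.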

Place, publisher, year, edition, pages
2013. 40 p.
National Category
Social Sciences Information Systems, Social aspects
Identifiers
URN: urn:nbn:se:uu:diva-212963
OAI: oai:DiVA.org:uu-212963
DiVA: diva2:679885
Subject / course
Information Systems
Educational program
Master Programme in Social Sciences
Presentation
2013-12-13, A311, Ekonomikum (plan 3), Kyrkogårdsg. 10, Uppsala, 16:00 (English)
Supervisors
Examiners
Available from: 2013-12-17. Created: 2013-12-17. Last updated: 2013-12-17. Bibliographically approved.

Open Access in DiVA

fulltext (908 kB), 746 downloads
File information
File name: FULLTEXT01.pdf
File size: 908 kB
Checksum (SHA-512): dbd52658f73f1cb949a040e877a91f0e6937bda7479a2ccc602b2768d08ab6c63b1efb2e24a575551e8a6cca0f66eab13ca44dc1c06718c7c845f7a0c3649858
Type: fulltext
Mimetype: application/pdf

Total: 746 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 579 hits