Filtrering av e-post: Binär klassifikation med naiv Bayesiansk teknik
University of Borås, Swedish School of Library and Information Science.
2007 (Swedish)
Independent thesis Advanced level (degree of Master (One Year)), Student thesis
Alternative title
Filtering e-mail: Binary classification with naïve Bayesian technique (English)
Abstract [en]

In this thesis we compare how different strategies for choosing attribute values affect junk mail filtering. We used two variants of a naïve Bayesian junk mail filter. The first variant classified an e-mail by comparing it to a feature vector containing all attribute values found in junk mails in the part of the e-mail collection used for training the filter. The second variant compared an e-mail to a feature vector consisting of the attributes found in ten or more junk mails in the training part of the collection. The e-mail collection consisted of 300 e-mails, of which 210 were junk mails and 90 were legitimate e-mails. We measured the results using SP, SR and F1, and cross-validated the two strategies to be able to compare them. The results showed that the first strategy achieved higher average F1 values than the second. Despite this, we believe that the second strategy is the better one: instead of comparing the e-mail to a feature vector containing all attribute values found in junk mails, the filter will give better results if it compares the e-mail to a feature vector that contains a limited number of attribute values.
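The two attribute-selection strategies and the evaluation measures described in the abstract can be illustrated with a minimal sketch. It is not the thesis authors' implementation. It assumes that each e-mail is already tokenized into a list of attribute values, and that SP and SR denote spam precision and spam recall, as is common in the junk mail filtering literature. The function names `select_attributes` and `spam_scores` and the toy data are hypothetical.

```python
from collections import Counter


def select_attributes(junk_mails, min_mails=1):
    """Return the attributes that occur in at least `min_mails` junk mails.

    min_mails=1 keeps every attribute seen in junk mail (first strategy);
    min_mails=10 mirrors the "ten or more junk mails" cut-off (second strategy).
    """
    mail_counts = Counter()
    for mail in junk_mails:
        mail_counts.update(set(mail))  # count each attribute once per mail
    return {attr for attr, n in mail_counts.items() if n >= min_mails}


def spam_scores(predicted, actual):
    """Return (SP, SR, F1) for two boolean lists where True means junk."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # junk correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)  # legitimate mail flagged
    fn = sum(1 for p, a in pairs if a and not p)  # junk that slipped through
    sp = tp / (tp + fp) if tp + fp else 0.0
    sr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * sp * sr / (sp + sr) if sp + sr else 0.0
    return sp, sr, f1


# Toy data, invented for illustration only.
toy_junk = [["free", "offer"], ["free", "win"], ["free", "offer", "now"]]
all_attributes = select_attributes(toy_junk)            # first strategy
frequent_attributes = select_attributes(toy_junk, 2)    # thresholded strategy
sp, sr, f1 = spam_scores([True, True, False, True], [True, False, True, True])
```

In a cross-validated comparison, `spam_scores` would be computed once per fold for each strategy and the F1 values averaged.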

Place, publisher, year, edition, pages
University of Borås/Swedish School of Library and Information Science (SSLIS), 2007.
Series
Magisteruppsats i biblioteks- och informationsvetenskap vid institutionen Biblioteks- och informationsvetenskap, ISSN 1654-0247 ; 2007:132
Keywords [en]
automatic classification, Bayesian filter
Keywords [sv]
skräppost, filtrering
National Category
Social Sciences
Identifiers
URN: urn:nbn:se:hb:diva-18675
Local ID: 2320/2902
OAI: oai:DiVA.org:hb-18675
DiVA, id: diva2:1310609
Note
Thesis level: D
Available from: 2019-04-30 Created: 2019-04-30

Open Access in DiVA

fulltext (270 kB), 22 downloads
File information
File name: FULLTEXT01.pdf
File size: 270 kB
Checksum (SHA-512): 7a100161a71f58c4c7c29904391a610fddfa96901f213284a032bcada198e41d0ca1ff24857bf260f32e235273e3a5760f27f6a32bc646be4927f5c9b8626cf4
Type: fulltext
Mimetype: application/pdf

