Skräppost eller skinka?: En jämförande studie av övervakade maskininlärningsalgoritmer för spam och ham e-mailklassifikation
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media. Gotland University, Department of Software Engineering.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media. Gotland University, Department of Software Engineering.
2019 (Swedish). Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Alternative title
Spam or ham?: A comparative study of supervised machine learning algorithms for spam and ham e-mail classification (English)
Abstract [en]

Spam e-mail is a growing problem for today's businesses, and counteracting it costs time and resources. Previous research has produced techniques and tools aimed at handling the growing number of incoming spam e-mails. However, research comparing how well different algorithms classify e-mail messages needs updating, since both filtering tools and spam e-mails have become more advanced. This study evaluates three supervised machine learning algorithms (naive Bayes, support vector machine and decision tree) on their ability to correctly classify e-mails as legitimate (ham) or spam. The algorithms are tested in an experiment on the Enron spam dataset, and their performance is then compared. In the experiment, the support vector machine correctly classified the largest share of the data points. Even so, the other algorithms can be useful from a business perspective, depending on the task and context.
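To make the classification task concrete, the following is a minimal sketch of one of the three compared algorithms, a multinomial naive Bayes spam filter with Laplace (add-one) smoothing, written with the Python standard library only. It is not the thesis's implementation: the tokenizer, the class name `NaiveBayesSpamFilter`, and the six-message training corpus are hypothetical stand-ins, not taken from the Enron spam dataset. Accuracy is computed as the fraction of correctly classified messages, the same measure the abstract uses to compare the algorithms.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for the prior P(c)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every token seen in training, across both classes.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        # Unnormalized log posterior: log P(c) + sum of log P(token | c).
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.total_words[c] + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


def accuracy(model, texts, labels):
    # Share of messages whose predicted label matches the true label.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy corpus (NOT the Enron data).
train_texts = [
    "win a free prize now",
    "cheap meds free offer",
    "claim your free money",
    "meeting agenda for monday",
    "please review the attached report",
    "lunch with the project team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_texts = ["free prize offer", "project meeting on monday"]
test_labels = ["spam", "ham"]

model = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(accuracy(model, test_texts, test_labels))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. A support vector machine or decision tree would replace `predict` with a different decision rule over the same kind of token features, and the same `accuracy` function could then be used to compare the models.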

Place, publisher, year, edition, pages
2019, p. 27
Keywords (translated from Swedish)
Machine learning, Spam e-mail, Text classification, Spam e-mail classification
National Category
Other Engineering and Technologies not elsewhere specified
Identifiers
URN: urn:nbn:se:uu:diva-389384
OAI: oai:DiVA.org:uu-389384
DiVA, id: diva2:1336831
Educational program
Bachelor programme in Information Systems
Available from: 2019-09-04. Created: 2019-07-10. Last updated: 2019-09-04. Bibliographically approved.
