Using Naive Bayes and N-Gram for Document Classification
Användning av Naive Bayes och N-Gram för dokumentklassificering (Swedish title)
KTH, School of Computer Science and Communication (CSC).
2015 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

The purpose of this degree project is to present, evaluate, and improve probabilistic machine-learning methods for supervised text classification. We explore two such methods, the Naive Bayes algorithm and character-level n-gram models, and then compare them. Probabilistic algorithms such as these are among the most effective methods for text classification, but they need a large training set to give accurate results. Because its assumptions are overly simple, Naive Bayes is a poor classifier. To rectify the problem, we try to improve the algorithm by using transformed word and n-gram counts.
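
As an illustration of the approach the abstract describes, the following is a minimal sketch of a multinomial Naive Bayes classifier over character-level n-grams with transformed counts. The abstract does not say which transformations the thesis uses, so the log(1 + count) damping here is an assumption, as are the names `char_ngrams`, `features`, and `NGramNaiveBayes`, the trigram size, and the add-one smoothing.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def features(text, n=3, transform=True):
    """Count n-grams; optionally damp each count with log(1 + count).

    The log transform is an assumed example of a "transformed count",
    not necessarily the one used in the thesis.
    """
    counts = Counter(char_ngrams(text, n))
    if transform:
        return {gram: math.log1p(c) for gram, c in counts.items()}
    return dict(counts)


class NGramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram weights (a sketch)."""

    def __init__(self, n=3, alpha=1.0, transform=True):
        self.n, self.alpha, self.transform = n, alpha, transform

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)  # documents per class
        self.weights = defaultdict(lambda: defaultdict(float))
        self.vocab = set()
        for doc, label in zip(docs, labels):
            for gram, w in features(doc, self.n, self.transform).items():
                self.weights[label][gram] += w
                self.vocab.add(gram)
        return self

    def predict(self, doc):
        feats = features(doc, self.n, self.transform)
        n_docs = sum(self.class_docs.values())
        scores = {}
        for label, doc_count in self.class_docs.items():
            class_weights = self.weights[label]
            # Additive (Laplace) smoothing over the training vocabulary.
            denom = sum(class_weights.values()) + self.alpha * len(self.vocab)
            score = math.log(doc_count / n_docs)  # log prior
            for gram, w in feats.items():
                if gram in self.vocab:  # n-grams unseen in training are skipped
                    p = (class_weights.get(gram, 0.0) + self.alpha) / denom
                    score += w * math.log(p)
            scores[label] = score
        return max(scores, key=scores.get)
```

Passing `transform=False` gives the untransformed baseline, so the two variants can be compared on the same training set.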

Abstract [sv] (English translation)

The purpose of this degree project is to present, evaluate, and improve probabilistic machine-learning methods for supervised text classification. We study Naive Bayes and character-based n-grams, two probabilistic methods, and then compare them. Probabilistic algorithms are among the most effective methods for supervised text classification, but they must be trained on a large amount of data to give accurate results. Because of the assumptions made in the model, Naive Bayes is a poor classifier. To address the problem, we try to improve the algorithms by modifying the word frequencies in the document.

National Category
Computer Science
URN: urn:nbn:se:kth:diva-170757
OAI: diva2:839705
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2016-10-31
Created: 2015-07-03
Last updated: 2016-10-31
Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (744 kB, application/pdf)
