Text to features for Swedish text
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2019 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

In text mining, texts are usually transformed into numerical vectors, also called feature vectors, before they are given to a machine learning algorithm for text classification. In this project, a set of features for classifying tweets in Swedish was created. The following classification tasks were selected: gender, age and political party prediction, sentiment analysis, and authorship attribution, which is the task of determining whether a text was written by a particular author. Relevant previous studies were reviewed, and a suitable subset of the features used in those studies was chosen. A tool was developed that preprocesses the tweets and calculates, for each tweet, values for the features in the feature set. Experiments were run on a data set consisting of tweets written by Swedish politicians. The output of the tool was given to a machine learning algorithm that created classification models. While the first four classification tasks were unsuccessful, some of the authorship attribution models achieved an F-score between 80 and 90%. For the failed classification tasks, the features need to be tested on a different data set, or new features have to be created.
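As an illustration of the text-to-features step and of the F-score mentioned in the abstract, the following is a minimal sketch in Python. It is not the tool developed in the thesis: the feature names, the regular expressions, and the helper functions `tweet_features` and `f_score` are assumptions chosen for illustration, and the actual feature set is described only in the full text.

```python
import re
from typing import Dict, Sequence

# Hypothetical patterns for tweet-specific tokens (assumptions, not from the thesis).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Runs of Unicode letters, so Swedish characters such as å, ä and ö are kept.
WORD_RE = re.compile(r"[^\W\d_]+")


def tweet_features(tweet: str) -> Dict[str, float]:
    """Map one tweet to a small, illustrative feature vector."""
    n_urls = len(URL_RE.findall(tweet))
    n_mentions = len(MENTION_RE.findall(tweet))
    n_hashtags = len(HASHTAG_RE.findall(tweet))

    # Preprocessing: drop URLs, mentions and hashtags before counting words.
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    words = WORD_RE.findall(text.lower())

    n_words = len(words)
    avg_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    return {
        "n_urls": float(n_urls),
        "n_mentions": float(n_mentions),
        "n_hashtags": float(n_hashtags),
        "n_words": float(n_words),
        "avg_word_len": avg_len,
        "n_exclamations": float(tweet.count("!")),
    }


def f_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    example = "Vi behöver fler lärare! #svpol @exempel https://example.org"
    print(tweet_features(example))
    print(f_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In an authorship attribution setting, the positive class would be "written by the author in question", and a vector like the one above would be computed per tweet before being passed to a classifier.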

Place, publisher, year, edition, pages
2019. p. 42
Series
IT ; 19038
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-396578
OAI: oai:DiVA.org:uu-396578
DiVA, id: diva2:1368316
Educational program
Bachelor Programme in Computer Science
Available from: 2019-11-06. Created: 2019-11-06. Last updated: 2019-11-06. Bibliographically approved.

Open Access in DiVA

fulltext (1279 kB), 2 downloads
File information
File name: FULLTEXT01.pdf
File size: 1279 kB
Checksum SHA-512: 48bff953d12090dcfc4fd7eef9b1ccbb604fb01b6c1fe6a2844969a7774e4290b1655b2d93e5a1d0486117cfac4b46bc797aea1e847e50dd8acc5396c61042cf
Type: fulltext
Mimetype: application/pdf
