Tweet Collect: short text message collection using automatic query expansion and classification
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2013 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The growing number of Twitter users creates large amounts of messages that contain valuable information for market research. These messages, called tweets, are short, contain Twitter-specific writing styles, and are often idiosyncratic. This gives rise to a vocabulary mismatch between the keywords typically chosen for tweet collection and the words users actually use to describe television shows.

A method is presented that uses a new form of query expansion, which generates pairs of search terms and takes the language usage of Twitter into consideration, to access user data that would otherwise be missed. Supervised classification, without manually annotated data, is used to maintain precision by comparing collected tweets with external sources. The method is implemented in Java as the Tweet Collect system, which uses many processing steps to improve performance.
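
To make the idea of pair-generating query expansion concrete, the following is a minimal sketch, not the thesis's Java implementation. It assumes a simple co-occurrence heuristic: terms that appear most often alongside a seed keyword are paired with it to form new search queries. The function names, the stopword list, and the sample tweets (about a hypothetical show tagged `#showx`) are all illustrative assumptions; the abstract does not specify how Tweet Collect ranks or selects its term pairs.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the thesis.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "in", "on", "rt", "with", "was"}

# Hypothetical sample tweets about a made-up show tagged #showx.
SAMPLE_TWEETS = [
    "Watching #showx tonight with alice",
    "alice was great in #showx",
    "#showx finale: alice and bob",
    "bob stole the scene #showx",
    "nothing on tv tonight",
]


def tokenize(tweet):
    """Lowercase a tweet and split it into word, hashtag, and mention tokens."""
    return re.findall(r"[#@]?\w+", tweet.lower())


def expand_query(seed, tweets, top_n=3):
    """Return (seed, term) pairs for the terms that most often co-occur with seed."""
    counts = Counter()
    for tweet in tweets:
        tokens = set(tokenize(tweet))  # count each term at most once per tweet
        if seed in tokens:
            counts.update(t for t in tokens if t != seed and t not in STOPWORDS)
    return [(seed, term) for term, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    for pair in expand_query("#showx", SAMPLE_TWEETS, top_n=2):
        print(pair)
```

Searching on pairs rather than on single expansion terms is one plausible way to limit the loss of precision that expansion usually causes, since a pair is less ambiguous than either term alone.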

The evaluation was carried out by collecting tweets about five different television shows while they aired. It indicates, on average, a 66.5% increase in the number of relevant tweets compared with using the title of the show as the search terms, at 68.0% total precision. With classification, the average increase in the number of tweets is slightly lower, at 55.2%, while total precision rises considerably, to 82.0%.
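
The two reported measures can be illustrated with a short calculation. This sketch assumes the standard definitions (relative increase over the title-only baseline, and precision as the share of collected tweets that are relevant); the abstract does not spell out the exact formulas, and the counts below are hypothetical, not data from the thesis.

```python
def relative_increase(baseline_relevant, expanded_relevant):
    """Fractional gain in relevant tweets over the title-only baseline."""
    return (expanded_relevant - baseline_relevant) / baseline_relevant


def precision(relevant, collected):
    """Share of collected tweets that are relevant."""
    return relevant / collected


if __name__ == "__main__":
    # Hypothetical counts for a single show, chosen only for illustration.
    baseline_relevant = 100   # relevant tweets found with the show title alone
    expanded_collected = 250  # all tweets collected with expanded queries
    expanded_relevant = 170   # of those, the ones judged relevant

    print(f"increase:  {relative_increase(baseline_relevant, expanded_relevant):.1%}")
    print(f"precision: {precision(expanded_relevant, expanded_collected):.1%}")
```

Under these definitions, a classifier that discards some relevant tweets along with the irrelevant ones lowers the increase while raising precision, which matches the direction of the trade-off reported above.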

The utility of an automatic topic-tracking system that can find additional keywords is demonstrated. Implementation considerations and possible improvements that could lead to better performance are discussed.

Place, publisher, year, edition, pages
UPTEC IT, ISSN 1401-5749 ; 13 003
National Category
Engineering and Technology
URN: urn:nbn:se:uu:diva-194961
OAI: diva2:606687
Educational program
Master of Science Programme in Information Technology Engineering
Available from: 2013-02-20. Created: 2013-02-20. Last updated: 2013-02-20. Bibliographically approved.

Open Access in DiVA

fulltext (1238 kB), 3042 downloads
File information
File name: FULLTEXT01.pdf
File size: 1238 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

Total: 3042 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 560 hits