Mining Online Text Data for Sentiment and News Impact Analysis
2013 (English) Doctoral thesis, comprehensive summary (Other academic)
With the continuous growth of the Internet, an ever-increasing amount of information becomes available on the World Wide Web (WWW). Information on the WWW has grown so explosively that search engines using traditional keyword-based search strategies can hardly meet people's needs to retrieve knowledge from massive online text data. The motivation of this thesis comes from the great demand for discovering implicit knowledge and rich semantics in online documents.
This thesis focuses on analyzing online business news, a representative of objective information, and online customer reviews, a representative of subjective information. For online business news, a topic-driven impact analysis model is proposed that quantifies the impact of the topic of a news article. With the proposed topic-driven impact analysis model, an explorative visual analysis system called ImpactWheel is developed to help users better navigate and understand topic-specific impact relationships between companies by mining the rich information source of online business news.
For online customer reviews, both overall document sentiment classification and attribute-based sentiment analysis are performed. For overall document sentiment classification, taking advantage of the high frequency of Co-occurring Term (CoT) patterns in customer reviews, a frequency-based algorithm is proposed to generate complex features that benefit sentiment classifiers. To search for effective features and ignore useless ones produced by the frequency-based complex feature generation algorithm, an Effective Feature Search (EFS) framework is proposed, which makes a novel connection between feature candidate generation and a Stochastic Local Search process. For attribute-based sentiment analysis, the concept of a Sentiment Ontology Tree (SOT) is proposed, which organizes a product's domain-specific knowledge as well as sentiments in a tree-like ontology structure. With the concept of SOT, a Hierarchical Learning via Sentiment Ontology Tree (HL-SOT) approach is proposed to solve the sentiment analysis tasks in a hierarchical classification process. To enhance the classification performance and computational efficiency of the HL-SOT approach, which encodes texts using a globally unified index term space, a Localized Feature Selection (LFS) framework is developed that generates a customized index term space for each node of the SOT. Since the HL-SOT approach is estimated by an RLS estimator, which is not competent enough to find the maximum class separation, and since statistical linear classifiers have proven fallible in classifying sentiment, a more pragmatic Hybrid Hierarchical Classification Process (HHCP) is proposed. The HHCP approach employs a linear classifier that is capable of maximizing the class separation while minimizing the within-class variance for attribute detection and turns to a rule-based solution for sentiment orientation.
Place, publisher, year, edition, pages
Trondheim, Norway: Norges teknisk-naturvitenskapelige universitet, 2013. 200 p.
Doctoral Theses at NTNU, ISSN 1503-8181 ; 2013:256
Information and communication science
Identifiers
URN: urn:nbn:no:ntnu:diva-22864
ISBN: 978-82-471-4636-1
ISBN: 978-82-471-4637-8
OAI: oai:DiVA.org:ntnu-22864
DiVA: diva2:653857
2013-10-03, 13:15 (English)
Gulla, Jon Atle
List of papers