Conventional topic models, such as Latent Dirichlet Allocation (LDA), are known to perform poorly on tweets because such short documents provide only sparse word co-occurrence information. To alleviate this problem, we apply several pooling techniques that aggregate similar tweets into individual documents, and specifically study the aggregation of tweets sharing authors or hashtags. The results show that this aggregation significantly increases topic coherence.
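The two pooling schemes named above, by author and by hashtag, can be illustrated with a minimal sketch. It is not the implementation used in this work: the tweet record layout (dictionaries with `author` and `text` keys), the function names, the hashtag regular expression, and the choice to drop tweets without hashtags from hashtag pooling are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_author(tweets):
    """Concatenate all tweets by the same author into one pseudo-document."""
    pools = defaultdict(list)
    for tweet in tweets:
        pools[tweet["author"]].append(tweet["text"])
    return {author: " ".join(texts) for author, texts in pools.items()}


def pool_by_hashtag(tweets):
    """Concatenate all tweets sharing a hashtag into one pseudo-document.

    A tweet with several hashtags is added to every matching pool.
    Tweets without hashtags are dropped here (an assumption of this sketch).
    """
    pools = defaultdict(list)
    for tweet in tweets:
        # Lowercase and deduplicate so '#LDA' and '#lda' share one pool.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet["text"])}
        for tag in tags:
            pools[tag].append(tweet["text"])
    return {tag: " ".join(texts) for tag, texts in pools.items()}


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study's corpus.
    tweets = [
        {"author": "alice", "text": "New #LDA tutorial is out"},
        {"author": "bob", "text": "Struggling with #lda on short texts"},
        {"author": "alice", "text": "Topic coherence matters #nlp"},
    ]
    print(pool_by_author(tweets))
    print(pool_by_hashtag(tweets))
```

Each pooled string can then be tokenized and passed to a topic model as a single document, so the model sees longer documents in place of individual tweets.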