Asbjørn Ottesen Steinskog
Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
Jonas Foyn Therkelsen
Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
Björn Gambäck
Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Linköping Electronic Conference Proceedings 57:10, p. 77-86
NEALT Proceedings Series 29:10, p. 77-86
Published: 2017-05-08
ISBN: 978-91-7685-601-7
ISSN: 1650-3686 (print), 1650-3740 (online)
Conventional topic modeling schemes, such as Latent Dirichlet Allocation, are known to perform inadequately when applied to tweets, due to the sparsity of short documents. To alleviate this problem, we apply several pooling techniques that aggregate similar tweets into individual documents, and specifically study the aggregation of tweets sharing authors or hashtags. The results show that such aggregation significantly increases topic coherence.
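
As an illustration of the pooling idea described in the abstract, the following is a minimal Python sketch that merges tweets sharing an author or a hashtag into single pseudo-documents before any topic model is trained. It is not the authors' implementation. The function names, the (author, text) input format, the hashtag pattern, and the handling of tweets with several or no hashtags are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_author(tweets):
    """Join all tweets written by the same author into one pseudo-document.

    `tweets` is an iterable of (author, text) pairs (an assumed format).
    """
    pools = defaultdict(list)
    for author, text in tweets:
        pools[author].append(text)
    return {author: " ".join(texts) for author, texts in pools.items()}


def pool_by_hashtag(tweets):
    """Join all tweets that share a hashtag into one pseudo-document.

    Assumptions of this sketch: hashtags are compared case-insensitively,
    a tweet with several hashtags is added to every matching pool, and a
    tweet without hashtags stays a separate document under a synthetic key.
    """
    pools = defaultdict(list)
    for index, (_author, text) in enumerate(tweets):
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if not tags:
            pools[f"_untagged_{index}"].append(text)
        for tag in tags:
            pools[tag].append(text)
    return {key: " ".join(texts) for key, texts in pools.items()}


# Hypothetical sample data, for illustration only.
sample_tweets = [
    ("alice", "Training LDA on tweets #nlp"),
    ("bob", "Short texts are sparse #NLP #lda"),
    ("alice", "Coffee first"),
]

author_docs = pool_by_author(sample_tweets)
hashtag_docs = pool_by_hashtag(sample_tweets)
```

The pooled strings in `author_docs` or `hashtag_docs` would then be tokenized and passed to a topic model in place of the individual tweets, so that each training document is longer and less sparse.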