Tapio Luostarinen
Comtra Oy, Savonlinna, Finland
Oskar Kohonen
Aalto University School of Science, Department of Information and Computer Science, Finland
Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Linköping Electronic Conference Proceedings 85:22, p. 239-251
NEALT Proceedings Series 16:22, p. 239-251
Published: 2013-05-17
ISBN: 978-91-7519-589-6
ISSN: 1650-3686 (print), 1650-3740 (online)
We study content-based recommendation of Finnish news in a system with a very small group of users. We compare three standard methods, Naïve Bayes (NB), K-Nearest Neighbor (kNN) Regression, and Regularized Linear Regression, in a novel online simulation setting and in a cold-start simulation. We also apply Latent Dirichlet Allocation (LDA) to a large corpus of news and compare the learned features to those found by Singular Value Decomposition (SVD). Our results indicate that Naïve Bayes is the worst of the three models. K-Nearest Neighbor performs consistently well across input features. Regularized Linear Regression generally performs worse than kNN, but reaches performance similar to kNN with some features. With LDA features, Regularized Linear Regression gains statistically significant improvements over the word features, both on the full data set and in the cold-start simulation. In the cold-start simulation, we find that LDA gives statistically significant improvements for all the methods.
Recommender Systems; Content-Based Recommendation; Topic Models; Latent Dirichlet Allocation; Cold-start
