1 WikiBERT Models: Deep Transfer Learning for Many Languages
Sampo Pyysalo, Jenna Kanerva, Antti Virtanen and Filip Ginter
2 EstBERT: A Pretrained Language-Specific BERT for Estonian
Hasan Tanvir, Claudia Kittask, Sandra Eiche and Kairit Sirts
3 Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model
Per E Kummervold, Javier De la Rosa, Freddy Wetjen and Svein Arne Brygfjeld
4 Large-Scale Contextualised Language Modelling for Norwegian
Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid and Stephan Oepen
5 Extremely low-resource machine translation for closely related languages
Maali Tars, Andre Tättar and Mark Fišel
6 Measuring Translationese across Levels of Expertise: Are Professionals more Surprising than Students?
Yuri Bizzoni and Ekaterina Lapshinova-Koltunski
7 CombAlign: a Tool for Obtaining High-Quality Word Alignments
Steinþór Steingrímsson, Hrafn Loftsson and Andy Way
8 Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks
Prajit Dhar and Arianna Bisazza
9 Speaker Verification Experiments for Adults and Children Using Shared Embedding Spaces
Tuomas Kaseva, Hemant Kumar Kathania, Aku Rouhe and Mikko Kurimo
10 Spectral modification for recognition of children’s speech under mismatched conditions
Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku and Mikko Kurimo
11 A Baseline Document Planning Method for Automated Journalism
Leo Leppänen and Hannu Toivonen
12 Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning
Joakim Olsen, Arild Brandrud Næss and Pierre Lison
13 Knowledge Distillation for Swedish NER models: A Search for Performance and Efficiency
Lovisa Hagström and Richard Johansson
14 Fine-grained Named Entity Annotation for Finnish
Jouni Luoma, Li-Hsin Chang, Filip Ginter and Sampo Pyysalo
15 Survey and reproduction of computational approaches to dating of historical texts
Sidsel Boldsen and Fredrik Wahlberg
16 Multilingual and Zero-Shot is Closing in on Monolingual Web Register Classification
Samuel Rönnqvist, Valtteri Skantsi, Miika Oinonen and Veronika Laippala
17 Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered
Mika Hämäläinen, Niko Partanen, Jack Rueter and Khalid Alnajjar
18 CoDeRooMor: A new dataset for non-inflectional morphology studies of Swedish
Elena Volodina, Yousuf Ali Mohammed and Therese Lindström Tiedemann
19 Chunking Historical German
Katrin Ortmann
20 Part-of-speech tagging of Swedish texts in the neural era
Yvonne Adesam and Aleksandrs Berdicevskis
21 De-identification of Privacy-related Entities in Job Postings
Kristian Nørgaard Jensen, Mike Zhang and Barbara Plank
22 Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification
Synnøve Bråten, Wilhelm Wie and Hercules Dalianis
23 Applying and Sharing pre-trained BERT-models for Named Entity Recognition and Classification in Swedish Electronic Patient Records
Mila Grancharova and Hercules Dalianis
24 An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish
Quan Duong, Mika Hämäläinen and Simon Hengchen
25 Learning to Lemmatize in the Word Representation Space
Jarkko Lagus and Arto Klami
26 Synonym Replacement based on a Study of Basic-level Nouns in Swedish Texts of Different Complexity
Evelina Rennes and Arne Jönsson
27 SuperSim: a test set for word similarity and relatedness in Swedish
Simon Hengchen and Nina Tahmasebi
28 NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance
Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis and Jörg Tiedemann
29 Finnish Paraphrase Corpus
Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Jenna Saarni, Maija Sevón and Otto Tarkka
30 Negation in Norwegian: an annotated dataset
Petter Mæhlum, Jeremy Barnes, Robin Kurtz, Lilja Øvrelid and Erik Velldal
31 What Taggers Fail to Learn, Parsers Need the Most
Mark Anderson and Carlos Gómez-Rodríguez
32 Investigation of Transfer Languages for Parsing Latin: Italic Branch vs. Hellenic Branch
Antonia Karamolegkou and Sara Stymne
33 Towards cross-lingual application of language-specific PoS tagging schemes
Hinrik Hafsteinsson and Anton Karl Ingason
34 Exploring the Importance of Source Text in Automatic Post-Editing for Context-Aware Machine Translation
Chaojun Wang, Christian Hardmeier and Rico Sennrich
35 Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Lifeng Han, Gareth Jones, Alan Smeaton and Paolo Bolzoni
36 Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages
Juho Leinonen, Sami Virpioja and Mikko Kurimo
37 Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation
Mikko Aulamo, Sami Virpioja, Yves Scherrer and Jörg Tiedemann
38 Building a Swedish Open-Domain Conversational Language Model
Tobias Norlund and Agnes Stenbom
39 It’s Basically the Same Language Anyway: the Case for a Nordic Language Model
Magnus Sahlgren, Fredrik Carlsson, Fredrik Olsson and Love Börjeson
40 Decentralized Word2Vec Using Gossip Learning
Abdul Aziz Alkathiri, Lodovico Giaretta, Sarunas Girdzijauskas and Magnus Sahlgren
41 Multilingual ELMo and the Effects of Corpus Sampling
Vinit Ravishankar, Andrey Kutuzov, Lilja Øvrelid and Erik Velldal
42 Should we Stop Training More Monolingual Models, and Simply Use Machine Translation Instead?
Tim Isbister, Fredrik Carlsson and Magnus Sahlgren
43 Error Analysis of using BART for Multi-Document Summarization: A Study for English and German Language
Timo Johner, Abhik Jana and Chris Biemann
44 Grammatical Error Generation Based on Translated Fragments
Eetu Sjöblom, Mathias Creutz and Teemu Vahtola
45 Creating Data in Icelandic for Text Normalization
Helga Svala Sigurðardóttir, Anna Björk Nikulásdóttir and Jón Guðnason
46 The Danish Gigaword Corpus
Leon Strømberg-Derczynski, Manuel Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Jens Madsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm and Daniel Varab
47 DanFEVER: claim verification dataset for Danish
Jeppe Nørregaard and Leon Derczynski
48 The Icelandic Word Web: A language technology-focused redesign of a lexicosemantic database
Hjalti Daníelsson, Jón Hilmar Jónsson, Þórður Arnar Árnason, Alec Shaw, Einar Freyr Sigurðsson and Steinþór Steingrímsson
49 Getting Hold of Villains and other Rogues
Manfred Klenner, Anne Göhring and Sophia Conrad
50 Talrómur: A large Icelandic TTS corpus
Atli Sigurgeirsson, Þorsteinn Gunnarsson, Gunnar Örnólfsson, Eydís Magnúsdóttir, Ragnheiður Þórhallsdóttir, Stefán Jónsson and Jón Guðnason
51 NorDial: A Preliminary Corpus of Written Norwegian Dialect Use
Jeremy Barnes, Petter Mæhlum and Samia Touileb
52 The Swedish Winogender Dataset
Saga Hansson, Konstantinos Mavromatakis, Yvonne Adesam, Gerlof Bouma and Dana Dannélls
53 DaNLP: An open-source toolkit for Danish Natural Language Processing
Amalie Brogaard Pauli, Maria Barrett, Ophélie Lacroix and Rasmus Hvingelby
54 HB Deid - HB De-identification tool demonstrator
Hercules Dalianis and Hanna Berg