Please use this identifier to cite or link to this work: https://hdl.handle.net/1946/46224
In this paper, we experiment with sentiment classification models for Icelandic that are trained on machine-translated data. We machine-translated 50,000 English IMDb reviews, each labeled as positive or negative, into Icelandic. We evaluate whether sentiment is preserved under machine translation and how accurately the resulting classifiers perform on native Icelandic text. We compare three baseline classifiers, Support Vector Machines, Logistic Regression and Naive Bayes, trained on translations produced by Google Translate and by Miðeind Vélþýðing (in English, Translate). Furthermore, we fine-tune and evaluate three pre-trained transformer-based models, RoBERTa, IceBERT and ELECTRA, on both the original English texts and the translated texts. Our results indicate that the transformer models outperform the baseline classifiers on all datasets. Moreover, our evaluation shows that the transformer models can effectively classify sentiment in native Icelandic movie reviews.
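To make the baseline setup concrete, the following is a minimal sketch of one of the three baseline classifier types named above, a multinomial Naive Bayes model over bag-of-words counts with add-one smoothing. It is not the authors' implementation: the whitespace tokenizer, the function names `train_naive_bayes` and `predict`, and the four toy Icelandic sentences are all hypothetical stand-ins for the machine-translated IMDb reviews.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing may differ.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data standing in for machine-translated reviews.
train_docs = [
    "frábær mynd og góður leikur",
    "góð saga og frábær tónlist",
    "leiðinleg mynd og slæmur leikur",
    "slæm saga og leiðinleg tónlist",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "frábær saga"))
print(predict(model, "leiðinleg mynd"))
```

In the setting described in the abstract, the same kind of model would be trained once per translation source and then scored on held-out native Icelandic reviews, which is what allows the two translation systems to be compared.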
Filename | Size | Access | Description | File type |
---|---|---|---|---|
FinalReport.pdf | 425.74 kB | Open | Full text | PDF |
ProgressReport.pdf | 846.58 kB | Open | Appendix | PDF |