Thesis (Bachelor's)

Reykjavik University > School of Technology > BSc Department of Computer Science

Please use this identifier when citing or linking to this work: https://hdl.handle.net/1946/46224

Title:
  • Evaluating Icelandic sentiment analysis models trained on translated data
Degree level:
  • Bachelor's
Abstract:
  • In this paper, we experiment with sentiment classification models for Icelandic that are trained on machine-translated data. We machine-translated 50,000 English IMDb reviews, each labeled positive or negative, into Icelandic. We evaluate whether sentiment is preserved through machine translation and how accurately the resulting classifiers handle native Icelandic text. We analyse the differences between three baseline classifiers, Support Vector Machines, Logistic Regression and Naive Bayes, when trained on translations produced by Google Translate and Miðeind Vélþýðing (in English, Translate). Furthermore, we fine-tune and evaluate three pre-trained transformer-based models, RoBERTa, IceBERT and ELECTRA, on both the original English texts and the translated texts. Our results indicate that the transformer models outperform the baseline classifiers on all datasets. Moreover, our evaluation shows that the transformer models can effectively classify sentiment in native Icelandic movie reviews.
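
    As a rough illustration of one of the baseline families named in the abstract, the sketch below implements a multinomial Naive Bayes sentiment classifier in plain Python. It is not the thesis implementation: the class name, the whitespace tokenizer, the add-one smoothing, and the four toy Icelandic reviews are all assumptions made for illustration, and the toy data merely stands in for machine-translated training text.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would do more.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words unseen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy reviews standing in for translated training data.
train_texts = [
    "frábær mynd",      # "great film"
    "skemmtileg mynd",  # "fun film"
    "léleg mynd",       # "bad film"
    "leiðinleg mynd",   # "boring film"
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("frábær og skemmtileg mynd")
print(prediction)
```

    The same fit-then-predict shape would apply to the other baselines; only the scoring rule changes. Evaluating on native Icelandic reviews, as the abstract describes, would mean calling `predict` on text that was never machine-translated.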

Accepted:
  • 16 January 2024
URI: 
  • http://hdl.handle.net/1946/46224


Files
Filename            Size       Access  Description  File type
FinalReport.pdf     425.74 kB  Open    Full text    PDF
ProgressReport.pdf  846.58 kB  Open    Appendix     PDF