Allegro Reviews is a sentiment analysis dataset, consisting of 11,588 product reviews written in Polish and extracted from Allegro.pl - a popular e-commerce marketplace. Each review contains at least 50 words and has a rating on a scale from one (negative review) to five (positive review).
We recommend using the provided train/dev/test split. The ratings for the test set reviews are kept hidden. You can evaluate your model using the online evaluation tool available on klejbenchmark.com.
The dataset can be downloaded from here.
To counter slight class imbalance in the dataset, we propose to evaluate models using wMAE, i.e., the macro-average of the per-class mean absolute error. Additionally, we rescale the ratings to lie between zero and one and report 1 − wMAE as the final score.
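Written out, the metric can be stated as follows. The notation is an assumption introduced here for illustration and does not come from the original description: r_i and \hat{r}_i are the true and predicted ratings of review i on the 1 to 5 scale, C is the set of distinct true ratings in the evaluated data, and I_c is the set of reviews whose true rating is c.

```latex
y_i = \frac{r_i - 1}{4}, \qquad \hat{y}_i = \frac{\hat{r}_i - 1}{4}

\mathrm{wMAE} = \frac{1}{|C|} \sum_{c \in C} \frac{1}{|I_c|} \sum_{i \in I_c} \left| y_i - \hat{y}_i \right|

\text{AR score} = 1 - \mathrm{wMAE}
```

Because every class contributes equally to the outer average, errors on rare ratings weigh as much as errors on frequent ones.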
Python implementation of the proposed metric:
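import pandas as pd
from sklearn.metrics import mean_absolute_error

def ar_score(y_true, y_pred):
    # y_true and y_pred are expected to be NumPy arrays or pandas Series
    # holding ratings on the 1-5 scale; both are rescaled to [0, 1].
    ds = pd.DataFrame({
        'y_true': (y_true - 1.0)/4.0,
        'y_pred': (y_pred - 1.0)/4.0,
    })
    # Mean absolute error within each true-rating class,
    # then the unweighted mean over classes (macro-average).
    wmae = ds \
        .groupby('y_true') \
        .apply(lambda df: mean_absolute_error(df['y_true'], df['y_pred'])) \
        .mean()
    return 1 - wmae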
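To illustrate why the macro-average matters, here is a minimal dependency-free sketch of the same computation next to a plain, unweighted MAE score. The function names `ar_score_plain` and `plain_score` and the toy ratings are hypothetical and serve only as an example; this is not the official evaluation code.

```python
from collections import defaultdict


def ar_score_plain(y_true, y_pred):
    """1 - wMAE for ratings on the 1-5 scale, standard library only."""
    errors_by_class = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        # Rescale both ratings from [1, 5] to [0, 1] before taking the error.
        errors_by_class[t].append(abs((t - 1.0) / 4.0 - (p - 1.0) / 4.0))
    # MAE within each true-rating class, then the unweighted mean over classes.
    per_class_mae = [sum(e) / len(e) for e in errors_by_class.values()]
    return 1.0 - sum(per_class_mae) / len(per_class_mae)


def plain_score(y_true, y_pred):
    """1 - MAE over all reviews, ignoring class membership (for comparison)."""
    errors = [abs(t - p) / 4.0 for t, p in zip(y_true, y_pred)]
    return 1.0 - sum(errors) / len(errors)


# Toy, deliberately imbalanced data: a model that always predicts 5.
y_true = [5, 5, 5, 5, 5, 5, 5, 5, 1, 3]
y_pred = [5] * 10

print(round(ar_score_plain(y_true, y_pred), 2))  # 0.5
print(round(plain_score(y_true, y_pred), 2))     # 0.85
```

In this toy case the constant prediction looks strong under the plain score because most reviews are rated 5, while the class-balanced score penalizes the errors on the rare ratings 1 and 3 equally.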
| Model | AR Score |
|---|---|
| ELMo | 86.15 |
| Multilingual BERT | 83.33 |
| Slavic BERT | 84.31 |
| XLM-17 | 84.52 |
| HerBERT | 84.48 |
The dataset is released under the CC BY-SA 4.0 license.
If you use this dataset, please cite the following paper:
@inproceedings{rybak-etal-2020-klej,
title = "{KLEJ}: Comprehensive Benchmark for Polish Language Understanding",
author = "Rybak, Piotr and Mroczkowski, Robert and Tracz, Janusz and Gawlik, Ireneusz",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.111",
pages = "1191--1201",
}
The dataset was created by the Allegro Machine Learning Research team.
You can contact us at: klejbenchmark@allegro.pl