This project consists of a neural network model used to participate in the TAG-it Author Profiling task at EVALITA 2020. The task aims to predict the age and gender of blog users from their posts, as well as the topic they wrote about. The model combines representations learned by RNNs at the word and sentence levels, a Transformer neural network (specifically the BERT architecture), and hand-crafted stylistic features. All these representations are combined and fed into a fully connected layer of a feedforward neural network to make predictions for the addressed subtasks.
The model description is available here.
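The fusion step described above can be illustrated with a minimal, dependency-free sketch. It assumes the representations are combined by simple concatenation before the fully connected layer, which this README does not confirm. The vector sizes, values, and random weights are placeholders, not those of the trained model.

```python
import math
import random


def concat(*vectors):
    """Join several feature vectors into one flat list."""
    fused = []
    for vector in vectors:
        fused.extend(vector)
    return fused


def dense_softmax(x, weights, bias):
    """One fully connected layer followed by a softmax over the classes."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder representations (assumed shapes, for illustration only).
rnn_repr = [0.2, -0.1, 0.4]        # stands in for the RNN output
bert_repr = [0.7, 0.0, -0.3, 0.5]  # stands in for the BERT output
style_feats = [0.12, 3.0]          # stands in for the stylistic features

fused = concat(rnn_repr, bert_repr, style_feats)

# Randomly initialised layer with two output classes (e.g. gender).
rng = random.Random(0)
n_classes = 2
weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(n_classes)]
bias = [0.0] * n_classes

probs = dense_softmax(fused, weights, bias)
print(probs)
```

In the actual project this step is implemented with Keras layers; the sketch only shows the data flow.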
To run this code, you need:
- Python 3.8
- tensorflow 2.0
- Keras 2.4.3
- FreeLing 4.1 and its Python API
- Italian word embeddings, available here
- Once the word embedding file (`wiki-it.vec`) is downloaded, place it in the `data` folder.
- Download the weights of the BERT model and place them in the `data` folder.
- Train the models.
- Make predictions on the test files.
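As a quick check that the embedding file is usable, the following sketch reads embeddings in the fastText-style text format. It assumes `wiki-it.vec` has a first line containing the vocabulary size and the dimension, followed by one word and its vector per line. The `load_vec` helper is hypothetical and not part of this repository.

```python
import io


def load_vec(handle, limit=None):
    """Read text-format embeddings into a dict mapping word -> list of floats."""
    header = handle.readline().split()
    dim = int(header[1])  # header is assumed to be "<vocab_size> <dim>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return dim, vectors


# Tiny in-memory stand-in for the real file.
demo = io.StringIO("2 3\nciao 0.1 0.2 0.3\nmondo -0.4 0.5 0.6\n")
dim, vectors = load_vec(demo)
print(dim, sorted(vectors))
```

With the real file, the handle would be something like `open("./data/wiki-it.vec", encoding="utf-8")`, and `limit` can cap how many words are read.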
The model code for predicting each task is located in the `Ensemble` folder, along with a file `train.py` which, when run, saves the weights learned from the provided training data.
So the first step to use this classifier is to run on the command line:
python ./Ensemble/train.py
The training files are located in the `data` folder; these are the ones provided by the contest organizers. To change the training file, change the `source` variable in `train.py`:
source = "./data/training.txt"
To make predictions, run:
python main.py
Provide the test files with the `-dp` option. The `test_data` folder contains the test data provided by the organizers.
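For readers unfamiliar with this kind of option, here is a minimal sketch of how a `-dp` option can be declared with `argparse`. Only the option name comes from this README. The `data_path` destination, the help text, and whether the option is required are assumptions, and the real `main.py` may parse its arguments differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Predict on TAG-it test files")
    # "-dp" is the option named in this README; dest and help are assumptions.
    parser.add_argument("-dp", dest="data_path", required=True,
                        help="path to the test files")
    return parser


# Simulate a command line that passes the test_data folder.
args = build_parser().parse_args(["-dp", "test_data"])
print(args.data_path)
```

Under these assumptions, the invocation would presumably look like `python main.py -dp test_data`.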
The datasets are composed of texts written by multiple users, with possibly multiple posts per user.
The data is distributed as one XML-like file per genre, with one sample per `<doc>` element and attributes specifying an id, the topic, the gender (male|female), and the age range (0-19, 20-29, 30-39, 40-49, 50-100). This is a sample:
<doc id="3046" topic="orologi" age="30-39" gender="male" >
<post>
Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?
</post>
<post>
Cosa vi ha spinto a rendervi appassionati così bramosi?
</post>
</doc>
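A sample like the one above can be read with a few regular expressions. This is a minimal sketch, not the loader used in this repository: it assumes every file follows exactly the layout shown, and it avoids a strict XML parser because the files are only XML-like (several `<doc>` elements, possibly with no single root). The `parse_docs` helper is hypothetical.

```python
import re

DOC_RE = re.compile(r"<doc\b([^>]*)>(.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
POST_RE = re.compile(r"<post>(.*?)</post>", re.DOTALL)


def parse_docs(text):
    """Return one dict per <doc>, holding its attributes and its list of posts."""
    docs = []
    for attrs, body in DOC_RE.findall(text):
        record = dict(ATTR_RE.findall(attrs))
        record["posts"] = [post.strip() for post in POST_RE.findall(body)]
        docs.append(record)
    return docs


sample = '''<doc id="3046" topic="orologi" age="30-39" gender="male" >
<post>
Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?
</post>
<post>
Cosa vi ha spinto a rendervi appassionati così bramosi?
</post>
</doc>'''

docs = parse_docs(sample)
print(docs[0]["gender"], docs[0]["age"], len(docs[0]["posts"]))  # male 30-39 2
```

Each record keeps the labels (`topic`, `age`, `gender`) next to the user's posts, so the same structure can serve all subtasks.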