Community notes experiment

Key idea:

First learnings and observations

Vitalik's original implementation was very slow, but rewriting it with numpy made it roughly 10x faster.
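
A minimal sketch of that kind of rewrite, assuming a Community-Notes-style model where each vote is predicted as mu + user_bias[u] + note_quality[n] + user_alignment[u] * note_alignment[n]. The names other than the two alignment vectors, the NaN encoding for unseen votes, the regularisation weights and the division by the number of observed votes are all assumptions, not necessarily what main.py does. The point is that the per-cell Python loop and the broadcast numpy version compute the same objective, and the numpy version also gives the gradients in a handful of matrix operations.

```python
import numpy as np

# Assumed regularisation weights: intercept-like terms vs alignment terms.
LAM_I, LAM_F = 0.15, 0.03


def regulariser(mu, user_bias, note_quality, user_alignment, note_alignment):
    return (LAM_I * (mu ** 2 + np.sum(user_bias ** 2) + np.sum(note_quality ** 2))
            + LAM_F * (np.sum(user_alignment ** 2) + np.sum(note_alignment ** 2)))


def loss_loop(R, mu, user_bias, note_quality, user_alignment, note_alignment):
    """Slow reference: explicit Python loops over every (user, note) cell."""
    total, count = 0.0, 0
    for u in range(R.shape[0]):
        for n in range(R.shape[1]):
            if np.isnan(R[u, n]):
                continue  # unseen vote: contributes no error term
            pred = (mu + user_bias[u] + note_quality[n]
                    + user_alignment[u] * note_alignment[n])
            total += (pred - R[u, n]) ** 2
            count += 1
    reg = regulariser(mu, user_bias, note_quality, user_alignment, note_alignment)
    return (total + reg) / count


def loss_and_grads(R, mu, user_bias, note_quality, user_alignment, note_alignment):
    """Same objective computed with numpy broadcasting, plus analytic gradients."""
    mask = ~np.isnan(R)
    count = mask.sum()
    pred = (mu + user_bias[:, None] + note_quality[None, :]
            + np.outer(user_alignment, note_alignment))
    err = np.where(mask, pred - np.nan_to_num(R), 0.0)  # zero error on unseen cells
    reg = regulariser(mu, user_bias, note_quality, user_alignment, note_alignment)
    loss = (np.sum(err ** 2) + reg) / count
    c = 2.0 / count
    grads = {
        "mu": c * (err.sum() + LAM_I * mu),
        "user_bias": c * (err.sum(axis=1) + LAM_I * user_bias),
        "note_quality": c * (err.sum(axis=0) + LAM_I * note_quality),
        "user_alignment": c * (err @ note_alignment + LAM_F * user_alignment),
        "note_alignment": c * (err.T @ user_alignment + LAM_F * note_alignment),
    }
    return loss, grads


# Toy check: both versions give the same loss on a random sparse vote matrix.
rng = np.random.default_rng(0)
R = rng.choice([-1.0, 0.0, 1.0, np.nan], size=(30, 12))
params = dict(mu=0.0, user_bias=rng.normal(size=30), note_quality=rng.normal(size=12),
              user_alignment=rng.normal(size=30), note_alignment=rng.normal(size=12))
loss, grads = loss_and_grads(R, **params)
assert np.isclose(loss, loss_loop(R, **params))
```

Dividing the whole objective by the number of observed votes does not change its minimiser; it mainly keeps the gradient scale from growing with the size of the dataset, so one learning rate can be reused across datasets.
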
The outputs I got for the rec-public dataset were fairly encouraging. Looking at the polarisation axis, we can see that posts on the left are more libertarian/anti-regulation (promoting individualism), while posts on the right are more authoritarian/pro-regulation (promoting collectivism).
The quality/polarisation plot doesn't look well balanced: posts on the right side of the plot (the collectivist side) generally get better quality scores. I wonder whether this might be because a large majority of participants lean towards the collectivist side. Maybe the algorithm could be improved to compensate for this imbalance during training (e.g. by modifying the cost function to penalize asymmetries)? Alternatively, one could keep the gradient descent unchanged and correct the final scores afterwards (e.g. by boosting the scores of notes from the under-represented side, or boosting notes that sit near the middle of the polarisation axis).
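
The post-hoc option could be prototyped with two hypothetical helpers like the ones below. `detrend_quality` is a swapped-in technique rather than something from the original algorithm: it fits a straight line of quality against note alignment and removes that slope, so neither side of the axis is systematically favoured while the average score is preserved. `centre_boost` implements the "boost notes in the middle" idea by subtracting a penalty proportional to |alignment|; the `strength` value is arbitrary.

```python
import numpy as np


def detrend_quality(note_quality, note_align):
    """Remove the straight-line trend of quality along the polarisation axis.

    The mean score is unchanged, but notes on the side the trend favours lose
    that advantage and notes on the other side gain it.
    """
    slope, _intercept = np.polyfit(note_align, note_quality, deg=1)
    return note_quality - slope * (note_align - note_align.mean())


def centre_boost(note_quality, note_align, strength=0.1):
    """Favour notes near the middle of the axis by penalising |alignment|."""
    return note_quality - strength * np.abs(note_align)
```

Because flipping the sign of every user and note alignment leaves the model's predictions unchanged, which side ends up on the "left" is arbitrary; both helpers give the same result either way, and they can be chained.
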

Many things could potentially be improved in the algorithm:

We may get better performance by addressing the fact that our Polis matrices are very sparse:

we could modify the cost function to treat "0" and "NaN" differently
we could drop comments with fewer than two votes to reduce noise
we could try to fill the gaps using various techniques (interpolation, kNN, collaborative filtering...)
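
The three ideas above could be prototyped along these lines. This is a sketch under assumptions: votes are stored as a users × comments array with +1/-1 for agree/disagree, 0 for a pass and NaN where the participant never voted, and the helper names, `pass_weight`, `min_votes` and `k` are hypothetical. The kNN fill is a deliberately crude user-based variant (cosine similarity with unseen votes treated as 0), shown in place of the other listed techniques.

```python
import numpy as np


def vote_weights(R, pass_weight=0.5):
    """Per-cell weights: 1 for agree/disagree, pass_weight for a pass (0), 0 if unseen (NaN)."""
    W = np.where(R == 0, pass_weight, 1.0)
    return np.where(np.isnan(R), 0.0, W)


def weighted_sse(R, pred, W):
    """Squared error in which passes and unseen votes can count differently."""
    return float(np.sum(W * (pred - np.nan_to_num(R)) ** 2))


def drop_sparse_comments(R, min_votes=2):
    """Keep only comments (columns) with at least `min_votes` non-NaN votes."""
    keep = np.flatnonzero((~np.isnan(R)).sum(axis=0) >= min_votes)
    return R[:, keep], keep


def knn_fill(R, k=3):
    """Crude user-based kNN imputation of unseen votes."""
    observed = ~np.isnan(R)
    Z = np.nan_to_num(R)  # unseen treated as 0, only for the similarity computation
    norms = np.linalg.norm(Z, axis=1)
    sim = (Z @ Z.T) / (np.outer(norms, norms) + 1e-12)  # cosine similarity between users
    filled = R.copy()
    for u, n in zip(*np.nonzero(~observed)):
        voters = np.flatnonzero(observed[:, n])  # users who did vote on comment n
        if voters.size == 0:
            continue  # nobody voted on this comment: leave it as NaN
        nearest = voters[np.argsort(-sim[u, voters])[:k]]
        filled[u, n] = R[nearest, n].mean()
    return filled
```

Weighting passes below real votes (rather than dropping them) keeps the information that a participant saw the comment, while unseen cells contribute nothing to the error.
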

We could also start looking at multi-dimensional polarisation

the user_alignment and note_alignment vectors could be replaced by matrices
the case with two dimensions could be particularly interesting (and easy to plot)
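
A hypothetical k-dimensional version, assuming a Community-Notes-style model where each vote is predicted as mu + user_bias[u] + note_quality[n] + user_alignment[u] · note_alignment[n]: the alignment vectors become (n_users, k) and (n_notes, k) matrices, the product becomes a dot product, and the gradients become matrix products. The names, initialisation, learning rate, step count and regularisation weights are assumptions; with k=2 each note gets a 2-D point that can be plotted directly.

```python
import numpy as np


def fit_multidim(R, k=2, steps=3000, lr=0.1, lam_i=0.15, lam_f=0.03, seed=0):
    """Gradient descent for the k-dimensional model
        R[u, n] ~ mu + user_bias[u] + note_quality[n] + user_alignment[u] @ note_alignment[n]
    with user_alignment of shape (n_users, k) and note_alignment of shape (n_notes, k).
    NaN cells in R are treated as unseen and ignored."""
    rng = np.random.default_rng(seed)
    n_users, n_notes = R.shape
    mask = ~np.isnan(R)
    count = mask.sum()
    target = np.nan_to_num(R)
    mu = 0.0
    user_bias = np.zeros(n_users)
    note_quality = np.zeros(n_notes)
    user_alignment = rng.normal(scale=0.1, size=(n_users, k))  # one k-vector per user
    note_alignment = rng.normal(scale=0.1, size=(n_notes, k))  # one k-vector per note
    history = []
    for _ in range(steps):
        pred = (mu + user_bias[:, None] + note_quality[None, :]
                + user_alignment @ note_alignment.T)
        err = np.where(mask, pred - target, 0.0)
        reg = (lam_i * (mu ** 2 + np.sum(user_bias ** 2) + np.sum(note_quality ** 2))
               + lam_f * (np.sum(user_alignment ** 2) + np.sum(note_alignment ** 2)))
        history.append((np.sum(err ** 2) + reg) / count)
        c = 2.0 / count
        # All gradients are evaluated at the current parameters before any update.
        g_mu = c * (err.sum() + lam_i * mu)
        g_ub = c * (err.sum(axis=1) + lam_i * user_bias)
        g_nq = c * (err.sum(axis=0) + lam_i * note_quality)
        g_ua = c * (err @ note_alignment + lam_f * user_alignment)    # (n_users, k)
        g_na = c * (err.T @ user_alignment + lam_f * note_alignment)  # (n_notes, k)
        mu -= lr * g_mu
        user_bias -= lr * g_ub
        note_quality -= lr * g_nq
        user_alignment -= lr * g_ua
        note_alignment -= lr * g_na
    return {"mu": mu, "user_bias": user_bias, "note_quality": note_quality,
            "user_alignment": user_alignment, "note_alignment": note_alignment,
            "history": history}
```

One caveat: with k > 1 the axes are only determined up to a rotation (rotating both alignment matrices by the same orthogonal matrix leaves predictions and the penalty unchanged), so the learned columns won't necessarily line up with interpretable directions such as libertarian vs authoritarian without some post-processing, e.g. PCA on the note alignments.
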
