-
Is the script that creates the stats for prot-speeches.csv available somewhere? I'm interested in comparing my parsing of speeches from XML with the way it's done to create the stats for
-
-
Yes. We also feel that the quality of the introduction identification is quite good (based on the master's thesis by Jesper Mortensen Blomqvist in 2022).
It's in `readme/src/generate-markdown.py`, in the function `count_pages_speeches_words()` starting on line 95. We rely on the assumption that a speech has an introduction, and essentially count the introductions as a proxy for speeches.

Counting `next` attribs will miss speeches contained in a single `<u>` element (no idea how many that might be) and include text that is not a speech, but was classified as `<u>` (also not sure of the scale, but we know it happens).
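
To make the difference between the two counting strategies concrete, here is a minimal sketch. It is not the code from `count_pages_speeches_words()`. It assumes that introductions are marked up as `<note type="speaker">` in the TEI namespace and that multi-segment speeches are chained with `next`/`prev` attributes on `<u>`. The sample document and both helper functions are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the protocols use the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: one two-segment speech and one single-segment speech.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div>
<note type="speaker">Herr TALMANNEN:</note>
<u xml:id="u1" next="u2">First segment.</u>
<u xml:id="u2" prev="u1">Second segment.</u>
<note type="speaker">Fru ANDERSSON:</note>
<u xml:id="u3">A short speech in one element.</u>
</div></body></text></TEI>"""


def count_introductions(root):
    """Count introductions as a proxy for speeches.

    Assumes an introduction is a <note> with type="speaker".
    """
    return sum(
        1
        for note in root.iter(f"{{{TEI_NS}}}note")
        if note.get("type") == "speaker"
    )


def count_chain_starts(root):
    """Count <u> elements that start a next/prev chain.

    A speech held in a single <u> has no `next` attribute,
    so this count does not see it.
    """
    return sum(
        1
        for u in root.iter(f"{{{TEI_NS}}}u")
        if u.get("next") is not None and u.get("prev") is None
    )


root = ET.fromstring(SAMPLE)
print(count_introductions(root))  # 2
print(count_chain_starts(root))   # 1
```

On this sample the two counts differ by exactly the single-`<u>` speech, which is the first of the two failure modes described above. The second one (non-speech text classified as `<u>`) is not modelled here.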