TO DO Swedish PM error found in WD by Riksdagens-Corpus #121

Closed
salgo60 opened this issue Apr 24, 2023 · 15 comments

Comments

@salgo60
Owner

salgo60 commented Apr 24, 2023

From #252

OK, here's a list of Wiki IDs causing the member_of_parliament unit test to fail. It means they're missing the role wd:Q10655178, wd:Q33071890, or wd:Q81531912.
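A minimal sketch of how such a check could be expressed as a SPARQL query built from Python. It assumes that "role" means a position held (P39) statement on the item; the actual logic of the member_of_parliament unit test is not shown in this thread, and the function name `build_missing_role_query` is hypothetical.

```python
# The three role QIDs are taken from the comment above.
ROLE_QIDS = ("Q10655178", "Q33071890", "Q81531912")


def build_missing_role_query(qids, role_qids=ROLE_QIDS):
    """Return a SPARQL query that lists the given items lacking all of the roles.

    Assumption: a role is stored as a P39 (position held) statement.
    p:P39/ps:P39 is used so that statements of any rank are considered.
    """
    items = " ".join(f"wd:{q}" for q in qids)
    roles = ", ".join(f"wd:{r}" for r in role_qids)
    return (
        "SELECT ?item WHERE {\n"
        f"  VALUES ?item {{ {items} }}\n"
        "  FILTER NOT EXISTS {\n"
        "    ?item p:P39/ps:P39 ?role .\n"
        f"    FILTER(?role IN ({roles}))\n"
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    # Q5759618 is mentioned later in this thread; it is used here only as sample input.
    print(build_missing_role_query(["Q5759618"]))
```

The resulting string can be pasted into the Wikidata Query Service; any item it returns has none of the three roles.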

@salgo60 salgo60 changed the title from "TO DO Swedish PM pull sss" to "TO DO Swedish PM error found in WD by Riksdagens-Corpus" on Apr 24, 2023
@BobBorges

from catalog.csv:

Q116199773 | Gilljam i Stockholm | Gustaf F | 1:91 | 5:135 | i Stockholm

@salgo60
Owner Author

salgo60 commented Apr 24, 2023

@BobBorges it looks like the correct Wikidata item is Q5759618

Volume 1:91 --> SPA sj9PGLAlnmUAAAAAABfNfA --> Wikidata
hub.toolforge.org/P4819:sj9PGLAlnmUAAAAAABfNfA?site=wd
= WD Q5759618
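The lookup above goes from an SPA identifier to a Wikidata item through the hub tool. A small sketch of building that URL, assuming the tool is served over https and keeping the `P4819:<id>?site=wd` pattern exactly as quoted; the helper name `hub_url` is hypothetical.

```python
from urllib.parse import quote


def hub_url(property_id, external_id, site="wd"):
    """Build a hub.toolforge.org lookup URL for an external identifier.

    Assumption: the https scheme; the path and query pattern are copied
    from the URL quoted in the comment above.
    """
    return (
        f"https://hub.toolforge.org/{property_id}:"
        f"{quote(external_id, safe='')}?site={site}"
    )


if __name__ == "__main__":
    print(hub_url("P4819", "sj9PGLAlnmUAAAAAABfNfA"))
```

Opening the printed URL in a browser should lead to the item that carries that SPA identifier, which in this thread is Q5759618.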

image

Volume 5:135

image

@BobBorges

@salgo60 It's strange about the Q-ID. Do you think it changed or how did we end up with an error?

@salgo60
Owner Author

salgo60 commented Apr 24, 2023

Looks like a simple mistake: they are father and son.

image

@salgo60
Owner Author

salgo60 commented Apr 25, 2023

@BobBorges Q5779431 looks wrongly matched

Volume 3, page 121

image

image

@salgo60
Owner Author

salgo60 commented Apr 27, 2023

@BobBorges @MansMeg another mismatch

correct: Q5805039 Hierta i Näsby

@BobBorges

Thanks @salgo60 – I've updated the errors you found in our catalog.csv file

@BobBorges

Q5933362 updated in known_mps_catalog.csv

@BobBorges

Added basic roles for all of these except Q26211952. Some didn't need to have roles added, but the QIDs were redirect QIDs, so once those were replaced the corpus query works properly.

@salgo60
Owner Author

salgo60 commented May 1, 2023

> Added basic roles for all of these except Q26211952. Some didn't need to have roles added, but the QIDs were redirect QIDs, so once those were replaced the corpus query works properly.

@BobBorges @MansMeg maybe you should add a precheck for whether a record is redirected. It's just a matter of doing a GET on a URL and seeing what is returned.
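A minimal sketch of such a precheck, assuming the Wikidata Action API module `wbgetentities` is used and that a resolved redirect is reported as a `redirects` object with `from` and `to` keys inside the entity record. The helper names and the User-Agent string are hypothetical, and this is not how the corpus project currently does it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def find_redirects(api_response):
    """Map each redirected QID to its target, given a parsed wbgetentities response.

    Assumption: redirected entities carry a "redirects" object with
    "from" and "to" keys. All records are scanned, so the result does not
    depend on which ID the response uses as the dictionary key.
    """
    redirects = {}
    for entity in api_response.get("entities", {}).values():
        info = entity.get("redirects")
        if info:
            redirects[info["from"]] = info["to"]
    return redirects


def fetch_entities(qids):
    """Fetch basic info for a batch of QIDs (network call, one HTTP GET)."""
    params = urlencode({
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": "info",
        "redirects": "yes",
        "format": "json",
    })
    request = Request(f"{API}?{params}",
                      headers={"User-Agent": "redirect-precheck-sketch/0.1"})
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Q26211952 is mentioned in this thread; it is used here only as sample input.
    print(find_redirects(fetch_entities(["Q26211952"])))
```

For a full catalog the QIDs would need to be sent in batches, since the API limits how many IDs one request may carry (50 for ordinary accounts).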

SPARQL: First/Second Chamber (Första/Andra kammaren) merges in the last 1000 days: 88 records; all PMs: 192 records

image


I am walking through the list above, a bit slowly...

@BobBorges

> maybe you should add a precheck for whether a record is redirected. It's just a matter of doing a GET on a URL and seeing what is returned.

@salgo60 I'm working in a somewhat different way, with a Python library called pywikibot. I discovered these pages were redirects because my attempt to add a role failed. In the data I was working with, there were only a handful, so I just printed the ID to the console when it failed and updated them manually. If it happened again and there were more instances, I'd definitely automate the fix somehow.
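One way the manual replacement could be automated, as a sketch only: given a mapping from redirect QIDs to their targets, rewrite the QID column of the catalog rows. It assumes the QID sits in the first column, as in the catalog.csv row quoted earlier in the thread; the function name and the sample mapping are hypothetical.

```python
def replace_redirected_qids(rows, redirects, qid_field=0):
    """Return (updated_rows, changes) with redirect QIDs swapped for their targets.

    rows      -- iterable of row sequences, e.g. from csv.reader
    redirects -- dict mapping an old (redirect) QID to its target QID
    qid_field -- index of the QID column (assumed to be the first column)
    """
    updated, changes = [], []
    for row in rows:
        row = list(row)  # copy, so the caller's data is left untouched
        old = row[qid_field]
        new = redirects.get(old, old)
        if new != old:
            row[qid_field] = new
            changes.append((old, new))
        updated.append(row)
    return updated, changes


if __name__ == "__main__":
    # Placeholder QIDs and names, not real catalog data.
    rows = [["Q111", "Example i Stad"], ["Q333", "Other i By"]]
    print(replace_redirected_qids(rows, {"Q111": "Q222"}))
```

Returning the list of changes keeps the console log that the manual approach provided, so each replacement can still be reviewed.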

@salgo60
Owner Author

salgo60 commented May 1, 2023

@BobBorges there is also a newly released REST API. @dpriskorn, who does more programming than I do, will visit me tomorrow, and I think he prefers the Python library WikibaseIntegrator any day of the week. Let me know if we should have an online session.

There is also a notebook environment called PAWS where you can experiment and get some code examples.

Let me know if you have questions

@salgo60
Owner Author

salgo60 commented May 10, 2023

FYI @BobBorges

  • Is this a good or bad way to cite a book?
  • We need to be aware that Q110346241 has several articles about the same person, and the information in them is not identical.
    • Depending on how the sources are used, I guess we could easily introduce some noise and confusion for the researcher.
      • Maybe some guidelines on how to cite would make sense, depending on how advanced the use of this corpus will be.

Issue with the printed book Q110346241 Two-Chamber Parliament 1867-1970: it contains several articles about one person, and they are not identical.

image

My conclusions from this one sample:

  1. In Q110346241 Two-Chamber Parliament 1867-1970, multiple articles about one person are not identical.
  2. Later volumes of Q110346241 Two-Chamber Parliament 1867-1970 don't necessarily have the more precise information.
  3. The way I have added references assumes that the information is the same when a person has several articles.
    1. As the SPA reference I have used the first one I found for a person; I haven't checked in the book whether the SPA picture comes from a specific article about that person.
    2. To make it more complex, Omar, who scanned the articles, has edited them so that the photo and the article are in one picture.

@BobBorges

Thanks @salgo60. I'll pay closer attention to that in the future.

@salgo60
Owner Author

salgo60 commented May 10, 2023

> Thanks @salgo60. I'll pay closer attention to that in the future.

You did it correctly, but as you may have seen, I have started adding the SPA reference in the citations and may have added the SPA from the wrong article... 😭 I guess the problem will be for people consuming the corpus.

I hope your corpus will add better data with better time precision that is easy to check with the TEI protocols.


OT: Denny, the architect of Wikidata, made a very nice video about the development of Wikidata: Wikidata the making of (15 min).

The history of Wikidata, from its inception to its creation and the first ten years of operation, up to 2022. Paper was presented at The Web Conference 2023, Special Track on the History of the Web, Austin, TX, USA.

  • Paper: DOI 10.1145/3543873.3585579

image

image image

Grafana <-> Wikidata for data quality etc.

  • Wikidata uses Grafana for stats and data quality; see the example in #40
