
Unable to handle errors from WDFunctionsEngine.execute_sparql_query #189

Closed
andrewtavis opened this issue Jan 12, 2022 · 7 comments

andrewtavis commented Jan 12, 2022

Hello :)

I'm trying to use WikidataIntegrator to update a series of Wikidata-generated JSONs that I have in the app Scribe - Language Keyboards. Scribe has tons of .sparql files that until now have needed to be run independently, and I'm working on an issue where all of these will be run by a single Python file.

Looking at wdi_core.py, I don't see how I'm able to pass over unsuccessful queries and move on to the next ones. response.raise_for_status() in execute_sparql_query at times raises an error, and in these cases there doesn't seem to be a returned value via results = response.json() that I could then use as a trigger to skip the operation.

If returning None or some other response would be acceptable to you all in these cases, I'd be happy to write a PR to do this. Any other suggestions would also be very appreciated :)

Thanks for your time!


LeMyst commented Jan 13, 2022

Hello @andrewtavis

Do you have a code example? I think you can wrap the exception raised by raise_for_status() in a try/except.


andrewtavis commented Jan 13, 2022

Hello @LeMyst, thanks for the reply :)

Here's the part of Scribe-iOS/Data/update_data.py where I'm using execute_sparql_query:

with open(query_path) as file:  # query_path leads to a .sparql file
    query_lines = file.readlines()

query = wdi_core.WDFunctionsEngine.execute_sparql_query("".join(query_lines))

query_results = query["results"]["bindings"]  # a list of result rows when the query succeeds

Maybe a try/catch on the part where I'm running the query? I'm just not sure how to go about dealing with a lack of returned result from execute_sparql_query.

Thanks again :)


andrewtavis commented Jan 13, 2022

I guess that I could try/catch response.raise_for_status() in a fork of WikidataIntegrator for my own purposes, but then this might be awkward later on when hopefully others will be contributing and adding data to Scribe (one can hope 😊).


LeMyst commented Jan 13, 2022

One thing missing in your code is a check for whether query is None (which occurs when max_tries is reached).
You can also move on to the next iteration of your for loop when the exception occurs.

with open(query_path) as file:  # query_path leads to a .sparql file
    query_lines = file.readlines()

try:
    query = wdi_core.WDFunctionsEngine.execute_sparql_query("".join(query_lines))
except HTTPError as err:  # from requests.exceptions import HTTPError
    print(f'HTTPError with {query_name}: {err}')
    continue  # inside the for loop over query files

if query:
    query_results = query["results"]["bindings"]  # a list of result rows when the query succeeds
    ...
else:
    print(f'Nothing returned by the SPARQL server for {query_name}')
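
To see the control flow end to end without hitting the real endpoint, here is a self-contained sketch where run_query is a hypothetical stub standing in for execute_sparql_query (names like queries and the sample values are placeholders, not part of WikidataIntegrator):

```python
from requests.exceptions import HTTPError

def run_query(sparql):
    # Hypothetical stub for wdi_core.WDFunctionsEngine.execute_sparql_query:
    # raises HTTPError for a failing query, returns None when max_tries is hit.
    if sparql == "broken":
        raise HTTPError("500 Server Error")
    if sparql == "timed_out":
        return None
    return {"results": {"bindings": [{"lexeme": {"value": sparql}}]}}

results, failed, empty = {}, [], []
queries = {"nouns": "SELECT ?lexeme ...", "bad_filter": "broken", "slow": "timed_out"}

for query_name, sparql in queries.items():
    try:
        query = run_query(sparql)
    except HTTPError as err:
        print(f"HTTPError with {query_name}: {err}")
        failed.append(query_name)
        continue  # move on to the next query

    if query:
        results[query_name] = query["results"]["bindings"]
    else:
        print(f"Nothing returned by the SPARQL server for {query_name}")
        empty.append(query_name)
```

Only bad_filter raises here; slow illustrates the None case after the retries run out.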

Can you give me the HTTP error code you get from raise_for_status()? I think WikidataIntegrator misses some frequent codes, and should have something like this instead of just 503:
if response.status_code in (500, 502, 503, 504):

I hope I correctly understood your issue.


andrewtavis commented Jan 15, 2022

Thanks for your further help!

The issue was that the SPARQL itself was malformed (apologies for not checking more rigorously), but in this case except HTTPError as err: did not trigger and the function still hung without a result. The SPARQL was wrong because I was filtering on a variable I'd renamed everywhere except in the filter itself. The WDQS result from this is Query timeout limit reached. Explicitly:

  • Say you have this query of Swedish nouns and pronouns
  • If you go to the optional FILTER NOT EXISTS and change lexeme to just l, the query will break with the result Query timeout limit reached
  • If this broken query is passed to WikidataIntegrator's execute_sparql_query, there will be no HTTP error, and the call will just hang until a keyboard interrupt or other cancellation (there will be a message printed to the terminal, though: Backing off 1.0 seconds afters 1 tries calling function with args followed by the query)

Would it maybe be an improvement if WikidataIntegrator could handle these kinds of issues? Not sure what your thoughts are on this.


LeMyst commented Jan 15, 2022

Ok, I can reproduce your issue.

To explain: because error code 500 (a query timeout on the server) is not caught by execute_sparql_query(), the method automatically falls back to backoff/wdi_backoff because of the raise_for_status(). That's why you can't catch it.

By default, wdi_backoff retries an infinite number of times; you can change that by adding these lines at the top of your script:

from wikidataintegrator.wdi_config import config as wdi_config

wdi_config['BACKOFF_MAX_TRIES'] = 2
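
To illustrate what the setting does, here is a simplified stand-in for the retry loop (this is not wdi_backoff's actual implementation, just a sketch of the max-tries semantics):

```python
import time

def with_backoff(func, max_tries, base_wait=1.0):
    # Simplified stand-in for backoff/wdi_backoff: retry func until it
    # succeeds or max_tries attempts have been made, doubling the wait.
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_tries:
                raise  # out of tries: the exception finally reaches the caller
            wait = base_wait * 2 ** (attempt - 1)
            print(f"Backing off {wait} seconds after {attempt} tries")
            time.sleep(wait)

attempts = []

def always_times_out():
    attempts.append(1)
    raise RuntimeError("Query timeout limit reached")

try:
    with_backoff(always_times_out, max_tries=2, base_wait=0)
except RuntimeError:
    print(f"gave up after {len(attempts)} tries")
```

With max_tries set to 1, the very first failure propagates immediately, which is why lowering BACKOFF_MAX_TRIES makes the error catchable instead of retrying forever.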

@andrewtavis

Wonderful :) I just changed it to wdi_config['BACKOFF_MAX_TRIES'] = 1 as that works for my purposes.

For documentation, the import for HTTPError is from requests.exceptions import HTTPError (there are a couple of libraries with an HTTPError). The exception does surface the 500 error code, and I've figured out how to get the rest running from there :)

Thanks so much for your help!
