
Unable to handle errors from WDFunctionsEngine.execute_sparql_query #189

Closed
andrewtavis opened this issue Jan 12, 2022 · 7 comments

andrewtavis commented Jan 12, 2022

Hello :)

I'm trying to use WikidataIntegrator to update a series of Wikidata-generated JSONs that I have in the app Scribe - Language Keyboards. Scribe has tons of .sparql files that until now have needed to be run independently, and I'm working on an issue where all of these will be run by a single Python file.

Looking at wdi_core.py, I don't see how I'm able to pass over unsuccessful queries and move on to the next ones. response.raise_for_status() in execute_sparql_query at times raises an error, and in these cases there doesn't seem to be a returned value via results = response.json() that I could then use as a trigger to skip the operation.

If returning None or some other response would be acceptable to you all in these cases, I'd be happy to write a PR to do this. Any other suggestions would also be very appreciated :)

Thanks for your time!


LeMyst commented Jan 13, 2022

Hello @andrewtavis

Do you have a code example? I think you can wrap the exception raised by raise_for_status() in a try/except.


andrewtavis commented Jan 13, 2022

Hello @LeMyst, thanks for the reply :)

Here's the part of Scribe-iOS/Data/update_data.py where I'm using execute_sparql_query:

with open(query_path) as file:  # query_path leads to a .sparql file
    query_lines = file.readlines()

query = wdi_core.WDFunctionsEngine.execute_sparql_query("".join(query_lines))

query_results = query["results"]["bindings"]  # a list of result rows when the query succeeds

Maybe a try/catch on the part where I'm running the query? I'm just not sure how to go about dealing with a lack of returned result from execute_sparql_query.

Thanks again :)


andrewtavis commented Jan 13, 2022

I guess that I could try/catch response.raise_for_status() in a fork of WikidataIntegrator for my own purposes, but then this might be awkward later on when hopefully others will be contributing and adding data to Scribe (one can hope 😊).


LeMyst commented Jan 13, 2022

One thing missing in your code is a check for whether query is None (which occurs when max_tries is reached).
You can also move on to the next iteration of your for loop when the exception occurs.

with open(query_path) as file:  # query_path leads to a .sparql file
    query_lines = file.readlines()

try:
    query = wdi_core.WDFunctionsEngine.execute_sparql_query("".join(query_lines))
except HTTPError as err:  # from requests.exceptions import HTTPError
    print(f'HTTPError with {query_name}: {err}')
    continue  # inside the for loop over query files

if query:
    query_results = query["results"]["bindings"]  # a list of result rows when the query succeeds
    ...
else:
    print(f'Nothing returned by the SPARQL server for {query_name}')
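
To see the control flow end to end without hitting the real endpoint, here is a self-contained sketch where run_query is a hypothetical stub standing in for execute_sparql_query (names like queries and the sample values are placeholders, not part of WikidataIntegrator):

```python
from requests.exceptions import HTTPError

def run_query(sparql):
    # Hypothetical stub for wdi_core.WDFunctionsEngine.execute_sparql_query:
    # raises HTTPError for a failing query, returns None when max_tries is hit.
    if sparql == "broken":
        raise HTTPError("500 Server Error")
    if sparql == "timed_out":
        return None
    return {"results": {"bindings": [{"lexeme": {"value": sparql}}]}}

results, failed, empty = {}, [], []
queries = {"nouns": "SELECT ?lexeme ...", "bad_filter": "broken", "slow": "timed_out"}

for query_name, sparql in queries.items():
    try:
        query = run_query(sparql)
    except HTTPError as err:
        print(f"HTTPError with {query_name}: {err}")
        failed.append(query_name)
        continue  # move on to the next query

    if query:
        results[query_name] = query["results"]["bindings"]
    else:
        print(f"Nothing returned by the SPARQL server for {query_name}")
        empty.append(query_name)
```

Only bad_filter raises here; slow illustrates the None case after the retries run out.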

Can you give me the HTTP error code you get from raise_for_status()? I think WikidataIntegrator misses some frequent codes, and should have something like this instead of just 503:
if response.status_code in (500, 502, 503, 504):

I hope I correctly understood your issue.


andrewtavis commented Jan 15, 2022

Thanks for your further help!

The issue was that the SPARQL itself was malformed (apologies for not checking more rigorously), but in this case except HTTPError as err: did not trigger and the function still hung without a result. The SPARQL was wrong because I was filtering on a variable I'd renamed everywhere except in the filter itself. The WDQS result from this is Query timeout limit reached. Explicitly:

  • Say you have this query of Swedish nouns and pronouns
  • If you go to the optional FILTER NOT EXISTS and change lexeme to just l, the query will break with the result Query timeout limit reached
  • If this broken query is passed to WikidataIntegrator's execute_sparql_query, there will be no HTTP error, and the call will just hang until a keyboard interrupt or other cancellation (there will be a message printed to the terminal, though: Backing off 1.0 seconds afters 1 tries calling function with args followed by the query)

Would it maybe be an improvement if WikidataIntegrator could handle these kinds of issues? Not sure what your thoughts are on this.


LeMyst commented Jan 15, 2022

Ok, I can reproduce your issue.

To explain: because error code 500 (a query timeout on the server) is not caught by execute_sparql_query(), the method automatically falls back to backoff/wdi_backoff because of the raise_for_status(). That's why you can't catch it.

By default, wdi_backoff retries an infinite number of times; you can change that by adding these lines at the top of your script:

from wikidataintegrator.wdi_config import config as wdi_config

wdi_config['BACKOFF_MAX_TRIES'] = 2
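
To illustrate what the setting does, here is a simplified stand-in for the retry loop (this is not wdi_backoff's actual implementation, just a sketch of the max-tries semantics):

```python
import time

def with_backoff(func, max_tries, base_wait=1.0):
    # Simplified stand-in for backoff/wdi_backoff: retry func until it
    # succeeds or max_tries attempts have been made, doubling the wait.
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_tries:
                raise  # out of tries: the exception finally reaches the caller
            wait = base_wait * 2 ** (attempt - 1)
            print(f"Backing off {wait} seconds after {attempt} tries")
            time.sleep(wait)

attempts = []

def always_times_out():
    attempts.append(1)
    raise RuntimeError("Query timeout limit reached")

try:
    with_backoff(always_times_out, max_tries=2, base_wait=0)
except RuntimeError:
    print(f"gave up after {len(attempts)} tries")
```

With max_tries set to 1, the very first failure propagates immediately, which is why lowering BACKOFF_MAX_TRIES makes the error catchable instead of retrying forever.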

@andrewtavis

Wonderful :) I just changed it to wdi_config['BACKOFF_MAX_TRIES'] = 1 as that works for my purposes.

For documentation, the import for HTTPError is from requests.exceptions import HTTPError (there are a couple of libraries with an HTTPError). The exception does surface the 500 error code, and I've figured out how to get the rest running from there :)

Thanks so much for your help!
