Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent unnecessary requests when accessing fields that are not set #106

Closed
rensbaardman opened this issue Jun 14, 2022 · 5 comments · Fixed by #113
Closed

Prevent unnecessary requests when accessing fields that are not set #106

rensbaardman opened this issue Jun 14, 2022 · 5 comments · Fixed by #113

Comments

@rensbaardman
Copy link

rensbaardman commented Jun 14, 2022

When accessing fields which are not set in the Discogs database (such as the real_name for an artist which is not a person), this can lead to an unnecessary extra request.

Take this example, where we first access the name, triggering the first request. Then we access the real_name (which is not set for The Beatles, since it is not a person), triggering the second request to the same endpoint.

import discogs_client
import logging
logging.basicConfig(level=logging.DEBUG)

client = discogs_client.Client('ExampleApplication/0.1')
artist = client.artist(82730)
print(f'{artist.name=}')
print(f'{artist.real_name=}')

This results in the following:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.discogs.com:443
DEBUG:urllib3.connectionpool:https://api.discogs.com:443 "GET /artists/82730 HTTP/1.1" 200 None
artist.name='The Beatles'
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.discogs.com:443
DEBUG:urllib3.connectionpool:https://api.discogs.com:443 "GET /artists/82730 HTTP/1.1" 200 None
artist.real_name=None

I think the problem lies in the fact that the Discogs API does not return a value (e.g. None) when it is not set. And discogs_client only sets/updates a value when it is explicitly returned by the API.

I can see two solutions:

  • specifying which fields can be returned when fetching from a specific endpoint, and then setting them to None (or similar) when they are not returned
  • cache the response of the first request
@rensbaardman rensbaardman changed the title Prevent unnecessary requests when accessing field that are not set Prevent unnecessary requests when accessing fields that are not set Jun 14, 2022
@AnssiAhola
Copy link
Contributor

Hi @rensbaardman

Thanks for filing an issue for this.

The current implementation checks for the given key/property in "cached" data from the first request.
If it isn't found, the client tries to refresh and fetch the data again using the resource url.
If STILL it isn't found then that key/property will be marked as a "known invalid key".

def fetch(self, key, default=None):
if key in self._known_invalid_keys:
return default
try:
# First, look in the cache of pending changes
return self.changes[key]
except KeyError:
pass
try:
# Next, look in the potentially incomplete local cache
return self.data[key]
except KeyError:
pass
# Now refresh the object from its resource_url.
# The key might exist but not be in our cache.
self.refresh()
try:
return self.data[key]
except:
self._known_invalid_keys.append(key)
return default

One solution might be that we check if the original endpoint matches the resource url of that object, then we can assume that refreshing wont suddenly populate that field and just return Null/default value, thus preventing the extra/unnecessary request.
Problem with this is to figure out how/where to check for the original endpoint.

I can look into this at some point this week. That is if you don't want to take a crack at it?

Cheers!
-Anssi

@rensbaardman
Copy link
Author

Hey @AnssiAhola, thanks for the explanation and quick reply.

I already checked out the code to see if I could fix it, but I don't feel comfortable enough yet with the code base to do it myself. I do believe your proposed solution would solve the problem.

By the way, I really like the library! It is a joy to work with, especially with the automatic rate limiting, and the smart 'lazy-fetching'. It has proved very useful so far, so thank you very much!

@AnssiAhola AnssiAhola linked a pull request Jun 29, 2022 that will close this issue
@JOJ0
Copy link
Contributor

JOJ0 commented Aug 4, 2022

@rensbaardman have a look at the linked PR. @AnssiAhola might have improved the request behaviour. It would help us a lot if you find the time to test the branch. If you need assistance with installing from that feature branch I can help. Thanks and all the best!

@rensbaardman
Copy link
Author

rensbaardman commented Aug 15, 2022

Thanks for taking the time to implement a solution! This solves the problem I originally had, so that is great.

Unfortunately, I think the implementation has a bug: since a new request will look at the previous_request from the client, this means that recreating the same artist won't get the correct data:

import discogs_client
import logging
logging.basicConfig(level=logging.DEBUG)

client = discogs_client.Client('ExampleApplication/0.1')

artist = client.artist(82730)
print(f'{artist.name=}')

same_artist = client.artist(82730)
print(f'{same_artist.name=}')

returns

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.discogs.com:443
DEBUG:urllib3.connectionpool:https://api.discogs.com:443 "GET /artists/82730 HTTP/1.1" 200 1789
artist.name='The Beatles'
same_artist.name=None

As you see, the second time we try to fetch the artist, this won't trigger a new download since the client.previous_request already points to this artists endpoint.

I think changing self.previous_request = client.previous_request to self.previous_request = None in the __init__ should fix this:

class PrimaryAPIObject(APIObject):
"""A first-order API object that has a canonical endpoint of its own."""
def __init__(self, client, dict_):
self.data = dict_
self.client = client
self._known_invalid_keys = []
self.changes = {}
self.previous_request = client.previous_request

And I believe saving the previous_request in client has no purpose then, so it is probably better to remove that part in client.

@AnssiAhola
Copy link
Contributor

Good catch! Fixed it and removed the previous_request from client.
Also added a simple test to prevent stuff like this from slipping through in the future 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants