JSON API package queries returning empty response "randomly" #10387

Closed
j-martin opened this issue Nov 18, 2021 · 14 comments
@j-martin

j-martin commented Nov 18, 2021

Describe the bug
When querying the JSON API, the body of the response is sometimes empty (as in b''). Querying the same endpoint/package again afterwards returns a proper JSON response.

Looks like a caching issue on PyPI's side.

I have submitted this PR to poetry to work around the issue.

Got empty response from PyPI for pypi/cffi/json. Retrying...
Got empty response from PyPI for pypi/charset-normalizer/json. Retrying...

Maybe pure pip is better at handling these as poetry queries those endpoints before invoking pip for the actual installation.
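
The retry-on-empty-body workaround hinted at by the log lines above can be sketched roughly as follows. This is a minimal illustration, not the code from the poetry PR: `get_json_with_retry` and its `fetch` callable are hypothetical names, and the HTTP call is abstracted away so the retry logic can be read on its own.

```python
import json
import time
from typing import Callable


def get_json_with_retry(
    fetch: Callable[[], bytes], max_attempts: int = 3, delay: float = 0.0
) -> dict:
    """Call fetch() until it returns a non-empty body, then parse it as JSON.

    `fetch` is a hypothetical stand-in for whatever performs the HTTP GET
    and returns the raw response body as bytes.
    """
    for attempt in range(1, max_attempts + 1):
        body = fetch()
        if body:
            # Non-empty body: hand it to the JSON parser.
            return json.loads(body)
        if attempt < max_attempts:
            # Empty body (b''): wait briefly and try again.
            time.sleep(delay)
    raise RuntimeError(f"empty response after {max_attempts} attempts")
```

In practice `fetch` could be something like `lambda: session.get(url).content`; a small `delay` avoids hammering the CDN when the empty body persists for a few requests.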

Expected behavior

The API should return a valid JSON response the first time it is queried.

To Reproduce

  1. Clear local pip cache
  2. poetry add <a-package-with-a-lot-of-dependencies>
  3. Hope (or not) that one of the API responses for a package returns b''

My Platform
macOS Big Sur and Monterey Intel python 3.9.7
macOS Monterey arm64 python 3.9.7

Additional context

@di
Member

di commented Nov 18, 2021

@pradyunsg and @uranusjr, has pip experienced anything similar here?

@di
Member

di commented Nov 18, 2021

@j-martin, any way you can provide us with the complete HTTP response, including headers?

@j-martin
Author

@di I have added some extra logging to my local poetry install, but I could not get the issue to happen again. Most likely because another coworker hit (and resolved after a few retries) the same issue before me.

I'll keep the extra logging in, and hopefully, it repeats itself at some point.

@uranusjr
Contributor

This is the first time I’ve ever heard of this happening.

@pradyunsg
Contributor

Well, pip does not use the JSON endpoint for PyPI (since that’s not standardised and doesn’t have any tamper-checking possibilities). There’s no way any pip user would notice this.

@kapilt

kapilt commented Nov 21, 2021

here's some headers i grabbed while pdb'ing into poetry when this issue occurred and trying out a patch to auto retry
python-poetry/poetry#4717 (comment)

   4  ~/.local/share/pypoetry/venv/lib/python3.9/site-packages/poetry/repositories/pypi_repository.py:326 in _get
       324│
       325│         import pdb; pdb.set_trace()
     → 326│         json_data = json_response.json()
       327│
       328│         return json_data
(Pdb) json_response.content
b''
(Pdb) pp dict(json_response.headers)
{'Accept-Ranges': 'bytes',
 'Access-Control-Allow-Headers': 'Content-Type, If-Match, If-Modified-Since, '
                                 'If-None-Match, If-Unmodified-Since',
 'Access-Control-Allow-Methods': 'GET',
 'Access-Control-Allow-Origin': '*',
 'Access-Control-Expose-Headers': 'X-PyPI-Last-Serial',
 'Access-Control-Max-Age': '86400',
 'Cache-Control': 'max-age=900, public',
 'Connection': 'keep-alive',
 'Content-Encoding': 'gzip',
 'Content-Length': '9341',
 'Content-Security-Policy': "base-uri 'self'; block-all-mixed-content; "
                            "connect-src 'self' https://api.github.com/repos/ "
                            '*.fastly-insights.com sentry.io '
                            'https://api.pwnedpasswords.com '
                            'https://2p66nmmycsj3.statuspage.io; default-src '
                            "'none'; font-src 'self' fonts.gstatic.com; "
                            "form-action 'self'; frame-ancestors 'none'; "
                            "frame-src 'none'; img-src 'self' "
                            'https://warehouse-camo.ingress.cmh1.psfhosted.org/ '
                            'www.google-analytics.com *.fastly-insights.com; '
                            "script-src 'self' www.googletagmanager.com "
                            'www.google-analytics.com *.fastly-insights.com '
                            "https://cdn.ravenjs.com; style-src 'self' "
                            'fonts.googleapis.com; worker-src '
                            '*.fastly-insights.com',
 'Content-Type': 'application/json',
 'Date': 'Sun, 07 Nov 2021 15:11:35 GMT',
 'ETag': '"JuWbHOCwMq+jjOgbucA39g"',
 'Referrer-Policy': 'origin-when-cross-origin',
 'Server': 'nginx/1.13.9',
 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload',
 'Vary': 'Accept-Encoding',
 'X-Cache': 'HIT',
 'X-Cache-Hits': '1',
 'X-Content-Type-Options': 'nosniff',
 'X-Frame-Options': 'deny',
 'X-Permitted-Cross-Domain-Policies': 'none',
 'X-PyPI-Last-Serial': '8237314',
 'X-Served-By': 'cache-wdc5550-WDC',
 'X-Timer': 'S1636297896.913867,VS0,VE1',
 'X-XSS-Protection': '1; mode=block'}

(Pdb) json_response.request
<PreparedRequest [GET]>
(Pdb) json_response.request.__dict__
{'method': 'GET', 'url': 'https://pypi.org/pypi/cached-property/json', 'headers': {'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'If-None-Match': '"JuWbHOCwMq+jjOgbucA39g"'}, '_cookies': <RequestsCookieJar[]>, 'body': None, 'hooks': {'response': []}, '_body_position': None}

i can reproduce this with some regularity (ubuntu 20.04, python 3.9.4, poetry 1.1.11), happy to provide additional context if helpful or try workarounds.
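
The request dump above carries an `If-None-Match` header whose value equals the response `ETag`. A rough sketch for replaying that kind of conditional request with only the standard library follows. `build_conditional_request` and `probe` are hypothetical helper names, not part of poetry or requests. A server that honours the validator would normally answer 304 Not Modified, which `urllib` surfaces as an `HTTPError`.

```python
import urllib.error
import urllib.request


def build_conditional_request(url: str, etag: str) -> urllib.request.Request:
    """Build a GET request that carries the given ETag as If-None-Match.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(url, headers={"If-None-Match": etag})


def probe(url: str, etag: str) -> tuple:
    """Send the conditional request and return (status code, body length)."""
    request = build_conditional_request(url, etag)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx statuses, including 304 Not Modified.
        return err.code, len(err.read())
```

Calling `probe("https://pypi.org/pypi/cached-property/json", '"JuWbHOCwMq+jjOgbucA39g"')` repeatedly and logging the result would show whether a 200 with a zero-length body ever comes back. The ETag is the one captured above and may no longer match.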

@di
Member

di commented Nov 21, 2021

'Content-Length': '9341',

This indicates that our CDN is expecting the response to have a body, so either this is a bug with Fastly or something Poetry-specific.

I'd suggest attempting to remove all caching in Poetry and see if that alleviates the issue.
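
The mismatch noted above (a positive `Content-Length` alongside an empty body) can be checked mechanically. A minimal sketch, assuming the headers are available as a plain mapping and the body as bytes; `body_is_missing` is a hypothetical name:

```python
def body_is_missing(headers: dict, content: bytes) -> bool:
    """Return True when Content-Length promises a body but none arrived.

    Hypothetical helper: header names are lower-cased first so the check
    also works with a plain, case-sensitive dict.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    declared = normalized.get("content-length")
    if declared is None:
        return False
    try:
        expected = int(declared)
    except ValueError:
        # Malformed header value: nothing reliable to compare against.
        return False
    return expected > 0 and len(content) == 0
```

With the values from the dump, `body_is_missing({"Content-Length": "9341"}, b"")` is true, which is the condition a client could log or retry on.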

@j-martin
Author

@di Looks like CacheControl is used here

I'll see if disabling it changes anything.

@kapilt

kapilt commented Nov 22, 2021

fwiw, I confirmed that when CacheControl and its caching headers are disabled, we don't see any response issues.

note the original solution here, to just keep retrying till pypi/cdn returned content with a body, also worked. afaics looking at the poetry code, the use of cache control is tied to its ability to use a disk cache of packages, so it's non-trivial to remove without affecting user experience.

[update] actually, looking at the disk cache structure, it's just the json response caching here; the package cache itself is managed separately.

@j-martin
Author

Thanks for adding more detail @kapilt. I'll close this issue as it is not caused by pypi.

@kapilt

kapilt commented Nov 24, 2021

@j-martin let's keep this open, afaics the issue is actually in pypi infrastructure, adding standard http caching headers to a request should not result in randomly broken/empty responses.

ie, poetry should be able to use http caching headers without getting empty responses back from pypi infrastructure. potentially it's an issue with CacheControl, but that doesn't seem likely given that simply retrying works, the widespread usage of the package (300k+ daily downloads, used by pip, etc), and no known issues there wrt this.

disabling cache control in poetry is a work around / hack afaics.

@j-martin
Author

I see. I thought the issue came from poetry and not from pypi, but I think you are right: it's more about how fastly responds to the request.

j-martin reopened this Nov 24, 2021

@di
Member

di commented Feb 18, 2022

I'm going to close this because we haven't gotten any reports about this other than the Poetry users in this issue, so I strongly suspect this is Poetry-specific, but feel free to reopen if we have evidence otherwise.

di closed this as completed Feb 18, 2022

@kapilt

kapilt commented Feb 22, 2022

fwiw, i haven't actually hit this in a while. i think the actual issue was solved on the pypi cdn side, perhaps directly on the fastly side.
