urllib sets wrong Content-Length for pseudo files on Linux #93296

Closed
illia-v opened this issue May 27, 2022 · 5 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@illia-v
Contributor

illia-v commented May 27, 2022

Bug report

The value of the Content-length header set by urllib.request.FileHandler.open_local_file may not match the length of the data actually returned on Linux.

This happens when a file from a special file system (e.g., procfs or sysfs) is requested.

open_local_file relies on st_size, which does not reflect the actual content length of pseudo files on Linux (for example, it is zero for /proc/cpuinfo).

cpython/Lib/urllib/request.py

Lines 1506 to 1511 in 8a0d9a6

size = stats.st_size
modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
mtype = mimetypes.guess_type(filename)[0]
headers = email.message_from_string(
'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
(mtype or 'text/plain', size, modified))
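
The mismatch can be observed without urllib at all. The following is a minimal sketch, assuming only the standard library; the helper name stat_size_vs_read_size is hypothetical and is not part of CPython. It compares the size reported by os.stat with the number of bytes actually read.

```python
import os


def stat_size_vs_read_size(path):
    """Return (st_size, number of bytes actually read) for a local file."""
    reported = os.stat(path).st_size
    with open(path, "rb") as f:
        actual = len(f.read())
    return reported, actual


if __name__ == "__main__":
    # /proc/cpuinfo only exists on Linux; skip quietly elsewhere.
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        print(stat_size_vs_read_size(path))
```

For a regular file the two numbers agree; for a procfs entry such as /proc/cpuinfo the first is expected to be zero while the second is not.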

Example

>>> import urllib.request
>>> url = "file:///proc/cpuinfo"
>>> handler = urllib.request.FileHandler()
>>> response = handler.file_open(urllib.request.Request(url))
>>> data = response.read()
>>> headers = response.info()
>>> assert int(headers["Content-length"]) == len(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> print(headers["Content-length"])
0
>>> print(len(data))
18294

Your environment

  • CPython versions tested on: 3.12.0 alpha 0
  • Operating system and architecture: Linux

illia-v added the type-bug label May 27, 2022
illia-v changed the title from "urllib.request.FileHandler.open_local_file sets Content-length: 0 for non-empty pseudo files" to "urllib sets wrong Content-Length for pseudo files on Linux" May 27, 2022
AA-Turner added the stdlib label May 27, 2022
illia-v added a commit to illia-v/cpython that referenced this issue May 27, 2022

@carlbordum
Contributor

I'm not sure this really is a bug. What would you expect?

@illia-v
Contributor Author

illia-v commented Jun 1, 2022

> I'm not sure this really is a bug. What would you expect?

I would expect the value of Content-Length to equal the size of the HTTP message body, regardless of the peculiarities of the file in this case.
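
One way to get that behavior is to measure the data instead of trusting st_size. This is only a sketch under assumptions: read_with_measured_length is a hypothetical helper, not urllib's implementation and not the change proposed in the referenced commit. It reuses the header format quoted above but fills in Content-length from the bytes actually read.

```python
import email
import email.utils
import mimetypes
import os


def read_with_measured_length(path):
    """Hypothetical helper, not part of urllib: read the whole file first,
    then derive Content-length from the bytes actually read."""
    with open(path, "rb") as f:
        data = f.read()
        stats = os.fstat(f.fileno())
    modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
    mtype = mimetypes.guess_type(path)[0]
    headers = email.message_from_string(
        "Content-type: %s\nContent-length: %d\nLast-modified: %s\n"
        % (mtype or "text/plain", len(data), modified)
    )
    return headers, data
```

The trade-off is that the whole file is held in memory before any header is produced, which the st_size approach avoids.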

@serhiy-storchaka
Member

curl returns the same result:

$ curl --dump-header /dev/stdout file:///proc/cpuinfo
Content-Length: 0
Accept-ranges: bytes
Last-Modified: Fri, 16 Feb 2024 14:54:57 GMT

processor       : 0
...

Header Content-Length: 0 followed by non-empty body.

@encukou
Member

encukou commented Mar 26, 2024

AFAIK, a similar discrepancy can happen if the file changes while it's being prepared/sent.
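
A small sketch of that race, assuming a regular temporary file and a simulated second writer (stale_size_demo is a hypothetical name used only for illustration): the size is taken first, the file grows, and the later read returns more bytes than the recorded size.

```python
import os
import tempfile


def stale_size_demo():
    """Stat a file, let it grow, then read it: the earlier size is stale."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "growing.txt")
        with open(path, "wb") as f:
            f.write(b"first\n")
        # This is the value a Content-length header would be built from.
        size_at_stat = os.stat(path).st_size
        # Simulate another process appending before the body is read.
        with open(path, "ab") as f:
            f.write(b"second\n")
        with open(path, "rb") as f:
            data = f.read()
    return size_at_stat, len(data)
```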

urllib asks the system for size, mtime and contents, and gives them to you.
I don't think it's urllib's job to second-guess the system.

The situation is the same as when an HTTP server sends a mismatched Content-Length, except here it's the kernel, not a server. It's up to you to handle it.
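
Handling it on the caller's side could look like the sketch below. It assumes the caller reads the body to EOF and treats the header as advisory; read_file_url is a hypothetical helper name, while urllib.request.urlopen and the Content-length header are the ones discussed in this issue.

```python
import urllib.request


def read_file_url(url):
    """Return (data, declared_length, matches) for a file: URL.

    The body is read to EOF and its real length is what callers should
    trust; the header is only reported for comparison.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()
        declared = response.headers.get("Content-length")
    declared_length = int(declared) if declared is not None else None
    return data, declared_length, declared_length == len(data)
```

With file:///proc/cpuinfo on Linux, the third element is expected to be False, matching the report above.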

IMO, this is not an issue. If there are no objections, I'll close in a month.

@illia-v
Contributor Author

illia-v commented Mar 27, 2024

> curl returns the same result:
>
> $ curl --dump-header /dev/stdout file:///proc/cpuinfo
> Content-Length: 0
> Accept-ranges: bytes
> Last-Modified: Fri, 16 Feb 2024 14:54:57 GMT
>
> processor       : 0
> ...
>
> Header Content-Length: 0 followed by non-empty body.

This is interesting: curl used to return no content in this case, but that was changed in 2016; see curl/curl#681 (comment).

> The situation is the same as when an HTTP server sends a mismatched Content-Length, except here it's the kernel, not a server. It's up to you to handle it.
>
> IMO, this is not an issue. If there are no objections, I'll close in a month.

As a user I'd expect a server to handle known cases, but I don't insist on keeping the issue open.

serhiy-storchaka closed this as not planned Mar 27, 2024