glob empty return when ulimit -n is reached #103501

Open
Uinelj opened this issue Apr 13, 2023 · 5 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@Uinelj
Uinelj commented Apr 13, 2023

Bug report

When many files are open, glob can return an empty list instead of either raising an error or returning the actual matches.

With a ulimit set at 256 (ulimit -n 256), and with a file named foo:

from glob import glob

print(glob("*"))  # as expected, prints the files in the current folder
handles = [(x for x in open("foo")) for _ in range(253)]  # open 253 file handles on foo
print(len(handles))  # prints 253
print(glob("*"))  # prints []

My intuition is that glob should inform the user that it couldn't do its job properly (just as open raises OSError if we try to open too many files), because right now it is ambiguous whether a given folder is empty or the ulimit -n limit has been reached.

An OSError is caught in glob.py, but not propagated: https://github.com/python/cpython/blob/3.11/Lib/glob.py#L172
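
The pattern at issue can be sketched as follows. This is a simplified illustration rather than the actual glob.py source, and `list_names_swallowing` is a hypothetical name: any OSError raised while opening or reading the directory ends the generator quietly, so the caller cannot tell a failure from an empty directory.

```python
import os


def list_names_swallowing(dirname):
    """Yield entry names in dirname, treating any OSError as 'no entries'."""
    try:
        with os.scandir(dirname) as it:
            for entry in it:
                yield entry.name
    except OSError:
        # The error (ENOENT, EACCES, EMFILE, ...) is dropped here,
        # so the generator simply ends without yielding anything.
        return
```

Calling `list(list_names_swallowing(path))` on a directory that cannot be listed gives the same result as calling it on an empty directory.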

Your environment

  • CPython versions tested on: 3.10, 3.11
  • Operating system and architecture: macOS 12.0.1 (for 3.10), Debian 11 (for 3.11, from docker's python:latest), both x86_64

@Uinelj Uinelj added the type-bug An unexpected behavior, bug, or error label Apr 13, 2023
@Uinelj Uinelj changed the title glob empty return when ulimit is reached glob empty return when ulimit -n is reached Apr 13, 2023
aisk added a commit to aisk/cpython that referenced this issue Oct 19, 2023
aisk added a commit to aisk/cpython that referenced this issue Nov 6, 2023
@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 25, 2023
aisk added a commit to aisk/cpython that referenced this issue Jan 11, 2024
@serhiy-storchaka
Member

How does pathname expansion in the Unix shell behave in this case? Errors such as broken symlinks, permission denied, and overly long paths are ignored. Maybe all errors are ignored, and glob.glob() conforms to this.

Maybe in the future we will add an option to make glob.glob() not ignore all OS errors, or even let the caller specify an error handler that ignores only some errors.

@encukou
Member

encukou commented Mar 12, 2024

Maybe in the future we will add an option to make glob.glob() not ignore all OS errors

This might be a good use case for ExceptionGroup :)

The current behaviour is that OSErrors are ignored. See recent discussions in e.g. #104292 and (for pathlib's glob) #104141.

But EMFILE is a bit different than permission issues or disk failures: it's transient, it'll go away if you close a few files. Perhaps it is worth it to treat it specially.
@barneygale, do you have an opinion here? (I assume pathlib's behaviour should match glob's)

@serhiy-storchaka
Member

It is not a good case for ExceptionGroup. There might be many thousands of files and directories in the tree.

There are three strategies to handle errors:

  • Fail. Raise an exception to signal the error and stop the whole process.
  • Ignore. In the best case the errors can be logged or saved for further analysis.
  • Try to fix and repeat the failed operation.

The "fail" and "ignore" strategies are supported in other complex operations (like os.walk(), shutil.rmtree(), shutil.copytree()) and are controlled by a user callback or just a boolean flag. But the "retry" strategy is more complex and is not well supported in the current stdlib code. For example, to fix EMFILE in glob() the user needs to close some other file descriptors, but the glob() code cannot know which file descriptors can be closed or when they are closed. There is also the question of which part of the code should be repeated after the fix.
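
For illustration, this is roughly how the "fail" and "ignore" strategies look with the existing `onerror` callback of os.walk(). The helper names `collect_errors` and `fail_fast` are made up for this sketch; only `os.walk(top, onerror=...)` is stdlib API.

```python
import os


def collect_errors(top):
    """'Ignore' strategy: keep walking, but save errors for later analysis."""
    errors = []
    names = []
    # os.walk() calls onerror with the OSError instance and then continues.
    for dirpath, dirnames, filenames in os.walk(top, onerror=errors.append):
        names.extend(os.path.join(dirpath, f) for f in filenames)
    return names, errors


def fail_fast(top):
    """'Fail' strategy: re-raise the first error, which aborts the walk."""
    def reraise(exc):
        raise exc

    names = []
    for dirpath, dirnames, filenames in os.walk(top, onerror=reraise):
        names.extend(os.path.join(dirpath, f) for f in filenames)
    return names
```

Neither variant retries anything: the callback is only told that listing a directory failed, which matches the point above that "retry" has no ready-made support.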

@barneygale
Contributor

FWIW, pathlib calls stat() on the top-level path and suppresses some (but not all) OSErrors. Any errors from deeper paths are totally suppressed.

@encukou
Member

encukou commented Mar 13, 2024

Right. ExceptionGroup only works for “ignore” (for logging the errors).

The 3 strategies can be mixed. Thinking about it, in most of my uses of glob, I'm OK with ignoring PermissionError (not listing inaccessible files), but would rather fail on EMFILE (where I'm losing info about files that are normally accessible).
And that's what's reported in this issue. The trouble is that if we go that way, we'd be expected to have a good default for all errors on all platforms, which probably isn't feasible. The user callback seems better -- even if it doesn't support retrying.
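
A rough sketch of what such a user callback could look like, built on os.walk() and fnmatch rather than on glob itself. Both `glob_with_onerror` and `fail_on_emfile` are hypothetical names; glob.glob() has no `onerror` parameter today, and this helper only matches file names, not full glob patterns.

```python
import errno
import fnmatch
import os


def fail_on_emfile(exc):
    """Example policy: fail when out of file descriptors, ignore everything else."""
    if exc.errno == errno.EMFILE:
        raise exc  # normally accessible files would be silently lost
    # Other errors (e.g. PermissionError) are ignored, as glob does today.


def glob_with_onerror(top, pattern, onerror):
    """Hypothetical glob-like helper: match file names under top and report
    directory-listing errors to the onerror callback."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top, onerror=onerror):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return matches
```

The policy lives entirely in the callback, so the library would not need a good default for every error on every platform; callers who want today's behaviour could pass a callback that does nothing.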
