-
Notifications
You must be signed in to change notification settings - Fork 67
Add close method to ROOTDictionary. Fixes #438 #439
Conversation
|
Yes 🤦♂ thanks 😉 |
In doing this, we must remove because otherwise, the following would cause a segfault: with uproot.open("...") as f:
t = f["sometree"]
t.arrays(...) All the other sources should have good "close" behavior. Somebody needs to review this and make sure it's sane. I just looked now and I can't find the HTTP Somebody needs to "own" the uproot sources. I haven't been "owning" them sufficiently: I've been letting you guys (@tamasgal, @nsmith-, @chrisburr, @PerilousApricot) make changes that benefit downstream applications, because after all, that's what uproot exists for, but without some centralized coordination, what's good for one application might cause harm to another application. It doesn't help that I am not as knowledgeable about HTTP and XRootD as you are. Does anybody have a suggestion about how to make sure the "sources" part of uproot gets pwned? |
Hmm, I am of course aware of the segfault situation, however I usually do not even try things like this, I mean accessing objects created inside a context where memory mapping may happen in the "outer scope". It technically isn't another scope but lazy data will inevitably cause problems. In such cases I usually grab the data inside the context in a way that I know that I am not depending on the resources which are about to be closed. I think transferring the ownership is one way out of this, but it still feels unnatural to me since I would assume that my file is closed outside of the Maybe it's just me 🦆 |
Educating users about proper context management is okay, but only if it happens via an exception with a good error message, not a segfault. The problem with segfaults is that they could mean a wide variety of things, and the general contract with Python is that segfaults simply shouldn't happen—users shouldn't have to debug them, ever. (The only cases I know of that allow segfaults in Python go through NumPy, the standard library So there's another alternative: so if this method can check for a closed file (I don't know if it has a flag, but we can add one), it can raise an exception if the file has been closed, narrowly escaping the segfault. It should also be relatively lightweight, as nearly all file-reading goes through this The issue of setting good, consistent policies for all uproot sources (i.e. "owning" them) is a bigger issue, though. |
Yes, I fully agree that we should avoid segfaults by any means, don't get me wrong! I am not satisfied with my PR either, but I hope we figure out some nice way to get this in since calling the GC manually also feels awkward 😉 |
I pushed another commit which avoids the segfault. Maybe this goes into the right direction? |
Here is it in action:
|
Sorry for the spam, I just read your comment about the |
A property also works; it also protects against users who dig down to the There's a tiny question about performance because this is hit so often, but I'm willing to forego that. Anyway, I don't want to lose the thread about ownership of uproot sources, since they're tweaked so often like this. Maybe this PR isn't the best place to do that—maybe an issue, though "ownership" is not a very concrete thing, unlike a bug. |
I think the ownership is a separate issue since it would only shift the memory problem to another place. I also tried to do |
Well to transfer ownership, it's worth thinking again about the size of the API. I'd re-iterate here that having a almost-trivial buffer interface as the extent of the source API seems like the best plan, where the uproot code would have to find some other mechanism to supply optimization hints to the concrete Source. Regardless, let's make an issue to discuss the Source thing. |
Yes, let's move the question of ownership to an issue. Before we do that, however, I think there's a (very understandable!) confusion about what I meant: not "which object in the code owns references to the data" but "which human being is responsible for making sure that reasonable-looking changes to one thing don't break another thing." I brought it up here because the segfault thing would have been a disaster, and I don't know what other disasters lurk. Another example was that I couldn't even find the All in all, it's not a lot of code: > cd uproot/source
> wc source.py memmap.py chunked.py file.py http.py xrootd.py
46 114 1144 source.py
49 112 1266 memmap.py
150 477 5158 chunked.py
50 148 1751 file.py
59 188 2172 http.py
109 344 4189 xrootd.py
463 1383 15680 total But let's move that to an issue and just make this PR about |
Ok now I see your point. I fully agree with the segfault potentially causing a disaster. ...but yes, the responsibility of a low-level library architect is a quite hot topic... |
Btw. where should I put the tests? I am a bit unsure... |
If you have tests, they wouldn't be too out of place in That said, I don't see much need for a test for something like this, especially now that the |
Alright, let me just add one test for the exception, I think that should be ok. |
When working with multiple files, the GC is taking too long to release some big memory chunks which are created by the
ROOTDictionary
sources due to cyclic reference (see #438). A manual call togc.collect()
is a workaround, however closing thesource
of theROOTDictionary
also triggers the clean-up reliable.This pull request adds the
close()
method toROOTDictionary
which is also called in the__exit__
method when used with a context manager.