-
-
Notifications
You must be signed in to change notification settings - Fork 704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support coalesced fsevents
#734
Support coalesced fsevents
#734
Conversation
So that we can effectively decide if we need to perform additional system calls to figure out what really happened.
@CCP-Aporia, the approach looks good to me. I probably would have implemented this on the Python side. But maybe that's just because I am more comfortable with Python... One more type of coalescing which I have noticed is with I tried to address this in #728 by prioritising created / deleted over modified but realise now that it's probably the wrong approach. Flagging those events as coalesced and emitting multiple events would be better. |
This is unavoidable and already was an issue with the previous implementation (snapshot diffs triggered by FSEvents). |
It's more pythonic, and the `_event_type` implementation wasn't quite usable anyway. NB: the representation is not truly copy/paste python code if there is a double quote inside event.path, but that should be a rare case so we don't add the expensive special case handling there.
Some Python debuggers create additional threads, so we shouldn't assume that there is only one.
It might be even better to check for the emitter class, as opposed to platform
Reduce the amount of 'silent breakage'
Just updated with a new snapshot of the current effort. There are some |
There is potentially a way around this race, but it's not exactly elegant. From within |
If this works, it would do away with unnecessarily long wait times and flaky retry logic. |
Ah yes, of course, a sentinel event is a great idea, and works perfectly. :-) It also means we don't have to expect any events for filesystem operations before we call |
So that we can avoid a race between test setup and fseventsd
Sorry guys, I did the 1.0.2 release. I was trying to add wheels to the current release and it seems the master branch was used ... So let's rebase your work on master. |
Hi @CCP-Aporia, are you still working on this? If yes, I've just had a look at the failing tests and left a few comments inline. |
I've also noticed that for some reason the sentinel event does not register when just using |
Regarding the handling of coalesced events, I can see your inline comment:
Are you proposing to manually cache some / all past events? Say we have created a new watch and a file is rapidly created and deleted -- there would be no way to determine the order of events because we don't have any history yet. Of course, we could start off with a directory snapshot, but those can use a lot of memory for large hierarchies. What is your opinion on just checking if the path exists instead to order created and deleted events? Modified events cannot reasonable be ordered but we can always queue them after creations and before deletions. What I am proposing is essentially a separate logic for coalesced events, that queues events in an order depending on the current state of the path. It's not ideal but a similar approach is taken for instances in the inotify backend to generate event trees for moved directories ( |
I'm still working on this and my local version is a bit ahead of what's in here. But progress was somewhat slow because of the holidays. |
My current local implementation is keeping a dictionary that maps an
Indeed, an existence check is necessary to decide if a file was created or deleted.
After working on the coalesced events a bit more then these seem to be the norm for our test suite since it runs so fast so I haven't worked on two separate code paths. That said, the logic for coalesced events is somewhat complex, and if separate code paths make it a less complicated implementation then I'll transition to it for sure. 🙂 |
True, separating the logic for coalesced and "regular" events only makes sense if it makes the code easier to understand and maintain. As it stands, it is still possible to follow the code flow quite well (but maybe I'm just saying this because I have been involved in the discussions). The worst-case scenario for the cache of From looking at the code, the stat cache is only actually used to suppress false modified events. When do those actually occur? And is suppressing them worth the tradeoff in memory and code complexity? Or will there be other uses for the cache? Finally, something I noticed about the current logic: if a creation and deletion are coalesced into a single event, the current code will only emit the second event instead of emitting both in the "correct" order. Is this deliberate or accidental? Of course, there is no way to disentangle a chain with more than two coalesced events of the same type (for example created -> deleted -> created). In this case, emitting a single created event may indeed be better than emitting deleted + created events. Do we actually get more than two events of the same type (e.g., creations) coalesced into a single event in the test suite? |
These are great comments and questions.
Agreed, and I need to profile this as we use watchdog with a 350.000 file folder.
The reason here is two-fold. One is to filter out erroneous
This was intentional. When I started going through the emitter tests then the delete test would fail without the existence check. However, I just tested this again, and it seems like this is no longer necessary. The latest version should emit an event for creation and deletion. But I think this scenario is missing an explicit test.
|
I think this is ready for review; the CI test failure for py310 is still the issue we've seen before and not related to the changes here. |
@samschott yes, I was not saying we do not care about such scenarios. As I understand it, the patch improves the situation, so it is a good thing :) |
Thank you @CCP-Aporia for the explanations.
Hm, I have just reproduced this myself and I'm a bit puzzled by the behaviour. Why is this additional |
While the `filesystem_view` helps with filtering out additional `FileCreatedEvent`+`DirModifiedEvent` pairs then it also introduces a huge amount of edge cases for synthetic events caused by move and rename operations. On top of that, in order to properly resolve those edge cases we'd have to go back to a solution very similar to the old directory snapshots, with all the performance penalties they suffered from... As such I think it's better to acknowledge the behaviour for coalesced events instead, and thus remove the `filesystem_view` again.
@BoboTiG - is there anything more you'd like to have changed in this PR? It's including @samschott's work from #729 as well (out of necessity to gain confidence in the changes), and the tests on Python 3.10 are failing due to the |
If @samschott is OK with that, I am OK to merge :) |
This looks good to me. Thank you @CCP-Aporia for finally having a fast and reliable macOS emitter! The improvements in performance and memory usage are both significant. The only minor issue lies still with renaming files when changing only the casing. As described in the code comments, this currently emits two |
We can do that, yes. This is something I really would like to be fixed though. Thank you both, that's awesome 💪 |
@BoboTiG, I have just realised that you are also working on desktop client for file syncing (nuxeo desktop). And I can see that you also have watchdog pinned to version 0.10.3. So I expect that our use cases, requirements and tests will be very similar. I'll see what I can do regarding the case changes. It appears that there might be pattern in the native event IDs again but it would be nice if we wouldn't need to rely an undocumented behaviour here. |
Exactly :) |
* Fix installation of Python 3.7.9 on Windows (#705) * [windows] winapi.BUFFER_SIZE now defaults to 64000 (instead of 2048) To handle cases with lot of changes, this seems the highest safest value we can use. Note: it will fail with `ERROR_INVALID_PARAMETER` when it is greater than 64 KB and the application is monitoring a directory over the network. This is due to a packet size limitation with the underlying file sharing protocols. https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-readdirectorychangesw#remarks Also introduced `winapi.PATH_BUFFER_SIZE` (defaults to `2048`) to keep the old behavior with path-realted functions. * [mac] Regression fixes for native fsevents (#717) * [mac] Fix missing event_id attribute in fsevents (#722) * [mac] Return byte paths if a byte path was given in fsevents (#726) * Pin py.test to a version that supports 2.7 * Disable pypy2 tests on Windows They hang and timeout after 6 hours for some unknown reason * [mac] Add compatibility with old macOS versions (#733) * Ensure version macros are available * Uniformize event for deletion of watched dir (#727) * [inotify] Add support for the IN_CLOSE_WRITE event (#747) * [mac] Support coalesced fsevents (#734) * Add `is_coalesced` property to `NativeEvent` So that we can effectively decide if we need to perform additional system calls to figure out what really happened. * Replace `NativeEvent._event_type` with `repr()` support It's more pythonic, and the `_event_type` implementation wasn't quite usable anyway. NB: the representation is not truly copy/paste python code if there is a double quote inside event.path, but that should be a rare case so we don't add the expensive special case handling there. * Allow running tests with debugger attached Some Python debuggers create additional threads, so we shouldn't assume that there is only one. * Request notifications for watched root * Expect events on macOS instead of using `time.sleep()` It might be even better to check for the emitter class, as opposed to platform * Add exception handling to FSEventsEmitter Reduce the amount of 'silent breakage' * Use sentinel event when setting up tests on macOS So that we can avoid a race between test setup and fseventsd * Improve handling of coalesced events * Revert accidental platform check change * Fix renaming_top_level_directory test on macOS * Generate sub events for move operations * Remove `filesystem_view` again While the `filesystem_view` helps with filtering out additional `FileCreatedEvent`+`DirModifiedEvent` pairs then it also introduces a huge amount of edge cases for synthetic events caused by move and rename operations. On top of that, in order to properly resolve those edge cases we'd have to go back to a solution very similar to the old directory snapshots, with all the performance penalties they suffered from... As such I think it's better to acknowledge the behaviour for coalesced events instead, and thus remove the `filesystem_view` again. * [mac] Improve handling of rename events (#750) * drop support for macOS 10.12 and lower Co-authored-by: SamSchott <ss2151@cam.ac.uk> Co-authored-by: Boris Staletic <boris.staletic@gmail.com> Co-authored-by: Mickaël Schoentgen <contact@tiger-222.fr> Co-authored-by: Dustin Ingram <di@users.noreply.github.com> Co-authored-by: ysard <ysard@users.noreply.github.com> Co-authored-by: Lukas Šupienis <l.supienis@ncs.lt> Co-authored-by: Lukas Šupienis <lukas.supienis@gmail.com>
Starting this as a draft PR to get some early feedback.
The general idea is to expose an
is_coalesced
property onNativeEvent
instances. This allows theNativeEvent
->FileSystemEvent
conversion to perform additional checks onNativeEvent.path
if necessary. With these additional checks theNativeEvent
->FileSystemEvent
mechanism then has a chance to execute "a chain of properly sorted if-clauses instead of if-else clauses".Do note that in case of coalesced events it is impossible to know how many times
NativeEvent.path
was created, removed or renamed. It is only possible to know that such an event occurred at least once within anfseventsd
update interval.@samschott - please let me know if you think the list of "coalesced event" bit masks is sufficient or not. As far as I can imagine right now then the tricky bits with coalesced events are around creation, removal, and rename of an item in an observed path.