Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for returning frames partially within span in filter and… #277

Merged

Conversation

KennethNielsen
Copy link
Contributor

… wire it up for aggregate

This is an attempt at a fix for #248. It adds an include_partial_frames argument to Frames.filter which if set True, and given a span, will return pseudo frames for frames that cross the span boundaries or for frames that envelope the entire span. The pseudo frames are copies of the original frames but where the start and/or endpoint has been edited to show the part of the frame that is within the span.

The include_partial_frames argument, is a keyword argument with default value False, which means that backwards compatibility should be kept for all other function until it can be decided if they need partial frames as well. For now, it is only wired up for aggregate.

Critique welcome:

  • I'm not quite happy with the name of the argument, suggestions welcome
  • This somewhat removed the elegance of the old implementation of the filter function and it is possible that the partially in semantics can be done cleverer. Let me know.

I'm happy to try and do a test for it, but I want to know if you are happy with the form before I do that.

watson/frames.py Outdated
def filter(self, projects=None, tags=None, span=None,
include_partial_frames=False):
for frame in self._rows:
if not (projects is None or frame.project in projects):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not (projects is None or frame.project in projects):
if projects is not None and frame.project not in projects:

Would be easier to read IMO :) .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I was so focused on creating a minimal change, that I forgot about readability.

watson/frames.py Outdated
for frame in self._rows:
if not (projects is None or frame.project in projects):
continue
if not (tags is None or any(tag in frame.tags for tag in tags)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same spirit:

Suggested change
if not (tags is None or any(tag in frame.tags for tag in tags)):
if tags is not None and not any(tag in frame.tags for tag in tags)):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, as above

watson/frames.py Outdated
yield frame._replace(start=span.start)
elif ((span.start < frame.start < span.stop) and
frame.stop > span.stop):
yield frame._replace(stop=span.stop)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would work as well (replacing my previous suggestion as well):

if span is not None and if frame not in span:
    if include_partial_frames:
        if (span.start < frame.start < span.stop or span.start < frame.stop < span.stop):
            frame = frame._replace(start=max(frame.start, span.start), stop=min(frame.stop, span.stop))
    continue

yield frame

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k4nar I think that if you don't yield the result of frame._replace() you're going to lose it, because of the continue after it. Also, I'm not sure if the if inside the first condition is what you intended (...and if frame not...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up using @davidag overlaps suggestion, I think it has good readability

Copy link
Collaborator

@k4nar k4nar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I wonder if this couldn't be solved using the timeframe argument of the Span object: https://github.com/TailorDev/Watson/blob/2d861a41e29859a685916a37240c950b2de67d0c/watson/frames.py#L62 . But I haven't looked much into it though.

@davidag
Copy link
Contributor

davidag commented May 7, 2019

Actually I wonder if this couldn't be solved using the timeframe argument of the Span object:

https://github.com/TailorDev/Watson/blob/2d861a41e29859a685916a37240c950b2de67d0c/watson/frames.py#L62
. But I haven't looked much into it though.

I couldn't come up with anything useful using timeframe :(

Span could have another method to check if there is an overlap:

class Span(object):
    def __init__(self, start, stop, timeframe='day'):
        self.timeframe = timeframe
        self.start = start.floor(self.timeframe)
        self.stop = stop.ceil(self.timeframe)

    def overlaps(self, frame):
        return frame.start <= self.stop and frame.stop >= self.start

    def __contains__(self, frame):
        return frame.start >= self.start and frame.stop <= self.stop

I believe that with this new method, the proposed filter() code could be simplified.

@KennethNielsen
Copy link
Contributor Author

@k4nar @davidag thanks for the feedback and sorry for taking so long to get back to this.

As I mentioned I dropped the ball a little on readability, with being way to focussed on a small change. I took most of @k4nar suggestions and used @davidag overlaps suggestion. I think the result now has good readability. I rewrote filter into a very flat form using `continue and I think it works really well.

Let me know what you think and if more changes are needed. Also, if you want I can refactor into a single commit later.

@davidag
Copy link
Contributor

davidag commented Sep 13, 2019

This looks good to me @KennethNielsen ☺️

Let me add some suggestions for the next steps:

  1. Rebase your branch, resolve conflicts and (automatically) verify tests still pass.
  2. Add new tests for the changes in this PR.
  3. Add changelog entry.

@KennethNielsen KennethNielsen force-pushed the aggregate_events_across_dayboundaries branch from 5843834 to 57b04e2 Compare September 14, 2019 07:17
@KennethNielsen
Copy link
Contributor Author

@davidag great. I have rebased the branch on top of current master, added a test and updated the changelog. 2 minor things though.

  1. The rebase was non-trivial (due to the ignore project and tags changes also in filter), so it would be great if someone would glance this over once more before merging.
  2. flake8 doesn't pass due to a line changed in a test in 581a107. It is trivial to fix, but I didn't know what the policy was on mixing stuff in a PR, so I didn't fix it and therefore flake8 reports error. Let me know and I can just include it if need be.

Copy link
Contributor

@davidag davidag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After playing a bit with Watson using this PR code, there is something confusing to me. I'll try to explain myself with an example:

  • I have one frame that crosses days
$ watson log --from 2019-09-13
Friday 13 September 2019 (3h 00m 12s)
        c3d1082  22:54 to 01:54   3h 00m 12s  bonus
  • An aggregate report for the first day shows only the span that falls in that day, which is OK.
$ watson aggregate --from 2019-09-13 --to 2019-09-13
Fri 13 September 2019 - 1h 05m 26s
  bonus - 1h 05m 26s
  • If we extend the report to include both days, we get two entries as expected: OK, except that the sum is not the same by one second (3h 12s != 3h 11s).
$ watson aggregate --from 2019-09-13 --to 2019-09-14
Fri 13 September 2019 - 1h 05m 26s
  bonus - 1h 05m 26s

Sat 14 September 2019 - 1h 54m 45s
  bonus - 1h 54m 45s
  • But if we get a report of the first day, the result is empty, which is what got me confused.
watson report --from 2019-09-13 --to 2019-09-13
Fri 13 September 2019 -> Fri 13 September 2019
  • If we extend the timespan to include both days, everything is fine.
$ watson report --from 2019-09-13 --to 2019-09-14
Fri 13 September 2019 -> Sat 14 September 2019
bonus - 3h 00m 12s

I understand the aggregate command was the target of this PR, but it might make sense to make report and aggregate behave the same, even if the issue you reported was only related to the latter. Is there anything I am missing here? 🤔

I've reviewed all changes again, specially the Frames.filter() method, and it looks good to me 👍 👏

CHANGELOG.md Outdated
### Added

- Added support for including partial frames in `Watson.report` and use it in
`cli.aggregate` (#277)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Watson's changelog favors more high level comments, avoiding names of classes/methods, probably targeting Watson's users more than Watson's developers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it the next time I get a few minutes for this.

@KennethNielsen
Copy link
Contributor Author

Hi @davidag and thanks for your testing and feedback.

* If we extend the report to include both days, we get two entries as expected: OK, except that the sum is not the same by one second (3h 12s != 3h 11s).

Yes. I noticed the 1s issue. As I recall it originates from the way report uses arrow. Something about when from and to is the same date (which is what aggregate does) then it will ask the arrow date for floor and ceil and ceil is something like ****-**-** 23:59:59 and so it loses a second. It might and probably is fixable, but my feeling is that it would be too much spacial casing for such a small benefit, compared to the really clean implementation there is now.

* But if we get a `report` of the first day, _the result is empty_, which is what got me confused.
watson report --from 2019-09-13 --to 2019-09-13
Fri 13 September 2019 -> Fri 13 September 2019
* If we extend the timespan to include both days, everything is fine.
$ watson report --from 2019-09-13 --to 2019-09-14
Fri 13 September 2019 -> Sat 14 September 2019
bonus - 3h 00m 12s

I understand the aggregate command was the target of this PR, but it might make sense to make report and aggregate behave the same, even if the issue you reported was only related to the latter. Is there anything I am missing here? thinking

No. You are not missing something. I think include_partial_frames should be used in report as well if you think about it as a bugfix, but I was a little wary about suggesting behavioral changes to core functionality like report and I was thinking that I would take discussion in a different PR. If you think it is natural that it should apply to report as well, then I will definitely add it.

I've reviewed all changes again, specially the Frames.filter() method, and it looks good to me +1 clap

Thanks!

@davidag
Copy link
Contributor

davidag commented Sep 16, 2019

Hi @KennethNielsen!

Yes. I noticed the 1s issue. As I recall it originates from the way report uses arrow. Something about when from and to is the same date (which is what aggregate does) then it will ask the arrow date for floor and ceil and ceil is something like ****-**-** 23:59:59 and so it loses a second. It might and probably is fixable, but my feeling is that it would be too much spacial casing for such a small benefit, compared to the really clean implementation there is now.

That seems very reasonable to me. I don't think it's worth the effort and added complexity.

No. You are not missing something. I think include_partial_frames should be used in report as well if you think about it as a bugfix, but I was a little wary about suggesting behavioral changes to core functionality like report and I was thinking that I would take discussion in a different PR. If you think it is natural that it should apply to report as well, then I will definitely add it.

I think it's more coherent if report and aggregate behave the same and return partial intervals when frames partially overlap the requested timespan.

@jmaupetit what do you think? Maybe this could be a complication for people relying on the current report command behavior.

@KennethNielsen
Copy link
Contributor Author

@davidag on the other hand, relying on frames being completely left out is kind of an exotic use case I think

@KennethNielsen KennethNielsen force-pushed the aggregate_events_across_dayboundaries branch from 57b04e2 to 300ebad Compare September 21, 2019 16:17
@KennethNielsen
Copy link
Contributor Author

@davidag actually, the more I thought about it, the more it makes sense to enable this for report per default as well. So I went ahead and made the change and rebased once more (trivial this time) so it is ready for merge.

@KennethNielsen KennethNielsen changed the title Add support for returning frames partially within spand in filter and… Add support for returning frames partially within span in filter and… Sep 21, 2019
@KennethNielsen
Copy link
Contributor Author

Ok. So it appears that this branch now has conflicts again. I will wait to resolve the conflicts until there is a decision on the default behavior mentioned above.

@KennethNielsen
Copy link
Contributor Author

@davidag who would need to approve the default behavior change mentioned?

@davidag
Copy link
Contributor

davidag commented Nov 26, 2019

@KennethNielsen: You might want to rebase first, because there are a couple of conflicts in your PR. After that, @k4nar is the person you're looking for 😉

@KennethNielsen
Copy link
Contributor Author

@davidag thanks. I rebased this already twice (one of them non-trivial) and I don't really think it is necessary to make the decision on the default behavior. Understand that I'm not complaining about the process or anything, I would just like to know that there is a path to having it merge before I put more effort into it.

@k4nar let me know if you have the time to make a decision about the default behavior on what to do with frames that cross intervals in aggregate and then I can do the rebase in no time.

@davidag
Copy link
Contributor

davidag commented Nov 28, 2019

@KennethNielsen: Sure, I understand your concerns. Let's hope @k4nar will come back to this PR and give you some feedback 🤞

@KennethNielsen
Copy link
Contributor Author

@k4nar I resolved the merge conflicts and fixed the tests. I hope you will have the time to look at this at some point.

Copy link
Collaborator

@k4nar k4nar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the code, great job 👍 . However I think this deserves a test or two :) . Maybe someone can help you with that.

@KennethNielsen
Copy link
Contributor Author

@k4nar great. I added a code test but it could definitely use be real tested. I will try and recruit someone from the issue to test it.

@emarthinsen, @brandomr @loonies would any of you 3 mind testing this branch?

k4nar
k4nar previously approved these changes Mar 17, 2020
Copy link
Collaborator

@k4nar k4nar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after re-reading the code I think that the coverage is good enough. People should have time to test this in local before the next release.

@KennethNielsen
Copy link
Contributor Author

Oh you actually meant code coverage. I can see that merging is blocked. Should I do another rebase?

@k4nar
Copy link
Collaborator

k4nar commented Mar 18, 2020

I think it's only blocked because you and I don't have the rights to merge. ping @jmaupetit :)

@jmaupetit
Copy link
Contributor

@KennethNielsen rebase your branch, and I'll merge it! 🙏

@KennethNielsen KennethNielsen force-pushed the aggregate_events_across_dayboundaries branch from a62fade to bad1503 Compare March 18, 2020 13:05
@KennethNielsen
Copy link
Contributor Author

@jmaupetit done

@jmaupetit jmaupetit merged commit 2c7a17e into jazzband:master Mar 20, 2020
@jmaupetit
Copy link
Contributor

Yay 🎉 Thanks again @KennethNielsen! I apologize for my lack of reactivity...

@KennethNielsen
Copy link
Contributor Author

@jmaupetit no problem. Thanks for merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants