-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fallback for item access to album's attributes #2988
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for diving into this! This is a tricky issue, despite being a relatively small diff, and I appreciate your patience while we think through it.
I've left one minor comment inline. I also have a few general discussion points, from minor to major:
- We have an existing mechanism for formatting album-level data for items. This would need to be removed if we move the "merging" behavior lower in the abstraction hierarchy.
- You asked about caching issues. In our current system,
Item
andAlbum
objects are not "live," in that changing them through one reference does not magically change their values in another context. They represent a sort of snapshot of the database, which is why they have aload
method for updating values from the database. In that sense, caching the associated album might be even more reasonable than reloading the values every time. - This kind of change would completely hide item-level data, and eliminate the possibility for items to differ from each other on album-level fields. This is probably OK, but we have to consider the consequences. For example, in beets currently, it's possible to "work around" album-level fields you don't like and make some values different on a per-track basis. For example, if you decide you want
country
to vary across the tracks in a certain album, you can accomplish that if you really want to. With this change, however, those differences would be silently hidden. Again, this is probably OK, but we'd need to be clear about what's changing.
return self[key] | ||
else: | ||
return default | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really just a matter of aesthetics, but one alternative would be to have get
just do the necessary exception handling. That is, __getitem__
would continue to contain all the "real" logic and include the raise
at the end. Then, get
would change to contain:
try:
return self[key]
except KeyError:
return default
That would help keep the docstring for get
sensible, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might have been a bit of premature optimization, but afaik exception handling in Python isn't exactly performant (creation of exception object, traceback, caching of frames etc.) and combining the code paths was pretty trivial. I also liked that I could use this to have a short implementation in Item.get
.
The docstring issue doesn't bother me personally because it properly describes what get
can do. It does have a different function signature than dict.get
, which we need for the with_album
parameter anyway, but it's still compatible. I think I prefer Optionally
over Alternatively
, however.
I would have made those parameters keyword-only, but that is only possible in Python 3.
|
Thanks for catching that! What I'm worried about is not a direct conflict or anything—just that we're implementing the same logic ("fallback" between item and album attributes) twice. If evaluating the expression But as you discovered, there's subtlety about which formatter gets used. Maybe there's an elegant way to provide a merged view without the duplication, but maybe this division of responsibilities is OK.
No,
OK, good point! I had missed that existing values on items take precedence. That means, unless I'm mistaken, that item-level fixed attributes always take precedence—because it's impossible to remove them. Sounds good! |
Yes, I noticed that too, but I believe we still need the formatter for the reasons you mentioned (and also because it performs other tasks such as alias mapping). So, I tried implementing a lazy-loaded and cached |
I see—thanks for giving it a try, and I can see how that would be less than ideal. Anyway, this is shaping up well! It might be a good idea to run a few simple performance tests to make sure we aren't doing something terrible to the time required to run |
Well, not looking so bright. It's a >100% slowdown. This'd need some smart caching, probably. I do wonder why the difference is so high, though. I mean, the ItemFormatter needed to access the item's album before as well. Maybe Also, I should probably add some documentation about this change. Benchmark tool: https://github.com/sharkdp/hyperfine |
Hmm, that is a little worrisome. Let's dig a little deeper and see if we can't mitigate some of the effects. (Thanks for the tip about hyperfine, btw!) |
(I just found out about hyperfine today as I browsed the fd Readme by the same author.) This is probably the point where I would start to look into profiling as I'm still not too familiar with the code base and believe this would provide a good starting point. Have you ever done this in python and have some recommendations for tools or other tips? (I haven't.) |
I think that |
13c868f
to
b6bf829
Compare
Took a look back at this. I used py-spy for some quick effort-less profiling and it was quite obvious that the majority of the time is being spent with database access in I considered the simplest solution forward to be what I suggested earlier:
Let me know what you think. Also, I wasn't sure if I should add a section regarding the API to the changelog. It would mention the fallback of item access on Item and that re-loading is now lazy, although the latter should be transparent. |
Benchmarks: first is with this PR, second is 1.4.7.
|
Utilize album fields for special formatting of doujin releases. Requires a currently unmerged PR to beets. beetbox/beets#2988
Wow! This is extremely cool! Very nice work! You’re right that the “revision” trick is fragile, since it requires us to intervene on all updates of the database and only avoids races because we implement our own internal global lock, but this seems like a great trade-off for the performance win it affords. This seems worth doing independent of the new query behavior you’ve introduced here. As an aside, we’re already doing something similar for memozing %aunique strings, which are otherwise very expensive to recompute: Perhaps we should move this mechanism to reuse the revision mechanism (to detect when it’s time to invalidate the memoization table). I do like the idea of adding a note about the API change “for developers” in the changelog. The new fallback behavior is worth documenting, even if |
I see. Yes, that would probably be useful for the aunique feature and might even warrant a proper implementation (i.e. a "public API"), but I'd rather not do this in here. Regarding the changelog, I can do that. I deliberately made By the way, as you can see from the commit that references this PR, I started using this in production and haven't encountered any issues so far. I probably could have done that before as well, as I wasn't too concerned about the performance, but for this PR it was a must. |
OK, great. Since it’s a sensitive change, it might be wise to put out a call for testing so folks can try it out with funky configurations. I’ll post something to Discourse. |
Before you check: The CI failure is unrelated to the PR and caused by some error with curl when trying to download the Python 3.4 image for flake8 checking. |
(Thanks. I restarted that Travis job and everything’s fine.) |
Any updates on this? Doesn't seem like the discourse thread attracted much attention. |
db9103d
to
29acdd6
Compare
Rebased to fix the merge conflict on the changelog. AppVeyor has some errors in the setup phase with chocolatey. |
Still nobody using this, it seems. 😞 Let me know when you intend to merge this, so I only need to fix the changelog conflict once (or you do it 🤷♂️ ). |
Someone was asking for this a few days ago on IRC, but I missed them and couldn't point towards this PR. Anyway, I've been using this branch for half a year now with exactly 0 issues so far. I don't use the entire feature set of beets, but importing and path styles based on album flexattrs, which is my primary use case, are just fine. I'll try to remember making a new speed comparison since my library grew a bit over time, but I don't expect it to be much different compared to the last time. |
FYI, I ran into a couple of issues with this, mostly relating to types in the fallback, both in path format queries and in |
I wasn't aware of this when I threw together the diff on the discourse thread @kergoth mentioned. I'll just reproduce it here: diff --git a/beets/library.py b/beets/library.py
index 16db1e97..71b6db22 100644
--- a/beets/library.py
+++ b/beets/library.py
@@ -526,7 +526,17 @@ class Item(LibModel):
@classmethod
def _getters(cls):
- getters = plugins.item_field_getters()
+ def atoi(f, ag):
+ def ig(i):
+ a = i.get_album()
+ if a:
+ return ag(a)
+ else:
+ return cls._type(f).null
+ return ig
+ getters = {f: atoi(f, g)
+ for f, g in plugins.album_field_getters().items()}
+ getters.update(plugins.item_field_getters())
getters['singleton'] = lambda i: i.album_id is None
getters['filesize'] = Item.try_filesize # In bytes.
return getters
diff --git a/beets/ui/__init__.py b/beets/ui/__init__.py
index 327db6b0..c3adc72d 100644
--- a/beets/ui/__init__.py
+++ b/beets/ui/__init__.py
@@ -1145,7 +1145,10 @@ def _setup(options, lib=None):
plugins.send("library_opened", lib=lib)
# Add types and queries defined by plugins.
- library.Item._types.update(plugins.types(library.Item))
+ at = plugins.types(library.Album)
+ at.update(library.Item._types)
+ at.update(plugins.types(library.Item))
+ library.Item._types = at
library.Album._types.update(plugins.types(library.Album))
library.Item._queries.update(plugins.named_queries(library.Item))
library.Album._queries.update(plugins.named_queries(library.Album)) This wasn't intended to be a final implementation, but my approach was a little bit different in that I thought the album-item relationship was something beets-specific and therefore should be reflected in I did find that it was necessary to also change Note that we recently picked up a helper for memoisation in another PR: Lines 1037 to 1057 in 909fd1e
|
Thanks for the headsup. I suspect that the problem with ranges is related to me not updating the items' type information, as you did in your diff. I was entirely new to the code base before working on this, so I just never considered that to be relevant. The I'll take a closer look at your getter approach when I find some time to work on this again. (I'd like to mention that I cannot use beets without this feature anymore, so even if there is a huge update going on, I'll continue using my fork until I updated the PR for the changes.) |
Willing to test if still needed, though it seems like there are now merge conflicts.. |
It's primarily the changelog that becomes conflicted every now and then. I recently did a merge of I do believe that the majority of the effort has already been made and what's left is likely comparatively small, but it's still not within my free time budget at the moment. |
This issue (#2797) also bit me while organizing my library. I have the same use case as @radusuciu: wanting to partition my directory structure based on flexible attributes set during import via the My inline and paths config (WIP)album_fields:
topdir: |
def value(f, otherwise):
try: result = f()
except: result = None
return result if result else otherwise
return value(lambda: category, 'Artists')
subdir: |
def value(f, otherwise):
try: result = f()
except: result = None
return result if result else otherwise
topdir = value(lambda: category, 'Artists')
if topdir == 'Soundtracks':
return value(lambda: franchise, '[Unknown]')
return '[Various]' if comp else albumartist
item_fields:
topdir: |
def value(f, otherwise):
try: result = f()
except: result = None
return result if result else otherwise
return value(lambda: category, 'Artists')
subdir: |
def value(f, otherwise):
try: result = f()
except: result = None
return result if result else otherwise
topdir = value(lambda: category, 'Artists')
if topdir == 'Soundtracks':
return value(lambda: franchise, '[Unknown]')
return artist
disc_and_track: |
if disctotal > 9:
return u'%02i-%02i'% (disc, track)
elif disctotal > 1:
return u'%01i-%02i' % (disc, track)
elif tracktotal > 99:
return u'%03i' % (track)
elif tracktotal > 9:
return u'%02i' % (track)
else:
return u'%01i' % (track)
paths:
# My album flexible attributes:
# - avmedia: Comma-separated; e.g. Animation, TV, Video Games, Musicals, Movies
# - nationality: e.g. German, Japanese, Korean
# - franchise: e.g. Final Fantasy
singleton: $topdir/%the{$subdir}/%the{$artist} - $title
comp: $topdir/%the{$subdir}/($year) $album%aunique{}/($disc_and_track) $artist - $title
default: $topdir/%the{$subdir}/($year) $album%aunique{}/($disc_and_track) $title It works well, but there are downsides:
TL;DR: I would love to see this PR make it into 1.5.0! @FichteFoll I'm relatively new to beets, but will try to make time to do some performance profiling in the next few days to see how it affects my own library. |
I rebased this and pushed to ctrueden/beets@0c7c586a. Here's are the benchmark results on my library (48646 items):
I agree that the performance drop is both unexpected and unfortunate. I'll try to dig more soon and report back. |
TL;DR I pushed a fix to my branch: ctrueden/beets@4c5b5084. 🙌 ExplanationHere is where the performance hit is happening:
The for item in lib.items(query):
ui.print_(format(item, fmt)) This invokes the library's formatter function on each item, which constructs a class FormattedItemMapping(dbcore.db.FormattedMapping):
...
def __init__(self, item, for_path=False):
super(FormattedItemMapping, self).__init__(item, for_path)
# We treat album and item keys specially here,
# so exclude transitive album keys from the model's keys.
self.model_keys = item.keys(computed=True, with_album=False)
self.item = item But then the super-constructor does this: def __init__(self, model, for_path=False):
self.for_path = for_path
self.model = model
self.model_keys = model.keys(True) # <== Equivalent to computed=True, with_album=True ! As you can see above, the generic @FichteFoll Is the fix as simple as passing One other questionIn my tests, I noticed that on master, flexible attributes do fall back to album attributes successfully in the format string. Example of flexible attributes falling back to albums
It seems to be only queries that don't fall back:
Is that known/intended behavior? Certainly it's inconsistent. Finishing this PR would address that! Updated benchmarks with bug-fix
|
Wow!! That's truly amazing—excellent work digging into this! 🎉 It would be really, really cool to include this. To answer your second question: yes, that is indeed inconsistent, but it is also the intended behavior. Basically, the backstory is that we knew the templating fallback was useful, and it was straightforward to implement without too much performance impact, so we did it—but the query side of things was harder to do. But here we are?! |
@sampsyo Great! So would you say this latest patch series is ready for merge, then? If so, @FichteFoll, could you please force push the PR accordingly? |
Just to confirm, we want to merge the changes here and your new patch together, right? So maybe the right thing is to open a new PR with all that together? |
@ctrueden thank you a lot for looking into this. I will check out your over the weekend, most likely, and verify it in my environment as well as include your changes. |
@sampsyo wrote:
Right.
@FichteFoll can force-push the branch to update this PR. That would be cleaner IMHO than opening a new PR. 😄 @FichteFoll wrote:
Great! 👍 |
Good work, @ctrueden. This fixes the really bad performance for me as well. 🎉 Glad it was only an oversight on my part and not an inherent flaw in the implementation. I applied your commit pretty much as-is. An alternative way would have been to skip the super contructor from My results with 11171 entries in the database:
PR before the fix
PR with fix
Still slightly slower in the first case, but much more manageable. Note that the comparison of the second case is unfair, since it tests functionality that the current |
Co-Authored-By: Curtis Rueden <ctrueden@wisc.edu>
89721ea
to
2d024d2
Compare
Awesome, thanks @FichteFoll ! 👍 @sampsyo Looks like this PR is good to go! 🏎️ 💨 |
Anything else I can do to help move this forward? |
Just pinging one more time—seems a shame to let this languish now that the performance issue is resolved. I completely understand being too busy, but if there's anything the community can do to help ensure this can merge smoothly, let us know. E.g. if there is a manual testing process to help avoid regressions which others could run through on various platforms, to offload otherwise direct maintainer effort, let's write that up and do it! |
Did a quick merge locally as I continue to use the PR as my main driver. Linting fails because |
@FichteFoll if you merge master in it should be resolved now 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge apologies for the enormous, inexplicable delay on merging this PR. It's amazing work and didn't deserve to languish, but I'm going to hit the green button now!
My only remaining worry about this design is about concurrency: the new caching/revision-number scheme is obviously great, but because it can turn model.load()
into a no-op, it runs the risk of missing updates from other threads and processes that would previously be visible. I can't at the moment think of a case where that kind of update is critical, however (perhaps the web
plugin?), so I think it's time to unleash this change on the world and track down problems as they arise.
Thank you all for the longitudinal team effort. I'm seriously impressed at the clear thinking and solid engineering that went into this change. 👏
Allows queries (especially for pathspecs) based on an album's flexattrs
while operating on items.
Fixes #2797.
I'll leave formulating the changelog up to you.