Wordcloud and classical music #27

Closed
Merkwurdichliebe opened this issue Nov 8, 2022 · 7 comments
Merkwurdichliebe commented Nov 8, 2022

There are two issues with wordcloud when listening to classical music (cf. attached capture).

  1. Tags tend to have many short words or abbreviations which are meaningless when taken out of context. For example: "iv" (fourth movement), "ma" (as in "allegro ma non troppo"), "d" (as in "Fugue in d minor"), "op" (as in "Opus 2"), or any digit or number, all of which are very common (e.g. "Symphony no. 5"; also notice the frequency of "no" in the attached capture).
  2. Some terms use accented characters, mainly from French, which are not rendered properly (e.g. "étude" becomes "tude", "exécution" becomes "excution"). Although I haven't checked, this would also be the case with widely used German titles containing accented characters (e.g. "Verklärte Nacht", "Die Zauberflöte", "Götterdämmerung").
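
One possible explanation for the second point (an assumption, since the tool's cleaning code is not shown in this thread) is a cleanup step that strips every character outside a-z. A minimal Python sketch of that behaviour next to a Unicode-aware alternative; `clean_ascii` and `clean_unicode` are hypothetical names, not functions from the project:

```python
import re
import unicodedata


def clean_ascii(text: str) -> str:
    # Hypothetical buggy cleanup: drops everything except a-z and whitespace,
    # so accented letters vanish from the middle of words.
    return re.sub(r"[^a-z\s]", "", text.lower())


def clean_unicode(text: str) -> str:
    # Compose characters first so "e" + combining accent becomes a single "é".
    text = unicodedata.normalize("NFC", text).lower()
    # For str patterns, \w is Unicode-aware in Python 3: this removes punctuation
    # but keeps letters of any script (digits and underscores are kept too).
    return re.sub(r"[^\w\s]", "", text)


print(clean_ascii("Étude, exécution"))    # tude excution
print(clean_unicode("Étude, exécution"))  # étude exécution
print(clean_ascii("Götterdämmerung"))     # gtterdmmerung
print(clean_unicode("Götterdämmerung"))   # götterdämmerung
```

The ASCII variant reproduces the "tude" and "excution" output described above, which suggests (but does not prove) that something similar happens in the chart.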

Proposed solutions:

  1. Easy: add an option to ignore numbers and words shorter than a given length. More involved: add an option to ignore Italian musical terms (cf. http://www.musictheory.org.uk/res-musical-terms/italian-musical-terms.php).
  2. Add support for accented characters/Unicode.
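
The "easy" option could look roughly like the sketch below. The function name `keep_word`, the parameter names, and the default threshold of 3 are all assumptions made for illustration, not the tool's actual settings:

```python
def keep_word(word: str, min_length: int = 3, ignore_numbers: bool = True) -> bool:
    """Return True if the word should appear in the wordcloud."""
    # Pure numbers ("5", "67") are dropped regardless of their length.
    if ignore_numbers and word.isdigit():
        return False
    # Everything shorter than the threshold is dropped ("iv", "ma", "d", "op", "no").
    return len(word) >= min_length


words = "symphony no 5 in c minor op 67 iv allegro".split()
print([w for w in words if keep_word(w)])  # ['symphony', 'minor', 'allegro']
```

Note that "minor" still passes: a length threshold only catches the short abbreviations and numbers.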

Wonderful stuff otherwise!

[Attached image: cloud]

Owner

felhag commented Nov 8, 2022

Thanks for the issue, sounds like good improvements. I'm kinda surprised the accented characters aren't working out of the box actually.

Regarding your first point, the easiest and probably the best fix is to just exclude everything with 1 or 2 (or maybe even 3?) characters. I was just spending some time on the next release anyway, so I'll try to include these fixes as well.

@Merkwurdichliebe
Author

Thanks for the quick reply. The problem with excluding strings below a minimum length (which would already be an improvement, of course) is that some strings which are very frequent in classical music, and which you'd rather exclude because they have no independent meaning (e.g. "sharp" or "major" as in "Trio in G sharp major"), are the same length as or longer than some names you'd like to keep (e.g. "Bach" or "Satie"). I think the ideal solution would be a custom exclusion list in a text file, but this could become too complicated for the average user and would be prone to parsing errors. Maybe an option could be added to switch between "artists wordcloud" and "tracks wordcloud" which, in conjunction with the minimum length parameter, could make the chart more useful. Just my 2c. Appreciate your support!
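
To make the length problem concrete, here is a rough Python sketch. It assumes a hypothetical one-word-per-line exclusion file format in which blank lines and lines starting with `#` are ignored; `parse_exclusions` and `filter_words` are illustrative names only:

```python
def parse_exclusions(text: str) -> frozenset[str]:
    """Parse a tolerant exclusion list: one word per line, case-insensitive."""
    words = set()
    for line in text.splitlines():
        entry = line.strip().lower()
        # Skip blank lines and comment lines instead of failing on them.
        if entry and not entry.startswith("#"):
            words.add(entry)
    return frozenset(words)


def filter_words(words, min_length=3, exclusions=frozenset()):
    """Keep words that meet the length threshold and are not excluded."""
    return [w for w in words if len(w) >= min_length and w.lower() not in exclusions]


words = ["Trio", "in", "G", "sharp", "major", "Bach", "Satie"]
exclusions = parse_exclusions("sharp\n\n# key names\n  Major  \n")

# Length alone: "sharp" and "major" survive next to the composer names.
print(filter_words(words))                         # ['Trio', 'sharp', 'major', 'Bach', 'Satie']
# Raising the threshold removes "Bach" before it removes "sharp".
print(filter_words(words, min_length=5))           # ['sharp', 'major', 'Satie']
# Length plus an exclusion list gives the intended result.
print(filter_words(words, exclusions=exclusions))  # ['Trio', 'Bach', 'Satie']
```

No single threshold separates the two groups, so some kind of word list or source toggle is needed on top of the length check.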

Owner

felhag commented Nov 8, 2022

The chart already excludes a bunch of words. I could add "sharp" and "major" to this list too of course, but it feels a little bit genre-specific. And adding an option to provide a list of custom exclusions is a little bit over-engineered for a single chart, I'm afraid.

An option to include/exclude artists/albums/tracks would be a nice feature though. Seems consistent with some other charts too which have a similar toggle.
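
As a rough illustration of such a toggle (a sketch only: the `Play` record, its field names, and `wordcloud_sources` are hypothetical and not taken from the project's code), the chart input could be built from a selectable set of fields:

```python
from dataclasses import dataclass

ALLOWED_FIELDS = ("artist", "album", "track")


@dataclass(frozen=True)
class Play:
    # Hypothetical record for one listened track.
    artist: str
    album: str
    track: str


def wordcloud_sources(plays, fields=ALLOWED_FIELDS):
    """Collect the text of the selected fields, in play order."""
    unknown = set(fields) - set(ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return [getattr(play, field) for play in plays for field in fields]


plays = [Play("Erik Satie", "Gymnopédies", "Gymnopédie No. 1")]
print(wordcloud_sources(plays, fields=("artist",)))
# ['Erik Satie']
print(wordcloud_sources(plays, fields=("artist", "track")))
# ['Erik Satie', 'Gymnopédie No. 1']
```

With only `artist` selected, title fragments such as "no" or "major" never reach the wordcloud in the first place.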

@Merkwurdichliebe
Author

> I could add "sharp" and "major" to this list too of course but it feels a little bit genre-specific.

You're probably right. The option to "include/exclude artists/albums/tracks" (maybe together with a simple minimum string length option) seems to be the easiest and most generic solution.

felhag added a commit that referenced this issue Nov 8, 2022
Owner

felhag commented Nov 8, 2022

Just committed a fix for this :). I have to do some more testing since I changed quite a bit under the hood (unrelated to this change), but I think I'll create a release soon. Will close this issue when the latest version is deployed.

@Merkwurdichliebe
Author

Great. Thanks a lot for your quick response on this!

Owner

felhag commented Nov 13, 2022

Just released 5.0 including some fixes for the wordcloud. I'll close the issue!

felhag closed this as completed Nov 13, 2022