Metadata Rewrite; DublinCore-support; faster; custom (#1266)
uses setattr and setitem.

----

Squashed commit notes:

* Initial implementation (enough to have folks take a look at intentions).
* Add better comments about the new implementation.
* Some tweaks.
* Modified xmlToM21.py and m21ToXml.py to use the new Metadata APIs. In the process, added the concept of contributors to the new Metadata.
* Save attempted backward-compatibility work before restructuring the class hierarchy.
* I think I have all the backward compatibility in place. __setattr__ just raises an exception (so I can see if I still need it). Minimally tested (using the musicxml parser and writer, which I put back the way they had been before, so they needed compatibility).
* Backward compatibility tested/fixed by importing with my converter21 Humdrum importer (producing a very rich set of ExtendedMetadata), and then exporting to musicxml, which depends entirely on backward-compatible APIs, so it drops a lot of that on the floor, but gets the backward-compatible bits right.
* Support abbreviations and uniqueNames. Store contributor values as Contributor. Fix bug where unset md.title et al. were returning 'None' instead of None.
* Add valueType member of PROPERTY to support type coercion during setting/adding of metadata. Do a better job with personal keys (don't attempt to treat them as uniqueNames). Support 'copyright' as 'dcterm:rights'.
* Structural rework: MetadataBase and ExtendedMetadata are gone; only a much-enhanced Metadata is left, with both old (backward-compatible) and new APIs in place. Many other renames, too.
* Significant rework, moving things around to make it more readable, making the public APIs more usable, and modifying MusicXML import/export to make use of that usability.
* pylint, flake8, mypy.
* Regenerate corpus metadata cache, since Metadata's internals are completely different now. Get all the metadata tests passing.
* mypy/lint.
* pylint/mypy/flake8
* Get the existing tests passing.
* Some doctests and some fixes.
* pylint
* Oops, pylinting broke the test results.
* More tests, more fixes.
* More metadata testing.
* More testing. A little fixing.
* Comment out the unused (except during development testing) code in the musicxml converter.
* Catch up all the new code with the whole "import typing as t" thing.
* Fix tests that return None. Fix "A guide to this new...".
* Fix up the merge.
* Regenerate the metadata cache (again, because there were changes on master).
* Rename the property description stuff, and move the property list to its own file: properties.py.
* pylint, mypy, flake8
* Lots of renames.
* Make getAll(...) return a Tuple, not a List.
* Tuples and Lists are declared a bit differently (yay, mypy).
* Get rid of the 'music21' namespace; replace it with a new 'humdrum' namespace. Increment the music21 version, since cached parsed files need to be reparsed to produce metadata with the new namespace.
* pylint complained about lines too long.
* Fix a couple of tests I missed.
* Redo the main APIs a bit: we had get/getAll/set/setAll/add/addAll, and now we have get/getFirst/set/add, where get always returns a tuple, getFirst only gets you the first item, and set/add can take either a single item or a list of items. The "custom" APIs are now getCustom/getFirstCustom/setCustom/addCustom, similarly. The parameters to these routines no longer include an optional namespace; they just have a key: str parameter. That key can be a 'uniqueName' or a 'namespace:name' (and in the case of the custom routines, it can of course be anything). New behaviour: if you pass an unsupported uniqueName or namespace:name to the non-custom APIs, KeyError will be raised. This feels much cleaner.
* pylint
* Fix the musicxml import/export tests.
* "Modernize" MEI import's use of metadata APIs. Other cleanup (mostly comments).
* A little cleanup of unused PropertyDescription fields and associated code, plus some docs improvement.
* If a string is not parseable as DateSingle, just return the unconverted str/Text object.
* pylint
* humdrum/spineParser.py now uses the new metadata APIs, and imports _all_ the metadata.
* Rewrite __getattr__ to support any uniqueName, workId, or workId abbreviation. Always return str(first) or None.
* Add __getitem__ (a.k.a. dict-key access) for uniqueName and namespace:name. Also stop returning 'None': strict backward compatibility is not important enough to justify keeping a list of attributes that used to do that.
* flake8
* Oops. Return 'MULTIPLE' from __getattr__ if there is more than one item.
* More cleanup. Make the copyright property work like all the others (returning str). It's not backward compatible anymore, but it is consistent.
* Add 'subtitle' ('mei:subtitle') property term. Many metadata improvements to converters: Braille writer handles contributors. MEI parser handles multiple titles, multiple subtitles, and all types of contributor roles. MusicXML parser/writer now use the new md[name] accessor. Oh, and Contributor.__str__ has been added, returning Contributor.name (or None).
* Test the new accessors.
* Enhance/fix ABC and Lilypond metadata processing to use the new metadata APIs.
* Remove md.get(), md.getFirst(), and md.getFirstCustom().
* Implement __setitem__ (replaces set()), and __setattr__ (adds property-style setters for all uniqueNames, workIds, and workId abbreviations). Note that __setattr__ overrides differently than __getattr__: __getattr__ is the last thing called after all other ways of getting attributes (bare attributes, property getters) have been exhausted. __setattr__ is the only thing called if you implement it, so we have to call super().__setattr__ for attributes we do not handle. BUT: even if you do that, having __setattr__ implemented disables all property setters! So you have to remove those setters, and implement them all in __setattr__.
* RomanText parsing now produces multiple titles etc. if seen. Fixed some comments elsewhere.
* More coverage (and a bugfix or two) in metadata/primitives.py.
* Change md.all(), md.search(), and md.contributors to include all the new stuff as well (previously they were super-backward-compatible, only returning the list they used to). all() and search() now also deal with multiple non-contributor metadata items. Some cleanup and test fixes, too.
* Simplification and cleanup. Had to regenerate the metadata cache. Fixed a bug I introduced in ABC.
* Get past the new pylint restrictions (shakes fist at jacobtylerwalls :-).
* Respond to various review comments: Remove 'subtitle' for now. Fix some docs/doctests (much more to come). spineParser.py: updateMetadata always succeeds now. Revert MEI parser improvements (there are still some necessary changes). Rename some things for screenreader-friendliness. Attributes always return strings (and rename *simpleValue* to *stringValue*). Allow setting fileInfo fields to None without type conversion. Check attribute names before setting from __init__().
* Metadata attributes return constructed strings instead of 'MULTIPLE'.
* Fix docs/doctests. Use new ValueType instead of t.Any.
* More review feedback incorporated: lilypond export is back to using md.title et al. Return tuples, not lists. Add four new humdrum codes as fully supported, so the humdrum parser can use them:
    'humdrum:YOO' == 'originalDocumentOwner'
    'humdrum:YOE' == 'originalEditor'
    'humdrum:EED' == 'electronicEditor'
    'humdrum:ENC' == 'electronicEncoder'
* properties.py: first cut at removals; set oldMusic21WorkId only if necessary. Clean up long lists in doctests/descriptions. Remove inaccurate romanText comment.
* Clean up testMetadata.py (and add some new tests). Fix class Text to never put Text in self._data.
* Oops, lost an f''.
* Better indenting in testMetadata.py; replace multiple checks in __setattr__ with one; Metadata._metadata is now Metadata._contents.
* Put Metadata.date and Metadata.setWorkId back in place, deprecated.
* Organized public vs. private APIs a bit, and cleaned up some of those choices. More type hinting.
* Add some imprint-related metadata terms (from the humdrum namespace).
* Remove 'performer' from spineParser.py, since I removed it from metadata/properties.py.
* More review feedback incorporated.
* Store the software list in metadata._contents.
* Fix test failures.
* Big name change (in code and docs) from nsKey to namespaceName.
* Added and improved docs/doctests throughout.
* Fix small notes; give credit.
* Tiny change noted in docs.
* Improve search performance (and related work, and a few random fixes). Details below:
  1. In search(), loop over what is there, rather than over the list of things that could be there (that list is much longer now, and will continue to grow).
  2. Do the same optimization in all().
  3. Remove searchAttributes and allUniqueNames, since we no longer use them. listSearchAttributes still works, though.
  4. RichMetadata now has derived implementations of a few routines, since it can no longer count on searchAttributes being longer to get the job done.
  5. Remove the FileInfo class and move its three fields (fileFormat, filePath, fileNumber) into properties and _contents. The namespace is 'm21FileInfo:'. Add int to ValueType and _convertValue for this support.
  6. Skip fileInfo when writing files of any format.
  7. Setting a metadata value to None removes it from _contents.
  8. Stop deleting title from the results of all() when movementName is the same. Do this in MusicXML parsing instead, since it is MusicXML-specific.
  9. Added 'analyst' and 'proofreader', since RomanText wants them. Add support for these to RomanText.
  10. Remove search of various types of titles from the title property. Add a new property bestTitle that does that. Now clients can get the actual title, or get the best title that exists.
* Metadata's _contents are now keyed by uniqueName (or custom name), not namespaceName (or custom name).
* More review responses:
  1. Delete some duplicative test output.
  2. Use explicit loops to initialize dictionaries in properties.py.
  3. Remove getAllNamedValues() and getAllContributorNamedValues(), replacing them with new options on all(): skipNonContributors, returnPrimitives, and returnSorted.
  4. Tweak titleSummary return code in bestTitle.
  5. (not from review) Remove dead code left over from a bad merge, in hopes it will improve coverage enough.

Co-authored-by: Michael Scott Asato Cuthbert <cuthbert@mit.edu>
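The valueType coercion mentioned in these notes (together with the later DateSingle fallback, int support for file numbers, and storing None without conversion) can be pictured with a small sketch. Everything in it is hypothetical: PropertySketch and convert_value are illustrative names, and datetime.date stands in for music21's own date primitives. It is not music21's actual _convertValue.

```python
from __future__ import annotations

import datetime
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PropertySketch:
    """Hypothetical stand-in for a property description: a name plus its valueType."""
    uniqueName: str
    valueType: type


def convert_value(prop: PropertySketch, value: Any) -> Any:
    """Coerce value to prop.valueType (illustrative only, not music21's code)."""
    # None (and values already of the right type) are stored as-is.
    if value is None or isinstance(value, prop.valueType):
        return value
    if prop.valueType is datetime.date and isinstance(value, str):
        # datetime.date stands in for a date primitive such as DateSingle.
        try:
            return datetime.date.fromisoformat(value)
        except ValueError:
            # Not parseable as a date: hand back the unconverted string.
            return value
    # Everything else (e.g. int for a file number, str for a title)
    # goes through the type's constructor.
    return prop.valueType(value)
```

Falling back to the raw string keeps free-form dates such as "circa 1791" instead of rejecting them.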
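The reworked get/set/add API and its *Custom counterparts, as described in the "Redo the main APIs a bit" note, can be modeled with a toy class. This is a hypothetical sketch, not music21's implementation: MetadataApiSketch, SUPPORTED, and as_list are illustrative names, the registry holds only the four humdrum codes listed in these notes, and later notes in this list replace get() and set() with md[...] access.

```python
from __future__ import annotations

from typing import Any

# Hypothetical registry: uniqueName -> 'namespace:name'. Only the four humdrum
# pairs from the notes are included; a real registry would be much larger.
SUPPORTED = {
    'originalDocumentOwner': 'humdrum:YOO',
    'originalEditor': 'humdrum:YOE',
    'electronicEditor': 'humdrum:EED',
    'electronicEncoder': 'humdrum:ENC',
}
BY_NAMESPACE_NAME = {nsName: uniqueName for uniqueName, nsName in SUPPORTED.items()}


def as_list(value: Any) -> list[Any]:
    """set/add accept either a single item or a list (or tuple) of items."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


class MetadataApiSketch:
    """Toy model of the get/set/add and *Custom APIs (hypothetical)."""

    def __init__(self) -> None:
        self._contents: dict[str, list[Any]] = {}  # supported items, keyed by uniqueName
        self._custom: dict[str, list[Any]] = {}    # custom items, keyed by any string

    @staticmethod
    def _uniqueName(key: str) -> str:
        # Accept a uniqueName or a 'namespace:name'; anything else is a KeyError.
        if key in SUPPORTED:
            return key
        if key in BY_NAMESPACE_NAME:
            return BY_NAMESPACE_NAME[key]
        raise KeyError(key)

    def get(self, key: str) -> tuple[Any, ...]:
        # Always a tuple, possibly empty.
        return tuple(self._contents.get(self._uniqueName(key), ()))

    def set(self, key: str, value: Any) -> None:
        self._contents[self._uniqueName(key)] = as_list(value)

    def add(self, key: str, value: Any) -> None:
        self._contents.setdefault(self._uniqueName(key), []).extend(as_list(value))

    def getCustom(self, name: str) -> tuple[Any, ...]:
        return tuple(self._custom.get(name, ()))

    def setCustom(self, name: str, value: Any) -> None:
        self._custom[name] = as_list(value)

    def addCustom(self, name: str, value: Any) -> None:
        self._custom.setdefault(name, []).extend(as_list(value))
```

Because both key forms resolve to one uniqueName, md.add('originalEditor', ...) and md.add('humdrum:YOE', ...) append to the same list.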
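The __getattr__/__setattr__/__getitem__ arrangement described in these notes can be illustrated with another hypothetical toy; AttributeAccessSketch, its two stand-in names, and the label property are invented for illustration. One plain-Python detail is worth keeping in mind: assignments that the override hands to super().__setattr__ (that is, object.__setattr__) still go through data descriptors, so a property such as label below keeps working. It is a property whose name the override intercepts itself that never sees its setter run, which is why the per-name setters end up inside __setattr__.

```python
from __future__ import annotations

from typing import Any


class AttributeAccessSketch:
    """Hypothetical toy showing the __getattr__/__setattr__ split."""

    _NAMES = frozenset({'title', 'composer'})  # stand-ins for supported uniqueNames

    def __init__(self) -> None:
        # Go straight to object.__setattr__ so our override is not involved.
        object.__setattr__(self, '_contents', {})

    def __getattr__(self, name: str) -> Any:
        # Only reached after normal lookup (instance dict, class, properties) fails.
        if name in AttributeAccessSketch._NAMES:
            values = self._contents.get(name, [])
            return str(values[0]) if values else None
        raise AttributeError(name)

    def __setattr__(self, name: str, value: Any) -> None:
        # Runs for every assignment: handle metadata names here, delegate the rest.
        if name in AttributeAccessSketch._NAMES:
            if value is None:
                self._contents.pop(name, None)  # setting None removes the entry
            elif isinstance(value, (list, tuple)):
                self._contents[name] = list(value)
            else:
                self._contents[name] = [value]
            return
        super().__setattr__(name, value)

    def __getitem__(self, key: str) -> tuple[Any, ...]:
        # Dict-style access returns every stored value as a tuple.
        return tuple(self._contents.get(key, ()))

    @property
    def label(self) -> str:
        return self._contents.get('_label', '')

    @label.setter
    def label(self, value: str) -> None:
        # 'label' is not intercepted, so object.__setattr__ finds this setter.
        self._contents['_label'] = value.upper()
```

Attribute access gives the first value as a string (or None), while md['title'] gives all stored values.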
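Item 1 of the search-performance notes (loop over what is stored rather than over every property that could be stored) amounts to iterating over the contents dictionary itself. The function below is a hypothetical sketch of that idea only; its name, signature, and (uniqueName, value) return format are assumptions and do not reflect the real Metadata.search() return value.

```python
from __future__ import annotations

import re
from typing import Any


def search_sketch(contents: dict[str, list[Any]], query: str,
                  field: str | None = None) -> list[tuple[str, Any]]:
    """Hypothetical case-insensitive regex search over stored metadata.

    Only entries actually present in contents are visited, so the cost
    tracks how much metadata a score has, not how many properties exist.
    """
    pattern = re.compile(query, re.IGNORECASE)
    matches: list[tuple[str, Any]] = []
    for uniqueName, values in contents.items():
        if field is not None and uniqueName != field:
            continue
        for value in values:
            if pattern.search(str(value)):
                matches.append((uniqueName, value))
    return matches
```

A score with two metadata items costs two iterations here, however many property names are supported.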