Metadata Rewrite; DublinCore-support; faster; custom (#1266)
uses setattr and setitem.

----
Squashed commit notes:

* Initial implementation (enough to have folks take a look at intentions).

* Add better comments about the new implementation.

* Some tweaks.

* Modified xmlToM21.py and m21ToXml.py to use the new Metadata APIs.  In the process, added the concept of contributors to the new Metadata.

* Save attempted backward compatibility work before restructuring class Hierarchy.

* I think I have all the backward compatibility in place. __setattr__ just raises an exception (so I can see if I need it anymore).  Minimally tested (using musicxml parser and writer, which I put back the way they had been before, so they needed compatibility).

* Backward compatibility tested/fixed by importing using my converter21 Humdrum importer (producing a very rich set of ExtendedMetadata), and then exporting to musicxml, which depends entirely on backward compatible APIs, so it drops a lot of that on the floor, but gets the backward compatible bits right.

* Support abbreviations and uniqueNames. Store contributor values as Contributor. Fix bug where unset md.title et al were returning 'None' instead of None.

* Add valueType member of PROPERTY to support type coercion during setting/adding of metadata. Do a better job with personal keys (don't attempt to treat them as uniqueNames).  Support 'copyright' as 'dcterm:rights'.

* Structural rework: MetadataBase and ExtendedMetadata are gone, only a much enhanced Metadata is left, with both old (backward compatible) and new APIs in place.  Many other renames, too.

* Significant rework, moving things around to make it more readable, making the public APIs more usable, and modifying MusicXML import/export to make use of that usability.

* pylint, flake8, mypy.

* Regenerate corpus metadata cache, since Metadata's internals are completely different now.  Get all the metadata tests passing.

* mypy/lint.

* pylint/mypy/flake8

* Get the existing tests passing.

* Some doctests and some fixes.

* pylint

* Oops, pylinting broke the test results.

* More tests, more fixes.

* More metadata testing.

* More testing. A little fixing.

* Comment out the unused (except during development testing) code in the musicxml converter.

* Catch up all the new code with the whole "import typing as t" thing.

* Fix tests that return None.  Fix "A guide to this new...".

* Fix up the merge.

* Regenerate the metadata cache (again, because there were changes on master).

* Rename the property description stuff, and move the property list to its own file: properties.py.

* pylint, mypy, flake8

* Lots of renames.

* Make getAll(...) return a Tuple, not a List.

* Tuples and Lists are declared a bit differently (yay, mypy).

* Get rid of 'music21' namespace, replace with new 'humdrum' namespace. Increment music21 version, since cached parsed files need to be reparsed to produce metadata with the new namespace.

* pylint complained about lines too long.

* Fix a couple tests I missed.

* Redo the main APIs a bit:
We had get/getAll/set/setAll/add/addAll, and now we have get/getFirst/set/add,
where get always returns a tuple, getFirst returns only the first item, and
set/add accept either a single item or a list of items. The "custom" APIs are
now getCustom/getFirstCustom/setCustom/addCustom, similarly.
These routines no longer take an optional namespace parameter.  They just take
a key: str parameter.  That key can be a 'uniqueName' or a 'namespace:name' (and
for the custom routines it can, of course, be anything).  New behaviour: if you
pass an unsupported uniqueName or namespace:name to the non-custom APIs,
KeyError is raised.  This feels much cleaner.

* pylint

* Fix the musicxml import/export tests.

* "Modernize" MEI import's use of metadata APIs.  Other cleanup (mostly comments).

* A little cleanup of unused PropertyDescription fields and associated code, plus some docs improvement.

* If string is not parseable as DateSingle, just return the unconverted str/Text object.

* pylint

* humdrum/spineParser.py now uses new metadata APIs, and imports _all_ the metadata.

* Rewrite __getattr__ to support any uniqueName, workId, or workId abbreviation.
Always return str(first) or None.

* Add __getitem__ a.k.a. dict-key-access for uniqueName and namespace:name.  Also stop returning 'None': strict backward compatibility is not important enough to have a list of attributes that used to do that.

* flake8

* Oops.  Return 'MULTIPLE' from __getattr__ if there is more than one item.

* More cleanup.  Make copyright property work like all the others (returning str). It's not backward compatible anymore, but is consistent.

* Add 'subtitle' ('mei:subtitle') property term.  Many metadata improvements to converters:
Braille writer handles contributors.
MEI parser handles multiple titles, multiple subtitles, and all types of contributor roles.
MusicXML parser/writer now use the new md[name] accessor.
Oh, and Contributor.__str__ has been added, returning Contributor.name (or None).

* Test the new accessors.

* Enhance/fix ABC and Lilypond metadata processing to use the new metadata APIs.

* Remove md.get(), md.getFirst(), and md.getFirstCustom().

* Implement __setitem__ (replaces set()), and __setattr__ (adds property-style setters for all uniqueNames, workIds, and workId abbreviations).
Note that __setattr__ overrides differently than __getattr__:  __getattr__ is the last thing called after all other ways of getting attributes (bare attributes, property getters) have been exhausted. __setattr__ is the only thing called if you implement it, so we have to call super().__setattr__ for attributes we do not handle.  BUT: a property setter is never reached for any name that __setattr__ handles itself.  So you have to remove those setters, and implement them all in __setattr__.

* RomanText parsing now produces multiple titles etc if seen.  Fixed some comments elsewhere.

* More coverage (and a bugfix or two) in metadata/primitives.py.

* Change md.all() and md.search() and md.contributors to include all the new stuff as well (previously was super-backward-compatible, only returning the list it used to).  all() and search() now also deal with multiple non-contributor metadata items.  Some cleanup and test fixes, too.

* Simplification and cleanup.  Had to regenerate metadata cache. Fixed a bug I introduced in ABC.

* Get past the new pylint restrictions (shakes fist at jacobtylerwalls :-).

* Respond to various review comments:
Remove 'subtitle' for now.
Fixes some docs/doctests (much more to come).
spineParser.py: updateMetadata always succeeds now.
Revert MEI parser improvements (there are still some necessary changes).
Renamed some things for screenreader-friendliness.
Attributes always return strings (and rename *simpleValue* to *stringValue*).
Allow setting fileInfo fields to None without type conversion.
Check attribute names before setting from __init__().

* Metadata attributes return constructed strings instead of 'MULTIPLE'.

* Fix docs/doctests. Use new ValueType instead of t.Any.

* More review feedback incorporated:
lilypond export is back to using md.title et al.
Return tuples, not lists.
Add four new humdrum codes as fully supported, so the humdrum parser can use them:
'humdrum:YOO' == 'originalDocumentOwner'
'humdrum:YOE' == 'originalEditor'
'humdrum:EED' == 'electronicEditor'
'humdrum:ENC' == 'electronicEncoder'

* properties.py: first cut at removals, set oldMusic21WorkId only if necessary. Clean up long lists in doctests/descriptions. Remove inaccurate romanText comment.

* Cleanup testMetadata.py (and add some new tests).  Fix class Text to never put Text in self._data.

* Oops, lost an f''.

* Better indenting in testMetadata.py, replace multiple checks in __setattr__ with one, Metadata._metadata is now Metadata._contents

* Put Metadata.date and Metadata.setWorkId back in place, deprecated.

* Organized public vs private APIs a bit, and cleaned up some of those choices.  More type hinting.

* Add some imprint-related metadata terms (from the humdrum namespace).

* Remove 'performer' from spineParser.py, since I removed it from metadata/properties.py.

* More review feedback incorporated.

* Store the software list in metadata._contents.

* Fix test failures.

* Big name change (in code and docs) from nsKey to namespaceName.

* Added and improved docs/doctests throughout.

* Fix small notes; give credit

* tiny change noted in docs

* Improve search performance (and related work, and a few random fixes).  Details below:
    1. In search() loop over what is there, rather than over the list of things that could be there (that list is much longer now, and will continue to grow).
    2. Do the same optimization in all().
    3. Remove searchAttributes and allUniqueNames, since we no longer use them. listSearchAttributes still works, though.
    4. RichMetadata now has derived implementations of a few routines, since it can no longer count on searchAttributes being longer to get the job done.
    5. Remove FileInfo class and move the three fields (fileFormat, filePath, fileNumber) into properties and _contents.  Namespace is 'm21FileInfo:'. Add int to ValueType and _convertValue for this support.
    6. Skip fileInfo when writing files of any format.
    7. Setting a metadata value to None removes it from _contents.
    8. Stop deleting title from the results of all() when movementName is the same. Do this in MusicXML parsing instead, since it is MusicXML-specific.
    9. Added 'analyst' and 'proofreader', since RomanText wants them.  Add support for these to RomanText.
   10. Remove search of various types of titles from title property.  Add new property bestTitle that does that. Now clients can get the actual title, or get the best title that exists.

* Metadata's _contents are now keyed by uniqueName (or custom name), not namespaceName (or custom name).

* More review responses:
1. Delete some duplicative test output.
2. Use explicit loops to initialize dictionaries in properties.py.
3. Remove getAllNamedValues() and getAllContributorNamedValues(), replacing
   them with new options on all(): skipNonContributors, returnPrimitives, and
   returnSorted.
4. Tweak titleSummary return code in bestTitle.
5. (not from review) Remove dead code left over from a bad merge, in hopes
   it will improve coverage enough.

Co-authored-by: Michael Scott Asato Cuthbert <cuthbert@mit.edu>
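
The asymmetry between __getattr__ and __setattr__ described in the notes above can be sketched in plain Python. This is only an illustration: the Record class, its _managed names, and its label property are hypothetical stand-ins and are not taken from music21's Metadata.

```python
class Record:
    """Toy container; a hypothetical stand-in, not music21's Metadata."""

    # Hypothetical names that are stored in _contents instead of __dict__.
    _managed = ('title', 'composer')

    def __init__(self):
        # Use object.__setattr__ so the backing dict is created directly.
        object.__setattr__(self, '_contents', {})

    @property
    def label(self):
        return self.__dict__.get('_label')

    @label.setter
    def label(self, value):
        # Writes straight into __dict__, so __setattr__ is not re-entered.
        self.__dict__['_label'] = value.upper()

    def __getattr__(self, name):
        # Reached only after normal lookup (instance dict, class, properties) fails.
        if name in self._managed:
            values = self._contents.get(name)
            return values[0] if values else None
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Reached for every assignment, before any property setter.
        if name in self._managed:
            self._contents[name] = [value]
        else:
            # Everything else goes through the default machinery.
            super().__setattr__(name, value)


r = Record()
r.title = 'Magnificat'   # intercepted by __setattr__, stored in _contents
r.label = 'draft'        # delegated; the property setter still runs
r.other = 1              # delegated; lands in the instance dict
```

In this sketch, names delegated to super().__setattr__ still go through the normal descriptor machinery, so the label setter fires. Only the names that the override handles itself skip any property setter, which is why setters for those names have to live inside __setattr__.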
gregchapman-dev and mscuthbert authored Aug 5, 2022
1 parent d782d4d commit 4d48584
Showing 23 changed files with 4,406 additions and 896 deletions.
2 changes: 1 addition & 1 deletion music21/_version.py
@@ -42,7 +42,7 @@
Changing this number invalidates old pickles -- do it if the old pickles create a problem.
'''

__version_info__ = (8, 0, 0, 'a3') # can be 4-tuple: (7, 0, 5, 'a2')
__version_info__ = (8, 0, 0, 'a9') # can be 4-tuple: (7, 0, 5, 'a2')

v = '.'.join(str(x) for x in __version_info__[0:3])
if len(__version_info__) > 3 and __version_info__[3]: # type: ignore
20 changes: 11 additions & 9 deletions music21/abcFormat/translate.py
@@ -402,24 +402,26 @@ def abcToStreamScore(abcHandler, inputM21=None):
if isinstance(t, abcFormat.ABCMetadata):
if t.isTitle():
if titleCount == 0: # first
md.title = t.data
# environLocal.printDebug(['got metadata title', md.title])
md.add('title', t.data)
# environLocal.printDebug(['got metadata title', t.data])
titleCount += 1
# all other titles go in alternative field
else:
md.alternativeTitle = t.data
# environLocal.printDebug(['got alternative title', md.alternativeTitle])
md.add('alternativeTitle', t.data)
# environLocal.printDebug(['got alternative title', t.data])
titleCount += 1

elif t.isComposer():
md.composer = t.data
md.add('composer', t.data)
# environLocal.printDebug(['got composer', t.data])

elif t.isOrigin():
md.localeOfComposition = t.data
# environLocal.printDebug(['got local of composition', md.localOfComposition])
md.add('localeOfComposition', t.data)
# environLocal.printDebug(['got locale of composition', t.data])

elif t.isReferenceNumber():
md.number = int(t.data) # convert to int?
# environLocal.printDebug(['got work number', md.number])
md.add('number', int(t.data))
# environLocal.printDebug(['got work number', t.data])

partHandlers = []
tokenCollections = abcHandler.splitByVoice()
2 changes: 1 addition & 1 deletion music21/base.py
@@ -28,7 +28,7 @@
<class 'music21.base.Music21Object'>
>>> music21.VERSION_STR
'8.0.0a3'
'8.0.0a9'
Alternatively, after doing a complete import, these classes are available
under the module "base":
3 changes: 2 additions & 1 deletion music21/braille/examples.py
@@ -1176,7 +1176,8 @@ def testVoices(self):

demo = corpus.parse('demos/two-voices')
x = objectToBraille(demo, debug=True)
y = '''Movement Name: two-voices.xml
y = '''Composer: Music21
Movement Name: two-voices.xml
Title: Music21 Fragment
---begin segment---
<music21.braille.segment BrailleSegment>
39 changes: 28 additions & 11 deletions music21/braille/translate.py
@@ -472,24 +472,41 @@ def metadataToString(music21Metadata, returnBrailleUnicode=False):
<class 'music21.metadata.Metadata'>
>>> print(translate.metadataToString(mdObject))
Alternative Title: 3.1
Composer: Claudio Monteverdi
Title: La Giovinetta Pianta
>>> print(translate.metadataToString(mdObject, returnBrailleUnicode=True))
⠠⠁⠇⠞⠑⠗⠝⠁⠞⠊⠧⠑⠀⠠⠞⠊⠞⠇⠑⠒⠀⠼⠉⠲⠁
⠠⠉⠕⠍⠏⠕⠎⠑⠗⠒⠀⠠⠉⠇⠁⠥⠙⠊⠕⠀⠠⠍⠕⠝⠞⠑⠧⠑⠗⠙⠊
⠠⠞⠊⠞⠇⠑⠒⠀⠠⠇⠁⠀⠠⠛⠊⠕⠧⠊⠝⠑⠞⠞⠁⠀⠠⠏⠊⠁⠝⠞⠁
'''
allBrailleLines = []
for key in music21Metadata._workIds:
value = music21Metadata._workIds[key]
if value is not None:
n = ' '.join(re.findall(r'([A-Z]*[a-z]+)', key))
outString = f'{n.title()}: {value}'
if returnBrailleUnicode:
outTemp = []
for word in outString.split():
outTemp.append(wordToBraille(word))
outString = alphabet[' '].join(outTemp)
allBrailleLines.append(outString)
for uniqueName, value in music21Metadata.all(returnPrimitives=True, returnSorted=False):
if value is None:
# we don't put None values in braille output
continue

if uniqueName == 'software':
# we don't put software versions in braille output
continue

namespaceName: t.Optional[str] = music21Metadata.uniqueNameToNamespaceName(uniqueName)
if not namespaceName:
# we don't put custom metadata in braille output
continue

if namespaceName.startswith('m21FileInfo:'):
# we don't put fileInfo in braille output
continue

n = ' '.join(re.findall(r'([A-Z]*[a-z]+)', uniqueName))
outString = f'{n.title()}: {value}'
if returnBrailleUnicode:
outTemp = []
for word in outString.split():
outTemp.append(wordToBraille(word))
outString = alphabet[' '].join(outTemp)
allBrailleLines.append(outString)
return '\n'.join(sorted(allBrailleLines))
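
The label on each output line comes from splitting the camelCase uniqueName with the regular expression in the hunk above and title-casing the pieces. A minimal standalone sketch of that step follows. The helper name unique_name_to_label is hypothetical; only the regex and the .title() call come from the diff.

```python
import re


def unique_name_to_label(unique_name: str) -> str:
    """Turn a camelCase name such as 'alternativeTitle' into 'Alternative Title'.

    Hypothetical helper; it reuses the regex from metadataToString.
    """
    # Each match is an optional run of capitals followed by lowercase letters.
    words = re.findall(r'([A-Z]*[a-z]+)', unique_name)
    return ' '.join(words).title()


labels = [unique_name_to_label(n) for n in ('alternativeTitle', 'composer', 'movementName')]
```

Because every match must end in at least one lowercase letter, digits and uppercase-only runs in a name are dropped by this pattern.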


2 changes: 1 addition & 1 deletion music21/converter/__init__.py
@@ -1732,7 +1732,7 @@ def testConversionMXMetadata(self):
a = parse(testFiles.binchoisMagnificat)
self.assertEqual(a.metadata.composer, 'Gilles Binchois')
# this gets the best title available, even though this is movement title
self.assertEqual(a.metadata.title, 'Excerpt from Magnificat secundi toni')
self.assertEqual(a.metadata.bestTitle, 'Excerpt from Magnificat secundi toni')
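
The test change above reflects the split described in the commit notes: title now returns only the actual title, while bestTitle falls back to other title-like fields. A rough sketch of such a fallback, assuming a plain dict keyed by uniqueName. The function best_title and the search order shown are illustrative assumptions, not taken from the music21 source.

```python
from typing import Dict, Optional, Sequence, Tuple

# Assumed fallback order, for illustration only.
ASSUMED_TITLE_ORDER: Tuple[str, ...] = (
    'title', 'popularTitle', 'alternativeTitle', 'movementName',
)


def best_title(contents: Dict[str, Sequence[str]],
               search_order: Sequence[str] = ASSUMED_TITLE_ORDER) -> Optional[str]:
    """Return the first available title-like value, or None if there is none."""
    for name in search_order:
        values = contents.get(name)
        if values:
            return str(values[0])
    return None


# A score that carries only a movement name still yields a usable title.
only_movement = {'movementName': ('Excerpt from Magnificat secundi toni',)}
found = best_title(only_movement)
```

With this shape, a client that needs the literal title reads the 'title' entry directly, and a client that wants something to display calls the fallback.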

def testConversionMXBarlines(self):
from music21 import bar
2 changes: 1 addition & 1 deletion music21/converter/subConverters.py
@@ -818,7 +818,7 @@ def parseFile(self,
filePath: t.Union[pathlib.Path, str],
number: t.Optional[int] = None,
**keywords):
# noinspection SpellCheckingInspection
# noinspection SpellCheckingInspection,PyShadowingNames
'''
Open Noteworthy data (as nwctxt) from a file path.
Binary file modified music21/corpus/_metadataCache/core.p.gz
Binary file not shown.
32 changes: 24 additions & 8 deletions music21/corpus/manager.py
@@ -362,17 +362,33 @@ def listSearchFields():
>>> for field in corpus.manager.listSearchFields():
... field
...
'actNumber'
'alternativeTitle'
'ambitus'
'associatedWork'
'collectionDesignation'
'commission'
'abstract'
'accessRights'
'accompanyingMaterialWriter'
...
'composer'
'copyright'
'composerAlias'
'composerCorporate'
'conceptor'
'conductor'
...
'dateCreated'
'dateFirstPublished'
'dateIssued'
'dateModified'
'dateSubmitted'
'dateValid'
...
'tempoFirst'
'tempos'
'textLanguage'
'textOriginalLanguage'
'timeSignatureFirst'
'timeSignatures'
'title'
...
'''
return tuple(sorted(metadata.RichMetadata.searchAttributes))
return metadata.bundles.MetadataBundle.listSearchFields()

# -----------------------------------------------------------------------------

156 changes: 120 additions & 36 deletions music21/humdrum/spineParser.py
@@ -776,9 +776,8 @@ def parseMetadata(self, s=None):
grToRemove = []

for gr in s[GlobalReference]:
wasParsed = gr.updateMetadata(md)
if wasParsed:
grToRemove.append(gr)
gr.updateMetadata(md)
grToRemove.append(gr)

if grToRemove:
s.remove(grToRemove, recurse=True)
@@ -2727,6 +2726,116 @@ def __init__(self, codeOrAll='', valueOrNone=None):
if '@' in self.code:
self.code, self.language = self.code.split('@')

humdrumKeyToUniqueName: dict = {
# dict value is music21's unique name or '' (if there is no supported equivalent)
# Authorship information:
'COM': 'composer', # composer's name
'COA': 'attributedComposer', # attributed composer
'COS': 'suspectedComposer', # suspected composer
'COL': 'composerAlias', # composer's abbreviated, alias, or stage name
'COC': 'composerCorporate', # composer's corporate name
'CDT': '', # composer's birth and death dates (**zeit format)
'CBL': '', # composer's birth location
'CDL': '', # composer's death location
'CNT': '', # composer's nationality
'LYR': 'lyricist', # lyricist's name
'LIB': 'librettist', # librettist's name
'LAR': 'arranger', # music arranger's name
'LOR': 'orchestrator', # orchestrator's name
'TXO': 'textOriginalLanguage', # original language of vocal/choral text
'TXL': 'textLanguage', # language of the encoded vocal/choral text
# Recording information (if the Humdrum encodes information pertaining
# to an audio recording)
'TRN': 'translator', # translator of the text
'RTL': '', # album title
'RMM': 'manufacturer', # manufacturer or sponsoring company
'RC#': '', # recording company's catalog number of album
'RRD': 'dateIssued', # release date (**date format)
'RLC': '', # place of recording
'RNP': 'producer', # producer's name
'RDT': '', # date of recording (**date format)
'RT#': '', # track number
# Performance information (if the Humdrum encodes, say, a MIDI performance)
'MGN': '', # ensemble's name
'MPN': '', # performer's name
'MPS': '', # suspected performer
'MRD': '', # date of performance (**date format)
'MLC': '', # place of performance
'MCN': 'conductor', # conductor's name
'MPD': '', # date of first performance (**date format)
'MDT': '', # unknown, but I've seen 'em (another way to say date of performance?)
# Work identification information
'OTL': 'title', # title
'OTP': 'popularTitle', # popular title
'OTA': 'alternativeTitle', # alternative title
'OPR': 'parentTitle', # title of parent work
'OAC': 'actNumber', # act number (e.g. '2' or 'Act 2')
'OSC': 'sceneNumber', # scene number (e.g. '3' or 'Scene 3')
'OMV': 'movementNumber', # movement number (e.g. '4', or 'mov. 4', or...)
'OMD': 'movementName', # movement name
'OPS': 'opusNumber', # opus number (e.g. '23', or 'Opus 23')
'ONM': 'number', # number (e.g. number of song within ABC multi-song file)
'OVM': 'volumeNumber', # volume number (e.g. '6' or 'Vol. 6')
'ODE': 'dedicatedTo', # dedicated to
'OCO': 'commission', # commissioned by
'OCL': 'transcriber', # collected/transcribed by
'ONB': '', # free form note (nota bene) related to title or identity of work
'ODT': 'dateCreated', # date or period of composition (**date or **zeit format)
'OCY': 'countryOfComposition', # country of composition
'OPC': 'localeOfComposition', # city, town, or village of composition
# Group information
'GTL': 'groupTitle', # group title (e.g. 'The Seasons')
'GAW': 'associatedWork', # associated work, such as a play or film
'GCO': 'collectionDesignation', # collection designation (e.g. 'Norton Scores')
# Imprint information
'PUB': '', # publication status 'published'/'unpublished'
'PED': '', # publication editor
'PPR': 'firstPublisher', # first publisher
'PDT': 'dateFirstPublished', # date first published (**date format)
'PTL': 'publicationTitle', # publication (volume) title
'PPP': 'placeFirstPublished', # place first published
'PC#': 'publishersCatalogNumber', # publisher's catalog number (NOT scholarly catalog)
'SCT': 'scholarlyCatalogAbbreviation', # scholarly catalog abbrev/number (e.g. 'BWV 551')
'SCA': 'scholarlyCatalogName', # scholarly catalog (unabbreviated) (e.g. 'Koechel 117')
'SMS': 'manuscriptSourceName', # unpublished manuscript source name
'SML': 'manuscriptLocation', # unpublished manuscript location
'SMA': 'manuscriptAccessAcknowledgement', # acknowledgment of manuscript access
'YEP': 'electronicPublisher', # publisher of electronic edition
'YEC': 'copyright', # date and owner of electronic copyright
'YER': 'electronicReleaseDate', # date electronic edition released
'YEM': '', # copyright message (e.g. 'All rights reserved')
'YEN': '', # country of copyright
'YOR': '', # original document from which encoded document was prepared
'YOO': 'originalDocumentOwner', # original document owner
'YOY': '', # original copyright year
'YOE': 'originalEditor', # original editor
'EED': 'electronicEditor', # electronic editor
'ENC': 'electronicEncoder', # electronic encoder (person)
'END': '', # encoding date
'EMD': '', # electronic document modification description (one per modification)
'EEV': '', # electronic edition version
'EFL': '', # file number e.g. '1/4' for one of four
'EST': '', # encoding status (free form, normally eliminated prior to distribution)
'VTS': '', # checksum (excluding the VTS line itself)
# Analytic information
'ACO': '', # collection designation
'AFR': '', # form designation
'AGN': '', # genre designation
'AST': '', # style, period, or type of work designation
'AMD': '', # mode classification e.g. '5; Lydian'
'AMT': '', # metric classification, must be one of eight specific names
'AIN': '', # instrumentation; alphabetically ordered list of *I abbrevs, space-delimited
'ARE': '', # geographical region of origin (list of 'narrowing down' names of regions)
'ARL': '', # geographical location of origin (lat/long)
# Historical and background information
'HAO': '', # aural history (lots of text, stories about the work)
'HTX': '', # freeform translation of vocal text
# Representation information
'RLN': '', # Extended ASCII language code
'RNB': '', # a note about the representation
'RWB': '' # a warning about the representation
}

def updateMetadata(self, md):
'''
update a metadata object according to information in this GlobalReference
@@ -2735,41 +2844,16 @@ def updateMetadata(self, md):
'''
c = self.code
v = self.value
wasParsed = True

contributorNames = {
'COM': 'composer',
'COA': 'attributed composer',
'COS': 'suspected composer',
'COL': 'composer alias',
'COC': 'corporate composer',
'LYR': 'lyricist',
'LIB': 'librettist',
'LAR': 'arranger',
'LOR': 'orchestrator',
'TRN': 'translator',
'YOO': 'original document owner',
'YOE': 'original editor',
'EED': 'electronic editor',
'ENC': 'electronic encoder'
}

if c in contributorNames:
contrib = metadata.Contributor()
contrib.role = contributorNames[c]
contrib.name = v
md.addContributor(contrib)

elif c.lower() in md.workIdAbbreviationDict:
md.setWorkId(c, v)

elif c == 'YEC': # electronic edition copyright.
md.copyright = metadata.Copyright(v)

uniqueName: str = self.humdrumKeyToUniqueName.get(c, '')
if uniqueName:
md.add(uniqueName, v)
elif c in self.humdrumKeyToUniqueName:
# it's a humdrum key, but unsupported
md.addCustom('humdrum:' + c, v)
else:
wasParsed = False

return wasParsed
# it's a free-form key
md.addCustom(c, v)

def _reprInternal(self):
return f'{self.code} {self.value!r}'
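
The new updateMetadata routes every reference record one of three ways: supported Humdrum keys go to add() under their uniqueName, known but unsupported keys go to addCustom() with a 'humdrum:' prefix, and anything else is stored as a free-form custom key. The sketch below replays that dispatch without music21. RecordingMetadata is a hypothetical stand-in that only records calls, and the table is a four-entry excerpt of humdrumKeyToUniqueName.

```python
# Excerpt of the table in the diff; '' marks a known but unsupported key.
HUMDRUM_KEY_TO_UNIQUE_NAME = {
    'COM': 'composer',
    'OTL': 'title',
    'CDT': '',
    'RTL': '',
}


class RecordingMetadata:
    """Hypothetical stand-in that records add/addCustom calls."""

    def __init__(self):
        self.standard = []
        self.custom = []

    def add(self, name, value):
        self.standard.append((name, value))

    def addCustom(self, name, value):
        self.custom.append((name, value))


def update_metadata(code, value, md):
    unique_name = HUMDRUM_KEY_TO_UNIQUE_NAME.get(code, '')
    if unique_name:
        # Supported key: store under the uniqueName.
        md.add(unique_name, value)
    elif code in HUMDRUM_KEY_TO_UNIQUE_NAME:
        # Humdrum key without a supported equivalent: keep it, namespaced.
        md.addCustom('humdrum:' + code, value)
    else:
        # Free-form key: keep it as-is.
        md.addCustom(code, value)


md = RecordingMetadata()
update_metadata('COM', 'Gilles Binchois', md)
update_metadata('CDT', '1400-1460', md)
update_metadata('myNote', 'checked', md)
```

Since all three branches store the value, no record is left unparsed, which matches the parseMetadata hunk that now removes every GlobalReference unconditionally.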