Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser Rebuild #9

Draft
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

reinecke
Copy link
Member

@reinecke reinecke commented Dec 11, 2023

Overview

EDL can be a tricky format to get right in the details - our original parser worked well in controlled situations, but also had issues like not adopting global_start_time support, not fully baked transition support, and brittleness when sources were in frame rates that didn't match the timeline rate.

This refactor separates the reader code into two separate disciplines:

  1. A parser that yields an edl statement per line with parsed fields but does not resolve timing and represents the facts of exactly what's in the EDL as closely as possible.
  2. The Reader which interprets the statements grouped by event into their resolved timing, converts them to OpenTimelineIO schema objects, and applies them to a timeline

Changes and Enhancements

Adapter Args

The adapter has been updated with new controls on parsing behavior based on our more current knowledge of the format:

  • ignore_timecode_mismatch is still present, but is always hard-wired True and raises DeprecationWarning when used. Will be removed in a future release.
  • (NEW!) ignore_invalid_timecode_errors - When a timecode can't be interpreted due to something like Frame rate mismatch. Timecode '19:13:21:28' has frames beyond 23, the start timecode in the source will be rounded up to zero frames in the next second. This helps address issues that can occur from doing things like using a 30 fps source in a 24 fps timeline without retimes.
  • (NEW!) discard_unsupported_events - enables a mode where events that can't be interpreted (like KEY edits) are skipped and the event is noted in timeline.metadata["cmx_3600"]["unsupported_events"].

Note that setting ignore_invalid_timecode_errors and discard_unsupported_events to False (the default) will preserve the old behavior of erroring when correctness can't be trusted.

Parsing

TODOs

  • Write tests around specific cited issues
  • Update README documentation with new adapter args
  • Flesh out docstrings in code
  • Evaluate whether or not we should cull out parts of code targeted at SMPTE 258M EDL
  • Evaluate if using a duration comparison based heuristic (interpret source timecode at all the known timecode rates until we find a source duration that is very close to the timeline duration) we can reliably infer mismatched source rates where we're currently using the ignore_invalid_timecode_errors behavior

Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
…from_file

Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
… other than a comment

Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
…ead of floats

Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
… given before clip parsing. This better flags non-edl files

Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
…a transition event

Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
@reinecke reinecke force-pushed the feature/parser-rebuild branch from 1ddd739 to 5e86a53 Compare December 11, 2023 23:27
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Signed-off-by: Eric Reinecke <ereinecke@netflix.com>
Copy link
Contributor

@douglascomet douglascomet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a significant improvement over the previous implementation and probably get merged as-is, especially since you mentioned this is already being used in production at Netflix.

Most of my comments are questions on your approach or highlighting small fixes. However, a docstring pass would be greatly appreciated before being merged.

Admittedly the cmx_3600_reader module is very dense. Like several operations are done in a single line or in a comprehension, which could be difficult to debug.

Also, you mentioned several times about m2 issues. Just to check, are you referring to m2 processors or something else? If it is the processors I'm surprised to here this is an issue to have to work around in the first place.

Comment on lines +193 to +195
elif key == "media_reference":
# These are the comments that will drive the media ref
self.handled[key] = statement.data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why handle this if its identical to the else clause?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair point. I think the idea is to explicitly call-out how we address media_reference so that in future if more needs to happen it already has a place carved out to live.

def parsed_asc_sat(self, sat_data: str) -> float:
return float(sat_data)

def parsed_asc_sop(self, sop_data: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing return type annotation

)
marker.marked_range = otio.opentime.TimeRange(
start_time=marker_time,
duration=opentime.RationalTime(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know we could instantiate a RationalTime without specifying any parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a short-cut for zero time. Arguably this should be changed to:
RationalTime(rate=self.edl_rate) for consistency-sake.

) -> Optional[otio.schema.Marker]:
# An example EDL locator line looks like this:
# * LOC: 01:00:01:14 RED ANIM FIX NEEDED
m = re.match(r"(\d\d:\d\d:\d\d:\d\d)\s+(\w*)(\s+|$)(.*)", statement.data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat of a style preference to suggest using a more descriptive variable name than m.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single-char variables are bad, this is good feedback. For some reason I have this nasty habit with re Match objects.

elif (
statement_identifier is NoteFormStatement.NoteFormIdentifiers.SPLIT
):
# SPLIT notes appy to the following event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Suggested change
# SPLIT notes appy to the following event
# SPLIT notes apply to the following event

Comment on lines +25 to +74
class SourceMode(Enum):
"""
Which source type the edit applies to.
"""

VIDEO = "V"
AUDIO = "A"
BOTH = "B"
AUDIO_2 = "A2"
AUDIO_2_VIDEO = "A2/Y"
AUDIO_1_AUDIO_2 = "AA"
AUDIO_1_AUDIO_2_VIDEO = "AA/V"


class EffectType(Enum):
"""
Type of effect.
"""

# Single source effects
CUT = "C"
SYNC_ROLL = "R"

# Multiple source effects
DISSOLVE = "D"
WIPE = "W"
KEY = "K"
KEY_BACKGROUND = "KB"
KEY_REMOVE = "KO"
MATTE = "M"
FOREGROUND_FILLER = "F"
QUAD_SPLIT = "Q"
NONADDITIVE_MIX = "N"
AUDIO_MIX = "X"


class SpecialSource(Enum):
# BLACK/BL and BARS are called out as "Special Source Identifiers" in
# the documents referenced here:
# https://github.com/AcademySoftwareFoundation/OpenTimelineIO#cmx3600-edl
AUX = "AX"
BLACK = "BLACK"
BARS = "BARS"

@classmethod
def _missing_(cls, value):
if value == "BL":
return cls.BLACK
elif value == "SMPTEBars":
return cls.BARS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, these could be benefit from being represented as StrEnums and get the benefits of strings and enums.

To account for older Python versions, this could be implemented as follows:

try:
    # Python Built-Ins
    from enum import StrEnum
except ImportError:
    class StrEnum(str, enum.Enum):
        """
        Mixin of ``str`` and ``Enum`` to make a compatible implementation of `StrEnum` for Py2.

        Sourced from :
            * https://www.cosmicpython.com/blog/2020-10-27-i-hate-enums.html
            * https://stackoverflow.com/questions/75040733/is-there-a-way-to-use-strenum-in-earlier-python-versions
        """
        def __str__(self):
            # type: () -> str
            return str.__str__(self)

This is used for all well-known note form statements.
"""

class NoteFormIdentifiers(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion of using StrEnum for this.

Comment on lines +7 to +8
line_number: Optional[int] = None
event_number: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why are these defined in the class scope and populated in the instance scope?

Copy link
Member Author

@reinecke reinecke Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been writing too much Java lately 😄 I should pull these "declarations"

# snap the metadata value back to what it originally was
# We could consider overriding the value in the M2 speed with what
# the TC says, but it's probably best to stay true to what the EDL
# was signalling, who knows what TC rounding methods are used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Suggested change
# was signalling, who knows what TC rounding methods are used.
# was signaling, who knows what TC rounding methods are used.

Comment on lines +832 to +834
def __init__(
self, tracks, rate, style, global_start_time=None, reelname_len=8
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add a parameter to specify the zfill of the event number? If I wanted to do a round trip from a 32 EDL from AVID to OTIO back to EDL, the event number would be different between the 2 EDLs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this pass my goal was to not break the writer, but I didn't focus on addressing issues with it. It might be a good idea to collect some issues about things like this the writer could do a lot better (making "dialects" more of a first-class citizen in the implementation would fall in this category too).

Comment on lines +943 to +953
"""
Timeline timing:
- for each clip in event, use rec start for timeline placement
- If a clip has a rec end time different from the start, use it

Clip Timing:
- After the timeline has been resolved and all event statements have been
processed, back calculate source duration through any time warps from the
timeline timing
- The source of truth for source timing is always the Timecode values
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I left a stray "note to self" comment in here. This should be turned into # comments so they aren't part of the docstring or removed if this is already captured piecemeal in the code below.

@reinecke
Copy link
Member Author

@douglascomet Thanks for the feedback!

The M2 issues are with M2 motion effects - see page 20 in the cmx spec.

A docstring pass is a good idea - I need to make sure I share some of the ideas I had behind this stuff to help future us.

The code density stuff is good feedback - we can think about driving sections of this toward being more approachable.

I'll try to carve some time for this to try and get it out in the coming weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants