Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for matroska / webm #66

Merged
merged 48 commits into from
Jan 29, 2022
Merged

Conversation

darksv
Copy link
Contributor

@darksv darksv commented Nov 11, 2021

Hi! This PR adds an initial support for matroska and webm containers that is intended to close #63. It is far from complete, but I decided to share what I have so far to get some feedback.

@darksv darksv marked this pull request as draft November 11, 2021 14:55
Copy link
Owner

@pdeljanov pdeljanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a quick test by enabling mkv in symphonia-play and I can see it correctly exposing the right number of tracks for the file. This is really fantastic start!

In addition to the issues noted in the review, please also make sure to put yourself as the author in the Cargo.toml file for the demuxer, and add yourself to the CONTRIBUTORS file in the project root.

symphonia-format-mkv/src/ebml.rs Outdated Show resolved Hide resolved
symphonia-core/src/meta.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/ebml.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/ebml.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/ebml.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/ebml.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/element_ids.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/lib.rs Outdated Show resolved Hide resolved
@darksv
Copy link
Contributor Author

darksv commented Nov 17, 2021

I tested it using symphonia-play and it seems to work on a few files already... 😄

symphonia-format-mkv/src/codecs.rs Outdated Show resolved Hide resolved
buffer.push_back((track, reader.read_boxed_slice_exact(frame_size)?));
}
}
_ => unreachable!(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catch-all arm shouldn't be needed since the match covers all cases. If we keep it, then any additions to the Lacing will fall through to this case and cause a panic in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I missed it when introducing an enum.

symphonia-format-mkv/src/lib.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/lib.rs Outdated Show resolved Hide resolved
@pdeljanov
Copy link
Owner

Hey @darksv,

The progress is looking good! Sadly, I haven't had much luck in getting anything playing on my end, but I think it's close. I've noted in the review comments where it's getting stuck on my end.

There are also some bugs that I need to fix as well. Particularly, FlacDecoder doesn't parse extra data and instead expects some CodecParameters fields to be set. This is something I will look that will hopefully get more files playing.

@darksv
Copy link
Contributor Author

darksv commented Nov 18, 2021

I added vorbis and remaining AAC codecs. Also fixed two off-by-one errors in extract_frames

@pdeljanov
Copy link
Owner

Having very good results with the latest set of changes! Many of the MKVs I have with Vorbis or AAC audio tracks are working pretty flawlessly, even ones with multiple tracks. Lot's of debug logs though 😅 .

I'll push a commit into the main branch that fixes the issue with FLAC tracks in the next day or so.

@pdeljanov
Copy link
Owner

I've fixed the issue with FlacDecoder, however I need some support in MkvReader to come to the optimal solution.

For the optimal solution, the extra data for FLAC should just contain the Stream Info block. However, the codec private data provided by MKV contains many of the FLAC metadata blocks which are useless for the decoder and may bloat the extra data substantially if there's stuff like album art embedded into the file.

Could you make it so that only the Stream Info block is provided in the extra data?

The following code should extract the Stream Info block from the MKV codec private data which you can then add to the CodecParameters.

use symphonia_utils_xiph::flac::metadata::{MetadataBlockHeader, MetadataBlockType};

fn get_stream_info_from_codec_private(codec_private: &[u8]) -> Result<Box<[u8]>> {
    let mut reader = BufReader::new(codec_private);

    let marker = reader.read_quad_bytes()?;

    if marker != *b"fLaC" {
        return unsupported_error("mkv (flac): missing flac stream marker");
    }

    let header = MetadataBlockHeader::read(&mut reader)?;

    loop {
        match header.block_type {
            MetadataBlockType::StreamInfo => {
                break Ok(reader.read_boxed_slice_exact(header.block_len as usize)?);
            }
            _ => reader.ignore_bytes(u64::from(header.block_len))?,
        }
    }
}

@darksv
Copy link
Contributor Author

darksv commented Nov 20, 2021

I've just pushed the commit with this change.

@pdeljanov
Copy link
Owner

I pushed the changes required for FLAC to work. It should be good the next time you rebase.

@pdeljanov
Copy link
Owner

Hey @darksv,

Let me know when you think this may be ready for an initial merge. I've been a bit busy with life and other bug reports lately so I haven't been able to keep up with these changes.

When you think it may be good to go I can do a more thorough review and suggest changes to get it ready for merging.

@darksv
Copy link
Contributor Author

darksv commented Jan 9, 2022

@pdeljanov I think it might be ready for a more detailed review. Right now it still has some rough edges, but it seems to work with the files I tested.

CRC validation is not implemented yet and I need to improve handling of broken files. I believe it could be done later, after merging this PR.

@pdeljanov
Copy link
Owner

pdeljanov commented Jan 9, 2022

Hey @darksv,

First off, thank you for your hard work on this. I'd love to publish this in the upcoming version v0.5.0 release.

Aside from the review, before I can merge this there are a few issues that need to be resolved:

  1. With larger files, it takes a very long time before playback starts. This doesn't occur when I provide the file via. standard input. I'm guessing the reader is scanning all the clusters if the input is seekable? Can we avoid that or make it more optimized?
  2. The code must compile without any warnings. You can use #[allow(dead_code)] on structs with fields you are planning to use later. Prefixing unused variables with _ is also acceptable.
  3. There's a lot of "Ignored Element" warnings (10000+, it exceeded my terminal's scrollback). Particularly CueRelativePosition, CueDuration, TagLanguage, and TagDefault. These should be debug level logs to avoid spamming the end-user's logs. Just in general, warnings should only be used for things that have a good chance of impacting the user.
  4. Not really an issue, but please fix any clippy lints.

While you look into those, I'll start working on a thorough review.

Thanks again!


EDIT: Don't worry about the rustfmt failure. Symphonia's rustfmt.toml uses unstable features so it requires the nightly toolchain. We can format the project right before we merge.

@darksv
Copy link
Contributor Author

darksv commented Jan 9, 2022

I've fixed all the warnings, including clippy lints. Also changed warnings about ignored elements to debug messages.

With larger files, it takes a very long time before playback starts. This doesn't occur when I provide the file via. standard input. I'm guessing the reader is scanning all the clusters if the input is seekable? Can we avoid that or make it more optimized?

Thanks for pointing that out. I will work on this soon.

Copy link
Owner

@pdeljanov pdeljanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably going to have to do a second round, there's a lot of code here! 😅

Just some general comments:

  1. In the current definition of the API, any error returned by FormatReaderis basically fatal to demuxing (aside from ResetRequired). Please make sure that if you return an error that it's not possible to continue by skipping/ignoring the damaged element, cluster, or track. Of course, there is balance to be had here. We shouldn't complicate the code to try and handle completely messed up files, but if it's possible to be more tolerant of damaged files we should try. I made comments in a few locations where I think this could be done.

  2. Since Symphonia is a library, we should not do anything that can panic (e.g., assert, unwrap, etc.). Anything fatal should return an error. I've added comments where I think this may be a problem. Using debug_assert should be fine, but still should be reserved for truly exceptional circumstances.

Thanks!

symphonia-format-mkv/src/segment.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/demuxer.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/demuxer.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/demuxer.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/demuxer.rs Outdated Show resolved Hide resolved
symphonia-format-mkv/src/ebml.rs Show resolved Hide resolved
header
}
};
assert_eq!(header.etype, E::ID, "reading invalid element");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure this would never occur in production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this function so it doesn't read the EBML header implicitly and improved messages inside the assertions.

// TODO: ignore crc for now
continue;
}
assert_eq!(header.etype, E::ID);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure this would never occur in production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with just a log::warn

symphonia-format-mkv/src/ebml.rs Show resolved Hide resolved
Some((ty, _)) => {
assert_eq!(header.data_pos, self.reader.pos(), "invalid stream position");
if let (Some(cur), Some(end)) = (self.current, self.end) {
assert!(cur.pos + cur.len <= end, "invalid stream position");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced this assertion with a DecodeError

@pdeljanov pdeljanov added this to the v0.5 milestone Jan 11, 2022
@pdeljanov
Copy link
Owner

Found a couple more issues:

  1. On end-of-stream I get a DecodeError rather than an IoError.

  2. I've run into some MKVs that don't play at all. I get the following error instead:

    ERROR symphonia_play > file not supported. reason? end of stream

    It's kind of weird because the file belongs to a series and atleast some of the other episodes play fine. The file is pretty large so not sure if I can share it with you. Let me know if there's any information I can provide.

@pdeljanov
Copy link
Owner

  1. I've run into some MKVs that don't play at all. I get the following error instead:

    ERROR symphonia_play > file not supported. reason? end of stream

    It's kind of weird because the file belongs to a series and atleast some of the other episodes play fine. The file is pretty large so not sure if I can share it with you. Let me know if there's any information I can provide.

Oh, funny, after your changes, I now get ERROR symphonia_play > file not supported. reason? malformed stream: mkv: too long element ID for this file.

@darksv
Copy link
Contributor Author

darksv commented Jan 17, 2022

Sorry, last week I wasn't able to work on this. I'll try to get back to fixing remaining issues today.

Oh, funny, after your changes, I now get ERROR symphonia_play > file not supported. reason? malformed stream: mkv: too long element ID for this file.

Interesting... I'll look into it. This probably should be fixed by skipping data in the stream until a valid tag is found.

@darksv
Copy link
Contributor Author

darksv commented Jan 17, 2022

@pdeljanov I've pushed some changes with handling of clusters with unknown sizes. Let me know if this fixes your issue.

@pdeljanov
Copy link
Owner

Hi @darksv,

No problem, take your time!

I get a panic instead of an error now. Please see the backtrace below:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `562413392`,
 right: `562426428`: invalid position', symphonia-format-mkv/src/ebml.rs:286:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/panicking.rs:107:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/panicking.rs:145:5
   4: symphonia_format_mkv::ebml::ElementIterator<R>::read_header_no_consume
             at ./symphonia-format-mkv/src/ebml.rs:286:9
   5: symphonia_format_mkv::ebml::ElementIterator<R>::read_header
             at ./symphonia-format-mkv/src/ebml.rs:249:22
   6: <symphonia_format_mkv::segment::ClusterElement as symphonia_format_mkv::ebml::Element>::read
             at ./symphonia-format-mkv/src/segment.rs:490:34
   7: symphonia_format_mkv::ebml::ElementIterator<R>::read_element_data
             at ./symphonia-format-mkv/src/ebml.rs:310:23
   8: <symphonia_format_mkv::demuxer::MkvReader as symphonia_core::formats::FormatReader>::try_new
             at ./symphonia-format-mkv/src/demuxer.rs:333:35
   9: <symphonia_format_mkv::demuxer::MkvReader as symphonia_core::probe::QueryDescriptor>::query::{{closure}}
             at ./symphonia-core/src/probe.rs:351:65
  10: core::ops::function::FnOnce::call_once
             at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/ops/function.rs:227:5
  11: symphonia_core::probe::Probe::format
             at ./symphonia-core/src/probe.rs:317:34
  12: symphonia_play::main
             at ./symphonia-play/src/main.rs:129:11
  13: core::ops::function::FnOnce::call_once
             at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@darksv
Copy link
Contributor Author

darksv commented Jan 22, 2022

@pdeljanov Please check whether recent changes fix this.

@pdeljanov
Copy link
Owner

pdeljanov commented Jan 23, 2022

Nice work @darksv, the file plays now!

@pdeljanov
Copy link
Owner

Hi @darksv,

I think for the most part all the comments have been addressed. There are two issues though:

  1. It's very slow to open a file (for me it's taking ~30 seconds for a moderately sized video).
  2. On end-of-stream the demuxer returns a DecodeError rather than an IoError.

Are those going to be easy to fix?

I'd like to publish v0.5 early next week. However, if you think it would take longer than that to fix those issues, then we can target v0.5.1 for MKV support instead. I feel the first issue may be a show-stopper for many users so it probably wouldn't make sense to include it in v0.5 as-is.

@darksv darksv changed the title [WIP] Add initial support for matroska / webm Add initial support for matroska / webm Jan 29, 2022
@darksv darksv marked this pull request as ready for review January 29, 2022 07:31
@pdeljanov pdeljanov merged commit 9780210 into pdeljanov:master Jan 29, 2022
@pdeljanov
Copy link
Owner

Thank you @darksv! I, and I'm sure many other people, appreciate the work you've put into implementing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Matroska / WebM support
2 participants