-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement IAMF decoder #3709
base: main
Are you sure you want to change the base?
Implement IAMF decoder #3709
Conversation
Also clean up logs and reformat decoder.
std::optional<uint32_t> mix_presentation_id_; | ||
|
||
bool has_valid_config_ = false; | ||
bool binaural_mix_presentation_id_ = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are "has_valid_config_" and "binaural_mix_presentation_id_" used?
|
||
int bytes_read = 0; | ||
bool error = true; | ||
while (bytes_read < 128 && buf[bytes_read] != '\0') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying all chars together should have better performance. We can find '\0' first and then copy the needed chars.
|
||
uint32_t codec_id = 0; | ||
std::memcpy(&codec_id, &buf[buffer_head_], sizeof(uint32_t)); | ||
// Mp4 is in big-endian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's refine the comment here. Shall we always swap the bytes of codec id? And do we need to swap the bytes of any other field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked to Kaido offline, and confirmed that Cobalt only supports little endian. As long as this code won't be upstreamed, it's ok to assume that the platform is little endian. However, please comment and DCHECK() somewhere (say in the ctor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a preprocessor check in the ctor
if (kEnableSurroundAudio) { | ||
SbMediaAudioConfiguration out_config; | ||
SbMediaGetAudioConfiguration(0, &out_config); | ||
int channels = std::max(out_config.number_of_channels, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we cap the channels to 2 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only executes if we want to force surround sound - it's 2 channels by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think audio output channels can be any number. We don't have to throw out an error if the decoder can't downmix/upmix the audio to match output channels. For example, if the audio output device supports up to 10 channels and the decoder can support up to 8 output channels, we can set the decoder output channels to 8.
But let's consult with the h5 player team to see their preference.
SB_DLOG(INFO) << "Defaulting to stereo output."; | ||
} | ||
|
||
error = IAMF_decoder_output_layout_set_sound_system(decoder_, sound_system); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SbMediaGetAudioConfiguration() will return the audio output device capabilities. I don't think we should set the output sound system based on the audio output devices directly. For example, if the input is stereo and we have 5.1 audio output system, we'll set output sound system to 5.1 for stereo input here. Is that expected? And what will happen if we do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IAMF decoder will upmix the stereo signal to 5.1, though that may not be desirable. I think we'll need some way for the player to signal to Cobalt if it wants 5.1 or stereo, without changing the audio stream.
For demo purposes I'll need a separate build that forces an SbPlayer to be created with 6 channels, since we can't tell based on the stream if the player expects 6 channels or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's create a bug and consult it with the h5 player team for whether and when they want IAMF decoder to upmix the audio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b/364414955
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The response from player:
In this specific case
Cobalt should make a best-possible educated-guess about the physical speaker layout
And the decoder should be configured according to the output configuration
The number of channels in the stream should be ignored -- a big part of IAMF is being able to up/down mix dynamically as needed
|
||
// Decodes an Leb128 value and stores it in |value|. Returns the number of bytes | ||
// read. Returns -1 on error. | ||
int ReadLeb128Value(const uint8_t* buf, uint32_t* value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have unit tests for this, feel free to implement in a pending PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
case kObuTypeCodecConfig: { | ||
sample_rate_ = 0; | ||
uint32_t codec_config_id; | ||
bytes_read = ReadLeb128Value(&buf[buffer_head_], &codec_config_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to double check if there is logic here to ensure that there are enough bytes for ReadLeb128Value()
to read, instead of reading out of bounds, as we don't pass size remaining to ReadLeb128Value()
.
} | ||
} | ||
|
||
SB_CHECK(completed_parsing); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will crash in production, just want to double check if this is intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
completed_parsing
should be set to true in ReadOBU()
line 229, after the reader finishes parsing the Descriptor OBUs and begins to read the encoded audio data. Though this variable is no longer needed.
const uint8_t* buf = input_buffer->data(); | ||
SB_DCHECK(buf); | ||
|
||
bool completed_parsing = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be indicating that the OBU is the last one, consider naming it something like "is_last_obu", etc..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this variable.
return false; | ||
} | ||
|
||
int next_obu_pos = buffer_head_ + obu_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious that buf
as a parameter and buffer_head_
as a member variable are used together for the parsing head here. Consider refactoring, either by passing buf
and advancing buf
inside ReadOBU()
and ReadOBUHeader()
, or passing buffer_head
as parameter to these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've passed the buffer position as a parameter.
|
||
bool ResetAndRead(scoped_refptr<InputBuffer> input_buffer); | ||
|
||
void Reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
void Reset(); | ||
|
||
bool is_valid() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
uint32_t config_size() { return config_size_; } | ||
// TODO: Allow for selection of multiple mix presentation IDs. Currently, | ||
// only the first mix presentation parsed is selected. | ||
bool has_mix_presentation_id() { return mix_presentation_id_.has_value(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
std::copy(std::begin(input_buffers), std::end(input_buffers), | ||
std::back_inserter(pending_audio_buffers_)); | ||
consumed_cb_ = consumed_cb; | ||
DecodePendingBuffers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's try to make it more straightforward if that's not necessary for IAMF decoder.
SB_DLOG(INFO) << "Defaulting to stereo output."; | ||
} | ||
|
||
error = IAMF_decoder_output_layout_set_sound_system(decoder_, sound_system); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's create a bug and consult it with the h5 player team for whether and when they want IAMF decoder to upmix the audio.
if (kEnableSurroundAudio) { | ||
SbMediaAudioConfiguration out_config; | ||
SbMediaGetAudioConfiguration(0, &out_config); | ||
int channels = std::max(out_config.number_of_channels, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think audio output channels can be any number. We don't have to throw out an error if the decoder can't downmix/upmix the audio to match output channels. For example, if the audio output device supports up to 10 channels and the decoder can support up to 8 output channels, we can set the decoder output channels to 8.
But let's consult with the h5 player team to see their preference.
|
||
SB_DCHECK(samples_decoded <= reader.samples_per_buffer()); | ||
|
||
decoded_audio->ShrinkTo(audio_stream_info_.number_of_channels * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same size as at line 164, so it's redundant.
Does samples_decoded
always equal to reader.samples_per_buffer()
? If so, we can remove the shrinking code and add a SbDCheck() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not always, during a playback of https://www.youtube.com/watch?v=wDy_YFUOfl4 the first buffer contains 648 samples while the rest contain 960
SB_DCHECK(is_valid()); | ||
|
||
if (input_buffer->size() == 0) { | ||
SB_LOG(ERROR) << "Empty input buffer written to IamfAudioDecoder"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the function returns false, we should report an error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
IamfBufferParser::IamfBufferInfo info; | ||
IamfBufferParser().ParseInputBuffer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make ParseInputBuffer() static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the class and made it a standalone function
IamfBufferParser().ParseInputBuffer( | ||
input_buffer, &info, kForceBinauralAudio, | ||
kForce6ChannelAudio | kForce8ChannelAudio); | ||
if (!info.is_valid()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParseInputBuffer() returns a boolean to indicate if the input buffer can be parsed properly. We can use that returned value instead of a is_valid() function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
} else { | ||
IAMF_SoundSystem sound_system = SOUND_SYSTEM_A; | ||
if (kForce6ChannelAudio) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of hardcoding 6 or 8 channels when using surround sound, can we set the output to match the channel count of user audio outputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
// Decodes an Leb128 value and stores it in |value|. Returns the number of | ||
// bytes read, capped to sizeof(uint32_t). Returns the number of bytes read, | ||
// or -1 on error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the leb128 values used in iamf header always be 4 bytes long? The parsing algorithm here is different comparing to other leb128 parsing algorithm.
bool ParseInputBuffer(const scoped_refptr<InputBuffer>& input_buffer, | ||
IamfBufferInfo* info, | ||
const bool prefer_binaural_audio, | ||
const bool prefer_surround_audio); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can |prefer_binaural_audio| and |prefer_surround_audio| both be true? If not, we should combine them into one boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, binaraural audio expects only 2 channels, so it won't work with surround.
// https://aomediacodec.github.io/iamf/v1.0.0-errata.html#paramdefinition | ||
bool SkipParamDefinition(BufferReader* reader) const; | ||
|
||
std::unordered_set<uint32_t> binaural_audio_element_ids_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can store audio element ids as a local variablesin ParseInputBufferInternal() instead of a instance variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
binaural_audio_element_ids_.insert(audio_element_id); | ||
} else if (loudspeaker_layout > IA_CHANNEL_LAYOUT_STEREO && | ||
loudspeaker_layout < IA_CHANNEL_LAYOUT_COUNT) { | ||
surround_audio_element_ids_.insert(audio_element_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to use one set for both binaural and surround audio elements, and store both element id and layout info in the set, like std::unordered_set<std::pari<uint32_t, uint8_t>>.
This PR will be closed, the linux implementation is now here: #4221 |
Implements a libiamf-based audio decoder.
Decoder specific code is guarded behind a GN flag
enable_iamf_decode
, which controls theENABLE_IAMF_DECODE
macro.enable_iamf_decode
is off by default, so the new code doesn't build.The decoder is implemented for Linux and Android TV.
b/341792042