Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream/App freeze in DASH Multi-Period Live-Stream #1698

Closed
1 task done
mikef-dk opened this issue Sep 5, 2024 · 27 comments
Closed
1 task done

Stream/App freeze in DASH Multi-Period Live-Stream #1698

mikef-dk opened this issue Sep 5, 2024 · 27 comments
Assignees
Labels

Comments

@mikef-dk
Copy link

mikef-dk commented Sep 5, 2024

Version

Media3 1.4.1

More version details

Issue reproduced in 1.4.1 and 1.4.0 - 1.3.1 is working fine.

After doing some git bisect we were able to pin down the issue to commit d292f809. If we revert this particular commit on version 1.4.1 we cannot reproduce the issue anymore.

We observe the following Logcat messages when the issue occurs. In this particular case the "freeze" started at around 13:20:47 and application resumed at around 12:20:53.

13:20:45.705 AvrcpMediaPlayerList     D  sendMediaUpdate
13:20:45.705 AvrcpMediaPlayerList     I  sendMediaUpdate: Creating a one item queue for a player with no queue
13:20:45.705 AvrcpMediaPlayerList     D  sendMediaUpdate state=PlaybackState {state=3, position=1725535199867, buffered position=0, speed=1.0, updated=15856860, actions=895, custom actions=[], active item id=-1, error=null}
13:20:45.705 AvrcpTargetService       D  onMediaUpdated: track_changed=false state=true queue=false
13:20:45.705 AvrcpNativeInterface     D  sendMediaUpdate: metadata=false playStatus=true queue=false
13:20:45.705 AvrcpTargetJni           D  sendMediaUpdateNative
13:20:45.705 bt_stack                 I  [INFO:avrcp_service.cc(346)] virtual void bluetooth::avrcp::AvrcpService::SendMediaUpdate(bool, bool, bool) track_changed=0 :  play_state=1 :  queue=0
13:20:45.767 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90773,90773], OUT[90753,90751,90751] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:46.292 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90786,90786], OUT[90766,90764,90764] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:46.613 ThermalService           I  CPU temperatures: [73.3]
13:20:46.811 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90796,90796], OUT[90773,90771,90771] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:47.559 WifiVendorHal            E  getWifiLinkLayerStats_1_3_Internal(l.978) failed {.code = ERROR_UNKNOWN, .description = unknown error}
13:20:47.763 g.xxxxx.previe           I  Background young concurrent copying GC freed 442206(17MB) AllocSpace objects, 6(1128KB) LOS objects, 20% free, 58MB/73MB, paused 191us total 284.960ms
13:20:47.889 Choreographer            I  Skipped 47 frames!  The application may be doing too much work on its main thread.
13:20:47.902 OpenGLRenderer           I  Davey! duration=960ms; Flags=0, IntendedVsync=15858100892199, Vsync=15859040892199, OldestInputEvent=9223372036854775807, NewestInputEvent=0, HandleInputStart=15859048906269, AnimationStart=15859048909061, PerformTraversalsStart=15859048910769, DrawStart=15859052521144, SyncQueued=15859054356852, SyncStart=15859054752061, IssueDrawCommandsStart=15859054982144, SwapBuffers=15859060168394, FrameCompleted=15859061705727, DequeueBufferDuration=474208, QueueBufferDuration=522000, GpuCompleted=15798429353532, 
13:20:47.995 AudioFlinger             I  BUFFER TIMEOUT: remove(69) from active list on thread 0xf4669030
13:20:47.998 system_server            I  oneway function results will be dropped but finished with status OK and parcel size 4
13:20:47.999 APM::AudioPolicyEngine   W  getDevicesForStrategy() unknown strategy: -1
13:20:47.999 APM::AudioPolicyEngine   W  getDevicesForStrategy() unknown strategy: -1
13:20:48.076 BasicMediaStatusFactory  D  Supported media session actions: 895
13:20:48.077 WargReceiverApp          E  LogEventSink not set.
13:20:48.085 AvrcpMediaPlayerWrapper  V  onPlaybackStateChanged(): our.package.name : PlaybackState {state=3, position=1725535200735, buffered position=0, speed=1.0, updated=15859240, actions=895, custom actions=[], active item id=-1, error=null}
13:20:48.093 AvrcpMediaPlayerWrapper  V  trySendMediaUpdate(): Metadata has been updated for our.package.name
13:20:48.093 AvrcpMediaPlayerList     D  sendMediaUpdate
13:20:48.093 AvrcpMediaPlayerList     I  sendMediaUpdate: Creating a one item queue for a player with no queue
13:20:48.093 AvrcpMediaPlayerList     D  sendMediaUpdate state=PlaybackState {state=3, position=1725535200746, buffered position=0, speed=1.0, updated=15859251, actions=895, custom actions=[], active item id=-1, error=null}
13:20:48.093 AvrcpTargetService       D  onMediaUpdated: track_changed=false state=true queue=false
13:20:48.093 AvrcpNativeInterface     D  sendMediaUpdate: metadata=false playStatus=true queue=false
13:20:48.093 AvrcpTargetJni           D  sendMediaUpdateNative
13:20:48.093 bt_stack                 I  [INFO:avrcp_service.cc(346)] virtual void bluetooth::avrcp::AvrcpService::SendMediaUpdate(bool, bool, bool) track_changed=0 :  play_state=1 :  queue=0
13:20:48.094 BasicMediaStatusFactory  D  Supported media session actions: 895
13:20:48.288 audio_hw_primary         I  alsa format =0x1 delay frames =1664 total frames=760576000
13:20:48.629 thermal2                 I  Sending notification:  Type: CPU Name: soc_thermal CurrentValue: 75.4 ThrottlingStatus: MODERATE
13:20:48.630 ThermalService           I  CPU temperatures: [75.4]
13:20:48.817 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90798,90798], OUT[90773,90771,90771] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:49.264 g.xxxxx.previe           I  Background concurrent copying GC freed 701039(27MB) AllocSpace objects, 7(284KB) LOS objects, 30% free, 54MB/78MB, paused 190us total 686.217ms
13:20:49.478 e                        D  topActivityPackage: our.package.name, appCollectType: 1 , launcherPackage: our.package.name, video: true
13:20:49.516 e                        D  showNotification: It is not in the launcher interface, so it will not be displayed for the time being!
13:20:49.582 AudioTrack               D  getTimestamp_l(45): device stall time corrected using current time 15860741260770
13:20:49.594 AudioTrack               W  restartIfDisabled(45): releaseBuffer() track 0xac5a0a10 disabled due to previous underrun, restarting
13:20:49.595 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90798,90798], OUT[90776,90772,90772] DEC[0->0] => DEVICE[3+0/3->0+4/14] => CLIENT[10]\n"
13:20:49.595 APM::AudioPolicyEngine   W  getDevicesForStrategy() unknown strategy: -1
13:20:49.595 APM::AudioPolicyEngine   W  getDevicesForStrategy() unknown strategy: -1
13:20:49.596 BufferPoolAccessor2.0    D  bufferpool2 0xaf57d888 : 8(65536 size) total buffers - 1(8192 size) used buffers - 2319/2327 (recycle/alloc) - 23/2320 (fetch/transfer)
13:20:49.602 BufferPoolAccessor2.0    D  bufferpool2 0xf3cc33e8 : 6(24576 size) total buffers - 1(4096 size) used buffers - 2311/2317 (recycle/alloc) - 8/2316 (fetch/transfer)
13:20:49.613 system_server            I  oneway function results will be dropped but finished with status OK and parcel size 4
13:20:50.096 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90819,90819], OUT[90799,90797,90797] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:50.578 WifiVendorHal            E  getWifiLinkLayerStats_1_3_Internal(l.978) failed {.code = ERROR_UNKNOWN, .description = unknown error}
13:20:50.617 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90831,90831], OUT[90812,90810,90810] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:50.645 ThermalService           I  CPU temperatures: [75.200005]
13:20:51.133 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90842,90842], OUT[90825,90823,90823] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:51.356 NotificationUtil         I  (NotificationUtil.java:1)#d  
13:20:51.691 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90856,90856], OUT[90839,90836,90836] DEC[0->0] => DEVICE[3+0/3->0+3/14] => CLIENT[11]\n"
13:20:52.218 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90870,90870], OUT[90852,90849,90849] DEC[0->0] => DEVICE[3+0/3->0+3/14] => CLIENT[11]\n"
13:20:52.310 b1                       I  isNetWorkConnected, net: true
13:20:52.310 MQM                      W  testConnect already connected!
13:20:52.661 ThermalService           I  CPU temperatures: [75.8]
13:20:52.724 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90885,90885], OUT[90865,90861,90861] DEC[0->0] => DEVICE[3+0/3->0+4/14] => CLIENT[10]\n"
13:20:52.936 g.xxxxx.previe           I  Background young concurrent copying GC freed 520245(22MB) AllocSpace objects, 13(2524KB) LOS objects, 30% free, 54MB/78MB, paused 274us total 401.408ms
13:20:53.057 Choreographer            I  Skipped 248 frames!  The application may be doing too much work on its main thread.
13:20:53.076 OpenGLRenderer           I  Davey! duration=4994ms; Flags=0, IntendedVsync=15859240890616, Vsync=15864200890616, OldestInputEvent=9223372036854775807, NewestInputEvent=0, HandleInputStart=15864216970855, AnimationStart=15864216973188, PerformTraversalsStart=15864226487355, DrawStart=15864227799980, SyncQueued=15864229443771, SyncStart=15864229947355, IssueDrawCommandsStart=15864230172438, SwapBuffers=15864234054021, FrameCompleted=15864235700021, DequeueBufferDuration=763417, QueueBufferDuration=469792, GpuCompleted=15799171707324, 
13:20:53.087 BasicMediaStatusFactory  D  Supported media session actions: 895
13:20:53.090 WargReceiverApp          E  LogEventSink not set.
13:20:53.228 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90897,90897], OUT[90877,90875,90875] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:53.587 WifiVendorHal            E  getWifiLinkLayerStats_1_3_Internal(l.978) failed {.code = ERROR_UNKNOWN, .description = unknown error}
13:20:53.622 audio_hw_primary         I  alsa format =0x1 delay frames =1664 total frames=760832000
13:20:53.730 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90910,90910], OUT[90890,90887,90887] DEC[0->0] => DEVICE[3+0/3->0+3/14] => CLIENT[11]\n"
13:20:54.247 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90923,90923], OUT[90903,90901,90901] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:54.677 ThermalService           I  CPU temperatures: [75.3]
13:20:54.757 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90936,90936], OUT[90915,90913,90913] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:55.027 BufferPoolAccessor2.0    D  bufferpool2 0xaf57d888 : 8(65536 size) total buffers - 1(8192 size) used buffers - 2586/2594 (recycle/alloc) - 31/2587 (fetch/transfer)
13:20:55.039 BufferPoolAccessor2.0    D  bufferpool2 0xf3cc33e8 : 6(24576 size) total buffers - 1(4096 size) used buffers - 2578/2584 (recycle/alloc) - 9/2583 (fetch/transfer)
13:20:55.295 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90948,90948], OUT[90929,90927,90927] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:55.813 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90962,90962], OUT[90942,90940,90940] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:56.324 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90976,90976], OUT[90954,90952,90952] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:56.594 WifiVendorHal            E  getWifiLinkLayerStats_1_3_Internal(l.978) failed {.code = ERROR_UNKNOWN, .description = unknown error}
13:20:56.693 ThermalService           I  CPU temperatures: [75.4]
13:20:56.844 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[90988,90988], OUT[90967,90965,90965] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"
13:20:57.373 VDA                      I  [0]"ServiceDeviceTask: buffer counts: IN[91001,91001], OUT[90981,90979,90979] DEC[0->0] => DEVICE[3+0/3->0+2/14] => CLIENT[12]\n"

I've sent a video of the issue, the adb bugreport as well as the logs via Mail.

Devices that reproduce the issue

Any device. We tested it specifically on AFTMM Fire TV (Android 7) stick and our own Android TV Stick (Android 11)

Devices that do not reproduce the issue

No response

Reproducible in the demo app?

Not tested

Reproduction steps

Play a DASH Multi-Period Live Stream

Expected result

Stream plays smoothly without interruption

Actual result

Application freezes (including Audio & Video) for a brief moment

Media

Not applicable

Bug Report

@tonihei
Copy link
Collaborator

tonihei commented Oct 9, 2024

Thanks for sharing the details of the problem. It's unfortunately not quite clear what could cause this problem or how d292f809 can trigger such a behavior. Do you have an actual stream link you could send to android-media-github@google.com for testing? I tested with other multi-period DASH live streams and none of them show any problem similar to yours, so I suspect the problem may only be triggered by a specific property of the stream itself, or is device-specific (I only tested on Pixel 7). Please report back here once you've sent something, thanks!

@Iragne
Copy link

Iragne commented Oct 17, 2024

Just to let you know i'm facing the same issue with this message
getTimestamp_l(89): device stall time corrected using current time 13776132825692
I'm using the public stream of dashif Amazon multi period live dash
https://d24rwxnt7vw9qb.cloudfront.net/v1/dash/e6d234965645b411ad572802b6c9d5a10799c9c1/All_Reference_Streams/6ba06d17f65b4e1cbd1238eaa05c02c1/index.mpd
I'm currently testing without that commit
and a confirm the issue is not here anymore and the playback is smooth

d292f80

@tonihei

@jeanalexandre-iragne-deltatre

@tonihei happy to see that you react on it. FYI we will remove this from our version. it's clearly visible on our playback and if you check the video dev chat we are not the only one. I understand the inconstancy point but the reality is that now we can't use any official release for any SSAI stream.

https://video-dev.slack.com/archives/CJJ0ELF1U/p1725290886523239

@zw4mhzukc2pbm3h
Copy link

We are also affected by this problem. Cannot update to recent versions of Exo.

@blaschio
Copy link

We are facing the same issue unfortunately.
The streams are encrypted, so I cannot share it.

We would really appreciate if this problem could be resolved!

@Klesz
Copy link

Klesz commented Oct 24, 2024

Same issue here. Please fix it.

@Iragne
Copy link

Iragne commented Oct 24, 2024

Hi ExoPlayer team,

Happy to support in the investigation of this issue but look like it's quite impacting a lot of person.
Any update on your side ? @tonihei and @marcbaechinger

Thanks a lot

@tonihei
Copy link
Collaborator

tonihei commented Oct 25, 2024

I tested with the stream provided in the second comment (https://d24rwxnt7vw9qb.cloudfront.net/v1/dash/e6d234965645b411ad572802b6c9d5a10799c9c1/All_Reference_Streams/6ba06d17f65b4e1cbd1238eaa05c02c1/index.mpd), on release 1.4.1 and can't observe any freezing visually and also logging the update intervals that may be affected by d292f809 do not show any unusual gaps.

  • Is the issue consistently reproducible or does it only happen at some period transitions?
  • The original post mentioned an Amazon Fire TV and a TV stick. Does this reproduce for you on non-TV devices? If not, is there anything special about your playback setup that may be relevant for this issue (e.g. are using tunneling or passthrough audio?)
  • There was a mention of SSAI. Does the problem reproduce together with some ads SDK only or also when playing the same stream stand-alone?

If you can reproduce, it may also be helpful to see the output of an attached EventLogger in the bugreport.

@Iragne
Copy link

Iragne commented Oct 25, 2024

Hi @tonihei

Let me give you more information

  1. the issue happen is some period transition not all of them. if you use the aws one, it's happening generaly after AWS ads (with the schema of the cloud or the robots) back to the content.
  2. Yes i have been able to reproduce on y FairPhone3
  3. The problem occur without any additionnal SDK. just using the release branch or the main. after some minutes it's start to happen at the period transition

I will try to extract event logger too. do you want it on the main ?

@tonihei
Copy link
Collaborator

tonihei commented Oct 25, 2024

Okay, I think I've found something odd, but its root cause is a slightly unfortunate update of the DASH stream.

Whenever this test stream switches from the last ad (the silent AWS elemental splash screen) back to the content, the duration of the DASH period and the actual media don't add up. The ad period has segment with 6 seconds of media, but the content period starts about 4.5 seconds after the start of the ad period already. The DASH spec allows this type of mismatch and expects the player to trim the actual media to the declared duration of 4.5 seconds. What I'm unsure about is whether the type of update is allowed by DASH (an update that shortens a period duration from 6 seconds to 4.5 seconds).

ExoPlayer handles the duration mismatch correctly if it knows about it. You can see a much smoother transition when you seek back in the stream to the same point later. However, when playing close to the live edge, the following happens:

  • ExoPlayer loads the last ad. At this point the new content period doesn't exist yet and ExoPlayer loads all 6 seconds of media into its buffer queue.
  • The manifest update arrives and shortens the period to 4.5 seconds.
  • The main player tries to transition back to the content as soon as it can enforce it (~when the output pipeline read all 6 seconds of media, but slightly before it rendered all of it)
  • The video renderer keeps rendering all the old samples and then jumps back in time to the new position.
  • This behavior is slightly different between 1.3.1 and 1.4.1. In 1.3.1 the time difference between the first frame of the ad and the first frame of the content is exactly 6 seconds, whereas in 1.4.1 it's more around 7.5 seconds.
  • Despite the slightly broken setup, having a delay of 7.5 seconds until the next first frame sounds wrong. I wasn't able to pinpoint the change that cause this, but it doesn't seem to be d292f809. I'll keep looking :)

@Iragne: Is that what you are observing too and does that match the problem you saw with your own streams? (=a mismatch of media segment duration and period duration close to the live edge).
And second question - d292f80 does not change the behavior I described above as far as I can see, so I'm slightly confused why it helped in your case. How certain are you this particular change makes a difference?

@Iragne
Copy link

Iragne commented Oct 25, 2024

Look the same use case. To be honest I don't know why this commit cause the issue but we had no issue reverting it ... strange and agree with you

@tonihei
Copy link
Collaborator

tonihei commented Oct 25, 2024

I found the commit that caused the behavior difference I described above: 38b9a5d

@Iragne: Just to verify we are really talking about the same issue - does rolling back this one change also fix the problem for you?

@Iragne
Copy link

Iragne commented Oct 25, 2024

Let me check on my side too

@mikef-dk
Copy link
Author

Hi @tonihei,

thanks for investigating this issue and sorry for the late reply. To answer your questions:

Is the issue consistently reproducible or does it only happen at some period transitions?

Only at some period transitions. I'd say every 3rd to 4th ad.

The original post mentioned an Amazon Fire TV and a TV stick. Does this reproduce for you on non-TV devices?

In our case it's reproducible on all devices we tested. I've ran the test on my Pixel 7 Pro and it has the same issues as the TV-Sticks.

There was a mention of SSAI. Does the problem reproduce together with some ads SDK only or also when playing the same stream stand-alone?

Only on streams that are using SSAI. However we are not using any ads SDK.

Apart from that: I've tried again after reverting 38b9a5d, unfortunately the issue is still reproducible. Regarding d292f80: We've tried back and forth with and without this commit and only after reverting it the issue disappeared. With that being said: I share the confusion with you why this change should be the reason.

Our Stream is ~20 seconds behind the Live Edge. I'll try to figure out if I can observe a mismatch of media segment duration and period duration in our stream as well.

copybara-service bot pushed a commit that referenced this issue Oct 29, 2024
When transitioning to the next media position parameter checkpoint
we estimate the position because the audio processor chain no longer
provides access to the actual playout duration.

The estimate using the declared speed and the last checkpoint may
have drifted over time, so we currently estimate relative to the
next checkpoint, which is closer and presumably provides a better
estimate. However, this assumes that these checkpoint are perfectly
aligned without any position jumps.

The current approach has two issues:
 - The next checkpoint may include a position jump by design, e.g.
   if it was set for a new item in the playlist and the duration of
   the current item wasn't perfectly accurate.
 - The sudden switch between two estimation methods may cause a jump
   in the output position, which is visible when we add new media
   position checkpoints to the queue, not when we actually reach the
   playback position of the checkpoint.

We can fix both issues by taking a slightly different approach:
 - Continuously monitor the estimate using the current checkpoint. If
   it starts drifting, we can adjust it directly. This way the estimate
   is always aligned with the actual position.
 - The change above means we can safely switch to using the estimate
   based on the previous checkpoint. This way we don't have to make
   assumptions about the next checkpoint and any position jumps will
   only happen when we actually reach this checkpoint (which is more
   what a user expects to see, e.g. at a playlist item transition).

Issue: #1698
PiperOrigin-RevId: 690979859
@tonihei
Copy link
Collaborator

tonihei commented Nov 5, 2024

@mikef-dk: Did you figure out if your stream issues also occur at places if a duration mismatch between the period and the sum of all its segments? If not, would you be able to share a test stream with us that would let us see and reproduce the issue in the demo app?

@Iragne: Did you get around to see if reverting 38b9a5d or including the fix in 7c0cffd helps to alleviate the issue you were seeing?

ivanbuper pushed a commit that referenced this issue Nov 5, 2024
When transitioning to the next media position parameter checkpoint
we estimate the position because the audio processor chain no longer
provides access to the actual playout duration.

The estimate using the declared speed and the last checkpoint may
have drifted over time, so we currently estimate relative to the
next checkpoint, which is closer and presumably provides a better
estimate. However, this assumes that these checkpoint are perfectly
aligned without any position jumps.

The current approach has two issues:
 - The next checkpoint may include a position jump by design, e.g.
   if it was set for a new item in the playlist and the duration of
   the current item wasn't perfectly accurate.
 - The sudden switch between two estimation methods may cause a jump
   in the output position, which is visible when we add new media
   position checkpoints to the queue, not when we actually reach the
   playback position of the checkpoint.

We can fix both issues by taking a slightly different approach:
 - Continuously monitor the estimate using the current checkpoint. If
   it starts drifting, we can adjust it directly. This way the estimate
   is always aligned with the actual position.
 - The change above means we can safely switch to using the estimate
   based on the previous checkpoint. This way we don't have to make
   assumptions about the next checkpoint and any position jumps will
   only happen when we actually reach this checkpoint (which is more
   what a user expects to see, e.g. at a playlist item transition).

Issue: #1698
PiperOrigin-RevId: 690979859
(cherry picked from commit 7c0cffd)
@Iragne
Copy link

Iragne commented Nov 7, 2024

Sorry i was on a lot of other issue. i will test it again tomorrow

@Iragne
Copy link

Iragne commented Nov 8, 2024

@tonihei just to let you know i'm running my tests now. I will let you know shortly

@Iragne
Copy link

Iragne commented Nov 8, 2024

@tonihei So i had been able to extract additional result

  1. Look 38b9a5d this is creating a overlap of the audio and result in buffer and issue
  2. d292f80 is clearly reducing the effect. and it's quite not visible
  3. 7c0cffd look to clearly improve the issue and less visible like d292f80

So for me and my test i like following fix 7c0cffd

@linhai326
Copy link

linhai326 commented Nov 13, 2024

@tonihei
Our stream service is impacted by d292f80 as well:
emailed android-media-github@google.com with our test content.
It works with demo app of 1.3.1 and before;
after 1.4.0, playback was stuck from the very beginning.

Once we reverted the d292f80 in media3 1.4.0, playback starts working.

This commit might not be the root cause, but somehow it might exposed some unknown issue during manifest updating.

========= SOME UPDATE ===========
It seems the test content, somehow triggered an infinite loop of onPlaylistUpdateRequested() in every several ms.

========= SOME MORE UPDATE ===========
Found the problem:
https://github.com/androidx/media/blob/release/libraries/exoplayer_dash/src/main/java/androidx/media3/exoplayer/dash/DashMediaSource.java, line 1125:
Screenshot 2024-11-13 at 12 39 34 PM

we should add the condition that: requiredIntervalUs > 0
otherwise, when requiredIntervalUs is negative, the player will be stuck in an infinite loop, which will block all other threads running.

@linhai326
Copy link

@tonihei
Did you get chance to take a look at this infinite loop issue?

@mikef-dk
Copy link
Author

mikef-dk commented Nov 22, 2024

@tonihei Sorry for the late reply. I've sent a test stream to android-media-github@google.com. With it I can reproduce the issue in the sample app running on version 1.4.1 on my Pixel 7 (same on 1.5.0-rc2). The stream itself has an ad-block roughly every 4 minutes during which the issue can be observed (at least once per break).

#edit
Every time the issue occurs I can see in the EventLogger output that there are dozen if not hundred of timeline updates in a really short time

timeline [eventTime=190.23, mediaPos=85.92, window=0, period=0, periodCount=3, windowCount=1, reason=SOURCE_UPDATE
  period [135.16]                                                                                                 
  period [5.92]                                                                                                   
  period [6.00]                                                                                                   
  window [116.48, seekable=true, dynamic=false]                                                                   
] 

followed by the log output that we were observing before as well:

AudioTrack               W  restartIfDisabled(9909): releaseBuffer() track 0xb400007045ad4a20 disabled due to previous underrun, restarting
EventLogger              E  audioTrackUnderrun [eventTime=192.19, mediaPos=83.04, window=0, period=0, 61568, 320, 4488]

@mikef-dk
Copy link
Author

I missed the updated comment of @linhai326 regarding the requiredIntervalUs > 0. We've tried it with this suggested fix and the issue disappeared for us as well.

@tonihei Does this make any sense to you - especially in conjunction with the commit that we blamed initially? (d292f809)

@zw4mhzukc2pbm3h
Copy link

The proposed code change seems to fix the issue on our side, too. Hoping to have a solution included in an official exoplayer release soon!

@Iragne
Copy link

Iragne commented Dec 11, 2024

Me too

@tonihei
Copy link
Collaborator

tonihei commented Dec 18, 2024

Thanks for all the investigation and sorry for the delay in looking in to this. The first part of the issue was solved by 4b6e886, but the second part needs to be solved by an additional check as proposed in #1947.

The reason this infinite update loop is happening is that the manifest ends at the end of the ad and ExoPlayer tries refreshing the manifest to find the next period. But the next period is only added once the live window catches up with the end of the ad period.

@tonihei
Copy link
Collaborator

tonihei commented Jan 6, 2025

#1947 is merged as well and will be released with Media3 1.6.0.

The only remaining task here is to look into the duration change I explained here: #1698 (comment). We should be able to improve the behavior at the last ad-content transition by actually trimming the samples instead of trying to decode and skip all of them. This part is more of an enhancement though and not a bug fix.

copybara-service bot pushed a commit that referenced this issue Jan 8, 2025
DASH periods can have a duration that is less than the end of the
last chunk in the period. In these cases, the sample data needs to
be clipped to the declared period duration. This already happens
IF the period duration is known at the point where we start loading
the media chunk. However, if the duration becomes known later or is
reduced (e.g. in a live stream), the existing media chunks are not
clipped. This causes unclean transitions across periods where the
player tries to transition to the next period, but renderers struggle
to output all the remaining surplus samples that should have been
clipped.

This can be fixed by asking ChunkSampleStream to discard surplus
samples that were loaded beyond a clipped duration when evaluating
the sample queue between chunk loads.

Issue: #1698
PiperOrigin-RevId: 713288221
@tonihei
Copy link
Collaborator

tonihei commented Jan 8, 2025

The enhancement to handle changing durations is also done now (b321c8d), which results in a much cleaner transition for streams where this happens (like the one shared in #1698 (comment)). I'm closing this issue now as there is no other problem left as far as I'm aware.

@tonihei tonihei closed this as completed Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants