
how do I query output latency for echo cancellation #80

Closed
yangwuan55 opened this issue Apr 10, 2018 · 19 comments

Comments

@yangwuan55

yangwuan55 commented Apr 10, 2018

I want to know how to get the output delay (from when data is written until it is actually rendered). Is there any way to get this delay? Can I use the buffer size as the playback delay?
I want to do echo cancellation using the WebRTC APM, but with AudioTrack I can't get the right delay (I'm not sure whether the delay is too long).
So I want to try Oboe.
Thanks for the help!

@niklasdahlheimer

niklasdahlheimer commented Apr 14, 2018

Hey,
I will try to give you a few hints (and hope that Phil or Don will correct me if anything is misleading :) ).

  1. You CAN predict the time audio data needs from entering the stream until Oboe passes it to the DSP; let's call it "buffer latency".
  2. You CAN NOT predict the time from the DSP to your speaker. (But there are ways to estimate it. See [5].)

About 1)
You can find a good visualisation of the inner workings of a buffer in [1], starting at ~29:00.
You can query the actual buffer size by calling the method getBufferSizeInFrames().
So your buffer latency is getBufferSizeInFrames()/frameRate, where frameRate = sampleRate/channelCount.
If your sample rate is 48kHz and you have 2 audio channels (stereo), your frame rate is 24kHz.
So if your buffer is filled with 5 bursts of 128 frames each (you can query the frames per burst by calling getDefaultFramesPerBurst()), your buffer latency is

(5 * 128) / 24000 = 0.0267s ≈ 26.7ms

Also, you can query the maximum buffer latency by looking at the buffer capacity via getBufferCapacityInFrames().
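
As a minimal sketch, assuming an open oboe::AudioStream (note that Oboe's stream-side accessor is getFramesPerBurst(); exact names are an assumption), these queries look like:

    #include <oboe/Oboe.h>
    #include <cstdint>

    struct BufferInfo {
        int32_t sizeFrames;      // current buffer size
        int32_t burstFrames;     // frames per burst
        int32_t capacityFrames;  // maximum buffer size
    };

    // Query the buffer-related quantities discussed above. Dividing
    // sizeFrames by the sample rate gives the buffer latency in seconds
    // (see the correction further down this thread for why the channel
    // count does not enter into it).
    BufferInfo queryBufferInfo(oboe::AudioStream *stream) {
        return BufferInfo{
            stream->getBufferSizeInFrames(),
            stream->getFramesPerBurst(),
            stream->getBufferCapacityInFrames(),
        };
    }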

hopefully helpful sources
[1]https://www.youtube.com/watch?v=Pjgfje52Yv0
[2]https://developer.android.com/ndk/guides/audio/audio-latency.html
[3]https://github.com/google/oboe/blob/master/FullGuide.md
[4]https://larsimmisch.github.io/pyalsaaudio/terminology.html
[5]https://source.android.com/devices/audio/latency_measure

Greetings Niklas

@yangwuan55
Author

Thank you so much! Thank you!
I will study these.

@yangwuan55
Author

If I want to get the right delay for all Android devices, should I measure the delay for every device and save the results to a device_delay_list? Is that right?

@dturner
Collaborator

dturner commented Apr 16, 2018

Great answers @niklasdahlheimer. Just a couple of things to add:

You CAN predict the time audio data needs from entering the stream until Oboe passes it to the DSP; let's call it "buffer latency"

This is almost correct. The buffer size determines the latency between your app and the Android audio framework. It (unfortunately) does not account for latency from the Android audio framework to the audio device.

There are, however, a few methods of obtaining the "app to audio device rendering" latency:

From API 19: Use AudioTrack.getTimestamp (in Java only)
From API 24: Use this same method in C++ via JNI using AcquireJavaProxy. Example code
From API 26: You can use OboeStream::getTimestamp. Example code

There are 2 caveats with this approach to calculating latency:

  1. It relies on the audio device reporting accurate timestamps. All Google devices I've tested are accurate, but others may have varying levels of accuracy.
  2. It does not account for any audio-device-to-transducer (headphones, built-in speaker, etc.) latency. Particularly on the built-in speaker path, latency can sometimes be introduced by DSP to improve the acoustic properties of the final audio (e.g. bass boost, echo cancellation). This is why you often see "best with headphones" on games/apps which are audio latency sensitive.
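
For illustration, a hedged sketch of deriving a latency estimate from such a timestamp, assuming an open oboe::AudioStream (the out-parameter getTimestamp signature matches the Oboe API of the time; treat the details as assumptions):

    #include <oboe/Oboe.h>
    #include <ctime>
    #include <cstdint>

    // Estimate output latency in ms, or return a negative value on failure.
    double estimateOutputLatencyMillis(oboe::AudioStream *stream) {
        int64_t devicePosition = 0;  // frame most recently presented by the device
        int64_t deviceTimeNs = 0;    // when it was presented (CLOCK_MONOTONIC)
        oboe::Result result = stream->getTimestamp(CLOCK_MONOTONIC,
                                                   &devicePosition, &deviceTimeNs);
        if (result != oboe::Result::OK) return -1.0;

        // The newest frame the app has written will be presented this many
        // frames after the timestamped frame; extrapolate its presentation time.
        int64_t frameGap = stream->getFramesWritten() - devicePosition;
        double nsPerFrame = 1.0e9 / stream->getSampleRate();
        int64_t appFrameTimeNs = deviceTimeNs + (int64_t)(frameGap * nsPerFrame);

        // Latency is how far in the future that presentation time lies.
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t nowNs = now.tv_sec * 1000000000LL + now.tv_nsec;
        return (appFrameTimeNs - nowNs) / 1.0e6;
    }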

If your sample rate is 48kHz and you have 2 audio channels (stereo), your frame rate is 24kHz.

This is a common source of confusion. "sampleRate" nearly always refers to the sample rate per channel, so even if you have 10 channels your frame rate will still be 48kHz, with each frame containing 10 samples.

With this in mind if you had a buffer of 5 "bursts" and each burst is 128 frames the latency would be:

(5 * 128) / 48000 = 0.01333s = 13.33ms
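
A minimal sketch of this corrected calculation, assuming an open oboe::AudioStream:

    #include <oboe/Oboe.h>

    // Buffer latency in ms; the channel count cancels out because a frame
    // already contains one sample per channel.
    double bufferLatencyMillis(oboe::AudioStream *stream) {
        return 1000.0 * stream->getBufferSizeInFrames() / stream->getSampleRate();
        // e.g. 1000.0 * (5 * 128) / 48000 = 13.33 ms
    }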

To answer @yangwuan55

If I want to get the right delay for all Android devices, should I measure the delay for every device and save the results to a device_delay_list? Is that right?

I would try the "get timestamp" approach above first. If that doesn't work out then try one or all of the following.

You could measure the delay manually on different devices, and that is indeed what some developers do. It definitely gives the most accurate results, although the process of acquiring and testing Android devices (and testing them on different OS versions) can be expensive and time-consuming, depending on the number of models you test. If you go down this road I can recommend getting a WALT device (https://github.com/google/walt), which will allow you to test the output latency in isolation (and is what we use to test new Pixel devices).

Another approach is to include a "loopback" test in your app which runs when the app is first launched. The app plays a simple sound (like a sine wave) and measures the time taken for that sound to reach the built-in microphone. This measurement gives you the "round-trip latency" (output + input + app). If you just want the output latency, a rough estimate is to subtract about 3ms from that figure, as input latency is typically much lower than output latency (source: https://github.com/google/walt/blob/master/docs/AudioLatency.md).
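
An illustrative sketch of the measurement half of such a loopback test: find the lag at which the recorded signal best matches the played test signal via brute-force cross-correlation (all names are hypothetical; this is not the WALT methodology):

    #include <vector>
    #include <cstddef>

    // Returns the delay, in frames, at which `played` best aligns with
    // `recorded`; divide by the sample rate for the round-trip latency.
    int estimateDelayFrames(const std::vector<float> &played,
                            const std::vector<float> &recorded) {
        int bestLag = 0;
        double bestScore = 0.0;
        for (std::size_t lag = 0; lag + played.size() <= recorded.size(); ++lag) {
            double score = 0.0;
            for (std::size_t i = 0; i < played.size(); ++i) {
                score += played[i] * recorded[lag + i];  // correlation at this lag
            }
            if (score > bestScore) {
                bestScore = score;
                bestLag = static_cast<int>(lag);
            }
        }
        return bestLag;
    }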

If you're writing an app which requires the use of headphones (meaning the played sound cannot be heard at the built-in microphone), such as a karaoke app, you can implement a "clap track". The app plays a clap sound at regular intervals and the user claps along whilst the app records the claps. Assuming the user claps exactly when they hear the clap, the app can measure the difference between the clap played and the clap recorded to obtain the round-trip latency figure.

@yangwuan55
Author

yangwuan55 commented Apr 17, 2018

How about adding an API to the Android framework in the future? Let the OEMs provide the delay of the device.
Just a suggestion.

@philburk philburk changed the title About delay how do I query or calculate output latency Apr 27, 2018
@philburk philburk changed the title how do I query or calculate output latency how do I query output latency for echo cancellation Apr 27, 2018
@philburk
Collaborator

Please see this related bug:
#69

Note that knowing the reported latency will not be precise enough for echo cancellation. You will still need to run an adaptive filter that syncs up with the actual echo.

The timestamps are only accurate enough for AV sync (lip sync).
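
For illustration, a hedged sketch of the kind of adaptive filter meant here: a single NLMS (normalized least-mean-squares) update step that tracks the true echo path around the reported latency. All names are illustrative, not part of Oboe or the WebRTC APM:

    #include <vector>
    #include <cstddef>

    struct NlmsEchoFilter {
        std::vector<float> weights;  // estimated echo-path impulse response
        float mu = 0.5f;             // adaptation step size
        float eps = 1e-6f;           // regularization against divide-by-zero

        // x: most recent far-end (playback) samples, newest first, same
        // length as weights. d: current microphone sample.
        // Returns the echo-cancelled sample e = d - estimated echo.
        float process(const std::vector<float> &x, float d) {
            float y = 0.0f;
            float energy = eps;
            for (std::size_t i = 0; i < weights.size(); ++i) {
                y += weights[i] * x[i];
                energy += x[i] * x[i];
            }
            float e = d - y;
            float step = mu * e / energy;  // normalized LMS update
            for (std::size_t i = 0; i < weights.size(); ++i) {
                weights[i] += step * x[i];
            }
            return e;
        }
    };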

@philburk
Collaborator

philburk commented May 1, 2018

I added calculateLatencyMillis() in this Pull Request.
#84

Can we close this?

@jbloit

jbloit commented May 11, 2018

@philburk from which API level is the calculateLatencyMillis() method available? (I'm looking for a solution that can work from level 19)

@philburk
Collaborator

It requires AAudio, which was added in API 26 (Oreo).

I am closing this because we added calculateLatencyMillis().
Unfortunately we cannot go back and add functionality to OpenSL ES.

@niklasdahlheimer

niklasdahlheimer commented Jun 28, 2018

I do not want to reopen this issue, but maybe @dturner, @philburk or someone else could specify the term "audio device", i.e. the point the presentation timestamp is coming from. Is it:

  • the HAL (probably not, because it's in the native framework)
  • ALSA in the Linux kernel
  • the audio codec hardware
  • the AD/DA converter

?

I see that the term is used by other audio frameworks, but I cannot find a precise definition.
I do not think it's anywhere beyond the audio codec, so maybe you can share some experience of how big the expected latency from there to the transducer could be (assuming headphones are used, so without the DSP latency).


@niklasdahlheimer

niklasdahlheimer commented Jun 28, 2018

From @dturner
From API 24: Use this same method in C++ via JNI using AcquireJavaProxy. Example code
From API 26: You can use OboeStream::getTimestamp. Example code

Referring to this issue and the comment on the AudioStreamBuilder::isAAudioRecommended() method in AudioStreamBuilder.cpp, AAudio will only be fully supported from API 27 on.
So in 95.2% of cases Oboe would choose to use OpenSL ES. I totally get that you cannot add functionality to OpenSL ES. But would it be possible to implement getTimestamp for OpenSL ES in Oboe, using the Android configuration interface via AcquireJavaProxy?
Because Oboe encapsulates the OpenSL ES objects, it seems it's not possible to query the timestamp from outside of Oboe, so the only option would be to switch over to plain OpenSL ES completely.

@dturner
Collaborator

dturner commented Jun 28, 2018

would it be possible to implement getTimestamp for OpenSL ES in Oboe, using the Android configuration interface via AcquireJavaProxy?

Yes, sure it would. In fact, we have been discussing a way of passing the Java environment to Oboe so that Oboe can query the relevant audio APIs itself (e.g. for getting the native sample rate for built-in audio devices).

Is it easy? No, not really. It's a bit of a pain to pass the JNIEnv to C++ elegantly (should it be cached between calls? should the client have to pass it every time?). I'm planning on hacking on some ideas for it next week. I'll update progress on #116
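
For reference, a hedged sketch of what the JNI side of that could look like, assuming you already hold a jobject that is (or proxies) an android.media.AudioTrack. The AudioTrack/AudioTimestamp names are from the Android SDK; everything else is illustrative, and error checks are omitted for brevity:

    #include <jni.h>
    #include <cstdint>

    // Call AudioTrack.getTimestamp() from C++ and unpack the result.
    bool getAudioTrackTimestamp(JNIEnv *env, jobject audioTrack,
                                int64_t *framePosition, int64_t *nanoTime) {
        // Create an android.media.AudioTimestamp to receive the values.
        jclass tsClass = env->FindClass("android/media/AudioTimestamp");
        jmethodID ctor = env->GetMethodID(tsClass, "<init>", "()V");
        jobject ts = env->NewObject(tsClass, ctor);

        // boolean AudioTrack.getTimestamp(AudioTimestamp) -- API 19+.
        jclass trackClass = env->GetObjectClass(audioTrack);
        jmethodID getTimestamp = env->GetMethodID(
                trackClass, "getTimestamp", "(Landroid/media/AudioTimestamp;)Z");
        if (!env->CallBooleanMethod(audioTrack, getTimestamp, ts)) {
            return false;  // no timestamp available yet
        }

        // Read the public long fields framePosition and nanoTime.
        *framePosition = env->GetLongField(
                ts, env->GetFieldID(tsClass, "framePosition", "J"));
        *nanoTime = env->GetLongField(
                ts, env->GetFieldID(tsClass, "nanoTime", "J"));
        return true;
    }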

@stereomatch

yangwuan55,

I did some preliminary practical tests for latency using the AAudio echo demo.

What I found was that the latency varied, i.e. there was considerable jitter.

You can try to reproduce the experiment. Plug a headset with a built-in mic into the earphone jack of your Android device. Run the AAudio echo sample app. Place one of the headset's earphones near a second Android device's built-in microphone. Run an audio recorder app on the second Android device.

Now use the second android device to record some environmental sound (stand some feet away from the devices and snap your fingers to generate pulses).

The second Android device will hear (and record to a file) your finger snap sound. Meanwhile, that same sound will go through the first Android device (running the AAudio echo sample app) and be output on the first device's earphones.

So the second Android device will hear your finger snap over the air, and also hear it from the earphones (coming from the first Android device).

After you have snapped your fingers a few times, take the audio recording made by the second Android device and view it in Audacity or some other audio editor. You will be able to see the finger-snap pulses, and notice that they occur in pairs. Measure the time difference between these pulses.

You will notice that sometimes it is as low as 40 milliseconds, but sometimes it can be as large as 100 milliseconds.

This difference is also audible: as you use the AAudio echo sample app, you can hear that the latency is quite low when it starts, but later, as it runs, the latency can increase.

@yangwuan55
Author

yangwuan55 commented Jul 13, 2018

@stereomatch Yes, it was. I can only do this work on different devices and write the results to a property file. At the beginning I thought there was some way (invoking some method) to get the latency. Anyway, thanks to everybody who cared about this question.

@philburk
Collaborator

@niklasdahlheimer The presentation timestamp is supposed to be the time that the audio signal leaves the android device. So it might be when the sound comes out the speaker or is sent to the headphones. Additional latency may be added by external devices such as USB converters or televisions. Also note that approximately 1 msec is added per foot of air. Also note that timestamps vary in accuracy. We are working to improve the latency.

@stereomatch When reporting latency, please include information about the device and Android version. Note that latency can vary from device to device. But once a measurement has stabilized then the latency should not increase. It should stabilize within a second. If the latency continues to increase then that might be a bug, which should be filed separately. This bug is closed.

@stereomatch

stereomatch commented Jul 18, 2018

yangwuan55 and philburk:

When reporting latency, please include information about the device and Android version. Note that latency can vary from device to device. But once a measurement has stabilized then the latency should not increase. It should stabilize within a second. If the latency continues to increase then that might be a bug, which should be filed separately. This bug is closed.

Results were for:

  • Nexus 4 running Oreo 8.1 (this is an old device running a custom Oreo 8.1)
  • OnePlus 5T running Oreo 8.1 (new device with official 8.1 update)

The test setup is as described above: one device is running our app, which uses AAudio to echo the microphone to the earphones. A headset (with built-in mic and earphones) is plugged in. When this Android device hears a click on the headset mic, the click emanates from the headset earphones 40ms to 100ms later. Sometimes that is picked up by the headset mic again, so there can be a series of clicks emanating from the earphones, diminishing as they repeat.

Now we place a second Android device running an audio recorder app right next to one of the headset earphones mentioned above. So the second Android device is merely recording what a human would hear in the earphones.

So you now have:

  • 1st Android device (with headset plugged in, running the AAudio app echoing the headset mic to the headset earphones).
  • 2nd Android device merely recording the output of one of the headset earphones mentioned above (the 2nd Android device here represents what a human would hear).

Now when you make a click sound, that sound is heard by the 1st device's headset microphone and emanates from the headset earphones 40ms to 100ms later.

The second Android device records the first click directly (over the air). 40ms to 100ms later, when that click emanates from the 1st device's earphones, it is also heard by the second device's audio recorder app.

Occasionally the feedback from headset earphone to headset mic will also be significant, and you will have a train of pulses emanating from the headset earphones, each pulse diminishing in strength (feedback on the 1st Android device from headset earphones to headset mic).

This means that the 2nd Android device's audio recorder app will hear one strong click (over the air), then 40ms to 100ms later it will hear a click from the headset earphones, and sometimes it will hear echoes of that (the feedback mentioned above).

For our purposes, we take the recording from the 2nd Android device and analyze it in Audacity on a computer. We measure the time interval between the first click (over the air) and the second click (from the earphone).

Latency Test Data

Here is an example of the click separation as viewed in Audacity for the OnePlus 5T running Oreo 8.1: you can observe some variation in latency, from about 34ms to 42ms, with one outlier at 104ms.

NOTE: each horizontal timeline is one experiment, i.e. the impact of a single click and what was heard by the 2nd device's audio recorder app. Each screenshot has multiple rows; each row is a separate click. Thus each row gives an estimate for the latency (the interval between the first two pulses), and comparing between rows you can see that the latency seems to be different each time.

Here is the same for the Nexus 4 running Oreo 8.1, using the headset mic and headset earphones, giving about 80ms.

But there were other examples where this varied, from 75ms to 85ms for the Nexus 4.

And examples where it varied much more: for the Nexus 4's device mic and headset earphones, about 110ms.

The OnePlus 5T also varied a bit: with the device mic and headset earphones, it varied from 100ms to 155ms.

I recognize these experiments are not conclusive, but the methodology is correct (please correct me if wrong), since the experiment measures the end result (the click separation as heard by the end user on earphones); the 2nd device's audio recorder stands in for the human observer.

The only weakness I can identify in the above method is if the 2nd Android device's audio recorder app itself has jitter or does not record all audio frames (which can happen with an Android audio recorder app).

So an improvement would be to use a certified digital audio recorder as the 2nd device (one known not to lose any audio frames).

However, the results of the above experiments are consistent with user experience: I have heard noticeable variation in the latency (the space between click sounds as heard on the earphones will often be very close together initially, but a few seconds later it will be much larger, i.e. it sounds like there is now more latency). This has been reported to me by users as well. So it is a very noticeable effect.

Thanks for all the feedback.

@stereomatch

stereomatch commented Jul 19, 2018

I initially posted a series of incrementally developed explanations for why latency may be increasing. This is now better explained in the new issue below (hopefully it is more understandable):

#165 - AAudio full-duplex latency increasing over time: reason identified and needs fix to dataCallback() API in AAudio

@stereomatch

stereomatch commented Jul 23, 2018

@philburk Do you find the above explanation of the latency increase and the experimental data plausible? If so, is there a way for developers to flush/clear the playback pipeline?

@stereomatch

To reproduce the latency variation, run the AAudio echo sample and choose:
Recording device: built-in microphone
Playback device: built-in earphone speaker
Keep the volume low enough that you don't get heavy feedback, but if you snap your fingers you should hear a pulse train of at least 2 or 3 echoes.

Now listen to the earphone speaker with your ear, as you snap your fingers. Each snap will create a pulse train (echoes due to feedback).

Initially the pulses will be very close together, but 20 seconds later the pulse-train period will be noticeably longer.

However, a bit later the pulse-train period will become smaller again.

This suggests the audio latency doesn't just increase monotonically and then become stable. Instead it increases, is then reset to the minimum, and a bit later increases again. This suggests some resetting of the playback pipeline (or something like that) is going on, which brings the latency back down to low levels. However, there is a great deal of variation in it (which means there is probably data loss in the playback pipeline as well).

I urge the AAudio engineers to run this test on their Android devices (Oreo 8.1) to see if they can reproduce this issue.
