Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix concurrency bug in caching transport #2026

Merged
merged 4 commits into from
Oct 31, 2022
Merged

Conversation

mattjohnsonpint
Copy link
Contributor

@mattjohnsonpint mattjohnsonpint commented Oct 29, 2022

This is basic usage and should always work:

using (SentrySdk.Init(options))
{
    SentrySdk.CaptureEvent(new SentryEvent());

    // possible other application logic, perhaps some that takes a bit of time
}

It does work, if caching is not enabled.

However when caching is enabled, it sometimes sends the envelope, and sometimes exits the program before the envelope can be sent. It is intermittent, depending on how much time (if any) is allowed to pass for the background worker and cache worker to complete. If there's no time at all between the capture and the dispose, then the envelope isn't sent.

This is happening because there's a race condition between the normal background cache worker that is processing items within the caching transport, and the flush that occurs when the transport is disposed.

  • If the flush processes first, it dequeues the envelope, sends it to the inner transport, and exits after the transport completes. The cache worker might start at any time, but it finds nothing in the queue. All is good.
  • If the cache worker processes first, it dequeues the envelope, but then the flush happens before the envelope is sent. The flush finds no items and exits immediately. If the app terminates before the worker can continue, the envelope is not sent. It's left in the processing folder until the next run of the app.

I fixed this by only allowing one thread at a time to be processing items in the cache. If the flush occurs while the cache worker is processing, it will wait asynchronously until the worker completes processing.

Added a test to demonstrate this both with and without caching. Without the fix applied, the test passes only when caching is disabled. With the fix applied, the test passes both with and without caching.

Fixes #2025

@mattjohnsonpint mattjohnsonpint marked this pull request as draft October 29, 2022 02:54
@mattjohnsonpint mattjohnsonpint merged commit 0a13531 into main Oct 31, 2022
@mattjohnsonpint mattjohnsonpint deleted the fix-caching-transport branch October 31, 2022 14:31
@khalilyamoun
Copy link

khalilyamoun commented Nov 3, 2022

@mattjohnsonpint can we please have a sentry-xamarin release also ?
Can you also confirm that your fix will also fixes this exception :

 Sharing violation on path /data/user/0/com.packagename.package/files/.local/share/Sentry/2307E3E05F3949AB506007C7897275B7007CF6F7/1667473559_3144__-1797975990.envelope
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize, System.Boolean anonymous, System.IO.FileOptions options) [0x001b7] in <4545e8ca11c24322a9e3bfbda26c0dea>:0 
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize, System.IO.FileOptions options) [0x00000] in <4545e8ca11c24322a9e3bfbda26c0dea>:0 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream..ctor(string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,int,System.IO.FileOptions)
  at System.IO.FileSystem.CopyFile (System.String sourceFullPath, System.String destFullPath, System.Boolean overwrite) [0x00025] in <4545e8ca11c24322a9e3bfbda26c0dea>:0 
  at System.IO.File.Copy (System.String sourceFileName, System.String destFileName, System.Boolean overwrite) [0x00062] in <4545e8ca11c24322a9e3bfbda26c0dea>:0 
  at Sentry.Internal.FileSystem.MoveFile (System.String sourceFileName, System.String destFileName, System.Boolean overwrite) [0x00003] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.Internal.Http.CachingTransport.TryPrepareNextCacheFileAsync (System.Threading.CancellationToken cancellationToken) [0x000c3] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.Internal.Http.CachingTransport.ProcessCacheAsync (System.Threading.CancellationToken cancellation) [0x0008d] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.Internal.BackgroundWorker.DoFlushAsync (System.Threading.CancellationToken cancellationToken) [0x00227] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.Internal.BackgroundWorker.FlushAsync (System.TimeSpan timeout) [0x0010b] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.Internal.BackgroundWorker.FlushAsync (System.TimeSpan timeout) [0x00256] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.Internal.Hub.Dispose () [0x0003c] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.SentrySdk.UseHub (Sentry.IHub hub) [0x00014] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.SentrySdk.Init (Sentry.SentryOptions options) [0x00006] in <af3c1f33f08f44fb855b493dd31402c2>:0 
  at Sentry.SentryXamarin.Init (Sentry.SentryXamarinOptions options) [0x00000] in <4f27275fafca4429b56c79b5405a8802>:0 
  at Sentry.SentryXamarin.Init (System.Action`1[T] configureOptions) [0x00080] in <4f27275fafca4429b56c79b5405a8802>:0 
  at com.packagename.package.Utils.AndroidSentryHandler.InitSentry () [0x00011] in <95e29835d54c433c9d0fae8ee080053b>:0 
  at com.packagename.package.Utils.AndroidSentryHandler..ctor () [0x00006] in <95e29835d54c433c9d0fae8ee080053b>:0 
  at com.packagename.package.SplashActivity.Initialize () [0x0001a] in <95e29835d54c433c9d0fae8ee080053b>:0 
  at com.packagename.package.SplashActivity.OnCreate (Android.OS.Bundle savedInstanceState) [0x00153] in <95e29835d54c433c9d0fae8ee080053b>:0 
  at System.Runtime.CompilerServices.AsyncMethodBuilderCore+<>c.<ThrowAsync>b__7_0 (System.Object state) [0x00000] in <4545e8ca11c24322a9e3bfbda26c0dea>:0 
  at Android.App.SyncContext+<>c__DisplayClass2_0.<Post>b__0 () [0x00000] in <6888eae8a8484bc6b6aef4c4ad4d0e25>:0 
  at Java.Lang.Thread+RunnableImplementor.Run () [0x00008] in <6888eae8a8484bc6b6aef4c4ad4d0e25>:0 
  at Java.Lang.IRunnableInvoker.n_Run (System.IntPtr jnienv, System.IntPtr native__this) [0x00008] in <6888eae8a8484bc6b6aef4c4ad4d0e25>:0 
  at (wrapper dynamic-method) Android.Runtime.DynamicMethodNameCounter.48(intptr,intptr)

@mattjohnsonpint
Copy link
Contributor Author

@khalilyamoun - I didn't specifically investigate this from a Xamarin perspective, but from the stack trace you gave I believe it could be a similar issue. We'll push a release of Sentry.Xamarin soon. If you still get that problem after updating, please open an issue on the sentry-xamarin repo. Thanks.

@khalilyamoun
Copy link

Thank you @mattjohnsonpint,
Can we have the updated version of Sentry.Xamarin released before the end of the week so we can release it by the start of the next week ?
The crash is currently impacting hundreds of users ... :(

@mattjohnsonpint
Copy link
Contributor Author

mattjohnsonpint commented Nov 3, 2022

@khalilyamoun - Yes, but you don't have to wait at all. You can simply add Sentry 3.23.1 to your project and rebuild.

The only thing bumping the version of Sentry on the Sentry.Xamarin release will do is ensure that it's current when brought in transitively.

@khalilyamoun
Copy link

khalilyamoun commented Nov 4, 2022

@mattjohnsonpint - sure, but :

  • We use the SDK is 5 projects
  • At scale / larger teams, keeping a clean list/history of dependencies is heavily recommended.

I installed version 3.23.1 just to give it a try - it sadly didn't fix our issue.
I'm going to open an issue on the sentry-xamarin repo.
getsentry/sentry-xamarin#129

@mattjohnsonpint
Copy link
Contributor Author

FYI, Sentry.Xamarin 1.4.4 has been released, which updates its dependency on Sentry 3.23.1 to include this fix.

Sorry to hear that it didn't resolve the other issue you have. We'll continue that discussion on the issue you opened there.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When Caching is enabled, flushing (e.g. on Dispose) sometimes does not flush all events on app crash
3 participants