
[BUG] - Microsoft.Azure.ServiceBus - System.InvalidOperationException: Can't create session when the connection is closing. #9416

Closed
schwartzma1 opened this issue Jan 9, 2020 · 41 comments
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. Service Attention Workflow: This issue is responsible by Azure service team. Service Bus


@schwartzma1

Describe the bug
Getting an InvalidOperationException in our ExceptionReceivedHandler about being unable to create a session.

Exception or Stack Trace

System.InvalidOperationException: Can't create session when the connection is closing.
at Microsoft.Azure.ServiceBus.Core.MessageReceiver.d__86.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass64_0.<b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceBus.RetryPolicy.d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Microsoft.Azure.ServiceBus.RetryPolicy.d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceBus.Core.MessageReceiver.d__64.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceBus.Core.MessageReceiver.d__62.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceBus.MessageReceivePump.<b__11_0>d.MoveNext()

To Reproduce
Not sure exactly how to reproduce this - it occurs intermittently.

Expected behavior
This exception should perhaps be ignored by the message pump, similar to other exceptions that are already ignored.

Setup (please complete the following information):

  • OS: Windows
  • IDE: Visual Studio 2019
  • Version of the library used: Microsoft.Azure.ServiceBus 4.1.0 (going to upgrade to 4.1.1, but I don't believe that will resolve this problem)

Additional context
Wondering if InvalidOperationException should be handled in the same way ObjectDisposedException and OperationCanceledException are handled in PR #8449 and this commit:
008bb2b#diff-a4508926a30ad8c3ce214614ed2fd446
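
For illustration, the kind of guard being suggested might look like the sketch below. This is not the actual MessagePump.cs code from the linked commit; the helper name and the cancellation-token parameter are hypothetical.

    using System;
    using System.Threading;

    // Sketch only: the sort of filter the linked commit applies to
    // ObjectDisposedException and OperationCanceledException, extended here to
    // InvalidOperationException. ShouldIgnoreDuringShutdown and
    // pumpCancellationToken are hypothetical names, not library members.
    static class PumpExceptionFilter
    {
        public static bool ShouldIgnoreDuringShutdown(Exception ex, CancellationToken pumpCancellationToken)
        {
            // Swallow these only when the pump itself is shutting down;
            // otherwise the exception should still reach the caller's
            // ExceptionReceivedHandler.
            return pumpCancellationToken.IsCancellationRequested
                && (ex is ObjectDisposedException
                    || ex is OperationCanceledException
                    || ex is InvalidOperationException);
        }
    }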

Information Checklist
Kindly make sure that you have added all of the information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • [x] Bug Description Added
  • [x] Repro Steps Added
  • [x] Setup information Added
@triage-new-issues triage-new-issues bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jan 9, 2020
@AlexGhiondea AlexGhiondea added Client This issue points to a problem in the data-plane of the library. Service Attention Workflow: This issue is responsible by Azure service team. Service Bus labels Jan 10, 2020
@triage-new-issues triage-new-issues bot removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Jan 10, 2020
@ghost

ghost commented Jan 10, 2020

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jfggdl


@geektania geektania added the bug This issue requires a change to an existing behavior in the product in order to be resolved. label Jan 13, 2020
@nemakam
Contributor

nemakam commented Jan 13, 2020

@schwartzma1, this seems to be the same as #6940.
Could you check again whether you are indeed using 4.1.0?

@schwartzma1
Author

schwartzma1 commented Jan 14, 2020

@nemakam - Yes, in the release the project that is using this has:
<PackageReference Include="Microsoft.Azure.ServiceBus" Version="4.1.0" />

I am wondering not whether it should be ServiceBusCommunicationException or InvalidOperationException, but why this exception is raised to the caller at all if the connection is closing. Can't it just be ignored on the Service Bus side, like ObjectDisposedException and OperationCanceledException are handled in this commit in MessagePump.cs: 008bb2b#diff-a4508926a30ad8c3ce214614ed2fd446?

@nemakam
Contributor

nemakam commented Jan 22, 2020

I agree. This can be ignored if the pump itself is closing. Is that the case? Connection closing and pump closing would be two different things.
Also, are you getting this exception as ServiceBusCommunicationException or InvalidOperationException?

@schwartzma1
Author

We are getting InvalidOperationException.

As far as whether the pump is closing vs. the connection closing - I am not sure; can you tell from the stack trace? Further down it has:
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.ServiceBus.MessageReceivePump.<b__11_0>d.MoveNext()

And the exception message itself is "Can't create session when the connection is closing."

But I am not sure whether I can tell from that alone whether it is the pump or the connection that is closing.

@nemakam
Contributor

nemakam commented Jan 23, 2020

@schwartzma1, the pump can only be closed by the user, when you call queueClient.Close(). So do you know whether you performed a close at the time of this exception?
If yes, we can try not to show this error to the user.
If not, I cannot make this error go away, since the user is supposed to be informed of all communication issues that are happening. The only thing I'd change then is to make sure it is thrown as ServiceBusCommunicationException as opposed to InvalidOperationException.
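
As a stopgap on the caller side, the ExceptionReceivedHandler can at least distinguish these shutdown/communication failures from unexpected ones. A rough sketch, assuming the handler is only used for logging and that matching on the exception message text is an acceptable heuristic (the message text is not a contract of the SDK):

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.ServiceBus;

    static class Handlers
    {
        // Caller-side sketch: log expected/transient failures quietly and only
        // surface the rest. The "connection is closing" message check is a
        // heuristic assumption, not an official contract.
        public static Task ExceptionReceivedHandler(ExceptionReceivedEventArgs args)
        {
            var ex = args.Exception;
            bool looksTransient =
                ex is ServiceBusCommunicationException
                || (ex is InvalidOperationException && ex.Message.Contains("connection is closing"));

            if (looksTransient)
            {
                // The pump generally keeps polling after raising this, so a
                // low-severity log entry is usually enough here.
                Console.WriteLine($"Transient Service Bus error during {args.ExceptionReceivedContext.Action}: {ex.Message}");
            }
            else
            {
                Console.WriteLine($"Unexpected Service Bus error: {ex}");
            }

            return Task.CompletedTask;
        }
    }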

@nemakam
Contributor

nemakam commented Apr 6, 2020

Another occurrence - #11066

@Mortana89

Hi @nemakam, to confirm: we are not explicitly closing anything. I can see from the stack trace that the connection is indeed closing.

@nemakam
Contributor

nemakam commented Apr 7, 2020

Okay, connections can get closed once in a while in a distributed world. The only thing we could aim for here is to make sure we translate this exception into a communication exception.

@nemakam
Contributor

nemakam commented Apr 7, 2020

Once we change this into a communication exception, the retry policy will kick in and retry if the client is not closed, so it will improve the situation drastically.

@Mortana89

Once we change this into a communication exception, the retry policy will kick in and retry if the client is not closed, so it will improve the situation drastically.

Does this mean it won't pop up anymore in AI or the console output?

@nemakam
Contributor

nemakam commented Apr 7, 2020

It will. By default the client will report every communication exception. Having a few communication exceptions is expected and you can ignore them. If a lot of such exceptions are happening frequently, then it's something we should look at. For that reason, the client will always report such exceptions when an operation fails with them.

The only time exceptions could be swallowed is when the client was explicitly closed and the exceptions are related to that.

@Mortana89

Hmm, understood. Well, both AI and the console are being spammed with Service Bus exceptions: timeouts, exceptions like the above, TaskCanceled, ... It's really annoying, as it makes it difficult to find value in there. Would it be possible for you (or one of your colleagues) and me to have a short call where I can showcase the behavior?

@Mortana89

Small update: we went through the (paid) Azure support channels, as this issue had existed for almost a year already and was consuming not only a lot of AI bandwidth but also taking up the available slots for snapshot debugging.
After a lot of back and forth, we tested one small change. We currently have (or had) one QueueClient for sending and one for receiving, each created from a connection string (so not from a ServiceBusConnection object).
We refactored this to a single QueueClient, and literally all of these exceptions have gone away.
So this makes me think there's an issue with having two QueueClients, which should still be possible, I'd assume (we have one for sending since that may never be blocked, while we'd like to be able to close the receiving one under high pressure, although we don't have that functionality currently).

I noticed that 7.0 supports this, correct? Any ETA for that?
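
For reference, the shared-connection variant (both clients reusing a single ServiceBusConnection rather than each being built from a raw connection string) might look roughly like the sketch below. connectionString and queueName are placeholders, and whether this avoids the exceptions is an open question in this thread.

    using Microsoft.Azure.ServiceBus;

    // Sketch of sharing one AMQP connection between the send-side and
    // receive-side clients. Placeholder values, not code from this thread.
    string connectionString = "<namespace-connection-string>";
    string queueName = "<queue-name>";

    var connection = new ServiceBusConnection(new ServiceBusConnectionStringBuilder(connectionString));

    var sendClient = new QueueClient(connection, queueName, ReceiveMode.PeekLock, RetryPolicy.Default);
    var receiveClient = new QueueClient(connection, queueName, ReceiveMode.PeekLock, RetryPolicy.Default);

    // Closing order matters: close the clients first, then the shared connection.
    // await sendClient.CloseAsync();
    // await receiveClient.CloseAsync();
    // await connection.CloseAsync();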

@nemakam
Contributor

nemakam commented Apr 20, 2020

@Mortana89, sorry, I think I didn't get a few things. What support exactly are you looking for in 7.0?

Also, would you be able to provide numbers?
How many queueClients did you have earlier?
How many sessions do you have on average at any time?
What are the sessionHandlerOptions that you used to use, and the values that you use now?

@nemakam
Contributor

nemakam commented Apr 20, 2020

And you did mention "constant" in your description, but I wanted to confirm that again. Do you see these exceptions constantly at the same rate, or do you see a burst of exceptions at some point and none the rest of the time?

@Mortana89

@Mortana89, sorry, I think I didn't get a few things. What support exactly are you looking for in 7.0?

Also, would you be able to provide numbers?
How many queueClients did you have earlier?
How many sessions do you have on average at any time?
What are the sessionHandlerOptions that you used to use, and the values that you use now?

Hi @nemakam,

Constant, yes; see the following screenshot from AI, taken before applying the change:
[Screenshot: Screenshot_20200420-230003]

We had two separate QueueClients per queue, one for sending and one for receiving. Some microservices consume multiple queues and thus have more QueueClients than others. We have roughly 60 microservice instances that were generating these exceptions.
It's difficult to put a number on the average number of sessions; we have peak load at night, but this wasn't reflected in the AI logging. I'd say no more than 150 at any given time.

The session handler options are the same, nothing changed there. We use PeekLock, a max timeout of 30 min, a max wait of 5 sec and a max concurrency of 100 (see the sketch below).
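
Translated into code, those settings would correspond roughly to the following SessionHandlerOptions. Which option each number maps to is an assumption ("max timeout" taken as MaxAutoRenewDuration, "max wait" as MessageWaitTimeout), and the queue client and handlers below are placeholders for whatever the services already use.

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Azure.ServiceBus;

    // Placeholder handlers standing in for the real ones.
    static Task ExceptionReceivedHandler(ExceptionReceivedEventArgs args)
    {
        Console.WriteLine($"Service Bus error: {args.Exception.Message}");
        return Task.CompletedTask;
    }

    static Task SessionMessageHandler(IMessageSession session, Message message, CancellationToken token)
    {
        // ... process the message here; with AutoComplete (the default) the
        // pump settles it when the handler returns successfully ...
        return Task.CompletedTask;
    }

    var queueClient = new QueueClient("<namespace-connection-string>", "<queue-name>", ReceiveMode.PeekLock);

    var sessionHandlerOptions = new SessionHandlerOptions(ExceptionReceivedHandler)
    {
        MaxConcurrentSessions = 100,                    // "100 concurrency max"
        MessageWaitTimeout = TimeSpan.FromSeconds(5),   // "max wait of 5 sec"
        MaxAutoRenewDuration = TimeSpan.FromMinutes(30) // "max timeout of 30 min"
    };

    queueClient.RegisterSessionHandler(SessionMessageHandler, sessionHandlerOptions);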

@Mortana89

Also, the screenshot is with 50% ingestion sampling enabled!

@nemakam
Contributor

nemakam commented Apr 20, 2020

Could you expand the first column "overall" and send a screenshot / maybe just copy-paste?

@nemakam
Contributor

nemakam commented Apr 20, 2020

It's very surprising that switching from two QueueClients to one reduces the exceptions; the pipelines are very independent. Do you think you could provide a sample snippet?
You also mentioned that the queue clients were created using a connection string and not a connection object. That would mean even the connections are independent.
And are you sure the exception you are seeing is from the QueueClient that's handling the receives and not the sends?

@Mortana89

Hi @nemakam,

It looks as if the exceptions are thrown on the receiving side, as they all come from the message pump:
Microsoft.Azure.ServiceBus.ServiceBusException:
at Microsoft.Azure.ServiceBus.SessionClient+d__37.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.SessionReceivePump+d__20.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)

Message:
The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:5abed6bf93f841e484cbe107e93c1c40_G2

They always come in pairs. One of the above also triggers the TaskCanceled exceptions:
Microsoft.Azure.ServiceBus.ServiceBusException:
at Microsoft.Azure.ServiceBus.SessionClient+d__37.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.SessionReceivePump+d__20.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
Inner exception System.Threading.Tasks.TaskCanceledException handled at Microsoft.Azure.ServiceBus.SessionClient+d__37.MoveNext:
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.Amqp.TaskHelpers.EndAsyncResult (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at Microsoft.Azure.Amqp.IteratorAsyncResult`1.StepCallback (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.Amqp.AsyncResult.End (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at Microsoft.Azure.Amqp.AmqpCbsLink+<>c__DisplayClass4_0.<SendTokenAsync>b__1 (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.Amqp.AmqpLinkCreator+d__11.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.Core.MessageReceiver+d__101.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1+<OnCreateAsync>d__6.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.Amqp.Singleton`1+d__13.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.Amqp.Singleton`1+d__13.MoveNext (Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.Core.MessageReceiver+d__83.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.RetryPolicy+d__19.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.RetryPolicy+d__19.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.SessionClient+d__37.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)

Message:

A task was canceled. A task was canceled.

Do you have an e-mail address I can send the previous source code of our queue interop client to?

@mladedav
Contributor

This is still happening to us and I am fairly sure we did not call close. Can this be changed from InvalidOperationException to ServiceBusCommunicationException so that the retrying kicks in?

@DorothySun216
Contributor

DorothySun216 commented Jun 22, 2020

@mladedav @Mortana89 Can you share with us a snippet of the code where you run into this error, and we will see if we can repro it?

We could try to translate this exception into a communication exception if we can repro. But you will still get the logs in AI. Would that be okay for you?

@mladedav
Contributor

In our case this processes one message and then we get this exception when trying to read the next message. I was told the load should be about 35 messages/hour. It happens consistently for one job in Kubernetes.

We are OK with the exceptions being shown; it's just an issue of retrying. Since there is built-in retrying in the SDK, that would seem like the preferred way, but if that isn't reliable, we would have to implement it ourselves, costing us more time and code.

       public async Task<IList<Message>?> ReceiveAsync(CancellationToken cancellationToken)
        {
            if (!_isReaderOpen)
            {
                throw new InvalidOperationException("Reader was not opned but an attempt to read from receiver was made.");
            }
            if (_receiver is null)
            {
                throw new InvalidOperationException("Receiver was not created but receive was called. Reader seems to have been opened though.");
            }

            var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

            using (cancellationToken.Register(() => tcs.SetResult(true)))
            {
                var cancelTask = tcs.Task;

                while (true)
                {
                    var receiveTask = _receiver.ReceiveAsync(_maxBatchSize, TimeSpan.FromSeconds(10));

                    var completedTask = await Task.WhenAny(cancelTask, receiveTask);
                    if (completedTask == receiveTask)
                    {
                        var messages = await receiveTask;
                        if (messages is null)
                        {
                            continue;
                        }

                        _logger.LogTrace("Received batch of size {count}/{max}.", messages.Count, _maxBatchSize);
                        return messages;
                    }
                    else
                    {
                        break;
                    }
                }
            }

            cancellationToken.ThrowIfCancellationRequested();
            throw new InvalidOperationException("Control is in unexpected parts of code. ServiceBus receiver did not finish reading any messages.");
        }

with the receiver built like

            var builder = new ServiceBusConnectionStringBuilder(connectionString);
            builder.EntityPath = EntityNameHelper.FormatSubscriptionPath(builder.EntityPath, subscription);
            _receiver = new MessageReceiver(builder, ReceiveMode.PeekLock, Microsoft.Azure.ServiceBus.RetryPolicy.Default);
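
If relying on the SDK's built-in retries, one variant of the construction above uses an explicit exponential policy instead of RetryPolicy.Default. A sketch only: the backoff values and retry count are arbitrary assumptions, and connectionString and subscription are placeholders as in the original snippet.

    using System;
    using Microsoft.Azure.ServiceBus;
    using Microsoft.Azure.ServiceBus.Core;

    // Placeholders, not values from this thread.
    string connectionString = "<namespace-connection-string>";
    string subscription = "<subscription-name>";

    var builder = new ServiceBusConnectionStringBuilder(connectionString);
    builder.EntityPath = EntityNameHelper.FormatSubscriptionPath(builder.EntityPath, subscription);

    // 1 s minimum backoff, 30 s maximum backoff, at most 5 retries per operation.
    var retryPolicy = new RetryExponential(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), 5);

    // Plays the role of _receiver in the snippet above.
    var receiver = new MessageReceiver(builder, ReceiveMode.PeekLock, retryPolicy);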

@DorothySun216
Contributor

Thanks for providing us with the source code. I will look into this and see if I can repro the error. Will update as soon as I can.

@mladedav
Contributor

Hi, were you able to reproduce? Would it be possible to change the type?

@DorothySun216
Contributor

@mladedav Sorry about the delay in response. I have checked our code in the _receiver.ReceiveAsync path and confirmed that you should not receive InvalidOperationException if the connection is closing. I cannot repro what you are seeing with our SDK.

I am also a bit confused, since you have shared two stack traces. The first one:

    System.InvalidOperationException: Can't create session when the connection is closing.
    at Microsoft.Azure.ServiceBus.Core.MessageReceiver.d__86.MoveNext()

For this path, you should not see this error if you are using "Microsoft.Azure.ServiceBus" Version="4.1.0". This stack trace corresponds to the code snippet you shared with me:

    var receiveTask = _receiver.ReceiveAsync(_maxBatchSize, TimeSpan.FromSeconds(10));

Can you please double-check whether the machine that throws this exception is INDEED using Version="4.1.0" or greater?

The second stack trace you have shared:

    It looks as if the exceptions are thrown on the receiving side, as they all come from the message pump:
    Microsoft.Azure.ServiceBus.ServiceBusException:
    at Microsoft.Azure.ServiceBus.SessionClient+d__37.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
    at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
    at Microsoft.Azure.ServiceBus.SessionReceivePump+d__20.MoveNext (Microsoft.Azure.ServiceBus, Version=4.1.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)

    Message:
    The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:5abed6bf93f841e484cbe107e93c1c40_G2

From this stack trace you seem to use SessionClient to receive. Can you share a code snippet that causes this stack trace? Which path throws the InvalidOperationException whose type you want changed?

@DorothySun216
Contributor

@mladedav Also, just a follow-up on the first stack trace:

    System.InvalidOperationException: Can't create session when the connection is closing.
    at Microsoft.Azure.ServiceBus.Core.MessageReceiver.d__86.MoveNext()
    ...
    at Microsoft.Azure.ServiceBus.MessageReceivePump.<b__11_0>d.MoveNext()

I see it is thrown from MessageReceivePump; are you by any chance using an OnMessageHandler in your code?

@mladedav
Contributor

I haven't actually sent any stack traces; that must have been someone else. I must admit I cannot seem to find them anywhere, and I can't reproduce the error with our older code. When I get home I will try to hack around a bit and see if I can break it again.

At this time I only remember that it was the exception with this message (which is how I found this thread). I thought that since there are already two, I wouldn't clutter this place with more stack traces, but lesson learned for next time.

@DorothySun216
Contributor

@mladedav Thanks for your reply. Are you by any chance using an OnMessageHandler in your code? If so, can you share that piece of code with me, especially how you handle exceptions in the OnMessageHandler?

@mladedav
Contributor

No, we're only using the MessageReceiver directly to receive. The only other Service Bus-relevant information I can think of is that we were failing to renew messages before their locks expired when these issues happened, but we fixed that at the same time we added some defensive retrying, so I can't say whether it's relevant.

@DorothySun216
Contributor

Got it. The first log trace shows 'MessageReceivePump', which only comes from receiving with an OnMessageHandler, so I am not sure how that error trace was produced.

We will need a reliable error trace to continue investigating, since we cannot repro. Can you repro it and send us the log and timestamp? Thanks!

@mr-davidc

I too have been receiving this error intermittently for quite some time from our Azure function instances running in Kubernetes.

We have a functions pod running .NET Core 2.2.8 with Functions v2 in our Production cluster and a separate functions pod running .NET Core 3.1.5 with Functions v3 in our Sandbox cluster (after recently upgrading), and the exceptions are still being received from both pods.

The production pod references Microsoft.Azure.ServiceBus v3.4.0 and the sandbox pod references Microsoft.Azure.ServiceBus v4.1.3.

Exception message:
Message processing error (Action=Receive, ClientId=MessageReceiver12account-events/Subscriptions/new-account-setup, EntityPath=account-events/Subscriptions/new-account-setup, Endpoint=sndbx-sb-project-au.servicebus.windows.net)

Stack Trace:
System.InvalidOperationException: Can't create session when the connection is closing.
Module "Microsoft.Azure.ServiceBus.Core.MessageReceiver", in OnReceiveAsync
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "System.Runtime.CompilerServices.TaskAwaiter", in HandleNonSuccessAndDebuggerNotification
Module "Microsoft.Azure.ServiceBus.Core.MessageReceiver+<>c__DisplayClass64_0+<<ReceiveAsync>b__0>d", in MoveNext
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "Microsoft.Azure.ServiceBus.RetryPolicy", in RunOperation
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "Microsoft.Azure.ServiceBus.RetryPolicy", in RunOperation
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "System.Runtime.CompilerServices.TaskAwaiter", in HandleNonSuccessAndDebuggerNotification
Module "Microsoft.Azure.ServiceBus.Core.MessageReceiver", in ReceiveAsync
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "System.Runtime.CompilerServices.TaskAwaiter", in HandleNonSuccessAndDebuggerNotification
Module "Microsoft.Azure.ServiceBus.Core.MessageReceiver", in ReceiveAsync
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "System.Runtime.CompilerServices.TaskAwaiter", in HandleNonSuccessAndDebuggerNotification
Module "Microsoft.Azure.ServiceBus.MessageReceivePump+<<MessagePumpTaskAsync>b__11_0>d", in MoveNext

Another interesting piece of info is that I am also receiving this exception at essentially the same time:
System.ObjectDisposedException: Cannot access a disposed object. Object name: '$cbs'.
Module "Microsoft.Azure.ServiceBus.Core.MessageReceiver", in OnReceiveAsync
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "System.Runtime.CompilerServices.TaskAwaiter", in HandleNonSuccessAndDebuggerNotification
Module "Microsoft.Azure.ServiceBus.Core.MessageReceiver+<>c__DisplayClass64_0+<<ReceiveAsync>b__0>d", in MoveNext
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "Microsoft.Azure.ServiceBus.RetryPolicy", in RunOperation
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "Microsoft.Azure.ServiceBus.RetryPolicy", in RunOperation
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "System.Runtime.CompilerServices.TaskAwaiter", in HandleNonSuccessAndDebuggerNotification
Module "Microsoft.Azure.ServiceBus.Core.MessageReceiver", in ReceiveAsync
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "System.Runtime.CompilerServices.TaskAwaiter", in HandleNonSuccessAndDebuggerNotification
Module "Microsoft.Azure.ServiceBus.Core.MessageReceiver", in ReceiveAsync
Module "System.Runtime.ExceptionServices.ExceptionDispatchInfo", in Throw
Module "System.Runtime.CompilerServices.TaskAwaiter", in ThrowForNonSuccess
Module "System.Runtime.CompilerServices.TaskAwaiter", in HandleNonSuccessAndDebuggerNotification
Module "Microsoft.Azure.ServiceBus.MessageReceivePump+<<MessagePumpTaskAsync>b__11_0>d", in MoveNext

Let me know if you require any more information.

@DorothySun216
Contributor

Hi @mr-davidc, thanks for reaching out. Are you from the same team as @mladedav? If not, can you create another thread and share with us the piece of code that throws the exceptions? This thread is already really long and hard to navigate. We need to look at each repro individually to see what the problem is, since these kinds of problems are hard to debug.

@DorothySun216
Contributor

@mr-davidc, @mladedav any updates on the logs? If not, we will close this incident.

@mr-davidc

@mr-davidc, @mladedav any updates on the logs? If not, we will close this incident.

Hi @DorothySun216, mladedav and I are from separate teams, so I can't speak for them, but on my end I created a separate issue for the problems I am seeing: #13637

Thanks

@DorothySun216
Contributor

We have rolled out a fix (#17023) in the latest release, 5.1.0. Can you test whether you still see the same issue with this new NuGet package? https://www.nuget.org/packages/Microsoft.Azure.ServiceBus/5.1.0

@ColeSiegelTR

ColeSiegelTR commented Mar 3, 2021

I'm still getting a similar issue with Microsoft.Azure.ServiceBus 5.1.2. If I close all the sessions and the SubscriptionClient with CloseAsync(), the ExceptionReceivedHandler continues to fire with the exception below:

      Exception handled during connection refresh: [Microsoft.Azure.ServiceBus.ServiceBusException: The operation was canceled.
       ---> System.OperationCanceledException: The operation was canceled.
         at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result)
         at Microsoft.Azure.Amqp.AmqpCbsLink.SendTokenAsyncResult.<>c__DisplayClass13_0.<GetAsyncSteps>b__3(SendTokenAsyncResult thisPtr, IAsyncResult r)
         at Microsoft.Azure.Amqp.IteratorAsyncResult`1.StepCallback(IAsyncResult result)
      --- End of stack trace from previous location where exception was thrown ---
         at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result)
         at Microsoft.Azure.Amqp.AmqpCbsLink.<>c__DisplayClass4_0.<SendTokenAsync>b__1(IAsyncResult a)
         at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
      --- End of stack trace from previous location where exception was thrown ---
         at Microsoft.Azure.ServiceBus.Amqp.AmqpLinkCreator.CreateAndOpenAmqpLinkAsync()
         at Microsoft.Azure.ServiceBus.Core.MessageReceiver.CreateLinkAsync(TimeSpan timeout)
         at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout)
         at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
         at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
         at Microsoft.Azure.ServiceBus.Core.MessageReceiver.GetSessionReceiverLinkAsync(TimeSpan serverWaitTime)
         at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
         at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
         at Microsoft.Azure.ServiceBus.SessionClient.AcceptMessageSessionAsync(String sessionId, TimeSpan operationTimeout)
         --- End of inner exception stack trace ---
         at Microsoft.Azure.ServiceBus.SessionClient.AcceptMessageSessionAsync(String sessionId, TimeSpan operationTimeout)
         at Microsoft.Azure.ServiceBus.SessionReceivePump.SessionPumpTaskAsync()]

Any suggestions?

@DorothySun216
Contributor

DorothySun216 commented Mar 3, 2021

@ColeSiegelTR Thanks for reaching out. This is a very unpredictable issue, and we have attempted to repro it many times without success. Do you have retry mechanisms that can recover from this exception? If this exception happened within the last 30 days, can you open an Azure support ticket with us? https://azure.microsoft.com/en-us/support/create-ticket/ We will investigate further.

@Mortana89

We upgraded to the latest Azure SDK (Azure.Messaging.ServiceBus) and this still happens.
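
For anyone tracking this on the newer SDK: in Azure.Messaging.ServiceBus these errors surface through the processor's ProcessErrorAsync handler rather than an ExceptionReceivedHandler. A minimal sketch, with connectionString and queueName as placeholders:

    using System;
    using System.Threading.Tasks;
    using Azure.Messaging.ServiceBus;

    string connectionString = "<namespace-connection-string>";
    string queueName = "<queue-name>";

    await using var client = new ServiceBusClient(connectionString);
    await using ServiceBusProcessor processor = client.CreateProcessor(queueName, new ServiceBusProcessorOptions());

    processor.ProcessMessageAsync += async args =>
    {
        // Handle the message, then settle it.
        await args.CompleteMessageAsync(args.Message);
    };

    processor.ProcessErrorAsync += args =>
    {
        // ServiceBusException.Reason distinguishes transient conditions
        // (e.g. ServiceCommunicationProblem) from other failures.
        Console.WriteLine($"{args.ErrorSource}: {args.Exception.Message}");
        return Task.CompletedTask;
    };

    await processor.StartProcessingAsync();
    // ... keep the process alive; call StopProcessingAsync before shutdown.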

@github-actions github-actions bot locked and limited conversation to collaborators Mar 28, 2023