go-routine leak while cache enabled #8715
Comments
The expiry time is updated here every time the entry is accessed: https://github.com/hashicorp/consul/blob/master/agent/cache/cache.go#L431-L434
I think that is a problem, but I'm not sure how it's related to the expiry. There is only a single expiry goroutine for all cache entries. The extra goroutine is the one started by …
@dnephin I think I may have found something strange: line 433 in d32e7f0 and line 797 in d32e7f0. It may be one reason for this issue, but there may be other reasons as well.
I will try to explain the goroutine leak (maybe it's not right). Line 796 in d32e7f0:

1. …
2. fetch() was waiting for the result when the entry was deleted (line 608 in d32e7f0).
3. fetch() got the result and re-added the entry to c.entries (line 720 in d32e7f0).

Now, even though no one calls getWithIndex(), the entry was re-added. In Consul v1.5.3, an entriesExpiryHeap entry is also added.
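The interleaving in steps 2–3 can be sketched as follows (assumed names and a plain map, not Consul's real code): the expiry loop deletes the entry while a fetch is in flight, and the fetch then blindly writes the entry back, leaving an orphan no caller asked for.

```go
package main

import (
	"fmt"
	"sync"
)

// Minimal model of the race described above. All names are
// illustrative assumptions, not Consul's actual types.
type cache struct {
	mu      sync.Mutex
	entries map[string]string
}

// runExpiryLoop side: the entry's TTL fires and it is removed.
func (c *cache) expire(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, key)
}

// fetch side: the result arrives and is stored without checking
// whether the entry was expired in the meantime.
func (c *cache) fetchDone(key, result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = result
}

func main() {
	c := &cache{entries: map[string]string{"svc": "v1"}}
	c.expire("svc")             // step 2: deleted while fetch is waiting
	c.fetchDone("svc", "v2")    // step 3: the result re-adds the entry
	fmt.Println(len(c.entries)) // prints 1: an orphan entry remains
}
```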
Maybe there is a solution: …
Another question: newEntry is added to c.entries with fetching set to false, and it stays false until the next fetch sets it to true. In that window, because fetching is false, any call to fetch will not return the existing waiter and will instead create a new goroutine.
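That window can be modelled like this (assumed names such as `entry` and `fetching`, not Consul's actual fields): the goroutine from the previous, expired entry is still running, and a caller arriving while the re-added entry has `fetching == false` starts another one instead of joining a waiter.

```go
package main

import "fmt"

// Illustrative model of the window described above; not Consul's code.
type entry struct {
	fetching bool
}

var spawned int // counts background fetch goroutines started

func fetch(e *entry) {
	if e.fetching {
		return // a real caller would block on the entry's waiter channel
	}
	e.fetching = true
	spawned++ // a new background goroutine would be started here
}

func main() {
	// The goroutine from the previous (expired) entry is still alive...
	background := 1
	// ...and the re-added entry starts with fetching == false,
	e := &entry{fetching: false}
	fetch(e) // ...so this caller spawns another one.
	fmt.Println(background + spawned) // two goroutines for one entry
}
```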
There is a lock (…). I think #8715 (comment) sounds pretty correct. I suspect the problem is in this area: at this point the entry could have been expired, but the goroutine never checks whether the entry has been removed. So if the expiry happens while a fetch is running, the entry won't properly expire, and if another request comes in between the expiry time and when fetch returns, we'll have leaked a goroutine (as you mentioned). I think somehow in that locked section (lines 711-721) we need to be able to detect whether the previous entry was expired. I'm not sure we have enough data stored to determine that right now. We can't check … One way to handle this might be to add an …
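One possible shape for that check (a hedged sketch only; the comment above is cut off, and names like `fetchDone` are assumptions, not Consul's code): before re-inserting its result, the fetch goroutine verifies under the lock that the entry still exists, and drops the result if the expiry loop removed it.

```go
package main

import (
	"fmt"
	"sync"
)

// Sketch of detecting an expired entry inside the locked section,
// so a fetch that finishes after an expiry does not resurrect it.
type cache struct {
	mu      sync.Mutex
	entries map[string]string
}

func (c *cache) expire(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, key)
}

// fetchDone reports whether the result was stored.
func (c *cache) fetchDone(key, result string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.entries[key]; !ok {
		return false // expired while the fetch ran: do not re-add
	}
	c.entries[key] = result
	return true
}

func main() {
	c := &cache{entries: map[string]string{"svc": "stale"}}
	c.expire("svc")
	fmt.Println(c.fetchDone("svc", "v2"), len(c.entries)) // false 0
}
```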
@varnson I should also mention that if you are running version 1.5.3, we have fixed some cache problems in more recent versions. #8092 is the one fix that I remember. Maybe that fixes your problem? I think there are still possible goroutine leaks, but the one we've identified should be rare because it can only happen when an entry expires, and that can only happen after the 3 day TTL.
Goroutine leak while cache enabled
file: agent/cache/cache.go
Step 1: func fetch() sets the expiry time on the entry only the first time, never changes it afterwards, and creates a goroutine to refresh the entry.
```go
func (c *Cache) fetch(tEntry typeEntry, /* ... */) {
	// ...
	if newEntry.Expiry == nil || newEntry.Expiry.HeapIndex == -1 {
		// The expiry time is set only the first time.
		newEntry.Expiry = &cacheEntryExpiry{Key: key}
		newEntry.Expiry.Update(tEntry.Opts.LastGetTTL)
		heap.Push(c.entriesExpiryHeap, newEntry.Expiry)
	}
	// ...
```
Step 2: when the entry expires, it is deleted by runExpiryLoop.
```go
func (c *Cache) runExpiryLoop() {
	// ...
	case <-expiryCh:
		c.entriesLock.Lock()
		// ...
}
```
Step 3: when someone calls fetch again, because the entry was deleted, a new entry and a new goroutine are created. Now we have one entry and two goroutines.