[exporterhelper] boundedMemoryQueue may send on closed channel #7388

atingchen · 2023-03-15T04:49:25Z

Describe the bug
When a goroutine is about to submit a new item to the queue, q.stopped.Load() can still return false, so the goroutine passes the check but has not yet finished sending the item to the channel.

func (q *boundedMemoryQueue) Produce(item Request) bool {
	if q.stopped.Load() {
		return false
	}

	if q.size.Load() >= q.capacity {
		return false
	}

	q.size.Add(1)
	select {
	case q.items <- item:
		return true
	default:
		// should not happen, as overflows should have been captured earlier
		q.size.Add(^uint32(0))
		return false
	}
}

At the same time, the service shuts down and invokes Stop(), which closes the channel. The goroutine may then send the item on the closed channel, which panics.

func (q *boundedMemoryQueue) Stop() {
	q.stopped.Store(true) // disable producer
	close(q.items)
	q.stopWG.Wait()
}
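
One possible way to close the window between the stopped check and the send is to guard both Produce and Stop with a lock. The following is only a minimal sketch of that idea, not the exporterhelper code: guardedQueue, newGuardedQueue, and the string item type are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedQueue is a hypothetical, simplified stand-in for boundedMemoryQueue.
type guardedQueue struct {
	mu      sync.RWMutex
	stopped bool
	items   chan string
}

func newGuardedQueue(capacity int) *guardedQueue {
	return &guardedQueue{items: make(chan string, capacity)}
}

// Produce holds the read lock across the stopped check and the send,
// so Stop cannot close the channel between the two steps.
func (q *guardedQueue) Produce(item string) bool {
	q.mu.RLock()
	defer q.mu.RUnlock()
	if q.stopped {
		return false
	}
	select {
	case q.items <- item:
		return true
	default:
		// Queue is full; the send is non-blocking, so the lock is held only briefly.
		return false
	}
}

// Stop takes the write lock, so it waits for in-flight Produce calls
// before closing the channel. Calling it twice is safe.
func (q *guardedQueue) Stop() {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.stopped {
		return
	}
	q.stopped = true
	close(q.items)
}

func main() {
	q := newGuardedQueue(2)
	fmt.Println(q.Produce("a")) // prints true: accepted before Stop
	q.Stop()
	fmt.Println(q.Produce("b")) // prints false: rejected after Stop, no panic
}
```

Because the send uses select with a default case, producers never block while holding the read lock, so Stop is not delayed by a full queue.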

What did you expect to see?
thread-safe

What did you see instead?

What version did you use?
v0.73.0

yutingcaicyt · 2023-05-04T09:50:12Z

The collector stops in topological order, so upstream components are stopped before downstream components. This ensures that each component has a chance to drain to its consumer before the consumer is stopped. Therefore, under normal circumstances, the receiver and processor are guaranteed to have finished transmitting data and closed before the exporter is closed. While the exporter is closing, no data will enter it.

atingchen · 2023-05-06T07:19:44Z

Thanks for your reply @yutingcaicyt. boundedMemoryQueue will be used as a retry queue. When the upstream component is closed, data may still re-enter the queue due to the retry strategy.

When queuedRetrySender shuts down, it first closes retryStopCh and then closes the retry queue.
OnTemporaryFailure can be called between these two steps, so there is a small chance of data being put back into the retry queue.

		select {
		case <-req.Context().Done():
			return fmt.Errorf("Request is cancelled or timed out %w", err)
		case <-rs.stopCh:
			return rs.onTemporaryFailure(rs.logger, req, fmt.Errorf("interrupted due to shutdown %w", err))
		case <-time.After(backoffDelay):
		}

dmitryax · 2023-12-08T01:40:22Z

Re-enqueuing only applies to the persistent queue, which doesn't have a closed channel. The memory queue doesn't get data re-enqueued.

This change unblocks adding the `enqueue_on_failure` option to the queue sender by removing the requeue behavior on shutdown. If we don't remove requeue on shutdown, it's possible to run into the situation described in #7388. After the recent refactoring, the chance of running into it is pretty small, but it's still possible. The only reason to requeue on shutdown is to make sure there is no data loss with the persistent queue enabled. The persistent queue captures all in-flight requests in persistent storage anyway, so there is no reason to requeue an in-flight request. The only downside is that it can potentially cause duplicate data to be sent on collector restart if a request partially failed during shutdown. Another option would be to rework the memory queue to never close the channel but still ensure draining.
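
For illustration, the "never close the channel but still ensure draining" option could look roughly like the sketch below. It is not the collector's implementation: drainQueue, newDrainQueue, and the consume callback are hypothetical names. Shutdown is signalled through a separate done channel, so a late Produce can never panic; the trade-off in this simplified version is that an item enqueued while Stop is racing with Produce may be left in the buffer unconsumed.

```go
package main

import (
	"fmt"
	"sync"
)

// drainQueue is a hypothetical sketch: items is never closed,
// and done is the only channel that gets closed on shutdown.
type drainQueue struct {
	items chan string
	done  chan struct{}
	once  sync.Once
	wg    sync.WaitGroup
}

func newDrainQueue(capacity int, consume func(string)) *drainQueue {
	q := &drainQueue{
		items: make(chan string, capacity),
		done:  make(chan struct{}),
	}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for {
			select {
			case it := <-q.items:
				consume(it)
			case <-q.done:
				// Shutdown requested: drain whatever is still buffered, then exit.
				for {
					select {
					case it := <-q.items:
						consume(it)
					default:
						return
					}
				}
			}
		}
	}()
	return q
}

// Produce rejects items once shutdown has started and never blocks.
func (q *drainQueue) Produce(item string) bool {
	select {
	case <-q.done:
		return false
	default:
	}
	select {
	case q.items <- item:
		return true
	default:
		return false // queue full
	}
}

// Stop signals shutdown and waits until the consumer has drained the buffer.
func (q *drainQueue) Stop() {
	q.once.Do(func() { close(q.done) })
	q.wg.Wait()
}

func main() {
	var got []string
	q := newDrainQueue(4, func(s string) { got = append(got, s) })
	q.Produce("a")
	q.Produce("b")
	q.Stop()                    // returns only after both items are consumed
	fmt.Println(got)            // prints [a b]
	fmt.Println(q.Produce("c")) // prints false
}
```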

atoulme · 2023-12-19T05:30:16Z

I believe this issue is fixed, as we have changed the implementation of queues quite a bit, as @dmitryax points out above. The code is completely different now. Previously, a test was flaky because of this problem, but it no longer is.

Closing as resolved. Feel free to comment or create new issues to follow up.

atingchen added the bug label Mar 15, 2023

atingchen changed the title ~~[exporter] boundedMemoryQueue may send on closed channel~~ [exporterhelper] boundedMemoryQueue may send on closed channel Mar 23, 2023

This was referenced Dec 8, 2023

[chore] [exporterhelper] Do not requeue on shutdown #9054

Merged

reenqueue may not be needed #8382

Closed

atoulme closed this as completed Dec 19, 2023
