Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RequestLoggerController concurrency issues #167

Merged
merged 1 commit into from
Jan 21, 2025

Conversation

pranavr12
Copy link
Contributor

@pranavr12 pranavr12 commented Jan 21, 2025

There seems issues with threads on this line in aws-proxy - https://github.com/trinodb/aws-proxy/blob/main/trino-aws-proxy/src/main/java/io/trino/aws/proxy/server/rest/RequestLoggerController.java#L307
Its using EvictingQueue which is not thread-safe.

java.base/java.util.ArrayDeque.removeFirst(ArrayDeque.java:361)\n\tat 
java.base/java.util.ArrayDeque.remove(ArrayDeque.java:522)\n\tat 
com.google.common.collect.EvictingQueue.add(EvictingQueue.java:111)\n\tat 
io.trino.aws.proxy.server.rest.RequestLoggerController.logAndClear(RequestLoggerController.java:307)\n\tat 
io.trino.aws.proxy.server.rest.RequestLoggerController.internalNewRequestSession(RequestLoggerController.java:225)\n\tat 
io.trino.aws.proxy.server.rest.RequestLoggerController.lambda$newRequestSession$1(RequestLoggerController.java:175)

This PR makes EvictingQueue synchronised by adding a wrapper. Alternatively, we could have used a custom implementation of ConcurrentLinkedQueue (non-blocking, thread safe, natively unbounded) with a FIFO policy. The code would look something like this:

public static class ConcurrentLinkedQueueWithEviction<E>
{
    private final int maxSize;
    private AtomicInteger currentSize;
    private final Queue<E> queue;

    public ConcurrentLinkedQueueWithEviction(int maxSize)
    {
        this.maxSize = maxSize;
        currentSize = new AtomicInteger(0);
        queue = new ConcurrentLinkedQueue<>();
    }

    public boolean add(E e)
    {
        boolean isOperationSuccessful = queue.add(e);
        if (isOperationSuccessful) {
            currentSize.incrementAndGet();
            evict();
        }
        return isOperationSuccessful;
    }

    public Stream<E> stream()
    {
        return queue.stream();
    }

    public void clear()
    {
        queue.clear();
        currentSize.set(0);
    }

    private void evict()
    {
        while (currentSize.get() > maxSize) {
            queue.poll(); // Remove the oldest element (FIFO eviction)
            currentSize.decrementAndGet();
        }
    }
}

I did a performance test on my local for adding elements into in the queue (16k threads, 50 ops per thread, 100k queue size). The synchronised version of EvictingQueue performed consistently better than the custom implementation. So, decided to go with the synchronised version

@cla-bot cla-bot bot added the cla-signed label Jan 21, 2025
@pranavr12 pranavr12 changed the title Add queue Fix RequestLoggerController concurrency issues Jan 21, 2025
@@ -146,7 +147,7 @@ public static String eventId(Instant timestamp, long requestNumber, EventType ev
public RequestLoggerController(TrinoAwsProxyConfig trinoAwsProxyConfig)
{
// *2 because we log request/response
saveQueue = EvictingQueue.create(trinoAwsProxyConfig.getRequestLoggerSavedQty() * 2);
saveQueue = Queues.synchronizedQueue(EvictingQueue.create(trinoAwsProxyConfig.getRequestLoggerSavedQty() * 2));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't ideal. This will introduce a lot of synchronization. We should find a better solution.

Copy link
Contributor Author

@pranavr12 pranavr12 Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I was favouring a custom implementation of a ConcurrentLinkedQueue with a FIFO eviction policy. I've posted an alternative solution in the PR description.
But running a performance test, the Java synchronisation libraries performed much better maybe due to inbuilt optimisations. I'm good with either solutions. WDYT ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's OK? @vagaerg and @mosiac1 wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed the comment. Sync is probably the best solution. Let's go with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I trust the Guava implementation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, and if we observe a performance penalty we can reconsider our options.

Since we disable the requestlogger entirely if the quantity is set to zero, there's an easy way to avoid performance issues if people run into problems after deployment. (that's of course not ideal and we should fix it if it happens, just saying there's an easy rollback)

@mosiac1 mosiac1 changed the title Fix RequestLoggerController concurrency issues https://github.com/trinodb/aws-proxy/issues/165 Fix RequestLoggerController concurrency issues Jan 21, 2025
@mosiac1 mosiac1 changed the title https://github.com/trinodb/aws-proxy/issues/165 Fix RequestLoggerController concurrency issues Fix RequestLoggerController concurrency issues Jan 21, 2025
@mosiac1
Copy link
Contributor

mosiac1 commented Jan 21, 2025

Fixed #165

@pranavr12 could you please rename the commit to "Fix RequestLoggerController concurrency issues"

@mosiac1 mosiac1 merged commit 97b4f9e into trinodb:main Jan 21, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants