Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: Add shared client support #7

Merged
merged 1 commit into from
Nov 8, 2023
Merged

Conversation

jrajahalme
Copy link
Member

Add SharedClient and SharedClients types.

@jrajahalme jrajahalme marked this pull request as draft November 7, 2023 18:37
@jrajahalme jrajahalme force-pushed the client-support-sharing branch 3 times, most recently from 42a5898 to 6f69545 Compare November 7, 2023 20:07
@jrajahalme jrajahalme marked this pull request as ready for review November 7, 2023 20:47
@jrajahalme jrajahalme force-pushed the client-support-sharing branch 2 times, most recently from 24113b5 to bba3eb4 Compare November 8, 2023 09:34
shared_client.go Outdated
client.refcount--
if client.refcount == 0 {
// Make client unreachable and close it's connection.
// Must hold the proxy mutex for this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I guess proxy is not relevant once this code landed in the library, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, will amend!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the review!

@jrajahalme jrajahalme force-pushed the client-support-sharing branch from bba3eb4 to 6ce6bc6 Compare November 8, 2023 11:32
@jrajahalme
Copy link
Member Author

Fixed comments still mentioning the proxy and squashed the commits to clean things up.

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!
Left a suggestion for a marginal optimization when key == "" and a question about a possible perf regression.

shared_client.go Outdated
Comment on lines 59 to 62
// connection must be closed while holding the proxy lock to avoid a race
// where a new client is created with the same 5-tuple before this one is
// closed, which could happen if the proxy lock is released before this
// Close call.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the lock on the SharedClients struct serves two purposes:

  1. protect each client's refcount
  2. avoid binding errors because of the race mentioned in this comment

Actually, both 1) and 2) needs to be serialized only in case of concurrent calls to GetSharedClient for the same key/5-tuple. So, to increase parallelism, we could use an approach similar to the one in https://github.com/cilium/cilium/blob/e40fd083a503b21dd3c88e8e21e83814356a3616/daemon/cmd/fqdn.go#L687 (having a slice of mutexes and hashing the 5-tuple to get the mutex to lock).

Looking at the two critical sections protected by the lock, I think the only relevant case where we might experience an unexpected delay is the closing of a TCP connection. Might that become a problem?
OTOH the hashing + mutex will surely increase complexity even more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock on the SharedClients also protects the map.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I missed to say that I was thinking about using a sync.Map instead of a plain map (docs reports one of its suggested use as (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came up with this pattern:

  • client's refcount is protected by it's own mutex
  • collection's (global) lock is only held during collection map operations and during the new Client creation in between a failing lookup and store, but Client creation does not make any system calls
  • only client's own lock is held while it is dialing OR closing (including system calls)
  • this means that while waiting for the client's lock to increase it's refcount, the client may start closing, but in this case refcount == 0 and we may discard this client and start again.
func (s *SharedClients) GetSharedClient(key string, conf *Client, serverAddrStr string) (client *SharedClient, closer func()) {
	for {
		// lock for s.clients access
		s.lock.Lock()
		if key != "" {
			// locate client to re-use if possible.
			client = s.clients[key]
		}
		if client == nil {
			client = newSharedClient(conf, serverAddrStr)
			if key != "" {
				s.clients[key] = client
			}
			s.lock.Unlock()
			// new client, we are done
			break
		}
		s.lock.Unlock()

		// reusing client that may start closing while we wait for its lock
		client.Lock()
		if client.refcount > 0 {
			// not closed, add our refcount
			client.refcount++
			client.Unlock()
			break
		}
		// client was closed while we waited for it's lock, discard and try again
		client.Unlock()
		client = nil
	}

	return client, func() {
		client.Lock()
		defer client.Unlock()
		client.refcount--
		if client.refcount == 0 {
			// connection close must be completed while holding the client's lock to
			// avoid a race where a new client dials using the same 5-tuple and gets a
			// bind error.
			// The client remains findable so that new users with the same key may wait
			// for this closing to be done with.
			close(client.requestsToSend)
			client.conn = nil
			client.wg.Wait()
			// Make client unreachable
			// Must take s.lock for this.
			s.lock.Lock()
			if key != "" {
				delete(s.clients, key)
			}
			s.lock.Unlock()
		}
	}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SharedClients' lock is now private and it is only used within GetSharedClient function. When it is held no client locks can be taken.

shared_client.go Outdated
Comment on lines 34 to 47
s.Lock()
defer s.Unlock()

if key != "" {
// locate client to re-use if possible.
client = s.clients[key]
}
if client == nil {
client = newSharedClient(conf, serverAddrStr)
if key != "" {
s.clients[key] = client
}
}
client.refcount++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the client won't be shared in case key == "" we can avoid setting refcount and taking the lock:

	if key == "" {
		client = newSharedClient(conf, serverAddrStr)
		return client, func() {
			if client.conn != nil {
				client.conn.Close()
			}
		}
	}

	// here we can take the lock (but we don't have to check if key is empty anymore)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, adopted!

Comment on lines 99 to 112
if err != nil {
errors.Store(id, fmt.Errorf("failed to exchange: %v", err))
} else {
if r == nil {
errors.Store(id, fmt.Errorf("response is nil"))
} else {
if r.Id != id {
errors.Store(id, fmt.Errorf("incorrect id (%d != %d)", r.Id, id))
}
if r.Rcode != RcodeSuccess {
errors.Store(id, fmt.Errorf("failed to get an valid answer\n%v", r))
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early return makes this more readable:

Suggested change
if err != nil {
errors.Store(id, fmt.Errorf("failed to exchange: %v", err))
} else {
if r == nil {
errors.Store(id, fmt.Errorf("response is nil"))
} else {
if r.Id != id {
errors.Store(id, fmt.Errorf("incorrect id (%d != %d)", r.Id, id))
}
if r.Rcode != RcodeSuccess {
errors.Store(id, fmt.Errorf("failed to get an valid answer\n%v", r))
}
}
}
if err != nil {
errors.Store(id, fmt.Errorf("failed to exchange: %v", err))
return
}
if r == nil {
errors.Store(id, fmt.Errorf("response is nil"))
return
}
if r.Id != id {
errors.Store(id, fmt.Errorf("incorrect id (%d != %d)", r.Id, id))
return
}
if r.Rcode != RcodeSuccess {
errors.Store(id, fmt.Errorf("failed to get an valid answer\n%v", r))
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, adopted!

@jrajahalme jrajahalme force-pushed the client-support-sharing branch 2 times, most recently from 6a4a248 to 3d09a0a Compare November 8, 2023 14:58
Copy link
Member

@nebril nebril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should solve any performance concern in case of high parallelism! 💯

I've left some suggestions to simplify the code a bit, but nothing major.

shared_client.go Outdated
*Client

requestsToSend chan request
responses chan sharedClientResponse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to make the responses channel part of the SharedClient.
We can instantiate this in handler and keep it there, since:

  1. only the goroutines in handler send to, receive from and close the channel
  2. handler is called only once when the SharedClient connects

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed!

Comment on lines +143 to +149
wg.Add(1)
go func() {
defer wg.Done()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get rid of these wg.Add and wg.Done() here. That's because handler can't exit until responses is closed, and only after that the wg.Done() deferred at the beginning of the function will be called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that handler can exit before responses is closed, for example if requests is closed. I saw this happen in tests when I was working on getting rid of channel buffers (which can hide bugs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I've missed that case. 👍

shared_client.go Outdated
// Make client unreachable
// Must take s.lock for this.
s.lock.Lock()
if key != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check for key being not-empty after checking it at the beginning of the function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed!

shared_client.go Outdated
Comment on lines 38 to 39
client.conn = nil
client.wg.Wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should swap these two. Doing that we'll be sure that conn is not still being used by the goroutines in handler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

shared_client.go Outdated
Comment on lines 80 to 81
client.conn = nil
client.wg.Wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for swapping the wait and the setting of conn to nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!, also refactored this repetition to SharedClient.close()

@jrajahalme jrajahalme force-pushed the client-support-sharing branch from 3d09a0a to 8f678c9 Compare November 8, 2023 17:27
@jrajahalme jrajahalme requested a review from pippolo84 November 8, 2023 17:27
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks! 💯

Add SharedClient and SharedClients types to support multiple different
requests in flight on the same DNS client connection.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the client-support-sharing branch from 8f678c9 to a547953 Compare November 8, 2023 17:50
@jrajahalme jrajahalme merged commit eaf71f6 into master Nov 8, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants