Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upstream: Fix moving EDS hosts between priorities. #14483

Merged
merged 14 commits into from
Jan 13, 2021

Commits on Jan 12, 2021

  1. Fix moving EDS hosts between priorities doesn't work.

    At present if health checks are enabled and passing then moving an EDS host from P0->P1 is a NOOP, and P1->P0 results in an abort.
    
    In the first case:
    * P0 processing treats A as being removed because it's not in P0's list of endpoints anymore.
    * P0 marks A's existing Host as PENDING_DYNAMIC_REMOVAL. It marks A as having been updated in this config apply.
    * P1 skips over A because it is marked as updated in this update cycle already.
    
    In the second case:
    * P0 updates the priority on the existing Host. It is appended to the vector of added hosts.
    * P1 marks A's existing Host as PENDING_DYNAMIC_REMOVAL. It does adjust the removed host vector as the host is still pending removal.
    * A's Host is now in both priorities and is PENDING_DYNAMIC_REMOVAL. This is wrong, and would cause problems later but doesn't have a chance to because:
    * onClusterMemberUpdate fires with A's existing Host in the added vector (remember it wasn't removed from P1!)
    * HealthChecker attempts to create a new health check session on A, which results in an abort from the destructor of the already existing one.
    
    This was masked in tests by the tests enabling ignore_health_on_host_removal.
    
    We fix this by passing in the set of addresses that appear in the endpoint update. If a host being considered for removal appears in this set,
    and it isn't being duplicated into the current priority as a result of a health check address change, then we assume it's being moved and will
    immediately remove it.
    
    To simplify the case where a host's health check address is being changed AND it is being moved between priorities we always apply priority moves in
    place before we attempt any other modifications. This means such a case is treated as a separate priority move, followed by the health check address change.
    
    fixes envoyproxy#11517
    
    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    327fe3a View commit details
    Browse the repository at this point in the history
  2. "Fixed" handling of priority move combined with health endpoint change.

    I had intended to preserve the PENDING_DYNAMIC_REMOVAL behaviour for an endpoint that was
    simultaniously being moved between priorities and having it's health endpoint changed. However, adding a
    test for that case proved that my code didn't work. Fixing that would be quite a bit more difficult,
    and on thinking about the implications I'm also not sure that it's desirable behaviour.
    
    Instead I now immediately remove the old endpoint.
    
    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    6440ac8 View commit details
    Browse the repository at this point in the history
  3. Add test for priority change in other direction.

    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    3a3d7a0 View commit details
    Browse the repository at this point in the history
  4. Re add the original tests

    Hoping that this fixes the drop in code coverage.
    
    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    68f3a80 View commit details
    Browse the repository at this point in the history
  5. Update comment to make the remove behaviour clearer.

    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    79170af View commit details
    Browse the repository at this point in the history
  6. Improved test for redis cluster host removal.

    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    86ff76e View commit details
    Browse the repository at this point in the history
  7. Add TODO

    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    db74977 View commit details
    Browse the repository at this point in the history
  8. Add missing doc comment

    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    6444dcc View commit details
    Browse the repository at this point in the history
  9. Use flat hash map as we don't need node stability.

    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    0b23cad View commit details
    Browse the repository at this point in the history
  10. Convert HostMap to a flat_hash_map.

    I don't think any usage of this requires the stability guarantees of node_hash_map, so it should be safe to change its type.
    
    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    c6f3b6c View commit details
    Browse the repository at this point in the history
  11. Update comment for consistency.

    Co-authored-by: Matt Klein <mattklein123@gmail.com>
    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO and mattklein123 committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    b93018a View commit details
    Browse the repository at this point in the history
  12. Add changelog entry.

    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 12, 2021
    Configuration menu
    Copy the full SHA
    4527f05 View commit details
    Browse the repository at this point in the history

Commits on Jan 13, 2021

  1. Use string_view for tracking addresses being added to this priority.

    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 13, 2021
    Configuration menu
    Copy the full SHA
    58da9c1 View commit details
    Browse the repository at this point in the history
  2. Revert "Use string_view for tracking addresses being added to this pr…

    …iority."
    
    Changed my mind, not sure this was correct usage of string_view.
    
    This reverts commit 58da9c1.
    
    Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
    JonathanO committed Jan 13, 2021
    Configuration menu
    Copy the full SHA
    d400f15 View commit details
    Browse the repository at this point in the history