Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fuzz] Added random load balancer fuzz #13400

Merged
merged 35 commits into from
Oct 20, 2020
Merged

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented Oct 6, 2020

Commit Message: Added random load balancer fuzz
Additional Description: Part of securing Envoy against untrusted upstreams. The high level design of this fuzzer is a base class that handles static host setup/updates, setting up random() calls, and replays the action stream. Then, more specific fuzzers (in this pull request, random), create specific load balancers with all the state they need, pass it into the base class, which then runs checks on it. #13424 also uses this design pattern, except with a least request load balancer, however using the same shared base class.

43.6% line coverage over source/common/upstream/load_balancer_impl.cc, 30ish exec/sec.

Risk Level: Low
Testing: Added corpus entries that represent unit tests.
Docs Changes:
Release Notes:

Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
@zasweq
Copy link
Contributor Author

zasweq commented Oct 6, 2020

/assign @asraa @adisuissa

Signed-off-by: Zach <zasweq@google.com>
Copy link
Contributor

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looking great :)

Keeping in mind that the main utility of the the base class is to implement the replay logic (translate an action stream to LB actions).... what do you envision LB fuzzers to look like if a new developer adds a LB? Can you add this to the PR description?

Will the developer need to subclass LoadBalancerFuzzBase and implement initialize/prefetch/chooseHost, and add a line to the switch statement of the main fuzzer currently written? How fast is the fuzzer? If it is slow (<1000 exec/sec), it would be more effective if each LB had their own fuzzer so fuzzing could be parallelized.

test/common/upstream/load_balancer_fuzz.h Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
std::unique_ptr<LoadBalancerFuzzBase> load_balancer_fuzz;

// TODO: Switch across type once added more
load_balancer_fuzz = std::make_unique<RandomLoadBalancerFuzzTest>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Viewing the code, seems like all the load balancers will be fuzzed in a single fuzzer/single input. What are the trade-offs here (maybe add to the PR description)? Do you anticipate all of them having the same inputs, or would the inputs be clearer or fuzzing more efficient to have single fuzzers for each?

test/common/upstream/load_balancer_fuzz.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.h Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.h Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_test_utils.h Outdated Show resolved Hide resolved
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Copy link
Contributor

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!! This looks amazing on the whole.

Could you comment on testing (what corpus entries were added etc), speed, and coverage in the PR description please?

test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.h Outdated Show resolved Hide resolved
test/common/upstream/random_load_balancer_fuzz_test.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
test/common/upstream/BUILD Outdated Show resolved Hide resolved
Signed-off-by: Zach <zasweq@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few API comments mostly to start, thanks.
/wait

test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
repeated LbAction actions = 2;
//Ports of hosts constructed range from 80 - 65455, all on the same Ip Address
//These caps make sure that the port of the hosts constructed does not overflow
uint32 num_hosts_in_priority_set = 3 [(validate.rules).uint32.lt = 32727];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would model different kinds of priority levels and localities in the initial state. Is this possible to add?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added arbitrary amounts of priority levels (have a hardcoded hard cap at 20 right now). Working on localities.

test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz.proto Show resolved Hide resolved
@zasweq
Copy link
Contributor Author

zasweq commented Oct 8, 2020

/review @htuch

Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Got a question about the expected behavior of updateHealthFlagsForAHostSet.

Signed-off-by: Zach <zasweq@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good! Mostly nits.

uint32 num_degraded_hosts = 3;
uint32 num_excluded_hosts = 4;
// This is used to determine which hosts get marked as healthy, degraded, and excluded.
bytes random_bytestring = 5 [(validate.rules).bytes = {min_len: 1, max_len: 256}];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, an idea of making this run-length encoded just occurred to me, it would be space efficient, but not sure how that would interact with the fuzzing cross-over.

test/common/upstream/load_balancer_fuzz.proto Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
for (uint32_t i = 0; i < number_of_elements_in_subset && index_vector_.size() != 0; i++) {
// Index of bytestring will wrap around if it "overflows" past the random bytestring's length.
uint64_t index_of_index_vector =
random_bytestring_[index_of_random_bytestring_ % random_bytestring_.length()] %
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this always be capped to 256 elements, since it's a uint8_t in the string?

I was originally thinking that we could use a N byte byte string to provide 8*N subset selection by looking at it as a bit vector. I think what you've chosen instead is to make the string a list of indices. Which is fine, it has much better sparse properties. But, I don't think we should cap hosts at 256, this is too small, we want at least O(10k) hosts to model realistic scenarios.

Copy link
Contributor Author

@zasweq zasweq Oct 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, will look into what number to move this to. @adisuissa @asraa

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Harvey, I talked about this with Asra in standup and have been thinking about it, I think since this PR has been open a long time that this question should be answered in a subsequent PR. Right now, I'm scared that my fuzzer might have a O(N^2) bottleneck somewhere, but the clear bottleneck is using mock objects, particularly hosts. I think that the way this fuzzer is setup with Mock Hosts, that 4 figure hosts would render this fuzzer useless. I'm actually working on making my Health Checking fuzzer more efficient with the help of flamegraphs, so how about making this a TODO for once I finish making mocks more efficient? I will definitely get to this however, as this is the core functionality of the load balancing fuzzer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, i think this discussion is also orthogonal to the SubsetSelector interface. The SubsetSelector should maybe be able to handle arbitrary sized sets regardless of how many elements the fuzzer wants to set, but a TODO sounds good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, a TODO is fine, but I think we should make this something that is followed up sooner rather than later, otherwise we will have a false sense of confidence in what we're covering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, once I fix the O(N^2) in my subset selection algorithm I'll play around with this, I'll try to scale this number up within this PR because I would immediately work on this anyway if the PR in it's current state is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting rid of O(N^2) allowed me to use 10k hosts with no problem :)! However, Asra brings up a good point - my subset selector interface is capped at 256 hosts sort of (uint8_t). I will generalize my random subset selector to be able to return any amount of bits!!!!!! :)

Copy link
Contributor Author

@zasweq zasweq Oct 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to document this, I've been messing around with this, and what's currently checked into this PR allows 10k hosts at decent efficiency (of course what's currently checked in has it hardcoded at 256 hosts per priority set). What I'm working on now is figuring out how to handle the fully random bytestring (we wanted entropy across all the bits according to our meeting) in regards to arbitrary length index selection (right now locally I switched it to 32 bits wide).

Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Copy link
Contributor

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!! Great job at separating out the subset selection!

test/fuzz/random.h Outdated Show resolved Hide resolved
test/fuzz/random.h Outdated Show resolved Hide resolved
test/fuzz/random.h Show resolved Hide resolved
test/fuzz/random.h Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.h Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
Signed-off-by: Zach <zasweq@google.com>
test/common/upstream/load_balancer_fuzz.proto Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
for (uint32_t i = 0; i < number_of_elements_in_subset && index_vector_.size() != 0; i++) {
// Index of bytestring will wrap around if it "overflows" past the random bytestring's length.
uint64_t index_of_index_vector =
random_bytestring_[index_of_random_bytestring_ % random_bytestring_.length()] %
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, a TODO is fine, but I think we should make this something that is followed up sooner rather than later, otherwise we will have a false sense of confidence in what we're covering.

test/fuzz/random.h Outdated Show resolved Hide resolved
test/fuzz/random.h Outdated Show resolved Hide resolved
test/fuzz/random.h Outdated Show resolved Hide resolved
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Copy link
Contributor

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! couple of small comments left

test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/fuzz/random.h Show resolved Hide resolved
Signed-off-by: Zach <zasweq@google.com>
Signed-off-by: Zach <zasweq@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, mostly nits and cleanups.
/wait

test/common/upstream/load_balancer_fuzz_base.cc Outdated Show resolved Hide resolved
test/common/upstream/load_balancer_fuzz_base.h Outdated Show resolved Hide resolved
test/fuzz/random.h Outdated Show resolved Hide resolved
test/fuzz/random.h Outdated Show resolved Hide resolved
} // namespace Random

namespace Fuzz {
class ProperSubsetSelector {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now looks really good. I'd suggest a short gtest for this as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Signed-off-by: Zach <zasweq@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, two nits and we can ship.
/wait

test/fuzz/random_test.cc Outdated Show resolved Hide resolved
test/fuzz/random_test.cc Outdated Show resolved Hide resolved
Signed-off-by: Zach <zasweq@google.com>
Copy link
Contributor

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!!! 🥳

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Epic.

@htuch htuch merged commit d043a1d into envoyproxy:master Oct 20, 2020
mpuncel added a commit to mpuncel/envoy that referenced this pull request Oct 21, 2020
* master: (22 commits)
  ci: various improvements (envoyproxy#13660)
  dns: fix defunct fd bug in apple resolver (envoyproxy#13641)
  build: support ppc64le with wasm (envoyproxy#13657)
  [fuzz] Added random load balancer fuzz (envoyproxy#13400)
  dependencies: compute and check release dates via GitHub API. (envoyproxy#13582)
  mac ci: try ignoring update failure (envoyproxy#13658)
  watchdog: Optimize WatchdogImpl::touch in preparation to more frequent petting of the watchdog. (envoyproxy#13103)
  typos: fix a couple 'enovy' mispellings (envoyproxy#13645)
  lua: Expose stream info downstreamLocalAddress and downstreamDirectRemoteAddress for Lua filter (envoyproxy#13536)
  tap: fix upstream streamed transport socket taps (envoyproxy#13638)
  Revert "delay health checks until transport socket secrets are ready. (envoyproxy#13516)" (envoyproxy#13639)
  Watchdog: use abort action as a default if killing is enabled. (envoyproxy#13523)
  [fuzz] Fixed divide by zero bug (envoyproxy#13545)
  wasm: flip the meaning of the "repository" in envoy_wasm_cc_binary(). (envoyproxy#13621)
  fix: record recovered local address (envoyproxy#13581)
  docs: fix incorrect compressor filter doc (envoyproxy#13611)
  docs: clean up docs for azp migration (envoyproxy#13558)
  wasm: fix building Wasm example. (envoyproxy#13619)
  test: Refactor flood tests into a separate test file (envoyproxy#13556)
  wasm: re-enable tests with precompiled modules. (envoyproxy#13583)
  ...

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants