
Add support for new --show-percentiles arg to get more granular percentiles instead of just median to all commands #43

Merged: 7 commits merged on Mar 7, 2023

Conversation

felixarntz
Collaborator

@felixarntz felixarntz commented Feb 28, 2023

Follow up to #40.

This PR uses a hard-coded set of percentiles: 10, 25, 50 (median), 75, and 90, as previously discussed with @oandregal. This set is also commonly used for HTTP Archive queries by that team, and we have used it in our own HTTP Archive queries as well (e.g. this one).

In the future we could make the concrete percentiles configurable via the argument, but that doesn't seem necessary at this point. The above percentiles are a good baseline for getting a more representative impression of the overall accuracy of the values.

Testing this PR

  1. Run each of the 4 commands prior to this change (for any URL / any WebPageTest result).
  2. Switch to this PR branch.
  3. Run the same commands as above again.
    • For the wpt-* commands, make sure the median results are the same.
    • For the benchmark-* commands, make sure the median results are roughly similar (they won't be the same because it's different runs).
  4. Run the same commands again, but now include the --show-percentiles flag in each one.
    • For the wpt-* commands, make sure the p50 results are the same as the previous median results.
    • For the benchmark-* commands, make sure the p50 results are roughly similar to the previous median results (they won't be the same because it's different runs).
    • For any command, do a quick sanity check that the percentile values look right: e.g. with 8 runs, the p25 value should be the 2nd lowest run result; with 10 runs, the p90 value should be the 9th lowest (2nd highest) run result. This sanity check works for any of the commands, since the underlying percentile calculation logic is the same in all of them.
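The sanity check in the last step corresponds to the nearest-rank percentile definition: for n sorted values, the p-th percentile is the value at 1-based rank ceil(p/100 × n). The sketch below illustrates that definition; it is an assumption consistent with the checks above (8 runs → p25 is rank 2; 10 runs → p90 is rank 9), not necessarily the CLI's exact implementation, which may e.g. special-case p50 by averaging the two middle values for an even number of runs.

```javascript
// Nearest-rank percentile over raw run results (sketch, not the CLI's code).
// With 8 runs, p25 -> rank ceil(0.25 * 8) = 2 (2nd lowest value).
// With 10 runs, p90 -> rank ceil(0.9 * 10) = 9 (9th lowest value).
function calcPercentile( percentile, values ) {
	const sorted = [ ...values ].sort( ( a, b ) => a - b );
	const rank = Math.ceil( ( percentile / 100 ) * sorted.length );
	return sorted[ Math.max( rank, 1 ) - 1 ];
}

// The fixed set this PR reports when --show-percentiles is passed.
const PERCENTILES = [ 10, 25, 50, 75, 90 ];
```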

@@ -24,10 +24,14 @@ This command parses the median values for given metrics out of the WebPageTest r

By default, only the median values are returned. You can optionally request all the individual run values as well.

- #### Required arguments
+ #### Arguments
Collaborator Author

Adjusting this simply because not all arguments are required.

@felixarntz felixarntz changed the base branch from add/benchmark-web-vitals to main March 1, 2023 16:04
@felixarntz felixarntz marked this pull request as ready for review March 2, 2023 20:57
@@ -105,7 +123,8 @@ Sends the selected number of requests with a certain concurrency to provided URL
* `--concurrency` (`-c`): Number of requests to make at the same time.
* `--number` (`-n`): Total number of requests to send.
* `--file` (`-f`): File with URLs (one URL per line) to run benchmark tests for.
- * `--output` (`-o`): The output format.
+ * `--output` (`-o`): The output format: Either "table" or "csv".
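The diff above documents the two accepted values for `--output`. As a rough illustration of what the "csv" variant implies, a minimal CSV serializer for benchmark result rows could look like the following (a hypothetical helper for illustration, not the CLI's actual code; it assumes values contain no commas and so skips quoting):

```javascript
// Serialize a headings row plus data rows into CSV text (sketch only).
function toCsv( headings, rows ) {
	return [ headings, ...rows ]
		.map( ( row ) => row.join( ',' ) )
		.join( '\n' );
}
```

A "table" formatter would render the same rows with aligned columns instead, which is friendlier for terminals but harder to pipe into other tools.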
Collaborator

@felixarntz under all other "Arguments" headings I've noticed the usage of --format (-f) for the output format, but here it's defined as --output (-o).

Is this correct? It may be correct in code (and function as --output) but should we be interchanging flags?

Collaborator Author

@10upsimon I'm aware of this; it happened because different people worked on the commands and the names diverged :)

I agree we should address this, but I think it's better done in a separate PR. Please feel free to open one!

@@ -133,7 +157,8 @@ Loads the provided URLs in a headless browser several times to measure median We
* `--url` (`-u`): A URL to benchmark.
* `--number` (`-n`): Total number of requests to send.
* `--file` (`-f`): File with URLs (one URL per line) to run benchmark tests for.
- * `--output` (`-o`): The output format.
+ * `--output` (`-o`): The output format: Either "table" or "csv".
Collaborator

@felixarntz same comment here WRT --output as opposed to --format

Collaborator Author

@10upsimon See my reply above.

@10upsimon
Collaborator

@felixarntz code is good for me, and I've functionally tested it; all of the commands and percentile/non-percentile applications work as expected.

So really I'm just looking for feedback on whether we need to alter `--output` to be in line with the `--format` usage as it is for the wpt-metrics and wpt-server-timing commands.

This applies to the benchmark-server-timing and benchmark-web-vitals commands.

Aside from that, good to go from my end, thank you!

@felixarntz
Collaborator Author

@mukeshpanchal27 Could you give this a review some time this week?
