Sharing and distributing log files/build metadata? #441

thoughtpolice · 2023-10-09T14:19:35Z

buck2 log is pretty cool, since it lets you look at what build commands a user ran and what their output is, and you can do things like log cmd or log replay to watch the build. Is it possible or recommended to 'share' these log files? What would some infrastructure for doing that look like? Maybe something like:

1. Users run buck2 build ... a bunch
2. Their logs can get synchronized to something like an HTTP endpoint (maybe a proxy that just fronts an S3 bucket)
3. Something goes wrong, a user asks for help, posts their Build ID.
4. You could run buck2 log cmd --trace-id $BUILD_ID to find out what happened.
5. This would (transparently?) download the log files from some endpoint (somehow?)

There could be other useful things to derive from this possibly, assuming you had these logs. For example, you could use these logs as a direct source of build analytic information from users to derive information about build times, etc.

Is there any kind of "thing" like this inside Meta? Does this seem interesting? To be truly useful it would need some support in the core executable, I think. It seems like something kind of like what buck2 rage does, right? Except I'd want it in all cases, not just failures.

I imagine this could look like the following for OSS users:

- A .buckconfig key like buck2_logs.upload_address is set to https://buck2-logs.aseipp.dev/
- All .pb.zst files are POST'd to $UPLOAD_ADDRESS/upload with their trace ID as a primary key, asynchronously, by the daemon
- When a user says buck2 log cmd --trace-id $TRACE_ID, then:
  - HTTP GET $UPLOAD_ADDRESS/trace/$TRACE_ID
  - This should return a json object containing metadata and a new address to find the log content at
  - That new address is the public URL to download the .pb.zst file
  - This allows users to materialize logs on demand
- Something something something authorization with a bearer token
- Ideally the HTTP protocol is simple enough to implement "by hand" in a short afternoon
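
To make the proposed two-step lookup concrete, here is a minimal client-side sketch in Python. It is only an illustration of the idea in the list above, not anything buck2 implements: the `fetch_log` and `http_get` names are made up, and the `"url"` field in the JSON response is a hypothetical schema, since the proposal does not specify one.

```python
import json
import urllib.request
from urllib.parse import quote


def http_get(url, token=None):
    """Fetch a URL and return the body as bytes, optionally with a bearer token."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return response.read()


def fetch_log(upload_address, trace_id, token=None, get=http_get):
    """Two-step lookup: ask for metadata first, then download the log it points at."""
    base = upload_address.rstrip("/")
    # Step 1: GET $UPLOAD_ADDRESS/trace/$TRACE_ID returns a JSON object.
    metadata = json.loads(get(f"{base}/trace/{quote(trace_id, safe='')}", token))
    # Step 2: the JSON names a public URL for the .pb.zst file.
    # The "url" field name is an assumption; the proposal fixes no schema.
    return get(metadata["url"], None)
```

The `get` parameter exists so the HTTP layer can be swapped out, for example with a stub in tests or a client that handles retries. The returned bytes would still need to be written to disk before any log command could read them.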


thoughtpolice · 2023-10-09T14:37:25Z

Actually, if --trace-id could take an https:// URL and download a file, you could probably do the first part completely outside of the core executable with a file watching tool? Just watch buck-out/v2/log and upload every file that gets written? Is that feasible and perhaps less invasive?
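
A rough sketch of what such an external watcher could look like, using plain polling from the Python standard library. The function names and the `upload` callback are hypothetical, and the sketch assumes logs end in `.pb.zst` and may sit in subdirectories of the log directory.

```python
import time
from pathlib import Path


def find_new_logs(log_dir, seen):
    """Return .pb.zst files under log_dir that are not yet in `seen`, and record them."""
    new_files = []
    for path in sorted(Path(log_dir).rglob("*.pb.zst")):
        if path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(log_dir, upload, interval=2.0):
    """Poll forever, handing each newly seen log file to the `upload` callback."""
    seen = set()
    while True:
        for path in find_new_logs(log_dir, seen):
            upload(path)
        time.sleep(interval)
```

Something like `watch("buck-out/v2/log", upload=my_uploader)` would run it. One caveat: a file can be picked up while the build is still writing to it, so a real tool would have to wait for the build to finish or re-upload when the file changes.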

cjhopman · 2023-10-10T23:37:37Z

Yeah, I've been really happy with the entire suite of buck2 log commands.

> Is there any kind of "thing" like this inside Meta? Does this seem interesting?

Yes, there is. You can see some stuff related to log uploading here:

buck2/app/buck2_client_ctx/src/manifold.rs, line 131 at commit 062f014:

fn log_upload_url(use_vpnless: bool) -> Option<&'static str> {

and then all the log commands support fetching a log by its trace ID from where the logs are stored, around here: https://github.com/facebook/buck2/blob/062f014ddfeddbef12542b3e233e2f609eacca9a/app/buck2_client/src/commands/log/options.rs#L90C1-L112

This has been incredibly useful for understanding and providing support for user builds.

I think it would be great if we could figure out the appropriate API for supporting this for open source users as well. I think it probably should be considered together with #226. The answer there may be that it's actually best to have separate APIs for them (you'll note that internally we are doing them basically entirely separately).

One important thing we've found is that we want to upload the log incrementally as the build progresses, to ensure that we get logs even for failed builds. In particular, our CI may aggressively kill off jobs that are timing out or OOMing, which makes it hard to reliably capture logs on failure.
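
As an illustration of the incremental idea (not a description of how the internal uploader works), a sketch in Python could track a byte offset into the growing log file and ship only what has been appended since the last pass. The function names and the `send(offset, data)` callback are hypothetical.

```python
def read_new_bytes(path, offset, chunk_size=1 << 20):
    """Return (data, new_offset) for bytes appended to `path` since `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(chunk_size)
    return data, offset + len(data)


def ship_pending(path, offset, send):
    """Send every chunk written since `offset`; return the offset reached."""
    while True:
        data, new_offset = read_new_bytes(path, offset)
        if not data:
            return offset
        # Passing the offset lets the receiver append in order or detect gaps.
        send(offset, data)
        offset = new_offset
```

Calling `ship_pending` periodically during the build means that if the process is killed, everything up to the last pass has already left the machine. Whether a partially written compressed log can be read back depends on the log format and is not addressed here.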

When doing a command like `buck2 log replay --trace-id ...` it helps to be able to download previously created logs from users, automated CI, etc. This functionality exists within Meta, as logs are available for download through the "Manifold" system, but it isn't usable in OSS, despite being useful for diagnostics and getting help.

The only missing thing to really get things working is a download mechanism for log files, and a way to know where they should come from.

With this patch, if a user configures the `buck2.log_url` key, which is expected to look something like:

    [buck2]
    log_url = https://example.com

Then, upon executing a `log` command with a `--trace-id` flag, this server will be queried with a:

    GET /v1/get/{uuid}

request, which is expected to return the raw zst-encoded protobuf file. So, buck2 will do that and download the trace, and then use it like normal. The request is done with `curl`, though in principle it could be done within Buck itself.

This shares as much code as possible with the existing infrastructure and tries to only insert a small key set of `#[cfg(fbcode_build)]` directives. Notably, the path Meta uses for their "Manifold" client is still fully available in the OSS version; `fbcode_build` is only used to gate what the default choice is.

How the server gets access to these files and how they are uploaded is another question. But for now, this can be done several ways outside of buck2 core, so this is good enough.

GitHub Issue: facebook#441
Signed-off-by: Austin Seipp <aseipp@pobox.com>

Summary: When doing a command like `buck2 log replay --trace-id ...` it helps to be able to download previously created logs from users, automated CI, etc. This functionality exists within Meta, as logs are available for download through the "Manifold" system, but it isn't usable in OSS, despite being useful for diagnostics and getting help.

The only missing thing to really get things working is a download mechanism for log files, and a way to know where they should come from.

With this patch, if a user configures the `buck2.log_url` key, which is expected to look something like:

    [buck2]
    log_url = https://example.com

Then, upon executing a `log` command with a `--trace-id` flag, this server will be queried with a:

    GET /v1/get/{uuid}

request, which is expected to return the raw zst-encoded protobuf file. So, buck2 will do that and download the trace, and then use it like normal. The request is done with `curl`, though in principle it could be done within Buck itself.

This shares as much code as possible with the existing infrastructure and tries to only insert a small key set of `#[cfg(fbcode_build)]` directives. Notably, the path Meta uses for their "Manifold" client is still fully available in the OSS version; `fbcode_build` is only used to gate what the default choice is.

How the server gets access to these files and how they are uploaded is another question. But for now, this can be done several ways outside of buck2 core, so this is good enough.

GitHub Issue: #441
Pull Request resolved: #770
Reviewed By: cjhopman
Differential Revision: D66988949
fbshipit-source-id: ac3dd60f429742997cab231579a7d488e048828b
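
The commit message leaves the server side open, so here is one possible sketch of an endpoint that answers `GET /v1/get/{uuid}` with the raw log bytes, using only the Python standard library. The storage layout (one `<uuid>.pb.zst` file per trace in a `logs` directory), the `LogHandler` and `make_server` names, and the choice of 404 for unknown IDs are all assumptions of this sketch; only the route and the raw zst-encoded response body come from the text above.

```python
import re
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

# Only hex digits and dashes are accepted, which also rules out path traversal.
ROUTE = re.compile(r"^/v1/get/([0-9a-fA-F-]{1,64})$")


class LogHandler(BaseHTTPRequestHandler):
    # Hypothetical layout: one <uuid>.pb.zst file per trace in this directory.
    log_dir = Path("logs")

    def do_GET(self):
        match = ROUTE.match(self.path)
        path = self.log_dir / f"{match.group(1)}.pb.zst" if match else None
        if path is None or not path.is_file():
            self.send_error(404, "unknown trace id")
            return
        body = path.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build a server bound to localhost; call serve_forever() on it to run."""
    return ThreadingHTTPServer(("127.0.0.1", port), LogHandler)
```

Running `make_server().serve_forever()` and pointing `log_url` at `http://127.0.0.1:8000` should, under these assumptions, be enough to try the download path locally. Authentication and TLS are left out entirely.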


thoughtpolice mentioned this issue Sep 5, 2024

buck2_client: download traces via buck2.log_url config key #770

Closed
