-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
telemetry: Push telemetry event to OpenSearch on pipeline startup #74
Conversation
Codecov Report
@@ Coverage Diff @@
## master #74 +/- ##
==========================================
+ Coverage 67.66% 70.30% +2.64%
==========================================
Files 32 37 +5
Lines 1976 2512 +536
==========================================
+ Hits 1337 1766 +429
- Misses 570 651 +81
- Partials 69 95 +26
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
conduit/data/block_export_data.go
Outdated
@@ -16,6 +18,8 @@ type InitProvider interface { | |||
GetGenesis() *sdk.Genesis | |||
SetGenesis(*sdk.Genesis) | |||
NextDBRound() sdk.Round | |||
GetTelemetryClient() *telemetry.OpenSearchClient | |||
SetTelemetryClient(*telemetry.OpenSearchClient) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Set
can be a property of the constructor / implementation rather than included as part of the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, removed from interface in b4acb9a
} | ||
|
||
// OpenSearchClient holds the OpenSearch client and TelemetryConfig | ||
type OpenSearchClient struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! It already is wrapped. Could you rename this to TelemetryClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, but the current lint configurations output an error because of stuttering:
exported: type name will be used as telemetry.TelemetryClient by other packages, and that stutters; consider calling this Client (revive)
type TelemetryClient struct {
Should I disable this check and rename to TelemetryClient
anyway? I'm okay either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always forget about this. telemetry.Client
seems good too, how about just Client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always forget about this. telemetry.Client seems good too, how about just Client?
The interface is named Client
so there would be a collision unfortunately :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case outside of this package you can reference the interface instead of telemetry.OpenSearchClient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Will do this
telemetryConfig := telemetry.MakeTelemetryConfig(p.cfg.Telemetry.URI, p.cfg.Telemetry.Index, p.cfg.Telemetry.UserName, p.cfg.Telemetry.Password) | ||
telemetryClient, err := telemetry.MakeOpenSearchClient(telemetryConfig) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to initialize telemetry: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the events we add in the future, an implementation that puts them in the log file could be helpful in this case.
conduit/telemetry/README.md
Outdated
### Configurations | ||
The conduit.yml will have a new telemetry boolean. If true, `telemetry.Config` (which hold the OpenSearch URI, hardcoded credentials, GUID, Index name) will be initialized. The GUID is generated when the pipeline is started, and saved to the pipeline's `metadata.json`. If this field already exists, it will be read from the file. | ||
The index name in algod is set to the Chain ID (e.g. mainnet and testnet). In conduit, we consolidate to a common index name for now. | ||
|
||
### Client | ||
The configuration and client is represented by the `telemetry.Client` interface and will use the `OpenSearchClient` by default. Users should be able to define their own clients to send telemetry events. This will also have client functions to write to OpenSearch using the official Go client. There is no plan to use a particular logging library as of now, and just marshalls the `telemetry.Event` struct into JSON before sending the event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of this seems out of date. Good point about the index name, maybe that should be an optional field in the config as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops you're right, I will update this file
} | ||
|
||
// Event represents a single event to be emitted to OpenSearch | ||
type Event struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the version string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, added conduit LongVersion() string to the event and fixed existing tests: 234ca36
# These configs define default values for initializing the telemetry server. | ||
uri: "" | ||
index: "" | ||
username: "" | ||
password: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# These configs define default values for initializing the telemetry server. | |
uri: "" | |
index: "" | |
username: "" | |
password: "" | |
# By default the following fields will be configured to send data to Algorand. | |
# To store your own telemetry events, they can be overridden. | |
# uri: "" | |
# index: "" | |
# username: "" | |
# password: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 96a7c9c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good, just had a few pieces of feedback and requests.
@@ -16,6 +18,7 @@ type InitProvider interface { | |||
GetGenesis() *sdk.Genesis | |||
SetGenesis(*sdk.Genesis) | |||
NextDBRound() sdk.Round | |||
GetTelemetryClient() telemetry.Client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other nice thing about this approach is that you can create a test client for unit testing the event.
conduit/pipeline/pipeline.go
Outdated
} | ||
// Try sending a startup event. If it fails, log a warning and continue | ||
event := telemetryClient.MakeTelemetryStartupEvent() | ||
if telemetryErr = telemetryClient.SendEvent(event); telemetryErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, without config modifications, we'll always log a warning here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pipeline tries to initialize the client based on the config file. Since the default config file doesn't specify a URI, we don't send an event for now. We can change the default config.yml when we have an OpenSearch production server ready or alternatively hardcode the URI/credentials in like we do in algod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the Telemetry initialization failed
warning, or failed to send telemetry event
?
Let's default Enabled
to false, so this block wouldn't be hit anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the Telemetry initialization failed warning, or failed to send telemetry event?
Currently you'll get both warnings - it might be too redundant so I'll change it to send one or the other.
Let's default Enabled to false, so this block wouldn't be hit anyway.
Good point, will do 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to come up with a design that lets us gracefully disable telemetry. Keep in mind that plugins are going to be grabbing the telemetryClient object and they wont be able to check for Enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to come up with a design that lets us gracefully disable telemetry. Keep in mind that plugins are going to be grabbing the telemetryClient object and they wont be able to check for Enabled.
The telemetryClient object has its own Enabled
bool: https://github.com/algorand/conduit/pull/74/files#diff-ada99fa4b32a319d368310f322561a3e2292fefcd48533e763dff3069fe64cf0R11
This reads the config file upon pipeline initialization, checks if the config file has set the telemetry enabled bool to true, and initializes the client object (which then sets the client's bool to true). I think the current implementation lets us dynamically disable telemetry on the client as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. Now we have the ability to start assessing what's happening in the wild. We ought to follow up with a couple of more SendEvents()
for some crucial information such as succeeding/failing to catchup.
I left a couple of questions and nits, but nothing blocking.
conduit/telemetry/README.md
Outdated
Telemetry related utilities are in this directory. | ||
|
||
### Configurations | ||
The conduit.yml will have a new telemetry boolean, and optional fields to enter the OpenSearch URI, credentials (username/password), and Index name. If true, `telemetry.Config` will be initialized with the optional fields. The GUID is generated when the pipeline is started, and saved to the pipeline's `metadata.json`. If this field already exists, it will be read from the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conduit.yml will have a new telemetry boolean, and optional fields to enter the OpenSearch URI, credentials (username/password), and Index name. If true, `telemetry.Config` will be initialized with the optional fields. The GUID is generated when the pipeline is started, and saved to the pipeline's `metadata.json`. If this field already exists, it will be read from the file. | |
The `conduit.yml` config file includes a telemetry boolean, and optional fields to enter the OpenSearch URI, credentials (username/password), and Index name. When true, `telemetry.Config` is initialized with the optional fields. The GUID is generated when the pipeline is started, and saved to the pipeline's `metadata.json`. If the GUID already exists, it will be read from the file. |
Suggesting more present tense since the feature is present when the README is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed in f32193f
return Config{ | ||
Enable: true, | ||
URI: telemetryURI, | ||
GUID: uuid.New().String(), // Use Google UUID instead of go-algorand utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like an uncontroversial decision to me, but the comment got me curious about the pros/cons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question - go-algorand
has a custom uuid implementation for some legacy reasons. It seems extraneous at this point, only used in the go-algorand telemetry package, and the Doogle UUID is functionally same with no maintenance overhead but does introduce an extra dependency.
Summary
Sends telemetry events with a GUID upon pipeline startup. Adds a telemetry config to
conduit.yml
and saves GUID to the pipeline metadata file. Telemetry files are intelemetry/
and added a README for a quick summary.More context & research in this gist: https://gist.github.com/algochoi/5fe7381100f2af8cf8c6b0e828be1e56
#51
Test Plan
Unit tests for telemetry configs during pipeline initialization.
Tested on local OpenSearch cluster and Algorand dev cluster. See the gist above on how to replicate this.