Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

telemetry: Push telemetry event to OpenSearch on pipeline startup #74

Merged
merged 30 commits into from
Jun 16, 2023

Conversation

algochoi
Copy link
Contributor

@algochoi algochoi commented May 15, 2023

Summary

Sends telemetry events with a GUID upon pipeline startup. Adds a telemetry config to conduit.yml and saves GUID to the pipeline metadata file. Telemetry files are in telemetry/ and added a README for a quick summary.

More context & research in this gist: https://gist.github.com/algochoi/5fe7381100f2af8cf8c6b0e828be1e56

#51

Test Plan

Unit tests for telemetry configs during pipeline initialization.

Tested on local OpenSearch cluster and Algorand dev cluster. See the gist above on how to replicate this.

@codecov
Copy link

codecov bot commented May 15, 2023

Codecov Report

Merging #74 (c337917) into master (442791a) will increase coverage by 2.64%.
The diff coverage is 77.29%.

@@            Coverage Diff             @@
##           master      #74      +/-   ##
==========================================
+ Coverage   67.66%   70.30%   +2.64%     
==========================================
  Files          32       37       +5     
  Lines        1976     2512     +536     
==========================================
+ Hits         1337     1766     +429     
- Misses        570      651      +81     
- Partials       69       95      +26     
Impacted Files Coverage Δ
conduit/data/block_export_data.go 100.00% <ø> (+92.30%) ⬆️
conduit/metrics/metrics.go 100.00% <ø> (ø)
conduit/pipeline/metadata.go 69.11% <ø> (ø)
...duit/plugins/exporters/filewriter/file_exporter.go 81.63% <ø> (-1.06%) ⬇️
conduit/plugins/importers/algod/metrics.go 100.00% <ø> (ø)
...ins/processors/filterprocessor/filter_processor.go 83.82% <ø> (+3.54%) ⬆️
...plugins/processors/filterprocessor/gen/generate.go 34.28% <ø> (ø)
conduit/plugins/processors/noop/noop_processor.go 64.70% <ø> (+6.81%) ⬆️
pkg/cli/internal/list/list.go 20.75% <ø> (ø)
...lugins/exporters/postgresql/postgresql_exporter.go 65.68% <51.21%> (-12.52%) ⬇️
... and 15 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@algochoi algochoi requested a review from winder May 19, 2023 16:48
@@ -16,6 +18,8 @@ type InitProvider interface {
GetGenesis() *sdk.Genesis
SetGenesis(*sdk.Genesis)
NextDBRound() sdk.Round
GetTelemetryClient() *telemetry.OpenSearchClient
SetTelemetryClient(*telemetry.OpenSearchClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Set can be a property of the constructor / implementation rather than included as part of the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, removed from interface in b4acb9a

}

// OpenSearchClient holds the OpenSearch client and TelemetryConfig
type OpenSearchClient struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! It already is wrapped. Could you rename this to TelemetryClient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but the current lint configurations output an error because of stuttering:

exported: type name will be used as telemetry.TelemetryClient by other packages, and that stutters; consider calling this Client (revive)
    type TelemetryClient struct {

Should I disable this check and rename to TelemetryClient anyway? I'm okay either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always forget about this. telemetry.Client seems good too, how about just Client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always forget about this. telemetry.Client seems good too, how about just Client?

The interface is named Client so there would be a collision unfortunately :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case outside of this package you can reference the interface instead of telemetry.OpenSearchClient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Will do this

telemetryConfig := telemetry.MakeTelemetryConfig(p.cfg.Telemetry.URI, p.cfg.Telemetry.Index, p.cfg.Telemetry.UserName, p.cfg.Telemetry.Password)
telemetryClient, err := telemetry.MakeOpenSearchClient(telemetryConfig)
if err != nil {
return nil, fmt.Errorf("failed to initialize telemetry: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the events we add in the future, an implementation that puts them in the log file could be helpful in this case.

conduit/telemetry/README.md Outdated Show resolved Hide resolved
conduit/telemetry/README.md Outdated Show resolved Hide resolved
Comment on lines 11 to 16
### Configurations
The conduit.yml will have a new telemetry boolean. If true, `telemetry.Config` (which hold the OpenSearch URI, hardcoded credentials, GUID, Index name) will be initialized. The GUID is generated when the pipeline is started, and saved to the pipeline's `metadata.json`. If this field already exists, it will be read from the file.
The index name in algod is set to the Chain ID (e.g. mainnet and testnet). In conduit, we consolidate to a common index name for now.

### Client
The configuration and client is represented by the `telemetry.Client` interface and will use the `OpenSearchClient` by default. Users should be able to define their own clients to send telemetry events. This will also have client functions to write to OpenSearch using the official Go client. There is no plan to use a particular logging library as of now, and just marshalls the `telemetry.Event` struct into JSON before sending the event.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of this seems out of date. Good point about the index name, maybe that should be an optional field in the config as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops you're right, I will update this file

}

// Event represents a single event to be emitted to OpenSearch
type Event struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the version string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, added conduit LongVersion() string to the event and fixed existing tests: 234ca36

Comment on lines 40 to 44
# These configs define default values for initializing the telemetry server.
uri: ""
index: ""
username: ""
password: ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# These configs define default values for initializing the telemetry server.
uri: ""
index: ""
username: ""
password: ""
# By default the following fields will be configured to send data to Algorand.
# To store your own telemetry events, they can be overridden.
# uri: ""
# index: ""
# username: ""
# password: ""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 96a7c9c

Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good, just had a few pieces of feedback and requests.

@@ -16,6 +18,7 @@ type InitProvider interface {
GetGenesis() *sdk.Genesis
SetGenesis(*sdk.Genesis)
NextDBRound() sdk.Round
GetTelemetryClient() telemetry.Client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other nice thing about this approach is that you can create a test client for unit testing the event.

@algochoi algochoi marked this pull request as ready for review June 6, 2023 18:06
}
// Try sending a startup event. If it fails, log a warning and continue
event := telemetryClient.MakeTelemetryStartupEvent()
if telemetryErr = telemetryClient.SendEvent(event); telemetryErr != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, without config modifications, we'll always log a warning here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pipeline tries to initialize the client based on the config file. Since the default config file doesn't specify a URI, we don't send an event for now. We can change the default config.yml when we have an OpenSearch production server ready or alternatively hardcode the URI/credentials in like we do in algod.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the Telemetry initialization failed warning, or failed to send telemetry event?

Let's default Enabled to false, so this block wouldn't be hit anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the Telemetry initialization failed warning, or failed to send telemetry event?

Currently you'll get both warnings - it might be too redundant so I'll change it to send one or the other.

Let's default Enabled to false, so this block wouldn't be hit anyway.

Good point, will do 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to come up with a design that lets us gracefully disable telemetry. Keep in mind that plugins are going to be grabbing the telemetryClient object and they wont be able to check for Enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to come up with a design that lets us gracefully disable telemetry. Keep in mind that plugins are going to be grabbing the telemetryClient object and they wont be able to check for Enabled.

The telemetryClient object has its own Enabled bool: https://github.com/algorand/conduit/pull/74/files#diff-ada99fa4b32a319d368310f322561a3e2292fefcd48533e763dff3069fe64cf0R11

This reads the config file upon pipeline initialization, checks if the config file has set the telemetry enabled bool to true, and initializes the client object (which then sets the client's bool to true). I think the current implementation lets us dynamically disable telemetry on the client as well.

@algochoi algochoi requested a review from winder June 6, 2023 19:58
Copy link
Contributor

@tzaffi tzaffi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Now we have the ability to start assessing what's happening in the wild. We ought to follow up with a couple of more SendEvents() for some crucial information such as succeeding/failing to catchup.

I left a couple of questions and nits, but nothing blocking.

Telemetry related utilities are in this directory.

### Configurations
The conduit.yml will have a new telemetry boolean, and optional fields to enter the OpenSearch URI, credentials (username/password), and Index name. If true, `telemetry.Config` will be initialized with the optional fields. The GUID is generated when the pipeline is started, and saved to the pipeline's `metadata.json`. If this field already exists, it will be read from the file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The conduit.yml will have a new telemetry boolean, and optional fields to enter the OpenSearch URI, credentials (username/password), and Index name. If true, `telemetry.Config` will be initialized with the optional fields. The GUID is generated when the pipeline is started, and saved to the pipeline's `metadata.json`. If this field already exists, it will be read from the file.
The `conduit.yml` config file includes a telemetry boolean, and optional fields to enter the OpenSearch URI, credentials (username/password), and Index name. When true, `telemetry.Config` is initialized with the optional fields. The GUID is generated when the pipeline is started, and saved to the pipeline's `metadata.json`. If the GUID already exists, it will be read from the file.

Suggesting more present tense since the feature is present when the README is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed in f32193f

conduit/telemetry/README.md Outdated Show resolved Hide resolved
conduit/pipeline/pipeline.go Outdated Show resolved Hide resolved
conduit/telemetry/telemetry.go Outdated Show resolved Hide resolved
return Config{
Enable: true,
URI: telemetryURI,
GUID: uuid.New().String(), // Use Google UUID instead of go-algorand utils
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like an uncontroversial decision to me, but the comment got me curious about the pros/cons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - go-algorand has a custom uuid implementation for some legacy reasons. It seems extraneous at this point, only used in the go-algorand telemetry package, and the Doogle UUID is functionally same with no maintenance overhead but does introduce an extra dependency.

go.mod Outdated Show resolved Hide resolved
@algorand algorand deleted a comment from algochoi Jun 16, 2023
@winder winder merged commit a93992d into algorand:master Jun 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants