Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collector offline #163

Merged
merged 52 commits into from
Feb 28, 2024
Merged

Collector offline #163

merged 52 commits into from
Feb 28, 2024

Conversation

jt-dd
Copy link
Contributor

@jt-dd jt-dd commented Feb 5, 2024

This PR add a function to add an offline collector. It allows to dump locally or directly on a s3 bucket.

  • local dump (compressed): ./kubehound dump local --compress --output-dir /tmp/kubehound --debug
  • local dump (flat): ./kubehound dump local --output-dir /tmp/kubehound --debug
  • s3 dump: ./kubehound collector s3 --bucket kubehound --region us-east-1

Copy link
Contributor

@d0g0x01 d0g0x01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to rework the package structure and add the glue code that calls these new classes so we can understand how the pieces fit together as no 100% clear at the moment

This is definitely not a collector client as we've described it so should be separated very clearly

CacheKeyTag = "cache_key"
EdgeTypeTag = "edge_type"
CollectorTag = "collector"
CollectorS3Bucket = "s3_bucket"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should have the corrersponding function (like the other tags)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not using it, those are for span tags not metrics. Do you still want the function to be created ?

CollectorReadFile = "kubehound.collector.readFile"
CollectorS3Push = "kubehound.collector.s3_push"
CollectorS3Download = "kubehound.collector.s3_download"
CollectorFileWriterWrite = "kubehound.collector.file_writer.write"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe CollectorWriterXXX with tags for S3/tar etc would be more extensible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not go in this direction, because I wanted to capture the network latency with the s3 push and download which does not exist with the file writer (only local).

@@ -203,3 +203,37 @@ func Launch(ctx context.Context, opts ...LaunchOption) error {

return nil
}

func Bootstrap(ctx context.Context, cfg *config.KubehoundConfig) (string, context.Context, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not called FYI ( I know its a draft)

@@ -127,6 +137,19 @@ func NewConfig(configPath string) (*KubehoundConfig, error) {
return &kc, nil
}

// NewConfig creates a new config instance from the provided file using viper.
func NewInlineConfig() (*KubehoundConfig, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why inline? not sure I understand the naming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline as for commands provided by an inline command and not a config file

Write(context.Context, []byte, string) error
Flush(context.Context) error
Close(context.Context) error
GetOutputPath() string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OutputPath() is more go :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,247 @@
package collector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to go in a different package (and you need to consider naming) as currently quite confusing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pkg/offline ? with the file writer pieces

@d0g0x01
Copy link
Contributor

d0g0x01 commented Feb 12, 2024

This PR add a function to add an offline collector. It allows to dump locally or directly on a s3 bucket.

  • local dump (compressed): ./kubehound dump local --compress --output-dir /tmp/kubehound --debug
  • local dump (flat): ./kubehound dump local --output-dir /tmp/kubehound --debug
  • s3 dump: ./kubehound collector s3 --bucket kubehound --region us-east-1

I think this should be a separate binary. If you consider the KHaaS use case we just want a small binary to collect data and upload to s3 so no need to ship all the kubehound stuff like mongo/janus clients etc to do that

pkg/collector/k8s_api.go Outdated Show resolved Hide resolved
pkg/collector/k8s_api.go Outdated Show resolved Hide resolved
pkg/collector/k8s_api.go Show resolved Hide resolved

raw, err := kubeConfig.RawConfig()
// Generate metrics for k8sAPI collector
func (c *k8sAPICollector) computeMetrics(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like something that should be done in DD based on the CollectorWait metric vs the code here to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could not get to have relevant metric from Datadog directly to work

pkg/collector/k8s_api.go Outdated Show resolved Hide resolved
pkg/dumper/s3/s3.go Outdated Show resolved Hide resolved
pkg/dumper/s3/s3.go Outdated Show resolved Hide resolved
pkg/dumper/s3/s3.go Outdated Show resolved Hide resolved
pkg/dumper/writer/file_writer.go Outdated Show resolved Hide resolved
pkg/dumper/writer/tar_writer.go Outdated Show resolved Hide resolved
pkg/dump/ingestor_test.go Outdated Show resolved Hide resolved
pkg/dump/pipeline.go Outdated Show resolved Hide resolved
@jt-dd jt-dd marked this pull request as ready for review February 27, 2024 14:46
@jt-dd jt-dd requested a review from a team as a code owner February 27, 2024 14:46
@jt-dd jt-dd requested a review from d0g0x01 February 27, 2024 14:46
Copy link
Contributor

@d0g0x01 d0g0x01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jt-dd jt-dd merged commit d2a75c8 into main Feb 28, 2024
7 checks passed
@jt-dd jt-dd deleted the jt-dd/collector-dump-local-s3 branch February 28, 2024 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants