Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snmp Trap service #929

Merged
merged 3 commits into from
Jan 9, 2017
Merged

Snmp Trap service #929

merged 3 commits into from
Jan 9, 2017

Conversation

titilambert
Copy link
Contributor

@titilambert titilambert commented Sep 20, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

@titilambert titilambert mentioned this pull request Sep 20, 2016
@titilambert titilambert changed the title [WIP] Snmp Trap service Snmp Trap service Sep 21, 2016
@titilambert
Copy link
Contributor Author

@nathanielc ready for first review

Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the PR is organized well, I just have a few questions about the workflow and structure of traps. See my comments in-line.

// stream
// |alert()
// .snmp()
// .trap('1.3.6.1.2.1.1.7', 'i', {{ Index field value }})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example isn't valid template syntax. Can you update it to be accurate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed !


// List of email recipients.
// tick:ignore
TrapList [][]interface{} `tick:"Trap"`
Copy link
Contributor

@nathanielc nathanielc Sep 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What information does a trap need?

Based on the configuration and the docs, it looks like traps have a these fields:

  • Target IP
  • Target Port
  • OID
  • Version
  • Community

Anything else?

Maybe changing the syntax up a bit would make it more clear:

|alert()
    .snmpTrap()
       .ip('x.x.x.x')
       .port(9863)
       .oid('1.3.6.1.2.1.1.7')
       .version('v3')
       .community('asdf')
    .snmpTrap()
       .ip('x.x.x.x')
       .port(2656)
       .oid('1.3.6.1.2.1.1.7')
       .version('v2')
    .snmpTrap()
       // defaults to config values if not specified
       .oid('1.3.6.1.2.1.1.7')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See last comment

//
// Passing addresses to the `email` property directly or using the `email.to` property is the same.
// tick:property
func (h *SnmpHandler) Trap(trap ...interface{}) *SnmpHandler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs to be updated to be SNMP related, its still the email comments.

Version string `toml:"version"`
// Whether all alerts should automatically use stateChangesOnly mode.
// Only applies if global is also set.
StateChangesOnly bool `toml:"state-changes-only"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add the Global option like other handlers have? Or does that not work well in the context of SNMP alerts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this could be global. You need to add data to your snmp trap


var varBinds snmpgo.VarBinds
for _, trap := range traps {
fmt.Printf("\n\n\nREERRRR %s", trap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this debug logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done !

func (s *Service) Alert(traps [][]interface{}, level kapacitor.AlertLevel) error {
address := s.targetIp + strconv.Itoa(s.targetPort)
snmp, err := snmpgo.NewSNMP(snmpgo.SNMPArguments{
Version: snmpgo.V2c,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a version option in the config if it is being hardcoded here?

@@ -82,6 +82,10 @@ type TaskMaster struct {
StateChangesOnly() bool
Alert(channel, message string, level AlertLevel) error
}
SnmpService interface {
StateChangesOnly() bool
Alert(trap [][]interface{}, level AlertLevel) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do snmp traps support a message concept? Meaning can they contain a short string of some short or do they only support the alert level? Also how do you know the context of the alert? For example, say you are triggering alerts on CPU usage per host, when the NMS receives the alert how does it know which host triggered the alert and that it is a CPU alert?

I am only marginally familiar with SNMP so if I am missing something obvious please enlighten me. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See last comment

@titilambert
Copy link
Contributor Author

titilambert commented Sep 22, 2016

@nathanielc With all your comments, I understood that this service was not clear. So I renamed it to snmptrap.
In fact, a SNMPtrap contains at leat one data. You can add data to your trap to add more context.
The following example should be more clear:

    |alert()
        .id('kapacitor/{{ index .Tags "host"}}/{{ .Name }}')
        .message('{{ .Level}}: {{ .ID }} - memory: {{ index .Fields "used_percent" }}%')
        .warn(lambda: "used_percent" > 50 )
        .crit(lambda: "used_percent" > 95 )
        .snmpTrap()
          .data('1.3.6.1.4.1.1.1', 's', '{{ .index .Tags "host" }}' )
          .data('1.3.6.1.4.1.1.2', 'i', '{{if eq .Level "OK"}}0{{else}}{{if eq .Level "WARNING"}}44443{{else}}{{if eq .Level "CRITICAL"}}5{{end}}{{end}}{{end}}')
          .data('1.3.6.1.4.1.1.3', 's', '{{ .Message }}' )
          .data('1.3.6.1.4.1.1.5', 's', '{{ .Level }}' )
          .data('1.3.6.1.4.1.1.6', 's', '50' )
          .data('1.3.6.1.4.1.1.7', 's', '{{ index .Fields "used_percent" }}' )

The NMS will receive the trap and process it as ONE message with several fields. One field for one oid.

I think those new names is more clear than the previous one.

@titilambert
Copy link
Contributor Author

@nathanielc any news on that one ?

@nathanielc
Copy link
Contributor

@titilambert

OK so One field for one oid. this makes sense now. But what about all the settings in the config file like ip version etc. How can they be overwritten by the TICKscript?

@titilambert
Copy link
Contributor Author

titilambert commented Sep 27, 2016

@nathanielc I didn't do that because the community is like a password, so I thought it's better to not expose it in TICK scripts
What do you think about it ?

@titilambert
Copy link
Contributor Author

I found a small issue in the trap sent, I will send an update today

@titilambert
Copy link
Contributor Author

titilambert commented Sep 28, 2016

@nathanielc Issue fix with last commit: 28f0c16

@titilambert
Copy link
Contributor Author

@nathanielc this PR is ready

Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests in the integrations package?

You will find tests in there for the other alert handlers.

How can I test this locally? Is there a simple SNMP server I can stand up to test this works as expected?

// .data('1.3.6.1.4.1..1.7', 's', '{{ index .Fields "used_percent" }}' )
//
// tick:property
func (h *SnmpTrapHandler) Data(data ...interface{}) *SnmpTrapHandler {
Copy link
Contributor

@nathanielc nathanielc Oct 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these arguments always strings? And are there always three of them?

If so should the signature change to Data(oid, rawType, value string)? Not sure what the names should be...

return errors.New("Version 3 not supported yet")
default:
return errors.New("Bad snmp version should be: '1', '2c' or '3'")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't add value to add a version config option and then only support a single version. How hard is it to support the other versions? From what I saw it looked like the snmp package has support for various versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I can't test snmp v3 :/

//
// tick:property
func (h *SnmpTrapHandler) Data(data ...interface{}) *SnmpTrapHandler {
// TODO check element validity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a validate method to the SnmpTrapHandler, and call if from the AlertNode validate method.

TrapOid string
// List of trap data.
// tick:ignore
DataList [][]interface{} `tick:"Data"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the structure of the data? Is there a reason we need to use an interface{} or can this be converted to a list of structs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello ! in fact, I'm just trying to get [][]string but I don't know how to get that
from alert.go line 250

                tpattrtmpl, err := text.New("trap").Parse(attr.(string))

How can I get string from this stuff ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathanielc in fact, I think that the struct could be [][3]string
Example:

|alert()
    .snmpTrap('MAINOID')
    .data('SUBOID1', 's', '{{ .Message }}')
    .data('SUBOID2', 'i', '{{if eq .Level "OK"}}0{{else}}{{if eq .Level "WARNING"}}3{{end}}{{end}}')
    .data('SubOid3', 's', '{{ .Name }}')

@titilambert
Copy link
Contributor Author

Hello @nathanielc !
I have now time to finish this PR !

@titilambert
Copy link
Contributor Author

@nathanielc I just added tests

@nathanielc
Copy link
Contributor

@titilambert Thanks! I am rounding the bend on a large refactor for the alert handlers system, ( I hope to have the majority of the work done by tomorrow 🙏 ). As a result I hope to take a look at this next week. It will take some work to refactor it a bit for the changes I am making , if you have time next week as well I can point you in the right direction, otherwise I can make the changes later.

@djsly
Copy link

djsly commented Dec 16, 2016

@nathanielc thanks for keeping a priority in this PR. Lets us know when the refactoring will be completed asap. we would absolutely need this in 1.1.2 if possible since we are also waiting for some bugs fixes from this release.

@nathanielc
Copy link
Contributor

@djsly @titilambert The Alert system work is basically complete at this point. I am going to start looking into this PR again. Can you give me push access to your fork so that I can make changes? See https://help.github.com/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork/

@titilambert
Copy link
Contributor Author

@nathanielc hello, Thanks for your reply !
I just clicked on "Allow edits from maintainers."
Let me know if you can edit the PR.

@nathanielc
Copy link
Contributor

@titilambert Thanks! Looks like it is working, I was able to rebase and push to your snmp branch.

@nathanielc
Copy link
Contributor

@titilambert @djsly I have most things migrated over to the new alerting system but I have a few questions for things that did not make sense to me as I dug around in the code.

  1. The snmptrap.Service does not do anything with the alert Level. What should it do? Maybe create a string var bind with a definable Oid?
  2. Why are we hardcoding the TimeTicks 1000 var bind for every trap?

For now I removed the logic around configuring the version since only one version is supported. When we need/add support for other versions we can add it back in. Once I am out of the weeds I'll push up my changes so you can take a look.

@nathanielc
Copy link
Contributor

@titilambert @djsly OK I have tests passing most of the time ;), but the bad news is the github.com/k-sone/snmpg package is giving me some issues. I know that you had originally used a different package and then switched to this one. Can you provide some details as to why you chose this package?

The issues I am running into only effect testing so they are not super critical but I we need to have stable tests.

I can consistently get the TrapServer to panic when generating a request ID. Also the TrapServer type doesn't have a graceful shutdown (granted that is difficult to do with UDP) but the shutdown process is racy and causes data races like:

WARNING: DATA RACE
Write at 0x00c420148b90 by goroutine 7:
  github.com/influxdata/kapacitor/vendor/github.com/k-sone/snmpgo.(*packetTransport).Close()
      /home/nathanielc/projects/kapacitor/src/github.com/influxdata/kapacitor/vendor/github.com/k-sone/snmpgo/transport.go:76+0xe4
  github.com/influxdata/kapacitor/vendor/github.com/k-sone/snmpgo.(*TrapServer).Close()
      /home/nathanielc/projects/kapacitor/src/github.com/influxdata/kapacitor/vendor/github.com/k-sone/snmpgo/server.go:229 +0x7f
  github.com/influxdata/kapacitor/services/snmptrap/snmptraptest.(*Server).Close()
      /home/nathanielc/projects/kapacitor/src/github.com/influxdata/kapacitor/services/snmptrap/snmptraptest/snmptraptest.go:59 +0x107
  github.com/influxdata/kapacitor/integrations.TestStream_AlertSNMPTrap()
      /home/nathanielc/projects/kapacitor/src/github.com/influxdata/kapacitor/integrations/streamer_test.go:7587 +0x6b6
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:610 +0xc9

Previous read at 0x00c420148b90 by goroutine 8:
  github.com/influxdata/kapacitor/vendor/github.com/k-sone/snmpgo.(*packetTransport).Listen()
      /home/nathanielc/projects/kapacitor/src/github.com/influxdata/kapacitor/vendor/github.com/k-sone/snmpgo/transport.go:26+0x3f
  github.com/influxdata/kapacitor/vendor/github.com/k-sone/snmpgo.(*TrapServer).Serve()
      /home/nathanielc/projects/kapacitor/src/github.com/influxdata/kapacitor/vendor/github.com/k-sone/snmpgo/server.go:197 +0x132
  github.com/influxdata/kapacitor/services/snmptrap/snmptraptest.NewServer.func1()
      /home/nathanielc/projects/kapacitor/src/github.com/influxdata/kapacitor/services/snmptrap/snmptraptest/snmptraptest.go:41 +0x8f

At this point we have a few options:

  1. Submit PRs to the k-sone/snmpgo package
  2. Choose a different package

Again since I know you played around with other packages too I would like your input before moving forward. Thanks!

@nathanielc
Copy link
Contributor

I took a look around and the k-sone/snmpgo package seems to be the best so I put together a PR to fix the races in the server logic. See k-sone/snmpgo#13

I also protected access to the snmp client since it is not designed for concurrent use either.

At this point tests are passing locally without races.

@djsly
Copy link

djsly commented Jan 8, 2017

@nathanielc thanks for this work on the PR! Would there be anything you would want @titilambert or myself to look at to complete this merge process ?

@nathanielc
Copy link
Contributor

nathanielc commented Jan 9, 2017

@djsly @titilambert

There are two things I would like you to look at before I merge:

  1. Look at the examples in the tests and make sure they make sense (aka aren't misleading or confusing to a new user). While I think I understand how SNMP traps work now, I don't have all the context of how they are typically used, so if one of the examples is just odd I'd like to fix it.
  2. Test it out locally, this is the most important. I have refactored the way it works a little bit so I want to make sure it still fits your use case.

And one question:

  • Is the community value considered secret? Meaning, its basically the auth key for V2 traps, so should we treat it like a password?

EDIT: I read back through all the comments on this PR and found:

I didn't do that because the community is like a password, so I thought it's better to not expose it in TICK scripts

So I'll make the change to treat it like a password.

Other than that the PR to the snmpgo repo was merged over the weekend so we have no blockers.

@@ -210,6 +210,17 @@ default-retention-policy = ""
# meaning alerts will only be sent if the alert state changes.
state-changes-only = false

[snmp]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[snmptrap]

// Example:
// [snmptrap]
// enabled = true
// host = "127.0.0.1:9162"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addr = "127.0.0.1:9162"

// Example:
// stream
// |alert()
// .snmpTrap()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You HAVE to put an OID in the snmpTrap Node

@titilambert
Copy link
Contributor Author

@nathanielc Big Thanks !
I just tested it and this is working !
I made some comment about example and docs...

@nathanielc
Copy link
Contributor

@titilambert Thanks for the review! I'll merge on green

…com/k-sone/snmpgo

subrepo:
  subdir:   "vendor/github.com/k-sone/snmpgo"
  merged:   "1cd4bf6f9"
upstream:
  origin:   "https://github.com/k-sone/snmpgo.git"
  branch:   "master"
  commit:   "1cd4bf6f9"
git-subrepo:
  version:  "0.3.0"
  origin:   "???"
  commit:   "???"
@nathanielc nathanielc merged commit 14d9ee1 into influxdata:master Jan 9, 2017
nathanielc added a commit that referenced this pull request Jan 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants