Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New junos messages for bgp, ospf, and configuration #168

Merged

Conversation

lampwins
Copy link
Contributor

This covers bgp neighbor state changes, ospf state changes, and configuration commit and rollback life cycle.

@coveralls
Copy link

coveralls commented Sep 29, 2017

Coverage Status

Coverage decreased (-0.2%) to 51.072% when pulling 2a3d0fc on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

Copy link
Member

@mirceaulinic mirceaulinic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's awesome @lampwins!

messages:
# 'error' should be unique and vendor agnostic. Currently we are using the JUNOS syslog message name as the canonical name.
# This may change if we are able to find a more well defined naming system.
- error: BPDU_BLOCK_INTERFACE_DISSABLED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo here: BPDU_BLOCK_INTERFACE_DISABLED

@@ -0,0 +1 @@
<149>Jun 21 14:03:12 vmx01 rpd[2902]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 1.1.1.1 (External AS 2222) changed state from Active to Connect (event AnEvent) (instance master)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if you could check the actual value the device sends for this PRI: this will be reflected in the expected output from yang.json. Now, the documentation is auto-generated from these mock files, as examples, e.g., http://napalm-logs.readthedocs.io/en/latest/messages/NTP_SERVER_UNREACHABLE.html, so it would be good to have the facility and severity in the examples as close as possible to the real values, so the users know what to expect.

@@ -0,0 +1,15 @@
# This error tag corresponds to syslog messages notifying that the configured
# interface has been dissabled due to a bpdu block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disabled here again

@@ -0,0 +1,15 @@
# This error tag corresponds to syslog messages notifying that the configured
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the filename of the profile is the value of the error, so it's easier to check where is a feature defined; however, that's not mandatory.

@mirceaulinic
Copy link
Member

Solid work here @lampwins, thank you! Besides my suggestions above, I'd recommend you to double check your git settings and amend the commit author (the email address in particular), so it will link to your github profile.

@lampwins
Copy link
Contributor Author

I have changed the filenames in the profile and fixed the PRI numbers.

@coveralls
Copy link

coveralls commented Sep 29, 2017

Coverage Status

Coverage decreased (-0.2%) to 51.072% when pulling 560090a on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

Copy link
Member

@mirceaulinic mirceaulinic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more minor changes please, and some food for thoughts :-)

area: (\d+\.\d+\.\d+\.\d+)
reason_message: ([\w ]+)
line: 'OSPF neighbor {neighbor} (realm ospf-v2 {interface} area {area}) state changed from Full to Down due to KillNbr (event reason: {reason_message})'
model: openconfig-network-instance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this previously: I think we should reference the openconfig-opsf, as it's a standalone model (although it's just based on the network instance one). Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, except for why openconfig does this :)

"1.1.1.1": {
"state": {
"peer_as": "2222",
"session-state-change-event": "AnEvent",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a genuine event message (AnEvent)? If not, could you please reference a real example (same documentation reason).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, just me being lazy, will fix. Also this relates to my comments on the ospf events.

@@ -0,0 +1,33 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might revisit this structure later (I can't think of a better structure currently though), but I fear it won't be compatible with other platforms. That's just a FYI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this will not last in the long run. Ideally I think it should map to actual openconfig models depending on the error. I can't even begin to imagine what that would look like in a mapping definition though (outside of some complex python version).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion to follow under #175

"neighbor": {
"1.1.1.1": {
"state": {
"adjacency-state-change-reason-message": "dead timer",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how diverse is this message and if we could define a typeref as with the others, e.g., DEAD_TIMER (so it will be standard across other platforms). We can leave this as an open discussion for later, but it would be nice if you could build up a list of the messages you've seen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just me being lazy with the test and I should have fixed it in the last commit. The reason message is tough is because there are many different messages that relate to the INACTIVE_TIMER event type. For example, whether of not BFD is in play.

This is the same problem in the BGP state changes. As you can see there are already a number mapping required to cover all state transitions in the BGP state machine. IMO it would be very unwieldy to then throw in all the human event messages because there are cases where several messages map to one state change.

I think it would be more feasible down the road if we could define these messages separate from the log mappings. These would allow us to map several different vender messages for an event to some model value while not having to keep up with tens/hundreds of log mappings for a single log type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. My opinion is that we should try to map them as much as possible (if they are not completely random and there's a decent amount of mappings needed).

I think it would be more feasible down the road if we could define these messages separate from the log mappings.

Good suggestion - after this is merged, I will open an issues so we can discuss further and see in time what's the best way to approach this, after we'll have more examples and feedback from the users / developers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this further and build up a list of possible messages: #176

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 51.072% when pulling ad7a3da on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 51.072% when pulling ad7a3da on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

@lampwins
Copy link
Contributor Author

Any idea what's up with the py2.7 travis build? Just curious as I have also been having to exclude it locally.

Copy link
Member

@mirceaulinic mirceaulinic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.
Would you have any feedback @loverend?

@mirceaulinic
Copy link
Member

@lampwins I don't know either why's py27 failing - I will have a closer somewhere next week, certainly worth investigating. I suspect there was a recent change in one of the underlying dependencies; tox sometimes behaves very weird too, so there can be several possibilities.
But for sure it is not related at all to your changes here (checked locally).

Copy link
Collaborator

@luke-orden luke-orden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great, just a couple of suggestions.

neighbor: ([\w\d:\.]+)
interface: ([\w\-\/\.\d]+)
area: (\d+\.\d+\.\d+\.\d+)
reason_message: ([\w ]+)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not use special characters in the key name as we escape all special characters. This will cause a key error when parsing.

Copy link
Contributor Author

@lampwins lampwins Oct 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@loverend Could you elaborate on this? Do you mean in the regex's or reason_message itself?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean reason_message, it should be reasonMessage

bgp//neighbors//neighbor//{peer}//state//peer_as: asn
bgp//neighbors//neighbor//{peer}//state//session-state-change-event: event
static:
bgp//neighbors//neighbor//{peer}//state//session-state: IDLE
Copy link
Collaborator

@luke-orden luke-orden Oct 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repeat entries can be condensed into one entry by matching the old and new states and then converting the values into the appropriate openconfig names. You can do this by casting to a function. See https://github.com/napalm-automation/napalm-logs/blob/master/napalm_logs/config/junos/SYSTEM_ALARM.yml#L13 for an example.

In this case you could add a function to convert any changes like OpenSent to OPEN_SENT and others you could just return as uppercase, i.e:

def bgp_state_convert(state):
    state_dict = {'OpenSent': 'OPEN_SENT',
                  'OpenConfirm': 'OPEN_CONFIRM'}
    return state_dict.get(state, state.upper())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better than my current approach but IMO this is going to get unwieldy fast once you start to throw other vendors. For example, Juniper calls it OpenSent while cisco calls it OPEN SENT. Obviously these mappings have to be defined somewhere but I don't think hiding them in a function inside the utils package is the best idea in the long run.

Like I said, this is better than what I have for now, but I think it is worth further discussion, along with the vendor event message mappings.

Copy link
Member

@mirceaulinic mirceaulinic Oct 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have a functionality directly into the YAML profiles and help with this kind of 1:1 mappings, e.g.,

mapping:
  replace:
    state:
      OpenSent: OPEN_SENT
      OpenConfirm: OPEN_CONFIRM
  variables:
    bgp//neighbors//neighbor//{peer}//state//peer_as: asn
    bgp//neighbors//neighbor//{peer}//state//session-state-change-event: event
    bgp//neighbors//neighbor//{peer}//state//session-state: state

This way we also avoid any function utility, but the mapping is specific to a particular platform only.
Any thoughts on this @loverend @lampwins?

Anyway, for the moment I would say to get this PR in, and investigate this sort of optimisations later (before they'll grow more), but it's very good to have this discussion right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mirceaulinic I think this is headed in the right direction. Ultimately will take some playing around with but I like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even something like:

preprocess:
  state:
    - replace:
        OpenSent: OPEN_SENT
        OpenConfirm: OPEN_CONFIRM
    - upper
    - lower
    - upper
mapping:
  variables:
    bgp//neighbors//neighbor//{peer}//state//peer_as: asn
    bgp//neighbors//neighbor//{peer}//state//session-state-change-event: event
    bgp//neighbors//neighbor//{peer}//state//session-state: state

Just in case we'd even need more extensive preprocessing to some fields (that's maybe just not the best example, but I think you see what I mean: to a certain field you can apply several consecutive operations).

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 51.207% when pulling 89764db on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

1 similar comment
@coveralls
Copy link

coveralls commented Oct 3, 2017

Coverage Status

Coverage decreased (-0.03%) to 51.207% when pulling 89764db on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 51.317% when pulling 3d119e4 on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 51.317% when pulling 3d119e4 on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 51.317% when pulling 3d119e4 on lampwins:feature/new-junos-logs into 8967650 on napalm-automation:master.

@mirceaulinic mirceaulinic mentioned this pull request Oct 4, 2017
@mirceaulinic mirceaulinic merged commit 5d6740f into napalm-automation:master Oct 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants