Notify on crawl failures or when a host is marked inactive #246

Open
jasontibbitts opened this issue Apr 25, 2018 · 3 comments

Comments

@jasontibbitts

Occasionally one of my hosts is taken offline due to a crawl failure. So far it hasn't been the fault of the host; the crawl completes pretty quickly, but somehow the processing of the crawl data doesn't finish in time. It's really not so bad that this happens occasionally, and there's certainly a possibility that sometimes the fault could be on my end.

The problem is that I don't know about it until I notice that traffic has dropped off. As far as I can tell there's no notice sent anywhere when something goes wrong.

Judging from the number of things that can send to fedmsg, it must be pretty simple to add support for putting events on the bus, and that would probably be completely sufficient to let people be notified. You have to have a Fedora account to configure a MM site, so I don't really see much of a downside unless the crawlers are so restricted that they can't reach the fedmsg servers.

If I can get some hints about where in the code messages could be emitted, I can try to learn how to use the fedmsg libraries and have a go at implementing this.

@adrianreber
Member

#208 is the same idea. I think email makes more sense, as most mirror admins do not listen to the message bus. Besides that, the place to look is the crawler. The crawler already sends messages to the message bus, which might include what you want:

https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod.mirrormanager.crawler.complete

If you look for example at this:

https://apps.fedoraproject.org/datagrepper/id?id=2018-71be711e-1b06-40a1-8e83-93c654481f95&is_raw=true&size=extra-large

It says 3 mirrors failed and you can see the return code of those crawls. Looking at the crawler code, a repeated return code of '2' would mean the mirror was auto-disabled. You could send a special message or return code if a mirror has actually been auto-disabled.
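As a rough illustration of reading those return codes out of a `crawler.complete` message, here is a minimal sketch. The datagrepper base URL and topic come from the links above, and `delta` is a datagrepper query parameter as far as I know. The message layout (`results`, `host`, `name`, `rc`) and the helper names `crawler_query_url` and `failed_crawls` are assumptions for illustration only; check a real message before relying on them.

```python
from urllib.parse import urlencode

DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"
CRAWLER_TOPIC = "org.fedoraproject.prod.mirrormanager.crawler.complete"


def crawler_query_url(delta_seconds=86400):
    """Build a datagrepper query URL for recent crawler.complete messages."""
    params = {"topic": CRAWLER_TOPIC, "delta": delta_seconds}
    return DATAGREPPER_RAW + "?" + urlencode(params)


def failed_crawls(message_body):
    """Return (host_name, rc) for every crawl with a nonzero return code.

    ASSUMPTION: the message body looks like
    {"results": [{"host": {"name": ...}, "rc": ...}, ...]}.
    The real layout may differ.
    """
    failures = []
    for result in message_body.get("results", []):
        rc = result.get("rc", 0)
        if rc != 0:
            failures.append((result.get("host", {}).get("name"), rc))
    return failures
```

This only reports nonzero return codes. It does not try to decide whether a host was auto-disabled, since that depends on repeated failures tracked inside the crawler.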

@jasontibbitts
Author

Mirror admins wouldn't need to listen to the message bus; the standard Fedora notification service will generate the necessary emails.

The problem with the crawler notifications is that they don't include any user information, so the notification service has no way to take those messages and generate a notification to the mirror admin. Basically the bus should get individual messages per crawled host which include the account info for the mirror admin. And of course a mirror being disabled should put a different message on the bus that can be filtered separately.
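To make the proposal concrete, here is a minimal sketch of what per-host messages could look like. Everything in it is hypothetical: the topic names `crawler.host.complete` and `crawler.host.disabled`, the field names (`host`, `rc`, `admins`, `auto_disabled`), and the helper functions are placeholders, not anything the crawler does today. The `publish` argument is any callable taking `topic` and `msg` keyword arguments; `fedmsg.publish` should fit that shape, but the sketch does not import fedmsg so it can run anywhere.

```python
def per_host_messages(crawl_results):
    """Turn per-host crawl results into (topic, body) pairs.

    ASSUMPTION: each result is a dict with "host", "rc" and "admins"
    keys, plus an optional "auto_disabled" flag set by the crawler.
    """
    messages = []
    for result in crawl_results:
        body = {
            "host": result["host"],
            "rc": result["rc"],
            # Account names let a notification service find the right people.
            "admins": list(result["admins"]),
        }
        messages.append(("crawler.host.complete", body))
        if result.get("auto_disabled"):
            # Separate topic so disable events can be filtered on their own.
            messages.append(("crawler.host.disabled", body))
    return messages


def emit(messages, publish):
    """Send each message through the supplied publish callable."""
    for topic, body in messages:
        publish(topic=topic, msg=body)
```

Passing the publisher in as a parameter keeps the message-building logic testable without a message bus, for example `emit(per_host_messages(results), publish=print_kwargs)` with a small stand-in function during development.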

@adrianreber
Member

Sounds like a good plan. You can put all those changes in the crawler. That is where all that information is gathered.
