[WBEL-users] Still killing me softly with mon

Jesse j@lumiere.net
Tue, 14 Sep 2004 13:04:42 -0700 (PDT)


Tests will fail from time to time; pretty much every monitoring system runs
into this, even commercial load balancers and the like with their health checks.

I use mon to monitor a few hundred hosts. The key is the 'alertafter'
option, which sets the number of consecutive times a test must fail before
an alert is issued. I recommend 2 or 3, depending on the type of
service/test. You'll want to set 'upalertafter' as well: it is the minimum
time the service must have been down before an upalert is sent (otherwise
you'll keep getting upalerts without corresponding alerts). So
upalertafter should be interval * alertafter.
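As a quick check of that rule of thumb, here is a minimal Python sketch that turns a mon-style interval such as "1m" into seconds, multiplies it by alertafter, and renders the result in the same style. The helper names (parse_timeval, format_timeval, suggest_upalertafter) are hypothetical, and the sketch assumes only the s/m/h/d suffixes.

```python
import re

# Hypothetical helper: unit suffixes assumed for mon-style time values.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_timeval(text):
    """Convert a string like '1m' or '30s' into a number of seconds."""
    m = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if m is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]

def format_timeval(seconds):
    """Render seconds with the largest unit that divides them evenly."""
    for suffix, size in (("d", 86400), ("h", 3600), ("m", 60)):
        if seconds % size == 0:
            return f"{seconds // size}{suffix}"
    return f"{seconds}s"

def suggest_upalertafter(interval, alertafter):
    """upalertafter = interval * alertafter, as a mon-style string."""
    return format_timeval(parse_timeval(interval) * alertafter)

print(suggest_upalertafter("1m", 3))  # 3m, matching the example config
```

For slower checks the same rule scales: an interval of 20m with alertafter 3 gives 1h.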

Hope that helps.

For example:

watch mail
	service ping
		description ICMP ping of mail servers
		interval 1m
		monitor fping.monitor -r 3
		period wd {Mon-Sun}
			alertafter 3
			alertevery 30m
			alert mail.alert <snip>
			upalertafter 3m
			upalert mail.alert <snip>
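To see why those numbers line up, here is a rough Python model of the alertafter/upalertafter behaviour described above, fed with the example's values (interval 1m, alertafter 3, upalertafter 3m). It is a hypothetical simulation, not mon's code: it assumes one test result per interval, measures downtime from the first failed test, and ignores alertevery and periods.

```python
def simulate(results, interval_s, alertafter, upalertafter_s):
    """results: one bool per interval (True = test passed).
    Returns a list of (time_in_seconds, event) tuples."""
    events = []
    failures = 0        # consecutive failures in the current outage
    down_since = None   # time of the first failure in the current outage
    for i, ok in enumerate(results):
        t = i * interval_s
        if ok:
            # Recovery: upalert only if the outage lasted long enough.
            if down_since is not None and t - down_since >= upalertafter_s:
                events.append((t, "upalert"))
            failures = 0
            down_since = None
        else:
            if down_since is None:
                down_since = t
            failures += 1
            # Alert once, on the alertafter-th consecutive failure.
            if failures == alertafter:
                events.append((t, "alert"))
    return events

# A one-minute blip, then a real three-minute outage.
checks = [True, False, True, False, False, False, True]
print(simulate(checks, 60, 3, 180))  # [(300, 'alert'), (360, 'upalert')]
print(simulate(checks, 60, 3, 0))    # [(120, 'upalert'), (300, 'alert'), (360, 'upalert')]
```

With upalertafter at 3m the blip is silent and the real outage yields one alert and one upalert. With upalertafter at 0, the blip produces an upalert that no alert preceded.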

---
Jesse <j@lumiere.net>