[WBEL-users] Still killing me softly with mon

Ed Morrison emorrison@ncen.org
Tue, 14 Sep 2004 10:34:27 -0700


Well, my mon issues never seem to get solved.  After a vacation I am once
again tackling my unsolvable mon problems.  I have a RedHat 8 box
running mon-0.99.2 (last stable version).  I need/want to rid myself of
this box and use WBEL for this purpose.  To this end I have WBEL
installed along with mon-0.99.2.  Mon works great for sending me alerts
that the services I'm monitoring are down.  Unfortunately they are not
down.  Below is a tail of my log file.  I thought iptables might be my
problem, so I stopped it, with no change.  My RH8 box is on the same
network as the WBEL box and behind the same firewall, with the same
port openings to it (and it works just fine).  For that matter, I
reassigned the IPs of both boxes, giving the WBEL box the RH8 IP, and
this too did nothing to help my cause.

In addition, I decided to set up a WBEL box on the same network as the
services I am monitoring, to see if I am missing anything with my
firewall or iptables.  This WBEL box does the exact same thing as
the WBEL box on my home network.
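
One way to narrow this down is to run the monitor by hand, outside mon, and look at its exit status and output. The following is a minimal sketch, not a tested fix. It assumes the monitor lives under the mondir given in the mon.cf in this post, that the host names are the wwwservers group, and that mon treats a nonzero exit status from a monitor as a failure. The helper name run_monitor is made up for illustration.

```python
import os
import subprocess

# Path and host names taken from the mon.cf in this post; adjust for your install.
MONITOR = "/usr/lib/mon/mon.d/fping.monitor"
HOSTS = ["www", "www2", "yubamail", "ncen"]


def run_monitor(command, hosts):
    """Run a monitor with the hosts appended; return (exit_code, output_lines)."""
    proc = subprocess.run(
        list(command) + list(hosts),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error messages together with normal output
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.splitlines()


if os.path.exists(MONITOR):
    code, lines = run_monitor([MONITOR], HOSTS)
    print("exit status:", code)
    for line in lines:
        print(line)
else:
    print("monitor not found at", MONITOR)
```

Running this as the same user mon runs as, on both the RH8 box and the WBEL box, would show whether the two machines disagree before mon is even involved.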

Below is a tail of /var/log/messages and a copy of my mon.cf file.

Tail excerpt:

Sep 14 10:21:02 whitebox mon[7593]: mon server started
Sep 14 10:21:07 whitebox mon[7593]: failure for wwwservers ping 1095182467 ncen yubamail
Sep 14 10:21:07 whitebox mon[7593]: calling alert mail.alert for wwwservers/ping (/usr/lib/mon/alert.d/mail.alert,changed to protect the innocent) ncen yubamail
Sep 14 10:21:07 whitebox mon[7593]: calling alert mail.alert for wwwservers/ping (/usr/lib/mon/alert.d/mail.alert,changed to protect the innocent) ncen yubamail
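
For anyone reading the excerpt, the failure line can be picked apart with a few lines of Python. This is only a sketch: the field layout (group, service, epoch seconds, then the failed hosts) is an assumption inferred from this single line, and the function name parse_failure is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Failure line copied from the tail excerpt above, with the wrapped line rejoined.
LINE = ("Sep 14 10:21:07 whitebox mon[7593]: failure for wwwservers ping "
        "1095182467 ncen yubamail")

# Assumed layout, inferred from this one line: group, service, epoch seconds, summary.
PATTERN = re.compile(r"failure for (\S+) (\S+) (\d+) (.*)$")


def parse_failure(line):
    """Split a mon failure line into its fields, or return None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    group, service, epoch, summary = match.groups()
    return {
        "group": group,
        "service": service,
        "when_utc": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "hosts": summary.split(),
    }


info = parse_failure(LINE)
print(info["group"], info["service"], info["hosts"])
# wwwservers ping ['ncen', 'yubamail']
print(info["when_utc"].isoformat())
# 2004-09-14T17:21:07+00:00
```

The decoded time, 17:21:07 UTC, is 10:21:07 at the -0700 offset of this message, so the number is simply the time of the failure. The hosts reported as failed are ncen and yubamail; www and www2 are not listed.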


mon.cf:

# Example "mon.cf" configuration for "mon".
#
# $Id: example.cf 1.1 Sat, 26 Aug 2000 15:22:34 -0400 trockij $
#                                                                                  
#
# This works with 0.38pre8
#                                                                               
#
# global options
#
cfbasedir   = /usr/lib/mon/etc
alertdir    = /usr/lib/mon/alert.d
mondir      = /usr/lib/mon/mon.d
maxprocs    = 20
histlength = 100
randstart = 60s
                                                                                     
#
# authentication types:
#   getpwnam      standard Unix passwd, NOT for shadow passwords
#   shadow        Unix shadow passwords (not implemented)
#   userfile      "mon" user file
#
authtype = getpwnam
                                                                                 
#
# NB:  hostgroup and watch entries are terminated with a blank line (or
# end of file).  Don't forget the blank lines between them or you lose.
#                                                                                   
#
# group definitions (hostnames or IP addresses)
#
hostgroup servers-nccc nccc_01 nccc_02 sql_1 sql_2
                                                                                     
# hostgroup serversbd2 dns-yp2 foo2 bar2 ola3
                                                                                     
hostgroup mailhost yubamail
                                                                                     
hostgroup routers admin ctec1 ctec2
                                                                                     
hostgroup switch admin-switch
                                                                                     
# hostgroup workstations blue yellow red green cornflower violet
                                                                                     
# hostgroup netapps f330 f540
                                                                                     
hostgroup wwwservers www www2 yubamail ncen
                                                                                     
# hostgroup printers hp5si hp5c hp750c
                                                                                     
# hostgroup new nntp
                                                                                     
hostgroup ftp ftp
                                                                                     
#
# For the servers in building 1, monitor ping and telnet
# BOFH is on weekend call :)
#
watch servers-nccc
    service ping
        description ping servers in bd1
        interval 5m
        monitor fping.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alert mail.alert changed to protect the innocent
            alertevery 1h
        period NOALERTEVERY: wd {Mon-Fri} hr {7am-10pm}
            alert mail.alert changed to protect the innocent
        period wd {Sat-Sun}
            alert mail.alert changed to protect the innocent
            alert mail.alert changed to protect the innocent
#    service telnet
#       description telnet to servers in bd1
#       interval 10m
#       monitor telnet.monitor
#       depend serversbd1:ping
#       period wd {Mon-Fri} hr {7am-10pm}
#           alertevery 1h
#           alertafter 2 30m
#           alert mail.alert emorrison@ncen.org
#           alert page.alert changed to protect the innocent
                                                                                     
watch mailhost
    service fping
        period wd {Mon-Fri} hr {7am-10pm}
            alert mail.alert changed to protect the innocent
            alertevery 1h
#    service telnet
#       interval 10m
#       monitor telnet.monitor
#       period wd {Mon-Fri} hr {7am-10pm}
#           alertevery 1h
#           alertafter 2 30m
#           alert mail.alert emorrison@ncen.org
#           alert page.alert changed to protect the innocent
    service smtp
        interval 10m
        monitor smtp.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alertevery 1h
            alertafter 2 30m
            alert mail.alert changed to protect the innocent
    service imap
        interval 10m
        monitor imap.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alertevery 1h
            alertafter 2 30m
            alert mail.alert changed to protect the innocent
    service pop
        interval 10m
        monitor pop3.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alertevery 1h
            alertafter 2 30m
            alert mail.alert changed to protect the innocent
                                                                                     
watch wwwservers
    service ping
        interval 2m
        monitor fping.monitor
        allow_empty_group
        period wd {Sun-Sat}
            alert mail.alert changed to protect the innocent
            alert mail.alert changed to protect the innocent
            alertevery 45m
    service http
        interval 4m
        monitor http.monitor
        allow_empty_group
        period wd {Sun-Sat}
            alert netpage.alert edpage
            upalert mail.alert -S "web server is back up" mis
            alertevery 45m
#    service telnet
#       monitor telnet.monitor
#       allow_empty_group
#       period wd {Mon-Fri} hr {7am-10pm}
#           alertevery 1h
#           alertafter 2 30m
#           alert mail.alert mis@domain.com
#           alert page.alert mis-pagers@domain.com
                                                                                     
#
# If the routers aren't pingable, send a page using
# a phone line and the IXO protocol, which doesn't
# rely on the network. Failure of a router is pretty serious,
# so check every two minutes.
#
# Send out one page every 45 minutes, but log the failure
# to a file every time.
#
watch routers
    service ping
        description routers which connect bd1 and bd2
        interval 1m
        monitor fping.monitor
        period wd {Sun-Sat}
            alert mail.alert changed to protect the innocent
            alert mail.alert changed to protect the innocent
            alertevery 45m
#       period LOGFILE: wd {Sun-Sat}
#           alert file.alert -d /usr/lib/mon/log.d routers.log

#
# If mon cannot ping one of the hubs, users will be calling soon
#
watch switch
    service ping
        interval 1m
        monitor fping.monitor
        period wd {Sun-Sat}
            alert mail.alert changed to protect the innocent
            alert mail.alert changed to protect the innocent
            alertevery 45m
                                                                                     
#
# Monitor free disk space on the NFS servers
#
# When space gets below 5 megs, send mail, and delete
# the oldest nightly snapshots.
#
# monitors that terminate with ";;" are not executed with the
# host group appended to the command line
#
#watch netapps
#    service freespace
#       interval 15m
#       monitor freespace.monitor /f330:5000 /f540:5000 ;;
#       period wd {Sun-Sat}
#           alert mail.alert mis@domain.com
#           alert delete.snapshot
#           alertevery 1h
                                                                                     
#
# workstations
#
#watch workstations
#    service ping
#       interval 5m
#       monitor fping.monitor
#       period wd {Sun-Sat}
#           alert mail.alert mis@domain.com
#           alertevery 1h
                                                                                     
#
# news server
#
#watch news
#    service ping
#       interval 5m
#       monitor fping.monitor
#       period wd {Sun-Sat}
#           alert mail.alert mis@domain.com
#           alertevery 1h
#    service nntp
#       interval 5m
#       monitor nntp.monitor
#       period wd {Sun-Sat}
#           alert mail.alert mis@domain.com
#           alertevery 1h
                                                                                     
#
# FTP server
#
watch ftp
    service ftp
        interval 5m
        monitor ftp.monitor
        period wd {Sun-Sat}
            alert mail.alert changed to protect the innocent
            alertevery 1h
                                                                              
#
# dial-in terminal server
#
#watch dialin
#    service 555-1212
#        interval 60m
#        monitor dialin.monitor.wrap -n 555-1212 -t 80 ;;
#        period wd {Sun-Sat}
#            alert mail.alert mis@domain.com
#            upalert mail.alert mis@domain.com
#            alertevery 8h
#    service 555-1213
#        interval 33m
#        monitor dialin.monitor.wrap -n 555-1213 -t 80 ;;
#        period wd {Sun-Sat}
#            alert mail.alert mis@domain.com
#            upalert mail.alert mis@domain.com
#            alertevery 8h
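
A config this long is easier to review as a summary of which monitor each service declares. The sketch below is a deliberately simplified reader, not mon's own parser: it ignores the blank-line termination rule and every keyword except hostgroup, watch, service and monitor. The function name services_by_watch is made up, and SAMPLE is a trimmed copy of the mailhost stanza above.

```python
# Trimmed copy of the "watch mailhost" stanza from the mon.cf above.
SAMPLE = """\
watch mailhost
    service fping
        period wd {Mon-Fri} hr {7am-10pm}
            alert mail.alert changed to protect the innocent
            alertevery 1h
    service smtp
        interval 10m
        monitor smtp.monitor
"""


def services_by_watch(text):
    """Return {watch_group: {service: monitor_or_None}} from mon.cf text."""
    result = {}
    watch = service = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split(None, 1)
        keyword = words[0]
        rest = words[1] if len(words) > 1 else ""
        if keyword == "hostgroup":
            watch = service = None  # a hostgroup line is outside any watch
        elif keyword == "watch":
            watch, service = rest, None
            result[watch] = {}
        elif keyword == "service" and watch is not None:
            service = rest
            result[watch][service] = None
        elif keyword == "monitor" and service is not None:
            result[watch][service] = rest
    return result


for group, services in services_by_watch(SAMPLE).items():
    for name, monitor in services.items():
        print(group, name, monitor if monitor else "(no monitor line)")
```

On the sample this reports that service fping under watch mailhost has no monitor line, while smtp uses smtp.monitor. Whether that matters for the false alerts is not established here; it is only something the summary makes easy to spot.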
                                                                                     
                                                                                  
                                    
If anyone can help me with this I would appreciate it.  


Thank you,

Ed