Monitoring

From V.S.V., Inc.
OpCfg

Bulk updates

Scenario:
We forgot to add the Extended Information to a service, and the service was replicated to over 350 machines before we noticed the problem. Making the changes through the GUI is painful, to say the least.

A single test case

To start fixing the problem I had to find the service in the OpCfg database. I used the following SQL statements to track down one instance of the service and change it, to verify that my changes would work and that I was looking at the correct data.

  1. SELECT host_id, host_name FROM `opcfg`.`nagios_hosts` where host_name='server1.example.com';
  2. SELECT service_id, service_description FROM `opcfg`.`nagios_services` where host_id=376 and service_description='Service - crond';
  3. SELECT sei.service_id, sd.service_description, sei.notes_url, sei.icon_image FROM  `opcfg`.`nagios_services_extended_info` as sei join (`opcfg`.`nagios_services` as sd) on (sei.service_id = sd.service_id) where sei.service_id=6634 and sd.service_description = 'Service - crond';
  4. UPDATE `opcfg`.`nagios_services_extended_info` as sei, `opcfg`.`nagios_services` as sd set sei.notes_url='http://wiki.example.com/wiki/index.php/$SERVICEDESC$', sei.icon_image='crond.gif' WHERE sei.service_id = sd.service_id and sei.service_id=6634 and sd.service_description = 'Service - crond';

In the first SELECT, change host_name to your test machine. This will give you the host_id for the second SELECT.
In the second SELECT, change host_id to what you got in the first SELECT and change service_description to the name of the service you need to fix.
The third SELECT gets you the fields you will actually be changing. Set sei.service_id to the value of service_id in the second SELECT. sd.service_description should be set to the service_description you used in the second SELECT.
The UPDATE is what actually makes the changes. Be sure to set sei.service_id and sd.service_description to the values you used in the third SELECT. sei.notes_url should be the full browser address of your documentation site. sei.icon_image should be set to the correct image name, or dropped from the statement altogether if you don't yet have an image for this service.

En masse

Making the changes in bulk is actually much easier than changing a single service on a single machine. Here is the SQL you need:

  1. SELECT sei.service_id, sd.service_description, sei.notes_url, sei.icon_image FROM  `opcfg`.`nagios_services_extended_info` as sei join (`opcfg`.`nagios_services` as sd) on (sei.service_id = sd.service_id) where sd.service_description = 'Service - crond';
  2. UPDATE `opcfg`.`nagios_services_extended_info` as sei, `opcfg`.`nagios_services` as sd set sei.notes_url='http://wiki.example.com/wiki/index.php/$SERVICEDESC$', sei.icon_image='crond.gif' WHERE sei.service_id = sd.service_id and sd.service_description = 'Service - crond';

In the SELECT, just set sd.service_description to the name of the service you are updating.
In the UPDATE, make sure sd.service_description is set the same as in the SELECT. As before, sei.notes_url is the URL to your documentation site. sei.icon_image will need to be set to the proper file name or removed if you don't have an icon yet for your service.
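If you repeat this for several services, it can help to generate the UPDATE instead of editing it by hand. The following is only a sketch: bulk_update_sql and sql_quote are hypothetical helper names, the quoting handles single quotes only, and the printed statement should be reviewed before it is run.

```shell
# Sketch: print the bulk UPDATE from this section for a given service.
# sql_quote and bulk_update_sql are hypothetical helpers, not part of OpCfg.

sql_quote() {
    # Double any single quotes so the value fits inside a '...' SQL literal.
    # Backslashes are NOT handled here.
    printf '%s' "$1" | sed "s/'/''/g"
}

bulk_update_sql() {
    # $1 = service_description, $2 = notes_url, $3 = icon_image
    desc=$(sql_quote "$1")
    url=$(sql_quote "$2")
    icon=$(sql_quote "$3")
    printf 'UPDATE `opcfg`.`nagios_services_extended_info` AS sei, `opcfg`.`nagios_services` AS sd\n'
    printf "SET sei.notes_url='%s', sei.icon_image='%s'\n" "$url" "$icon"
    printf "WHERE sei.service_id = sd.service_id AND sd.service_description = '%s';\n" "$desc"
}

# Usage: print the statement, check it, then pipe it to the mysql client.
#   bulk_update_sql 'Service - crond' \
#       'http://wiki.example.com/wiki/index.php/$SERVICEDESC$' 'crond.gif'
```

Keep the URL in single quotes on the command line so the shell does not try to expand $SERVICEDESC$.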

Service Group

SELECT h.host_name, s.service_description, s.service_id, sgm.servicegroup_id, sg.servicegroup_name, sg.alias 
FROM  `opcfg`.`nagios_hosts` as h 
left join `opcfg`.`nagios_services` as s
on (h.host_id = s.host_id) 
left join `opcfg`.`nagios_servicegroup_membership` as sgm 
on (s.service_id = sgm.service_id) 
left join `opcfg`.`nagios_servicegroups` as sg 
on (sgm.servicegroup_id = sg.servicegroup_id) 
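where (s.service_description = 'Service - crond' and sgm.servicegroup_id is null);

The SELECT above lists the matching services that have no service group membership. The INSERT below adds every matching service that is not yet in service group 4:

INSERT INTO opcfg.nagios_servicegroup_membership (service_id, servicegroup_id)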
SELECT service_id, '4' sgid
FROM opcfg.nagios_services
WHERE service_description = 'Service - crond'
AND service_id NOT IN (SELECT service_id 
                       FROM opcfg.nagios_servicegroup_membership 
                       WHERE servicegroup_id = 4) ;

To check for duplicates:

SELECT count(*), service_id 
FROM opcfg.nagios_servicegroup_membership
WHERE servicegroup_id = 4
GROUP BY service_id
HAVING COUNT(*) > 1;

Wrap up

After you have made your changes, don't forget to go back into the GUI and run the export or Icinga will never know about all your hard work.

SNMPTT

/usr/sbin/snmpttconvertmib --in=/usr/share/snmp/mibs/CISCO-STACKWISE-MIB.my --out=snmptt.conf-cisco-c3750 --net_snmp_perl --exec='/usr/lib64/nagios/plugins/submit_check_result $r TRAP 2'

snmptrapd

See man snmptrapd.conf. At the top it says:

  Previously, snmptrapd would accept all incoming notifications, and log them automatically (even if no explicit configuration was provided). Starting with release 5.3, access control checks will be applied to incoming notifications. If snmptrapd is run without a suitable configuration file (or equivalent access control settings), then such traps WILL NOT be processed. See the section ACCESS CONTROL for more details.

The option you are probably interested in can be found at the bottom of the "ACCESS CONTROL" section. It says:

  disableAuthorization yes
  will disable the above access control checks, and revert to the previous behaviour of accepting all incoming notifications.

Add disableAuthorization yes to /etc/snmp/snmptrapd.conf (create the file if it does not exist), then restart snmptrapd. It will then accept all incoming traps again.
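The change to snmptrapd.conf can be scripted so that it is safe to run more than once. This is a minimal sketch: ensure_disable_authorization is a hypothetical helper name, and the restart command in the usage comment assumes a SysV-style init script called snmptrapd.

```shell
# Sketch: make sure "disableAuthorization yes" is present in snmptrapd.conf.
# ensure_disable_authorization is a hypothetical helper, not part of Net-SNMP.

ensure_disable_authorization() {
    conf="${1:-/etc/snmp/snmptrapd.conf}"
    # Create the file if it does not exist yet.
    [ -e "$conf" ] || : > "$conf"
    # Nothing to do when an uncommented directive is already present.
    if grep -Eq '^[[:space:]]*disableAuthorization[[:space:]]+yes' "$conf"; then
        return 0
    fi
    # Terminate a last line that lacks a newline before appending.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    printf 'disableAuthorization yes\n' >> "$conf"
}

# Usage (as root):
#   ensure_disable_authorization /etc/snmp/snmptrapd.conf
#   service snmptrapd restart   # assumption: init script is named snmptrapd
```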

Where's the database?

If you are using a database to store trap information, you will need to configure it in /etc/snmp/snmptt.ini.

Troubleshooting

  • If you are getting duplicate alerts from Icinga, verify the ownership of the directory /var/spool/snmptt. It should be owned by the user that snmptt runs as (svcsnmptt).
  • If you see an error in /var/log/messages similar to:
snmptt-sys[10483]: Unable to delete trap file #snmptt-trap-1370271985157888 from spool dir
make sure the spool directory permissions are correct. /var/spool/snmptt should be owned by the user that snmptt runs as.
  • If you see the following error in /var/log/messages:
snmptt-sys[10483]: Can not open log file /var/log/snmptt/snmptt.log
verify the ownership of the directory /var/log/snmptt. It should be owned by the user that snmptt runs as.
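All three problems come down to directory ownership, so a quick check can save some digging. The sketch below assumes GNU stat (for the -c option); check_owner is a hypothetical helper name, and svcsnmptt is the snmptt user on this system, so substitute your own.

```shell
# Sketch: report whether a directory is owned by the expected user.
# check_owner is a hypothetical helper; "stat -c %U" requires GNU coreutils.

check_owner() {
    dir="$1"
    want="$2"
    # Look up the owning user name; return 2 if the directory cannot be read.
    have=$(stat -c %U "$dir") || return 2
    if [ "$have" = "$want" ]; then
        echo "OK: $dir is owned by $want"
    else
        echo "MISMATCH: $dir is owned by $have, expected $want"
        return 1
    fi
}

# Usage:
#   check_owner /var/spool/snmptt svcsnmptt
#   check_owner /var/log/snmptt svcsnmptt
# To fix a mismatch (as root):
#   chown svcsnmptt /var/spool/snmptt /var/log/snmptt
```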

Icinga

How To's

How To - Build or Upgrade the mon-icinga RPM
How To - Build or Upgrade the mon-nrpe-plugin RPM
How To - Build or Upgrade the mon-pnp4nagios RPM
How To - Build or Upgrade the mon-rrdtool RPM
How To - Build or Upgrade the mon-opcfg RPM

Where's the database?

The location of the database is set in /usr/local/icinga/etc/ido2db.cfg

OpCfg

Where's the database?

The location of the database is set in /usr/local/icinga/share/opcfg/includes/config.inc