NCAP - “Nagios Collector and Proxy”

version 0.4

Giray Devlet

2004-02-29

---------------------------------------------------------------------------

Table of Contents

Introduction
Design
Internals
Installation
    Installation from Source
    Installation with RPM
Configuring NCAP
    The ncap daemon
    The check_ncap plugin
Debugging
Contact Info

---------------------------------------------------------------------------

Introduction

This document is about using and configuring the Nagios Collector and
Proxy (ncap). You need an understanding of:

    Nagios - http://www.nagios.org/
    nrpe (Nagios Remote Plugin Executor) -
    http://www.nagios.org/download/extras.php

ncap receives requests from check_ncap, which runs on the nagios
(monitoring) host. These requests are then proxied to the actual nrpe
daemons.

---------------------------------------------------------------------------

Design

The idea is to have a separate collector that gathers information from
nrpe daemons. Nagios then gets its information from the collector (via a
separate plugin, similar to check_nrpe).

The need for something like this comes from the need for scalability (to
have separate collectors for separate environments), or from the need for
a gateway across network borders.

'check_ncap' is a plugin for nagios. It:

    communicates with a given collector.
    has parameters such as collector-IP, host-IP, check-command.

'ncap' is a daemon that can work in two modes:

I) transparent

    In transparent mode it will receive requests from a specified nagios
    host, and blindly forward the request to the specified host with the
    given command. Received requests will be cached, and checked
    continuously (at a specified interval). Results will be cached at the
    collector. When nagios asks for a status, the cached information will
    be provided. If no new requests arrive for a certain period, the
    checks for that service will not be repeated.

    Advantage: No need for extra configuration on the collector.
    (only specification of the authorized nagios host)

    Disadvantage: nagios host IPs can be spoofed and random checks can be
    executed, which can result in DoS attacks! The initial check will
    result in an 'UNKNOWN' state.

II) non-transparent

    In non-transparent mode extra configuration is required. Host/command
    pairs need to be entered to specify which checks are allowed. Checks
    will continue independently of requests generated by the nagios host.
    When a request is made, cached information will be provided.

    Advantage: In case of a DoS attack, monitored systems will not be
    directly bothered.

    Disadvantage: One more system will have to be configured.

---------------------------------------------------------------------------

Internals

ncap consists of 3 threads:

1) The listener thread
2) The scheduler thread
3) The worker thread

When the program is started, the ncap.cfg file is read in and processed.
Then the listener and scheduler threads are started.

The listener thread listens for incoming check_ncap requests on a given
port. The default ncap port is 5667. When a request comes in, a new thread
(worker) is created, which handles the connection and returns information
to check_ncap. If the request is previously unknown, it is added to the
command_list, and check_ncap gets an 'UNKNOWN' state with a message saying
that the 'command'/'request' has been scheduled. If it is a previously
known request, the latest information is retrieved from the command_list
and sent back to check_ncap.

The scheduler thread traverses the command_list and sends requests out to
nrpe daemons, masquerading as a check_nrpe request. The results are stored
in the command_list. If a command has not been requested for a certain
period of time (set by old_age in the configuration file), it is removed
from the command_list. At the end of each complete traversal the scheduler
looks at how long it took to get through the whole list.
If the time is less than the scheduler_wait value, it waits before
processing anything else. Once the scheduler_wait time has been reached or
passed, the scheduler starts over.

---------------------------------------------------------------------------

Installation

Installation can be made from source or via a pre-compiled RPM, both of
which can be found at http://thelinuxplatform.nl/ncap

---------------------------------------------------------------------------

Installation from Source

After you have downloaded the source:

    # tar zxvf ncap-0.4.tar.gz
    # cd ncap-0.4
    # ./configure
    checking for a BSD-compatible install... /usr/bin/install -c
    checking for gcc... gcc
    checking for C compiler default output... a.out
    checking whether the C compiler works... yes
    checking whether we are cross compiling... no
    [ ... output cut ... ]
    config.status: creating Makefile
    config.status: creating src/Makefile
    config.status: creating subst
    config.status: creating src/config.h

    *** Configuration summary for ncap 0.2 10-02-2004 ***:

     General Options:
     -------------------------
     NCAP port:  5667
     NRPE port:  5666
     NCAP user:  nagios
     NCAP group: nagios
     NCAP daemon installation: /usr/local
     NCAP client installation: /usr/local

    Review the options above for accuracy.  If they look okay,
    type 'make all' to compile the NCAP daemon and client.

Currently 'configure' will happily announce that everything went OK.
Please check whether that is really so. Now we can type 'make all':

    # make all
    cd ./src/; make ; cd ..
    make[1]: Entering directory `/home/giray/OLD/ncap/gd/ncap-0.4/src'
    gcc -g -O2 -I/usr/include/openssl -I/usr/include -DHAVE_CONFIG_H -o ncap ncap.c utils.c ssl_thread_safe.c -L/usr/lib -lpthread -lssl -lcrypto -lnsl
    gcc -g -O2 -I/usr/include/openssl -I/usr/include -DHAVE_CONFIG_H -o check_ncap check_ncap.c utils.c -L/usr/lib -lpthread -lssl -lcrypto -lnsl
    make[1]: Leaving directory `/home/giray/OLD/ncap/gd/ncap-0.4/src'

    *** Compile finished ***

    If the NCAP daemon and client compiled without any errors, you
    can continue with installation.  The NCAP daemon and client
    binaries are located in the src/ subdirectory.

    ** If this is your monitoring host **

    - Copy the check_ncap client to the directory that contains
      your Nagios plugins.
    - Create a command definition in your Nagios config file for
      the NCAP client.  See the README file for more info on
      doing this.

    ** If this host will be running the NCAP daemon **

    - Copy the ncap daemon to /usr/sbin, /usr/local/nagios or
      wherever you feel it fits best.
    - Copy the sample ncap.cfg config file to /etc, /usr/local/nagios
      or wherever you feel it fits best.

At this point you can copy ./src/ncap, ./src/check_ncap and ncap.cfg to
relevant locations, for example:

    cp ./src/ncap /usr/sbin
    cp ./src/check_ncap /usr/lib/nagios/plugins
    cp ncap.cfg /etc/nagios

Alternatively you can use 'make daemon_install' to install ncap, or
'make client_install' to install check_ncap.

To uninstall you can use 'make uninstall'. However, this will remove any
ncap related files on your system!

---------------------------------------------------------------------------

Installation with RPM

There are two ncap RPMs.
The first one installs the ncap daemon, which is supposed to run on the
Collector/Proxy, and is called:

    ncap-x.y.z-n.i386.rpm

The second one is to be used on the nagios host and is called:

    ncap-plugin-x.y.z-n.i386.rpm

---------------------------------------------------------------------------

NCAP machine

On the ncap machine the following has to be done:

    rpm -Uvh ncap-0.4-1.i386.rpm

Then you have to edit the ncap.cfg file, which should be at
/etc/nagios/ncap.cfg. More information about this is in the next section.

---------------------------------------------------------------------------

NAGIOS machine

The following has to be done to install check_ncap:

    rpm -Uvh ncap-plugin-0.4-1.i386.rpm

This will install check_ncap in /usr/lib/nagios/plugins. However, for
nagios to be able to use it, some files in the nagios configuration
directory have to be edited. Details are in the next section.

---------------------------------------------------------------------------

Configuring NCAP

After ncap and check_ncap have been installed, some configuration is
needed before they can be used.

---------------------------------------------------------------------------

The ncap daemon

The configuration file ncap is looking for should be either
/etc/nagios/ncap.cfg or /usr/local/nagios/etc/ncap.cfg. Of course, since
ncap is started with '-c' to tell it where its configuration file is, it
could be virtually any place.

The configuration parameters are as follows:

server_port

    This determines on which port ncap listens for incoming requests. The
    default value is 5667.

client_port

    This determines the port the NRPE daemons listen on, and to which ncap
    is going to make a connection. The default value is 5666. It is
    currently not possible to connect to a different port per NRPE
    instance. It is assumed that all nrpe daemons in one environment
    listen on the same port.

server_address

    This tells ncap which address to bind to if there are multiple IP
    addresses on the same host.
    This is commented out by default.

allowed_hosts

    This is a list of hosts that are allowed to connect to this ncap
    daemon. The default value is '127.0.0.1', the localhost. This is a
    comma delimited list. Please make sure that there is no whitespace
    around the commas.

ncap_user

    This is the user name that ncap is going to run as. The default value
    is nagios.

ncap_group

    This is the group that ncap is going to run as. The default value is
    nagios.

log_level

    The log level determines how much logging ncap will do. For a
    production environment this should be 0. The log levels are as
    follows:

    0     Errors
    1     Informative messages
    2     Some verbosity, useless information
    4     Real debug stuff
    8     Developer stuff, very verbose
    16    Logs that even the developers hardly want

    The default value is 0.

transparent_proxy

    This determines whether ncap will work as a transparent proxy, or in
    restrictive mode. While working as a transparent proxy, all requests
    from valid hosts are accepted. In restrictive mode only commands
    defined in ncap.cfg will be accepted. (Restrictive mode has not been
    implemented yet in 0.4.x.)

old_age

    This determines when a command is considered too old and is removed
    from the command_list. Each command in the command_list has a
    timestamp, which is updated every time there is a request for it. If a
    command is not asked for during more than the time defined by old_age,
    it is dropped. Time is given in seconds and the default value is 600
    (10 minutes).

scheduler_wait

    This is the maximum amount of time the scheduler will sleep before it
    starts to go through the command list again. If processing the list
    takes more than the value of scheduler_wait, it will not wait but
    start another round immediately. If the queue is processed faster, it
    will wait for the difference between the scheduler_wait and processing
    time values. Time is given in seconds and the default value is 60
    (1 minute).
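The scheduler_wait rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not ncap's actual code (ncap itself is written in C); the function name scheduler_sleep_seconds and its parameters are hypothetical.

```python
def scheduler_sleep_seconds(scheduler_wait, processing_time):
    """Return how long the scheduler sleeps after one pass over the command list.

    Hypothetical helper for illustration; both arguments are in seconds.
    """
    # A pass that took at least scheduler_wait is followed immediately
    # by the next round, so there is nothing to sleep.
    if processing_time >= scheduler_wait:
        return 0
    # A faster pass sleeps for the difference between the two values.
    return scheduler_wait - processing_time


if __name__ == "__main__":
    # With the default scheduler_wait of 60 seconds:
    print(scheduler_sleep_seconds(60, 12))  # fast pass, sleeps the remainder
    print(scheduler_sleep_seconds(60, 95))  # slow pass, no sleep
```

Under these assumptions, a pass over the list that takes 12 seconds is followed by a 48 second sleep, while a pass that takes 95 seconds is followed by no sleep at all.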
    Since the scheduler is currently single-threaded, if it cannot reach
    an nrpe host it will wait for a 10 second timeout per command/request.
    This will slow down the process considerably.

include

    This can be used to include an external configuration file with the
    same format as ncap.cfg. It is commented out by default (not tested).

include_dir

    This can be used to include a configuration directory, which contains
    configuration files (not tested).

command

    This is used either for the restrictive proxy mode, or to quicken the
    processing of commands at start up.

    Syntax:

        command[check_disk1][192.168.1.1]

    The first field has the command name that the nrpe daemon will respond
    to, the second field is the IP address of the nrpe daemon. This is
    commented out by default. This functionality has not been implemented
    yet, and is scheduled for the 0.6 release.

---------------------------------------------------------------------------

The check_ncap plugin

The check_ncap plugin is used to send requests from a nagios host to the
ncap daemon. For nagios to be able to use check_ncap several things have
to be done.
First the following has to be added to the checkcommands.cfg file:

    define command{
            command_name    check_ncap
            command_line    $USER1$/check_ncap -H $HOSTADDRESS$ -C $ARG1$ -c $ARG2$
            }

Then you can use this from within services.cfg like this:

    define service{
            use                     generic-service
            host_name               myhost.foo.org
            service_description     disk1
            is_volatile             0
            check_period            24x7
            max_check_attempts      3
            normal_check_interval   5
            retry_check_interval    1
            contact_groups          disk-admins
            notification_interval   120
            notification_period     workhours
            notification_options    c,r
            check_command           check_ncap!my_collector!disk1
            }

For a complex environment with many collectors, an option would be to
define multiple check commands as follows. Because the collector is fixed
in each command, the check name is now the first argument ($ARG1$):

    define command{
            command_name    check_ncap_env1
            command_line    $USER1$/check_ncap -H $HOSTADDRESS$ -C env1_collector -c $ARG1$
            }

    define command{
            command_name    check_ncap_env2
            command_line    $USER1$/check_ncap -H $HOSTADDRESS$ -C env2_collector -c $ARG1$
            }

    define command{
            command_name    check_ncap_env3
            command_line    $USER1$/check_ncap -H $HOSTADDRESS$ -C env3_collector -c $ARG1$
            }

Then in services.cfg we would have something like this:

    define service{
            use                     generic-service
            host_name               myhost.foo.org
            service_description     disk1
            is_volatile             0
            check_period            24x7
            max_check_attempts      3
            normal_check_interval   5
            retry_check_interval    1
            contact_groups          disk-admins
            notification_interval   120
            notification_period     workhours
            notification_options    c,r
            check_command           check_ncap_env1!disk1
            }

---------------------------------------------------------------------------

Debugging

Currently all log information goes to the local2 syslog facility. You most
likely do not have this configured, and will not get any log messages no
matter how high you set the log_level. To see the log messages you will
have to add the following to /etc/syslog.conf:

    local2.*        /var/log/ncap.log

Then you have to restart syslog.

For developers it is possible to run ncap without daemonizing:

    ./ncap -D -c /etc/nagios/ncap.cfg

This will leave ncap running in your current terminal.
---------------------------------------------------------------------------

Contact Info

To contact the developers you can write to ncap-devel@yahoogroups.com

For up-to-date information please visit the website at
http://thelinuxplatform.nl/ncap/

===========================================================================