Monday, December 06, 2010

(Quick) Nagios setup

A quick & dirty "guide" to get Nagios monitoring on a local Linux server in no time. This has been done on CentOS, but the same applies to just about any other distro, as long as you take care of the paths to the configuration files.

cp template.HOSTNAME.FQDN.cfg $HOSTNAME.cfg
emacs $HOSTNAME.cfg

In Emacs, replace the placeholders:

M-x replace-regexp HOSTNAME host
M-x replace-regexp $HOSTNAME.FQDN $HOSTNAME
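
If you prefer not to open an editor, the same substitution can be sketched with sed. This assumes the intent is that HOSTNAME becomes the short host name and HOSTNAME.FQDN the fully qualified one; render_template is a hypothetical helper name, not part of Nagios.

```shell
#!/bin/sh
# render_template TEMPLATE FQDN SHORTNAME   (hypothetical helper)
# Prints the filled-in template on stdout.
# HOSTNAME.FQDN is replaced first so that the shorter HOSTNAME
# pattern cannot clobber it.
render_template() {
    sed -e "s/HOSTNAME\.FQDN/$2/g" -e "s/HOSTNAME/$3/g" "$1"
}

# Example call (uncomment on the real server):
# render_template template.HOSTNAME.FQDN.cfg "$(hostname -f)" "$(hostname -s)" > "$HOSTNAME.cfg"
```

Host names cannot contain "/" or "&", so they are safe to drop straight into the sed replacement.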

Section "define host":
Replace "Description here" with the real data
Replace "ip.address.here" with the real IP address of the host

Section "define contact":
Replace "Description here" with a concise description of the contact.
Replace the "email" value with the real email address(es) of the contact(s) that should receive notifications from this server.

Section "define contactgroup":
Replace "Description here" with a description of the Linux server.

Be sure to add more services to suit your needs, or the needs of the monitored server.

Copy the file to the Nagios directory:

sudo cp $HOSTNAME.cfg /etc/nagios/objects/


Edit the commands file:

sudo emacs /etc/nagios/objects/commands.cfg


Specifically, edit the string "***** Nagios *****" in 'notify-host-by-email' & 'notify-service-by-email' to something that identifies the box the emails are coming from.
I usually leave it like this:

** Nagios $HOSTNAME **
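
The same edit can be done in one pass with sed. This is a rough sketch: tag_notifications is a hypothetical helper name, and it assumes GNU sed (for -i) and that the marker string appears in the file exactly as quoted above.

```shell
#!/bin/sh
# tag_notifications COMMANDS_CFG BOXNAME   (hypothetical helper)
# Rewrites every "***** Nagios *****" marker in place so that it
# names the box sending the mail.
tag_notifications() {
    sed -i "s/\*\*\*\*\* Nagios \*\*\*\*\*/** Nagios $2 **/" "$1"
}

# Example call (uncomment on the real server, as root):
# tag_notifications /etc/nagios/objects/commands.cfg "$HOSTNAME"
```

Keep a backup of commands.cfg before running it, since the file is modified in place.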

Edit Nagios's master configuration file so that it uses the cfg file just created and no longer loads the default localhost.cfg.

sudo emacs /etc/nagios/nagios.cfg

Comment out the line:

cfg_file=/etc/nagios/objects/localhost.cfg

And add a line with the cfg file just created.

cfg_file=/etc/nagios/objects/$HOSTNAME.cfg
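
For repeated installs, the two changes to nagios.cfg can be scripted. A minimal sketch, assuming GNU sed and grep; switch_cfg_file is a hypothetical helper name, and it is written so that running it twice does no harm.

```shell
#!/bin/sh
# switch_cfg_file NAGIOS_CFG NEW_OBJECT_FILE   (hypothetical helper)
switch_cfg_file() {
    cfg=$1
    new=$2
    # Comment out the stock localhost.cfg entry if it is still active.
    sed -i 's|^cfg_file=\(.*/localhost\.cfg\)$|#cfg_file=\1|' "$cfg"
    # Append the new entry unless an identical line is already there.
    grep -qxF "cfg_file=$new" "$cfg" || printf 'cfg_file=%s\n' "$new" >> "$cfg"
}

# Example call (uncomment on the real server, as root):
# switch_cfg_file /etc/nagios/nagios.cfg "/etc/nagios/objects/$HOSTNAME.cfg"
```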

Before launching Nagios, tighten up the web access a bit:

sudo emacs /etc/httpd/conf.d/nagios.conf

And set up basic security (enable SSL, set up password protection, limit the IP range, etc.).
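
As a starting point, the small function below prints a set of directives that could go inside the existing <Directory> blocks of nagios.conf. Everything here is an assumption to adapt: print_access_rules is a hypothetical helper, the syntax is Apache 2.2 style (Order/Deny/Allow; Apache 2.4 uses "Require ip" instead), SSLRequireSSL needs mod_ssl, and the network and password file in the example call are placeholders.

```shell
#!/bin/sh
# print_access_rules ALLOWED_NETWORK PASSWORD_FILE   (hypothetical helper)
# Prints Apache 2.2-style directives: SSL only, one allowed network,
# and HTTP basic authentication against PASSWORD_FILE.
print_access_rules() {
    cat <<EOF
SSLRequireSSL
Order deny,allow
Deny from all
Allow from $1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile $2
Require valid-user
EOF
}

# Example call (placeholder values):
# print_access_rules 192.168.0.0/24 /etc/nagios/passwd
```

The password file itself is created with Apache's htpasswd tool, and httpd has to be reloaded after the edit.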

Test it

sudo /usr/bin/nagios -v /etc/nagios/nagios.cfg

And if it's OK, launch it.
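
The check and the launch can be chained so that Nagios only starts when the configuration verifies cleanly. A sketch under assumptions: launch_if_valid is a hypothetical helper, and the default start command ("sudo service nagios start") is a guess at how the init script is invoked on a CentOS box of this era. Both commands can be overridden through the NAGIOS_BIN and START_CMD variables.

```shell
#!/bin/sh
# launch_if_valid NAGIOS_CFG   (hypothetical helper)
# Runs the verification shown above and starts Nagios only if it passes.
launch_if_valid() {
    if ${NAGIOS_BIN:-sudo /usr/bin/nagios} -v "$1"; then
        ${START_CMD:-sudo service nagios start}
    else
        echo "configuration check failed; not starting Nagios" >&2
        return 1
    fi
}

# Example call (uncomment on the real server):
# launch_if_valid /etc/nagios/nagios.cfg
```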

Here is a copy of "template.HOSTNAME.FQDN.cfg":

## -----------------------------------------------

## History

## -----------------------------------------------

## sudo /usr/bin/nagios -v /etc/nagios/nagios.cfg

define host{
name HOSTNAME-host
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 120
notification_options d,u,r
contact_groups HOSTNAME-contactgroup
register 0
}

define host{
use HOSTNAME-host
host_name HOSTNAME.FQDN
alias Description here
address ip.address.here
}

define contact{
contact_name HOSTNAME-contact
use generic-contact
alias Description here
email 1@email.com, 2@email.com, 3@email.com
}

define contactgroup{
contactgroup_name HOSTNAME-contactgroup
alias Description here
members HOSTNAME-contact
}

# SERVICE DEFINITIONS

define service{
name HOSTNAME-generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 2
contact_groups HOSTNAME-contactgroup
notification_options w,u,c,r
notification_interval 120
notification_period 24x7
register 0
}

define service{
name HOSTNAME-service
use HOSTNAME-generic-service
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
register 0
}

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Connectivity
check_command check_ping!100.0,20%!500.0,60%
}

# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Partition /
check_command check_local_disk!20%!10%!/
}

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Partition /boot
check_command check_local_disk!20%!10%!/boot
}

# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Current Users
check_command check_local_users!20!50
}

# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Swap Usage
check_command check_local_swap!20!10
}

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Service SSH
check_command check_ssh
}

define service{
use HOSTNAME-service
host_name HOSTNAME.FQDN
service_description Service HTTP
check_command check_http
}

# EoF #
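
To add more services in the same style as the template, a stanza can be generated rather than typed. A minimal sketch: print_service is a hypothetical helper, and the check command passed to it must already be defined in commands.cfg (check_smtp in the example call is assumed to be).

```shell
#!/bin/sh
# print_service FQDN SHORTNAME DESCRIPTION CHECK_COMMAND   (hypothetical helper)
# Prints one service definition that inherits from SHORTNAME-service,
# matching the layout of the template.
print_service() {
    cat <<EOF

define service{
use $2-service
host_name $1
service_description $3
check_command $4
}
EOF
}

# Example call (uncomment on the real server):
# print_service "$(hostname -f)" "$(hostname -s)" "Service SMTP" check_smtp >> "$HOSTNAME.cfg"
```

Run the nagios -v check again after appending anything to the file.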
