Monitor SQL Server (MSSQL) using Nagios

From ben.goodacre.name/tech

Jump to: navigation, search

Nagios is a free open-source monitoring software/platform.

Contents

Service

For a non-renamed instance the following can be used:

define service{
        use                     service
        hostgroup_name  sql-servers
        service_description     SQL Svc
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l MSSQLSERVER
        }

If the instance has been renamed the service will have a dollar-sign ("$") as part of it, for example an instance called "MAIN" will be called "MSSQL$MAIN". Dollar signs indicate variables to Nagios so the character must be escaped. Escaping characters involves repeating it and encapsulating it within quotation marks, see below:

define service{
        use                     service
        hostgroup_name  sql-servers
        service_description     SQL Svc PROD
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l MSSQL"$$"MAIN
        }

SQL port is open

Default port for SQL is 1433 for non-named instances, adjust as required. Ensure the check_tcp plugin is installed.

define service{
        use                     service
        hostgroup_name  sql-servers-old
        service_description     SQL Connectivity 1433
        check_command           check_tcp!1433
}

If you have a named-instance SQL allocates a random port number upon installation. To ascertain the 'dynamic port' try using tcpview from SysInternals. Or follow this page: How to connect to a named instance of SQL Server 2005 or SQL Server 2000 by using the client tools in the earlier version of SQL Server and look under the 'Determine the TCP/IP port number of the instance of SQL Server' section.

Connection check

The following will allow Nagios to connect to the SQL server with a given username and password.

The plugin check_mssql.sh is already in the contrib/ direcory of the nagios-plugins release. If it is not available it can be downloaded here: http://ben.goodacre.name/nagios/check_mssql.sh . The script requires FreeTDS installed and setup.

Installing FreeTDS

wget ftp://ftp.ibiblio.org/pub/Linux/ALPHA/freetds/stable/freetds-stable.tgz
tar zxf freetds-stable.tgz
cd freetds-stable/
./configure
make
sudo make install

Test the plugin

user@server:~/nagios-plugins-1.4.11/contrib$ ./check_mssql.sh config-entry-in-square-brackets nagios passwordhere 2000
OK - MS SQL Server 2000 has 2 user(s) connected: 1 nagios, 1 (1rowaffected).

Now copy the check_mssql.sh file to your /usr/local/nagios/libexec and the plugin can be used in the following service and command config:

define service{
        use                     service
        host_name          sql1
        service_description     SQL Connection Check
        check_command           check_mssql_sql1
}
#Due to the way that Nagios expends the $HOSTNAME$ variable it does not work, so an individual check_command must be created per server. If somebody knows the correct way to do this, please add it to discussion.
define command{
        command_name           check_mssql_sql1
        command_line            $USER1$/check_mssql.sh sql1 nagios Rogerrabbit45! 2000
}

See also: http://serverfault.com/questions/15557/testing-that-sql-server-2005-is-listening-for-freetds

Load/health monitoring

Below check_nt is used to monitor WMI counters. The WMI syntax is different if a named instance is used as opposed to the default 'MSSQLSERVER'

#Monitoring with a non-named instance:
define service{
        use                     service
        hostgroup_name          sql-servers-old
        service_description     SQL DB Allocated Pages
        check_command           sql-wmi-totalpages
}
define command{
        command_name sql-wmi-totalpages
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\SQLServer:Buffer Manager\Total pages","Total allocated pages in memory: %.f"
}
define service{
        use                     service
        hostgroup_name          sql-servers-old
        service_description     SQL DB Batch requests/sec
        check_command           sql-wmi-batchreqs
}
define command{
        command_name sql-wmi-batchreqs
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\SQLServer:SQL Statistics\Batch Requests/sec","Batch requests/sec: %.f" -w 50 -c 250
}
define service{
        use                     service
        hostgroup_name          sql-servers-old
        service_description     SQL DB Log flushes/sec
        check_command           sql-wmi-logflushes
}
define command{
        command_name sql-wmi-logflushes
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\SQLServer:Databases(_Total)\Log Flushes/sec","Log flushes/sec: %.f" -w 50 -c 250
}

#Monitoring a named instance called PROD:
define service{
        use                     service
        hostgroup_name          sql-servers-new
        service_description     SQL DB Allocated Pages2
        check_command           sql-wmi-totalpages2
}
define command{
        command_name sql-wmi-totalpages2
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\MSSQL\$PROD:Buffer Manager\Total pages","Total allocated pages in memory: %.f"
}
define service{
        use                     service
        hostgroup_name          sql-servers-new
        service_description     SQL DB Batch requests/sec2
        check_command           sql-wmi-batchreqs2
}
define command{
        command_name sql-wmi-batchreqs2
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\MSSQL\$PROD:SQL Statistics\Batch Requests/sec","Batch requests/sec: %.f" -w 50 -c 250
}
define service{
        use                     service
        hostgroup_name          sql-servers-new
        service_description     SQL DB Log flushes/sec2
        check_command           sql-wmi-logflushes2
}
define command{
        command_name sql-wmi-logflushes2
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\MSSQL\$PROD:Databases(_Total)\Log Flushes/sec","Log flushes/sec: %.f" -w 50 -c 250
}

Connection time

Through use of the check_mssql_health plugin the time to a sucessful connection to SQL can be monitored.

define service{
        use                             service
        hostgroup_name                  sql-servers
        service_description             SQL Connection time
        check_command                   check_mssql_health!connection-time
        }

I/O Busy

Busy I/O should be monitored as it is an indicator of a busy server and can again be monitored though the check_mssql_health plugin.

define service{
        use                             service
        hostgroup_name                  sql-servers-new
        service_description             IO Busy
        check_command                   check_mssql_health!io-busy
        }

Installing the check_mssql_health plugin

define command{
        command_name            check_mssql_health
        command_line            $USER1$/check_mssql_health -server $HOSTNAME$ -username usernamehere -password passhere --mode $ARG1$
}

See Also

check_mssql_health connect to server Translated from German

Monitor mirroring health and availability

Through effective monitoring of WMI counters the queue of data on the master (called principal by Microsoft) awaiting to be sent to the slave (called mirror by Microsoft) can be monitored, called the send queue. Delays in waiting for the mirror to commit a transaction as well as the tlogs that remain to be applied to the mirror to roll it forwards can also be checked, called the transaction-dealy and the redo queue respectively.

#check_nt syntax for perfmon counnters differs from what is seen from within perfmon for SQL named instances
#This monitors a named instance called PROD:

#Send queue / principal queue
define service{
        use                     service
        hostgroup_name          sql-principal
        service_description     SQL Mirroring TLog Send queue KB
        check_command           sql-wmi-mir-tlogqueue
}
define command{
        command_name sql-wmi-mir-tlogqueue
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 24601 -s passhere -v COUNTER -d SHOWALL -l "\\MSSQL\$PROD:Database Mirroring(_Total)\Log Send Queue KB","Mirroring TLog Send queue in KB: %.f" -w 50 -c 100
}

#Redo queue / mirror queue
define service{
        use                     service
        hostgroup_name          sql-mirror
        service_description     SQL Mirroring Redo queue KB
        check_command           sql-wmi-mir-redoqueue
}
define command{
        command_name sql-wmi-mir-redoqueue
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 24601 -s passhere -v COUNTER -d SHOWALL -l "\\MSSQL\$PROD:Database Mirroring(_Total)\Redo Queue KB","Mirroring TLog Redo queue in KB: %.f" -w 50 -c 100
}

#Transaction delay
define service{
        use                     service
        hostgroup_name          sql-servers-new
        service_description     SQL Mirroring Transaction Delay
        check_command           sql-wmi-mir-tdelay
}
define command{
        command_name sql-wmi-mir-tdelay
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 24601 -s passhere -v COUNTER -d SHOWALL -l "\\MSSQL\\PROD:Database Mirroring(_Total)\Transaction Delay","Delay in transaction termination acknowledgement: %.f" -w 1000 -c 15000
}

See Also

Using the monitoring counters defined should cover all scenarios. However mirroring can be monitored using the Database mirrroing monitor. Alerts to the event log can be setup for various thresholds and these entries in the event log can be watched for by the use of a SQL Agent Alert.

Another method to monitor mirroring is to use system stored procedures: Using Warning Thresholds and Alerts on Mirroring Performance Metrics

Resources

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox