Monitor SQL Server (MSSQL) using Nagios
Nagios is a free open-source monitoring software/platform.
Service
For a non-renamed instance the following can be used:
define service{
use service
hostgroup_name sql-servers
service_description SQL Svc
check_command check_nt!SERVICESTATE!-d SHOWALL -l MSSQLSERVER
}
If the instance has been renamed the service will have a dollar-sign ("$") as part of it, for example an instance called "MAIN" will be called "MSSQL$MAIN". Dollar signs indicate variables to Nagios so the character must be escaped. Escaping characters involves repeating it and encapsulating it within quotation marks, see below:
define service{
use service
hostgroup_name sql-servers
service_description SQL Svc PROD
check_command check_nt!SERVICESTATE!-d SHOWALL -l MSSQL"$$"MAIN
}
SQL port is open
Default port for SQL is 1433 for non-named instances, adjust as required. Ensure the check_tcp plugin is installed.
define service{
use service
hostgroup_name sql-servers-old
service_description SQL Connectivity 1433
check_command check_tcp!1433
}
If you have a named-instance SQL allocates a random port number upon installation. To ascertain the 'dynamic port' try using tcpview from SysInternals. Or follow this page: How to connect to a named instance of SQL Server 2005 or SQL Server 2000 by using the client tools in the earlier version of SQL Server and look under the 'Determine the TCP/IP port number of the instance of SQL Server' section.
Connection check
The following will allow Nagios to connect to the SQL server with a given username and password.
The plugin check_mssql.sh is already in the contrib/ direcory of the nagios-plugins release. If it is not available it can be downloaded here: http://ben.goodacre.name/nagios/check_mssql.sh . The script requires FreeTDS installed and setup.
Installing FreeTDS
wget ftp://ftp.ibiblio.org/pub/Linux/ALPHA/freetds/stable/freetds-stable.tgz
tar zxf freetds-stable.tgz
cd freetds-stable/
./configure
make
sudo make install
- Edit the /usr/local/etc/freetds.conf file to include your SQL server hostname and port. This is important as the "-S" parameter refers to the configuration entry here and not the hostname itself.
- Test the config:
tsql -S config-entry-in-square-brackets -U username -P password
If you connect and see a1>
prompt it is working ok. - SQL authentication will need to be enabled, as well as connect privilidges on the default database - usually 'master'.
Test the plugin
user@server:~/nagios-plugins-1.4.11/contrib$ ./check_mssql.sh config-entry-in-square-brackets nagios passwordhere 2000
OK - MS SQL Server 2000 has 2 user(s) connected: 1 nagios, 1 (1rowaffected).
Now copy the check_mssql.sh file to your /usr/local/nagios/libexec and the plugin can be used in the following service and command config:
define service{
use service
host_name sql1
service_description SQL Connection Check
check_command check_mssql_sql1
}
#Due to the way that Nagios expends the $HOSTNAME$ variable it does not work, so an individual check_command must be created per server. If somebody knows the correct way to do this, please add it to discussion.
define command{
command_name check_mssql_sql1
command_line $USER1$/check_mssql.sh sql1 nagios Rogerrabbit45! 2000
}
See also: http://serverfault.com/questions/15557/testing-that-sql-server-2005-is-listening-for-freetds
Load/health monitoring
Below check_nt is used to monitor WMI counters. The WMI syntax is different if a named instance is used as opposed to the default 'MSSQLSERVER'
#Monitoring with a non-named instance:
define service{
use service
hostgroup_name sql-servers-old
service_description SQL DB Allocated Pages
check_command sql-wmi-totalpages
}
define command{
command_name sql-wmi-totalpages
command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\\\SQLServer:Buffer Manager\\Total pages","Total allocated pages in memory: %.f"
}
define service{
use service
hostgroup_name sql-servers-old
service_description SQL DB Batch requests/sec
check_command sql-wmi-batchreqs
}
define command{
command_name sql-wmi-batchreqs
command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\\\SQLServer:SQL Statistics\\Batch Requests/sec","Batch requests/sec: %.f" -w 50 -c 250
}
define service{
use service
hostgroup_name sql-servers-old
service_description SQL DB Log flushes/sec
check_command sql-wmi-logflushes
}
define command{
command_name sql-wmi-logflushes
command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\\\SQLServer:Databases(_Total)\\Log Flushes/sec","Log flushes/sec: %.f" -w 50 -c 250
}
#Monitoring a named instance called PROD:
define service{
use service
hostgroup_name sql-servers-new
service_description SQL DB Allocated Pages2
check_command sql-wmi-totalpages2
}
define command{
command_name sql-wmi-totalpages2
command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\\\MSSQL\\$PROD:Buffer Manager\\Total pages","Total allocated pages in memory: %.f"
}
define service{
use service
hostgroup_name sql-servers-new
service_description SQL DB Batch requests/sec2
check_command sql-wmi-batchreqs2
}
define command{
command_name sql-wmi-batchreqs2
command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\\\MSSQL\\$PROD:SQL Statistics\\Batch Requests/sec","Batch requests/sec: %.f" -w 50 -c 250
}
define service{
use service
hostgroup_name sql-servers-new
service_description SQL DB Log flushes/sec2
check_command sql-wmi-logflushes2
}
define command{
command_name sql-wmi-logflushes2
command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -v COUNTER -d SHOWALL -l "\\\\MSSQL\\$PROD:Databases(_Total)\\Log Flushes/sec","Log flushes/sec: %.f" -w 50 -c 250
}
Connection time
Through use of the check_mssql_health plugin the time to a sucessful connection to SQL can be monitored.
- Install the plugin: check_mssql_health Installation
- The service configuration could look like this:
define service{
use service
hostgroup_name sql-servers
service_description SQL Connection time
check_command check_mssql_health!connection-time
}
I/O Busy
Busy I/O should be monitored as it is an indicator of a busy server and can again be monitored though the check_mssql_health plugin.
- Install the plugin: check_mssql_health Installation
- Append the following service config :
define service{
use service
hostgroup_name sql-servers-new
service_description IO Busy
check_command check_mssql_health!io-busy
}
Installing the check_mssql_health plugin
- Download the plugin: check_mssql_health plugin homepage (translated from German) -- Original page in German
- Copy to your /usr/local/nagios/libexec dir.
- The plugin requires FreeTDS to be
installed: Installing
FreeTDS
The plugin can utilise a freetds.conf file that is not in the
location it would be expected to reside in, even if FreeTDS is
currently installed. Check
/usr/share/libct3/freetds.conf
,/usr/local/etc/freetds.conf
and/etc/freetds/freetds.conf
- Append the following to your commands.cfg:
define command{
command_name check_mssql_health
command_line $USER1$/check_mssql_health -server $HOSTNAME$ -username usernamehere -password passhere --mode $ARG1$
}
See Also
[http://translate.google.com/translate?hl=en&sl=de&u=http://www.nagios-portal.org/wbb/index.php%3Fpage%3DThread%26threadID%3D14926&ei=f5CBSu75NtirjAfVyKH2CQ&sa=X&oi=translate&resnum=1&ct=result&prev=/search%3Fq%3D%253Bport%253D1433%2527,%2527nagios%2527,...)%2Bfailed:%2B(no%2Berror%2Bstring)%2Bat%2B/usr/local/nagios/libexec/check_mssql_health%2Bline%2B1897%26hl%3Den%26client%3Dopera%26rls%3Den-GB%26hs%3DSqt check_mssql_health connect to server] Translated from German
Monitor mirroring health and availability
Through effective monitoring of WMI counters the queue of data on the master (called principal by Microsoft) awaiting to be sent to the slave (called mirror by Microsoft) can be monitored, called the send queue. Delays in waiting for the mirror to commit a transaction as well as the tlogs that remain to be applied to the mirror to roll it forwards can also be checked, called the transaction-dealy and the redo queue respectively.
#check_nt syntax for perfmon counnters differs from what is seen from within perfmon for SQL named instances
#This monitors a named instance called PROD:
#Send queue / principal queue
define service{
use service
hostgroup_name sql-principal
service_description SQL Mirroring TLog Send queue KB
check_command sql-wmi-mir-tlogqueue
}
define command{
command_name sql-wmi-mir-tlogqueue
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 24601 -s passhere -v COUNTER -d SHOWALL -l "\\\\MSSQL\\$PROD:Database Mirroring(_Total)\\Log Send Queue KB","Mirroring TLog Send queue in KB: %.f" -w 50 -c 100
}
#Redo queue / mirror queue
define service{
use service
hostgroup_name sql-mirror
service_description SQL Mirroring Redo queue KB
check_command sql-wmi-mir-redoqueue
}
define command{
command_name sql-wmi-mir-redoqueue
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 24601 -s passhere -v COUNTER -d SHOWALL -l "\\\\MSSQL\\$PROD:Database Mirroring(_Total)\\Redo Queue KB","Mirroring TLog Redo queue in KB: %.f" -w 50 -c 100
}
#Transaction delay
define service{
use service
hostgroup_name sql-servers-new
service_description SQL Mirroring Transaction Delay
check_command sql-wmi-mir-tdelay
}
define command{
command_name sql-wmi-mir-tdelay
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 24601 -s passhere -v COUNTER -d SHOWALL -l "\\\\MSSQL\\\\PROD:Database Mirroring(_Total)\\Transaction Delay","Delay in transaction termination acknowledgement: %.f" -w 1000 -c 15000
}
See Also
Using the monitoring counters defined should cover all scenarios. However mirroring can be monitored using the Database mirrroing monitor. Alerts to the event log can be setup for various thresholds and these entries in the event log can be watched for by the use of a SQL Agent Alert.
Another method to monitor mirroring is to use system stored procedures: Using Warning Thresholds and Alerts on Mirroring Performance Metrics
Resources
- Tips for Using SQL Server Performance Monitor Counters
- Monitoring MS SQL from *nix (Yes, it really works!) A perl script for creating Nagios config
- Configuring Nagios to read WMI information