MoSSHe: Monitoring Simple SHell Environment

Copyright (C) 2003- Volker Tanger

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

http://www.gnu.org/copyleft/gpl.html


HISTORY

The general idea was taken from plugins I write for the Nagios
(formerly NetSaint) network/host monitoring system (NMS), especially
the various versions of ASLrules
(http://www.wyae.de/sfotware/aslrules/).

As most of the servers/services I want to monitor are remote systems,
traditional NMS (relying on close-looped and/or unencrypted sessions)
are big, complicated to install for safe remote monitoring,
resource-intensive (when doing remote checks), lack a status history,
or a combination thereof.

Thus I wrote this small, easily configured system. It is intended for
monitoring a handful of typical internet systems. Fewer than 10 are
ideal for the "big" tactical web display, fewer than 30 for the
upcoming compressed view. More than 100 systems should be possible,
but I have not tested such an extended setup.

For bug reports and suggestions, or if you just want to talk to me,
please contact me at
    volker.tanger@wyae.de

Updates will be available at
    http://www.wyae.de/software/mosshe/
Please check there for updates prior to submitting patches!

------------------------------------------------------------------------

v10.6.22 Volker Tanger
    - nicer GUI: server-list & per-server listing (optional)
    - added output file suitable for passive checks in Nagios. This way
      MoSSHe can be used as a safe alternative to NRPE.
v10.5.18 Volker Tanger
    - added SyslogOnChange to write alerts to syslog

v10.3.24 Volker Tanger
    - added/changed load function:
        LoadCheck now checks 5-minute average (instead of 15min window)
        LoadCheckFast checks 1-minute average
        LoadCheckSlow checks 15-minute average
    - corrected POP3check which did not work at all since upgrade from
      version 1.4.7 (dumb variable renaming missed)

v10.2.24 Volker Tanger
    - functions.network: redirected error messages by LYNX to /dev/zero
      to avoid error messages from CRON calls
    - corrected bug in self-locking (never reached timeout)
    - deleted temp files from SLA checks
    - added IDS functions:
        CheckFileChanges
        CheckConfigChanges
      plus helper-script GENERATE-COMPARES.SH

v10.2.22 Volker Tanger
    - added web interface URL to alert mails
    - added function to selectively alert admins about their system
    - added full logging functions
    - added agglomeration function so that one (or more) central
      servers can poll information from agents

v10.2.18 Volker Tanger
    - more bugfixes, especially in HTML output and SLA checks
    - self-locking to prevent simultaneous checks, including
      watchdog-restart with admin alerting

v10.2.10 Volker Tanger
    - lots of bugfixes
    - generating summary SLA checks

v9.11.28 Volker Tanger
    - complete rewrite
    - migrating from SSH-on-demand to local-Shell-via-cron
    - generation of local HTML status files

========================================================================

December 2008
v1.4.6 to v1.4.7
* Volker Tanger
    - removed bug: restart now clears status files

------------------------------------------------------------------------

November 2007
v1.4.5 to v1.4.6
* Volker Tanger
    - added check for local files (e.g.
      Unix sockets)
    - added check for running daemons/processes

------------------------------------------------------------------------

November 2007
v1.4.4 to v1.4.5
* Volker Tanger
    - HDD size check off by one, resulting in wrong numbers

------------------------------------------------------------------------

November 2007
v1.4.3 to v1.4.4
* Volker Tanger
    - corrected error messages due to missing date when programs not
      installed
    - corrected server identification which did misfire when the first
      part of a server matched a different one
    - added http_time check which returns reply time in milliseconds
    - cleanup of stale (read: all) status files on startup

------------------------------------------------------------------------

August 2007
v1.4.2 to v1.4.3
* Volker Tanger
    - console errors from mosshe_ssh are redirected into /dev/null

------------------------------------------------------------------------

May 2007
v1.4.0 to v1.4.2
* Volker Tanger
    - added mosshe_checkrun (no file changes, for debugging)
    - added SMB disk free check (sambafree)
    - corrected filtering RegEx in mosshe.singlerun

------------------------------------------------------------------------

January 2007
v1.3.11 to v1.4.0
* Volker Tanger
    - changed name of mosshe checks (mosshe_ssh instead of ssh)
    - added MOSSHE_HTTP check for unencrypted HTTP-based check
    - added Logs/statechange_YEAR.log for easier up-/downtime
      calculation

------------------------------------------------------------------------

February 2007
v1.4.0 to v1.4.1
* Volker Tanger
    - corrected (finally!)
      memory check in localcheck.functions

------------------------------------------------------------------------

April 2006
v1.3.10 to v1.3.11
* Volker Tanger
    - added "noping_" feature
    - added RETRYSLEEP variable
    - removed bug which left stale status information in server stat
      file

------------------------------------------------------------------------

December 2005
v1.3.9 to v1.3.10
* Volker Tanger
    - changed FTP check to function with more servers (better/more
      robust return code separation)
    - cosmetic fixes in web interface
    - updated documents (thanks Eduardo Grosclaude)
    - added "findbuggyline.py" to the Logs directory

------------------------------------------------------------------------

October 2005
v1.3.8 to v1.3.9
* Volker Tanger
    - changed naming logic of log check a bit to allow multiple checks
      on one single log file, which was impossible before

------------------------------------------------------------------------

October 2005
v1.3.7 to v1.3.8
* Volker Tanger
    - added log file check (number of occurrences) to
      SSH/localcheck.functions

------------------------------------------------------------------------

June 2005
v1.3.6 to v1.3.7
* Volker Tanger
    - corrected bug in DNS check
* Ronny Henke
    - updated SNMP check for enhanced compatibility

------------------------------------------------------------------------

June 2005
v1.3.5 to v1.3.6
* Volker Tanger
    - added MAILQ_CHECK to localcheck.functions to control the length
      of the mail queue
    - main MoSSHe script: safer removal of stale stat files
    - logs completed checks only to stat files - no longer incomplete
      stats
    - showall.PY with added date of last check

------------------------------------------------------------------------

May 2005
v1.3.4 to v1.3.5
* Volker Tanger
    - loosened filtering RegEx to allow pathnames in disk check
    - changed disk size base for localcheck.embedded to KILObytes
      instead of MEGAbytes.
    - set 30 seconds timeout for HTTP/HTTPS checks

------------------------------------------------------------------------

May 2005
v1.3.3 to v1.3.4
* Volker Tanger
    !!! changed network traffic from byte/sec to the more commonly
        used kbit/s
    - added FTP service check
    - added HTTPS service check
    - added localcheck.embedded_functions for appliances using busybox
      shell/command replacement
    - added template for known appliances to download:
      Innominate mGuard
    - nicer outputs from LOCALCHECK

------------------------------------------------------------------------

May 2005
v1.3.2 to v1.3.3
* Volker Tanger
    - added network checks (usage, errors)
    - added template for known appliances for download: Astaro
    - webscripts ignore empty lines in servers.conf

------------------------------------------------------------------------

March 2005
v1.3.1 to v1.3.2
* Volker Tanger
    - corrected overzealous check filter in mosshe.singlerun

------------------------------------------------------------------------

January 2005
v1.3.0 to v1.3.1
* Eduardo Grosclaude
    - corrected Volker's bugs in RaidCheck()

------------------------------------------------------------------------

January 2005
v1.2.9 to v1.3.0
* Volker Tanger
    - added "changes only" notification/alert
    - stale server status files (i.e. when server removed from
      monitoring) will be automatically removed
* Eduardo Grosclaude
    - added Linux-SoftRAID monitoring for LOCALCHECK

------------------------------------------------------------------------

December 2004
v1.2.8 to v1.2.9
* Volker Tanger
    - improved HTTP check to make checking for URLs much easier, which
      unfortunately breaks compatibility...
    - removed bug that disabled logging ($LOGFILE was not set)

------------------------------------------------------------------------

December 2004
v1.2.7 to v1.2.8
* Volker Tanger
    - removed bug introduced with 1.2.7 when checking for set
      environment paths - *OUCH* RegExp gone bad removed all check
      results. Corrected. Sorry guys, must have slipped somewhere...

------------------------------------------------------------------------

December 2004
v1.2.6 to v1.2.7
* Volker Tanger
    - split MOSSHE into three parts - config, loop and single check,
      which helps testing setups.

------------------------------------------------------------------------

November 2004
v1.2.5 to v1.2.6
* Volker Tanger
    - repaired quite some bugs in snmp check, now numeric values are
      set for all fields. Changed snmp.* parameter files accordingly
    - improved MBCheck in localcheck.functions

------------------------------------------------------------------------

November 2004
v1.2.4 to v1.2.5
* Volker Tanger
    - quite some bugs in localcheck.functions (introduced with helper
      binary checks - usually missing "fi"s)
    - nailed down the path to "localcheck.functions"
    - added check result sanitizing within mosshe script
    - corrected missing UNDEF listing in web interface(s) and mosshe
    - corrected SMTP check behaviour when service completely down
* Nicholas Fechner
    - improved HTTP check - now you can define expected returns
      (e.g. "302 Document moved") as OK, too.

------------------------------------------------------------------------

November 2004
v1.2.2 to v1.2.4
* Volker Tanger
    - added safety checks for checks that need additional/external
      programs (snmp, localcheck.tempcheck)
* Eduardo Grosclaude
    - corrected bug in alert handling (line 89)
    - added MBMonCheck (CPU temperature, fan speed, etc...)

------------------------------------------------------------------------

November 2004
v1.2.1 to v1.2.2
* Volker Tanger
    - corrected SSH / SSH1 checks, did not notice server/SSH gone
      completely. Will now return "UNDEF" when gone.
    - split "localcheck" into a config and a functions part to make
      updates much easier (to "just copy").
* Eduardo Grosclaude
    - updated README re.
      localcheck.include that was split in the last version
    - corrected cron_mosshe to enhance compatibility
    - corrected CheckMem function in localcheck
    - added comments for the SSH tests

------------------------------------------------------------------------

November 2004
v1.2.2 to v1.2.3
* Volker Tanger
    - added "UNDEF" status to global_alert and mosshe

------------------------------------------------------------------------

October 2004
v1.2.0 to v1.2.1
* Volker Tanger
    - corrected HTTP check, did not notice a server gone completely
    - added "UNDEF" alerting to alerts and web interface

------------------------------------------------------------------------

October 2004
v1.1.7 to v1.2.0
* Volker Tanger
    - corrected PrintCheck warning output (did not show limit)
    - delete status files of servers no longer in use
    - changed alert basics so we can add other alerts soon...

------------------------------------------------------------------------

October 2004
v1.1.6 to v1.1.7
* Yann Pilpre
    - increased portability of disk check in LOCALCHECK when using
      "long" devices like /dev/ide/host0/bus0/target0/lun0/part7

------------------------------------------------------------------------

October 2004
v1.1.5 to v1.1.6
* Volker Tanger
    - corrected disk check in LOCALCHECK so hanging network mounts do
      not hang MOSSHE

------------------------------------------------------------------------

October 2004
v1.1.4 to v1.1.5
* Volker Tanger
    - added SSH V.1 remote check

------------------------------------------------------------------------

October 2004
v1.1.3 to v1.1.4
* Volker Tanger
    - added -n flag to PING to make it independent of DNS
    - added DNS service test
    - added SAMBA service check
    - added printer and memory checks to LOCALCHECK

------------------------------------------------------------------------

September 2004
v1.1.2 to v1.1.3
* Volker Tanger
    - added a total count of checks/servers to SHOWALL overview
    - increased compatibility by changing shell integer handling
* Chad Lepto
    - suggested
      adding Logs directory to archive
    - corrected use of HEAD command line parameters
    - added compatibility documentation hints for README

------------------------------------------------------------------------

September 2004
v1.1.1 to v1.1.2
* Volker Tanger
    - added IMAP2 and IMAP3 checks

------------------------------------------------------------------------

September 2004
v1.1.0 to v1.1.1
* Volker Tanger
    - fixed small variable initialization error in tactical.py
      (webinterface)

------------------------------------------------------------------------

August 2004
v1.0.4 to v1.1.0
* Volker Tanger
    - logfile rotated daily
    - added SNMP queries
    - trend showing min/avg/max values

------------------------------------------------------------------------

August 2004
v1.0.3 to v1.0.4
* Volker Tanger
    - parametric command (http.tomcat)
    - PING with execute-time (well, better than nothing...)
    - central config file(s) for webview
    - number of shells listed as numeric value
    - show mount point in addition to "Disk Free" message

------------------------------------------------------------------------

August 2004
v1.0.2 to v1.0.3
* Volker Tanger
    - changed PING behaviour to do retries, resulting in fewer false
      positives. A single ICMP packet is lost sometimes, after all...
    - corrected tactical.py - missing key in "stati" dictionary

------------------------------------------------------------------------

August 2004
v1.0.1 to v1.0.2
* Volker Tanger
    - removed (irrelevant but annoying) bug with VERSION setting
    - removed bug in shell check
    - removed absolute (and programming-specific) path for HTTP check
    - added cron handler

------------------------------------------------------------------------

July 2004
v1.0.0 to v1.0.1
* Volker Tanger
    - added documentation and published

------------------------------------------------------------------------

late 2003 (started the project)
* Volker Tanger
    - wrote the "core system" and mail alerts
    - started into the web interface

------------------------------------------------------------------------