Actively Monitoring Disk Free Space
By Rainer Gerhards
Article Date: 2004-07-22
Why care about disk free space?
The obvious answer is that low free space means upcoming problems, like the inability to receive mail (for mail servers) or the inability to store new files (for file servers). That alone makes free space an operations management priority. But there are also less obvious reasons: a disk space shortage may be caused by a process running wild, and sometimes space consumption is the only warning indicator in such a case. Intruders may also be the cause of low disk space conditions. For example, movie pirates often break into public servers and misuse them as FTP servers for pirated videos. As videos are large, this can cause a sharp decrease in disk free space.
In this article I primarily address the operations management needs. The security benefits come as a side effect, but don’t rely purely on what I am presenting here if you would like to tackle the security side of disk free space. I will first convey the idea of what can be done and then provide a potential solution using Adiscon’s MonitorWare Agent software.
A shortage of disk space does not (necessarily) come in an instant. Typically, free space decreases by a little every day. If left undetected, some day no space may be left at all. This is where we start. In my view, a good disk space monitoring script must work with at least two thresholds:
- disk space is low, but still acceptable
- disk space is too low, problems will occur very soon (or already exist)
The first level is a warning level, the second level is a real error level. In a typical setup, the warning level does not cause any big action: a notification email is sent to the administrator and that’s it. The error level, in turn, typically causes more serious action. Now the alert may be sent to a pager email address. But a good disk space monitoring solution might also initiate some corrective action. For example, on a file server, many temporary files may fill up the disk. It may be agreed policy that such files (and possibly .bak backup files) can be deleted automatically, without asking each user. If so, a script can be started that tries to delete as many temporary files as possible, thereby freeing up disk space. In an optimal case, such a script may even free enough space to recover from the very-low disk space condition. Ideally, it would even recover from the warning level, too.
Now let’s consider that the very-low space condition triggered a pager alarm to the administrator. Poor John Admin is at the beach when his pager beeps. Too bad… Now consider that he leaves the beach and drives to his data center, just to see that the configured auto-action has already solved the issue. How would you feel in John’s place? I bet you’d be really happy and go back to the beach, wouldn’t you? I also guess you would have been even happier if the system had notified you that the low space condition was solved. So this is one more thing that we need to do within our free space monitoring: not only send an alert when things get worse, but also send an alert when the system has recovered from such a condition. Please note that recovery may happen even if no corrective action has been configured. Just imagine a file server: a user may copy a huge file set just to try something out and later delete it himself. Again, the low space condition is solved.
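The corrective action described above, deleting temporary files under an agreed policy, might look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the script used with the sample configuration: the function name `free_temp_files`, the suffix list, and the one-day minimum age are all hypothetical choices, and it defaults to a dry run so nothing is deleted by accident.

```python
import time
from pathlib import Path

# Hypothetical policy: which suffixes count as disposable is an assumption.
DISPOSABLE_SUFFIXES = {".tmp", ".bak"}


def free_temp_files(root, min_age_seconds=24 * 3600, dry_run=True):
    """Delete disposable files under root that are older than min_age_seconds.

    Returns the number of bytes freed (or that would be freed in a dry run).
    """
    cutoff = time.time() - min_age_seconds
    freed = 0
    for path in Path(root).rglob("*"):
        try:
            if path.is_symlink() or not path.is_file():
                continue
            if path.suffix.lower() not in DISPOSABLE_SUFFIXES:
                continue
            info = path.stat()
            if info.st_mtime > cutoff:
                continue  # too recent, may still be in use
            if not dry_run:
                path.unlink()
            freed += info.st_size
        except OSError:
            # File vanished or is locked; skip it and keep going.
            continue
    return freed
```

Running it first with `dry_run=True` shows how much space the agreed policy could reclaim before any file is actually removed.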
Finally, a monitoring solution should notify you only once when the problem occurs, not continuously (yes, I have seen solutions that do it over and over again…). The same goes for the "recovered" message, which obviously should be sent only once and only after a problem message has been sent first. So to sum up, a good disk free space monitoring solution must provide:
- at least two thresholds for disk space shortages
- notifications that occur only once when these thresholds are crossed
- optionally automatically-triggered corrective actions
- notifications when the shortage conditions have been resolved
Of course, the system should be able to send different types of notifications. For example, you may want to send some of these via email while others are forwarded to a pager or sent as a simple "net send" type notification.
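To make the requirements above concrete, here is a minimal Python sketch of the notification logic: two thresholds, one message per level change, and a recovery message. The names `Level`, `classify`, and `FreeSpaceTracker`, as well as the 20% and 5% thresholds, are assumptions chosen for illustration; they are not taken from any product.

```python
from enum import IntEnum


class Level(IntEnum):
    OK = 0
    WARNING = 1
    ERROR = 2


def classify(free_percent, warn_below=20.0, error_below=5.0):
    """Map a free-space percentage to a level. Threshold values are illustrative."""
    if free_percent < error_below:
        return Level.ERROR
    if free_percent < warn_below:
        return Level.WARNING
    return Level.OK


class FreeSpaceTracker:
    """Remembers the last level so each change is reported exactly once."""

    def __init__(self):
        self.level = Level.OK

    def update(self, free_percent):
        """Return a notification string on a level change, otherwise None."""
        new = classify(free_percent)
        old, self.level = self.level, new
        if new == old:
            return None  # nothing changed, stay silent
        if new > old:
            return f"{new.name}: only {free_percent:.1f}% free"
        if new == Level.OK:
            return f"RECOVERED: {free_percent:.1f}% free"
        return f"IMPROVED to {new.name}: {free_percent:.1f}% free"
```

Feeding the tracker the readings 30, 18, 17, 4, 4, 25 yields a message only for 18 (warning), the first 4 (error), and 25 (recovered); the other readings return `None`. A reading that hovers right at a threshold would still produce repeated messages as it crosses back and forth, so a real deployment may want a separate, slightly higher recovery threshold.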
A potential solution
As always in life, there are many ways to implement the disk space monitor. I am using a solution based on Adiscon’s MonitorWare Agent here because it is a good fit for the functionality we need and is easy to set up and run. MonitorWare Agent is a multi-monitoring solution. It can monitor Windows Event logs, syslog devices, databases, files … and disk space.
With MonitorWare, we create a so-called disk space monitor, which is then bound to a "rule set". The disk space monitor is the part that actually checks disk free space. It does this at intervals. Each time, it creates an event that includes the free space information. That event is then passed to the rule set, where the actual processing takes place. This is where we implement our requirements.
Inside the rule set, we need just a few rules to create our scenario. Basically, we utilize MonitorWare Agent’s status variables to keep track of whether we have a low or a very-low space condition. With this knowledge, we check the disk space report. If free space is below a threshold and the corresponding status variable is not yet set, we create an alert (and potentially an action) and set the status variable. Similarly, when free space goes up, we check whether we had one of the low conditions and, if so, create another alert. We utilize MonitorWare Agent’s other action types to start the low space recovery script.
Of course, I could provide you with detailed setup instructions here and also include numerous screen shots. But this article should not become a product manual… For your convenience, though, I have created the configuration with MonitorWare Agent. You can simply download it and try it yourself. I’ve placed plenty of comments inside the rule set in that configuration. If you review the comments, you will know pretty well what I have been doing.
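The general pattern described here, a monitor that checks free space at intervals and hands an event to separate processing logic, can be sketched in plain Python. This is not MonitorWare Agent’s configuration format or API; `make_event`, `run_monitor`, and the event field names are hypothetical, and only `shutil.disk_usage` is a real standard-library call.

```python
import shutil
import time


def make_event(path):
    """Build one event describing free space on the volume containing path."""
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "timestamp": time.time(),
        "free_bytes": usage.free,
        "total_bytes": usage.total,
        "free_percent": percent,
    }


def run_monitor(path, handler, interval_seconds=60.0, iterations=None):
    """Call handler(event) every interval_seconds.

    Runs forever when iterations is None; otherwise stops after that many checks.
    """
    count = 0
    while iterations is None or count < iterations:
        handler(make_event(path))
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)
```

The `handler` plays the role of the rule set: it receives every event and decides, based on state it keeps between calls, whether an alert or corrective action is due. For a quick trial, `run_monitor(".", print, interval_seconds=1, iterations=3)` prints three events for the current volume.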
2004-07-22 Initial version created.
2004-10-19 Updated sample and added hyperlink to it.
Rainer Gerhards Adiscon GmbH rgerhards @ adiscon.com www.adiscon.com
The information within this paper may change without notice. Use of this information constitutes acceptance for use in an AS IS condition. There are NO warranties with regard to this information. In no event shall the author be liable for any damages whatsoever arising out of or in connection with the use or spread of this information. Any use of this information is at the user’s own risk.