T-SQL Tuesday #66 – Monitoring
When I started as a DBA, I equated monitoring with alerting. I put everything that I might ever need to know in an alert and, as you can imagine, I received far too many alerts. I've always thought that too many alerts are just as bad as no alerts: it's too much to take in, and the alerts start getting ignored.
Over time, though, I came to look at monitoring more broadly. Looking at Dictionary.com, one of the definitions of monitor is "to watch closely for purposes of control, surveillance, etc.; keep track of; check continually." So monitoring, in addition to alerting, means keeping track of what is happening in your environment. Effective monitoring can actually reduce the number of alerts because you're able to take steps to maintain your system before it's in trouble.
Go through your environment and see what you really need to have instant alerts on, and what you can group into reports. At a previous job, there were scheduled jobs that would fail, but none of the failures were serious enough to impact the business. I created a report that e-mailed a list of all of the failed jobs from the previous day. A single e-mail listing the 25 failed jobs was definitely easier to manage than 25 unique alert e-mails. Over time, we decided that a few of the jobs were critical, and I created individual alerts for just those processes. In addition, each Friday, I received disk space and file space reports. I could go through and allocate space for anything that was running low, or give our SAN admins a heads-up that disk space was filling up.
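To make that split concrete, here is a minimal sketch in Python of the triage idea: failures from a short list of critical jobs get their own alerts, and everything else is rolled into one daily digest. The job names, the `JobFailure` record, and the `triage` and `build_digest` helpers are all hypothetical and exist only for illustration. In a real SQL Server environment the failure rows would come from the SQL Server Agent job history in msdb, and sending the e-mail is left out entirely.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class JobFailure:
    """One failed job run (hypothetical shape, for illustration only)."""
    job_name: str
    run_date: date
    message: str


# Assumption: the jobs the team has decided deserve an immediate alert.
CRITICAL_JOBS = {"Nightly Full Backup", "Order Export"}


def triage(failures, critical_jobs):
    """Split failures into (immediate alerts, daily digest entries)."""
    alerts = [f for f in failures if f.job_name in critical_jobs]
    digest = [f for f in failures if f.job_name not in critical_jobs]
    return alerts, digest


def build_digest(failures, report_date):
    """Build the body of a single e-mail listing every non-critical failure."""
    day = report_date.isoformat()
    if not failures:
        return f"No non-critical job failures on {day}."
    lines = [f"{len(failures)} failed job(s) on {day}:"]
    for item in sorted(failures, key=lambda entry: entry.job_name):
        lines.append(f"- {item.job_name}: {item.message}")
    return "\n".join(lines)


if __name__ == "__main__":
    yesterday = date(2015, 5, 11)
    failures = [
        JobFailure("Index Rebuild", yesterday, "timeout"),
        JobFailure("Nightly Full Backup", yesterday, "disk full"),
        JobFailure("Archive Purge", yesterday, "deadlock victim"),
    ]
    alerts, digest = triage(failures, CRITICAL_JOBS)
    for alert in alerts:
        print(f"ALERT: {alert.job_name} failed: {alert.message}")
    print(build_digest(digest, yesterday))
```

Keeping the critical list as plain data means that promoting a job from the digest to an individual alert is a one-line change, which matches how that list tends to grow over time.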
I wouldn't be doing my job if I didn't mention that SQL Sentry has just announced a Power BI Content pack that does most of the reporting that I mentioned above for you (you can see some other details here). One of the available graphs shows disk space usage – much easier to read than the plain text report that I e-mailed myself years ago:
We also have very robust and flexible alerting in our core product suite – you can dictate which elements in your system are crucial and deserve immediate attention, and even customize the conditions that cause alerts (and who gets them) right down to the table or index level (for a lot more detail, see this post). Determining those events that truly need an alert, and differentiating them from the information that could be handled through regular on-demand reporting, can definitely define the line between an out-of-control inbox / stressed-out DBA and an environment that is calm and under control.