Today I want to take a minute to talk about gaps. Not the kind where you find trendy clothes for the young (and young at heart) folks, but the kind that you might see in a performance chart on the SQL Sentry Performance Advisor Dashboard.
I've found myself explaining this to a few customers in recent months, and I wanted to get this gap analysis written down so that others can find it on their own, and so that I have reference material to point to when they ask.
The general feeling I get from the customer base is that seeing gaps on these charts is very bad, and when you see them something must be going terribly wrong with your monitoring environment, or you are losing valuable information. After reading this post, I hope that you can take one idea away from it that will stick with you. That one idea is that gaps are good!
What are the gaps?
On the Performance Advisor history view, the area and line charts will have a hole or “gap” any time there is more than 90 seconds between two consecutive data points. This is explained in greater depth in one of our Q&A threads.
This could go on for quite some time, or it could be just one "slot" depending on exactly why the data points are going missing. Either way, it will appear similar to the shot below:
Gaps are good!
As I mentioned once already, the knee-jerk reaction is to think that you've lost some valuable information during that time. The reality is just the opposite, though! These gaps are a well-planned feature, not a problem.
There is a short list of reasons that this can happen, and they generally involve something that either needs immediate attention or will need attention soon.
- SQL Sentry could have been restarted, or gone offline for some reason
This will manifest as a gap in ALL charts, for every monitored server in a given site.
- You've experienced transient network or network interface problems
Network communications are important to the monitoring topology. If the network or NIC is down, collections can and will be missed, and gaps will appear if it stays down long enough.
- The monitored server experienced severe resource contention
In general, SQL Sentry uses less than 1% of system resources on a monitored server for performance monitoring. If the server is under so much stress that it is not even able to answer a request for performance data, there is something very wrong. You should begin troubleshooting by using SQL Sentry to determine what was happening on the server just before and just after the gap. In this case, the gap is quite literally telling you something important, by displaying nothing.
- Writing to the SQL Sentry repository could be taking too long
SQL Sentry's collection services use a series of write queues for sending batches to the repository. These write queues are capped at a maximum depth. If the maximum depth is exceeded, new values will be dropped, ultimately resulting in gaps. This is a clear indicator of one of three things: your current settings are too aggressive, the maintenance and/or tuning settings on the repository instance need to be looked at, or you need to scale or redistribute your monitoring environment to allow for greater throughput to the repository database.
- You experienced a cluster or AG failover
Depending on how long the failover took, you may see a gap in charts for the Windows Server Failover Cluster (WSFC), SQL Server Failover Cluster Instance (FCI) or both.
If you experience chart gaps and you contact SQL Sentry support, these are the first things the support engineer is going to look for. Not every case is this simple, as there are some edge cases, but the vast majority of reports concerning chart gaps turn out to be one of these things.
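To make the write queue reason a little more concrete, here is a minimal Python sketch of a queue that is capped at a maximum depth and drops new values once it is full. This is not SQL Sentry's actual code. The class name `BoundedWriteQueue`, its methods, and the `dropped` counter are all hypothetical names I made up for illustration. The point is only to show how a slow consumer (the repository) turns into missing data points.

```python
from collections import deque


class BoundedWriteQueue:
    """Hypothetical write queue, capped at max_depth (not SQL Sentry's real code)."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self._items = deque()
        self.dropped = 0  # number of new values rejected because the queue was full

    def enqueue(self, sample):
        """Add a sample; return False (and count a drop) if the queue is full."""
        if len(self._items) >= self.max_depth:
            self.dropped += 1
            return False
        self._items.append(sample)
        return True

    def drain(self, batch_size):
        """Remove and return up to batch_size samples, oldest first."""
        batch = []
        while self._items and len(batch) < batch_size:
            batch.append(self._items.popleft())
        return batch


# The repository write is slow, so nothing is drained while five samples arrive.
queue = BoundedWriteQueue(max_depth=3)
accepted = [queue.enqueue(n) for n in range(5)]
print(accepted)        # [True, True, True, False, False]
print(queue.dropped)   # 2
print(queue.drain(2))  # [0, 1]
```

The two rejected samples never reach the repository. If enough consecutive samples are dropped that more than 90 seconds pass between stored data points, a gap appears on the chart.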
Why are gaps good?
In most reports, area and line charts will simply continue on from data point to data point. Performance Advisor is designed with the idea that it is supposed to be able to collect performance metrics at specified intervals. In the event that enough of those metrics are not collected, Performance Advisor will make it overly obvious that something is wrong, by creating a gap in the history charts. If we simply connected the dots, you would actually never even be aware, in most cases, that something caused data points to go missing. You would see a full chart, and you would have no reason to look any further. We believe that would be deceptive as to the true performance history of the server you are monitoring. The gaps also provide you a way to be more effective as a DBA since you will generally see them before you start getting calls about outages from customers and/or users.
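The difference between "connecting the dots" and showing a gap can be sketched in a few lines of Python. This is only an illustration of the 90-second rule described earlier, not Performance Advisor's actual charting code. The function names `split_on_gaps` and `find_gaps` and the sample data are assumptions of mine.

```python
from datetime import datetime, timedelta

# Threshold from the post: more than 90 seconds between two data points opens a gap.
GAP_THRESHOLD = timedelta(seconds=90)


def split_on_gaps(points, threshold=GAP_THRESHOLD):
    """Split time-ordered (timestamp, value) pairs into contiguous segments.

    A chart would draw each segment as its own line, leaving a hole between them.
    """
    segments = []
    for point in points:
        if segments and point[0] - segments[-1][-1][0] <= threshold:
            segments[-1].append(point)
        else:
            segments.append([point])
    return segments


def find_gaps(points, threshold=GAP_THRESHOLD):
    """Return (last_seen, next_seen) timestamp pairs that bracket each gap."""
    segments = split_on_gaps(points, threshold)
    return [(a[-1][0], b[0][0]) for a, b in zip(segments, segments[1:])]


# Hypothetical samples: three on schedule, then a 140-second silence, then two more.
start = datetime(2024, 1, 1, 12, 0, 0)
offsets = [0, 30, 60, 200, 230]
points = [(start + timedelta(seconds=s), 10.0) for s in offsets]

print(len(split_on_gaps(points)))  # 2
for last_seen, next_seen in find_gaps(points):
    print(last_seen.time(), "->", next_seen.time())  # 12:01:00 -> 12:03:20
```

Drawing all five points as one line would hide the 140-second silence entirely. Splitting them into two segments is what makes the missing collections visible, and the bracketing timestamps tell you where to start looking.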
If you are a SQL Sentry user, and you have experienced these chart gaps before, I hope that you will see them in a new light after having read this. We created the gaps as a way to provide you with actionable information, even though we have no real data for that time period.
Until next time,