This post has been updated to include a link to our Strategies and Best Practices for Virtualizing SQL Server guide.
(This is a branch off of my overall v9.0 announcement post.) Enabling VMware virtualization counters in SQL Sentry is quite simple; all you need is read-only access to a vCenter (vROPS) server. Start by ensuring the virtualization hierarchy is visible in the SQL Sentry Navigator.
Then right click the VMware folder, and select "Add vCenter."
You'll need a few standard bits of information, such as the server, port, and login information. You'll also have the opportunity to specify which Site is responsible for collecting data from vCenter.
Clicking OK will perform some connection tests; once those pass, the vCenter will be registered, and will start collecting configuration and performance data. One important thing to remember is that, if you use SSPI authentication, both you and the SQL Sentry monitoring service account will need to have at least read-only access to the vCenter server. If you don't already have it, the VMware admin at your company can grant this to you.
Virtual Machine Enhancements
Now that the vCenter is registered, any virtual machine that SQL Sentry is monitoring from that environment will have some additional information available. To start, the Performance Advisor dashboard now has more information in the label area at the top. We can see that we’re viewing a VMware VM, as well as what host is currently managing the VM.
We also have some additional performance metrics. For these, we wanted to display them in a way that reflects how the virtualization layer can affect performance in the context of a specific, relevant resource. For the CPU USAGE area, we've added both Co-Stop and Ready Time. These are two important measurements related to the amount of time virtual CPUs are waiting for physical processor time on the host. Both metrics are converted to a percentage and displayed on top of the CPU Usage charts. This way, Ready Time can tell you that there is a certain level of CPU wait time, even when CPU utilization is low. The same is true for Co-Stop, except that it is tied more to VMs with multiple virtual CPUs. While higher Ready Time (> 3%) might indicate over-commitment of host CPU, higher Co-Stop (> 3%) might indicate that a particular VM would run better with fewer virtual CPUs. Note also that Co-Stop might be naturally elevated during snapshot activities.
In the above screen capture, we created heavy load on both the VMware host and the virtual machine. You can safely assume that while we maintained around 80% CPU utilization on the virtual machine, the few minutes of heavy load on the host had major impact on the actual performance of the virtual machine. Having this information presented this way could completely change how you approach troubleshooting a performance bottleneck. Another area where we commonly see issues for virtualized SQL Servers is related to memory management. Since SQL Server performance so often depends on memory being available in the buffer pool, it’s important to know when something is affecting the amount of memory available. There is the concept of memory ballooning which involves the host reclaiming memory for one purpose or another. For VMware this happens more often when the host is under memory pressure due to over-commitment of memory resources. In SQL Sentry v9.0, we've added the ballooned memory category to system memory on the Performance Advisor dashboard.
As you can see in the sample view shown above, we display this as a portion taken out of "Other" memory. The reason for this is that, on the virtual machine, it will appear as memory being take up by a driver. In history view, you can see the ballooned memory at the top to indicate ballooning creates downward pressure on the amount memory that is readily available to the server.
The values we're using for these visualizations are available in custom performance reports providing an easy way to report on history or create comparison reports.
They can also be used in custom conditions with metrics related to other areas as well. Here I'm testing to see if we have correlation between increased memory ballooning and decreased SQL Server Page Life Expectancy.
Until next time,