Q&A from Webinar "How to Reduce Latency and Boost I/O Performance on SQL Server"
There were a lot of great questions during the MSSQLTips.com webinar sponsored by Tegile entitled "How to Reduce Latency and Boost Your SQL Server Performance". You can find links to the slides, Transact-SQL scripts, and video of the webcast in my previous blog post.
Questions About Flash, SSD, and Tegile
Our friends at Tegile sponsored the webinar. And on top of sponsoring the webinar, they were giving away a PS4 with all kinds of cool Star Wars branding all over it. Good times! =^)
Got a question? Drop an email to my friend Sumeet Bansal at email@example.com.
Want to request a no-obligation demo of the super-fast and very affordable Tegile Array? Go to
General SQL Server Questions
Q: Antonio asked "About keys, how many should or shouldn't be set in order to keep the tables in best performance?"
A: The art of physical database design and the placement of indexes is no simple task. Rather than craft a multi-page answer of my own, I'll simply point you to my favorite resource, the blogs of Paul Randal (b | t) and his team at SQLskills.com. I find that this blog entry on Cluster Key Size goes heart of your question, but you might enjoy reading multiple entries in from SQLskills on Indexes from Every Angle. But don't forget the general rules of thumb when choosing a clustered index key. It should be, if possible: 1) Narrow, 2) Static, 3) Ever-increasing, and 4) Unique. Btw, do NOT choose a GUID for your clustered key. Enough said.
Q: Chris and Martin both asked "What if your Data and Log files are all on the same SAN, does it matter if they are on separate "drives?"
A: It depends. If you're running on newer SAN technology, say no more than 2-3 years old, then for all but the most extreme workloads it won't generally matter because the SAN's cache will even out all of the bumps in I/O performance. For many new SANs, it's possible to simply configure the I/O subsystem as JBOD (Just a Bunch of Disks) in a single LUN and let the very smart auto-tiering and management software of the SAN handle everything else. However, when we're looking at older SANs, putting the MDFs and LDFs can become an issue on very heavy workloads – the older the SAN, the more likely it is to make a measurable difference in performance. That's because bigger issue with older SANs is that they do not automatically take care of LUN configuration, nor do they have enough cache to even out the stresses of the heavy I/O workload. Please take the time to read your SAN vendor's information. Most of the high-end vendors in this space have configuration documents and benchmarks specifically for SQL Server.
In addition, there's some confusion as to whether the segregation of files should be pushed down to the LUN- or volume-level? Again, it depends on how the SAN administrator has configured the LUNs themselves. (Like above, this recommendation typically applies only on older SAN technologies that do not have strong auto-tiering features). If the LUNs do not share any given set of underlying disks with any other LUNs, then you need go no further. The LUN-level is fine in that case. If however, LUN01 is allocated 50% of Volume_AA, while LUN02 and LUN03 each have 25% of Volume_AA, then YES. Segregate the MDF and LDF file(s) all the way to the volume-level. (I'm going to post more about this in an upcoming blog post).
Q: Raghuram and Maria asked "If we are using VMware still we need to have data files and log files to be in separate drives?"
A: Ask with the previous question about placement of data and log files on a SAN, the answer is 'It depends'. (Indeed, that's almost always the case with any technology question). One point of dependency is whether the VMs have their storage on a SAN. If so, then refer to question above. If not, you should probably do some I/O tuning. And as above, you should definitely read the vendor's documentation on the subject. For example, this VMware support article called Tips for Configuring Microsoft SQL Server in a Virtual Machine says …
Separate data and logs on different physical disks:
A standard RDBMS best practice is to place SQL data and logs onto separate physical disks. This ensures that I/O intensive logging does not interfere with data queries.
In a virtual machine, create one (or more) .vmdk(s) for data and another for logs. Pre-allocate disk space when you create the virtual disks.
Note: An auto-growing .vmdk sacrifices some performance for the added flexibility.
Put the .vmdk on a different LUN to spread activity across multiple spindles. This allows sequential writes to occur as fast as possible, helping to boost I/O performance.
Q: Harsha asks "Can we make data IO to be sequential just like to log to get best throughput?"
A: I think you're misunderstanding the fundamental premise of the randomized I/O behavior of the data files compared to the sequential I/O behavior of the log file. Forcing the data files to behave in a sequential fashion will NOT give you the best throughput in 99.9% of applications. Why? Because there is more than one user and/or process active in that database. The only time doing so would benefit an application would be if it had a single user or process. So, in that case, you could probably get close to what you're asking, which is to put a given database into single-user mode. But whose application can support that kind of behavior? Very, very few. Remember – SQL Server's I/O behavior has been refined over decades of research and field testing. It's best to let it retain its default functionality unless you have powerful, empirical evidence that an alternative will perform better.
Q: Luis asks "Where does MAXDOP fit in all of this? And how do you configure this effectively
A: This is just my opinion, but I'd say that it does not really make a difference. MAXDOP is the maximum degree of parallelism that SQL Server allows on operations which can be multi-threaded. So let's say you have an 8-core SQL Server and you've just issued a SELECT statement that's going to retrieve 1m rows. And the query optimizer thinks it's going to take 8 minutes to complete. SQL Server, under default settings, is allowed to use of to N-1 cores (n = the total number of cores it has access to). So let's say that SQL Server decides to parallelize the query across 4 of the CPUs available to it out of the 7 (8 cores – 1 for the OS) which it could normally use. Now, due to multi-threading, it completes in just over 4 minutes since it sliced up the work 4 ways. In this example, the query did complete in about half the time as the unparallelized version of the query. However, it had the same overall I/O and had to churn through the same number of data pages in the MDF and write the same amount of details in the LDF. So, all in all, the total I/O was about the same.
To go one step further, I don't usually tinker with the default setting for MAXDOP. Most experts agree with this stance, for example, in this article from SQLskills.com. There is a bit of nuance to my thinking, in that I almost always tell SQL Server to be less aggressive with parallelism by changing the default setting of Cost Threshold for Parallelism (CTP) from 5 to 30. If I still believe SQL Server is parallelizing too much, I set CTP to 60. This setting allows SQL Server to use as many parallel threads as it might want, but it only parallelizes batches that it thinks will take greater than 5 seconds to complete (default), or 30 seconds to complete (my default), or 60 seconds to complete (my typical higher-end setting). In other words, SQL Server CTP by default says "Parallelize everything that might take long to process!", while I say "Parallelize any thing that takes a pretty long / really long time to process." Make sense?
Q: Nagesh asks Index Optimizing & Integrity Jobs Question: Our SQL Server jobs are taking more time running Index Optimizing & Integrity jobs. Will it be helpful to divide these jobs to run from a different SQL Server.
A:No. The index maintenance always takes place on the SQL Server where the indexes are being maintained, not on the server the job is launched from. So dividing the jobs across many servers won't help at all. OTOH, there's no reason you can't start multiple concurrent index maintenance jobs concurrently. For example, let's say you have a database with lots of small tables and three really big tables where most of the transactions happen. You might want to schedule one task to groom all of the small tables, one after the other, while three other jobs handle one of the big tables each. That way, they complete much faster than a single job that processes all of the tables consecutively.
A couple quick tips. First, don't bother with updating statistics after rebuilding your clustered index. Rebuilding a clustered index automatically updates statistics. So you're wasting time and processing power if you do both processes. Second, depending on the version and edition of SQL Server you're using, many maintenance processes are single threaded. So make sure you do not schedule more maintenance processes that run simultaneously than you have CPUs on the SQL Server.
Q: Martin asks "What's the best way to measure I/O? Are there other ways than perfmon?
A: Make sure you download the Transact-SQL demo scripts I showed in the presentation. They will help you measure most everything related to I/O that you'll need. If you haven't already seen them, they're at THIS BLOG POST.
Of course, I'd be remiss if I didn't point out all the reasons you should try our monitoring product, Performance Adviser, over and above the scripts. When it comes to I/O monitoring, you'll get three years of historic information (by default), built-in correlation to the specific SQL statement, job, or process that caused the I/O issues you're interested in, baselining so you know what normal I/O is for your SQL Servers, and powerful alerting for those conditions when you need to get in there and fix things (often before the end-users ever notice). Grab a free trial version!
Q: Tim asks "How do we get check point to go faster than 1 minute?"
A: Needing that capability is a very rare situation indeed. I've been working with SQL Server since 1995 (good grief, that's two decades!!!) and have only seen it used in one incredibly high-end production system. (You will see it from time to time in benchmarking disclaimers for the industry-standard TPC benchmarks).
You can adjust the default checkpoint frequency by changing the Recover Interval configuration setting. The setting represents the number of minutes that pass before an automatic checkpoint is issued and the default is 1. SQL Server itself might issue more than one checkpoint per minute if the situation warrants. But the only way to checkpoint more often than once per minute without relying on automated behavior is to manually issue the CHECKPOINT statement. However, tinkering with checkpoints is fairly advanced stuff. Don't do it lightly and read all about it before you end up accidentally making performance works, hurting recoverability (since it has a strong effect on transaction logging), or otherwise shooting yourself in the foot. Start with the SQLskills article on How Do Checkpoints Work.
Q:George asks "How much mem should you set aside for the OS in a SQL box?
A:I'm going to skip all the caveats and details to give you my easy rule of thumb. Unless I have a really busy system, I try to leave 2Gb for the Windows OS. Let me know if you have a very busy system that needs memory tuning and, if so, I'll follow-up with a detailed blog just for that topic.
Q:George is back with another question "How can you speed up XML queries? Indexing has not worked!"
A:Querying XML using SQL is known to be slow, hard to tune, and resource intensive. And I should point out that I'm no XML expert by any stretch. You can try things like XML indices and XML typed columns. However, my colleagues and friends who ARE experts in XML tell me to make sure you're using FOR XML before OPENXML, and OPENXML before nodes(). AFAIK, there's not a lot of empirical testing on the topic. So I'm probably propagating marginal advice. Take it with a grain of salt and do some tests for each different approach. This is especially important because your mileage may vary by version, edition, and event the Service Pack of your SQL Server.
Q: Tiger Li asks "You have any idea on SharePoint Database tuning?"
A: I'm rather knowledgeable about SharePoint since it runs on top of SQL Server. I'd even done a couple presentations on this exact topic a couple releases ago. However, I haven't done a good job keeping up with all of the new technology. My normal recommendation is to simply use SharePoint in the cloud so that you never have to mess with all of the backend annoyances that are par for the course in SharePoint. However you didn't mention whether your situation is in the cloud or on-premises, I'll point you to a true expert – Daniel Glenn (b | t) – who has posted a slide deck which you can download called Tuning SQL Server for SharePoint. Hope that helps!
Well, there was loads of random I/O-related advice in this blog post. I hope it helps answer some of your questions or solves some problems that are vexing you. Got more question about SQL Server I/O? Sound off here and I'll do my best to answer. As I mentioned, I'm planning some more posts on the topic soon.