This month’s T-SQL Tuesday is hosted by friend and Zen Master Chris Yates (b/t). Since this T-SQL Tuesday falls on Chris' birthday, he's made it a present to all of us. Basically, write about something you want to learn/teach/have/whatever in SQL Server. Since I've spent most of this year studying, I'm going to write about what I want to learn.
Some back story: at the PASS Summit last year, I had the good fortune (or possible misfortune) to spend time talking with Buck Woody (b/t) from the Microsoft Data Science and Machine Learning team. I've always been interested in the idea of data analytics and in what all of this stuff that we're collecting actually means. Our conversation just happened to take place at a time when I was up for a challenge. So with some ideas from Buck and some research of my own, I've started down the data science path.
Since my bachelor's degree was entirely liberal arts, I deftly got around taking much math at all. The last few months have been a combination of learning R and going back to learn the statistics and all of the math in between that I've avoided until now. Strangely enough, I've really enjoyed it. I also had the opportunity to attend the inaugural MS Data Insights Summit last March.
Now (finally) to the point of this T-SQL Tuesday. I'm really looking forward to using the R Services that are available in SQL Server 2016. R, along with its add-on libraries, does some pretty amazing data analysis and visualization on its own (see some here), but there are some limitations. Working with large data sets in R can be problematic because all of the work is done in memory. I have an okay-sized laptop, but I might want to analyze data sets that are larger than the memory available. Also, the common implementation is single-threaded, which can make heavy calculations slow.
With the SQL Server R Services available in SQL Server 2016, there are components that allow you to make use of the resources on the SQL Server instance for heavy calculations and working with large data sets. You also, obviously, have direct access to the data that you're analyzing, and you can build R scripts into stored procedures. R uses data frames, which are a similar concept to the tables that we're already used to working with.
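To make the stored procedure idea a bit more concrete, here's a minimal sketch of what wrapping an R script in a procedure might look like. It uses sp_execute_external_script, the system procedure that hands a query result to R as a data frame (InputDataSet by default) and returns whatever R assigns to OutputDataSet as a result set. The table dbo.SalesOrders, its OrderAmount column, and the procedure name are made up for illustration, and I'm assuming R Services (In-Database) is installed and 'external scripts enabled' has been turned on with sp_configure.

```sql
-- A sketch only: dbo.SalesOrders, OrderAmount, and the procedure name are
-- hypothetical. Assumes SQL Server 2016 with R Services (In-Database)
-- installed and 'external scripts enabled' set to 1.
CREATE PROCEDURE dbo.usp_OrderAmountSummary
AS
BEGIN
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # InputDataSet is the data frame built from @input_data_1.
            # Whatever is assigned to OutputDataSet goes back to SQL Server.
            OutputDataSet <- data.frame(
                MeanAmount = mean(InputDataSet$OrderAmount),
                SdAmount   = sd(InputDataSet$OrderAmount)
            )',
        -- Cast to float so the column maps cleanly to an R numeric.
        @input_data_1 = N'SELECT CAST(OrderAmount AS float) AS OrderAmount
                          FROM dbo.SalesOrders'
    WITH RESULT SETS ((MeanAmount float, SdAmount float));
END;
```

Calling EXEC dbo.usp_OrderAmountSummary; should then return one row with the two columns, just like any other stored procedure. The query result goes in as a data frame and a data frame comes back out as a result set, which is where that table/data frame similarity really shows up.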
It'll probably be a little while yet until I get to work with R Services, but I'm really excited about going down that path soon.