So last year was a big one for me as a software developer. First, last fall was HDC 2008 in Omaha, which exceeded my expectations again and gave me 4 new specific technologies that I wanted to apply at work:
- Velocity – Microsoft’s framework for distributed caching
- SQL Server 2008 Integration Services enhancements – specifically parallel processing, C# scripting, and the new Data Profiling task
- The new version of SQL Server 2008 Report Builder
Then it was off to PDC 2008 in Los Angeles, an incredible opportunity for which I am very thankful to my employer, Innovative Systems. There, the new Parallel Processing features in .NET and the new Team System enhancements caught my eye the most.
Then it was back to work and an extremely busy first 5 months of the year so far, with a nagging feeling that I haven’t applied much of my new knowledge yet. Life was definitely getting in the way. So I was determined to start studying for the Microsoft Business Intelligence exam and get that passed this year, and that’s when, strangely, things started to sync up. Wouldn’t you know it: the first 4 chapters of the exam prep book cover SSIS, and the 2nd main topic that I wanted to dig into from HDC 2008 was the SSIS material that Kent Tegels covered expertly in his HDC talks last fall.
Side topic – Over the years I had run into books Kent Tegels had written, but I had not actually seen a presentation by him before last fall. I highly recommend that anyone interested in database-centric topics not miss the presentations he gives. He certainly gives the developer community a lot, both in sharing knowledge and in doing so with an effective presentation style. If he will be presenting in SF at user groups, I’ll certainly try to get there as often as I can. And it excites me that he is part of a core group bringing .NET User Groups to SF.
But back to SSIS – lately my job has been introducing me to data and formats from all types of platforms and clients. Understanding what the data represents to different people, and understanding the thinking behind how and what they stored as data, is HUGE and usually 80% of the entire task. The remaining work is trivial compared to data investigation and knowledge. So tools that help along this line are great, and if they are easy to use and re-use, then they become awesome and soon part of your everyday routine, which is the highest praise any utility or tool can earn in our field.
Kent had demonstrated the new SSIS Data Profiling utility at HDC, and I loved it and started playing with it today while preparing for the exam. Running this tool against a set of foreign data imported into SQL Server tables, I could start reviewing each of the tables and get a feel for the types of data represented by each table: the distribution of values, the percentage of NULL values, the length of fields, and then the ability to drill down into specific outliers. For example, say the Zipcode field usually contains data of 5 or 9 characters (let’s just assume we are processing US addresses), but the tool shows me that some values have 5, 6, 8, or 9 characters. Now I can quickly drill down into these outliers. Better yet, if we are doing a data discovery conference call, I can quickly find and ask about the outliers during the call, since all of the data is well displayed inside the SSIS Data Profile Viewer.
Love it, and I will be using it. My only complaint so far is the lack of a good reporting tool for the results. I may be missing something, so if you find something that allows good summarized reports from the Data Profile Viewer, I’d appreciate hearing about it.
So the next thing I’ll be doing is spreading the word about it. Just as Kent shared it with us at HDC, I’ll be sharing it at work, and I wanted to share it with you as well, except there are some people who can do that far better. Since I used them as a resource, I want to pass them on to you as a resource as well.
Specifically, Jamie Thomson published a series of great, detailed explanations of each of the features of the Data Profiling Task here, with far better screenshots than I am able to produce (I really struggle getting good images formatted and inserted into the blog pages, and of course, not taking the proper time to read any documentation certainly isn’t helping my cause). But why reinvent the wheel? Jamie’s blog does a great job digging into each of the features, and I’ll be keeping a link to it for future reference.
But if I find more details and tidbits about SSIS Data Profiling not covered in these blog posts, I’ll share them here.
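For anyone who wants to see the idea behind those profile checks in code, here is a minimal sketch in plain Python. To be clear, this is not the SSIS Data Profiling Task or any part of its API. It is just a rough illustration of the same kinds of questions (percentage of NULLs, distribution of values, distribution of lengths, and which values fall outside the expected lengths). The function name `profile_column`, the `expected_lengths` parameter, and the sample zipcodes are all made up for this example.

```python
from collections import Counter


def profile_column(values, expected_lengths=None):
    """Summarize one column of data (hypothetical helper, not an SSIS API).

    values: list of column values, with None standing in for NULL.
    expected_lengths: optional set of lengths considered normal, e.g. {5, 9}.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]

    # Percentage of NULL values in the column.
    null_pct = 100.0 * (total - len(non_null)) / total if total else 0.0

    # How many values have each length, and how often each value occurs.
    length_counts = Counter(len(str(v)) for v in non_null)
    value_counts = Counter(non_null)

    # Values whose length is not one of the expected lengths.
    outliers = []
    if expected_lengths is not None:
        outliers = [v for v in non_null if len(str(v)) not in expected_lengths]

    return {
        "null_pct": null_pct,
        "length_counts": dict(length_counts),
        "value_counts": dict(value_counts),
        "outliers": outliers,
    }


# Made-up sample data: US zipcodes should be 5 or 9 characters long.
zipcodes = ["68102", "681021234", None, "681021", "68102", "68102123", None, "68114"]
report = profile_column(zipcodes, expected_lengths={5, 9})

print("NULL %:", report["null_pct"])
print("Lengths:", report["length_counts"])
print("Outliers:", report["outliers"])
```

The `outliers` list is the part that matters on a data discovery call: it is the short list of actual values you can read back to the client and ask about.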
Oh yeah – I about forgot to explain the blog’s title today. I am reminded of a favorite quote of mine: "you don’t get credit for correctly predicting floods, you only get credit if you build an ark for that flood". I like that. Having knowledge and applying knowledge are 2 very different things, which is why interviewing software developers can be extremely difficult. My other favorite quote along that line is "great plans without any execution is no different than a hallucination", which leads me to my other favorite saying, which I saw on a sign at a craft show a few years ago: "Sure, you could build it, but will you?" I can just imagine how many times a day a craft shop owner hears people mumbling to each other "we could make that ourselves". Yes, knowledge and the application of knowledge are 2 far different things.
(PS – What things have been taking my time this first part of the year? Co-ed volleyball, Den Leader for Cub Scouts, soccer coach, and every other event associated with having 2 grade school children. So while it’s been a crazy, busy first half of the year so far, no regrets. It’s just that with our industry changing so much each year, I am constantly reminded of Jeff Atwood’s blog post from 3 years ago already, which I keep a copy of near my desk: "Everything You Know Will Be Obsolete in Five Years". That was a great article, and I welcome seeing subsequent and related articles by Scott Hanselman and others on topics like Sharpening the Saw, discussed again at Coding Horror just this year. If you are a software developer and don’t grok that, your career may be short or very limited in scope, 2 things I will be attempting to avoid.) Until next time.