It’s July, the last real month of winter, and the start of the second half of 2022 (Holy Smokes!). So here we are, on the 2nd Tuesday of July, another chance to contribute to the #sqlcommunity and to write about all things SQL Family related. Thanks, as always, to Adam Machanic (t|b) for starting T-SQL Tuesday, and to Steve Jones (t|b) for his contribution (and website).
“Pull out your soapbox and tell us all about it.” (From the T-SQL Tuesday invitation)
I have lots of rants, but there’s one thing that I am going to moan about today, because it causes frustration for data professionals, the business, and the end customers. Simply put, NOT doing this will only cause you, and everyone you interact with, pain. And judge-y giraffe will judge you!
ONE DATA MODEL PER BUSINESS AREA!
I cannot stress this enough. If people are consuming your data in multiple places, the data needs to come from the same data model. That can be an Enterprise Data Warehouse, a Data Mart, a Power BI Model, or any other data source, but at some point you need to be able to track the data back to a single place. If you don’t do this, you will spend THE REST OF YOUR DAYS explaining the differences between the data models to business and customers, and reconciling the differences over and over again.
Here are some of the reasons why I feel so passionate about this topic:
Customers will compare data.
It’s a well-known fact that the most-used function in any business intelligence application is “Export to Excel”. It follows that customers are going to import data from multiple reports and try to build their own enhanced “models”. And at some point, someone is going to ask the dreaded question: “Why don’t these numbers align?”. When this happens, you need to be able to explain the exact data lineage of the two (or more) reports WITHOUT having to deconstruct multiple ETL processes, or track down documentation that may have been written many years ago, or may not have been written at all.
Your data will get out of sync.
Change happens! Seriously, you can build the most complete, brilliant data model that the world has ever seen, and at some point someone will want you to change some part of the data flow from source. If you have a single model, this is a fairly simple process. But if you have to track down all the other data sources that have repeated some business rule, source-system exception, or other hidden domain knowledge, the task becomes exponentially harder. Not only do you have to know all the processes that may be affected, but you also have to account for the small differences that were introduced when the second, third, etc. processes were built. So not only will you have to modify multiple processes, but you will also have to test and reconcile multiple outputs against source and against each other. If you have a single process for bringing data into the data model, and the different outputs just use aggregations, filters, etc. to deal with different requirements, then the entire change is a simple one.
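The “one load process, many filtered outputs” idea can be sketched in a few lines. This is a minimal, hypothetical example (the table and view names are mine, not from any real system), using Python’s built-in sqlite3 so it runs anywhere: one business rule is applied in exactly one place, and every downstream output is just a view over the same model, so the numbers reconcile by construction.

```python
import sqlite3

# Hypothetical example: one canonical fact table, loaded once,
# with the business rule ("exclude test orders") applied in a
# single place. Every output is a view over that one model.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE raw_orders (
        order_id INTEGER, region TEXT, amount REAL, is_test INTEGER
    )
""")
cur.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [(1, "North", 100.0, 0), (2, "North", 50.0, 1),
     (3, "South", 200.0, 0), (4, "South", 75.0, 0)],
)

# Single load step: the ONLY place the business rule lives.
cur.execute("""
    CREATE TABLE fact_orders AS
    SELECT order_id, region, amount
    FROM raw_orders
    WHERE is_test = 0
""")

# Outputs differ only by filters and aggregations; no rule is
# ever re-implemented downstream.
cur.execute("""
    CREATE VIEW v_sales_by_region AS
    SELECT region, SUM(amount) AS total
    FROM fact_orders GROUP BY region
""")
cur.execute("""
    CREATE VIEW v_sales_total AS
    SELECT SUM(amount) AS total FROM fact_orders
""")

regional = dict(cur.execute("SELECT * FROM v_sales_by_region"))
overall = cur.execute("SELECT total FROM v_sales_total").fetchone()[0]

# Two different reports, one model: they cannot disagree.
assert sum(regional.values()) == overall
```

If the rule later changes (say, test orders become reportable), you edit the one load step and every view picks it up; there is no second ETL process to hunt down and re-test.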
One day, you will move on from your current job. You’ll move to a different department, a different company, a different country, or just a different phase of your life. And the person who takes over the data model will not have the same knowledge of the process that you did. Learning the intricacies of a single process is hard enough. Learning 2 or more, and trying to understand why there are multiple, when to use each one, etc. is a monumental task. So do it for the person who takes over from you.
To paraphrase Monty Python:
“First shalt thou decide to build a data model, and then shalt thou ensure that thou buildest only one. One shall be the number of the data models, and the number of the data models shall be one. Two shalt thou not build, neither build thou three. Five is RIGHT OUT! Once data model one, being the first data model, is built, then lobbest thou the data model at the business, who, having clean and accurate data, shall love it.” (Monty, the Python Developer)