When Should You Get a Data Warehouse?
Here's a question that we get asked rather often: "When should we consider getting a data warehouse?"
The people who pose this question to us tend to be small business owners or startup operators. Typically, these companies are starting to consider better data analytics in their operations for the very first time. Perhaps they do their financials in Excel, or their site analysis in Mixpanel or Google Analytics. The thought of combining these data sources into a single store of data seems like a scary amount of work to them.
This blog post will explain when and why to consider purchasing a data warehouse. As it turns out, this decision is a lot simpler than you'd think!
The Four Reasons for Getting a Data Warehouse
There are really only four good reasons for getting a data warehouse.
First, you should get a data warehouse if you need to analyse data from different sources. At some point in your company's life, you will need to combine data from different internal tools in order to make better, more informed business decisions.
For instance, you might want to track your most valuable customers on a weekly basis — which requires you to combine payment information from your credit card processor, financial information from your accounting system, and the activity data your customers generate within your product. This is a lot easier to do if your data is located in one central location than if you were to go to three separate places for analysis.
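To make this concrete, here is a minimal sketch of what "one central location" buys you. It is only an illustration: an in-memory SQLite database stands in for the warehouse, and the table names (`payments`, `invoices`, `activity`), columns, and sample rows are all made up for this example rather than taken from any real system.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the warehouse.
# Each table stands in for data loaded from one (hypothetical) source system.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE payments (customer_id INTEGER, amount REAL);      -- card processor
    CREATE TABLE invoices (customer_id INTEGER, outstanding REAL); -- accounting system
    CREATE TABLE activity (customer_id INTEGER, event TEXT);       -- product events

    INSERT INTO payments VALUES (1, 120.0), (1, 80.0), (2, 40.0);
    INSERT INTO invoices VALUES (1, 0.0), (2, 15.0);
    INSERT INTO activity VALUES (1, 'login'), (1, 'export'), (2, 'login');
""")

# Aggregate each source on its own first, then join the three results,
# so that rows from one source do not multiply rows from another.
top_customers = warehouse.execute("""
    SELECT p.customer_id, p.total_paid, i.total_outstanding, a.event_count
    FROM (SELECT customer_id, SUM(amount) AS total_paid
          FROM payments GROUP BY customer_id) AS p
    JOIN (SELECT customer_id, SUM(outstanding) AS total_outstanding
          FROM invoices GROUP BY customer_id) AS i
      ON i.customer_id = p.customer_id
    JOIN (SELECT customer_id, COUNT(*) AS event_count
          FROM activity GROUP BY customer_id) AS a
      ON a.customer_id = p.customer_id
    ORDER BY p.total_paid DESC
""").fetchall()

for row in top_customers:
    print(row)
```

Once all three sources sit side by side, the weekly report is a single query. Without a central store, the same answer would mean exporting from three tools and stitching the results together by hand.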
The second reason you should get a data warehouse is if you need to separate your analytical data from your transactional data.
If you collect activity logs or other potentially useful pieces of information in your app, it's probably not a good idea to store this data in your app's database and have your analysts work on the production database directly. Instead, it's a much better idea to purchase a data warehouse — one that's designed for complex querying — and transfer the analytical data there instead. That way, the performance of your app isn't affected by your analytics work.
The third reason you should get a data warehouse is if your original data source is not suitable for querying.
For example, the vast majority of business intelligence (BI) tools do not work well with NoSQL data stores like MongoDB. This means that applications that use MongoDB on the backend need their analytical data to be transferred to a data warehouse, in order for data analysts to work effectively with it.
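One way to picture the mismatch: most BI tools expect flat rows and columns, while document stores hold nested records. The sketch below shows the kind of flattening that typically happens on the way into a warehouse. The `flatten` helper and the sample `order` document are hypothetical, and the helper deliberately ignores arrays, which usually need a table of their own.

```python
def flatten(document, prefix=""):
    """Turn a nested dict into one flat dict with underscore-joined keys."""
    row = {}
    for key, value in document.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into sub-documents, prefixing their keys with the parent key.
            row.update(flatten(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


# Hypothetical document of the kind an app might store in MongoDB.
order = {
    "_id": "o-1001",
    "customer": {"id": 42, "plan": "pro"},
    "total": 99.5,
}

flat_row = flatten(order)
print(flat_row)
```

Each key in `flat_row` maps to one column in a warehouse table, which is the shape that SQL-based tools are built to query.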
Fourth, you should get a data warehouse if you want to increase the performance of your most-used analytical queries.
If your transactional data consists of hundreds of thousands of rows, it's probably a good idea to create summary tables that aggregate that data into a more queryable form. Otherwise, your queries will be incredibly slow, and they will place an unnecessary burden on your database.
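As a rough illustration of a summary table, the sketch below rolls individual orders up into one row per day. SQLite is again only a stand-in for a real warehouse, and the `orders` and `daily_revenue` tables and their contents are invented for this example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 10.0),
        (2, '2024-01-01', 15.0),
        (3, '2024-01-02', 20.0);

    -- The summary table: one pre-aggregated row per day instead of one row per order.
    CREATE TABLE daily_revenue AS
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date;
""")

# Dashboards read the small summary table rather than rescanning every order.
daily = db.execute(
    "SELECT order_date, order_count, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()

for row in daily:
    print(row)
```

In practice the summary table would be rebuilt or appended to on a schedule, so that frequently run reports never have to scan the raw transactional rows.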
These four reasons lead us to our second, follow-up question: if you're convinced that one of the above reasons applies to your company, which data warehouse should you purchase?
Which Data Warehouse Should You Get?
This question turns out to have a simple answer.
What we tell many of our prospective clients to do is to go to one of the three major cloud providers — that is, Google Cloud, Amazon Web Services, or Microsoft Azure — and pick their cloud-based data warehousing solution. It doesn't really matter which it is; they're all decent.
For Google Cloud, this is BigQuery. For AWS, this is Redshift. For Azure, this is Azure SQL Data Warehouse (since rebranded as Azure Synapse Analytics).
To be clear, there are tradeoffs between the three options. But if your company currently uses one of the three platforms, it's probably enough to pick the data warehousing solution for that platform and call it a day.
Now, don't get us wrong: we aren't making this recommendation lightly. The truth is that cloud data warehousing solutions today are simply a lot more advanced than in the past. Just a decade ago, this answer would've involved a very convoluted comparison of various expensive software products.
But that situation has changed radically:
- Modern cloud data warehouses are very powerful, given the development of 'massively parallel processing' (MPP) systems. All three solutions that we recommend above are MPP databases.
- They are extremely cost effective — you only pay for what you use. This stands in stark contrast to data warehousing solutions of the past, which required you to commit thousands of dollars in fees up-front.
- They are compatible with many BI tools — making it easier for you to pick your setup later.
- And, as we've mentioned before — if you're a small company that's getting started in data analytics, it really is a no-brainer to just set up your data warehouse in the cloud in a matter of minutes. You can spend the rest of your time evaluating the BI tools that will operate on top of the warehouse.
Now, how about the other cloud data warehousing solutions out there? Some of our customers are extremely happy users of Snowflake — an MPP data warehousing solution with a focus on sheer performance. We think that Snowflake is an amazing product — but, again, if you're new to data analytics, we think you should keep things simple, at least for now.
After all, when you're just starting out with your data analytics efforts, it's more important for you to deliver business value than it is for you to pick the best possible platform for your company's data needs. You can always rearchitect later, when it becomes clear that your needs are specific to a certain type of warehousing solution.
Last, but not least: if you'd like to purchase a single soup-to-nuts data analytics platform that's designed to grow with your business's needs, we hope you'll consider checking us out over at Holistics. We've helped hundreds of small companies scale up their data analytics practices, from small five-person teams to unicorns with thousands of employees. We think you'll enjoy our pragmatic approach to BI; learn more about us here.