Data Servicing — A Tale of Three Jobs
This chapter is about delivering data to your users.
There are many people-oriented considerations when it comes to data delivery, especially when compared to the topics we’ve covered in the previous three chapters. For instance, we may talk about how to structure your data team for the organization you’re in, how to push acceptance of metrics and data-oriented thinking within your company, and how to not feel like an English-to-SQL translation monkey whenever your CEO asks for metrics for the nth time.
This makes sense: when you’re delivering data to business people, it helps if we talk a little about the people end of things, not just the data end of things. The end goal of the analytics process is to discover some actionable insight about the business. It is reasonable to think that we should spend at least as much time thinking about the effectiveness of our business intelligence as we do about the infrastructure that goes into delivering it.
Alas, we do not plan to dive into many of these people-oriented topics. Our goal with this book is to give you a high-level overview of analytics stacks — no more, no less. With that goal in mind, this chapter will give you three things:
- It will explain to you certain shifts that have happened in the past three decades, so that you will have a basic historical understanding of the tools you will encounter throughout your career. This will help orient you, and prevent you from feeling lost.
- It will give you a lay-of-the-land view of the entire business intelligence reporting market. This way, you’ll be able to slot the tools into a couple of neat buckets in your head, and evaluate them accordingly.
- It will give you a taste of the evolution of reporting requirements you will see in your own company.
In some of these sections, we will discuss the people side of things, because it is inevitable that we do so. But you should understand that our focus is on giving you a lay-of-the-land orientation on this topic, and that you may find our discussion of the people side of things a little lacking. This is by design — a proper treatment of the people element would take a full book to tackle adequately!
Anyway, with that out of the way, let’s get started.
A Tale of Three Jobs
In order to understand the current landscape of Business Intelligence tools, it helps to have a rudimentary understanding of the history of BI tools. To illustrate this, we’ll return to our fictional data analyst Daniel from Chapter 3, and give you a rundown of all that he has experienced over the course of his career.
Daniel’s First Job: Cognos and the Eternal Wait
When Daniel first started in business intelligence, he landed a job at a multinational banking and financial services corporation named the Divine People’s Bank. ‘Multinational banking and financial services corporation’ is a long-winded way of saying that, yes, DPB was a bank, with all the normal trappings and regulatory requirements of a bank. Daniel started as a data analyst in the consumer banking division. He wore a shirt and tie to work.
Daniel’s day-to-day job was to deliver various performance reports to his manager, one of the VPs in the consumer banking arm. Above Daniel’s manager was the head of consumer banking at DPB. In 1996, when Daniel was hired, the head of consumer banking was rumored to be a real taskmaster, and remarkably data-driven for his time.
Daniel hated his job.
Much of DPB’s data was stored in a central data warehouse. This was a set of massive relational databases (RDBMSs) that had been bought in a wave of digitization that DPB underwent in the late 80s. Daniel didn’t know what these databases were — in fact, he never interacted directly with them. Instead, an ‘IT Services’ team was assigned to him, and he interacted primarily with Cognos — at the time, one of the most dominant business intelligence tools on the market.
A typical day would look like this: Daniel’s boss would ask for a set of numbers, and Daniel would go to his Cognos system to check the PowerPlay cubes that were available to him. Most of the time, the data would be in one or more cubes that had already been built by the ‘IT Services’ team (the cube would be built with a subset of the main data in the data warehouse). Daniel would point his PowerPlay client to the cubes he wanted on the bank’s internal Cognos server, then slice and dice the data within the cube to extract the desired numbers for his boss. Most of the time, this went out in the form of an Excel spreadsheet — because Daniel’s boss would want to do some additional analysis of his own.
(Note to less-technical readers: OLAP cubes, or data cubes, are efficient data structures optimized for data analysis. In the era that Daniel operated in, such cubes could not be very large, because many operations were done in memory; in fact, the maximum size of a Cognos PowerPlay cube today remains 6GB, due to internal limitations built into the data structure.)
The problems with Daniel’s job emerged whenever Daniel’s boss asked for numbers that he didn’t have access to. Whenever that happened, Daniel would have to start a process which he quickly learned to hate. The process went something like this:
- Verify that none of the cubes he currently had access to contained the numbers he needed to generate the report for his boss.
- Contact the IT Services department with a work order. Within the work order, Daniel would input his request for new data to be added to an existing cube (or materialized into a new cube). This work order would then be sent to a central enterprise resource planning system, and would count as an instance of inter-departmental resource usage; at the end of the month, a certain dollar amount would be taken out of Daniel’s department’s budget and marked as a payment to the IT Services department.
- Wait three weeks.
- At the end of three weeks, Daniel would be notified that the work order had been processed, and that his new data was waiting for him within the Cognos system. He might have to wait a few hours for the data to be refreshed, because Cognos Transformer servers took four hours on average to build a new PowerPlay cube.
- If Daniel had made any mistake in his request, or left any ambiguity, he would have to go back to step 2 and start over.
Naturally, Daniel had to obsess over his work orders. A single bad request was incredibly costly, because his boss would be expecting numbers by the end of the reporting period. Daniel lived in constant fear that the IT Services department would assign him a dim-witted data engineer; he also felt helpless that he had to rely on someone else to give him the resources he needed to do his job well.
What made things worse was when Daniel’s boss’s boss (yes, he of the fearsome data-driven reputation) went on a data-requesting spree that crowded out the requests of the other data analysts in Daniel’s department. During such periods, both the data analysts and the IT Services department would prioritize the big boss’s requests, leaving Daniel to fight over leftover resources at the services scrap table. It was during times like these that he was most likely to be assigned a dim-witted data engineer; over the course of a few years, Daniel learned to be SUPER careful with his work order requests whenever the big boss went on one of his sprees.
Eventually, Daniel rose high enough in the ranks to count himself a senior data analyst. After 10 years at DPB, he left.
Daniel’s Second Job: The Metrics Knife Fight
In 2006, Daniel joined an early video streaming company named YouDoo. YouDoo had a slightly updated business intelligence stack compared to the Divine People’s Bank — they used Microsoft SQL Server as the basis for their datastore, built cubes in Microsoft SQL Server Analysis Services (or SSAS), and then fed data extracts from these systems to Tableau Desktop. Daniel also stopped wearing a tie to work.
At YouDoo, Daniel reported directly to the head of data, a veteran of the Cognos-style paradigm named Joe. “The goal here,” said Joe, when Daniel came in on his first day of work, “the goal here is to give the business users and the PMs direct access to the company’s data. If we can get them to do self-service, we’ll have less busy work to do!”
Daniel thought back to all the hell he went through in his previous job, and agreed that this sounded like a good idea.
Tableau Desktop was, and still is, a beautiful piece of business intelligence software. It worked in the following manner: you would pull data out of your SQL database and dump it into a copy of Tableau running on your desktop machine. You would pull in Excel spreadsheets and dump them into Tableau. You would pull in CSV files — sent to you by the data team — and dump them into Tableau.
Occasionally — though with a little more trepidation — you would connect Tableau to an OLAP cube, or directly to an SQL database itself.
Then, you would use Tableau to create beautiful, beautiful visualizations for the company to consume. These would come in the form of colorful heatmaps, slick graphs, and shiny bar charts, delivered straight from the hands of the Tableau user to the business people in the company. The best bit about this was that Tableau was completely drag-and-drop. This meant that non-technical business users could learn to use Tableau and — assuming they got the right data extracts from the data team — could come up with fancy graphs for the rest of the company to consume.
From his time at YouDoo, Daniel learned that Tableau was essentially the best tool in a new breed of BI tools, all of which represented a new approach to analytics. This new approach assumed that the data team’s job was to prepare data and make it available to business users. Then, capable business users could learn and use intuitive tools like Tableau to generate all the reports they needed.
But then came the problems.
It was six months into Daniel’s tenure at YouDoo that he was first dragged into a metrics knife fight. Apparently, marketing and sales had been at loggerheads over something numbers-related for a couple of weeks. Daniel and Joe were booked for a meeting with the respective heads of sales and marketing. They learned quickly that marketing’s numbers (presented in a beautiful Tableau visualization, natch) didn’t match sales’s. Sales had exported their prospects from the same data sources as marketing — so what was going on?
Daniel dug into the data that week. Over the course of a few hours, he realized that marketing was using a subtly different formula to calculate their numbers. Sales was using the right definitions for this particular dispute — but they, too, had made subtle errors in a few other metrics. To his dawning horror, Daniel realized that multiple business departments had defined the same metrics in slightly different ways … and that there was no company-wide standardization of metric definitions.
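To make this kind of mismatch concrete, here is a hypothetical sketch of how two departments might each compute a ‘trial-to-paid conversion rate’ from the same table and get different answers. The table and column names are invented purely for illustration; this is not the actual dispute at YouDoo.

```sql
-- Hypothetical illustration of metrics drift: both queries claim to report
-- "trial-to-paid conversion rate" from the same (invented) signups table.

-- Marketing's version: every signup counts in the denominator,
-- including internal test accounts.
SELECT
    COUNT(CASE WHEN plan = 'paid' THEN 1 END) * 1.0 / COUNT(*) AS conversion_rate
FROM signups
WHERE signup_date >= '2007-01-01';

-- Sales's version: internal test accounts are excluded, and only signups
-- that finished onboarding count in the denominator.
SELECT
    COUNT(CASE WHEN plan = 'paid' THEN 1 END) * 1.0 / COUNT(*) AS conversion_rate
FROM signups
WHERE signup_date >= '2007-01-01'
  AND account_type <> 'internal_test'
  AND onboarding_completed_at IS NOT NULL;
```

Each query is defensible on its own; the problem is that two teams present different numbers under the same metric name, and nobody notices until the numbers collide in a meeting.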
Daniel alerted Joe. Joe alerted the CEO. And the CEO called them into his office and exploded at both of them, because he had just presented the wrong numbers to the board in a quarterly meeting that had concluded the previous week. Daniel and Joe were forced to work overtime that day to get the numbers right. The CEO had to update the board members with a follow-up email, and then he issued a directive that all metric definitions were to be stored in a central location, to be maintained by the business intelligence team.
Daniel realized that this new Tableau workflow may have solved some problems … but it led to others as well.
Daniel’s Third Job: The Data Modeling Layer
Eight years later, in 2014, Daniel left YouDoo to work at a mid-stage startup named PropertyHubz. PropertyHubz used a relatively new Amazon cloud data warehouse called Redshift, along with a (then) two-year-old business intelligence tool named Looker. Along with the new stack, Daniel made other changes to his life: he dropped the shirt from his dress code entirely and came into work in a polo tee and pants.
Looker was amazing. Unlike the Cognos workflow Daniel started in, or the Tableau workflow he grappled with at his previous company, Looker assumed a completely different approach to data analytics. At PropertyHubz, Looker was hooked up to Redshift, and ran SQL queries directly against the database. Daniel’s team of data analysts spent most of their time creating data models in LookML, Looker’s proprietary data modeling language. They then handed those models off to less-technical members of the organization to turn into dashboards, reports, and self-service interfaces.
Daniel could immediately see the benefits of building with this workflow. Unlike in the Tableau paradigm, business logic was written once — by the data team — in the data modeling layer. These models were then recombined by other analysts and by non-technical business users to produce the reports and exploratory interfaces the business needed. Daniel thought this was a step up, because it sidestepped all the metrics drift they had fought so hard to contain at YouDoo. But Daniel was also wary. He had seen enough gotchas in his career to know that nothing was perfect.
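As a rough sketch of what ‘write business logic once’ looks like in practice, here is the same idea expressed as a plain SQL view rather than LookML (which expresses it in its own syntax). The table and column names are invented for illustration, not taken from PropertyHubz.

```sql
-- A minimal sketch of a data modeling layer, using a plain SQL view as a
-- stand-in for a LookML model. Table and column names are hypothetical.

-- The data team defines "monthly revenue" exactly once.
CREATE VIEW model_monthly_revenue AS
SELECT
    DATE_TRUNC('month', sold_at) AS month,
    region,
    SUM(sale_price * commission_rate) AS revenue
FROM listings
WHERE status = 'sold'
GROUP BY 1, 2;

-- Analysts and business users then build their reports on the model,
-- not on the raw tables, so every report shares the same definition.
SELECT month, SUM(revenue) AS total_revenue
FROM model_monthly_revenue
GROUP BY month
ORDER BY month;
```

The design point is that the metric logic lives in one place: if the definition of revenue changes, it changes once in the model, and every downstream dashboard picks it up.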
At this point in our story, Daniel had spent 18 years in business intelligence. He had been scarred by the pain of waiting on IT Services in his first job, in the mid-90s. He had spent many years grappling with metrics drift in his second, in the mid-2000s. As Daniel settled into his new role at PropertyHubz, he looked back on all that he had experienced and thought that the new tools were definitely better — and easier! — than the old tools. He was eager to see what new problems this new paradigm would bring — and, in turn, what the next paradigm would do to solve them.
Frank Bien’s Three Waves
In 2017, Looker CEO Frank Bien wrote a piece titled Catching the Third Wave of Business Intelligence. In it, he described exactly the three waves that we’ve just illustrated for you, above.
In his essay, Bien writes in a matter-of-fact way, as if each successive approach to BI arrived and receded as surely as the tide. He — and the team at Looker — deserve much credit for creating the third wave of business intelligence. But the waves aren’t as clearly delineated as real waves on a beach. The truth is more complicated.
Let’s recap Bien’s argument, before picking apart the parts that don’t fully match reality. Those parts have real implications for how you view the business intelligence tool landscape today.
In the first wave of data analytics, companies developed and sold monolithic stacks — that is, all-in-one solutions that came with a data warehouse, a data transformation tool, a data cube solution, and a visualization suite. This approach evolved out of technical necessity as much as anything else.
What do we mean by this? Well, it is very easy to forget just how expensive hardware was in the 90s. In 1993, for instance, 1GB of RAM cost around $32,300 — an insane amount of money for what seems like a piddling amount of memory today! Thus, the BI tools of the Cognos era had no choice but to have data engineers take subsets of data and build them out into cubes: data warehouses were simply too expensive and too slow to be used for day-to-day analysis.
Naturally, this caused the analysts of that generation to bottleneck on data engineers, who were called on to build data cubes in response to business demands. The pains that Daniel experienced in his first job led to the development of the ‘self-service’-oriented tools like Tableau that he experienced in his second.
In this ‘second wave’ of business intelligence, data cubes and Cognos-like stacks continued to evolve, but new tools championed a ‘self-service’ orientation. Tools like Tableau gave business users beautiful dashboards and visualizations with not a line of code in sight. These tools were in turn fed by data exports drawn from the earlier first-wave environment. The basic idea was that analysts and business users would download datasets from these central data systems, and then load these datasets into tools that they could install on their own computers.
Of course, as we’ve seen from Daniel’s story, these tools came with their own set of problems.
It’s important to note that even as these ‘second-wave’ tools came into prominence, the Cognos-type first-wave environments continued to gain ground within large corporations. It wasn’t as if people adopted Tableau, and then the monolithic workflows went away. In fact, Cognos still exists today, albeit under IBM’s umbrella of business intelligence tools. (Even the PowerPlay cubes that Daniel used at the beginning of his career are still part of the product!)
When Bien talks about the ‘second wave’ emerging, it’s important to understand that reality is messier than the picture he paints. In our story with Daniel, for instance, his bank — like many other Cognos clients — continued to use and expand its usage of the product in the years after he left. Similar tools that emerged in competition with Cognos, like Microsoft’s SSAS suite, may rightly be considered first-wave or second-wave, but are still going strong in large enterprises today.
But some things have changed.
In the past decade, two major technological breakthroughs have shifted the landscape yet again:
- Massively parallel processing (MPP) data warehouses began to be a thing, and
- Columnar datastores began to match OLAP cubes in analytical performance.
The first is easy to understand: MPP data warehouses are data warehouses that are not limited to a single machine. They can instead scale out to hundreds or thousands of machines, as needed for the task at hand. These data warehouses are often also coupled with another innovation — the cloud vendor pricing model. Businesses today pay only for the storage and the computing power they use: no more, no less.
The second breakthrough is only slightly more difficult to understand. Generally speaking, data warehouses of the past adopted a row-oriented relational database architecture. This architecture was not well suited to analytical workloads, because analytical workloads required rollups and aggregations over thousands of rows. This was the main reason early BI vendors opted to slice off a small portion of the data and load it into efficient data cubes, instead of running analytical queries inside the databases themselves.
In recent years, however, data warehouses have adopted what is called a columnar storage architecture. These columnar databases are built for analytical workloads, and are finally comparable in performance to data cube solutions.
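To see why column orientation matters, consider a typical analytical rollup over a hypothetical, wide orders table (the schema here is invented for illustration). The query touches only two of the table’s many columns, so a columnar engine can read just those columns off disk and skip the rest, whereas a row-oriented engine must scan every full row.

```sql
-- A typical analytical rollup over a hypothetical, wide `orders` table.
-- A row-oriented database reads every column of every row to answer this;
-- a columnar database reads only `order_date` and `amount`, which is why
-- it can approach cube-like performance on aggregations.
SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS total_sales
FROM orders
GROUP BY 1
ORDER BY 1;
```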
This doesn’t mean that the cube-oriented systems and the decentralized Tableau-type analytical workflows have vanished from our industry. In fact, many companies have doubled down on their investments in earlier generations of these tools. They have layered on more tools, or have had departments add additional tools from different waves in order to augment their existing BI capabilities.
But for new companies — and large tech companies like Google, Uber, Facebook and Amazon — business intelligence that is implemented today is often built entirely within the third wave. This is the viewpoint that this book has attempted to present.
In a sentence: modern data warehouses have finally become cheap enough and powerful enough to stand on their own. Looker was the first BI vendor to realize this. They built their entire offering around the MPP data warehouse … and we’ve never looked back.
The Major Takeaways
We have presented Daniel’s story in narrative form because we think it captures some of the nuances that are lost in Bien’s presentation. Daniel is, of course, not real. But his story is a pastiche of real events and real problems that were taken from analysts we know. The pains that Daniel felt were real pains experienced by thousands of analysts in the previous decade.
Why is this important to know? It is important because the ideas that BI tools adopt are more important to understand than the tools themselves. As we walked you through Daniel’s story, and then Bien’s argument, three trends emerged:
- First, approaches in business intelligence tools are limited by the technology of the day.
- Second, approaches in business intelligence are often reactions to pains in the previous generation of tools.
- Third, as we’ve mentioned in the previous section on Frank Bien’s essay: each generation sticks around for a long, long time.
In a very particular sense, the business intelligence world is confusing today for that third reason: tools and approaches stick around for a long time. A new purchaser in the market would be confused by the mix of terminologies, ideas, architectural diagrams and approaches available to her. Daniel’s story should help explain why that is: many of these tools were developed in successive generations, yet co-exist uncomfortably today.
In the next section of this chapter, we will give you a taxonomy of business intelligence tools — that is, categories or buckets to lump things in. Many of these tools will reflect the ideas that we have presented here. Others are recombinations of old ideas, applied to new paradigms. But it is important to understand that all of them — Holistics included — are shaped by the three factors above.
The insight we want to give you here is that ideas and approaches change more slowly than tools do. If you understand this, it will be easier to evaluate a new BI tool when it comes to the market. You will be equipped to cut through all the industry hype, the noisy whitepapers, the expensive conferences, the industry jargon, the breathless Medium posts and the shouty vendor presentations. You will be able to find the signal in the noise.