Redshift, Snowflake, and the Two Philosophies of Cost

On June 17th, about two months ago, Lars Kamp of Intermix.io published a blog post titled ‘Why we sold intermix.io to private equity in a shifting market’.

The title sounds like a typical startup retrospective (which the post is), but a close read reveals trends that the rest of us in data should pay close attention to. Key to Intermix.io’s fate was the fact that it had bet the house on AWS’s Redshift. In the years that followed, however, changes in the cloud data warehousing space made Redshift’s value proposition less and less attractive. To hear Kamp tell it, he had hitched his entire company to the wrong horse:

Snowflake addressed our core segment — SMBs that use Redshift and were unhappy due to performance issues. Snowflake offered a better product that made the migration from Redshift (the market leader by 10x in terms of number of customers) worthwhile. And so our addressable market cratered within just two quarters.

The fact that Kamp had to write an email to his investors and sell his company off to private equity buyers reflects some underlying market shifts that we should all pay attention to.

Let’s dig in.

Redshift vs Snowflake

Lars Kamp started intermix.io with Paul Lappas in February 2016. A 2018 TechCrunch article covered the startup under the headline ‘Intermix.io looks to help data engineers find their worst bottlenecks’. The article went on to describe the service as:

For any company built on top of machine learning operations, the more data it has, the better it is off — as long as it can keep it all under control. But as more and more information pours in from disparate sources, gets logged in obscure databases and is generally hard (or slow) to query, the process of getting that all into one neat place where a data scientist can actually start running the statistics is quickly running into one of machine learning’s biggest bottlenecks.

That’s a problem Intermix.io and its founders, Paul Lappas and Lars Kamp, hope to solve.

(…)

Intermix.io works in a couple of ways: First, it tags all of that data, giving the service a meta-layer of understanding what does what, and where it goes; second, it taps every input in order to gather metrics on performance and help identify those potential bottlenecks; and lastly, it’s able to track that performance all the way from the query to the thing that ends up on a dashboard somewhere. The idea here is that if, say, some server is about to run out of space somewhere or is showing some performance degradation, that’s going to start showing up in the performance of the actual operations pretty quickly — and needs to be addressed.

To be exact, Intermix.io worked only with Redshift. Kamp and team had surveyed the cloud data warehousing space circa 2016 and concluded that Redshift was the only option they needed to support — a good thing, too, since they had limited engineering resources. (Sharp readers will note that BigQuery was already around at the time, but as Kamp says: “With AWS about 10x the size of GCP at the time, it was a no-brainer to go with Redshift (…) Redshift was the only game in town.”)
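For a sense of what that kind of Redshift-specific monitoring involves, here is a minimal sketch in Python (my own illustration, not Intermix.io’s actual implementation) that pulls query runtimes and table health out of Redshift’s system tables and flags likely bottlenecks. The connection details are placeholders.

```python
# A minimal monitoring sketch, not Intermix.io's actual implementation.
# It queries two real Redshift system tables: stl_query (query history)
# and svv_table_info (table size / vacuum health). Connection details
# below are placeholders.
import psycopg2

SLOW_QUERIES_SQL = """
    SELECT query,
           TRIM(querytxt) AS sql_text,
           DATEDIFF(seconds, starttime, endtime) AS duration_s
    FROM stl_query
    WHERE starttime > DATEADD(hour, -1, GETDATE())
    ORDER BY duration_s DESC
    LIMIT 10;
"""

TABLE_HEALTH_SQL = """
    SELECT "table",
           size AS size_mb,
           unsorted,
           stats_off
    FROM svv_table_info
    ORDER BY unsorted DESC NULLS LAST
    LIMIT 10;
"""


def report(conn_params: dict) -> None:
    """Print the slowest recent queries and the tables most in need of upkeep."""
    with psycopg2.connect(**conn_params) as conn:
        with conn.cursor() as cur:
            cur.execute(SLOW_QUERIES_SQL)
            print("Slowest queries over the last hour:")
            for query_id, sql_text, duration_s in cur.fetchall():
                print(f"  #{query_id}: {duration_s}s  {sql_text[:60]}")

            cur.execute(TABLE_HEALTH_SQL)
            print("Tables most in need of VACUUM / ANALYZE:")
            for table, size_mb, unsorted, stats_off in cur.fetchall():
                print(f"  {table}: {size_mb} MB, {unsorted}% unsorted, stats off by {stats_off}%")


if __name__ == "__main__":
    report({
        "host": "example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
        "port": 5439,
        "dbname": "analytics",       # placeholder
        "user": "monitoring_user",   # placeholder
        "password": "REPLACE_ME",    # placeholder
    })
```

Roughly speaking, a product like Intermix.io layered collection, tagging, and dashboards on top of signals like these; the point here is that every one of those signals is Redshift-specific.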

In late 2016, Snowflake seemed to emerge out of nowhere. Like BigQuery, Snowflake was a massively parallel processing (MPP) columnar data warehouse, but one built on what Snowflake calls a ‘multi-cluster, shared data’ architecture rather than Redshift’s classic shared-nothing design. The practical upshot: compute and storage were decoupled from each other — and, more importantly, could be scaled completely independently of each other.

Kamp notes that Snowflake’s real advantage over Redshift was what he called its ‘serverless’ model, and what other people call ‘elastic scaling’: with Redshift, you had to provision servers and then watch those provisioned servers carefully, because scaling Redshift wasn’t automatic; Snowflake (and BigQuery), by contrast, could invisibly scale up to however much compute or storage it needed to execute your query, without any intervention from you.
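To make the contrast concrete, here is a rough sketch (not a production script) of the kind of babysitting a provisioned Redshift cluster demanded: poll a CloudWatch disk metric and kick off a manual resize when the cluster fills up. The cluster name, the threshold, and the node increment are made-up assumptions; the boto3 calls and the metric name are real. The Snowflake equivalent, shown in the trailing comment, is a one-time warehouse definition.

```python
# A rough sketch of the operational gap, not a production script.
# The cluster id, the 85% threshold, and the "+2 nodes" increment are
# arbitrary assumptions; boto3's redshift/cloudwatch clients and the
# PercentageDiskSpaceUsed metric are real.
from datetime import datetime, timedelta

import boto3

redshift = boto3.client("redshift")
cloudwatch = boto3.client("cloudwatch")

CLUSTER_ID = "analytics-cluster"  # hypothetical cluster name


def disk_usage_pct(cluster_id: str) -> float:
    """Latest disk-usage percentage for the cluster, from CloudWatch."""
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Redshift",
        MetricName="PercentageDiskSpaceUsed",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": cluster_id}],
        StartTime=datetime.utcnow() - timedelta(minutes=15),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return max(p["Average"] for p in points) if points else 0.0


def maybe_resize(cluster_id: str) -> None:
    """If the cluster is nearly full, kick off a (slow, disruptive) resize."""
    if disk_usage_pct(cluster_id) > 85.0:
        current = redshift.describe_clusters(ClusterIdentifier=cluster_id)["Clusters"][0]
        redshift.resize_cluster(
            ClusterIdentifier=cluster_id,
            NumberOfNodes=current["NumberOfNodes"] + 2,  # a human has to pick this number
        )


if __name__ == "__main__":
    maybe_resize(CLUSTER_ID)

# Snowflake, by contrast, is configured once and then scales itself, e.g.:
#
#   CREATE WAREHOUSE reporting_wh
#     WAREHOUSE_SIZE = 'MEDIUM'
#     AUTO_SUSPEND   = 60
#     AUTO_RESUME    = TRUE;
#
# After that, there is no equivalent of the polling loop above to run.
```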

This almost magical quality was what caused Intermix.io’s target market (SMBs who were unhappy with Redshift performance) to shrink. SMBs and long-time Redshift users began switching over to Snowflake. And it’s worth noting that Snowflake took only two years to achieve this — in the post, Kamp outlines in great detail Snowflake’s flawless execution, from launch in 2016 to aggressive growth in 2018.

The Two Philosophies of Cost

Intermix.io’s fate tells us something about the two philosophies of cost that exist in data.

I’ve written about the two philosophies on this blog before, but it’s been interesting to see this dynamic play out in another company’s story. Briefly stated, the two philosophies are:

  1. Pay more for tools that take care of themselves, pay less for people to do the upkeep.
  2. Pay more for people to do the upkeep, avoid expensive auto-scaling tools.

People in the first camp say things like “Hiring is bloody expensive; I don’t want to build out a sizeable data engineering team to baby our stack. Give me simple tools that run themselves.” People in the second camp say things like “Man, these new pay-as-you-go tools are crazy expensive. Better to do it old school: a perfectly tuned, carefully selected, tried-and-true data stack, maintained by a conventional data engineering team, so we can save on long-term operational costs.”
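One way to see why reasonable people land in different camps is a back-of-the-envelope break-even calculation. All the numbers below are hypothetical, chosen purely to show the shape of the trade-off between infrastructure spend and people spend.

```python
# Toy break-even calculation for the two philosophies of cost.
# Every figure here is a made-up assumption for illustration, not a benchmark.

LOADED_ENGINEER_HOURLY = 90  # hypothetical fully-loaded cost of an engineer-hour

# Philosophy 1: pricier elastic warehouse, minimal upkeep.
ELASTIC_BILL_MONTHLY = 12_000   # hypothetical pay-as-you-go bill
ELASTIC_UPKEEP_HOURS = 10       # hours/month spent tuning and babysitting

# Philosophy 2: cheaper provisioned cluster, heavy upkeep.
PROVISIONED_BILL_MONTHLY = 5_000  # hypothetical reserved-node bill
PROVISIONED_UPKEEP_HOURS = 120    # hours/month spent tuning and babysitting


def total_monthly_cost(infra_bill: float, upkeep_hours: float) -> float:
    """Infrastructure bill plus the people-time spent keeping it healthy."""
    return infra_bill + upkeep_hours * LOADED_ENGINEER_HOURLY


pay_for_tools = total_monthly_cost(ELASTIC_BILL_MONTHLY, ELASTIC_UPKEEP_HOURS)
pay_for_people = total_monthly_cost(PROVISIONED_BILL_MONTHLY, PROVISIONED_UPKEEP_HOURS)

print(f"Philosophy 1 (pay for the tool):   ${pay_for_tools:,.0f}/month")   # $12,900
print(f"Philosophy 2 (pay for the people): ${pay_for_people:,.0f}/month")  # $15,800
# With these particular assumptions the 'expensive' elastic option wins,
# but shift the inputs and the answer flips -- which is exactly why both
# camps exist. The hidden variable is how many upkeep hours the tool
# really saves you.
```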

But if we take Intermix.io’s story as an indicator, it appears that the former group is winning out.

Why?

Kamp’s conclusion is that SMBs preferred something that ‘just worked’:

(…) it turns out that the wider data / analytics engineering community does not want to have to tune their databases at all. They do not want to worry about it all, and just have the database itself do the job. And they don’t want to buy an add-on product — they want that functionality as part of the database.

And if that ‘magical’ experience came with a higher price point, so what?

Kamp’s conclusion is a convincing one, and given his experiences, I wouldn’t argue against anything he says. But I think it helps to take a step back and ask ourselves a simple question: what, exactly, is the goal of a data department?

The answer to this question is actually easy to state: the goal of a data department is to put data into the hands of business operators, so that they may make better decisions. Fine-tuning a database in order to deliver that insight is decidedly not part of this goal. Why spend so much time performance-tuning and maintaining your data warehouses when you could better spend that time on higher-value activities? Why set up notifications for your data engineers to make sure Redshift doesn’t trip over itself, when they could be building models that help the business?

In the end, this new — expensive! — paradigm of invisibly scalable architectures that decouple compute from storage seems more in line with the ultimate goal of business intelligence. No question, then, that we’re seeing an industry-wide shift to ‘massively elastic’ solutions.

Intermix.io’s fate may be the canary in the coal mine for this shift. We would do well to pay it close attention.