Why is Data Analytics So Far Behind Software Engineering?
There are no answers in this piece, only two anecdotes in the service of a single question.
Onboarding in a Hot Local Startup
I was talking to a software engineer friend a few days ago. He had been watching his partner set up at her new job as a data analyst.
“The handover was pretty terrible!” he said, wringing his hands. “They literally gave her a .txt file with a bunch of SQL queries. No comments! Hardcoded dates for ‘past 30 days’! And this was at <name of well-known local eCommerce startup>!”
I laughed. “That’s actually pretty normal”, I said. “Did she have a mentor?”
“Well, yes”, my friend said. “She was assigned one, and the ramp up was pretty gentle.”
“Then that’s not too bad.”
I explained to him that data departments were different from software engineering departments.
“They’re behind on process and tooling when compared to programmers. I’m not entirely sure why. I’ll give you an example: in data, it’s considered a novel idea to throw everything into version control.”
Now it was his turn to laugh. “You can’t be serious!”
“Well, ask your partner — was the txt file in git? Was she given credentials to some central repository?”
And indeed it wasn’t, and she hadn’t been.
Palantir and Everyone Else
A few days later I was talking to another friend, this time about the Snowflake IPO. The data warehousing company seemed to have captured the attention of a great many people in the financial and tech press; it was certainly the most talked-about tech IPO in recent memory. (For more on the hype, check out this Twitter thread.)
My friend had spent a good number of years at Palantir. And he said, effectively: “All of this talk about Snowflake, when what I really want is Foundry.”
“What’s Foundry?” I asked.
“It’s Palantir’s data thing. Here”, he said, sending me a link.
I looked through, and came back with lots of questions. He gamely answered them, like I was some noob stuck in the outside world.
My friend explained to me that Foundry did everything: ETL, transformations, lineage, a metadata hub, data quality, visualizations, storage. You name it; they'd built it.
“What data store does it use?”
“Multiple”, my friend said.
“So you’re telling me that Palantir has an all-in-one tool, with everything in it, and that you guys have been selling this for years?”
“Well … yes.”
Why is Data Analytics So Behind?
Why is data analytics behind software engineering? Why are the processes and tools in software engineering a lot further along when compared to business intelligence?
I bring up the two anecdotes above because I think they highlight some interesting ideas. For instance, if I were an outsider, I would expect Palantir's product to be the norm for business intelligence tooling. Instead, it turns out to be an exception.
The truth is that I don’t have a good answer to this. None of us do. When we kick back and philosophise about data at Holistics, we often wonder at this discrepancy ourselves.
The difference between data and software engineering is especially clear when we hop between data analytics and product development in our day-to-day at the office. When working with data, we grapple with less mature, less widely accepted best practices. (We’ve made our views on the history of data modeling best practices quite clear: we think they’re all pretty out of date.) When we do product development on the Holistics platform, however, we use a 10-year-old development methodology and trust that it kinda works. Working between the two domains often feels like crossing the border between two countries with very different GDPs.
Software engineering has had a plethora of development approaches over the past two decades. The Agile Manifesto was penned in 2001. GitHub was created in 2008. Today, we know more or less what works and what doesn’t. The same cannot be said for data.
My working theory is this: I think that the development of a domain is correlated with the amount of money thrown at it. Software engineering has seen more money spent on it than data analytics has. Or, more accurately, software engineering is sometimes seen as a profit center, whereas data is nearly always seen as a cost center. The incentives drive the innovation.
This doesn’t explain everything, of course. Data literacy is probably another reason for this difference: most organizations find it easier to take on software engineering (we build some software in-house now!) than they do data-driven thinking (we have to get everyone to use data for their decisions). I suspect one reason Palantir could build its tools to the degree it did is that it was selling into intelligence agencies and military orgs with an existing data culture. These organizations were highly motivated to use data to accomplish their jobs, and were willing to pay big bucks for it. In most companies, by contrast, data literacy and data use aren’t yet as big a thing.
As data scientist Randy Au notes, on being laid off twice in the past:
Being let go just meant the organization was willing to fly blind without detailed analytics insights for a period of time. That risk of making bad decisions (that can always be fixed with a patch) wasn’t worth my salary when a manager somewhere needed to hit a cut quota.
How many businesses would be willing to make that call? I'd imagine quite a few.
Of course, I’m not entirely sure that these reasons are the truth. Others at Holistics have their pet theories. You probably have some pet theories of your own.
What I know is this: if there’s a will to spend, then there should be more than enough incentive to innovate. Fortunately for us, if the Snowflake IPO is any indication, things look good for the data world ahead.