Power to the (Data) People!

Dave Melillo
Dec 19, 2022

So much has changed lately.

In work, life, technology … no facet of our lives is the same as it was just a few years ago, and the world of data is no exception.

In 2021 I wrote a few posts about data platforms and data professionals. Not much has changed drastically since then, but the meta-narrative of what data people do and which tools they use is starting to evolve.

Data professionals are no longer mysterious unicorns. Data engineers and data scientists have clear swim lanes and are no longer mistaken for one another. The days of data analysts performing Herculean tasks to create a simple dashboard are long gone, as data organizations and data infrastructure have scaled out beyond our wildest dreams. Business users are now empowered to make decisions with self-service data tools and embedded AI, rendering much ad hoc analysis obsolete.

While much of this is positive, some of the concepts that enabled data professionals to do amazing things are starting to erode. In this post I call out the core concepts that are coming under fire and offer some alternatives to the path we find ourselves on.

No Platforms

This may seem like a very hypocritical message from someone who published a post titled How To Build a Data Platform, but hear me out:

I am not suggesting that data platforms are bad. I am a firm believer in a composable data platform, meaning that you use best-in-class tools to create a workbench tailored to your needs. Sometimes that means one tool for data streaming, another for batch ETL, and yet another for reverse ETL, all tied together through data orchestration and good architecture.

The freedom that data people have historically had to create their own solutions led to exceptional ingenuity. But as big companies have started to fill the space, I am starting to hear calls for data stack consolidation to reduce redundancy and maximize ROI.

I have a couple of specific gripes with this type of messaging:

  1. Committing to one platform limits your flexibility and creativity. The big platforms solve problems in very specific ways, and while I am sure they all work, their approach might not be the best solution for your situation.
  2. Although many of these big platforms offer free or low-cost versions, they are generally expensive. The data industry has been built on the backs of prominent open source organizations like the Apache Software Foundation, which benefited from crowdsourcing development to anyone with a keyboard. All of this progress goes away if access to world-class data tools is reserved for people working at companies that can afford them.

Don’t get me wrong. Companies like Snowflake and Databricks have revolutionized the data game, and I would not have a career without them. Just remember that there are no free lunches when it comes to commercialized products. If data people are forced to use an all-in-one platform, we will see creativity cease, and our data products will be worse off for it.

Life In The Fast Lane

In the early days of data, people hired as data scientists, data engineers and data analysts were generally given immunity from bureaucracy. This was necessary as the data charter was ever changing, requiring data professionals to have maximum flexibility in order to complete their mission.

Data professionals were allowed to operate in the “fast lane” of organizations, which let them deliver results quickly. This freedom was a major reason the data industry boomed, and until recently it was understood that the risk of allowing data people to operate outside of the norm was worth the ROI.

But this is where I have seen the most change recently: data freedom is being revoked at an alarming rate. This has been a natural response to the growth of the industry and to unique issues like handling PII, but in my opinion the response has been an overcorrection.

Every moment a data person spends overcoming red tape is a moment they aren’t building a data product that generates value. If data people lose the freedom they have enjoyed over the past 10 years, you can expect innovation to wane.

The Best Data People Aren’t Engineers

Every exemplary data professional I have met or hired had one trait in common: they weren’t software engineers. They were business people, scientists, academics and creatives who saw data as a way to improve their situation and deliver on previously unfulfilled goals.

Historically, the data professional has been tasked with getting the job done by any means necessary. This has resulted in creative and brilliant solutions that are built by subject matter experts who are technical enough for the task at hand. But more and more, data professionals are expected to behave, create and work like software engineers.

If the only way for a data person to contribute to data products is through rote engineering workflows, my guess is that you won’t get much out of them. That’s not to say data people should be exempt from standard QA processes. But treating data people like software engineers will shut out the ideas of people who are too busy solving problems and answering questions to follow a rigorous engineering process.

Optimization not Automation

With the continued emergence of AI, people are starting to wonder whether data people will even be necessary in the future. But anyone with experience in ML/AI will tell you that automated answers leave much to be desired, because they lack transparency and context. For example, even with the advances in AI image generation, the robots still can’t get basic things like human fingers right.

AI Images generated by Lensa with interesting hands …

The right person, with the right skills and the right technology, can give you the most contextually accurate answer in a reasonable amount of time. If you increase the number of right people in the room and the effectiveness of the platform, you can expect more accurate answers even faster. If you keep going down that path, sooner or later you’ll start having answers to questions before you even ask them.

That’s why the focus for data people should be on optimization, not automation. Instead of figuring out how NOT to do something (automation), figure out how data people can be enabled to reach their maximum potential (optimization).

Conclusion

There are two words that came up in almost every section of this post:

Creativity and Freedom.

  • If you give data people the freedom to choose which tools they want to use to solve data problems, they will pleasantly surprise you with creative solutions.
  • If you allow data people to cut through red tape, they will continue to deliver data products at scale and speed.
  • If you give data people the freedom to be successful, even if they don’t fit the archetype of a software engineer, they will reward you with creative conclusions.
  • If you stop trying to automate and focus on optimizing, you will empower your data team to be the best they can be.
