Avoiding the Complexity and Cost of the Modern Data Stack

Written by Matt Aslett | Dec 19, 2023 11:00:00 AM

I recently discussed how fashion has a surprisingly significant role to play in the data market as various architectural approaches to data storage and processing take turns enjoying a phase in the limelight. Pendulum swing is a theory of fashion that describes the periodic movement of trends between two extremes, such as short and long hemlines or skinny and baggy/flared trousers. Pendulum swing theory is similarly a factor in data technology trends, with an example being the oscillation between the use of best-of-breed tools and consolidated platforms.

For the past decade, the use of best-of-breed tools has been in the ascendancy in the data operations and data management sectors. Data operations focuses on the automation and orchestration of data integration and processing pipelines, while data management focuses on the delivery of data integrity and reliability through data quality and master data management. Evidence of the popularity of best-of-breed tools comes from our recent DataOps Buyer’s Guide research and the popularity of the modern data stack, a loosely defined term used to describe a variety of cloud-based data management and integration tools used in conjunction with cloud data platforms. There are advantages to using an assortment of tools rather than a single monolithic platform, not least the avoidance of becoming over-reliant on one provider. The more tools an organization uses, however, the greater the potential for integration complexity and maintenance overhead. That is one reason we believe the data fashion pendulum may be swinging back toward consolidated platforms.

The origin of the term modern data stack is unclear, but it began to enter common parlance in the data management sector about 10 years ago following the emergence of cloud-native software-as-a-service products to address data warehousing, data integration (ETL/ELT) and data visualization. While those three categories were the initial focus, the modern data stack has come to include many other categories of products that need to be deployed to deliver end-to-end data processing, including the development, testing and deployment of data pipelines, data orchestration, data observability, data and event streaming, data cataloging/governance and overall DataOps.

An important point to note about the modern data stack is that it is not a single stack of tools at all. A more appropriate name might be the modern data smorgasbord, in that it is more akin to an assortment of tools that could potentially be used in combination to address the various requirements for data management and integration.

Another important point to note is that while some of the tools that are part of the modern data stack are now more than a decade old, it is the ability of users to quickly start using self-service cloud services that defines them as modern, in comparison with more traditional on-premises data management products. The ease of adoption associated with modern data stack tools made it simple for enterprise data engineering teams to pick and choose from the numerous products available in the various categories, accelerating the adoption of new and more agile approaches to data processing and management. I assert that by 2026, three-quarters of organizations will adopt data engineering processes that span data integration, transformation and preparation, producing repeatable data pipelines that create more agile information architectures.

However, the abundance of choice has arguably become a curse rather than a blessing, leading to complexity, cost and confusion. Using seven different tools for data integration, data transformation, data warehousing, data orchestration, data observability, data cataloging and data visualization means seven different interfaces, seven different user experiences, seven different skill sets and seven different license or subscription bills. It also means additional cost and complexity related to the integration and maintenance required to keep multiple components operating together.

The term “stack” implies a neatly arranged, organized and integrated collection of tools. Instead, many modern data stacks could better be described as loosely coupled collections of disparate tools held together with custom coding as well as manual data engineering and management efforts. Using multiple best-of-breed tools carries additional costs in time, effort and money, and this combination of costs is one of the reasons we have seen some recent pushback to the concept of the modern data stack. Having reached the peak of its arc, the pendulum of fashion is swinging back toward the advantages of consolidated platforms, giving rise to the emergence of what some are (not entirely seriously) calling the post-modern data stack. What is the post-modern data stack, aside from an amusing play on words? It is an attempt to deliver the benefits of the modern data stack (cloud-native, self-service agility) without the challenges (complexity and associated costs).

Delivering the benefits of various components of the modern data stack without the complexity is easier said than done, but vendors attempting to do so have the advantage that many, if not most, components of the modern data stack are open-source projects. Rather than creating proprietary versions, vendors can use the available open-source functionality, focusing on providing a single unified interface and interaction layer and integrating the various components with an emphasis on automation. I assert that by 2026, more than three-quarters of organizations’ data management processes will be enhanced with artificial intelligence and machine learning to increase automation, accuracy, agility and speed.

Achieving a balance between the benefit of choice and the challenge of complexity is something of a holy grail. However, combining AI-driven automation with the functionality available across multiple components of the modern data stack will go a long way toward helping organizations realize the advantages of self-service data management cloud services without the cost and maintenance overhead. I recommend that enterprises evaluating multiple tools that form aspects of the modern data stack also consider vendors providing consolidated platforms.

Regards,

Matt Aslett
