Ventana Research Analyst Perspectives

DataOps Buyers Guide: Market Observations

Written by Matt Aslett | Oct 16, 2023 10:00:00 AM

The 2023 Ventana Research Buyers Guide for DataOps enables me to provide observations about how the market has advanced.

Data Operations (DataOps) is a methodology focused on the delivery of agile business intelligence (BI) and data science through the automation and orchestration of data integration and processing pipelines, incorporating improved data reliability and integrity via data monitoring and observability. DataOps has been part of the lexicon of the data market for almost a decade and takes inspiration from DevOps, which describes a set of tools, practices and philosophy used to support the continuous delivery of software applications in the face of constant changes.  

Interest in DataOps is growing. Ventana Research asserts that by 2025, one-half of organizations will have adopted a DataOps approach to their data engineering processes, enabling them to be more flexible and agile. A variety of products, practices and processes enable DataOps, including products that support agile and continuous delivery of data analytics and continuous measurable improvement. An emphasis on agility, collaboration and automation separates DataOps from traditional approaches to data management, which were typically based on tools and practices that were batch-based, manual and rigid.  

This distinction between DataOps and traditional data management tools is clearer in theory than it is in practice. There is a level of opacity as traditional data management vendors have, in recent years, incorporated capabilities that make their products more automated, collaborative and agile. There is no industry-wide consensus on the level of agility, collaboration and automation that must be provided for products to be considered part of the DataOps category. While traditional data management vendors have also adopted the term DataOps, many have adopted a broader definition that describes DataOps as the combination of people, process and technology needed to automate the delivery of data to users in an organization and enable collaboration to facilitate data-driven decisions. This definition is broad enough that it could be interpreted to encompass all products and services that address data management and data governance, including many traditional batch-based, manual products that do not support agile and continuous delivery and continuous measurable improvement.

A narrower definition of DataOps focuses on the practical application of agile development, DevOps and lean manufacturing to the tasks and skills employed by data engineering professionals in support of data analytics development and operations. This definition emphasizes specific capabilities such as continuous delivery of analytic insight, process simplification, code generation, automation to avoid repeated errors and reduce repetitive tasks, the incorporation of stakeholder feedback and advancement, and measurable improvement in the efficient generation of insight from data. As such, the narrow definition of DataOps provides a set of criteria for agile and collaborative practices that products and services can be measured against. 

Ventana Research’s perspective, based on our interaction with the vendor and user communities, aligns with the narrow definition. While traditional data management and data governance are complementary, our DataOps coverage focuses specifically on the delivery of agile BI and data science through the automation and orchestration of data integration and processing pipelines, incorporating improved data reliability and integrity via data monitoring and observability.  

To be more specific, we believe that DataOps products and services provide functionality that addresses a particular set of capabilities: agile and collaborative data operations; the development, testing and deployment of data and analytics pipelines; data orchestration; and data observability. These are the key criteria we used to assess DataOps products and services as part of this Buyers Guide. This research comprises parallel evaluations of products addressing each of the three core areas of functionality: data pipelines, data orchestration and data observability. Vendors with products that address at least two of these three core areas were deemed to provide a superset of functionality addressing DataOps overall. Additionally, we evaluated all products in all categories in relation to their support for agile and collaborative practices.

The development, testing and deployment of data pipelines is essential to generating intelligence from data, ensuring that data is integrated and processed in the correct sequence to deliver the required insight. Just as a physical pipeline is used to transport water between stages in the generation of hydroelectric power, data pipelines are used to transport data between the stages involved in data processing and analytics to generate business insight. The transportation of data has traditionally been a batch process that moves data from one environment to another. However, data-driven organizations are increasingly thinking of the steps involved in extracting, integrating, aggregating, preparing, transforming and loading data as a continual process that is orchestrated to facilitate data-driven analytics. We assert that by 2026, three-quarters of organizations will adopt data engineering processes that span data integration, transformation and preparation, producing repeatable data pipelines that create more agile information architectures.
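For illustration only, the sketch below shows how such a pipeline might be structured as discrete, repeatable extract, transform and load stages. The file names, fields and cleansing rules are hypothetical examples, not any vendor's implementation.

```python
# A minimal sketch of a data pipeline as discrete, repeatable stages.
# File names, field names and transformation rules are hypothetical.
import csv

def extract(path: str) -> list[dict]:
    """Read raw records from a CSV source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(records: list[dict]) -> list[dict]:
    """Standardize fields and drop incomplete rows."""
    cleaned = []
    for row in records:
        if not row.get("customer_id"):
            continue  # incomplete record: exclude from the load
        row["region"] = (row.get("region") or "").strip().upper()
        cleaned.append(row)
    return cleaned

def load(records: list[dict], path: str) -> None:
    """Write processed records to the analytics-ready destination."""
    if not records:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)

if __name__ == "__main__":
    # Each stage is a separate, testable step that can be rerun on demand,
    # in contrast to a single monolithic batch job.
    load(transform(extract("raw_customers.csv")), "clean_customers.csv")
```

Treating each stage as a separate, testable function is what makes the pipeline repeatable: any step can be rerun, replaced or monitored independently as requirements change.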

Data orchestration provides the capabilities to automate and accelerate the flow of data from multiple sources to support analytics initiatives and drive business value. At the highest level of abstraction, data orchestration covers three key capabilities: collection (including data ingestion, preparation and cleansing); transformation (additionally including integration and enrichment); and activation (making the results available to compute engines, analytics and data science tools or operational applications). By 2026, more than one-half of organizations will adopt data orchestration technologies to automate and coordinate data workflows and increase efficiency and agility in data and analytics projects. 
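The sketch below illustrates the orchestration concept in miniature, using Python's standard-library graphlib to run hypothetical collection, transformation and activation tasks in dependency order. A commercial orchestration product would add scheduling, retries, monitoring and alerting on top of this basic pattern.

```python
# A simplified sketch of data orchestration: tasks declare their
# dependencies and a scheduler runs them in a valid order.
# Task names and bodies are hypothetical placeholders.
from graphlib import TopologicalSorter

def ingest():    print("collect: ingest raw data from sources")
def cleanse():   print("collect: prepare and cleanse records")
def integrate(): print("transform: integrate and enrich data")
def activate():  print("activate: publish results to analytics tools")

TASKS = {"ingest": ingest, "cleanse": cleanse,
         "integrate": integrate, "activate": activate}

# Each task maps to the set of tasks that must complete before it runs.
DEPENDENCIES = {
    "ingest": set(),
    "cleanse": {"ingest"},
    "integrate": {"cleanse"},
    "activate": {"integrate"},
}

for name in TopologicalSorter(DEPENDENCIES).static_order():
    TASKS[name]()  # a real orchestrator adds retries, schedules and alerts
```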

Meanwhile, the need to monitor the pipelines and processes in data processing and analytics environments has driven the emergence of a new category of software: data observability. Monitoring the quality and reliability of data used for analytics and governance projects is not new, but data pipeline observability utilizes machine learning (ML) to automate the monitoring of data to ensure that it is complete, valid and consistent, as well as relevant and free from duplication. Data pipeline observability also monitors not just the data stored in an individual data warehouse or data lake, but also the associated upstream and downstream data pipelines. Through 2025, data observability will continue to be a priority for the evolution of data operations products as vendors deliver more automated approaches to data engineering and improve trust in enterprise data.
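To illustrate the kinds of checks such products automate, the sketch below applies simple completeness, validity, duplication and volume-anomaly tests to a batch of records. The field names and the three-sigma threshold are hypothetical, and commercial data observability products typically learn such thresholds with ML rather than hard-coding them.

```python
# An illustrative sketch of basic data observability checks:
# completeness, validity, duplication and an anomaly test on row volume.
# Field names and thresholds are hypothetical stand-ins.
from statistics import mean, stdev

def check_batch(rows: list[dict], history: list[int]) -> list[str]:
    issues = []
    # Completeness: required fields must be populated.
    missing = sum(1 for r in rows if not r.get("order_id"))
    if missing:
        issues.append(f"{missing} rows missing order_id")
    # Validity: amounts must be non-negative numbers.
    invalid = sum(1 for r in rows if float(r.get("amount", 0)) < 0)
    if invalid:
        issues.append(f"{invalid} rows with negative amount")
    # Duplication: primary keys must be unique.
    ids = [r["order_id"] for r in rows if r.get("order_id")]
    if len(ids) != len(set(ids)):
        issues.append("duplicate order_id values detected")
    # Volume anomaly: flag batches far outside the historical norm.
    if len(history) >= 2:
        mu, sigma = mean(history), stdev(history)
        if sigma and abs(len(rows) - mu) > 3 * sigma:
            issues.append(f"row count {len(rows)} deviates from norm {mu:.0f}")
    return issues
```

The point of the volume check is the observability shift the paragraph describes: rather than validating only the stored data, the tool compares each pipeline run against the history of upstream runs to catch problems before they reach downstream consumers.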

In combination, data orchestration and data observability products address two of the most significant impediments to generating value from data. Participants in Ventana Research’s Analytics and Data Benchmark Research cite preparing data for analysis (69%) and reviewing data for quality and consistency issues (64%) as the two most time-consuming tasks in analyzing data.  

As always, however, products are only one aspect of delivering on the promise of DataOps. New approaches to people, process and information are also required to deliver agile and collaborative development, testing and deployment of data and analytics workloads, as well as data operations. To improve the value that they are generating from their analytics and data initiatives, organizations need to investigate the potential benefits of data pipeline development, data orchestration and data observability products alongside processes and methodologies that support rapid innovation and experimentation, automation, collaboration, measurement and monitoring, and high data quality.  

This research evaluates the following vendors that offer products that address at least two of the three core areas of DataOps functionality (data pipeline development, testing and deployment; data pipeline orchestration; and data pipeline observability): Alteryx, AWS, Astronomer, BMC, Databricks, DataKitchen, Google, Hitachi Vantara, IBM, Infoworks, Matillion, Prefect, Rivery, SAP, Stonebranch, StreamSets and Y42. 

You can find more details on our site as well as in the Buyers Guide Market Report. 

Regards,

Matt Aslett