Data Observability Buyers Guide: Market Observations

Written by Matt Aslett | Oct 25, 2023 12:00:00 PM

The research behind the 2023 Ventana Research Buyers Guide for Data Observability enables me to provide observations about how the market has advanced.

The need to monitor the pipelines and processes in data-processing and analytics environments has driven the emergence of a new category of software: data observability.

Inspired by the observability platforms that provide an environment for monitoring metrics, traces and logs to track application and infrastructure performance, data observability software provides an environment for monitoring the quality and reliability of data used for analytics and governance projects.

There has been a Cambrian explosion of data observability software vendors in recent years and while each is slightly different, they also have fundamental capabilities in common. To monitor and measure anything, it must first be instrumented, so a baseline requirement for data observability software is that it collects and measures metrics from data pipelines, data warehouses, data lakes and other data-processing platforms.

Data observability software also collects, monitors and measures information on data lineage (dependencies between data), metadata (describing the attributes of the data, such as its age, volume, format, schema), and logs of human- or machine-based interaction with the data. In addition to collecting and monitoring this information, some data observability software also enables the creation of models that can be applied to the various metrics, logs, dependencies and attributes to automate the detection of anomalies.
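
As an illustration of applying a model to collected metrics, the following is a minimal sketch of anomaly detection on a single volume metric. It assumes a simple z-score rule, and the function name, threshold and row counts are hypothetical; commercial products typically use more sophisticated models than this.

```python
from statistics import mean, stdev


def is_volume_anomaly(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` standard
    deviations from the mean of `history` (a simple z-score rule)."""
    if len(history) < 2:
        return False  # not enough observations to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values were identical, so any change is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts collected from a pipeline run log.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_075, 10_130]

print(is_volume_anomaly(daily_row_counts, 10_090))  # within the usual range
print(is_volume_anomaly(daily_row_counts, 2_300))   # sharp drop in volume
```

The same pattern could be applied to other collected attributes, such as data age or null rates, with the threshold tuned per metric.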

Data observability software may also offer root cause analysis and the provision of alerts, explanations and recommendations to enable data engineers and data architects to accelerate the correction of issues.

Data observability addresses one of the most significant impediments to generating value from data. Maintaining data quality and trust is a perennial data management challenge, often preventing organizations from operating at the speed of business. Almost two-thirds (64%) of participants in Ventana Research’s Analytics and Data Benchmark Research cite reviewing data for quality and consistency issues as the most time-consuming task in analyzing data.

The importance of trust in data has arguably never been greater. As organizations aspire to be more data-driven, it is critical to trust the data used to make those decisions. Without data quality processes and tools, organizations may make decisions based on old, incomplete, incorrect or poorly organized data. Assessing the quality of data used to make business decisions is not only more important than ever but also increasingly difficult given the growing range of data sources and the volume of data that needs to be evaluated. Poor data quality processes can result in security and privacy risks as well as unnecessary data storage and processing costs due to data duplication.

Monitoring the quality and reliability of data used for analytics and governance projects is not a new challenge. Data quality software has been extant for decades. Organizations that have made investments in data quality might reasonably ask whether they need data observability, while those that have invested in data observability might wonder whether they can eschew traditional data quality tools.

To understand the difference between data quality and data observability, it is important to recognize that data quality is both a discipline and a product category. As a discipline, data quality refers to the processes, methods and tools used to measure the suitability of a dataset for a specific purpose. The precise measure of suitability will depend on the individual use case, but important characteristics include accuracy, completeness, consistency, timeliness and validity. The data quality product category comprises the tools used to evaluate data in relation to these characteristics.

Data observability, meanwhile, has emerged as a separate product category. It includes software focused on automating the monitoring of data to assess its health based on key attributes including freshness, distribution, volume, schema and lineage.
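
To make one of these attributes concrete, here is a minimal sketch of a schema check that compares an expected schema with the one observed in the latest load. The column names, type labels and the schema_drift helper are assumptions made for illustration and do not reflect any particular vendor's product.

```python
def schema_drift(expected, observed):
    """Compare two {column: type} mappings and report the differences."""
    expected_cols = set(expected)
    observed_cols = set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "retyped": sorted(
            col
            for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }


# Hypothetical schemas for an orders table.
expected = {"order_id": "int", "amount": "float", "placed_at": "timestamp"}
observed = {"order_id": "int", "amount": "str", "channel": "str"}

drift = schema_drift(expected, observed)
print(drift)
# {'added': ['channel'], 'removed': ['placed_at'], 'retyped': ['amount']}
```

Any non-empty entry in the result could be used to raise an alert before downstream consumers are affected.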

The use of automation expands the volume of data that can be monitored. It also improves efficiency compared to manual data monitoring and management, because data quality checks and recommended remediation actions are automated. As such, automation is often cited as a distinction between data observability and data quality software. Focusing on automation as a distinction, however, relies on an outdated view of data quality software.

Although data quality software has historically provided users with an environment to manually check and correct data quality issues, the use of machine learning (ML) to automate the monitoring of data is also being integrated into data quality tools and platforms. Automating data monitoring helps ensure that data is complete, valid and consistent as well as relevant and free from duplication. Automation using ML is not, therefore, a defining characteristic that separates data quality from data observability.

A clearer distinction can be drawn from the scope and focus of the functionality. Data quality software is concerned with the suitability of the data to a given task. In comparison, data observability is concerned with the reliability and health of the overall data environment.

Data observability tools monitor not just the data in an individual environment for a specific purpose at a given point in time, but also the associated upstream and downstream data pipelines. In doing so, data observability software ensures that data is available and up to date, avoiding downtime caused by lost or inaccurate data due to schema changes, system failures or broken data pipelines.
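
The value of monitoring upstream and downstream pipelines can be sketched with a small lineage graph. The example below assumes lineage is available as a mapping from each dataset to its direct downstream dependents; the dataset names and the downstream_of helper are hypothetical.

```python
from collections import deque


def downstream_of(lineage, source):
    """Return every dataset reachable from `source`, in breadth-first order."""
    impacted = []
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in visited:
                visited.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Hypothetical lineage: each dataset maps to its direct dependents.
lineage = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["daily_revenue", "customer_ltv"],
    "daily_revenue": ["exec_dashboard"],
    "customer_ltv": ["exec_dashboard"],
}

impacted = downstream_of(lineage, "raw_orders")
print(impacted)
# ['clean_orders', 'daily_revenue', 'customer_ltv', 'exec_dashboard']
```

If a failure is detected in raw_orders, a traversal of this kind identifies which downstream assets are at risk of serving stale or incorrect data.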

To put it another way, while data quality software is designed to help users identify and resolve data quality problems, data observability software is designed to automate the detection and identification of the causes of data quality problems, potentially enabling users to prevent data quality issues before they occur.

The two are largely complementary. For example, when the data being assessed remains consistent, data quality tools might not detect a failed pipeline until the data has become out of date. Data observability tools could detect the failure long before the data quality issue arises. Conversely, a change of address might not be identified by data observability tools if the new information adhered to the correct schema. It could, however, be detected and remediated using data quality tools.
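
The two scenarios above can be sketched side by side. This is an illustrative example under stated assumptions: the 24-hour freshness limit, the customer records and the trusted reference of current addresses are all hypothetical, as are the function names.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_loaded_at, now, max_age):
    """Observability-style check: has the table been refreshed recently?"""
    return now - last_loaded_at > max_age


def passes_schema(row):
    """Structural check: the expected fields exist with the expected types."""
    return isinstance(row.get("customer_id"), int) and isinstance(
        row.get("address"), str
    )


def address_mismatches(rows, reference):
    """Quality-style check: compare each address with a trusted reference."""
    return [
        row["customer_id"]
        for row in rows
        if reference.get(row["customer_id"]) != row["address"]
    ]


now = datetime(2023, 10, 25, 12, 0, tzinfo=timezone.utc)
last_loaded_at = now - timedelta(hours=30)  # the pipeline silently failed

rows = [
    {"customer_id": 1, "address": "12 Elm St"},
    {"customer_id": 2, "address": "9 Oak Ave"},  # customer has since moved
]
reference = {1: "12 Elm St", 2: "40 Pine Rd"}  # hypothetical trusted source

stale = is_stale(last_loaded_at, now, timedelta(hours=24))
all_well_formed = all(passes_schema(row) for row in rows)
outdated = address_mismatches(rows, reference)

print(stale)            # True: the freshness check catches the failed pipeline
print(all_well_formed)  # True: the schema check sees nothing wrong
print(outdated)         # [2]: the accuracy check catches the old address
```

The freshness check flags the failed pipeline even though every row is well formed, while only the comparison against the reference reveals the outdated address.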

The complementary nature of data quality and data observability software products is supported by the fact that some vendors offer products in both categories, while others offer products that combine functionality associated with both data observability and data quality. In addition to the emergence of standalone data observability software specialists, we also see this functionality being included in wider DataOps platforms. This is a trend we expect to continue. Through 2025, data observability will continue to be a priority for the evolution of data operations products as vendors deliver more automated approaches to data engineering and improving trust in enterprise data.

The relative immaturity of the market for data observability software means that it is difficult for organizations today to evaluate potential suppliers. Only a handful of vendors met the inclusion criteria for this Buyers Guide, while there is a very long list of Vendors of Note that were considered. Many of the emerging vendors are likely to be acquired while a few will fall by the wayside. That should not stop organizations from evaluating the potential benefits of data observability, however. It has a critical role to play in evaluating the performance and reliability of data pipelines, as well as the quality and validity of data, to deliver the benefits of investment in data and analytics.

The evolution of data observability is still in its early stages. Potential adopters of data observability are advised to pay close attention and evaluate purchases carefully. Some data observability products offer quality resolution and remediation functionality traditionally associated with data quality software, albeit not to the same depth and breadth. Additionally, some vendors previously associated with data quality have adopted the term data observability but may lack the depth and breadth of pipeline monitoring and error detection capabilities.

This research evaluates the following vendors that offer products that address key elements of data observability as we define it: Acceldata, Collibra, DataKitchen, IBM, Monte Carlo, Precisely and Stonebranch.

You can find more details on our site as well as in the Buyers Guide Market Report.

Regards,

Matt Aslett
