Monte Carlo Bets on the Future of Data Observability

Written by Matt Aslett | Dec 13, 2022 11:00:00 AM

Earlier this year, I wrote about the increasing importance of data observability, an emerging product category that takes advantage of machine learning (ML) and Data Operations (DataOps) to automate the monitoring of data used for analytics projects to ensure its quality and lineage. Monitoring the quality and lineage of data is nothing new. Manual tools exist to ensure that data is complete, valid and consistent, as well as relevant and free from duplication. Data observability vendors, including Monte Carlo Data, have emerged in recent years with the goal of increasing the productivity of data teams and improving organizations’ trust in data through automation, artificial intelligence and machine learning (AI/ML).

Monte Carlo was founded in 2019 by CEO Barr Moses and Chief Technology Officer Lior Gavish, who were previously VP of customer operations at Gainsight and SVP of engineering at Barracuda, respectively. In those roles, both had seen that, while tools and platforms were readily available to help IT engineers identify and resolve software and infrastructure failures and performance problems, there was a paucity of offerings that allowed data engineers to monitor the validity of data pipelines. The founders were inspired by the observability platforms that provide software and infrastructure engineers with an environment for monitoring metrics, traces and logs to track application and infrastructure performance, and they set out to create a similar environment for monitoring the quality and reliability of data used for analytics and governance projects.

Unlike existing data-quality software, which typically provides users with an environment to manually check and correct data-quality issues, data observability software is designed to automate the monitoring of data used for analytics projects. In addition to improving trust in data, this has the potential to reduce time to insight. Almost two-thirds of participants (64%) in our Analytics and Data Benchmark Research cited reviewing data for quality issues as one of the most time-consuming aspects of analytics initiatives, second only to preparing data for analysis.

Monte Carlo delivered the first commercial version of its Data Observability Platform in 2020 and has since established an impressive roster of customers, including Asics, Compass, CNN, Fox, JetBlue, Hippo, PagerDuty and Vimeo. Development of the platform, as well as Monte Carlo’s sales and marketing capabilities, has been funded by venture capital financing. Most recently, the company announced a $135 million funding round involving IVP, Accel, GGV Capital, Redpoint Ventures, ICONIQ Growth, Salesforce Ventures and GIC Singapore. The Series D funding round brought the total raised by the company to $226 million at a valuation of $1.6 billion.

Data observability may be a new term, but the benefits of automating data quality mean that it is unlikely to be a passing fad. I assert that through 2025, data observability will continue to be a priority for the evolution of DataOps products as vendors deliver more automated approaches to data engineering and improving trust in enterprise data. Monte Carlo’s Data Observability Platform was created on the premise that for organizations to trust the data in their systems, they need to be able to monitor the data and assess its health based on five key attributes: freshness, distribution, volume, schema and lineage. Freshness is a measure of how recently data tables were updated, while distribution is a measure of whether the data is within an anticipated range. Volume relates to the completeness of the data tables, while schema takes into account changes to how the data is organized. Lineage examines which teams have been responsible for generating and accessing the data, as well as any impact on upstream sources or downstream dashboards. Monte Carlo’s Data Observability Platform addresses these attributes with functional capabilities in four areas: detection, resolution, prevention and integration.
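To make the five attributes more concrete, here is a minimal Python sketch of what a check for each one might look like. It is an illustration only, not Monte Carlo's implementation: the `TableSnapshot` structure, the function names, the thresholds and the sample tables are all hypothetical and chosen purely for the example.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableSnapshot:
    """Hypothetical metadata captured for one table at one point in time."""
    last_updated: datetime
    row_count: int
    columns: dict   # column name -> type name
    sample: list    # sampled values from one numeric field


def is_fresh(snap, now, max_age):
    """Freshness: was the table updated recently enough?"""
    return now - snap.last_updated <= max_age


def in_expected_range(snap, low, high):
    """Distribution: do the sampled values fall within the anticipated range?"""
    return all(low <= value <= high for value in snap.sample)


def volume_ok(snap, previous, max_drop=0.5):
    """Volume: True unless the row count fell by more than max_drop."""
    return snap.row_count >= previous.row_count * (1 - max_drop)


def schema_changes(snap, previous):
    """Schema: which columns were added, removed or changed type?"""
    old, new = previous.columns, snap.columns
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(c for c in new.keys() & old.keys() if new[c] != old[c]),
    }


def downstream_assets(lineage, start):
    """Lineage: every asset reachable downstream of start (breadth-first)."""
    seen, queue = set(), deque([start])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Hypothetical example data: two snapshots of the same table and a lineage map.
now = datetime(2022, 12, 13, 11, 0)
yesterday = TableSnapshot(now - timedelta(days=1), 1000,
                          {"id": "int", "amount": "float"}, [10.0, 12.5])
today = TableSnapshot(now - timedelta(hours=2), 400,
                      {"id": "int", "amount": "str", "region": "str"}, [11.0, 950.0])
lineage = {"orders": ["daily_revenue"], "daily_revenue": ["finance_dashboard"]}

print(is_fresh(today, now, timedelta(hours=6)))
print(in_expected_range(today, 0, 100))
print(volume_ok(today, yesterday))
print(schema_changes(today, yesterday))
print(downstream_assets(lineage, "orders"))
```

In this sketch every threshold is fixed by hand, which is closer to the manual data-quality approach than to data observability. The premise of the category is that such expectations are derived and monitored automatically.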

At the heart of the platform is ML-powered monitoring, anomaly detection and notification, which automatically assesses fields and tables based on known issues or business rules to detect and alert on data freshness, volume and schema changes. Incident resolution is addressed by automated field-level lineage, root cause analysis and workflow tools, which can also be used to proactively make changes to data assets to prevent data-quality issues. Additionally, identifying fields, tables and queries that are unused or used inefficiently can help to proactively manage compute and storage costs. Monte Carlo uses a Data Collector deployed in a customer’s Amazon Web Services (AWS) environment to extract metadata, logs and statistics from analytic data platforms and business intelligence (BI) tools. It also integrates with data orchestration tools and can send notifications to productivity tools and notification systems. Monte Carlo’s Data Observability Platform is delivered as a cloud-managed service and is targeted at data engineers, providing them with the visibility they need to detect, resolve and prevent data-quality and data-lineage issues. Its intended benefits extend beyond data engineering, however: it is designed to give organizations the higher level of trust needed for data-driven decision-making, with data owners gaining greater visibility into how their data is used across the organization and data users gaining confidence in the integrity of the data used to make decisions.
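As a rough illustration of how automated anomaly detection differs from a hand-set rule, the Python sketch below flags a table's latest row count when it deviates sharply from its own recent history. This is a simple z-score test, a stand-in chosen for brevity rather than a description of the models Monte Carlo uses; the function name, the threshold of three standard deviations and the sample figures are all assumptions.

```python
from statistics import mean, pstdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag latest if it lies more than threshold standard deviations
    from the mean of the historical values (hypothetical helper)."""
    if len(history) < 2:
        return False              # too little history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu       # any change from a constant history stands out
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for one table.
daily_row_counts = [1000, 1020, 980, 1010, 990]

print(is_anomalous(daily_row_counts, 1005))
print(is_anomalous(daily_row_counts, 400))
```

The expected range here is learned from the data itself, so nobody has to decide in advance what a normal row count is for each table. That is the kind of effort that automation is meant to remove.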

Data observability is a new approach to an established problem, but it is by no means a matter of slapping a new label on existing data-quality products. Automation and intelligence are critical to data observability platforms, both to cope with the expanding volume of data to be monitored and to deliver greater efficiency than manual techniques. These factors are likely to become increasingly important as data volumes continue to grow and organizations become increasingly reliant on DataOps and the orchestration of data pipelines to support data-driven decision-making. I recommend that organizations exploring approaches to improving trust in data evaluate the emerging group of data observability providers, including Monte Carlo.

Regards,

Matt Aslett
