Ventana Research Analyst Perspectives

Enhancing Data Catalog with AI

Posted by David Menninger on Sep 22, 2022 3:00:00 AM

Organizations are collecting data from multiple data sources and a variety of systems to enrich their analytics and business intelligence (BI). But collecting data is only half of the equation. As the data grows, it becomes challenging to find the right data at the right time. Many organizations can’t take full advantage of their data lakes because they don’t know what data actually exists. Also, there are more regulations and compliance requirements than ever before. It is critical for organizations to understand the kind of data they have, who is handling it, what it is being used for and how it needs to be protected. They also have to avoid putting too many layers and wrappers around the data as it can make the data difficult to access. These challenges create a need for more automated ways to discover, track, research and govern the data.

Data catalogs have become the standard for maintaining metadata for data analytics and self-service BI. They provide a roadmap to all the data sources, both internal and external, and enable data professionals and business users to swiftly sort through the inventory of data assets to find the information they need. Our Analytics and Data Benchmark Research finds that organizations with adequate data catalog technologies reported significantly higher rates of satisfaction (60% compared to 20% of those with inadequate technologies). But, considering the volume and variety of data that organizations deal with today, it becomes difficult to keep the data catalogs regularly updated. Manually searching the database and linking all metadata to the data catalog is a time-consuming and resource-intensive process. And the challenges are further magnified when organizations look to scale such manual methods as the data volume, complexity and sources increase.

Artificial intelligence and machine learning (AI/ML) are transforming the data catalog landscape. Machine learning algorithms can be trained to browse through data catalogs to collect metadata and keep it updated with new, incoming information. AI/ML learns from data patterns, queries and interactions to support both data quality and governance. It enables organizations to expand the available data variety, standardize data semantics and simplify data accessibility. Many organizations have started embracing more sophisticated catalogs with AI/ML capabilities to scale operations and harness insights that would otherwise be overlooked.

Data management vendors are continuously improving AI/ML capabilities and integrating them into their offerings to enable users to discover, refine, explore and analyze data sets more rapidly and efficiently. AI/ML make it possible for organizations to democratize data, allowing users to easily navigate unstructured data, uncover patterns in data sets and understand the movement and transformation of data through time and data lineage. We assert that by 2025, more than two-thirds of all data processes will use AI/ML to boost the value that can be derived from the data.

Additionally, AI/ML can automate the identification and tagging of data. It can detect unusual usage as well as identify sensitive data and assign protection schemes. AI/ML can assist with many data processes beyond just data privacy. It provides recommendations on which data sources to use and how to use the data. It can assess data quality and resolve quality issues. With an AI-driven data catalog, organizations can simplify data compliance and governance and standardize the way data is stored and labelled.
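To make the idea of automated tagging concrete, here is a minimal sketch in Python. It is not taken from any vendor's product: the `ColumnEntry` record, the `tag_sensitive` function, the two patterns and the 0.8 threshold are all hypothetical, and simple pattern matching stands in for the trained classifiers an AI-driven catalog would more likely use.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detection rules. A real AI-driven catalog would more likely
# use trained classifiers; regular expressions are a simple stand-in here.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


@dataclass
class ColumnEntry:
    """A hypothetical catalog record for one column and a sample of its values."""
    name: str
    samples: list[str]
    tags: set[str] = field(default_factory=set)


def tag_sensitive(entry: ColumnEntry, threshold: float = 0.8) -> ColumnEntry:
    """Add a tag when at least `threshold` of the non-empty samples match a pattern."""
    values = [s for s in entry.samples if s]
    if not values:
        return entry
    for tag, pattern in SENSITIVE_PATTERNS.items():
        # fullmatch requires the whole value to fit the pattern, not a substring.
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            entry.tags.add(tag)
    return entry
```

Calling `tag_sensitive(ColumnEntry("contact", ["a@example.com", "b@example.org"]))` would add the `email` tag to that entry, which a governance process could then use to assign a protection scheme. Sampling values and applying a threshold, rather than requiring every value to match, is one way to tolerate a few dirty rows without mislabeling free-text columns.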

A data catalog should be a component of every organization’s data governance framework. Many vendors offer integrated data catalog capabilities for data governance, analytics and BI. Organizations should evaluate the AI/ML capabilities of the data catalog technologies they consider and how they can make it easier to find and use data, which will help improve operational results.


David Menninger

Topics: Business Intelligence, Data Governance, Data Management, AI and Machine Learning, data operations

Written by David Menninger

David is responsible for the overall research direction of data, information and analytics technologies at Ventana Research, covering major areas including Analytics, Big Data, Business Intelligence and Information Management, along with additional specific research categories including Information Applications, IT Performance Management, Location Intelligence, Operational Intelligence and IoT, and Data Science. David is also responsible for examining the role of cloud computing, collaboration and mobile technologies as they affect these areas. David brings to Ventana Research over twenty-five years of experience, through which he has marketed and brought to market some of the leading-edge technologies for helping organizations analyze data to support a range of action-taking and decision-making processes. Prior to joining Ventana Research, David was the Head of Business Development & Strategy at Pivotal, a division of EMC, VP of Marketing and Product Management at Vertica Systems, and VP of Marketing and Product Management at Oracle, Applix, InforSense and IRI Software. David earned his MS in Business from Bentley University and a BS in Economics from the University of Pennsylvania.