Generative AI Boosts Value Creation with Outcome-Led Data Integration

Written by Matt Aslett | Sep 27, 2023 10:00:00 AM

Despite a focus on being data-driven, many organizations find that data and analytics projects fail to deliver on expectations. These initiatives can underwhelm for many reasons, because success requires a delicate balance of people, processes, information and technology. Small deviations from perfection in any of those factors can send projects off the rails.

For data projects, placing too much emphasis on how data is produced (the tools, technologies and techniques used to collect, manage and process data) means that business goals can quickly be forgotten. There are many examples of successful data warehouse and data lake initiatives. However, a disproportionate focus on the production of data, rather than its consumption, has led many companies to waste time and money on projects that might be technically elegant in terms of assembling and integrating data inputs into a single repository but fail to meet the original business requirements in terms of extracting value from the collected data.

Since most data projects are implemented by the IT group, technical staff tend to focus on technical capabilities and the production of data. More than two-thirds (69%) of participants in Ventana Research’s Analytics and Data Benchmark Research spend most of their analytic time preparing data for analysis, compared to only 27% who spend most of their time determining how changes impact the business.

One reason data teams focus disproportionately on data inputs rather than data outcomes is the preponderance of “left-to-right” thinking about the flow of data through data integration pipelines. Typically, any illustration of data pipelines starts with the extraction of data from source applications on the left, passing through data integration and transformation in the center, with analysis, insight and the generation of value illustrated on the right. This left-to-right thinking reflects the way data practitioners typically think about data flows, reinforcing the focus on data sources and repositories as a starting point, and has implications for the way people approach a project.

Take the example of creating a data pipeline that facilitates personalized marketing and improves customer satisfaction among the top 10% of customers for a given product. A data engineer adopting a left-to-right approach will start by identifying the data sources and integration processes required to identify the top 10% of customers. This will likely determine the ability to deliver personalized marketing to the correct set of customers. However, measuring the value delivered by the initiative, which depends on the ability to identify and measure customer satisfaction, can easily become an afterthought.

“Right-to-left” thinking is an approach to project planning that focuses on business outcomes even while participants are working on the technical capabilities to deliver them. In our personalized marketing example, an outcome-led approach would initially focus on measuring customer satisfaction resulting from personalized marketing before defining the data sources and transformation steps required to facilitate the delivery of personalized marketing to the correct set of customers.
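To make the contrast concrete, the outcome-led ordering for the personalized marketing example can be sketched as a small dependency walk that starts at the business outcome and works backward to the sources. This is a minimal illustration, not a description of any vendor's product: every name in the dependency map (such as satisfaction_lift_report or orders_source) is a hypothetical placeholder, and the map is assumed to contain no circular dependencies.

```python
from typing import Dict, List

# Hypothetical dependency map for the personalized marketing example.
# Each key is a data product; each value lists what it is derived from.
# All names are illustrative assumptions, not taken from a real system.
DEPENDS_ON: Dict[str, List[str]] = {
    "satisfaction_lift_report": ["satisfaction_scores", "campaign_recipients"],
    "campaign_recipients": ["top_10_percent_customers"],
    "top_10_percent_customers": ["product_revenue_by_customer"],
    "product_revenue_by_customer": ["orders_source"],
    "satisfaction_scores": ["survey_source"],
}


def plan_right_to_left(outcome: str, depends_on: Dict[str, List[str]]) -> List[str]:
    """Walk backward from the business outcome and return a build order.

    The returned list ends with the outcome, and every step appears
    after the inputs it needs. Assumes the map has no cycles.
    """
    order: List[str] = []
    seen = set()

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        # Resolve everything this node depends on before the node itself.
        for dep in depends_on.get(node, []):
            visit(dep)
        order.append(node)

    visit(outcome)
    return order


if __name__ == "__main__":
    for step in plan_right_to_left("satisfaction_lift_report", DEPENDS_ON):
        print(step)
```

Because planning begins with the satisfaction report, the survey data needed to measure the outcome is part of the plan from the start. Anything not reachable from the outcome never enters the plan at all.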

Adopting right-to-left thinking can be difficult for data specialists trained to think from left to right. Old habits die hard. Right-to-left thinking requires discipline to focus on business outcomes and work backward toward the capabilities to deliver them.

Many organizations attempt to apply outcome-led approaches to IT projects, only to get waylaid by a fixation on technical elegance. However, generative AI and large language models facilitate outcome-driven data integration. As my colleague David Menninger explains, generative AI can create content, including text, digital images, audio, video or computer programs and models. When used with information about an organization’s data sources and business processes, plus documentation and best practices related to its data integration technology provider, generative AI can also be used to automatically generate data pipelines in response to declared business requirements. Data experts will want to validate the output of any automatically generated data pipeline plan.

An advantage of generative AI is that it can be prompted to focus on business outcomes and work backward to capabilities. It also cannot be waylaid by the human tendency to lean to the left.
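As a rough sketch of what prompting for outcomes first might look like, the helper below assembles a plain-text prompt that states the business outcome before listing any data sources. The function name, the prompt wording and the example inputs are all assumptions made for illustration. No model is called here, and any plan a model returned would still need review by a data expert.

```python
from typing import List


def build_pipeline_prompt(
    outcome: str, sources: List[str], platform_notes: List[str]
) -> str:
    """Assemble a prompt that states the business outcome before any inputs.

    Hypothetical helper: the wording and section order are illustrative only.
    """
    lines = [
        "Business outcome to deliver and measure:",
        f"- {outcome}",
        "",
        "Work backward from that outcome to the data it requires.",
        "",
        "Available data sources:",
    ]
    lines += [f"- {source}" for source in sources]
    lines += ["", "Integration platform notes:"]
    lines += [f"- {note}" for note in platform_notes]
    lines += ["", "Return a step-by-step pipeline plan for review by a data engineer."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_pipeline_prompt(
            outcome="Measure the change in customer satisfaction among the "
            "top 10% of customers who receive personalized marketing",
            sources=["orders", "customer satisfaction surveys"],
            platform_notes=["Pipelines run as nightly batch jobs"],
        )
    )
```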

More “traditional” machine learning is already used to provide recommendations for building data pipelines. The use of generative AI for data integration, by contrast, is still nascent. However, I assert that by 2026, more than three-quarters of organizations’ data management processes will be enhanced with artificial intelligence and machine learning to increase automation, accuracy, agility and speed.

Many organizations are examining the potential use cases for generative AI related to data and analytics. Early use cases include the automated generation of SQL queries from natural language prompts as well as automated data tagging and data preparation. The ability to automatically generate data pipelines is in the early stages. I recommend that organizations explore the potential benefits and evaluate data integration vendors offering capabilities that support right-to-left data integration, shifting the focus of data and analytics from production to consumption.

Regards,

Matt Aslett
