Ventana Research Analyst Perspectives

Data Pipelines Integrate Data Processing and Enable AI

Written by Matt Aslett | Apr 2, 2024 10:00:00 AM

The development, testing and deployment of data pipelines is a fundamental accelerator of data-driven strategies, enabling enterprises to extract data from the operational applications and data platforms designed to run the business and load, integrate and transform it into the analytic data platforms and tools used to analyze the business. As I explained in our recent Data Pipelines Buyers Guide, data pipelines are essential to generating intelligence from data. Healthy data pipelines are necessary to ensure data is integrated and processed in the sequence required to generate business intelligence (BI) and support the development and deployment of applications driven by artificial intelligence (AI).
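To make that flow concrete, here is a minimal batch pipeline sketch in Python, assuming a hypothetical orders table in an operational database and a separate analytic warehouse; the connection strings, table names and columns are illustrative rather than drawn from any specific product.

```python
# Minimal batch ETL sketch: extract orders from an operational database, apply a
# simple transformation, and load the result into an analytic warehouse.
# Connection strings, table names and columns are illustrative assumptions.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("postgresql://user:pass@operational-db/sales")
target = create_engine("postgresql://user:pass@analytic-dw/analytics")

# Extract: pull raw rows from the system that runs the business.
orders = pd.read_sql(
    "SELECT order_id, customer_id, amount, ordered_at FROM orders",
    source,
    parse_dates=["ordered_at"],
)

# Transform: reshape the data for analysis (here, daily revenue per customer).
daily_revenue = (
    orders.assign(order_date=orders["ordered_at"].dt.date)
          .groupby(["customer_id", "order_date"], as_index=False)["amount"]
          .sum()
          .rename(columns={"amount": "revenue"})
)

# Load: write the analysis-ready table to the platform used to analyze the business.
daily_revenue.to_sql("daily_customer_revenue", target, if_exists="replace", index=False)
```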

Traditionally, data pipelines have involved batch extract, transform and load (ETL) processes, but the need to process data in real time is driving demand for continuous data processing and more agile data pipelines that are adaptable to changing business conditions and requirements, including the increased reliance on streaming data and events. I assert that through 2026, approaches to data operations (DataOps) will continue to evolve as enterprises adapt their use of data processing pipelines to reflect the increased adoption of event-driven architecture and microservices.
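As an illustration of the shift from batch windows to continuous processing, the sketch below consumes a hypothetical stream of order events and updates a running aggregate as each event arrives; the topic name, broker address and event schema are assumptions for the example, not a reference architecture.

```python
# Continuous-processing sketch: consume order events from a Kafka topic and update
# a running aggregate as each event arrives, rather than waiting for a batch window.
# Topic name, broker address and event schema are illustrative assumptions.
import json
from collections import defaultdict

from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "orders",                                  # hypothetical event stream
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

revenue_by_customer = defaultdict(float)

for event in consumer:                         # blocks and processes events as they arrive
    order = event.value
    revenue_by_customer[order["customer_id"]] += order["amount"]
    # A real pipeline would persist this running state to a serving store or emit it
    # to a downstream topic rather than keeping it in memory.
```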

More than two-thirds of participants in our Analytics and Data Benchmark Research cite preparing data for analysis as consuming most of the time spent analyzing data. As such, the benefits of accelerating data pipelines can be considerable. There are multiple approaches to increasing the agility of data pipelines. For example, we see an increased focus on extract, load and transform (ELT) processes, which reduce upfront delays by pushing transformation execution down to the target data platform (illustrated in the sketch below). I also recently discussed the emergence of zero-ETL approaches, which can be seen as a form of ELT that automates extraction and loading and has the potential to remove the need for transformation in some use cases. Additionally, reverse ETL tools can help make analytic insights actionable by extracting transformed and integrated data from the analytic data platforms and loading it back into operational systems.
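The following sketch illustrates the ELT pattern described above: raw data is landed in the analytic platform first, and the transformation is then pushed down to the target as SQL. The warehouse connection, staging table and transformation logic are illustrative assumptions.

```python
# ELT sketch: land the raw extract in the analytic platform first, then push the
# transformation down to the warehouse as SQL so it runs where the data already sits.
# The connection string, file, table and column names are illustrative assumptions.
import pandas as pd
from sqlalchemy import create_engine, text

warehouse = create_engine("postgresql://user:pass@analytic-dw/analytics")

# Extract and load: copy the raw operational extract into a staging table untransformed.
raw_orders = pd.read_csv("orders_extract.csv")  # hypothetical extract file
raw_orders.to_sql("stg_orders", warehouse, if_exists="replace", index=False)

# Transform: executed inside the target platform, after loading rather than before.
with warehouse.begin() as conn:
    conn.execute(text("DROP TABLE IF EXISTS daily_customer_revenue"))
    conn.execute(text("""
        CREATE TABLE daily_customer_revenue AS
        SELECT customer_id,
               CAST(ordered_at AS DATE) AS order_date,
               SUM(amount) AS revenue
        FROM stg_orders
        GROUP BY customer_id, CAST(ordered_at AS DATE)
    """))
```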

Both ETL and ELT approaches can be accelerated using change data capture (CDC) techniques, which reduce complexity and increase agility by synchronizing only changed data rather than the entire dataset (illustrated in the sketch below). We also see the application of generative AI (GenAI) to automatically generate or recommend data pipelines in response to natural language descriptions of desired outcomes. The development of agile data pipelines is an important aspect of DataOps, which focuses on the application of agile development, DevOps and lean manufacturing practices by data engineering professionals in support of data production.
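A simple way to picture the CDC idea is an incremental synchronization that keeps a high-water mark and copies only the rows changed since the last run, as in the sketch below. Production CDC tools typically read the database transaction log rather than polling a timestamp column; the watermark column, table names and connections here are illustrative assumptions, and the sketch presumes the replica has already been initialized with a full load.

```python
# CDC-style incremental sync sketch: keep a high-water mark and copy only rows that
# changed since the last run, instead of reloading the whole table. The watermark
# column, table names and connections are illustrative assumptions; production CDC
# tools usually read the database transaction log instead of polling a timestamp.
import pandas as pd
from sqlalchemy import create_engine, text

source = create_engine("postgresql://user:pass@operational-db/sales")
target = create_engine("postgresql://user:pass@analytic-dw/analytics")

# Find the most recent change already present in the analytic copy
# (assumes the replica was initialized with a full load on a prior run).
with target.connect() as conn:
    last_sync = conn.execute(text("SELECT MAX(updated_at) FROM orders_replica")).scalar()

# Pull only the delta from the operational system.
changed = pd.read_sql(
    text("SELECT * FROM orders WHERE updated_at > :since"),
    source,
    params={"since": last_sync},
)

# Append the delta to the analytic copy; a real pipeline would merge or upsert so
# that updated rows replace their earlier versions rather than being duplicated.
changed.to_sql("orders_replica", target, if_exists="append", index=False)
```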

Agile and collaborative practices were a core component of the Capabilities criteria we used to assess data pipeline tools in our Data Pipelines Buyers Guide, alongside the functionality required to support data pipeline development, deployment and test automation, as well as integration with the wider ecosystem of DevOps, data management, DataOps, and BI and AI tools and applications.

The development, testing and deployment of data pipelines is just one aspect of improving the use of data within an enterprise. DataOps also encompasses data orchestration and data observability, and I will explore these in greater detail in forthcoming Analyst Perspectives. In the meantime, I recommend that all enterprises explore how the development and deployment of agile data pipelines can help improve data-driven decision-making.

Regards,

Matt Aslett