I recently wrote about the need for organizations to take a holistic approach to the management and governance of data in motion alongside data at rest. As adoption of streaming data and event processing increases, it is no longer sufficient for streaming data projects to exist in isolation. Data needs to be managed and governed regardless of whether it is processed in batch or as a stream of events. This requirement has resulted in established data management vendors increasing their focus on streaming data and event processing through product development as well as acquisitions. It has also resulted in streaming and event specialists, such as Confluent, adding centralized management and governance capabilities to their existing offerings as they seek to establish or reinforce the strategic importance of streaming data as part of a modern approach to data management.
I recently noted that as demand for real-time interactive applications becomes more pervasive, the use of streaming data is becoming more mainstream. Streaming data and event processing have been part of the data landscape for many decades, but for much of that time, data streaming was a niche activity. Although adopted in industry segments with high-performance, real-time data processing and analytics requirements, such as financial services and telecommunications, data streaming was far less common elsewhere. That has changed significantly in recent years, fueled by the proliferation of open-source and cloud-based streaming data and event technologies that have lowered the cost and technical barriers to developing new applications able to take advantage of data in motion. This is a trend we expect to continue, to the extent that streaming data and event processing become an integral part of mainstream data-processing architectures.
I have recently written about the importance of healthy data pipelines to ensure data is integrated and processed in the sequence required to generate business intelligence, and the need for data pipelines to be agile in the context of real-time data processing requirements. Data engineers, who are responsible for monitoring, managing and maintaining data pipelines, are under increasing pressure to deliver high-performance and flexible data integration and processing pipelines that are capable of handling the rising volume and frequency of data. Automation is a potential solution to this challenge, and several vendors, such as Ascend.io, have emerged in recent years to reduce the manual effort involved in data engineering.
We’ve recently published our latest Benchmark Research on Data Governance and it’s fair to say, “you’ve come a long way, baby.” Many of you reading this weren’t around when that phrase was introduced in 1968 to promote Virginia Slims cigarettes, but you may have heard the phrase because it went on to become a part of popular culture. We’ve learned a lot about cigarettes since then, and we’ve learned a lot about data governance, too.
Organizations face various challenges with analytics and business intelligence processes, including data curation and modeling across disparate sources and data warehouses, maintaining data quality and ensuring security and governance. Traditional processes are slow to transform large and diverse datasets into something that is easily consumable in BI. It can also take days or weeks to create reports and dashboards, and possibly longer if processes change and new data sources are introduced. Our Analytics and Data Benchmark Research shows that the most time-consuming processes are preparing data, reviewing it for quality issues and preparing reports for presentation and distribution.