Almost all organizations are investing in data science, or planning to, as they seek to encourage experimentation and exploration to identify new business challenges and opportunities as part of the drive toward creating a more data-driven culture. My colleague, David Menninger, has written about how organizations using artificial intelligence and machine learning (AI/ML) report gaining competitive advantage, improving customer experiences, responding faster to opportunities and threats, and improving the bottom line with increased sales and lower costs. One-quarter of participants (25%) in Ventana Research’s Analytics and Data Benchmark Research are already using AI/ML, while more than one-third (34%) plan to do so in the next year, and more than one-quarter (28%) plan to do so eventually. As organizations adopt data science and expand their analytics initiatives, they face no shortage of options for AI/ML capabilities. Understanding which is the most appropriate approach to take could be the difference between success and failure. The cloud providers all offer services, including general-purpose ML environments, as well as dedicated services for specific use cases, such as image detection or language translation. Software vendors also provide a range of products, both on-premises and in the cloud, including general-purpose ML platforms and specialist applications. Meanwhile, analytic data platform providers are increasingly adding ML capabilities to their offerings to provide additional value to customers and differentiate themselves from their competitors. There is no simple answer as to which is the best approach, but it is worth weighing the relative benefits and challenges. Looking at the options from the perspective of our analytic data platform expertise, the key choice is between AI/ML capabilities provided on a standalone basis or integrated into a larger data platform.
I have previously written about growing interest in the data lakehouse as one of the design patterns for delivering hydroanalytics analysis of data in a data lake. Many organizations have invested in data lakes as a relatively inexpensive way of storing large volumes of data from multiple enterprise applications and workloads, especially semi- and unstructured data that is unsuitable for storing and processing in a data warehouse. However, early data lake projects lacked structured data management and processing functionality to support multiple business intelligence efforts as well as data science and even operational applications.
I have written recently about the similarities and differences between data mesh and data fabric. The two are potentially complementary. Data mesh is an organizational and cultural approach to data ownership, access and governance. Data fabric is a technical approach to automating data management and data governance in a distributed architecture. There are various definitions of data fabric, but key elements include a data catalog for metadata-driven data governance and self-service, agile data integration.
In their pursuit of becoming data-driven, organizations are collecting and managing more data than ever before as they attempt to gain competitive advantage and respond faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. As data is increasingly spread across multiple data centers, clouds and regions, organizations need to manage data on multiple systems in different locations and bring it together for analysis. As data volumes increase and more data sources and data types are introduced, organizations face challenges in storing, managing, connecting and analyzing the huge set of information that is spread across multiple locations. Having a strong foundation and a scalable data management architecture in place can help alleviate many of the challenges organizations face when they are scaling and adding more infrastructure. We have written about the potential for hybrid and multi-cloud platforms to safeguard data across heterogeneous environments, which plays to the strengths of companies, such as Actian, that provide a single environment with the ability to integrate, manage and process data across multiple locations.
I have written a few times in recent months about vendors offering functionality that addresses data orchestration. This is a concept that has been growing in popularity in the past five years amid the rise of Data Operations (DataOps), which describes more agile approaches to data integration and data management. In a nutshell, data orchestration is the process of combining data from multiple operational data sources and preparing and transforming it for analysis. To those unfamiliar with the term, this may sound very much like the tasks that data management practitioners have been undertaking for decades. As such, it is fair to ask what separates data orchestration from traditional approaches to data management. Is it really something new that can deliver innovation and business value, or just the rebranding of existing practices designed to drive demand for products and services?