Almost all organizations are investing in data science, or planning to, as they seek to encourage experimentation and exploration to identify new business challenges and opportunities as part of the drive toward creating a more data-driven culture. My colleague, David Menninger, has written about how organizations using artificial intelligence and machine learning (AI/ML) report gaining competitive advantage, improving customer experiences, responding faster to opportunities and threats, and improving the bottom line with increased sales and lower costs. One-quarter of participants (25%) in Ventana Research’s Analytics and Data Benchmark Research are already using AI/ML, while more than one-third (34%) plan to do so in the next year, and more than one-quarter (28%) plan to do so eventually. As organizations adopt data science and expand their analytics initiatives, they face no shortage of options for AI/ML capabilities. Understanding which is the most appropriate approach to take could be the difference between success and failure. The cloud providers all offer services, including general-purpose ML environments, as well as dedicated services for specific use cases, such as image detection or language translation. Software vendors also provide a range of products, both on-premises and in the cloud, including general-purpose ML platforms and specialist applications. Meanwhile, analytic data platform providers are increasingly adding ML capabilities to their offerings to provide additional value to customers and differentiate themselves from their competitors. There is no simple answer as to which is the best approach, but it is worth weighing the relative benefits and challenges. Looking at the options from the perspective of our analytic data platform expertise, the key choice is between AI/ML capabilities provided on a standalone basis or integrated into a larger data platform.
Ventana Research’s Data Lakes Dynamics Insights research illustrates that while data lakes are fulfilling their promise of enabling organizations to economically store and process large volumes of raw data, data lake environments continue to evolve. Data lakes were initially based primarily on Apache Hadoop deployed on-premises but are now increasingly based on cloud object storage. Adopters are also shifting from data lakes based on homegrown scripts and code to open standards and open formats, and they are beginning to embrace the structured data-processing functionality that supports data lakehouse capabilities. These trends are driving the evolution of vendor product offerings and strategies, as typified by Cloudera’s recent launch of Cloudera Data Platform (CDP) One, described as a data lakehouse software-as-a-service (SaaS) offering.
Topics: Business Intelligence, Data Governance, Data Management, Data, AI and Machine Learning, data operations, Analytics and Data, Streaming Data & Events, operational data platforms, Analytic Data Platforms
I have written before about the continued use of specialist operational and analytic data platforms. Most database products can be used for operational or analytic workloads, and the number of use cases for hybrid data processing is growing. However, a general-purpose database is unlikely to meet the most demanding operational or analytic data platform requirements. Factors including performance, reliability, security and scalability necessitate the use of specialist data platforms. I assert that through 2026, and despite increased demand for hybrid operational and analytic processing, more than three-quarters of data platform use cases will have functional requirements that encourage the use of specialized analytic or operational data platforms. It is for that reason that specialist database providers, including Ocient, continue to emerge with new and innovative approaches targeted at specific data-processing requirements.
The data catalog has become an integral component of organizational data strategies over the past decade, serving as a conduit for good data governance and facilitating self-service analytics initiatives. The data catalog has become so important, in fact, that it is easy to forget that just 10 years ago it did not exist in terms of a standalone product category. Metadata-based data management functionality has had a role to play within products for data governance and business intelligence for much longer than that, of course, but the emergence of the data catalog as a product category provided a platform for metadata-based data inventory and discovery that could span an entire organization, serving multiple departments, use cases and initiatives.
Breaking into the database market as a new vendor is easier said than done given the dominance of the sector by established database and data management giants, as well as the cloud computing providers. We recently described the emergence of a new breed of distributed SQL database providers with products designed to address hybrid and multi-cloud data processing. These databases are architecturally and functionally differentiated from both the traditional relational incumbents (in terms of global scalability) and the NoSQL providers (in terms of the relational model and transactional consistency). Having differentiated functionality is the bare minimum a new database vendor needs to make itself known in a such a crowded market, however.