Organizations across various industries collect multiple types of data from disparate systems to answer key business questions and deliver personalized experiences for customers. The expanding volume of data increases complexity, and data management becomes a challenge if the process is manual and rules-based. There can be numerous siloed, incomplete and outdated data sources that produce inaccurate results. Organizations must also resolve errors across multiple entities – from customers to products to suppliers – to create a complete view of the data. Many vendors, including Tamr, have turned to artificial intelligence and machine learning to overcome the challenges associated with maintaining data quality amid the growing volume and variety of data. I assert that by 2026, more than three-quarters of organizations’ data management processes will be enhanced with artificial intelligence and machine learning to increase automation, accuracy, agility and speed.
Despite the emphasis on organizations being more data-driven and making an increasing proportion of business decisions based on data and analytics, it remains the case that some of the most fundamental questions about an organization are difficult to answer using data and analytics. Ostensibly simple questions such as, “how many customers does the organization have?” can be fiendishly difficult to answer, especially for organizations with multiple business entities, regions, departments and applications. Increasing volumes and sources of data can hinder, rather than help. Only 1 in 5 participants (20%) in Ventana Research’s Analytics and Data Benchmark Research are very confident in their organization’s ability to analyze the overall quantity of data. This is a perennial issue that data and application integration vendors, such as SnapLogic, are aiming to address – increasingly through automation and products for business users as well as data management professionals.
I am happy to share insights from our latest Ventana Research Value Index research, which assesses how well vendors’ offerings meet buyers’ requirements. The 2023 Analytic Data Platforms Value Index is the distillation of a year of market and product research by Ventana Research. Drawing on our Benchmark Research, we apply a structured methodology built on evaluation categories that reflect real-world criteria incorporated in a request for proposal to data platform vendors supporting the spectrum of analytic use cases. Using this methodology, we evaluated vendor submissions in seven categories: five relevant to the Product Experience: Adaptability, Capability, Manageability, Reliability and Usability, and two related to the Customer Experience: Total Cost of Ownership/Return on Investment and Validation. This research-based index evaluates the full business and information technology value of analytic data platforms offerings. I encourage you to learn more about our Value Index and its effectiveness as a vendor selection and request for information/request for proposal tool.
Ventana Research recently published the 2023 Analytic Data Platforms Value Index. As organizations strive to be more data-driven, increasing reliance on data as a fundamental factor in business decision-making, the importance of the analytic data platform has never been greater. In this post, I’ll share some of my observations about how the analytic data platforms market is evolving.
I am happy to share insights from our latest Ventana Research Value Index research, which assesses how well vendors’ offerings meet buyers’ requirements. The 2023 Operational Data Platforms Value Index is the distillation of a year of market and product research by Ventana Research. Drawing on our Benchmark Research, we apply a structured methodology built on evaluation categories that reflect real-world criteria incorporated in a request for proposal to data platform vendors supporting the spectrum of operational use cases. Using this methodology, we evaluated vendor submissions in seven categories: five relevant to the Product Experience: Adaptability, Capability, Manageability, Reliability and Usability, and two related to the Customer Experience: Total Cost of Ownership/Return on Investment and Validation.
Ventana Research recently announced its 2023 Market Agenda for Data, continuing the guidance we have offered for two decades to help organizations derive optimal value and improve business outcomes.
Ventana Research recently published the 2023 Operational Data Platforms Value Index. The importance of the operational data platform has never been greater as organizations strive to be more data-driven, incorporating intelligence into operational applications via personalization and recommendations for workers, partners and customers. In this post, I’ll share some of my observations on how the operational data platforms market is evolving.
I am happy to share insights from our latest Ventana Research Value Index, which assesses how well vendors’ offerings meet buyers’ requirements. The 2023 Data Platforms Value Index is the distillation of a year of market and product research by Ventana Research. Drawing on our Benchmark Research, we apply a structured methodology built on evaluation categories that reflect real-world criteria incorporated in a request for proposal to data platform vendors that support the spectrum of operational and analytic use cases. Using this methodology, we evaluated vendor submissions in seven categories: five relevant to the Product Experience: Adaptability, Capability, Manageability, Reliability and Usability, and two related to the Customer Experience: Total Cost of Ownership/Return on Investment and Validation.
Data observability is a hot topic and trend. I have written about the importance of data observability for ensuring healthy data pipelines, and have covered multiple vendors with data observability capabilities, offered both standalone and as part of larger data engineering systems. Data observability software provides an environment that takes advantage of machine learning and DataOps to automate the monitoring of data quality and reliability. The term has been adopted by multiple vendors across the industry, and while they all have key functionality in common – including collecting and measuring metrics related to data quality and data lineage – there is also room for differentiation. A prime example is Acceldata, which takes the position that data observability requires monitoring not only data and data pipelines but also the underlying data-processing compute infrastructure as well as data access and usage.
Having recently completed the 2023 Data Platforms Value Index, I want to share some of my observations about how the market is evolving. Although this is our inaugural assessment of the market for data platforms, the sector is mature and products from many of the vendors we assess can be used to effectively support operational and analytic use cases.
The shift from on-premises server infrastructure to cloud-based and software-as-a-service (SaaS) models has had a profound impact on the data and analytics architecture of many organizations in recent years. More than one-half of participants (59%) in Ventana Research’s Analytics and Data Benchmark research are deploying data and analytics workloads in the cloud, and a further 30% plan to do so. Customer demand for cloud-based consumption models has also had a significant impact on the products and services that are available from data and analytics vendors. Data platform providers, both operational and analytic, have had to adapt to changing customer demand. The initial response — making existing products available for deployment on cloud infrastructure — only scratched the surface in terms of responding to emerging expectations. We now see the next generation of products, designed specifically to deliver innovation by taking advantage of cloud-native architecture, being brought to market both by emerging startups, and established vendors, including InterSystems.
There is always space for innovation in the data platforms sector, and new vendors continue to emerge at regular intervals with new approaches designed to serve specialist data storage and processing requirements. Factors including performance, reliability, security and scalability provide a focal point for new vendors to differentiate from established vendors, especially for the most demanding operational or analytic data platform requirements. It is never easy, however, for developers of new data platform products to gain significant market traction, given the dominance of the established relational database vendors and cloud providers. Targeting requirements that are not well-served by general-purpose data platforms can help new vendors get a foot in the door of customer accounts. The challenge in gaining further market traction is to avoid having products pigeon-holed as suitable only for a niche set of requirements. This is precisely the problem facing the various distributed SQL database providers.
Earlier this year, I wrote about the increasing importance of data observability, an emerging product category that takes advantage of machine learning (ML) and Data Operations (DataOps) to automate the monitoring of data used for analytics projects to ensure its quality and lineage. Monitoring the quality and lineage of data is nothing new. Manual tools exist to ensure that data is complete, valid and consistent, as well as relevant and free from duplication. Data observability vendors, including Monte Carlo Data, have emerged in recent years with the goal of increasing the productivity of data teams and improving organizations’ trust in data using automation, artificial intelligence (AI) and ML.
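To make the checks described above concrete, below is a minimal sketch, in plain Python, of the kind of automated quality measurements a data observability tool might compute for each batch of records: completeness, validity and duplication, compared against thresholds. The record fields, metrics and thresholds are illustrative assumptions, not any vendor's implementation; in practice, observability products track such metrics over time and apply ML to detect anomalies rather than relying on fixed thresholds.

```python
# Minimal sketch of automated data quality checks of the kind a data
# observability tool might run on each pipeline execution. The dataset,
# column names and thresholds are illustrative assumptions.

from collections import Counter

def profile_batch(rows, required_fields, key_field):
    """Return simple quality metrics for a batch of records."""
    total = len(rows)
    return {
        # Completeness: share of records with every required field populated
        "completeness": sum(
            all(r.get(f) not in (None, "") for f in required_fields) for r in rows
        ) / total,
        # Duplication: share of records whose key appears more than once
        "duplicates": sum(
            c - 1 for c in Counter(r.get(key_field) for r in rows).values() if c > 1
        ) / total,
        # Validity: share of records with a non-negative numeric amount
        "validity": sum(
            isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0 for r in rows
        ) / total,
    }

def check_thresholds(metrics, thresholds):
    """Flag any metric that breaches its threshold so the data team is alerted."""
    return [m for m, limit in thresholds.items()
            if (metrics[m] < limit if m != "duplicates" else metrics[m] > limit)]

batch = [
    {"id": 1, "customer": "Acme", "amount": 120.0},
    {"id": 2, "customer": "", "amount": 80.0},
    {"id": 2, "customer": "Blue Sky", "amount": -5},
]
metrics = profile_batch(batch, required_fields=("id", "customer", "amount"), key_field="id")
issues = check_thresholds(metrics, {"completeness": 0.95, "validity": 0.99, "duplicates": 0.0})
print(metrics, issues)
```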
One of the most significant considerations when choosing an analytic data platform is performance. As organizations compete to benefit most from being data-driven, the lower the time to insight the better. As data practitioners have learned over time, however, lowering time to insight is about more than just high-performance queries. There are opportunities to improve time to insight throughout the analytics life cycle, which starts with data ingestion and integration, includes data preparation and data management, as well as data storage and processing, and ends with data visualization and analysis. Vendors focused on delivering the highest levels of analytic performance, such as SQream, understand that lowering time to insight relies on accelerating every aspect of that life cycle.
Organizations are increasingly utilizing cloud object storage as the foundation for analytic initiatives. There are multiple advantages to this approach, not least of which is enabling organizations to store larger volumes of data relatively inexpensively, increasing the amount of data queried in analytics initiatives. I assert that by 2024, 6 in 10 organizations will use cloud-based technology as their primary analytics data platform, making it easier to adopt and scale operations as necessary.
Almost all organizations are investing in data science, or planning to, as they seek to encourage experimentation and exploration to identify new business challenges and opportunities as part of the drive toward creating a more data-driven culture. My colleague, David Menninger, has written about how organizations using artificial intelligence and machine learning (AI/ML) report gaining competitive advantage, improving customer experiences, responding faster to opportunities and threats, and improving the bottom line with increased sales and lower costs. One-quarter of participants (25%) in Ventana Research’s Analytics and Data Benchmark Research are already using AI/ML, while more than one-third (34%) plan to do so in the next year, and more than one-quarter (28%) plan to do so eventually. As organizations adopt data science and expand their analytics initiatives, they face no shortage of options for AI/ML capabilities. Understanding which is the most appropriate approach to take could be the difference between success and failure. The cloud providers all offer services, including general-purpose ML environments, as well as dedicated services for specific use cases, such as image detection or language translation. Software vendors also provide a range of products, both on-premises and in the cloud, including general-purpose ML platforms and specialist applications. Meanwhile, analytic data platform providers are increasingly adding ML capabilities to their offerings to provide additional value to customers and differentiate themselves from their competitors. There is no simple answer as to which is the best approach, but it is worth weighing the relative benefits and challenges. Looking at the options from the perspective of our analytic data platform expertise, the key choice is between AI/ML capabilities provided on a standalone basis or integrated into a larger data platform.
I have previously written about growing interest in the data lakehouse as one of the design patterns for delivering hydroanalytics: the analysis of data in a data lake. Many organizations have invested in data lakes as a relatively inexpensive way of storing large volumes of data from multiple enterprise applications and workloads, especially semi- and unstructured data that is unsuitable for storing and processing in a data warehouse. However, early data lake projects lacked the structured data management and processing functionality needed to support business intelligence efforts as well as data science and even operational applications.
I have written recently about the similarities and differences between data mesh and data fabric. The two are potentially complementary. Data mesh is an organizational and cultural approach to data ownership, access and governance. Data fabric is a technical approach to automating data management and data governance in a distributed architecture. There are various definitions of data fabric, but key elements include a data catalog for metadata-driven data governance and self-service, agile data integration.
In their pursuit to be data-driven, organizations are collecting and managing more data than ever before as they attempt to gain competitive advantage and respond faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. As data is increasingly spread across multiple data centers, clouds and regions, organizations need to manage data on multiple systems in different locations and bring it together for analysis. As data volumes increase and more data sources and data types are introduced, organizations face challenges in storing, managing, connecting and analyzing information that is spread across multiple locations. Having a strong foundation and scalable data management architecture in place can help alleviate many of the challenges organizations face when they are scaling and adding more infrastructure. We have written about the potential for hybrid and multi-cloud platforms to safeguard data across heterogeneous environments, which plays to the strengths of companies, such as Actian, that provide a single environment with the ability to integrate, manage and process data across multiple locations.
I have written a few times in recent months about vendors offering functionality that addresses data orchestration. This is a concept that has been growing in popularity in the past five years amid the rise of Data Operations (DataOps), which describes more agile approaches to data integration and data management. In a nutshell, data orchestration is the process of combining data from multiple operational data sources and preparing and transforming it for analysis. To those unfamiliar with the term, this may sound very much like the tasks that data management practitioners have been undertaking for decades. As such, it is fair to ask what separates data orchestration from traditional approaches to data management. Is it really something new that can deliver innovation and business value, or just the rebranding of existing practices designed to drive demand for products and services?
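As a rough illustration of what data orchestration involves, the sketch below uses Python's standard-library TopologicalSorter to run a small set of dependent tasks in order: extract from two assumed operational sources, transform the combined data and load it for analysis. The sources, task names and toy scheduler are assumptions for illustration; dedicated orchestration tools layer scheduling, retries, lineage and monitoring on top of this basic dependency ordering.

```python
# Illustrative sketch of data orchestration: tasks that extract from two
# operational sources, then transform and load, executed in dependency order.

from graphlib import TopologicalSorter

results = {}

def extract_orders():
    results["orders"] = [{"order_id": 1, "customer_id": 7, "total": 42.0}]

def extract_customers():
    results["customers"] = [{"customer_id": 7, "region": "EMEA"}]

def transform():
    # Join orders with customer attributes to prepare an analysis-ready table
    regions = {c["customer_id"]: c["region"] for c in results["customers"]}
    results["curated"] = [
        {**o, "region": regions.get(o["customer_id"])} for o in results["orders"]
    ]

def load():
    print("loading", results["curated"], "into the analytic platform")

# Task dependencies: transform waits for both extracts; load waits for transform
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load": {"transform"},
}
tasks = {"extract_orders": extract_orders, "extract_customers": extract_customers,
         "transform": transform, "load": load}

for task in TopologicalSorter(dag).static_order():
    tasks[task]()
```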
Ventana Research’s Data Lakes Dynamics Insights research illustrates that while data lakes are fulfilling their promise of enabling organizations to economically store and process large volumes of raw data, data lake environments continue to evolve. Data lakes were initially based primarily on Apache Hadoop deployed on-premises but are now increasingly based on cloud object storage. Adopters are also shifting from data lakes based on homegrown scripts and code to open standards and open formats, and they are beginning to embrace the structured data-processing functionality that supports data lakehouse capabilities. These trends are driving the evolution of vendor product offerings and strategies, as typified by Cloudera’s recent launch of Cloudera Data Platform (CDP) One, described as a data lakehouse software-as-a-service (SaaS) offering.
I have written before about the continued use of specialist operational and analytic data platforms. Most database products can be used for operational or analytic workloads, and the number of use cases for hybrid data processing is growing. However, a general-purpose database is unlikely to meet the most demanding operational or analytic data platform requirements. Factors including performance, reliability, security and scalability necessitate the use of specialist data platforms. I assert that through 2026, and despite increased demand for hybrid operational and analytic processing, more than three-quarters of data platform use cases will have functional requirements that encourage the use of specialized analytic or operational data platforms. It is for that reason that specialist database providers, including Ocient, continue to emerge with new and innovative approaches targeted at specific data-processing requirements.
Earlier this year I described the growing use cases for hybrid data processing. Although it is anticipated that the majority of database workloads will continue to be served by specialist data platforms targeting operational and analytic workloads respectively, there is increased demand for intelligent operational applications infused with the results of analytic processes, such as personalization and artificial intelligence-driven recommendations. There are multiple data platform approaches to delivering real-time data processing and analytics, including the use of streaming data and event processing and specialist, real-time analytic data platforms. We also see operational data platform providers, such as Aerospike, adding analytic processing capabilities to support these application requirements via hybrid operational and analytic processing.
I have recently written about the organizational and cultural aspects of being data-driven, and the potential advantages data-driven organizations stand to gain by responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. I have also explained that data-driven processes require more agile, continuous data processing, with an increased focus on extract, load and transform processes — as well as change data capture and automation and orchestration — as part of a DataOps approach to data management. Safeguarding the health of data pipelines is fundamental to ensuring data is integrated and processed in the sequence required to generate business intelligence. The significance of these data pipelines to delivering data-driven business strategies has led to the emergence of vendors, such as Astronomer, focused on enabling organizations to orchestrate data engineering pipelines and workflows.
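The sketch below is a minimal, illustrative example of the extract, load and transform pattern with a change-data-capture-style watermark, so that only rows changed since the previous run are moved and transformation happens after loading. The in-memory source, warehouse and column names are assumptions rather than any particular product's approach.

```python
# Sketch of an incremental extract-load-transform (ELT) step using a
# change-data-capture-style watermark: only rows changed since the last run
# are extracted and loaded, and transformation happens after loading.

from datetime import datetime, timezone

source_orders = [
    {"order_id": 1, "total": 40.0, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"order_id": 2, "total": 55.0, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]
warehouse = {"raw_orders": [], "watermark": datetime(2024, 1, 2, tzinfo=timezone.utc)}

def extract_changes(rows, since):
    # Change capture by high watermark: pick up rows modified after the last run
    return [r for r in rows if r["updated_at"] > since]

def load_raw(rows):
    # Load first: land the raw rows untouched in the analytic platform
    warehouse["raw_orders"].extend(rows)
    if rows:
        warehouse["watermark"] = max(r["updated_at"] for r in rows)

def transform():
    # Transform after loading, inside the analytic platform
    warehouse["daily_revenue"] = sum(r["total"] for r in warehouse["raw_orders"])

changes = extract_changes(source_orders, warehouse["watermark"])
load_raw(changes)
transform()
print(len(changes), "changed rows loaded; daily_revenue =", warehouse["daily_revenue"])
```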
The data catalog has become an integral component of organizational data strategies over the past decade, serving as a conduit for good data governance and facilitating self-service analytics initiatives. The data catalog has become so important, in fact, that it is easy to forget that just 10 years ago it did not exist in terms of a standalone product category. Metadata-based data management functionality has had a role to play within products for data governance and business intelligence for much longer than that, of course, but the emergence of the data catalog as a product category provided a platform for metadata-based data inventory and discovery that could span an entire organization, serving multiple departments, use cases and initiatives.
I recently wrote about the need for organizations to take a holistic approach to the management and governance of data in motion alongside data at rest. As adoption of streaming data and event processing increases, it is no longer sufficient for streaming data projects to exist in isolation. Data needs to be managed and governed regardless of whether it is processed in batch or as a stream of events. This requirement has resulted in established data management vendors increasing their focus on streaming data and event processing through product development as well as acquisitions. It has also resulted in streaming and event specialists, such as Confluent, adding centralized management and governance capabilities to their existing offerings as they seek to establish or reinforce the strategic importance of streaming data as part of a modern approach to data management.
I have written recently about increased demand for data-intensive applications infused with the results of analytic processes, such as personalization and artificial intelligence (AI)-driven recommendations. Almost one-quarter of respondents (22%) to Ventana Research’s Analytics and Data Benchmark Research are currently analyzing data in real time, with an additional 10% analyzing data every hour. There are multiple data platform approaches to delivering real-time data processing and analytics and more agile data pipelines. These include the use of streaming and event data processing, as well as the use of hybrid data processing to enable analytics to be performed on application data within operational data platforms. Another approach, favored by a group of emerging vendors such as Rockset, is to develop these data-intensive applications on a specialist, real-time analytic data platform specifically designed to meet the performance and agility requirements of data-intensive applications.
I recently noted that as demand for real-time interactive applications becomes more pervasive, the use of streaming data is becoming more mainstream. Streaming data and event processing has been part of the data landscape for many decades, but for much of that time, data streaming was a niche activity. Although adopted in industry segments with high-performance, real-time data processing and analytics requirements such as financial services and telecommunications, data streaming was far less common elsewhere. That has changed significantly in recent years, fueled by the proliferation of open-source and cloud-based streaming data and event technologies that have lowered the cost and technical barriers to developing new applications able to take advantage of data in motion. This is a trend we expect to continue, to the extent that streaming data and event processing becomes an integral part of mainstream data-processing architectures.
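As a simple illustration of how low the barrier to working with data in motion has become, the sketch below consumes and filters events as they arrive using the open-source kafka-python client. The broker address, topic name and event fields are assumptions.

```python
# Minimal sketch of consuming data in motion with the open-source
# kafka-python client (one of many streaming clients). The broker address,
# topic name and event fields are illustrative assumptions.

import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "page-views",                          # assumed topic name
    bootstrap_servers="localhost:9092",    # assumed broker address
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Process each event as it arrives rather than waiting for a batch window
for message in consumer:
    event = message.value
    if event.get("duration_ms", 0) > 5000:
        print("slow page view:", event.get("page"), event["duration_ms"])
```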
I have recently written about the importance of healthy data pipelines to ensure data is integrated and processed in the sequence required to generate business intelligence, and the need for data pipelines to be agile in the context of real-time data processing requirements. Data engineers, who are responsible for monitoring, managing and maintaining data pipelines, are under increasing pressure to deliver high-performance and flexible data integration and processing pipelines that are capable of handling the rising volume and frequency of data. Automation is a potential solution to this challenge, and several vendors, such as Ascend.io, have emerged in recent years to reduce the manual effort involved in data engineering.
I recently explained how emerging application requirements were expanding the range of use cases for NoSQL databases, increasing adoption based on the availability of enhanced functionality. These intelligent applications require a close relationship between operational data platforms and the output of data science and machine learning projects. This ensures that machine learning and predictive analytics initiatives are not only developed and trained based on the relationships inherent in operational applications, but also that the resulting intelligence is incorporated into the operational application in real time to support capabilities such as personalization, recommendations and fraud detection. Graph databases already support operational use cases such as social media, fraud detection, customer experience management and recommendation engines. Graph database vendors such as Neo4j are increasingly focused on the role that graph databases can play in supporting data scientists, enabling them to develop, train and run algorithms and machine learning models on graph data in the graph database, rather than extracting it into a separate environment.
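To illustrate the kind of relationship-centric query involved, here is a hedged example that computes a shared-device feature of the sort used in fraud detection, directly in a property graph database, using Cypher and the neo4j Python driver. The connection details, node labels and relationship types are assumptions, not a reference to any particular schema.

```python
# Illustrative example of computing a fraud-detection style relationship
# feature directly in a property graph database via the neo4j Python driver.
# Connection details, node labels and relationship types are assumptions.

from neo4j import GraphDatabase  # pip install neo4j

# Cypher: count how many payment devices each pair of accounts shares,
# a relationship-based feature that is awkward to compute in a relational model
SHARED_DEVICE_FEATURE = """
MATCH (a:Account)-[:USES]->(d:Device)<-[:USES]-(b:Account)
WHERE a.id < b.id
RETURN a.id AS account_a, b.id AS account_b, count(d) AS shared_devices
ORDER BY shared_devices DESC
LIMIT 10
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(SHARED_DEVICE_FEATURE):
        print(record["account_a"], record["account_b"], record["shared_devices"])
driver.close()
```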
Streaming data has been part of the industry landscape for decades but has largely been focused on niche applications in segments with the highest real-time data processing and analytics performance requirements, such as financial services and telecommunications. As demand for real-time interactive applications becomes more pervasive, streaming data is becoming a more mainstream pursuit, aided by the proliferation of open-source streaming data and event technologies, which have lowered the cost and technical barriers to developing new applications that take advantage of data in motion. Ventana Research’s Streaming Data Dynamic Insights enables an organization to assess its relative maturity in achieving value from streaming data. I assert that by 2024, more than one-half of all organizations’ standard information architectures will include streaming data and event processing, allowing organizations to be more responsive and provide better customer experiences.
When joining Ventana Research, I noted that the need to be more data-driven has become a mantra among large and small organizations alike. Data-driven organizations stand to gain competitive advantage, responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. Being data-driven is clearly something to aspire to. However, it is also a somewhat vague concept without clear definition. We know data-driven organizations when we see them — the likes of Airbnb, DoorDash, ING Bank, Netflix, Spotify, and Uber are often cited as examples — but it is not necessarily clear what separates the data-driven from the rest. Data has been used in decision-making processes for thousands of years, and no business operates without some form of data processing and analytics. As such, although many organizations may aspire to be more data-driven, identifying and defining the steps required to achieve that goal are not necessarily easy. In this Analyst Perspective, I will outline the four key traits that I believe are required for a company to be considered data-driven.
I recently wrote about the growing range of use cases for which NoSQL databases can be considered, given increased breadth and depth of functionality available from providers of the various non-relational data platforms. As I noted, one category of NoSQL databases — graph databases — is inherently suitable for use cases that rely on relationships, such as social media, fraud detection and recommendation engines, since the graph data model represents the entities and values and also the relationships between them. The native representation of relationships can also be significant in surfacing “features” for use in machine learning modeling. There has been a concerted effort in recent years by graph database providers, including TigerGraph, to encourage and facilitate the use of graph databases by data scientists to support the development, testing and deployment of machine learning models.
I previously described the concept of hydroanalytic data platforms, which combine the structured data processing and analytics acceleration capabilities associated with data warehousing with the low-cost and multi-structured data storage advantages of the data lake. One of the key enablers of this approach is interactive SQL query engine functionality, which facilitates the use of existing business intelligence (BI) and data science tools to analyze data in data lakes. Interactive SQL query engines have been in use for several years — many of the capabilities were initially used to accelerate analytics on Hadoop — but have evolved along with data lake initiatives to enable analysis of data in cloud object storage. The open source Presto project is one of the most prominent interactive SQL query engines and has been adopted by some of the largest digital-native organizations. Presto managed-services provider Ahana is on a mission to bring the advantages of Presto to the masses.
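The sketch below shows the kind of SQL an interactive query engine such as Presto can run directly over files in cloud object storage, so that existing BI and SQL skills carry over to data lake analysis. The external table definition, bucket path and the run_query placeholder are illustrative assumptions rather than a specific client API.

```python
# Sketch of interactive SQL over a data lake: expose Parquet files in object
# storage as a table, then query them with ordinary SQL. The table definition,
# bucket path and run_query() helper are illustrative assumptions.

# One-time setup: register Parquet files in object storage as a queryable table
CREATE_EXTERNAL_TABLE = """
CREATE TABLE IF NOT EXISTS hive.web.page_views (
    page        varchar,
    user_id     varchar,
    view_ts     timestamp
)
WITH (format = 'PARQUET', external_location = 's3://example-bucket/page_views/')
"""

# Ad hoc analysis: the same SQL a BI tool would generate against a warehouse
TOP_PAGES = """
SELECT page, count(*) AS views
FROM hive.web.page_views
WHERE view_ts >= cast(current_date - interval '7' day AS timestamp)
GROUP BY page
ORDER BY views DESC
LIMIT 10
"""

def run_query(sql: str):
    """Placeholder for whichever Presto client or CLI is in use."""
    raise NotImplementedError

# run_query(CREATE_EXTERNAL_TABLE)
# print(run_query(TOP_PAGES))
```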
I previously explained how the data lakehouse is one of two primary approaches being adopted to deliver what I have called a hydroanalytic data platform. Hydroanalytics involves the combination of data warehouse and data lake functionality to enable and accelerate analysis of data in cloud storage services. The term data lakehouse has been rapidly adopted by several vendors in recent years to describe an environment in which data warehousing functionality is integrated into the data lake environment, rather than coexisting alongside. One of the vendors that has embraced the data lakehouse concept and terminology is Dremio, which recently announced the general availability of its Dremio Cloud data lakehouse platform.
As I recently described, it is anticipated that the majority of database workloads will continue to be served by specialist data platforms targeting operational and analytic workloads, albeit with growing demand for hybrid data processing use cases and functionality. Specialist operational and analytic data platforms have historically been the preferred option, but there have always been general-purpose databases that could be used for both analytic and operational workloads, with tuning and extensions to meet the specific requirements of each.
I recently wrote about the potential benefits of data mesh. As I noted, data mesh is not a product that can be acquired, or even a technical architecture that can be built. It’s an organizational and cultural approach to data ownership, access and governance. While the concept of data mesh is agnostic to the technology used to implement it, technology is clearly an enabler for data mesh. For many organizations, new technological investment and evolution will be required to facilitate adoption of data mesh. Meanwhile, the concept of the data fabric, a technology-driven approach to managing and governing data across distributed environments, is rising in popularity. Although I previously touched on some of the technologies that might be applicable to data mesh, it is worth diving deeper into the data architecture implications of data mesh, and the potential overlap with data fabric.
I recently described the use cases driving interest in hybrid data processing capabilities that enable analysis of data in an operational data platform without impacting operational application performance or requiring data to be extracted to an external analytic data platform. Hybrid data processing functionality is becoming increasingly attractive to aid the development of intelligent applications infused with personalization and artificial intelligence-driven recommendations. These applications can be used to improve customer service and engagement, detect and prevent fraud, and increase operational efficiency. Several database providers now offer hybrid data processing capabilities to support these application requirements. One of the vendors addressing this opportunity is SingleStore.
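As an illustration of hybrid data processing, the two statements below could be served by the same data platform against the same tables: a transactional write from the application and an analytic aggregation used to personalize the experience in real time, without a batch extract to a separate analytic platform. The table and column names are generic assumptions, not SingleStore specifics.

```python
# Illustrative pair of statements a hybrid (operational plus analytic) data
# platform serves against the same tables. Table and column names are assumed.

# Operational path: the application records an order as part of checkout
RECORD_ORDER = """
INSERT INTO orders (order_id, customer_id, product_id, amount, ordered_at)
VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)
"""

# Analytic path: aggregate recent behavior in place to drive a recommendation
TRENDING_FOR_SEGMENT = """
SELECT p.product_id, COUNT(*) AS recent_orders
FROM orders o
JOIN products p ON p.product_id = o.product_id
WHERE o.ordered_at >= CURRENT_TIMESTAMP - INTERVAL 1 HOUR
  AND p.category = ?
GROUP BY p.product_id
ORDER BY recent_orders DESC
LIMIT 5
"""
```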
The server is a key component of enterprise computing, providing the functional compute resources required to support software applications. Historically, the server was so fundamentally important that it – along with the processor, or processor core – was also a definitional unit by which software was measured, priced and sold. That changed with the advent of cloud-based service delivery and consumption models.
Over a decade ago, I coined the term NewSQL to describe the new breed of horizontally scalable, relational database products. The term was adopted by a variety of vendors that sought to combine the transactional consistency of the relational database model with elastic, cloud-native scalability. Many of the early NewSQL vendors struggled to gain traction, however, and were either acquired or ceased operations before they could make an impact in the crowded operational data platforms market. Nonetheless, the potential benefits of data platforms that span both on-premises and cloud resources remain. As I recently noted, many of the new operational database vendors have now adopted the term “distributed SQL” to describe their offerings. In addition to new terminology, a key trend that separates distributed SQL vendors from the NewSQL providers that preceded them is a greater focus on developers, laying the foundation for the next generation of applications that will depend on horizontally scalable, relational-database functionality. Yugabyte is a case in point.
I recently described how the operational data platforms sector is in a state of flux. There are multiple trends at play, including the increasing need for hybrid and multi-cloud data platforms, the evolution of NoSQL database functionality and applicable use cases, and the drivers for hybrid data processing. The past decade has seen significant change in the emergence of new vendors, data models and architectures as well as new deployment and consumption approaches. As organizations adopted strategies to address these new options, a few things remained constant – one being the influence and importance of Oracle. The company’s database business continues to be a core focus of innovation, evolution and differentiation, even as Oracle has expanded its portfolio to address cloud applications and infrastructure.
I recently wrote about the importance of data pipelines and the role they play in transporting data between the stages of data processing and analytics. Healthy data pipelines are necessary to ensure data is integrated and processed in the sequence required to generate business intelligence. The concept of the data pipeline is nothing new, of course, but it is becoming increasingly important as organizations adapt data management processes to be more data-driven.
Data governance is an issue that impacts all organizations, large and small, new and old, in every industry and every region of the world. Data governance ensures that an organization’s data can be cataloged, trusted and protected, improving business processes to accelerate analytics initiatives and support compliance with regulatory requirements. Not all data governance initiatives will be driven by regulatory compliance; however, the risk of falling foul of privacy (and human rights) laws ensures that regulatory compliance influences the data-processing requirements of nearly all data governance projects. Multinational organizations must be cognizant of the wide variety of regional data security and privacy requirements, not least the European Union’s General Data Protection Regulation (GDPR). The GDPR became enforceable in 2018, protects the privacy of personal or professional data, and carries with it the threat of fines of up to 20 million euros ($22 million) or 4% of a company’s global revenue, whichever is greater. Europe is not alone in regulating against the use of personally identifiable information (other similar regulations include the California Consumer Privacy Act), but Ventana Research’s Data Governance Benchmark Research illustrates that there are differing attitudes and approaches to data governance on either side of the Atlantic.
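For a sense of scale, the maximum GDPR fine described above is the greater of the fixed amount or the revenue-based percentage; a quick, illustrative calculation in Python:

```python
# The maximum GDPR fine described above is the greater of a fixed amount or a
# share of global revenue; the example company revenue is an assumption.

def max_gdpr_fine(global_revenue_eur: float) -> float:
    """Greater of EUR 20 million or 4% of global annual revenue."""
    return max(20_000_000, 0.04 * global_revenue_eur)

# A company with EUR 2 billion in global revenue: 4% (EUR 80 million) applies
print(max_gdpr_fine(2_000_000_000))  # 80000000.0
```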
I recently described the growing level of interest in data mesh, which provides an organizational and cultural approach to data ownership, access and governance that facilitates distributed data processing. As I stated in my Analyst Perspective, data mesh is not a product that can be acquired or even a technical architecture that can be built. Adopting the data mesh approach is dependent on people and process change to overcome traditional reliance on centralized ownership of data and infrastructure and adapt to its principles of domain-oriented ownership, data as a product, self-serve data infrastructure and federated governance. Many organizations will need to make technological changes to facilitate adoption of data mesh, however. Starburst Data is associated with accelerating analysis of data in data lakes but is also one of several vendors aligning their products with data mesh.
Data mesh is the latest trend to grip the data and analytics sector. The term has been rapidly adopted by numerous vendors, as well as a growing number of organizations, as a means of embracing distributed data processing. Understanding and adopting data mesh remains a challenge, however. Data mesh is not a product that can be acquired, or even a technical architecture that can be built. It is an organizational and cultural approach to data ownership, access and governance. Adopting data mesh requires cultural and organizational change. Data mesh promises multiple benefits to organizations that embrace this change, but doing so may be far from easy.
Despite widespread and increasing use of the cloud for data and analytics workloads, it has become clear in recent years that, for most organizations, a proportion of data-processing workloads will remain on-premises in centralized data centers or distributed-edge processing infrastructure. As we recently noted, as compute and storage are distributed across a hybrid and multi-cloud architecture, so, too, is the data the business stores and relies upon. This makes it challenging for organizations to identify, manage and analyze all the data that is available to them. It also presents opportunities for vendors to help alleviate that challenge. In particular, there is a gap in the market for data platform vendors to distinguish themselves from the various cloud providers with cloud-agnostic data platforms that can support data processing across hybrid IT, multi-cloud and edge environments (including Internet of Things devices, as well as servers and local data centers located close to the source of the data). Yellowbrick Data is one vendor that has seized upon that opportunity with its cloud data warehouse offering.
I recently examined how evolving functionality had fueled the adoption of NoSQL databases, recommending that organizations evaluate NoSQL databases when assessing options for data transformation and modernization efforts. This recommendation was based on the breadth and depth of functionality offered by NoSQL database providers today, which has expanded the range of use cases for which NoSQL databases are potentially viable. There remain a significant number of organizations that have not explored NoSQL databases as well as several workloads for which it is assumed NoSQL databases are inherently unsuitable. Given the advances in functionality, organizations would be well-advised to maintain up-to-date knowledge of available products and services and an understanding of the range of use cases for which NoSQL databases are a valid option.
The various NoSQL databases have become a staple of the data platforms landscape since the term entered the IT industry lexicon in 2009 to describe a new generation of non-relational databases. While NoSQL began as a ragtag collection of loosely affiliated, open-source database projects, several commercial NoSQL database providers are now established as credible alternatives to the various relational database providers, while all the major cloud providers and relational database giants now also have NoSQL database offerings. Almost one-quarter (22%) of respondents to Ventana Research’s Analytics and Data Benchmark Research are using NoSQL databases in production today, and adoption is likely to continue to grow. More than one-third (34%) of respondents are planning to adopt NoSQL databases within two years (21%) or are evaluating (14%) their potential use. Adoption has been accelerated by the evolving functionality offered by NoSQL products and services, the growing maturity of specialist NoSQL vendors, and new commercial offerings from cloud providers and established database providers alike. This evolution is exemplified by the changing meaning of the term NoSQL itself. While it was initially associated with a rejection of the relational database hegemony, it has retroactively been reinterpreted to mean “Not Only SQL,” reflecting the potential for these new databases to coexist with and complement established approaches.
As businesses become more data-driven, they are increasingly dependent on the quality of their data and the reliability of their data pipelines. Making decisions based on data does not guarantee success, especially if the business cannot ensure that the data is accurate and trustworthy. While there is potential value in capturing all data — good or bad — making decisions based on low-quality data may do more harm than good.
I recently described the emergence of hydroanalytic data platforms, outlining how the processes involved in generating energy from a lake or reservoir were analogous to those required to generate intelligence from a data lake. I explained how structured data processing and analytics acceleration capabilities are the equivalent of turbines, generators and transformers in a hydroelectric power station. While these capabilities are more typically associated with data warehousing, they are now being applied to data lake environments as well. Structured data processing and analytics acceleration capabilities are not the only things required to generate insights from data, however, and the hydroelectric power station analogy further illustrates this. For example, generating hydroelectric power also relies on pipelines to ensure that the water is transported from the lake or reservoir at the appropriate volume to drive the turbines. Ensuring that a hydroelectric power station is operating efficiently also requires the collection, monitoring and analysis of telemetry data to confirm that the turbines, generators, transformers and pipelines are functioning correctly. Similarly, generating intelligence from data relies on data pipelines that ensure the data is integrated and processed in the correct sequence to generate the required intelligence, while the need to monitor the pipelines and processes in data-processing and analytics environments has driven the emergence of a new category of software: data observability.
As I stated when joining Ventana Research, the socioeconomic impacts of the pandemic and its aftereffects have highlighted more than ever the differences between organizations that can turn data into insights and are agile enough to act upon it and those that are incapable of seeing or responding to the need for change. Data-driven organizations stand to gain competitive advantage, responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. One of the key methods that accelerates business decision-making is reducing the lag between data collection and data analysis.
I recently described how the data platforms landscape will remain divided between analytic and operational workloads for the foreseeable future. Analytic data platforms are designed to store, manage, process and analyze data, enabling organizations to maximize the value of data and operate with greater efficiency, while operational data platforms are designed to store, manage and process data to support worker-, customer- and partner-facing operational applications. At the same time, however, we see increased demand for intelligent applications infused with the results of analytic processes, such as personalization and artificial intelligence-driven recommendations. The need for real-time interactivity means that these applications cannot be served by traditional processes that rely on the batch extraction, transformation and loading of data from operational data platforms into analytic data platforms for analysis. Instead, they rely on analysis of data in the operational data platform itself via hybrid data processing capabilities to accelerate worker decision-making or improve customer experience.
Ventana Research recently announced its 2022 Market Agenda for Data, continuing the guidance we have offered for nearly two decades to help organizations derive optimal value and improve business outcomes.
Few trends have had a bigger impact on the data platforms landscape than the emergence of cloud computing. The adoption of cloud computing infrastructure as an alternative to on-premises data centers has resulted in significant workloads being migrated to the cloud, displacing traditional server and storage vendors. Almost one-half (49%) of respondents to Ventana Research’s Analytics and Data Benchmark Research currently use cloud computing products for analytics and data, and a further one-quarter plan to do so. In addition to deploying data workloads on cloud infrastructure, many organizations have also adopted cloud data and analytics services offered by the same cloud providers, displacing traditional data platform vendors. Organizations now have greater choice in relation to potential products and providers for data and analytics workloads, but also need to think about integrating services offered by cloud providers with established technology and processes. Having pioneered the concept, Amazon Web Services has arguably benefited more than most from adoption of cloud computing, and is also in the process of expanding and adjusting its portfolio to alleviate challenges and encourage even greater adoption.
The need for data-driven decision-making requires organizations to transform not only the approach to business intelligence and data science but also accelerate the development of new operational applications that support greater business agility, enable cloud- and mobile-based consumption, and deliver more interactive and personalized experiences. To stay competitive, organizations need to prioritize the development of new, data-driven applications. As a result, many have been encouraged to invest in new data platforms designed to support agile development and cloud-based delivery. This is one of the factors driving the growth of MongoDB, and continues to drive the evolution of its document database into what is now described as a cloud-based application data platform.
The term NoSQL has been a misnomer ever since it appeared in 2009 to describe a group of emerging databases. It was true that a lack of support for Structured Query Language (SQL) was common to the various databases referred to as NoSQL. However, it was only ever one of a number of common characteristics, including flexible schema, distributed data processing, open source licensing, and the use of non-relational data models (key value, document, graph) rather than relational tables. As the various NoSQL databases have matured and evolved, many of them have added support for SQL terms and concepts, as well as the ability to support SQL-format queries. Couchbase has been at the forefront of this effort, recognizing that to drive greater adoption of NoSQL databases in general (and its distributed document database in particular) it was wise to increase compatibility with the concepts, tools and skills that have dominated the database market for the past 50 years.
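As a hedged illustration of this convergence, the example below contrasts a flexible-schema JSON document with the kind of SQL-style statement, in the spirit of Couchbase's N1QL/SQL++, that can query a collection of such documents. The bucket name and fields are assumptions.

```python
# Illustration of SQL-format querying over JSON documents, in the spirit of
# Couchbase's N1QL/SQL++: familiar SELECT syntax applied to non-relational,
# flexible-schema data. The bucket name and fields are assumptions.

import json

# A document as stored in a document database: nested, with no fixed schema
airline_doc = json.dumps({
    "type": "airline",
    "name": "Example Air",
    "country": "United Kingdom",
    "routes": [{"from": "LHR", "to": "JFK"}, {"from": "LHR", "to": "SFO"}],
})

# A SQL-style query over those documents: relational skills reused on NoSQL data
FIND_UK_AIRLINES = """
SELECT a.name, ARRAY_LENGTH(a.routes) AS route_count
FROM `travel` AS a
WHERE a.type = "airline" AND a.country = "United Kingdom"
ORDER BY route_count DESC
"""
```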
Data lakes have enormous potential as a source of business intelligence. However, many early adopters of data lakes have found that simply storing large amounts of data in a data lake environment is not enough to generate business intelligence from that data. Similarly, lakes and reservoirs have enormous potential as sources of energy. However, simply storing large amounts of water in a lake is not enough to generate energy from that water. A hydroelectric power station is required to harness and unleash the power-generating potential of a lake or reservoir, utilizing a combination of turbines, generators and transformers to convert the energy of the flowing water into electricity. A hydroanalytic data platform, the data equivalent of a hydroelectric power station, is required to harness and unleash the intelligence-generating potential of a data lake.
As I noted when joining Ventana Research, the range of options faced by organizations in relation to data processing and analytics can be bewildering. When it comes to data platforms, however, there is one fundamental consideration that comes before all others: Is the workload primarily operational or analytic? Although most database products can be used for operational or analytic workloads, the market has been segmented between products targeting operational workloads, and those targeting analytic workloads for almost as long as there has been a database market.
Breaking into the database market as a new vendor is easier said than done given the dominance of the sector by established database and data management giants, as well as the cloud computing providers. We recently described the emergence of a new breed of distributed SQL database providers with products designed to address hybrid and multi-cloud data processing. These databases are architecturally and functionally differentiated from both the traditional relational incumbents (in terms of global scalability) and the NoSQL providers (in terms of the relational model and transactional consistency). Having differentiated functionality is the bare minimum a new database vendor needs to make itself known in such a crowded market, however.
It has been clear for some time that future enterprise IT architecture will span multiple cloud providers as well as on-premises data centers. As Ventana Research noted in the market perspective on data architectures, the rapid adoption of cloud computing has fragmented where data is accessed or consolidated. We are already seeing that almost one-half (49%) of respondents to Ventana Research’s Analytics and Data Benchmark Research are using cloud computing for analytics and data, of which 42% are currently using more than one cloud provider.
Enterprises looking to adopt cloud-based data processing and analytics face a disorienting array of data storage, data processing, data management and analytics offerings. Departmental autonomy, shadow IT, mergers and acquisitions, and strategic choices mean that most enterprises now have the need to manage data across multiple locations, while each of the major cloud providers and data and analytics vendors has a portfolio of offerings that may or may not be available in any given location. As such, the ability to manage and process data across multiple clouds and data centers is a growing concern for large and small enterprises alike. Almost one-half (49%) of respondents to Ventana Research’s Analytics and Data Benchmark Research study are using cloud computing for analytics and data, of which 42% are currently using more than one cloud provider.
I am very happy to announce that I have joined Ventana Research to help lead the expertise area of Digital Technology, including Analytics and Data, Cloud Computing, Artificial Intelligence and Machine Learning, the Internet of Things, Robotic Automation, and Collaborative and Conversational Computing. While the range of applications and technology covered by our Digital Technology practice is broad, I will naturally make use of my decades of experience covering data platforms and analytics to help organizations improve the readiness and resilience of business and IT operations.