When joining Ventana Research, I noted that the need to be more data-driven has become a mantra among large and small organizations alike. Data-driven organizations stand to gain competitive advantage, responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. Being data-driven is clearly something to aspire to. However, it is also a somewhat vague concept without a clear definition. We know data-driven organizations when we see them (the likes of Airbnb, DoorDash, ING Bank, Netflix, Spotify, and Uber are often cited as examples), but it is not necessarily clear what separates the data-driven from the rest. Data has been used in decision-making processes for thousands of years, and no business operates without some form of data processing and analytics. As such, although many organizations may aspire to be more data-driven, identifying and defining the steps required to achieve that goal is not necessarily easy. In this Analyst Perspective, I will outline the four key traits that I believe are required for a company to be considered data-driven.
Topics: embedded analytics, Analytics, Business Intelligence, Data Governance, Data Integration, Data, Digital Technology, natural language processing, data lakes, AI and Machine Learning, data operations, Digital Business, Streaming Analytics, data platforms, Analytics & Data, Streaming Data & Events
I previously described the concept of hydroanalytic data platforms, which combine the structured data processing and analytics acceleration capabilities associated with data warehousing with the low-cost and multi-structured data storage advantages of the data lake. One of the key enablers of this approach is interactive SQL query engine functionality, which facilitates the use of existing business intelligence (BI) and data science tools to analyze data in data lakes. Interactive SQL query engines have been in use for several years — many of the capabilities were initially used to accelerate analytics on Hadoop — but have evolved along with data lake initiatives to enable analysis of data in cloud object storage. The open source Presto project is one of the most prominent interactive SQL query engines and has been adopted by some of the largest digital-native organizations. Presto managed-services provider Ahana is on a mission to bring the advantages of Presto to the masses.
I previously explained how the data lakehouse is one of two primary approaches being adopted to deliver what I have called a hydroanalytic data platform. Hydroanalytics involves the combination of data warehouse and data lake functionality to enable and accelerate analysis of data in cloud storage services. The term data lakehouse has been rapidly adopted by several vendors in recent years to describe an environment in which data warehousing functionality is integrated into the data lake environment, rather than coexisting alongside it. One of the vendors that has embraced the data lakehouse concept and terminology is Dremio, which recently made its Dremio Cloud data lakehouse platform generally available.
I recently wrote about the importance of data pipelines and the role they play in transporting data between the stages of data processing and analytics. Healthy data pipelines are necessary to ensure data is integrated and processed in the sequence required to generate business intelligence. The concept of the data pipeline is nothing new, of course, but it is becoming increasingly important as organizations adapt data management processes to be more data-driven.
Topics: Analytics, Business Intelligence, Data Governance, Data Integration, Data, Digital Technology, Digital transformation, data lakes, AI and Machine Learning, data operations, Digital Business, data platforms, Analytics & Data, Streaming Data & Events
I recently described the growing level of interest in data mesh, which provides an organizational and cultural approach to data ownership, access and governance that facilitates distributed data processing. As I stated in my Analyst Perspective, data mesh is not a product that can be acquired or even a technical architecture that can be built. Adopting the data mesh approach depends on changes to people and processes: organizations must overcome their traditional reliance on centralized ownership of data and infrastructure and adapt to the data mesh principles of domain-oriented ownership, data as a product, self-serve data infrastructure and federated governance. Many organizations will also need to make technological changes to facilitate adoption of data mesh, however. Starburst Data is associated with accelerating analysis of data in data lakes but is also one of several vendors aligning their products with data mesh.