Ventana Research Analyst Perspectives

SAP Data Hub Orchestrates Data for Business and IT

Written by David Menninger | Dec 14, 2017 4:22:13 PM

I recently attended SAP TechEd in Las Vegas to hear the latest from the company regarding its analytics and business intelligence offerings as well as its data management platform. The company used the event to launch SAP Data Hub and made several other data and analytics announcements that I’ll cover below.

SAP Data Hub is designed to help organizations deal with the variety of databases and data management tasks they face. The data landscape has become more complex, and it has become the norm for organizations to use multiple types of data-storage technologies. My recent perspective on Hadoop adoption suggests that 20 to 25 percent of organizations are using Hadoop in production environments to store and manage big data. These findings are based on six years of benchmark research, which also shows that in-memory databases such as SAP HANA and NoSQL databases are being used in similar percentages.

Data Hub enables organizations to access data from these diverse sources without requiring that the data be consolidated into a single repository. Its tasks operate on the data where it is, pushing down the execution of data operations to source systems where possible and combining the results from these various systems. However, SAP emphasized that Data Hub is not a data virtualization product. I have written about data virtualization in a previous analyst perspective.
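To illustrate the general idea of pushing operations down to source systems, here is a minimal Python sketch. It is not SAP Data Hub code and does not use any SAP API. Two in-memory SQLite databases stand in for separate source systems, and the table name `sales` and the helper functions are hypothetical. Each source computes its own aggregate, and only the small summary results are combined.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory SQLite database standing in for one source system (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

def pushed_down_totals(conn):
    """Run the aggregation inside the source; only summary rows come back."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()

def combine(partials):
    """Merge the per-source partial sums into one result."""
    totals = {}
    for rows in partials:
        for region, amount in rows:
            totals[region] = totals.get(region, 0.0) + amount
    return totals

source_a = make_source([("east", 100.0), ("west", 50.0)])
source_b = make_source([("east", 25.0), ("north", 10.0)])
totals = combine([pushed_down_totals(source_a), pushed_down_totals(source_b)])
print(totals)
```

The detail rows never leave their databases in this sketch. Only one row per region travels from each source, which is the practical appeal of operating on data where it resides.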

Data Hub provides three types of capabilities: managing and monitoring, data discovery and data pipelining. The management and monitoring tools enable users to connect to a variety of SAP and non-SAP data sources such as Hadoop and Amazon S3. Users can schedule and monitor the execution of data-related tasks.

The discovery capabilities allow users to connect to data sources and browse their contents. In addition, Data Hub provides data profiling statistics such as histograms, minimum and maximum values, and percent of null values to help users better understand the data.
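For readers less familiar with data profiling, the following Python sketch shows how the statistics mentioned above could be computed for a single numeric column. It is an illustration only, not how Data Hub implements profiling. The function name `profile` and the equal-width binning scheme are my assumptions.

```python
def profile(values, bins=4):
    """Compute simple profiling statistics for one numeric column.

    None represents a null value. The histogram uses equal-width bins
    between the minimum and maximum of the non-null values.
    """
    present = [v for v in values if v is not None]
    pct_null = 100.0 * (len(values) - len(present)) / len(values) if values else 0.0
    if not present:
        return {"min": None, "max": None, "pct_null": pct_null, "histogram": []}

    lo, hi = min(present), max(present)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in present:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return {"min": lo, "max": hi, "pct_null": pct_null, "histogram": counts}

stats = profile([1, 2, None, 4, 9, None, 5, 3])
print(stats)
```

Even this small summary reveals that a quarter of the sample values are missing and that most values cluster at the low end of the range, which is the kind of insight profiling is meant to surface before data is put to use.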

The data pipelining components enable users to construct multi-step flows of data from origination to consumption. Three types of components are provided: connectors, flows and processes. Connectors establish links to data sources. Flows can be used to route the data to multiple components, for instance to create sample data sets to train and test models. Processes are used to enrich and transform the data. Transformations range from simple joins and filtering to machine learning, natural language processing and image processing. SAP expects the typical Data Hub pipeline will ingest data, cleanse it, transform it and then deliver the data to an application.
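The ingest, cleanse, transform and deliver pattern can be sketched in a few lines of Python. This is a conceptual illustration using plain generators, not Data Hub's pipeline modeler or API. All function names, the record fields and the 75/25 split are hypothetical.

```python
import random

def connector(records):
    """Connector: yield records from a source (here, an in-memory list)."""
    yield from records

def cleanse(records):
    """Process: drop records whose amount is missing."""
    return (r for r in records if r.get("amount") is not None)

def transform(records):
    """Process: enrich each record with a derived field."""
    return ({**r, "amount_cents": round(r["amount"] * 100)} for r in records)

def split_flow(records, train_fraction=0.75, seed=0):
    """Flow: route each record to either a training or a test sample set."""
    rng = random.Random(seed)
    train, test = [], []
    for r in records:
        (train if rng.random() < train_fraction else test).append(r)
    return train, test

raw = [
    {"id": 1, "amount": 1.5},
    {"id": 2, "amount": None},   # removed by the cleanse step
    {"id": 3, "amount": 2.25},
    {"id": 4, "amount": 0.1},
]

# Ingest -> cleanse -> transform -> deliver to two consumers.
train, test = split_flow(transform(cleanse(connector(raw))))
print(len(train), len(test))
```

Each stage consumes the output of the previous one, so steps can be reordered or reused in other pipelines, which is the sort of reusability the research figures below speak to.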

An SAP press release describes these capabilities as establishing a new software category. However, I consider many of these features data preparation tasks that I have written about previously and that we studied in our benchmark research. SAP approaches these tasks from a slightly different perspective, emphasizing the orchestration of data-related tasks. SAP describes Data Hub as complementing its self-service data preparation product, SAP Agile Data Preparation. Our research shows that data hub features are important aspects of data preparation, with nearly half (48%) of organizations citing reusable tasks and 41 percent citing graphical workflows as critical capabilities. We expect to see SAP bringing these sets of features together in the future rather than emphasizing them as distinct sets of capabilities.

SAP also made announcements about enhancements to its cloud platform and analytics capabilities. SAP Cloud Platform is now available as a beta release on Google Cloud Platform, rounding out support for the major public cloud providers, including Amazon Web Services and Microsoft Azure, in addition to SAP’s own data centers, which were expanded to Toronto and Moscow. SAP has made multiple investments in openness for its Cloud Platform. The product, which offers platform-as-a-service capabilities, is based on the open source Cloud Foundry project. SAP announced it has joined the Cloud Native Computing Foundation, which hosts the Kubernetes project, and the OpenAPI Initiative, which focuses on standardizing REST APIs. The company also announced Cloud Platform support for SAP’s programming language, ABAP, which drew cheers and applause from TechEd attendees, who welcomed this indication of its continued widespread usage.

The company’s upgraded analytics platform has many components. SAP Analytics Cloud, the company’s cloud-based analytics offering, now supports live connections to more data sources such as SAP Business Warehouse and SAP S/4HANA. And in this Year of Machine Learning, SAP has incorporated machine learning capabilities into data discovery and data preparation tasks, as have others in the industry. The product also has tighter integration with SAP Business Planning and Consolidation, including write-back capabilities, an area often ignored by other analytics products. SAP Digital Boardroom, built on the Analytics Cloud, has a new presentation builder to guide the flow of meetings. It also has new or enhanced content for four industries and four lines of business, bringing the totals to 11 industries and 10 lines of business supported. Another component of the analytics platform, the on-premises data visualization and discovery product SAP Lumira, has been enhanced with a redesigned user interface in Version 2.0.

My colleague Rob Kugel has written about SAP Leonardo, the company’s “digital innovation system.” Leonardo includes, among other things, business computing services for machine learning and artificial intelligence. SAP now offers 14 industry accelerators and plans to offer more in the future. Accelerators are fixed-price engagements to deliver a functional pilot, a technical blueprint for implementation, a business case and a plan for full implementation. Given the scarcity of data science resources, these accelerators could help fill a critical need in many organizations. Our Next Generation Predictive Analytics benchmark research shows that more than three in five organizations (62%) don’t have enough skilled resources to deliver these types of analytics.

TechEd showed that SAP continues to invest and make progress in analytics, delivering breadth and depth that can help organizations improve their business performance. The portfolio can be complex and overwhelming. However, as I have written previously, large software vendors deserve some respect for tackling the challenges of making systems that work reliably in mission-critical business processes. SAP Data Hub and the other enhancements announced at TechEd help address some of those issues.

Regards,

David Menninger

SVP & Research Director
