PentahoWorld 2015, Pentaho’s second annual user conference, held in mid-October, centered on the general availability of release 6.0 of its data integration and analytics platform and its acquisition by Hitachi Data Systems (HDS) earlier this year. Company spokespeople detailed the development of the product in relation to the roadmap laid out in 2014 and outlined plans for its integration with those of HDS and its parent Hitachi. They also discussed Pentaho’s and HDS’s shared intentions regarding the Internet of Things (IoT), particularly in telecommunications, healthcare, public infrastructure and IT analytics.
Pentaho competes on the basis of what it calls a “streamlined data refinery” that enables a flexible way to access, transform and integrate data and embed and present analytic data sets in usable formats without writing new code. In addition, it integrates a visual analytic workflow interface with a business intelligence front end including customization extensions; this is a differentiator for the company since much of the self-serve analytics market in which it competes is still dominated by separate point products.
Pentaho 6 aims to provide manageable and scalable self-service analytics. A key advance in the new version is what Pentaho calls “virtualized data sets” that logically aggregate multiple data sets according to transformations and integration specified by the Pentaho Data Integration (PDI) analytic workflow interface. This virtual approach allows the physical processing to be executed close to the data in various systems such as Hadoop or an RDBMS, which relieves users of the burden of having to continually move data back and forth between the query and the response systems. In this way, logical data sets can be served up for consumption in Pentaho Analytics as well as other front-end interfaces in a timely and flexible manner.
One challenge that emerges when accessing multiple integrated and transformed data sets is data lineage. Tracking its lineage is important to establish trust in the data among users by enabling them to ascertain the origin of data prior to transformation and integration. This is particularly useful in regulated industries that may need access to and tracking of source data to prove compliance. This becomes even more complicated with events and completely sourcing them along with the large number of them as found in over a third of organizations in our operational intelligence benchmark research that examined operational centric analytics and business intelligence.
Similarly, Pentaho 6 uses Simple Network Management Protocol (SNMP) to deliver application programming interface (API) extensions so that third-party tools can help provide governance lower in the system stack to further enable reliability of data. Our benchmark research consistently shows that manageability of systems is important for user organizations and in particular for big data environments.
The flexibility introduced with virtual tables and improvements in Pentaho 6.0 around in-line modeling (a concept I discussed after last year’s event are two critical means to building self-service analytic environments. Marrying various data systems with different data models, sometimes referred to as big data integration, has proven to be a difficult challenge in such environments. Pentaho’s continued focus on big data integration and providing an integration backbone to the many business intelligence tools (in addition to its own) are potential competitive differentiators for the company. While analysts and users prefer integrated tool sets, today’s fragmented analytics market is increasingly dominated by separate tools that prepare data and surface data for consumption. Front-end tools alone cannot automate the big data integration process, which Pentaho PDI can do. Our research into big data integration shows the importance of eliminating manual tasks in this process: 78 percent of companies said it is important or very important to automate their big data integration processes. Pentaho’s ability to integrate with multiple visual analytics tools is important for the company, especially in light of the HDS accounts, which likely have a variety of front-end tools. In addition, the ability to provide an integrated front end can be attractive to independent software vendors, analytics services providers and certain end-user organizations that would like to embed both integration and visualization without having to license multiple products.
Going forward, Pentaho is focused on joint opportunities with HDS such as the emerging Internet of Things. Pentaho cites established industrial customers such as Halliburton, Intelligent Mechatonic Systems and Kirchoff Datensysteme Software as reference accounts for IoT. In addition, a conference participant from Caterpillar Marine Asset Intelligence shared how it embeds Pentaho to help analyze and predict equipment failure on maritime equipment. Pentaho’s ability to integrate and analyze multiple data sources is key to delivering business value in each of these environments, but the company also possesses a little-known asset in the Weka machine learning library, which is an integrated part of the product suite. Our research on next-generation predictive analytics finds that Weka is used by 5 percent of organizations, and many of the companies that use it are large or very large, which is Pentaho’s target market. Given the importance of machine learning in the IoT category, it will be interesting to see how Pentaho leverages this asset.
Also at the conference, an HDS spokesperson discussed its target markets for IoT or what the company calls “social innovation.” These markets include telecommunications, healthcare, public infrastructure and IT analytics and reflect HDS’s customer base and the core businesses of its parent company Hitachi. Pentaho Data Integration is currently embedded within major customer environments such as Caterpillar, CERN, FINRA, Halliburton, NASDAQ, Sears and Staples, but not all of these companies fit directly into the IoT segments HDS outlined. While Hitachi’s core businesses provide a fertile ground in which grow its business, Pentaho will need to develop integration with the large industrial control systems already in place in those organizations.
The integration of Pentaho into HDS is a key priority. The 2,000-strong global sales force of HDS is now incented to sell Pentaho, and it will be important for the reps to include it as they discuss their accounts’ needs. While Pentaho’s portfolio can potentially broaden sales opportunities for HDS, big data software is a more consultative sale than the price-driven hardware and systems that the sales force may be used to. Furthermore, the buying centers, which are shifting from IT to lines of business, can be significantly different based on the type of organization and their objectives. To address this will require significant training within the HDS sales force and with partner consulting channels. The joint sales efforts will be well served by emphasizing the “big data blueprints” developed by Pentaho over the last couple of years and developing of new ones that speak to IoT and the combined capabilities of the two companies.
HDS says it will begin to embed Pentaho into its product portfolio but has promised to leave Pentaho’s roadmap intact. This is important because Pentaho has done a good job of listening to its customers and addressing the complexities that exist in big data and open source environments. As the next chapter unfolds, I will be looking at how the company integrates its platform with the HDS portfolio and expands it to deal with the complexities of IoT, which we will be investigating in upcoming benchmark research study.
For organizations that need to use large-scale integrated data sets, Pentaho provides one of the most flexible yet mature tools in the market, and they should consider it. The analytics tool provides an integrated and embeddable front end that should be of particular interest to analytics services providers and independent software vendors seeking to make information management and data analytics core capabilities. For existing HDS customers, the Pentaho portfolio will open conversations in new areas of those organizations and potentially add considerable value within accounts.
VP and Research Director