Just a few years ago, the prevailing view in the software industry was that the category of business intelligence (BI) was mature and without room for innovation. Vendors competed in terms of feature parity and incremental advancements of their platforms. But since then business intelligence has grown to include analytics, data discovery tools and big data capabilities to process huge volumes and new types of data much faster. As is often the case with change, though, this one has created uncertainty. For example, only one in 11 participants in our benchmark research on big data analytics said that their organization fully agrees on the meaning of the term “big data analytics.”
There is little question that clear definitions of analytics and business intelligence as they are used in business today would be of value. But some IT analyst firms have tried to oversimplify the process of updating these definitions by merely combining a market basket of discovery capabilities under the label of analytics. In our estimation, this attempt is neither accurate nor useful. Discovery tools are only components of business intelligence, and their capabilities cannot accomplish all the tasks comprehensive BI systems can do. Some firms seem to want to reduce the field further by overemphasizing the visualization aspect of discovery. While visual discovery can help users solve basic business problems, other BI and analytic tools are available that can attack more sophisticated and technically challenging problems. In our view, visual discovery is one of four types of analytic discovery that can help organizations identify and understand the masses of data they accumulate today. But for many organizations visualization alone cannot provide them with the insights necessary to help make critical decisions, as interpreting the analysis requires expertise that mainstream business professionals lack.
In Ventana Research’s view, business intelligence is a technology managed by IT that is designed to produce information and reports from business data to inform the business about the performance of activities, people and processes. It has provided and will continue to provide great value to business, but in itself basic BI will not meet the new generation of requirements that businesses face; they need not just information but guidance on how to take advantage of opportunities, address issues and mitigate the risks of subpar performance. Analytics is a component of BI that is applied to data to generate information, including metrics. It is a technology-based set of methodologies used by analysts as well as the information gained through the use of tools designed to help those professionals. These thoughtfully crafted definitions inform the evaluation criteria we apply in our new and comprehensive 2015 Analytics and Business Intelligence Value Index, which we will publish soon. As with all business tools, applications and systems we assess in this series of indexes, we evaluate the value of analytic and business intelligence tools in terms of five functional categories – usability, manageability, reliability, capability and adaptability – and two customer assurance categories – validation of the vendor and total cost of ownership and return on investment (TCO/ROI). We feature our findings in these seven areas of assessment in our Value Index research and reports. In the Analytics and Business Intelligence Value Index for 2015 we assess in depth the products of 15 of the leading vendors in today’s BI market.
The Capabilities category examines the breadth of functionality that products offer and assesses their ability to deliver the insights today’s enterprises need. For our analysis we divide this category into three subcategories for business intelligence: data, analytics and optimization. We explain each of them below.
The data subcategory of Capabilities examines data access and preparation along with supporting integration and modeling. New data sources are coming into being continually; for example, data now is generated by sensors in watches, smartphones, cars, airplanes, homes, utilities and an assortment of business, network, medical and military equipment. In addition, organizations increasingly are interested in behavioral and attitudinal data collected through various communication platforms. Examples include Web browser behavior, data mined from the Internet, social media and various survey and community polling data. The data access and integration process identifies each type of data, integrates it with all other relevant types, checks it all for quality issues, maps it back to the organization’s systems of record and master data, and manages its lineage. Master data management in particular, including newer approaches such as probabilistic matching, is a key component for creating a system that can combine data types across the organization and in the cloud to create a common organizational vernacular for the use of data.
Ascertaining which systems must be accessed and how is a primary challenge for today’s business intelligence platforms. A key part of data access is the user interface. Whether it appears in a Web browser or on a laptop, a smartphone, a tablet or a wearable device, data must be presented in a manner optimized for the interface. Examining the user interface for business intelligence systems was a primary interest of our 2014 Mobile Business Intelligence Value Index. In that research, we learned that vendors are following divergent paths and that it may be hard for some to change course as they continue. Therefore how a vendor handles mobile access and other new access methods affects its products’ value for particular organizations.
Once data is accessed, it must be modeled in a useful way. Data models in the form of OLAP cubes and predefined relationships of data sometimes grow overly complex, but there is value in premodeling data in ways that make sense to business people, most of whom are not equipped to model it themselves. Defining data relationships and transforming data through complex manipulations are often needed, for instance, to define performance indicators that align with an organization’s business initiatives. These manipulations can include business rules or what-if analysis within the context of a model or external to it. Finally, models must be flexible so they do not hinder the work of organizational users. The value of premodeling data is that it provides a common view for business users so they need not redefine data relationships that have already been thoroughly considered.
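As a minimal sketch of the idea (the data, field names and adjustment factor here are hypothetical, not from any particular BI product), premodeling amounts to defining a metric once in a shared model so business users can reuse it consistently, including for what-if scenarios:

```python
# Hypothetical sales rows; in practice these would come from the
# organization's systems of record after access and integration.
sales = [
    {"region": "East", "units": 120, "price": 10.0},
    {"region": "East", "units": 80,  "price": 10.0},
    {"region": "West", "units": 200, "price": 9.5},
]

# Premodeled metric: "revenue" is defined once, in one place, so every
# user computes it the same way instead of redefining it ad hoc.
def revenue(row, price_adjustment=1.0):
    return row["units"] * row["price"] * price_adjustment

def revenue_by_region(rows, price_adjustment=1.0):
    totals = {}
    for row in rows:
        key = row["region"]
        totals[key] = totals.get(key, 0.0) + revenue(row, price_adjustment)
    return totals

baseline = revenue_by_region(sales)                         # historical view
what_if = revenue_by_region(sales, price_adjustment=1.05)   # what-if: 5% price increase
```

The what-if scenario reuses the same premodeled metric with one changed assumption, which is the flexibility the paragraph above describes: the shared definition does not hinder exploratory variations on it.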
The analytics subcategory includes analytic discovery, prediction and integration. Discovery and prediction roughly map to the ideas of exploratory and confirmatory analytics, which I have discussed. Analytic discovery includes calculation and visualization processes that enable users to move quickly and easily through data to create the types of information they need for business purposes. Complementing it is prediction, which typically follows discovery. Discovery facilitates root-cause and historical analysis, but to look ahead and make decisions that produce desired business outcomes, organizations need to track various metrics and make informed predictions. Analytic integration encompasses customization of both discovery and predictive analytics and embedding them in other systems such as applications and portals.
The optimization subcategory includes collaboration, organizational management, information optimization, action and automation. Collaboration is a key consideration for today’s analytic platforms. It includes the ability to publish, share and coordinate various analytic and business intelligence functions. Notably, some recently developed collaboration platforms incorporate many of the characteristics of social platforms such as Facebook or LinkedIn. Organizational management supports managing toward particular outcomes and sometimes provides performance indicators and scorecard frameworks. Action assesses how technology directly assists decision-making in an operational context. This includes gathering inputs and outputs for collaboration before and after a decision, predictive scoring that prescribes action and delivery of the information in the correct form to the decision-maker. Finally, automation fires alerts based on statistical triggers or business rules and should be managed as part of a workflow. Agent technology takes automation to a level that is more proactive and autonomous.
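The distinction between rule-based and statistical triggers can be sketched in a few lines. This is an illustrative toy (the metric values, threshold and function names are hypothetical), not any vendor’s alerting API:

```python
# Toy alert automation: fire when a metric breaches either a fixed
# business rule or a statistical control limit derived from history.
from statistics import mean, stdev

history = [102, 98, 101, 99, 103, 100, 97, 104]  # hypothetical daily order counts
latest = 125

def alerts(history, latest, rule_ceiling=150, sigmas=3):
    fired = []
    mu, sigma = mean(history), stdev(history)
    if latest > rule_ceiling:                 # business rule: fixed ceiling
        fired.append("rule: ceiling exceeded")
    if abs(latest - mu) > sigmas * sigma:     # statistical trigger: control band
        fired.append("stat: outside %d-sigma band" % sigmas)
    return fired

triggered = alerts(history, latest)
```

Here the fixed rule stays silent (125 is under the ceiling of 150) while the statistical trigger fires, since 125 lies far outside the three-sigma band around the historical mean; in a workflow, each fired alert would route to the appropriate decision-maker.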
This broad framework of data, analytics and optimization fits with a process orientation to business analytics that I have discussed. Our benchmark research on information optimization indicates that the people and process dimensions of performance are less well developed than the information and technology aspects, and thus a focus on these aspects of business intelligence and analytics will be beneficial.
In our view, it’s important to consider business intelligence software in a broad business context rather than in artificially separate categories that are designed for IT only. We advise organizations seeking to gain a competitive edge to adopt a multifaceted strategy that is business-driven, incorporates a complete view of BI and analytics, and uses the comprehensive evaluation criteria we apply.
VP and Research Director
In many organizations, advanced analytics groups and IT are separate, and there often is a chasm of understanding between them, as I have noted. A key finding in our benchmark research on big data analytics is that communication and knowledge sharing are a top benefit of big data analytics initiatives, but often a latent one. That is, prior to deployment, communication and knowledge sharing are deemed a marginal benefit, but once the program is deployed they are deemed a top benefit. From a tactical viewpoint, organizations may not spend enough time defining a common vocabulary for big data analytics prior to starting the program; our research shows that fewer than half of organizations have agreement on the definition of big data analytics. It makes sense therefore that, along with a technical infrastructure and management processes, explicit communication processes at the beginning of a big data analytics program can increase the chance of success. We found these qualities in the Chorus platform of Alpine Data Labs, which received the Ventana Research Technology Innovation Award for Predictive Analytics in September 2014.
Alpine Chorus 5.0, the company’s flagship product, addresses the big data analytics communication challenge by providing a user-friendly platform for multiple roles in an organization to build and collaborate on analytic projects. Chorus helps organizations manage the analytic life cycle from discovery and data preparation through model development and model deployment. It brings together analytics professionals via activity streams for rapid collaboration and workspaces that encourage projects to be managed in a uniform manner. While activity streams enable group communication via short messages and file sharing, workspaces allow each analytic project to be managed separately with capabilities for project summary, tracking and data source mapping. These functions are particularly valuable as organizations embark on multiple analytic initiatives and need to track and share information about models as well as the multitude of data sources feeding the models.
The Alpine platform addresses the challenge of processing big data by parallelizing algorithms to run across big data platforms such as Hadoop, making them accessible to a wide audience of users. The platform supports most analytic databases and all major Hadoop distributions. Alpine was an early adopter of Apache Spark, an open source in-memory data processing framework that one day may replace Hadoop’s original map-reduce processing paradigm. Alpine Data Labs has been certified by Databricks, the primary contributor to the Spark project, responsible for 75 percent of the code added in the past year. With Spark, Alpine’s analytic models such as logistic regression run in a fraction of the time previously possible, and the platform supports new approaches such as what the company calls Sequoia Forest, a machine learning technique it describes as a more robust version of random forest analysis. Our big data analytics research shows that predictive analytics is a top priority for about two-thirds (64%) of organizations, but they often lack the skills to deploy a fully customized approach. This is likely a reason that companies now are looking for more packaged approaches to implementing big data analytics (44%) than custom approaches (36%), according to our research. Alpine taps into this trend by delivering advanced analytics directly in Hadoop and the HDFS file system with in-cluster analytic capabilities that handle the complex parallel processing needed to run in distributed environments such as Hadoop.
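The speedup from in-cluster, in-memory processing rests on data parallelism: each partition of the data computes a partial result, and the partials are then reduced into one update. A pure-Python toy of that map-reduce pattern for one logistic regression gradient step follows; it is an illustration of the general technique, not Alpine’s or Spark’s actual implementation or API:

```python
# Toy data-parallel gradient descent for logistic regression.
# "Map": each partition computes its partial gradient independently
# (on a cluster, each partition would live on a different node).
# "Reduce": partial gradients are summed into one update.
import math
from functools import reduce

def partial_gradient(w, partition):
    """Gradient contribution of one data partition (rows of ([features], label))."""
    g = [0.0] * len(w)
    for x, y in partition:
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for j, xj in enumerate(x):
            g[j] += (p - y) * xj
    return g

def gradient_step(w, partitions, lr=0.1):
    partials = [partial_gradient(w, p) for p in partitions]                   # map
    total = reduce(lambda a, b: [ai + bi for ai, bi in zip(a, b)], partials)  # reduce
    return [wi - lr * gi for wi, gi in zip(w, total)]

# Two partitions of hypothetical labeled rows.
partitions = [
    [([1.0], 1), ([2.0], 1)],
    [([-1.0], 0), ([-2.0], 0)],
]
w = [0.0]
for _ in range(50):
    w = gradient_step(w, partitions)
```

Because each partition’s work is independent, the map phase scales out across nodes; keeping partitions in memory between iterations, as Spark does, is what makes iterative algorithms like this one so much faster than disk-bound map-reduce.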
A key differentiator for Alpine is usability. Its graphical user interface provides a visual analytic workflow experience built on popular algorithms to deliver transformation capabilities and predictive analytics on big data. The platform supports scripts in the R language, which can be cut and pasted into the workflow development studio; custom operators for more advanced users; and Predictive Model Markup Language (PMML), which enables extensible model sharing and scoring across different systems. The complexities of the underlying data stores and databases as well as the orchestration of the analytic workflow are abstracted from the user. Using it, an analyst or statistician does not need to know programming languages or the intricacies of the database technology to build analytic models and workflows.
It will be interesting to see what direction Alpine will take as the big data industry continues to evolve; currently there are many point tools, each strong in a specific area of the analytic process. For many of the analytic tools currently available in the market, co-opetition among vendors prevails in which partner ecosystems compete with stack-oriented approaches. The decisions vendors make in terms of partnering as well as research and development are often a function of these market dynamics, and buyers should be keenly aware of who aligns with whom. For example, Alpine currently partners with Qlik and Tableau for data visualization but also offers its own data visualization tool. Similarly, it offers data transformation capabilities, but its toolbox could be complemented by data preparation and master data solutions. This emerging area of self-service data preparation is important to line-of-business analysts, as my colleague Mark Smith recently discussed.
Alpine Labs is one of many companies that have been gaining traction in the booming analytics market. With a cadre of large clients and venture capital backing of US$23 million in series A and B, Alpine competes in an increasingly crowded and diverse big data analytics market. The management team includes industry veterans Joe Otto and Steve Hillion. Alpine seems to be particularly well suited for customers that have a clear understanding of the challenges of advanced analytics and are committed to using it with big data to gain a competitive advantage; achieving such an advantage is the top benefit, cited by over two-thirds (68%) of organizations, in our predictive analytics benchmark research. A key differentiator for Alpine Labs is its collaboration platform, which helps companies clear the communication hurdle discussed above and address the advanced analytics skills gap at the same time. The collaboration assets embedded in the application and the usability of the visual workflow process enable the product to meet a host of needs in predictive analytics. This platform approach to analytics is often missing in organizations grounded in individual processes and spreadsheet approaches. Companies seeking to use big data with advanced analytics tools should include Alpine Labs in their consideration.