While covering providers of business analytics software, it is also interesting for me to look at some that focus on the people, process and implementation aspects in big data and analytics. One such company is Nuevora, which uses a flexible platform to provide customized analytic solutions. I recently met the company’s founder, Phani Nagarjuna, when we appeared on a panel at the Predictive Analytics World conference in San Diego.
Nuevora focuses on big data and analytics from the perspective of the analytic life cycle; that is, it helps companies bring together data and process, visualize and model the data to reach specific business outcomes. Nuevora aims to package implementations of analytics for vertical industries by putting together data sources and analytical techniques, and designing the package to be consumed by a target user group. While the core of the analytic service may be the same within an industry category, each solution is customized to the particulars of the client and its view of the market. Using particular information sources and models depending on their industry, customers can take advantage of advances in big data and analytics including new data sources and technologies. For its part Nuevora does not have to reinvent the wheel for each engagement. It has established patterns of data processing and prebuilt predictive analytics apps that are based on best practices and designed to solve specific problems within industry segments.
The service is currently delivered via a managed service on Nuevora servers called the Big Data Analytics & Apps Platform (nBAAP), but the company’s roadmap calls for more of a software as a service (SaaS) delivery model. Currently nBAAP uses Hadoop for data processing, R for predictive analytics and Tableau for visualizations. This approach brings together best-of-breed point solutions to address specific business issues. As a managed service, it has flexibility in design, and the company can reuse existing SAS and SPSS code for predictive models and can integrate with different BI tools depending on the customer’s environment.
Complementing the nBAAP approach is the Big Data & Analytics Maturity (nBAM) Assessment Framework. This is an industry-based consulting framework that guides companies through their analytic planning process by looking at organizational goals and objectives, establishing a baseline of the current environment, and putting forward a plan that aligns with the analytical frameworks and industry-centric approaches in nBAAP.
From an operating perspective, Nagarjuna, a native of India, taps analytics talent from universities there and places strategic solution consultants in client-facing roles in the United States. The company focuses primarily on big data analytics in marketing, which makes sense since, according to our benchmark research on predictive analytics, revenue-generating functions such as forecasting (cited by 72% of organizations) and marketing (67%) are the two primary use cases for predictive analytics. Nuevora has mapped multiple business processes, such as gaining a 360-degree view of the customer. For example, at a high level, it divides marketing into areas such as retention, cross-sell and up-sell, profitability and customer lifetime value. These provide building blocks for the overall strategy of the organization, and each can be broken down into finer divisions, linkages and algorithms based on the industry. These building blocks also serve as the foundation for the deployment patterns of raw data and preselected data variables, metrics, models, visuals, model update guidelines and expected outcomes.
By providing preprocessing capabilities that automatically produce the analytic data set, then providing updated and optimized models, and finally enabling consumption of these models through the relevant user paradigm, Nuevora addresses some of the key challenges in analytics today. The first is data preparation, which our research shows takes from 40 to 60 percent of analysts’ time. The second is addressing outdated models. Our research on predictive analytics shows that companies that update their models often are much more satisfied with them than are those that do not. While the appropriate timing of model updates is relative to the business context and market changes, our research shows that about one month is optimal.
Midsize or larger companies looking to take advantage of big data and analytics matched with specific business outcomes, without having to hire data scientists and build a full solution internally, should consider Nuevora.
VP and Research Director
Datameer, a Hadoop-based analytics company, had a major presence at the recent Hadoop Summit, led by CEO Stephan Groschupf’s keynote and panel appearance. Besides announcing its latest product release, which is an important advance for the company and its users, Datameer’s outspoken CEO put forth contrarian arguments about the current direction of some of the distributions in the Hadoop ecosystem.
The challenge for the growing ecosystem surrounding Hadoop, the open source processing paradigm, has been in accessing data and building analytics that serve business uses in a straightforward manner. Our benchmark research into big data shows that the two most pressing challenges to big data analytics are staffing (79%) and training (77%). This so-called skills gap is at the heart of the Hadoop debate since it often takes someone with not just domain skills but also programming and statistical skills to derive value from data in a Hadoop cluster. Datameer is dedicated to addressing this challenge by integrating its software directly with the various Hadoop distributions to provide analytics and access tools, which include visualization and a spreadsheet interface. My coverage of Datameer from last year covers this approach in more detail.
At the conference, Datameer announced version 3.0 of its namesake product with a celebrity twist. Olympic athlete Sky Christopherson presented a keynote telling how the U.S. women’s cycling team, a heavy underdog, used Datameer to help it earn a silver medal in London. Following that introduction, Groschupf, one of the original contributors to Nutch (Hadoop’s predecessor), discussed features of Datameer 3.0 and what the company calls “Smart” analytics, which include a variety of advanced analytic techniques such as clustering, decision trees, recommendations and column dependencies.
Our benchmark research into predictive analytics shows that classification trees (used by 69% of participants), association rules (49%) and k-nearest neighbor (36%) are the techniques used most often; all are included in the Datameer product. Both on stage and in a private briefing, company spokespeople downplayed the specific techniques in favor of the usability aspects and examples of business use for each of them. Clustering of Hadoop data allows marketing and business analytics professionals to see how data groups together naturally, while decision trees help analysts see how sets group and break apart subset by subset, rather than in a framed Venn diagram view. In this regard clustering is more of a bottom-up approach and decision trees more of a top-down approach. For instance, in a cluster analysis, the analyst combines multiple attributes at one time to understand the dimensions upon which the data group. This can inform broad decisions about strategic messaging and product development. In contrast, with a decision tree, one can look, for instance, at all sales data to see which industries are most likely to buy a product, then follow the tree to see which sizes of companies within those industries are the best prospects, and then which subsets of buyers within those companies are the best targets.
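The top-down drill-down described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Datameer’s implementation; the sales records and field names are hypothetical.

```python
# Hypothetical sales records; the schema is illustrative only.
sales = [
    {"industry": "retail",  "size": "large", "bought": True},
    {"industry": "retail",  "size": "large", "bought": True},
    {"industry": "retail",  "size": "small", "bought": False},
    {"industry": "finance", "size": "large", "bought": False},
    {"industry": "finance", "size": "small", "bought": True},
    {"industry": "finance", "size": "small", "bought": False},
]

def rate(records):
    """Fraction of records that converted to a sale."""
    return sum(r["bought"] for r in records) / len(records)

def split(records, field):
    """Group records by one attribute -- one level of the tree."""
    groups = {}
    for r in records:
        groups.setdefault(r[field], []).append(r)
    return groups

# Level 1: which industry is most likely to buy?
by_industry = split(sales, "industry")
best_industry = max(by_industry, key=lambda k: rate(by_industry[k]))

# Level 2: within that industry, which company size converts best?
by_size = split(by_industry[best_industry], "size")
best_size = max(by_size, key=lambda k: rate(by_size[k]))
```

Each `split` call narrows the population, mirroring how an analyst follows a branch of the tree from industry down to company size.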
Datameer’s column dependencies can show analysts relationships between different column variables. The output looks much like a correlation matrix but is computed with a technique called mutual information. The key benefit of this technique over a traditional correlation approach is that it can compare different types of variables, such as continuous and categorical ones. However, there is a trade-off in usability: The numeric output is not the correlation coefficient with which many analysts are familiar. (I encourage Datameer to give analysts a quick reference of some type to help interpret the numbers associated with this less-known output.) Once the output is understood, it can be useful in exploring specific relationships and testing hypotheses. For instance, a company can test the hypothesis that it is more vertically focused than its competitors by looking at industry and deal close rates. If there is no relationship between the variables, the hypothesis may be dismissed and a more horizontal strategy pursued.
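A back-of-the-envelope version of such a dependence test can be computed from co-occurrence counts. The sketch below is a generic empirical mutual-information calculation, not Datameer’s implementation, applied to the hypothetical industry-versus-close-rate question; the deal data is made up.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete variables.
    Unlike a correlation coefficient, it works for categorical data and
    is 0 when the variables are independent."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical deals: industry vs. whether the deal closed.
dependent = mutual_information(
    ["retail", "retail", "finance", "finance"], ["won", "won", "lost", "lost"])
independent = mutual_information(
    ["retail", "finance", "retail", "finance"], ["won", "won", "lost", "lost"])
```

Here `dependent` comes out at 1.0 bit (industry fully determines the outcome) while `independent` is 0.0, which is how an analyst would read such a matrix: values near zero suggest no relationship, supporting the more horizontal strategy.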
The other technique Datameer spoke of is recommendation, also known as next-best-offer analysis; it is a well-known technique popularized by Amazon and other retailers. Recommendation engines can help marketing and sales teams increase share of wallet through cross-sell and up-sell opportunities. While none of these four techniques is new to the world of analytics, the novelty is that Datameer runs the analysis directly on Hadoop, where it can incorporate new forms of data such as Web behavior and social media data. While many in the Hadoop ecosystem focus on descriptive, SQL-related analysis, Datameer’s foray into more advanced analytics pushes the Hadoop envelope.
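At its simplest, a recommendation of this kind can be driven by item co-occurrence across past purchases. The following is a toy sketch under that assumption, with made-up basket data; production engines (Amazon’s included) use far more sophisticated methods.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(baskets):
    """Count how often each pair of items is bought together."""
    counts = defaultdict(int)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def next_best_offer(item, counts, owned):
    """Recommend the item most often bought alongside `item`
    that the customer does not already own."""
    candidates = {b: c for (a, b), c in counts.items()
                  if a == item and b not in owned}
    return max(candidates, key=candidates.get) if candidates else None

# Hypothetical purchase history.
baskets = [["laptop", "mouse"], ["laptop", "mouse", "dock"], ["mouse", "dock"]]
counts = cooccurrence(baskets)
offer = next_best_offer("laptop", counts, owned={"laptop"})
```

With this history, a laptop buyer is offered a mouse, the item most frequently purchased alongside laptops; the same counts answer the cross-sell question for any other item.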
Aside from the launch of Datameer 3.0, Groschupf and his team used Hadoop Summit to espouse the position that the SQL approach of many Hadoop vendors is a mistake. The crux of the argument is that Hadoop is a sequential access technology (much like a magnetic cassette tape) in which a large portion of the data must be read before the correct data can be pulled off the disk. Groschupf argues that this is fundamentally inefficient and that current MPP SQL approaches do a much better job of processing SQL-related tasks. To illustrate the difference he characterized Hadoop as a freight train and an analytic appliance database as a Ferrari; each, of course, has its proper uses. Customers thus should decide what they want to do with the data from a business perspective and then choose the appropriate technology.
This leads to another point Groschupf made to me: that the big data discussion is shifting away from technical details toward a business orientation. In support of this point, he showed me a comparison of the Google search terms “big data” and “Hadoop.” The latter was more common in the past few years, when it was almost synonymous with big data, but now generic searches for big data are more common. Our benchmark research into business technology innovation shows a similar shift in buying criteria, with about two-thirds (64%) of buyers naming usability as the most important priority. A number of Ventana Research blogs, including this one, have focused on the trend of outcome-based buying and decision-making.
For organizations curious about big data and what they can do to take advantage of it, Datameer can be a low-risk place to start exploring. The company offers a free download version of its product so you can start looking at data immediately. The idea of time-to-value is critical with big data, and this is a key value proposition for Datameer. I encourage users to test the product with an eye to uncovering interesting data that was never available for analysis before. This will help build the big data business use case, especially in a bootstrap funding environment where money, skills and time are short.
VP and Research Director