Ventana Research Analyst Perspectives

Big Data Research Agenda and Trends are Bolder in 2015

Posted by Mark Smith on Feb 7, 2015 9:36:27 PM

Big data has become a big deal as the technology industry has invested tens of billions of dollars to create the next generation of databases and data processing. After the accompanying flood of new categories and marketing terminology from vendors, most in the IT community are now beginning to understand the potential of big data. Ventana Research thoroughly covered the evolving state of the big data and information optimization sector in 2014 and will continue this research in 2015 and beyond. As the sector progresses, it becomes critical to make big data systems interoperate with existing enterprise and information architectures and with digital transformation strategies. Done properly, companies can take advantage of big data innovations to optimize their established business processes and execute new business strategies. But deploying big data and applying analytics to understand it is only the beginning. Innovative organizations must go beyond the usual exploratory and root-cause analyses through applied analytic discovery and other techniques. This of course requires them to develop competencies in information management for big data.

Read More

Topics: Big Data, MapR, Predictive Analytics, Sales Performance, SAP, Supply Chain Performance, Human Capital, Marketing, Mulesoft, Paxata, SnapLogic, Splunk, Customer Performance, Operational Performance, Business Analytics, Business Intelligence, Business Performance, Cloud Computing, Cloudera, Financial Performance, Hortonworks, IBM, Informatica, Information Management, Operational Intelligence, Oracle, Datawatch, Dell Boomi, Information Optimization, Savi, Sumo Logic, Tamr, Trifacta, Strata+Hadoop

SAS Innovates the Potential of Business Analytics

Posted by Ventana Research on Apr 3, 2014 12:57:28 AM

SAS Institute, a long-established provider of analytics software, showed off its latest technology innovations and product road maps at its recent analyst conference. In a very competitive market, SAS is not standing still, and executives showed progress on the goals introduced at last year’s conference, which I covered. SAS’s Visual Analytics software, integrated with an in-memory analytics engine called LASR, remains the company’s flagship product in its modernized portfolio. CEO Jim Goodnight demonstrated Visual Analytics’ sophisticated integration with statistical capabilities, which the company sees as a differentiator going forward. The product already provides automated charting capabilities, forecasting and scenario analysis, and SAS probably has been doing user-experience testing, since the visual interactivity is better than what I saw last year. SAS has put Visual Analytics on a six-month release cadence, which is a fast pace but necessary to keep up with the industry.

Read More

Topics: Predictive Analytics, IT Performance, LASR, Operational Performance, Analytics, Business Analytics, Business Intelligence, Business Performance, Cloudera, Customer & Contact Center, Hortonworks, IBM, Information Applications, SAS institute, Strata+Hadoop

Cloudera Makes Hadoop a Big Player in Big Data

Posted by Mark Smith on Mar 24, 2014 9:16:06 PM

I had the pleasure of attending Cloudera’s recent analyst summit. Presenters reviewed the work the company has done since its founding six years ago and outlined its plans to use Hadoop to further empower big data technology to support what I call information optimization. Cloudera’s executive team includes co-founders who developed and used Hadoop while working at Facebook, Oracle and Yahoo. Last year they brought in CEO Tom Reilly, who led successful organizations at ArcSight, HP and IBM. Cloudera now has more than 500 employees, 800 partners and 40,000 users trained in its commercial version of Hadoop. The Hadoop technology has brought to the market an integration of computing, memory and disk storage; Cloudera has expanded the capabilities of this open source software for its customers through unique extension and commercialization of open source for enterprise use. The importance of big data is undisputed now: for example, our latest research in big data analytics finds it to be very important in 47 percent of organizations. However, we also find that only 14 percent are very satisfied with their use of big data, so there is plenty of room for improvement. How well Cloudera moves forward this year and next will determine its ability to compete in big data over the next five years.

Read More

Topics: Big Data, Teradata, Zoomdata, IT Performance, Business Intelligence, Cloudera, Hortonworks, IBM, Information Applications, Information Management, Location Intelligence, Operational Intelligence, Oracle, Hive, Impala, Strata+Hadoop

Tidemark Leads New Wave of Innovation in Planning and Performance Excellence

Posted by Mark Smith on Mar 29, 2013 11:36:46 AM

Organizations succeed through continuous planning to achieve high levels of performance. For most organizations planning is not an easy process to conduct. Planning software is typically designed for only a few people in the process, such as analysts, or organizations might use spreadsheets, which are not designed for business planning across an organization. Most technologies only allow you to examine the past and not plan for the future. For decades organizations have tried to focus planning on driving better results through higher participation, but they have usually failed, as technology has not advanced enough to support this business need.

Read More

Topics: Big Data, Sales Performance, Supply Chain Performance, Mobile Technology, Operations, Operational Performance, Business Analytics, Business Collaboration, Business Intelligence, Business Performance, Cloud Computing, Cloudera, Customer & Contact Center, Financial Performance, Governance, Risk & Compliance (GRC), Information Applications, Workforce Performance, Business Planning, CFO, finance, Tidemark, Workday

EMC Looks to Be Pivotal for Big Data

Posted by Mark Smith on Mar 6, 2013 6:42:03 AM

The big-data landscape just got a little more interesting with the release of EMC’s Pivotal HD distribution of Hadoop. Pivotal HD takes Apache Hadoop and extends it with a data loader and command center capabilities to configure, deploy, monitor and manage Hadoop. Pivotal HD, from EMC’s Pivotal Labs division, integrates with Greenplum Database, a massively parallel processing (MPP) database from EMC’s Greenplum division, and uses HDFS as the storage technology. The combination should help organizations realize a key part of big data’s value: information optimization.

Read More

Topics: EMC, MapR, HAWQ, HDFS, Pivotal HD, Business Analytics, Business Intelligence, Cloud Computing, Cloudera, Hortonworks, Information Applications, Information Management, Location Intelligence, Cirro, Hive, Tableau Software, Strata+Hadoop

SnapLogic is Making Big Data Integration as a Service a Hadoop Reality

Posted by Mark Smith on Mar 1, 2013 10:10:45 AM

SnapLogic, a provider of data integration in the cloud, this week announced Big Data-as-a-Service to address businesses’ needs to integrate and process data across Hadoop big data environments. As our research agenda for 2013 outlines, dealing with data in the cloud is very important to organizations. At the same time, businesses need to be able to integrate their big data with all their technology assets, as I pointed out recently.

Read More

Topics: Big Data, R, Sales Performance, Salesforce.com, SnapLogic, Operational Performance, Business Analytics, Cloud Computing, Cloudera, Customer & Contact Center, Data Integration, Information Applications, Information Management

Big Data Search is Getting Better with LucidWorks

Posted by Mark Smith on Feb 20, 2013 11:52:17 PM

LucidWorks addresses the growing volume of information now being stored in the enterprise and in big data environments with two search-technology products aimed at the enterprise. Though you may not be familiar with LucidWorks (previously known as Lucid Imagination), the company has for many years contributed to Apache Lucene, an open source search project, and has commercialized and supported it for business use.
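
To suggest what Lucene-style search does at its core, here is a toy inverted-index sketch in Python. It is a conceptual illustration only and does not use the Lucene or LucidWorks APIs; the documents and query are invented for the example.

```python
# A toy inverted index, the core data structure behind Lucene-style search:
# map each term to the set of documents that contain it, then answer
# queries by intersecting those sets. Conceptual only; real engines add
# analyzers, relevance ranking and on-disk index segments.
from collections import defaultdict


def build_index(docs):
    """docs maps doc_id -> text; returns a term -> set-of-doc_ids index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the doc_ids containing every term in the query (AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    docs = {1: "big data search in the enterprise",
            2: "enterprise search with Lucene",
            3: "big data analytics"}
    idx = build_index(docs)
    print(search(idx, "big data"))  # expected: {1, 3}
```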

Read More

Topics: Big Data, MapR, Sales Performance, IT Performance, Operational Performance, Business Analytics, Business Intelligence, Business Performance, Cloud Computing, Cloudera, Customer & Contact Center, Hortonworks, Information Applications, Information Management, Operational Intelligence, Search, Strata+Hadoop

Actuate Rides the Big-Data Wave

Posted by Ventana Research on Jul 19, 2012 1:58:26 PM

Actuate, the driving force behind the open source Eclipse Business Intelligence and Reporting Tools (BIRT) project, is positioning itself in the center of the big-data world through multiple partnerships with companies such as Cloudera, Hortonworks, KXEN, Pervasive and a number of OEMs. These agreements, following on its acquisition of Xenos a couple of years ago, help Actuate address some big issues in big data, involving enterprise integration and closed-loop operational systems that provide what my colleague Robert Kugel refers to as action-oriented information technology systems. Today, most initiatives in big data and Hadoop are still in the proof-of-concept stages or being implemented in organizational siloes. Actuate, with its enterprise orientation and federated architecture, is in a position to potentially advance these efforts in a variety of ways.

Read More

Topics: Big Data, Pervasive, Eclipse, IT Performance, Operational Performance, Analytics, Business Analytics, Business Collaboration, Business Intelligence, Business Performance, Cloud Computing, Cloudera, Customer & Contact Center, Information Applications, Information Management, Operational Intelligence, Strata+Hadoop

Datameer Provides Business Visualization and Discovery for Hadoop

Posted by Ventana Research on Jul 17, 2012 2:15:20 PM

As volumes of data grow in organizations, so do the number of deployments of Hadoop, and as Hadoop becomes widespread, more organizations demand data analysis, ease of use and visualization of large data sets. In our benchmark research on Hadoop, 88 percent of organizations said analyzing Hadoop data is important, and in our research on business analytics 89 percent said it is important to make it simpler to provide analytics and metrics to all users who need them. As my colleague Mark Smith has noted, Datameer has an ambitious plan to tackle these issues. It aims to provide a single solution in lieu of the common three-step process involving data integration, data warehouse and BI, giving analysts the ability to apply analytics and visualization to find the dynamic “why” behind data rather than just the static “what.”

Read More

Topics: Big Data, Datameer, MapR, Operational Performance, Business Analytics, Business Intelligence, Business Performance, Cloudera, Customer & Contact Center, Hortonworks, IBM, Information Applications, Operational Intelligence, Visualization, Data Discovery, Strata+Hadoop

Enterprise Revolution of Predictive Analytics with Version 6

Posted by Mark Smith on Jun 15, 2012 11:40:38 AM

In our benchmark research in predictive analytics we’ve uncovered some intriguing tools for taking advantage of big data in the enterprise. Revolution Analytics, which we analyzed earlier this year, this month introduced its 6.0 release. Revolution extends the open source statistical programming language R with capabilities you would expect of enterprise software. The company has grown substantially over the last several years and has an impressive list of more than a hundred customers in both the private and public sectors. Revolution partners with database and data integration providers such as Talend and Informatica and with business intelligence providers that want to connect to a more advanced level of analytics. Revolution can operate across a range of big data architectures, including Hadoop, working with Cloudera and IBM as well as data warehouse appliances such as IBM Netezza and Teradata. This is a smart move, since predictive analytics is the second most important capability cited as unavailable in big data deployments in our benchmark research.

Read More

Topics: Big Data, Linux, Predictive, Revolution, Operational Performance, Business Analytics, Business Intelligence, Business Performance, Cloud Computing, Cloudera, Data Mining, Strata+Hadoop

IBM Makes Big Data Deal for Vivisimo and Supports Cloudera Hadoop

Posted by Mark Smith on Apr 26, 2012 12:29:08 PM

Through a series of acquisitions and organic development over the last five years, IBM has established itself as a leader in enterprise big data for business analytics. I recently wrote about IBM Smarter Analytics, which brings together the company’s portfolio of software, systems and services from analytics to big data. But supporting big data requires the ability to access many sources of information; our benchmark research on big data found that more than half of organizations require information from external sources, and that requires some software flexibility.

Read More

Topics: Big Data, Sales Performance, Social Media, Supply Chain Performance, Sustainability, Vivisimo, IT Performance, Business Analytics, Business Collaboration, Business Intelligence, Business Mobility, Business Performance, Business Technology, CIO, Cloud Computing, Cloudera, Customer & Contact Center, Financial Performance, Governance, Risk & Compliance (GRC), IBM, Information Applications, Information Management, Information Technology, Location Intelligence, Operational Intelligence, Workforce Performance, Strata+Hadoop

The World of Big Data Gets Even Bigger at Hadoop World

Posted by Ventana Research on Nov 16, 2011 8:09:50 AM

Cloudera’s recent Hadoop World 2011 event confirmed that the world of big data is getting even bigger. As I wrote of last year’s event, Hadoop, the open source large-scale data processing technology, has gone mainstream. And while 75% of the audience attended this year for the first time and so may not have realized the breadth of Hadoop’s acceptance, statistics announced in the opening keynote show widespread use of it. Mike Olson, Cloudera CEO, reported that the event was sold out, with 1,400 attendees from 580 organizations and 27 countries. In independent confirmation, our benchmark research shows that 54% of organizations are either using or evaluating Hadoop for their big-data needs.

Read More

Topics: Big Data, Datameer, MapR, Sales Performance, Social Media, Supply Chain Performance, Operational Performance, Business Analytics, Business Intelligence, Business Performance, Cloudera, Customer & Contact Center, Financial Performance, Hortonworks, Informatica HParser, Karmasphere, NetApp, Workforce Performance, Strata+Hadoop

Cloudera Supports Hadoop with New Distribution and Enterprise Version

Posted by Ventana Research on Jun 19, 2011 7:28:19 AM

Cloudera is riding the wave of big data. I first learned about the company while working at Vertica, one of Cloudera’s partners. Customers that managed large amounts of structured relational data also needed to process large amounts of semistructured data such as the type found in web logs and application logs. The emerging channel of social media provided another source of data lacking the structure that would lend itself to analysis in a relational database. Other organizations needed to perform calculations and analyses that were difficult to express in SQL. Seeing this market, Cloudera recognized earlier than others an opportunity to leverage the Apache Hadoop project; it has been offering the Cloudera Distribution for Hadoop (CDH) since early 2009.

Read More

Topics: Big Data, Predictive Analytics, Sales Performance, Social Media, Supply Chain Performance, Operational Performance, Business Analytics, Business Intelligence, Business Performance, CDH3, Cloudera, Customer & Contact Center, Information Management, Strata+Hadoop

IBM Chooses Hadoop Unity; Not Shipping the Elephant

Posted by Ventana Research on May 23, 2011 11:06:33 PM

Last week I attended the IBM Big Data Symposium at the Watson Research Center in Yorktown Heights, N.Y. The event was held in the auditorium where the recent Jeopardy shows featuring the computer called Watson took place and which still features the set used for the show – a fitting environment for IBM to put on another sort of “show” involving fast processing of lots of data. The same technology featured prominently in IBM’s big-data message, and the event was an orchestrated presentation more like a TV show than a news conference. Although it announced very little news at the event, IBM did make one very important statement: The company will not produce its own distribution of Hadoop, the open source distributed computing technology that enables organizations to process very large amounts of data quickly. Instead it will rely on and throw its weight behind the Apache Hadoop project – a stark contrast to EMC’s decision to do exactly that, announced earlier in the week. As an indication of IBM’s approach, Anant Jhingran, vice president and CTO for information management, commented, “We have got to avoid forking. It’s a death knell for emerging capabilities.”

The event brought together organizations presenting interesting and diverse use cases, ranging from traditional big-data stories at Web businesses such as Yahoo to less well-known scenarios: informatics in life sciences and healthcare from Illumina and the University of Ontario Institute of Technology (UOIT), respectively, low-latency financial services from eZly, and customer demographic data from Acxiom.

Eric Baldeschwieler, vice president of Hadoop development at Yahoo, shared some impressive statistics about Yahoo’s Hadoop usage, one of the largest deployments in the world with more than 40,000 servers. Yahoo manages 170 petabytes of data with Hadoop and runs more than 5 million Hadoop jobs every month. The models it uses to help prevent spam and others that do ad targeting are in some cases retrained every five minutes to ensure they are based on up-to-date content. As a point of reference, CPU utilization on Yahoo’s Hadoop computing resources averages greater than 30% and at its best exceeds 80%. It appears from these figures that the Hadoop clusters are configured with enough spare capacity to handle spikes in demand.

During the discussions, I detected a bit of a debate about who is the driving force behind Hadoop. According to Baldeschwieler, Yahoo has contributed 70% of the Apache Hadoop project code, but on April 12, Cloudera claimed in a press release, “Cloudera leads or is among the top three code contributors on the most important Apache Hadoop and Hadoop-related projects in the world, including Hadoop, HDFS, MapReduce, HBase, Zookeeper, Oozie, Hive, Sqoop, Flume, and Hue, among others.” Perhaps Yahoo wants to reestablish its credentials as it mulls whether to spin out its Hadoop software unit. If such a spinoff were to occur, it could further fracture the Hadoop market.

I found it interesting that the customers IBM brought to the event, while having interesting use cases, were not necessarily leveraging IBM products in their applications. This fact led me to the initial conclusion that the event was more of a show than a news conference. Reflecting further on IBM’s stated direction of supporting the Apache Hadoop distribution, I wondered which Hadoop-related IBM products they would use. IBM will be announcing version 1.1 of InfoSphere BigInsights in both a free basic edition and an enterprise edition. The product includes BigSheets, which can integrate large amounts of unstructured Web data. InfoSphere Streams 2.0, announced in April, adds Netezza TwinFin, Microsoft SQL Server and MySQL support to other SQL sources already supported. But this event was not about those products. It was about IBM’s presence in and knowledge of the big-data marketplace. Executives did say that the IBM product portfolio will be extended “in all the places you would expect” to support big data but offered few specifics.

IBM emphasized the combination of streaming data, via InfoSphere Streams, and big data more than other big-data vendors do. The company painted a context of “three V’s” (volume, velocity and variety) of data, which attendees, Twitter followers and eventually the IBM presenters expanded to include a fourth V, validity. To illustrate the potential value of combining streaming data and big data, Dr. Carolyn McGregor, chair in health informatics at UOIT, shared how the institute is literally saving lives in neonatal intensive care units by monitoring and analyzing neonatal data in real time.
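
To make the streaming-plus-big-data idea more concrete, here is a generic sketch of the real-time monitoring pattern: keep a short sliding window over a stream of readings and flag values that drift far from the recent average. This illustrates the concept only; it is not InfoSphere Streams code, and the window size, thresholds and synthetic data are assumptions invented for the example.

```python
# Generic streaming-monitor sketch: flag readings that fall more than
# `sigma` standard deviations from the mean of a recent sliding window.
from collections import deque
from statistics import mean, stdev


def monitor(readings, window_size=60, sigma=3.0):
    """Yield (value, is_alert) for each reading in the stream."""
    window = deque(maxlen=window_size)
    for value in readings:
        is_alert = False
        if len(window) >= 10:  # wait for enough history before alerting
            mu, sd = mean(window), stdev(window)
            is_alert = sd > 0 and abs(value - mu) > sigma * sd
        window.append(value)
        yield value, is_alert


if __name__ == "__main__":
    import random
    # A synthetic vital-sign stream with one injected anomaly.
    stream = (200.0 if i == 500 else random.gauss(120, 5) for i in range(1000))
    alerts = sum(1 for _, alert in monitor(stream) if alert)
    print(f"{alerts} anomalous readings flagged")
```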

Rob Thomas, IBM vice president of business development for information management, explained the role of partners in the IBM big data ecosystem. As stated above, IBM will rely on Apache Hadoop as the foundation of its work but will partner with vendors further up the stack. Datameer, Digital Reasoning and Karmasphere all participated in the event as examples of the types of partnerships IBM will seek.

IBM has already demonstrated, via Watson, that it knows how to deal with large-scale data and Hadoop, but to date, if you want those same capabilities from IBM, they will come mostly in the form of services. The event made it clear that IBM backs the Apache Hadoop effort but not in the form of new products. In effect, IBM used its bully pulpit (not to mention its size and presence in the market) to discourage others from fragmenting the market. The announcements may also have been intended to buy time for further product development. I look for more definition from IBM on its product roadmap. If it wants to remain competitive in the big-data market, IBM needs to articulate how its products will interact with and support Hadoop. My soon-to-be-released Hadoop and Information Management benchmark research will provide some facts on whether or not IBM is making the right bet on Hadoop.

Regards,

Ventana Research

Read More

Topics: Big Data, EMC, Business Intelligence, Cloudera, Greenplum, IBM, Information Applications, Information Management, InfoSphere, Strata+Hadoop

EMC Enters Elephant Race with Hadoop

Posted by Ventana Research on May 12, 2011 5:21:09 PM

Earlier this week EMC announced it will create its own distribution for Apache Hadoop.  Hadoop provides distributed computing capabilities that enable organizations to process very large amounts of data quickly. As I have written previously, the Hadoop market continues to grow and evolve. In fact, the rate of change may be accelerating. Let’s start with what EMC announced and then I’ll address what the announcement means for the market.

EMC announced three new offerings, slated for the third quarter of 2011, that leverage its acquisition of Greenplum last year, ranging from an open source version to incorporation in its data warehouse appliance.

The EMC Greenplum HD Community Edition is a free, open source version of the Apache Hadoop stack comprising HDFS, MapReduce, Zookeeper, Hive and HBase. EMC extends Hadoop with fault tolerance for the Name Node and Job Tracker, both of which are well-known points of failure in standard Hadoop implementations.

The EMC Greenplum HD Enterprise Edition, interface-compatible with the Apache Hadoop stack, provides several additional features including snapshots, wide-area replication, a Network File System (NFS) interface and some management tools. EMC also claims performance increases of two to five times over standard packaged versions of Apache Hadoop.

The EMC Greenplum HD Data Computing Appliance integrates Apache Hadoop with the Greenplum database and computing hardware. The appliance configuration provides SQL access and analytics to Hadoop data residing on the Hadoop Distributed File System (HDFS) as external tables, eliminating the need to materialize the data in the Greenplum database.
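
To illustrate the external-table pattern described above, here is a rough sketch that issues Greenplum-style SQL from Python via psycopg2; the connection settings, gphdfs location and column layout are assumptions for illustration rather than EMC’s documented configuration.

```python
# Sketch of querying Hadoop data as an external table from a Greenplum-style
# MPP database, without first loading it into the database.
import psycopg2

# Connection settings, the gphdfs URL and the column layout below are
# illustrative assumptions, not EMC's documented configuration.
conn = psycopg2.connect(host="gp-master.example.com", dbname="analytics",
                        user="analyst", password="secret")

with conn, conn.cursor() as cur:
    # Register files already sitting in HDFS as a readable external table.
    cur.execute("""
        CREATE EXTERNAL TABLE clickstream_ext (
            event_time  timestamp,
            user_id     bigint,
            url         text
        )
        LOCATION ('gphdfs://namenode.example.com:8020/data/clickstream/*')
        FORMAT 'TEXT' (DELIMITER ',')
    """)

    # Analyze the Hadoop-resident data with ordinary SQL.
    cur.execute("""
        SELECT url, count(*) AS hits
        FROM clickstream_ext
        GROUP BY url
        ORDER BY hits DESC
        LIMIT 10
    """)
    for url, hits in cur.fetchall():
        print(url, hits)

conn.close()
```

The point of the pattern is the LOCATION clause: the table definition points at files in HDFS, so queries run against the Hadoop-resident data without a separate load step.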

Until now Cloudera has dominated the emerging commercial Hadoop market and faced little or no competition since it introduced the Cloudera Distribution for Hadoop (CDH). The EMC announcements are both good and bad news for Cloudera. On the one hand they suggest – you might even say validate – that Cloudera has chosen a valuable market. EMC seems to be willing to invest heavily to try to get a share of it. On the other hand, Cloudera now faces a competitor that has significant resources. For customers competition is generally a good thing, of course, as it pushes vendors to innovate and improve their products to win more business.

EMC’s approach to the market differs dramatically from IBM’s strategy. IBM announced on Twitter at its Big Data Symposium held this week that it is putting all its weight behind Apache Hadoop in the hope of avoiding the fragmentation that plagued the UNIX market for years. EMC’s Enterprise Edition promises to tackle issues well known to the Hadoop market, but EMC faces competition from others who are also tackling these issues. If lower-cost or free competitive offerings adequately address these issues, they could seriously undercut the market for EMC’s Enterprise Edition. While EMC brings more enterprise credentials to the Hadoop market than Cloudera, it has less experience with Hadoop. Multiple vendors are attempting to bring enterprise-class capabilities to Hadoop, and it’s too soon to see who will succeed. However, overall, the Hadoop market will benefit from all the attention and investment.

I find it interesting and a little ironic that, prior to its acquisition by EMC, Greenplum (along with Aster Data, now part of Teradata) helped popularize MapReduce, one of Hadoop’s most commonly used components, by embedding MapReduce in their databases. These proprietary implementations could be credited with helping to bring Hadoop into the mainstream big-data market because they combined data warehousing with MapReduce. The combination spawned a debate in which database guru Mike Stonebraker at first dismissed MapReduce and then embraced it. The debate attracted attention, a key ingredient in building any new market. Now EMC Greenplum completes the circle by embracing Hadoop.

To its credit, EMC aligned a dozen partners around these announcements, creating an ecosystem of third-party products and services. Concurrent, CSC, Datameer, Informatica, Jaspersoft, Karmasphere, MicroStrategy, Pentaho, SAS, SnapLogic, Talend and VMware all announced their support for the EMC products in one form or another. Most of these companies also partner with Cloudera, so this is a good move but not a coup for EMC.

The Hadoop market continues to evolve. We are now analyzing the data collected in our benchmark research on the state of the large-scale data market, now called the big data market, including Hadoop. Stay tuned for the results. It will be interesting to see where the market ends up. I expect more changes and innovation, driven in part by the increased competition.

The Hadoop market is no longer a one-elephant race.

Regards,

David Menninger – VP & Research Director

Read More

Topics: Big Data, EMC, Social Media, Operational Performance, Business Analytics, Business Collaboration, Business Intelligence, Cloud Computing, Cloudera, Customer & Contact Center, Greenplum, Information Applications, Information Management, Strata+Hadoop

Hadoop Gets Easier with Cloudera Version 3

Posted by Mark Smith on Nov 27, 2010 4:51:58 PM

Managing large volumes of enterprise data continues to challenge IT organizations as they deal with administration and storage of no longer just terabytes but now petabytes of data, with costs increasing accordingly. This massive scale complicates the underlying issues of where and how to store the data easily on low-cost hardware and manage it efficiently. One attempt at a solution is Hadoop, an open source community-based project. It began as part of Yahoo and was led by Doug Cutting, who used the MapReduce concepts for large-scale distributed computing to create a distributed file system. Yahoo itself runs the largest deployment of Hadoop. Doug Cutting is not new to the open source world, having been involved in the creation of Lucene, the open source search technology, among many other open source community projects.
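
To make the MapReduce idea concrete, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are plain scripts that read standard input and write standard output; the local driver at the bottom simply simulates the sort that Hadoop performs between the map and reduce phases.

```python
# Minimal MapReduce word count in the Hadoop Streaming style.
# The mapper emits (word, 1) pairs; the framework sorts by key between
# phases; the reducer sums the counts for each word.
import sys
from itertools import groupby


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts per word; assumes pairs arrive sorted by word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Local simulation of the map -> shuffle/sort -> reduce pipeline.
    mapped = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```

On a real cluster the mapper and reducer would be packaged as separate scripts and submitted with the Hadoop Streaming jar; the logic itself stays the same while the framework handles distribution, sorting and fault tolerance.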

Read More

Topics: Cloudera, Information Management, Data, Strata+Hadoop
