MapReduce and the Database: Analytics in Hyperdrive

In what could best be termed a photo finish, Greenplum and Aster Data Systems have both announced that they have integrated MapReduce into their massively parallel processing (MPP) database engines.

MapReduce, pioneered by Google for analyzing the Web, now becomes available to enterprises and service providers, giving them access to, and visibility into, more data from more sources. Originally created to analyze massive amounts of unstructured data, the approach has been updated to analyze structured data as well.

Greenplum, in San Mateo, Calif., says that MapReduce will be part of its Greenplum Database beginning in September. Aster Data, of Redwood Shores, Calif., says that MapReduce will be included in its Aster nCluster.

Fruitful Marriage

Curt Monash, president of Monash Research, editor of DBMS2, and a leading authority on MapReduce, sees this as a major leap forward. He reports that both companies had completed adding MapReduce to their existing products and had been racing to the finish line to get their news out first. As it turned out, both made their announcements within hours of each other.

On his blog, Monash lists some points about what this new technology marriage means.

  • Google’s internal use of MapReduce is impressive. So is Hadoop’s success. Now commercial implementations of MapReduce are getting their shots too.
  • The hardest part of data analysis is often the recognition of entities or semantic equivalences. The rest is arithmetic, Boolean logic, sorting, and so forth. MapReduce is already proven in use cases encompassing all of those areas.
  • MapReduce isn’t needed for tabular data management. That’s been efficiently parallelized in other ways. But, if you want to build non-tabular structures such as text indexes or graphs, MapReduce turns out to be a big help.
  • In principle, any alphanumeric data at all can be stuffed into tables. But in high-dimensional scenarios, those tables are super-sparse. That’s when MapReduce can offer big advantages by bypassing relational databases. Examples of such scenarios are found in CRM and relationship analytics.
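To make the text-index point concrete, here is a minimal, single-process sketch of the MapReduce pattern building an inverted index (word to document IDs). It assumes nothing about Greenplum's or Aster's actual interfaces: the function names (map_phase, shuffle, reduce_phase, build_index) and the two sample documents are hypothetical, and a real engine would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word in a document."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, doc_ids):
    """Reduce step: collapse one word's doc ids into a sorted, unique list."""
    return word, sorted(set(doc_ids))

def build_index(docs):
    """Run map, shuffle and reduce in sequence over a dict of documents."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(w, ids) for w, ids in shuffle(pairs).items())

# Hypothetical sample input, purely for illustration.
docs = {1: "data is the new oil", 2: "long live the data"}
index = build_index(docs)
```

The result is a non-tabular structure: each word maps to a variable-length list of documents, which is awkward to express as fixed relational columns but falls out naturally from the map and reduce steps.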

Enterprise’s Crystal Ball

Greenplum customers have been involved in an early-access program using Greenplum MapReduce for advanced analytics. For example, LinkedIn is using Greenplum Database for new, innovative social networking features such as “People You May Know” and sees it as a way to develop compelling analytics products faster. A primary benefit of the new capability is that customers can combine SQL queries and MapReduce programs into unified tasks that are executed in parallel across hundreds or thousands of cores.
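As a rough illustration of combining a SQL query with a MapReduce step, the sketch below uses Python's built-in sqlite3 module to select rows and then feeds them through map and reduce functions that count how many members two people have in common. This is not Greenplum's API, nor LinkedIn's method: the table name, column names, sample rows and function names are all assumptions made for the example, and everything runs in one process rather than in parallel.

```python
import sqlite3
from collections import defaultdict
from itertools import combinations

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (member TEXT, contact TEXT)")
conn.executemany(
    "INSERT INTO connections VALUES (?, ?)",
    [("ann", "bob"), ("ann", "cat"),
     ("dan", "bob"), ("dan", "cat"), ("dan", "eve")],
)

# SQL step: gather each member's contacts into one row.
rows = conn.execute(
    "SELECT member, group_concat(contact) FROM connections GROUP BY member"
).fetchall()

def map_phase(member, contact_csv):
    """Emit ((a, b), 1) for every pair of contacts sharing this member."""
    contacts = sorted(contact_csv.split(","))
    for pair in combinations(contacts, 2):
        yield pair, 1

def reduce_phase(pairs):
    """Sum the counts for each pair of contacts."""
    totals = defaultdict(int)
    for pair, count in pairs:
        totals[pair] += count
    return dict(totals)

shared = reduce_phase(
    pair for member, csv in rows for pair in map_phase(member, csv)
)
```

Here the relational engine handles the grouping it is good at, while the map and reduce functions handle the pairwise expansion that is clumsy to write in plain SQL.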

Part of the appeal of business intelligence and its huge ramp-up over the past five years is that IT assets play an ever-larger role in providing unprecedented strategic guidance and insights to leaders of enterprises, governments, telcos and cloud providers. IT has gone from automating business functions to serving as an essential crystal ball of the highest order. By gaining access to larger data sets that, more than ever before, can be mined and analyzed for higher levels of process and business refinement, IT has become a member of the board.

With better data reach and inclusion come better results. So BI allows leaders to identify early the trends that will determine their future success or failure. In a fast-paced, global, hyper-competitive business landscape, these insights are the currency of future success. The better you do BI, the better you do business: now, in the near term and in the long term. There’s no better way to know your customers, competitors, employees and the variables that buffet and stir markets than effective BI.

Insatiable Appetite for Data

Now, by expanding the role and reach of MapReduce technologies and methods, a powerful new tool is added to the BI arsenal. More data, more data types, more data sources — all rolled into an analytical framework that can be directly targeted by developers, scripters, business analysts, executives and investors.

These new MapReduce announcements mark a significant advance that raises IT another notch in its utility and indispensability to business. And they come at a time when more data, metadata, complex events, transactions and Internet-scale inferences demand tools that can do for enterprise BI what Google has done for Web search and indexing.

Comprehensive, deep analytics on massive data sets offers a new mantra: The database is dead, long live the data. Structured data and the containers that hold it are simply not enough to organize and access the intelligence lurking on modern networks, at Internet scale and in Internet time.

Dana Gardner is president and principal analyst at Interarbor Solutions, which tracks trends, delivers forecasts and interprets the competitive landscape of enterprise applications and software infrastructure markets for clients. He also produces BriefingsDirect sponsored podcasts. Disclosure: Greenplum is a sponsor of BriefingsDirect podcasts.
