Machine Learning, TDA and the Future of Invention
Jan 23, 2013 5:00 AM PT
Last week Ayasdi came out of stealth mode and told the world it had a new way to analyze big data, and I think the implications for CRM and social are very large indeed. The new way is called "topological data analysis" (TDA), and hearing about it feels like hearing about relativity (or Salesforce.com) for the first time and learning that space is curved.
Who would have thought it? Big data is not some amorphous mass but something with topology -- an entity with curves and folds and shapes.
Why is that important? Well, understanding the shape of data turns out to be, mathematically, a shortcut to understanding it or to extracting meaning from it. Shapes include clusters, and they can tell us where the interesting bits are.
Consider the implications. No longer does one have to be inspired to ask good questions of data so as to write queries that deliver information. With topological data analysis, you can first identify the interesting clusters of data and then ask, "What's so interesting about that?"
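For readers who want to see the "shape first, questions second" idea in miniature, here is a rough sketch in Python. It is not Ayasdi's method and it is far simpler than real TDA. It links points that sit within a chosen distance of one another, treats each linked group as a cluster, and then reports how each cluster's averages differ from the data set as a whole. The data, the function names and the `radius` value are all made up for illustration.

```python
from math import dist
from statistics import mean


def find_clusters(points, radius):
    """Group points into connected components: two points are linked
    when they lie within `radius` of each other."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        start = unvisited.pop()
        component, frontier = [start], [start]
        while frontier:
            current = frontier.pop()
            near = [i for i in unvisited
                    if dist(points[current], points[i]) <= radius]
            for i in near:
                unvisited.remove(i)
            component.extend(near)
            frontier.extend(near)
        clusters.append(sorted(component))
    return sorted(clusters)


def describe(points, cluster):
    """Per-column difference between a cluster's mean and the overall mean."""
    columns = range(len(points[0]))
    return [
        mean(points[i][c] for i in cluster) - mean(p[c] for p in points)
        for c in columns
    ]


# Made-up data: (monthly spend, support tickets) for eight hypothetical accounts.
data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (9.0, 1.0)]

clusters = find_clusters(data, radius=0.5)
for cluster in clusters:
    # Show which rows landed together and how they differ from the average.
    print(cluster, [round(d, 2) for d in describe(data, cluster)])
```

Nobody wrote a query here. The groups fall out of the distances alone, and only afterward does the `describe` step ask what sets each one apart. The choice of `radius` does the heavy lifting in this toy version, which is one reason it should be read as a cartoon of the idea rather than the real thing.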
I'll Ask the Questions, Dave
It's a big shift in perspective and maybe philosophy. Certainly, it takes the human race down a notch in its own esteem. Now we don't rack our brains to ask piercing questions of our data -- we have machines that do it better, so we have to stand back and watch.
This may seem odd, but what if there's a bombshell lurking in your data that you were never inspired to ask about? Would the data hold its secrets forever? Well, not anymore.
Right now, topological data analysis is a very geeky mathematical concept -- just a couple of years removed from Stanford and a DARPA lab -- but the potential it holds is big.
The Next New Age
I believe that the Information Age is winding down, just like the Age of Steam did and just as all "Ages" do. That's not to be feared -- it's something to be embraced. What will take the place of information as the major disruptor and economic driver? Whatever it is, it will have to stand on the shoulders of the Information Age and use the latest and greatest tools.
Part of that means topological data analysis, for the simple reason that our ability to exploit discoveries in pharmaceuticals and in oil and gas -- to take just two industries for the moment -- is maxing out.
It costs upwards of US$100 million to drill an oil well in the Gulf of Mexico; it takes a team of people a few billion dollars and a decade to bring a new drug to market. It hardly gets said, but these investments cost the same whether or not the oil well has oil at the bottom of it, and it's the same story if the pharmaceutical comes a cropper.
Those numbers are big -- so big that they represent ceilings to further discovery unless we find breakthroughs that will reduce the costs and the risks of getting it all wrong.
All Roads Lead to Discovery
Already we're seeing topological data analysis crack some amazingly hard nuts, not only in the aforementioned pharmaceuticals, oil and gas, but also in financial services and government. Anywhere there's big data there is an opportunity for topological analysis, and that means the mass of social data we generate too.
People at Ayasdi tell me that when they apply topological data analysis to 20-year-old data from pharmaceutical research, they find new and interesting information. So far, I don't think they've come up with any new drugs, but it's early days.
The market has other entrants too, and while Ayasdi might be taking the highest road to the biggest customers and perhaps the hardest problems, other companies using machine learning are implementing roughly the same idea.
'CustomerDNA' by Any Other Name
Consider Mintigo, for example. This company focuses on identifying sales prospects, which is not the same as generating leads, but it's a cool and important idea nonetheless and essential in many industries.
Mintigo analyzes existing customers to build a sophisticated data model of what a successful customer looks like for your organization. That is to say, Mintigo looks at the data surrounding those customers and identifies the clusters of relevant data that qualify them as a match for your company and its products.
From there, it's a simple matter of pointing the machine's model at the general marketplace to see what it drags in. They call it identifying your "CustomerDNA."
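The general pattern, profile the customers you have and then rank everyone else by resemblance, can be sketched in a few lines of Python. This is not Mintigo's model, which I haven't seen. It is a bare-bones stand-in that summarizes existing customers by the average and spread of each feature and ranks prospects by how far they sit from that profile. The company names, features and figures are invented.

```python
from math import sqrt
from statistics import mean, pstdev


def build_profile(customers):
    """Summarize existing customers as a (mean, spread) pair per feature."""
    columns = list(zip(*customers))
    # Fall back to a spread of 1.0 when a feature never varies.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def distance_from_profile(profile, prospect):
    """Distance from the profile in units of spread; smaller is a closer match."""
    return sqrt(sum(((x - m) / s) ** 2 for x, (m, s) in zip(prospect, profile)))


# Made-up features: (employees, annual revenue in $M, web visits per month).
existing_customers = [(120, 15.0, 40), (150, 18.0, 55),
                      (100, 12.0, 35), (130, 16.0, 50)]
prospects = {
    "Acme": (125, 15.5, 45),
    "Globex": (900, 200.0, 5),
    "Initech": (140, 17.0, 60),
}

profile = build_profile(existing_customers)
# Closest matches to the customer profile come first.
ranked = sorted(
    prospects,
    key=lambda name: distance_from_profile(profile, prospects[name]),
)
print(ranked)
```

A real system would presumably draw on far more signals and a far richer model, but the shape of the workflow is the same: learn from the customers you already have, then let the model sort the marketplace.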
Call it "CustomerDNA" or "TDA" or, more broadly, "machine learning." Whatever you call it, we're on the cusp of another revolution that simplifies a major headache and reduces the cost of important business processes to manageable levels again. With these as catalysts, can new discoveries and economic growth be far behind?