I posted this piece on the Dachis Group blog this morning.
With the news that Jive is buying Proximal Labs, people are starting to take notice of what some term “Big Data.” The opportunities in Big Data are only now emerging, even though some companies on the frontier have been around for several years. My definition of a Big Data company is one that collects and analyzes large volumes of information.
The collection of large amounts of data is a fairly new practice, and as you would expect, the tools to manage the data are pretty new too. With Twitter at 50 million Tweets per day, Facebook at 60 million daily status updates, plus check-in data from Gowalla and Foursquare, the volume becomes staggering. Between just those websites we’re talking about more than 1,000 pieces of information per second. Traditional databases have a very hard time managing that much data, so a new breed of unstructured data stores (such as Cassandra and Hadoop) has been created to solve this problem. All this collection talk assumes you can even get an API feed from Twitter or Facebook. People who actually have Facebook API access talk about it in whispered tones, like the location of a vein of gold in the hills.
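For anyone who wants to check the per-second figure, here is a quick back-of-the-envelope sketch in Python. It uses only the two daily counts quoted above; the Gowalla and Foursquare check-ins are left out because I don't have a number for them, so the real rate would be somewhat higher. The names `daily_volumes` and `items_per_second` are just illustrative.

```python
# Rough throughput estimate from the daily volumes quoted in the post.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Only the two figures stated above; check-in data is omitted (no figure given).
daily_volumes = {
    "twitter_tweets": 50_000_000,
    "facebook_status_updates": 60_000_000,
}


def items_per_second(daily_counts):
    """Average number of items per second across all sources."""
    return sum(daily_counts.values()) / SECONDS_PER_DAY


rate = items_per_second(daily_volumes)
print(f"{rate:,.0f} items per second")  # prints "1,273 items per second"
```

That is an average over a full day, so peak-hour traffic would presumably run well above it.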
When analyzing public social information, the holy grail is to correlate disparate social profiles and tie them to a specific person. If your mailing list has an email address, you could potentially find that person’s Twitter handle, LinkedIn profile, Jigsaw record ID, Facebook profile, and Klout score. Companies could do some amazing multi-channel social marketing with that kind of information, especially if they could analyze when and where people tweet or update their status, what they talk about, and whom they mention (both individuals and brands). You can quickly see how this becomes a major problem when trying to do analysis on several petabytes of information that is growing by the second. If you can see how big the problem is, then you can see how big the opportunity is.
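To make the profile-correlation idea concrete, here is a toy sketch in Python. It assumes, optimistically, that every source hands you records already keyed by email address, which is rarely true in practice and is exactly where the hard work lies. The function names, the record layout, and the sample people are all made up for illustration.

```python
# Toy sketch: tie profiles from several networks to mailing-list entries,
# assuming (hypothetically) that each source exposes an email per record.


def normalize_email(email):
    """Lowercase and trim so the same address matches across sources."""
    return email.strip().lower()


def correlate_profiles(mailing_list, sources):
    """Return {email: {network: identifier}} for addresses on the mailing list.

    sources maps a network name to a list of (email, identifier) pairs.
    """
    merged = {normalize_email(email): {} for email in mailing_list}
    for network, records in sources.items():
        for email, identifier in records:
            key = normalize_email(email)
            if key in merged:  # ignore people who are not on the list
                merged[key][network] = identifier
    return merged


# Made-up sample data, not real accounts.
mailing_list = ["Ada@Example.com", "bob@example.com"]
sources = {
    "twitter": [("ada@example.com", "@ada_example")],
    "linkedin": [
        ("ADA@example.com ", "in/ada-example"),
        ("carol@example.com", "in/carol-example"),
    ],
}

profiles = correlate_profiles(mailing_list, sources)
print(profiles["ada@example.com"])
```

An in-memory dictionary like this works for a mailing list; at the petabyte scale discussed above, the same join would have to run on a distributed system.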
In Jive’s case, they’d really like to have this type of information inside an enterprise. Who is contributing to discussions, how often, at what times, and who are they engaging with? What are the topics of discussions, what can be discerned about word choice, how has tone changed over time? Forget about org charts: who are they really working with and how often? Who should they know, but don’t? Who is not aligned with the company culture, and who is starting to disassociate themselves from the rest of the company, and why? Applying the same Big Data analysis principles to the enterprise, instead of public social media, can yield some incredibly useful insight.
It really is like the “wild west” out there, with little awareness of privacy issues, ownership guidelines, and governance. My prediction: once a few companies begin to unlock some value in Big Data at any meaningful scale, you’ll see an acquisition by Google/Yahoo/Microsoft and then the spotlight will begin to shine on this topic.
How about some comments from you folks in Austin working in Big Data? Shoot out a comment and tell us what you’re working on in this area.