My vision of big data, 11 prerequisites

Late yesterday I learned from Twitter that Whitney Houston died.

99% of the Whitney Houston tweets were exactly “RIP Whitney Houston.” Okay… but how did she move you? Do you remember a special dance with that one girl while she was singing? Was her voice so beautiful that it made you tear up? It was for me. Originality is there, but buried on Twitter. I would have enjoyed others’ insights on Whitney, to feel camaraderie in a shared loss. If those insights existed on Twitter, they were obscured by all the drone “RIP Whitney Houston” tweets. So instead I played some Whitney songs and told my children who the woman with the beautiful voice was.

Twitter is big data.

“Big data” is making the news. The concept has crept from the back pages of technical publications into the mainstream. It’s a new topic, so the reporters have commandeered it. It’s becoming popular, and that’s too bad. Media feeding frenzies perpetuate the peripheral definition; articles get copied over and over again, and people stop thinking.

With its IPO in the news, Facebook has become the poster child for big data. So what is it? What is big data? Simply put, massive amounts of information about millions and, eventually, billions of people. Big data is making the news because of fear: fear of the possibilities of abuse. Fear sells newspapers and generates clicks and page views, which means we will be hearing a lot about big data. Scare people and make money.

Facebook is big data.

Google is changing its privacy policy. Another media feeding frenzy. If you have a Gmail account or use Google+, music, shopping, etc., all of those privacy policies are melding into one. I like the idea and, I have to admit, I don’t understand the problems people are having. If you use 5 or 10 different Google services, are you really going to read that many different user agreements? I don’t know anyone who actually does. I would prefer to have one policy that covers them all. Google gives these services away; if you don’t like that one, single policy, stop using the service. People are more likely to be informed about Google’s policies if there is a single policy to read. It’s a good thing. Stop the bitching.

Google is big data.

Another bit in the news: The Seattle Times reports that a top porn site, Brazzers, was hacked. According to the article and other reports about the breach, usernames, passwords and real names were stolen. The data is making its way across the Internet on file-sharing sites.

Internet user databases are big data.

In my vision of the world, big data is in its infancy. Don’t freak out for at least 10 years.

Why now? Why is big data coming into the mainstream now? It has been around for many years. Large data providers like Experian, Acxiom, and D&B have been collecting data for a long time. What is different now? To answer “why now,” you must understand the continuum of getting at big data.

11 Big Data Prerequisites

  1. The data must be there – this is the most exciting tipping point. As the CEO of a data-mining software company, I’m still dumbfounded when users expect to get information off the web… that is not there. It must actually exist.
  2. You must be able to flag it – you can’t store everything, so you must make choices. What is important? When does it happen? Example: News release with subject: Nanotechnology
  3. You must be able to find it – in the absence of a real-time data stream, you must be able to search through data to find a “flag” of what you are looking for.
  4. You must be able to parse it – this is the analysis of relevant grammatical constituents: identifying the parts you need from within potential noise. Example: parsing out the name of an inventor from within an article on nanotechnology.
  5. You must be able to extract it – this is not the same as parsing. What if the data is in a PDF file or HTML web page? In many cases, extraction is about access. Is the data I am looking for spread across 5 sub-links of a single web page? Extraction as it relates to the Internet also encompasses web crawling.
  6. You must be able to process it – this takes CPU cycles. Bigger problems need bigger computers.
  7. You must normalize it – if you have multiple pieces of data on “The Container Company”, “Container Company, The”, “The Container Co”, etc., how do you merge that data? You must normalize like entities to a standard “canonical form”. Without it, we’ve got the Data Tower of Babel.
  8. You must be able to store it – big data takes up disk space.
  9. You must be able to index it – if you ever want to find the data after you store it, it needs to be indexed. This also means more disk space.
  10. You must be able to analyze it – big data needs big (or many distributed) CPUs to crunch the numbers and extract order from the chaos.
  11. There must be a payoff – putting together big data is expensive. Without an end goal in mind, the cost of collecting it is hard to justify. Google & Facebook collect, process, index & store data for profit.
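Prerequisite 7 is the most mechanical of the list. Below is a minimal Python sketch of reducing company-name variants to one canonical key, assuming a hypothetical, hand-written abbreviation table (`SUFFIXES`) and a few simple rules: drop punctuation, move a trailing “, The” to the front, drop a leading “the”, expand abbreviations. Real entity resolution needs many more rules or fuzzy matching; this only illustrates the idea.

```python
import re

# Hypothetical abbreviation table; a real system would need a much larger one.
SUFFIXES = {"co": "company", "corp": "corporation", "inc": "incorporated"}


def canonical_company(name: str) -> str:
    """Reduce a company-name variant to a simple canonical key."""
    s = name.lower().strip()
    # "Container Company, The" -> "the container company"
    if s.endswith(", the"):
        s = "the " + s[: -len(", the")]
    # Replace punctuation with spaces, then split into words.
    words = re.sub(r"[^\w\s]", " ", s).split()
    # Drop a leading article so "The X" and "X" collapse together.
    if words and words[0] == "the":
        words = words[1:]
    # Expand known abbreviations ("co" -> "company").
    return " ".join(SUFFIXES.get(w, w) for w in words)


def merge_by_canonical(records):
    """Group (name, value) records under their canonical company key."""
    merged = {}
    for name, value in records:
        merged.setdefault(canonical_company(name), []).append(value)
    return merged


if __name__ == "__main__":
    rows = [("The Container Company", "row 1"),
            ("Container Company, The", "row 2"),
            ("The Container Co", "row 3")]
    print(merge_by_canonical(rows))
    # prints {'container company': ['row 1', 'row 2', 'row 3']}
```

All three variants land under the key `container company`, so their data can be merged instead of living in three separate records.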

So what is my vision of “big data”? What is being talked about in the media is very shortsighted. I think I know where big data is going, and I’m basing my vision on my prerequisites.

Big Data Thoughts

1: Information is growing beyond the ability of any single source to store and index everything. Therefore, big data can never be “all data.” Facebook and Google cannot store everything, so choices must be made. Google already does this, indexing what it deems relevant.

2: The amount of data about people on Facebook is paltry… compared to the maximum possibilities. Yes, in aggregate, it is the largest set of minimal data. Think for a second about your day. What would it take to record your entire life in HD, from 7 different angles? This future data stream would include everything you heard, read, and generally interacted with.

3: Mass personal data recording is on the horizon. The first phase is already starting; the only limit is reasonable storage. The practice is called “LifeLogging.” There are wearable devices that take a picture every 30 seconds. High-quality LifeLogging technology will be critical in the future. One picture every 30 seconds captures only 1/900th of what video does (30 frames per second × 30 seconds = 900 frames). If the LifeLogging device is just the conduit rather than the storage medium, the lifelog could be stored on your home PC. With H.264 video compression, 5.5 hours of 1080p video can be stored on a 32GB thumb drive. That means a single 1TB (terabyte) drive can hold 176 hours of high-definition video (7.3 days of video). It would be expensive today to buy about 50 1TB drives (roughly one per week) to store a year of your life. It seems crazy… right? Not when you are a historian. In 1992, the average hard drive was around 1GB – 1000 times less than today.
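The storage figures above can be checked with a few lines of arithmetic. This sketch assumes 1 TB = 1,024 GB (the binary convention, which is what yields exactly 176 hours) and continuous 24-hour recording; both are assumptions, not statements from the text. Under them, a year (8,760 hours) needs just under 50 drives, roughly one per week.

```python
import math

# Figures taken from the paragraph above.
HOURS_PER_32GB = 5.5        # hours of H.264 1080p video per 32GB drive
GB_PER_TB = 1024            # binary convention; reproduces the 176-hour figure
FPS = 30                    # video frame rate
SNAPSHOT_INTERVAL_S = 30    # wearable camera: one picture every 30 seconds

# Video frames recorded in the time the wearable takes one snapshot.
frames_per_snapshot = FPS * SNAPSHOT_INTERVAL_S            # 900

hours_per_tb = HOURS_PER_32GB * GB_PER_TB / 32             # 176.0 hours
days_per_tb = hours_per_tb / 24                            # about 7.33 days
# A year of continuous recording, rounded up to whole drives.
drives_per_year = math.ceil(365 * 24 / hours_per_tb)       # 50

print(frames_per_snapshot, hours_per_tb, round(days_per_tb, 2), drives_per_year)
# prints 900 176.0 7.33 50
```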

Some ideas to reduce the storage size of LifeLogging:

-Go vector. If you have an avatar created of you, a vectorized version of you could be stored. This type of compression does not exist, but it will. LifeLogging in bitmap video is like a tape deck. Vectorizing video with the lifelogger as the center of the story would cut storage by a factor of 1000; it is like the hard drive compared to tape storage. In addition, data stored this way could be accessed very quickly. Bottom line: with the right *software*, real LifeLogging could be done today. I should save this for another in-depth blog post. I’ve spent many nights thinking about how it all could be done. I’ve got to stop watching sci-fi before bed (think The Lawnmower Man).

4: Assume that we are in the 2020s. Based on Moore’s Law and several similar trends, a LifeLogging device will be able to be worn around your neck and record your life in HD. It will probably cost about as much as a premium iPad. At that price, LifeLogging becomes ubiquitous.

5: What did I eat today? What about over the past week, month, or year? Just because that information is recorded as video (me munching an apple) does not mean that it can be analyzed and recognized as Donato-eats-apple. Where did you buy that apple? Can the date you ate it be cross-referenced with the date you bought it at the grocery store?
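Once an event like Donato-eats-apple has been recognized, the cross-reference question becomes a join on item and date. Below is a minimal sketch assuming hypothetical `Event` and `Purchase` record types and one simple rule: attribute the apple to the most recent matching purchase on or before the day it was eaten. The dates and origins are made-up sample data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Event:
    """An inference recognized from a life stream (hypothetical schema)."""
    day: date
    person: str
    action: str
    item: str


@dataclass
class Purchase:
    """A purchase record, e.g. from a grocery receipt (hypothetical schema)."""
    day: date
    store: str
    item: str
    origin: str


def source_of(event: Event, purchases: List[Purchase]) -> Optional[Purchase]:
    """Link an event to the most recent purchase of the same item on or before it."""
    candidates = [p for p in purchases if p.item == event.item and p.day <= event.day]
    return max(candidates, key=lambda p: p.day, default=None)


if __name__ == "__main__":
    ate = Event(date(2012, 2, 12), "Donato", "ate", "apple")
    receipts = [  # made-up sample data
        Purchase(date(2012, 2, 5), "grocery store", "apple", "Chile"),
        Purchase(date(2012, 2, 10), "grocery store", "apple", "Washington"),
        Purchase(date(2012, 2, 14), "grocery store", "apple", "New Zealand"),
    ]
    match = source_of(ate, receipts)
    print(match.origin if match else "no matching purchase")  # prints Washington
```

The February 14 purchase is ignored because it happened after the apple was eaten.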

New industries

Software that analyzes and makes inferences from LifeStreaming will be a multi-billion dollar industry (Donato ate apple, Donato started car, Donato got phone call, Donato was watching the movie Contact). I would expect each major type of world interaction to be handled by a different app or algorithm.

Software that compiles inferences, builds statistics and performs what-ifs on mass LifeStream data will be a multi-billion dollar industry. (23% of people who ate apples 4x per month, where the apples came from Chile and were most likely treated with chemical X, developed cancer by age 55.) These are the types of discoveries we will be able to make that today are only made by happy accident. (I made up that example… but do eat organic apples.)

Example: compiling a list of the junk (postal) mail letters that I throw out without opening. That is good data. What about the one that I did open?

Software that manages the rights, payments, connectivity and privacy between life streams will be a multi-billion dollar industry. So what if that apple from Chile was treated with some really nasty pesticide, like a carcinogen? Could the supplier of that apple to the store be tracked? Do you want to know this? What if your wife bought it… and it is not part of your personal data stream? Do you and your wife have a LifeStream sharing agreement?
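A LifeStream sharing agreement could be modeled as a set of one-way grants per data category. This is a minimal sketch under that assumption; the `SharingAgreement` class, the category names and the `can_read` rule are all hypothetical, and payments and rights management are left out.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SharingAgreement:
    """Owner lets grantee read certain categories of the owner's stream (hypothetical)."""
    owner: str
    grantee: str
    categories: Set[str] = field(default_factory=set)


def can_read(reader: str, owner: str, category: str,
             agreements: List[SharingAgreement]) -> bool:
    """A person can always read their own stream; anyone else needs a matching grant."""
    if reader == owner:
        return True
    return any(a.owner == owner and a.grantee == reader and category in a.categories
               for a in agreements)


if __name__ == "__main__":
    deals = [SharingAgreement(owner="wife", grantee="Donato", categories={"groceries"})]
    print(can_read("Donato", "wife", "groceries", deals))  # True
    print(can_read("Donato", "wife", "location", deals))   # False
    print(can_read("wife", "Donato", "groceries", deals))  # False: grants are one-way
```

With this grant in place, the apple your wife bought shows up when your own stream is analyzed, but nothing else of hers does.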

One person eating one apple does not a trend make. Multiply that by 50 million people over 5 years. This is not science fiction. This is simply faster computers, more memory, and analysis software. It’s a lot of apples. Would I share my eating habits, anonymously, and cross-reference them with my health? Maybe.
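The made-up 23% statistic above boils down to a conditional rate over anonymized records: among people matching an exposure, what share show an outcome? Here is a minimal sketch; the record fields and sample data are hypothetical, invented for illustration just as the author’s apple example was. At 50 million people the same computation would run on distributed hardware, but the arithmetic does not change.

```python
from typing import Callable, Dict, List, Optional

Person = Dict[str, object]


def conditional_rate(people: List[Person],
                     exposed: Callable[[Person], bool],
                     outcome: Callable[[Person], bool]) -> Optional[float]:
    """Share of the exposed cohort that also shows the outcome (None if cohort is empty)."""
    cohort = [p for p in people if exposed(p)]
    if not cohort:
        return None
    return sum(1 for p in cohort if outcome(p)) / len(cohort)


if __name__ == "__main__":
    # Made-up records, in the spirit of the made-up apple example.
    people = [
        {"apples_per_month": 4, "origin": "Chile", "condition": True},
        {"apples_per_month": 6, "origin": "Chile", "condition": False},
        {"apples_per_month": 5, "origin": "Chile", "condition": False},
        {"apples_per_month": 4, "origin": "Chile", "condition": False},
        {"apples_per_month": 1, "origin": "Chile", "condition": True},
        {"apples_per_month": 8, "origin": "Washington", "condition": False},
    ]
    rate = conditional_rate(
        people,
        exposed=lambda p: p["apples_per_month"] >= 4 and p["origin"] == "Chile",
        outcome=lambda p: p["condition"],
    )
    print(rate)  # prints 0.25
```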

I expect that companies will pop up, each with a different set of analysis technology for different niches. It will probably evolve into an AppStore model. One company looks at how you interact with media, what you watch, listen to, theaters attended. Another knows what you eat. You can choose which feeds to share with the greater LifeStream and take part in a greater community.

By the way, none of this LifeStreaming will be on Facebook, or Google+. No one would trust them. In addition, it would be prohibitively expensive to centrally transmit, store and analyze it. Hmmm, maybe Facebook could be the trend builder? It is well positioned for it. Can you imagine it?

Donato ate an apple
Donato threw core in garbage
Donato did not recycle V8 can
Donato is driving 15 miles over the speed limit

This is the first time in a few years that I have thought of a way for Facebook to survive long term. In this Facebook, you would never log in to look at what people are doing; you would log in to see the latest trend and how it affected you.

I just hope it does not make it to Twitter and get retweeted by the “RIP Whitney Houston” drones. Once analysis agents can understand (and broadcast) our individual actions, Twitter has no reason to exist.

Big Data equals big money.

If it is possible, and someone can profit, it will be collected.

 

The decline of Apps and the rise of Agents and Clewds


Last week, while presenting a live webinar, “The Near and Far Future of Recruiting,” I had an epiphany. I was talking about the eventual decline (or morphing) of Facebook. The theory is this: mobile computing power in 10 years will be server-capable. Add in violations of trust and general mistrust of social networks. The result is peer-to-peer social networking. No Facebook needed. Everything sits on your mobile device. More private, more secure, total user control and no ads. Facebook may lead the way, but it will be hard for them, as they would cannibalize their own ad-driven revenue model.

That was last year’s epiphany.

What led to the new epiphany was my pontificating on CRM systems. This was a recruiter-centric talk about the future of recruiting. Many recruiter CRMs have connections to LinkedIn profiles. Every one I have seen has been implemented incorrectly, through no fault of the vendors. In an optimal situation, the data inside the profile should be mashed up with current CRM data. Instead, LinkedIn requires usage of its API, which brings back a canned LinkedIn profile. This is what I call “social linkage”.

The optimal situation would be a pair of “social agents”. While a company may have 1,000 prospects in its CRM, it may contact only 50 in a given day. One social agent would automatically refresh the entire CRM on a longer cycle, such as once per quarter. Another, just-in-time social agent would update the CRM just before the outreach process. Why is this important? LinkedIn is not a definitive data source; nothing is. What happens when you combine Facebook, Google+, Jigsaw (now Data.com), Foursquare, Twitter and whatever social network Microsoft comes up with? Are you going to clutter your Salesforce or Microsoft Dynamics interface with 6-8 little snippets, many with redundant information? This gets ugly fast. The optimal implementation is to have a social agent retrieve LinkedIn, Data.com, Google+, Facebook and Twitter information, then mash, score and apply analytics to present the information in a way that optimally fits your selling model.
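The pair of social agents amounts to two refresh policies over the same CRM. Here is a minimal sketch, assuming each contact carries a last-refreshed timestamp. The 90-day quarter, the one-day just-in-time window and all function names are hypothetical choices, and the actual retrieval from LinkedIn, Data.com or any other source is not shown.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Refresh cycle for the background agent: "once per quarter", taken as 90 days.
QUARTER = timedelta(days=90)
# Hypothetical freshness window for the just-in-time agent.
JIT_MAX_AGE = timedelta(days=1)


def background_batch(last_refreshed: Dict[str, datetime], now: datetime) -> List[str]:
    """Contacts the slow agent should refresh: anything older than a quarter."""
    return [cid for cid, ts in last_refreshed.items() if now - ts >= QUARTER]


def just_in_time_batch(outreach_today: List[str],
                       last_refreshed: Dict[str, datetime], now: datetime) -> List[str]:
    """Contacts the fast agent should refresh right before today's outreach."""
    return [cid for cid in outreach_today
            if cid not in last_refreshed or now - last_refreshed[cid] >= JIT_MAX_AGE]


if __name__ == "__main__":
    now = datetime(2012, 3, 1, 9, 0)
    crm = {
        "alice": now - timedelta(days=120),
        "bob": now - timedelta(days=10),
        "carol": now - timedelta(hours=2),
    }
    print(background_batch(crm, now))                               # ['alice']
    print(just_in_time_batch(["bob", "carol", "dave"], crm, now))   # ['bob', 'dave']
```

The design point is that the expensive, complete refresh runs rarely, while the cheap, targeted refresh touches only the 50 or so contacts that will actually be called that day.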


The future death of social networking


Social networking is going to die.  This article is about how it will happen.

The focus for this article will be business social networking. If you are worried about your Facebook friends and photos and the life-sucking that goes on in personal social networks, don’t worry; they will be around for a while. They will die a totally different death. That will have to be a future blog post. Ask me over a beer and I will explain it.

Ask three people to define business social networking and you will get three different answers. Try it. Going even further, I hypothesize that if you ask ten different people about the benefits of business social networking, you will get ten different answers. I was recently inspired by a quote attributed to Steve Jobs about dogma: “Being satisfied with the results of other people’s thinking.” This article will be as dogma-free as possible. While I can’t help being influenced by everything that is being written about social networking, I have come up with a few unique conclusions.

1.  LinkedIn is not a social network. Most of my contacts are in either sales or recruiting roles. In the early days, the premise behind LinkedIn was that you can connect to many people through a chain of trusted referrals. It does not matter what the creators of LinkedIn claim it to be. LinkedIn was founded on the idea that you can go through a series of trusted connections to network with a target person. It was a noble idea; however, LinkedIn is now controlled by the mob. The real question is… how are the majority of people using LinkedIn? The answer: get as many connections as possible and build as big a network as possible. Next, when you find someone on LinkedIn that you want to connect with, read their background and connect directly.

LinkedIn is a social database.

2.  Social CRM is a buzzword.

The community aspect of SocialCRM is aptly named.  Unfortunately, the average person confuses the community, group and collaboration aspects of SocialCRM with popular social networking sites like LinkedIn. They are different.

SocialCRM is not concisely defined.

When everyone is copying what everyone else is thinking, you get a buzzword. It is fun to report: you don’t need to think too much to find other articles to read, alter and republish. Read about Social CRM and then write about Social Recruiting. It goes both ways. But what is Social CRM? SOCIAL is the base part of the equation.

Unfortunately, SocialCRM is being used as a catch-all phrase, and it is confusing the consumer. For clarity, SocialCRM should be broken into 2 distinct terms. Here is a way to clarify thinking and talking about it.

CollaborationCRM – Denoting the functions within a CRM that allow group collaboration, community connection and project sharing. Salesforce Chatter is a good example.

SocialCRM – Connectivity to existing social networks like LinkedIn. When polled, most people believe this is what Social CRM means. (A straw poll yielded 9 out of 10 assuming this definition.)

Social Linkage – defined below

The current implementations of Social CRM (as defined above) defeat the purpose of having a CRM. The best implementation of a CRM is when the CRM is self-contained. Art Papas, CEO of Bullhorn, an Applicant Tracking System (recruiter CRM), describes it well: “Our clients live inside Bullhorn.” The best CRM should have everything the users need inside the CRM.

Example: you click on a LinkedIn link next to a contact record in your CRM. What happens? A browser opens and you are on a separate web page, in disconnected technology, outside your CRM. This is Social Linkage, not social CRM. Bad process.

If a CRM is implemented correctly, you should not have to leave the CRM to perform important tasks.

Most of what is touted as Social CRM today is simply Social Linkage. Social CRM sounds better and sounds integrated, but in every case I have seen… it is not. What is the challenge here? Until LinkedIn, Facebook and all the other networks allow tighter integrations, social linkage will be all that we have. LinkedIn wants you to stay on LinkedIn; Facebook wants you on Facebook. Salesforce wants to be able to say it has a connection to LinkedIn.

3. Marketing, not sales, is driving “the idea” of Social CRM

If you look at who is pushing the SocialCRM idea, it is marketing. The dream: having EVERY contact in your CRM mashed up with all social network information. This would be great for marketing and market segmentation, but unnecessary for sales. The reality: click, click, and more clicks. The current state of SocialCRM is, at best, Social Linkage. The reality does not match the dream. Marketing pushes the dream and leaves sales stuck with the reality.

If you have a question about what sales thinks about “Social CRM” as it relates to social network data, look at the ratings the LinkedIn plugin got in Salesforce CRM. Don’t get me wrong, I am a fan of LinkedIn. Visionary concept, great source of data; however, it is not seamless with CRM. If anything, the combination is anti-social CRM.

Attn marketers: Your focus should be social media, let sales people worry about and define SocialCRM

4.  Social Agents will replace Social CRM. Social CRM/Social Linkage tries to solve the problem of having “an answer” for every contact in your CRM. Every contact that you can view in your CRM will, if available, have a link to external social network profile(s). Services like RapLeaf aggregate multiple social network links associated with a specific person. Given the sheer volume of information and the ambiguous nature of contact information, these mashups are not always correct. The end result: you click on multiple different links in your CRM and open multiple disparate sources of information. Even when the links are correct, you get another bad process.

Enter social agents.

The best products are built by dreaming up an ultimate scenario, then working backwards to what is possible. If there were no constraints… what is the ultimate potential of Social CRM? Answer: every CRM contact has real-time social network information from all social networks. This information would not be linked, but mashed up inside the CRM. This is not happening. Why? (1) It is not in the interest of the social network (really a social database) to make the information free and fully available. (2) The incentive chain of $ is not there.

So if it is a bad idea to pre-populate social network information for every contact in your CRM, what should be done?  On demand, social agents.

The average sales rep engages 10-20 contacts per day.  A real-time, on-demand social agent is fully capable of making a real time extraction of social network information, mashing that information up inside the CRM and presenting it in a usable format for a sales rep.   This is what sales wants.
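The “mash, score” step an on-demand social agent would perform can be sketched as weighted voting across sources. This assumes each source has already returned a flat dictionary of profile fields; the source weights are invented for illustration, and weighted voting is just one simple scoring choice, not a described product behavior.

```python
from typing import Dict, List, Tuple

# Hypothetical trust weight per source; unknown sources get a low default.
SOURCE_WEIGHT = {"linkedin": 0.9, "data.com": 0.8, "twitter": 0.5}
DEFAULT_WEIGHT = 0.1


def mash_up(fragments: List[Tuple[str, Dict[str, str]]]) -> Dict[str, Tuple[str, float, List[str]]]:
    """Merge per-source profile fragments into one record.

    Each candidate value for a field is scored by summing the weights of the
    sources that report it; the highest-scoring value wins, and the sources
    that agreed are kept so the CRM can show where the value came from.
    """
    votes: Dict[str, Dict[str, list]] = {}
    for source, fields in fragments:
        weight = SOURCE_WEIGHT.get(source, DEFAULT_WEIGHT)
        for name, value in fields.items():
            entry = votes.setdefault(name, {}).setdefault(value, [0.0, []])
            entry[0] += weight
            entry[1].append(source)
    merged = {}
    for name, options in votes.items():
        value, (score, sources) = max(options.items(), key=lambda kv: kv[1][0])
        merged[name] = (value, round(score, 2), sources)
    return merged


if __name__ == "__main__":
    record = mash_up([
        ("linkedin", {"title": "VP Sales", "company": "Acme"}),
        ("data.com", {"title": "Director of Sales", "company": "Acme"}),
        ("twitter", {"title": "Director of Sales"}),
    ])
    for field_name, (value, score, sources) in record.items():
        print(field_name, "->", value, score, sources)
```

Here two weaker sources that agree (1.3) outvote one stronger source (0.9) on the title, which is the point of scoring rather than trusting any single network.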

Conversely, I have seen sales reps presented with a CRM that has linkage to social networks. While the potential excites them at first, and they are fired up about the information available, usage drops off dramatically.

As soon as marketing starts thinking for itself and stops listening to reporters & consultants (who listen to reporters), demand for social agents will grow.

5. Social Data comes in 2 distinct flavors

Where someone went to college will never change.  It is a fact, fixed in time.  Where someone currently works is a fluid social data point.   A fixed social data point only needs to be found and stored once in a CRM, whereas fluid data points require social agents to keep them updated.

Fixed and fluid social data points should be treated differently. Why is this important to understand? Handling fluid and fixed data points with different agents reduces the refresh load on the technology infrastructure that powers social agents. In addition, what can be done with the results of social agents varies depending on whether the information is fixed or fluid.

Last thought. Adding a human-verification element, to cement data accuracy, is realistic on a fixed data point.  Scan once, verify and store forever.
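Treating fixed and fluid points differently can be expressed as one planning rule: fluid points go back to a social agent when stale, while fixed points are never re-fetched but are queued once for human verification. A minimal sketch; the `DataPoint` fields and the 30-day staleness window are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical refresh interval for fluid data points.
FLUID_MAX_AGE = timedelta(days=30)


@dataclass
class DataPoint:
    name: str
    value: str
    fixed: bool              # True for facts fixed in time, e.g. college attended
    fetched: datetime        # when an agent last retrieved it
    verified: bool = False   # human-verified (only meaningful for fixed points)


def plan_work(points: List[DataPoint], now: datetime) -> Tuple[List[str], List[str]]:
    """Split points into (needs agent refresh, needs human verification)."""
    refresh, verify = [], []
    for dp in points:
        if dp.fixed:
            # Fixed: scan once, verify once, store forever; never re-fetched.
            if not dp.verified:
                verify.append(dp.name)
        elif now - dp.fetched >= FLUID_MAX_AGE:
            # Fluid: re-fetched whenever it goes stale.
            refresh.append(dp.name)
    return refresh, verify


if __name__ == "__main__":
    now = datetime(2012, 3, 1)
    points = [
        DataPoint("college", "University of Miami", True, now - timedelta(days=400)),
        DataPoint("degree", "BS", True, now - timedelta(days=400), verified=True),
        DataPoint("employer", "Acme", False, now - timedelta(days=45)),
        DataPoint("title", "VP Sales", False, now - timedelta(days=5)),
    ]
    print(plan_work(points, now))  # prints (['employer'], ['college'])
```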

6.  Social Intuition will evolve from social agents

Once we have on-demand social agents, then what?  Take a mind walk:  We now have a CRM, where, on-demand, or slightly before  (predictive system)  social network information is extracted, parsed and mashed up inside the CRM.  No need to live anywhere but the CRM.  A dream of efficiency.

Now that I have all this information about someone, how do I leverage it? The fact that someone went to the University of Miami (The Hurricanes) is something that would be in a social network profile. Thus, via a social agent, I would have the University of Miami as a data point in the CRM. However, would I know the UM mascot is the Hurricanes? Would I know the football team’s score from the night before? Would I know the weather in Miami that day? The answer to all these questions is no.

Enter Social Intuition

Social intuition is a combination of social network data points and real-time agents that gather additional talking points. The prerequisites for performing this type of mash-up are (1) aggregated & scored data from social networks, (2) highly accurate fixed data points (e.g. mascots for every college) and (3) intelligent agents that combine fixed data points with social data points to “intuit” additional information.
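The three prerequisites map onto three inputs: a data point pulled from a profile, a table of fixed facts, and a real-time lookup. In this minimal sketch, `live_news` is a hypothetical stand-in for a real-time agent (no real news or sports API is assumed), and the lookup table holds a single entry for illustration.

```python
from typing import Callable, Dict, List, Optional

# Highly accurate fixed data points: college -> team name (tiny sample).
TEAM_BY_COLLEGE = {"University of Miami": "Hurricanes"}


def talking_points(contact: Dict[str, str],
                   team_by_college: Dict[str, str],
                   live_news: Callable[[str], Optional[str]]) -> List[str]:
    """Combine a social data point, a fixed lookup and a real-time agent."""
    points = []
    college = contact.get("college")          # from the social network profile
    team = team_by_college.get(college) if college else None
    if team:
        points.append(f"{contact['name']} went to {college} (the {team})")
        headline = live_news(team)            # real-time agent (hypothetical)
        if headline:
            points.append(headline)
    return points


if __name__ == "__main__":
    def fake_news(team: str) -> Optional[str]:
        # Stand-in for a real-time agent; returns canned text for the demo.
        return f"Latest {team} result goes here"

    print(talking_points({"name": "Pat", "college": "University of Miami"},
                         TEAM_BY_COLLEGE, fake_news))
```

Without the fixed table, the agent would know the college but have nothing to ask the real-time source about; the fixed data point is what turns a profile field into a talking point.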

7.  Company-centric (NOT contact-centric) social mash-ups will prevail

Even with the proliferation of social networks, the average person has just a few data points about them, if any. Multiply that by the number of people at a company and patterns emerge: patterns that would not be apparent in the microcosm of one person. The best approach in sales is to engage multiple points of contact (people) at a company from the onset of first contact. This approach is called Sphere of Influence Selling and is well documented in The Sphere of Influence Selling webinar.
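Rolling contact-level data points up to the company level is a group-and-count. A minimal sketch, assuming each contact record carries a company and a list of attribute strings (a hypothetical shape); an attribute shared by several people at one account is the kind of pattern a single profile would not show.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def company_patterns(contacts: List[dict], min_count: int = 2) -> Dict[str, Dict[str, int]]:
    """Per company, the attributes shared by at least `min_count` contacts."""
    by_company = defaultdict(list)
    for c in contacts:
        by_company[c["company"]].append(c)
    patterns = {}
    for company, people in by_company.items():
        # set() so one person cannot count the same attribute twice
        counts = Counter(attr for p in people for attr in set(p["attributes"]))
        patterns[company] = {a: n for a, n in counts.items() if n >= min_count}
    return patterns


if __name__ == "__main__":
    contacts = [  # made-up sample contacts
        {"company": "Acme", "attributes": ["University of Miami", "uses Salesforce"]},
        {"company": "Acme", "attributes": ["University of Miami"]},
        {"company": "Acme", "attributes": ["uses Salesforce", "Boston"]},
        {"company": "Beta", "attributes": ["Boston"]},
    ]
    print(company_patterns(contacts))
```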

Remember:  You talk to people, but the company writes the check.

8.  CRM Socialbases become the ultimate silos

The most valuable list is the list that no one else has.  Think about it.

The most unique set of data is inside your CRM.  Don’t worry about the world,  just about your clients and the companies you want to sell to.  Gather rich data from social networks and other sources and combine it with your CRM.  The future king of all data sets will not be inside social networks.  Companies will mash data from social networks and combine it with conversation history, notes, purchasing habits, etc.

CRM Socialbases will be built on a combination of Fixed and Fluid social data points.

The value of any list can be scored based on data quality & competitive advantage. For example, LinkedIn has great data, but is it exclusive? No. Anyone with a bunch of connections can get to the names of almost everyone.

9.  Things to watch

Bleeding edge: Watson. An IBM supercomputer that will, in the coming months, be competing with top Jeopardy players. In initial testing, it beat the average winning player from the Jeopardy TV show. Five years ago this was not possible. Watson is an answer machine. What happens when you connect an answer machine with your CRM SocialBase?

Hot: Salesforce Chatter. I like this technology. Nothing that can’t be copied. Expect to see it in every CRM within a few years. It brings another aspect of social into CRM, in terms of work teams and projects.

Fun: Proximity-based social networks – not a primary technology, but something that should eventually be mashed up. Foursquare is a good example. (Yes, I am the mayor of Broadlook.)

Practical: CRM Profiler – The next iteration of the technology is cloud-based, lives inside the CRM, jumps over social linkage and includes social agents.  Build your own social knowledge-base.


10.  Black swans emerging?

Black swan theory: something that changes everything in a space; an occurrence that no one thought of.

LinkedIn CRM – It makes sense, but would they alienate the CRMs that currently mash up with them? It has happened before. In the recruiting space, AIRS, a recruiter add-on tool, created its own applicant tracking system. Guess who integrates with AIRS today? No one of importance. Next, AIRS was acquired by an RPO (recruitment process outsourcing) company… how many competing RPOs will continue to use them? The number is declining.

Facebook CRM – That would be really scary; however, a spin-off without the Facebook label might fly. The yo-yo ethics of their privacy policy are comical. Can’t ignore them.

Salesforce acquisition of LinkedIn:  More likely to be Oracle, SAP, Microsoft or a company that has deep pockets.  Salesforce already acquired Jigsaw.

Scariest combo: Google acquires LinkedIn, creates the Google CRM and makes it free. It actually makes total sense. If Google wants to push ads all day long while people are at work, this is the way. Gmail is already the best web-based email system. They have Google Docs. They have a mobile platform. All the components are there. If you take a step further and look at the talent they have hired, patterns emerge. Nuff said.

Recap:

Social Network -> Social Database -> Social CRM  ->  Social Linkage -> Social Agents -> CRM SocialBase.

You heard it here first!
