My vision of big data, 11 prerequisites

Late yesterday I learned from Twitter that Whitney Houston died.

99% of the Whitney Houston Tweets were exactly, “RIP Whitney Houston.” Okay… but how did she move you? Do you remember a special dance with that one girl while she was singing? Was her voice so beautiful that it made you tear up? It was for me. Originality is there, but buried on Twitter. I would have enjoyed others’ insights on Whitney, to feel camaraderie in a shared loss. If they existed on Twitter, they were hidden behind all the drone “RIP Whitney Houston” tweets. So instead I played some Whitney songs and told my children who the woman with the beautiful voice was.

Twitter is big data.

“Big data” is making the news. The concept has crept from the back pages of technical publications into the mainstream. It’s a new topic, so the reporters have commandeered it. It’s becoming popular, and that’s too bad. Media feeding frenzies perpetuate the peripheral definition; articles get copied over and over again, and people stop thinking.

With its IPO in the news, Facebook has become the poster child for big data. So what is it? What is big data? Simply put, massive amounts of information about millions, and eventually billions, of people. Big data is making the news because of fear – fear of the possibilities of abuse. Fear sells newspapers and gets clicks and page views, which means we will be hearing a lot about big data. Scare people and make money.

Facebook is big data.

Google is changing its privacy policy. Another media feeding frenzy. Whether you use Gmail, Google+, music, shopping or any other Google service, all the privacy policies are melding into one. I like the idea and I have to admit, I don’t understand the problems people are having. If you use 5 or 10 different Google services, are you really going to read many different user agreements? I don’t know anyone who actually does. I would prefer to have one policy that covers them all. Google gives these services away. If you don’t like that one, single policy – stop using the service. The chances of people being informed about Google’s policies will increase if they have a single policy. It’s a good thing. Stop the bitching.

Google is big data.

Another bit in the news. The Seattle Times reports that a top porn site, Brazzers, was hacked. According to the article and other news reports, usernames, passwords and real names were taken. The data is making its way across the Internet on file sharing sites.

Internet user databases are big data.

In my vision of the world, big data is in its infancy. Don’t freak out for at least 10 years.

Why now? Why is big data coming into the mainstream now? It has been around for many years. Large data providers like Experian, Acxiom, and D&B have been collecting data for a long time. What is different now? To answer “why now,” you must understand the continuum of getting at big data.

11 Big Data Prerequisites

  1. The data must be there – this is the most exciting tipping point.  In being the CEO of a data-mining software company, I’m still dumbfounded when users expect to get information off the web…that is not there.  It must actually exist.
  2. You must be able to flag it – you can’t store everything and must make choices.  What is important? When does it happen?  Example:  News release with subject: Nanotechnology
  3. You must be able to find it – in the absence of a real-time data stream, you must be able to search through data to find a “flag” of what you are looking for.
  4. You must be able to parse it –  this is the analysis of relevant grammatical constituents, identifying the parts of what you need, from within potential noise.  Example: parsing out the name of an inventor from within an article on nanotechnology
  5. You must be able to extract it – Not the same as parsing.  What if the data is in a PDF file or HTML web page?  In many cases, extraction is about access.  Is the data I am looking for across 5 sub-links of a single web page?  Extraction as it relates to the Internet also encapsulates web crawling.
  6. You must be able to process it – This takes CPU cycles. Bigger problems need bigger computers.
  7. You must normalize it – If you have multiple pieces of data on “The Container Company”,  “Container Company, The”, “The Container Co”, etc.,  how do you merge that data?  You must normalize like entities to a standard “canonical form”.  Without it, we’ve got the Data Tower of Babel.
  8. You must be able to store it – Big data takes up disk space.
  9. You must be able to index it – If you ever want to find it after you store it, the data needs to be indexed.  This also means more disk space.
  10. You must be able to analyze it – big data needs big (or many distributed) CPUs to crunch the numbers and garner order from the chaos.
  11. There must be a payoff – Putting together big data is expensive.  Without an end goal in mind, the cost of collecting it is hard to justify.  Google & Facebook collect, process, index & store data for profit.
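Prerequisite 7 is the easiest one to show in code. The sketch below is one possible way to reduce the three spellings of the container company to a single canonical form. The function name `canonical_name` and the small `SUFFIXES` table are hypothetical, made up for this illustration; a real normalizer would need a far larger table and fuzzier matching.

```python
import re

# Hypothetical abbreviation table, for illustration only.
SUFFIXES = {"co": "company", "corp": "corporation", "inc": "incorporated"}

def canonical_name(raw: str) -> str:
    """Reduce a company name to a lowercase canonical form."""
    name = raw.strip().lower()
    # Move a trailing ", the" to the front: "container company, the"
    match = re.match(r"^(.*),\s*the$", name)
    if match:
        name = "the " + match.group(1)
    # Drop punctuation such as the period in "Co."
    name = re.sub(r"[^\w\s]", "", name)
    # Expand known abbreviations
    words = [SUFFIXES.get(word, word) for word in name.split()]
    # Drop a leading article so "The X" and "X" collapse together
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)

variants = ["The Container Company", "Container Company, The", "The Container Co"]
print({canonical_name(v) for v in variants})
```

All three variants collapse to the same key, which is what lets the records be merged later.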

So what is my vision of “big data”?  What is being talked about in the media is very short sighted.  I think I know where big data is going.  I’m basing my vision on my prerequisites.

Big Data Thoughts

1: Information is growing beyond the ability of any single source to store and index everything. Therefore, big data can never be “all data.” Facebook and Google cannot store everything. Therefore choices must be made. Google already does it; indexing what they deem relevant.

2: The amount of data about people on Facebook is paltry…compared to the maximum possibilities. Yes, in aggregate, it is the largest set of minimal data. Think for a second about your day. What would it take to record your entire life in HD, from 7 different angles? This future data stream would include everything you heard, read, and generally interacted with.

3: Mass, personal data recording is on the horizon. The first phase is already starting. The only limit is reasonable storage. The term is “LifeLogging.” There are devices that you can wear that take a picture every 30 seconds. High quality LifeLogging technology will be critical in the future. One picture every 30 seconds is 1/900th of the frames in video (30 frames per second). If the LifeLogging device is just the conduit rather than the storage medium, the lifelog could be stored on your home PC. With H.264 video compression, about 5.5 hours of 1080p video can be stored on a 32GB thumb drive. That means a single 1TB (terabyte) drive can hold 176 hours of high-definition video (7.3 days of video). It would be expensive today to buy about 50 1TB drives, roughly one a week, to store a year of your life. It seems crazy… right? Not when you are a historian. In 1992, the average hard drive was around 1GB – 1000 times less than today.
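The storage arithmetic is easy to check. This is a back-of-the-envelope sketch that takes the 5.5 hours per 32GB figure from the paragraph above as given and assumes a 1TB drive means 1024GB; the function names are just for illustration.

```python
import math

HOURS_PER_32GB = 5.5  # figure quoted above for H.264 1080p video
GB_PER_TB = 1024      # assumption: binary terabyte

def hours_per_drive(drive_gb=GB_PER_TB):
    """Hours of video that fit on one drive."""
    return HOURS_PER_32GB * drive_gb / 32

def drives_per_year(drive_gb=GB_PER_TB):
    """Whole drives needed to record 24 hours a day for 365 days."""
    hours_in_year = 365 * 24
    return math.ceil(hours_in_year / hours_per_drive(drive_gb))

def snapshot_fraction(seconds_between_shots=30, fps=30):
    """Fraction of video frames captured by one still every N seconds."""
    return 1 / (seconds_between_shots * fps)

print(hours_per_drive())       # hours per 1TB drive
print(hours_per_drive() / 24)  # days per 1TB drive
print(drives_per_year())       # drives for one year
```

Under those assumptions a year of continuous recording needs about 50 drives, in the same ballpark as one drive per week.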

Some ideas to reduce the storage size of LifeLogging:

-Go vector. If you have an avatar created of you, a vectorized version of you could be stored. This type of compression does not exist, but it will. LifeLogging in bitmap video is like a tape deck. Vectorizing video with the lifelogee as the center of the story could cut the storage needed by a factor of 1000. It is like the hard drive compared to tape storage. In addition, data stored this way could be accessed very quickly. Bottom line: with the right *Software* real LifeLogging could be done today. I should save this for another in-depth blog. I’ve spent many nights thinking about how it all could be done. I’ve got to stop watching Sci-Fi before bed. Lawn Mower Man

4: Assume that we are in the 2020s. Based on Moore’s Law, and several others, a LifeLogging device will be able to be worn around your neck and record your life in HD. They’ll probably be the price of a premium iPad. At that level, LifeLogging is ubiquitous.

5: What did I eat today? What about over the past week, month, or year? Just because that information is recorded as video (me munching an apple) does not mean that it can be analyzed and recognized as Donato-eats-apple. Where did you buy that apple? Can the date you ate it be cross-referenced with the date that you bought it at the grocery store?

New industries

Software that analyzes and makes inferences from LifeStreaming will be a multi-billion dollar industry. (Donato ate apple, Donato started car, Donato got phone call, Donato was watching the movie Contact). I would expect that each major type of world interaction would be handled by a different app or algorithm.

Software that compiles inferences, builds statistics and performs what-ifs on mass LifeStream data will be a multi-billion dollar industry. (23% of people that ate apples 4x per month, where the apples came from Chile, and most likely were treated with chemical X, developed cancer by age 55). These are the types of discoveries we will be able to make that are currently only made by virtue of a happy accident.  (I made up that example…but do eat organic apples).
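To make the "builds statistics" idea concrete, here is a toy sketch of that kind of query. Everything in it is invented for illustration: the event format (person, item), the `outcomes` lookup, and the function name `share_with_outcome`. Real LifeStream data would be far messier.

```python
from collections import Counter

def share_with_outcome(events, outcomes, item, min_count):
    """Among people who logged `item` at least `min_count` times,
    return the fraction whose outcome flag is True (None if nobody qualifies)."""
    counts = Counter(person for person, thing in events if thing == item)
    cohort = [person for person, n in counts.items() if n >= min_count]
    if not cohort:
        return None
    hits = sum(1 for person in cohort if outcomes.get(person, False))
    return hits / len(cohort)

# Made-up toy data: (person, item eaten) inferences over one month.
events = (
    [("a", "apple")] * 4
    + [("b", "apple")] * 5
    + [("c", "apple")] * 1
    + [("d", "pear")] * 9
)
outcomes = {"a": True, "b": False, "c": True}

print(share_with_outcome(events, outcomes, "apple", 4))
```

Only persons "a" and "b" ate apples four or more times, so the query looks at that cohort alone. The hard part is not this arithmetic but producing trustworthy events in the first place.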

Example: compiling a list of the junk (postal) mail letters that I throw out without opening. That is good data. What is the one that I opened?

Software that manages the rights, payments, connectivity and privacy between life streams will be a multi-billion dollar industry. So what if that apple from Chile was treated with some real nasty pesticide – like a carcinogen? Could the supplier of that apple to the store be tracked? Do you want to know this? What if your wife bought it… and it is not part of your personal data stream? Do you and your wife have a LifeStream sharing agreement?

One person, eating one Apple does not a trend make. Multiply that by 50 million people over 5 years. This is not science fiction. This is simply faster computers, more memory, and analysis software. It’s a lot of Apples. Do I want to share, if it was anonymous, my eating habits and cross reference it with my health…maybe.

I expect that companies will pop up, each with a different set of analysis technology for different niches. It will probably evolve into an AppStore model. One company looks at how you interact with media, what you watch, listen to, theaters attended. Another knows what you eat. You can choose which feeds to share with the greater LifeStream and take part in a greater community.

By the way, none of this LifeStreaming will be on Facebook, or Google+. No one would trust them. In addition, it would be prohibitively expensive to centrally transmit, store and analyze it. Hmmm, maybe Facebook could be the trend builder? It is well positioned for it. Can you imagine it?

Donato ate an Apple
Donato threw core in garbage
Donato did not recycle V8 can
Donato is driving 15 miles over the speed limit

This is the first time in a few years that I thought of a way for Facebook to survive long term. In this Facebook, you would never log in to look at what people are doing; you would log in to see the latest trend and how it affected you.

I just hope it does not make it to Twitter and get retweeted by the “RIP Whitney Houston” drones. Once analysis agents can understand (and broadcast) our individual actions, Twitter has no reason to exist.

Big Data equals big money.

If it is possible, and someone can profit, it will be collected.


What happens to my iTunes music after I die?


It’s something I’ve been thinking about.  Now that Steve Jobs has left us, it’s come to the top of my mind. For months I’ve been looking on iTunes at the Beatles complete collection.  It is around $100. A $9.99 download does not make me think, but $100 makes me stop and contemplate.

When I die… can I leave my vast iTunes music collection to my children?

My CDs, sure, but what about digital rights?  I’m not sure if anyone is having this conversation.  What about divorce?  Who gets the iTunes?  After reading the lengthy iTunes user agreement, it is unclear.  So in preparation, I changed my official iTunes email login to a family-oriented email.  If I get hit by a train, my family will have control of my music and Apps.  This made me look at copyright lengths.  UK copyright lengths are 50 years.  What that means is that:

Sgt. Pepper’s Lonely Hearts Club Band, copyrighted in June 1967, will be free to copy in June 2017.  Unfortunately, this will be in the UK only.  The US copyright has been extended to 95 years.  How to resolve the US/UK difference?  I’m sure the Beatles copyrighted their music in both the US and the UK, but it is interesting to think about.

This made me think more.  Is this why bands keep releasing “remastered” soundtracks?  A new soundtrack means another 50 years of copyright.  That is every band’s right. It is also my right to hold on to original CDs of non-remastered recordings until they pry them out of my cold dead hands.

What it all comes down to is that there will be many more discussions on digital music and video rights. It is interesting.  I just want to live long enough to learn to play Stairway to Heaven and then use the soundtrack for a really cool product release…royalty free.

An article on UK copyrights:

The decline of Apps and the rise of Agents and Clewds

Last week, while presenting a live webinar, “The Near and Far Future of Recruiting,” I had an epiphany.  I was talking about the eventual decline (or morphing) of Facebook.  The theory is this: Mobile computing power in 10 years will be server-capable.  Add in violation of trust and general mistrust of social networks.  The result is peer-to-peer social networking.  No Facebook needed.  Everything sits on your mobile device.  More private, more secure, total user control and no ads.  Facebook may lead the way, but it will be hard to do as they would cannibalize their own ad-driven revenue model.

This was last year’s epiphany.

What led to the new epiphany was my pontificating on CRM systems.  This was a recruiter-centric talk about the future of recruiting.  Many recruiter CRMs have connections to LinkedIn profiles.   Every one of these, that I have seen, has been implemented incorrectly, not due to any fault of the vendors.  In an optimal situation, the data inside the Profile should be mashed up with current CRM data.  Instead, LinkedIn requires usage of their API which brings back a canned LinkedIn profile. This is what I call “social linkage”.

The optimal situation would be a pair of “social agents”.  While a company may have 1000 prospects in its CRM, it may only contact 50 in a given day. One “social agent” would automatically refresh the entire CRM on a longer cycle such as once per quarter.  Another just-in-time social agent would update the CRM just before the outreach process.  Why is this important?  LinkedIn is not a definitive data source; nothing is.  What happens when you combine Facebook, Google+, Jigsaw, Foursquare, Twitter and whatever social network Microsoft comes up with?  Are you going to clutter your Salesforce or Microsoft Dynamics interface with 6-8 little snippets, many with redundant information?   This gets ugly fast.  The optimal implementation is to have a social agent retrieve LinkedIn, Google+, Facebook and Twitter information, then mash, score and apply analytics to present the information in a way that optimally fits your selling model.
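As a rough sketch of what the "mash" step could look like, the code below merges records for one prospect from several sources, letting the most recently updated non-empty value win for each field, and flags records due for the quarterly refresh. The record layout, the function names `merge_profiles` and `needs_refresh`, and the 90-day cycle are all assumptions made for this example, not any vendor's API.

```python
from datetime import date

def merge_profiles(records):
    """Merge per-source records into one profile.
    Each record is a dict with 'source', 'updated' (date) and 'fields' (dict).
    For every field, the newest non-empty value wins and its source is kept."""
    merged = {}
    for record in sorted(records, key=lambda r: r["updated"]):
        for key, value in record["fields"].items():
            if value:  # skip empty values so they never overwrite real data
                merged[key] = {"value": value, "source": record["source"]}
    return merged

def needs_refresh(last_refreshed, today, max_age_days=90):
    """True when a record is older than the long refresh cycle."""
    return (today - last_refreshed).days > max_age_days

records = [
    {"source": "crm", "updated": date(2011, 1, 10),
     "fields": {"title": "Engineer", "email": "pat@example.com"}},
    {"source": "linkedin", "updated": date(2012, 2, 1),
     "fields": {"title": "Director", "email": ""}},
]
profile = merge_profiles(records)
print(profile["title"], profile["email"])
```

The just-in-time agent would run the same merge for the 50 prospects about to be contacted, while the slower agent uses something like `needs_refresh` to sweep the rest.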


iThink:  “Thought to Text” Technology

I have been struggling, for years, with getting speech-to-text working in a usable way.  About every two years over the last ten, I go out to a store, excited, and buy the most recent voice recognition software. I’m always hoping for a breakthrough.  I’ve tried various versions of ViaVoice and Dragon Dictate. Dictation programs typically require you to read a few paragraphs to train them to your voice.  Other than that, they are fairly easy to use. For the PC and Mac, voice recognition programs have reached an acceptable level of usability.  Today, I can talk and dictate to my computer much faster than I can type.  In fact, this article is being dictated to my Mac on a plane ride from San Francisco to Minneapolis.  It seemed appropriate.

Social graces.

The guy next to me on the plane looks annoyed.  I am using a wired microphone to dictate this story.  Ok, now he is smiling.  He is enjoying the irony.  However, every time I say “period” to end a sentence the woman on the other side of me looks at me…

(Completing this article by typing)

“With Contempt”  is what I was going to say, so I decided to start using the keyboard.  Conservative, tightly wound person that probably thinks iPods are an evil plot.

The reality:  voice recognition, even if it works, does not fit into the social construct of the plane ride, the coffee shop, the bus, the subway, in fact any mass transit system.  It is a solitary endeavor OR completely annoying.  Want to try a fun experiment?  As voice becomes more commonplace in interfacing with devices…when you see someone talking out loud to command their phone, try this:   In a loud voice say “A B C 1 2 3” and make sure it is loud enough for the mic on their phone to hear.  It works and totally screws up the voice recognition. Fun.

Vlingo on my iPhone

I use Vlingo on my iPhone, but I have to admit that it does not work for me.  I am not a voice recognition expert, but I know that the processing power of mobile phones is not at the level of my new Core i7 MacBook (yeah.. it screams!)   In Vlingo’s defense, if you articulate well and speak slowly, it is good for sending text messages and single line emails in a quiet environment.  But it is not as good as a laptop or desktop, and that is frustrating.

Don’t drink the Software-As-A-Service Kool Aid

Prediction:  Cloud-based service for Mobile Voice Recognition is a bad direction.  Even in a connected world, there are many places where you do not have either a cell or network connection.  Does it work today?  Sometimes.  However, when voice recognition really works for mobile, it will have to be native and a core function with 100% availability.  SaaS cannot offer that.  I am really surprised that Steve Jobs added the voice command into the iPhone  (Not a SaaS implementation, so they got that piece right). They usually don’t ship stuff that works 50% of the time.  Apple should have tested it in my Jeep.  If you follow the laws of computing, in about 10 years, mobile devices will be able to process voice as well as a desktop/laptop of today.  This will be a convergence of technologies, just like the evolution from the iPhone to the iPad.  Voice commands will make more sense on the mobile device, just like some applications make sense on the iPad vs. the iPhone due to its larger form factor.

What does this mean for technology affecting culture?

It can go in four directions.  First, think:  Do you remember the first time someone had a cell phone conversation, close to you, in a confined space?  How about the first time someone sat in the stall next to you and had a loud cell phone conversation?  How will you react when you are in close quarters and people are talking to their phones, dictating an email, text or tweet?

Direction 1: It will isolate people.  It will be socially unacceptable, therefore people will withdraw to a quieter location to talk to their phones.

Direction 2: The older folks like me will lose the social acceptability battle.  The younger generation will be texting, emailing, Tweeting and “voicing” and if we don’t like it, we can put on noise-canceling headphones.

Direction 3: The advent of sub-audible microphones.  Arthur C. Clarke, one of the true thought leaders, talked about this in several of his books.  Basically, imagine a tooth implant that could pick up “throat noises”.  These sounds would not be heard external to a person’s body.  The sub-audible microphone would pick up the sounds and transmit them to a mobile device.  Everyone could happily be talking, Tweeting, emailing, or Voicing on the same subway car without ever bothering a soul.   Technology done right is elegant.   Early versions of the microphone could be placed on the neck.  No need to go running off to the Dentist just yet.  Direction 3 is my prediction.

Direction 4: Something no one has thought of yet.


If everyone is using sub-audible microphones,  there will be privacy issues.  If someone whispers and you hear it, are you invading their privacy?  Ethically yes, but legally, no.   There will be devices that unscrupulous people will employ to invade privacy.   There will be outrage, backlash and then an attempt at regulation.  While interfacing with mobile devices via sub-audible,  I predict that people will develop their own private vocabulary, like a password or macro to communicate securely with their devices.

Stepping stones to a strange new world

1. Seamless Voice recognition, native to mobile devices  (8-10 years)
2. Sub Audible microphones  (technology is here now)
3. Social acceptance          (who knows)

I started this whole article as a mind-walk towards the concept of “think to text”.  With sub audible mic’s, you can sometimes tell if someone is talking, because they will move their lips out of habit.  The younger generation that grows up on it will not.

So what about “think to text”?  I have no idea.  The headsets that are supposed to measure brainwaves sell for about $200 and are pure crap. Don’t waste your money, they don’t work.  Based on the state of technology, a system that could recognize words you think is way off.

Waiting in line for the  iThink

On the other hand, I will be the guy in line, every 2 years, excited, buying a new iThink…and hoping.

Note:  The man next to me on the plane ride was a software engineer.  We had a great conversation.  Bill:  great to meet you!  It helped to talk through the scenarios.  The woman next to us was a 4 term politician.  Thought we were crazy, thinks ideas are dangerous, wanted to make everything Bill and I were “discovering” as an exercise in thought…illegal. Basic book burner.  What a contrast!

Disruption by convergence. Here comes the iPad

Sometimes a totally new technology emerges and disrupts existing markets. This is where the mind naturally goes when talking about a disruption. Sometimes a disruptive technology succeeds and sometimes it does not. If you look carefully, the ones that succeeded were more a factor of convergence of multiple factors, rather than a single breakthrough.

I have a message to those naysayers on the iPad: You have no imagination, you have no vision, or you have an agenda.

Before I pontificate about the iPad, I’ll share a personal story: Last year I bought a Jeep.

I always wanted one with a top I could take off in the warmer months. Fun to park it anywhere half up a plowed-snow embankment; a side effect of Wisconsin winters. The one I got was bare-bones. It even has manual crank windows. No extras. Here is the kicker…it came with a free year of Satellite radio. Having never used Sirius or XM, it was a great experience. The fact that there is a 24×7 Springsteen-only station still boggles my mind. I love The Boss, but I was glad to have many channels to pick from. Soon I found that one of my favorite morning stations was the BBC. It is nice to get away from the ultra left and right of American talk radio. From the beginning of my subscription, I pondered if I would renew when my free year was up. I got attached to the BBC and a few other stations; It became habit.

My year of Sirius/XM just expired. I did not renew, but I did review my portfolio to make sure that Sirius/XM stocks were not present.

Admittedly, I am behind the cutting edge as it relates to Internet Radio (IR). I use Pandora occasionally, but IR is not part of my routine. It is not yet a habit. Satellite radio was habit, but I did not want to pay $12.95 per month. Why the discord with paying for an enjoyable service? I was raised on FREE radio. It just seems wrong to have to pay for radio.

Enter the Apple AppStore. I fired up my iPhone and found a whole list of Internet Radio applications. The average Internet Radio app supported about 30,000 radio stations from around the world. After trying a few Apps for $.99 to $1.99 I found one that I liked called TuneIn Radio. Yes, it supports the BBC, all the stations I wanted… and many, many others.

Naturally I asked myself the question: Why does anyone pay for Satellite radio when they can just use Internet Radio? If it is Howard Stern, I am simply disconnected from the mainstream mod. If that is the case, stop reading. Is Howard Stern on Internet Radio? I’m not going to take the time to check.

The answer was not Howard. The answer is… that it is not super seamless AND simple. To make this work in my Jeep I would have to have my iPhone, charger, output cord and maybe a bracket to prop it up on my dash. In addition, the iPhone has a tiny little screen that is hard to interact with while I am driving. Maybe even hazardous. Most teens could hook this up easily. I’m a techie, so it is easy for me, but most people simply won’t do it. Additionally, most cars today don’t have an external input jack. That is changing, but it is not the norm.

What would make it super seamless AND simple?

(1) Improved connectivity. Bluetooth input for car radio. No cords, unless you want to charge it. Everyone understands charging.

(2) Simple to use application: Done. 100’s of them on the AppStore

(3) Larger screen. Interface will be easier to work with while driving.

(4) Really good mounting. Not hard to do. I have one for my iPhone

As I was thinking about making my new iPhone powered Internet radio

Questions to ask:

Q: What are the convergence factors in moving from the iPhone/iPod to the iPad?

A: Bigger screen, faster processor, more memory, 140,000 applications

Q: Based on convergence factors, what did NOT work well on the iPhone, but will work on the iPad?

A: GPS applications: The screen was too small. Browser: Unless you have to, no one wants to read a webpage on a tiny little screen.

There is a whole set of possibilities that involve convergence on the iPad. For the naysayers: yes, right now the iPhone and Internet Radio is not a perfect replacement for Sirius/XM. On my drive to work, I get reception for about 95% of the trip. Taking into account that I have spotty AT&T coverage and I live out near cow fields, I’m excited with 95% reception. Again, for the naysayers: guess what… connectivity is going to improve. Unless satellite radio does something stunning, it is going to be disrupted.

iPhone in my Jeep, Playing the BBC

Getting social network apathy?

It is hard for me to look at a “what are you thinking” box in a social network and not write something.  A few minutes ago, I got sucked in to Google Buzz.  This is where this post came from and what inspired it.  I’m convinced I’m right.

I’m getting social network apathy.  How many can I join, watch, etc? I have to guess that I am a few years ahead of the average person in terms of connectivity.  What will happen when the thick middle says “enough is enough”?  There will be a backlash.  There will be/must be business models that will give control back to people.  My gut tells me that the solution has to do with preference…and carrying that preference with you. All social networks (which is a very limited term) will have to interact with you based on preferences that you set in ONE PLACE.  Basically, a single-point-of-truth for how you require the world to interact with you.  I don’t want to set my preferences on MySpace, Facebook, LinkedIn, etc.  Social networks, airport signs, your car, all should obey your rules.  I can see a flood of preference requests coming that I don’t want to answer.  Multiply the social networks of today by 50; this is what we are in for.  Instead of social networks, I’ll group them all together as “intelligent systems”.

With this being said, one of my new side projects is defining single-point-of-truth preferences.

Some axioms that I am playing with

“You own and control your preferences.”
“If an intelligent system does not obey your preferences, it gets cut off.”

I believe that if I develop axioms first, the rest will be natural.  If anyone wants to brainstorm with me on this one, I’m game.
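To show how the two axioms might translate into something runnable, here is a minimal sketch. The class name `PreferenceStore`, the channel names, and the deny-by-default rule are all assumptions of this example rather than a finished design.

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceStore:
    """One place where a person sets how intelligent systems may interact with them."""
    rules: dict = field(default_factory=dict)   # channel name -> allowed (bool)
    blocked: set = field(default_factory=set)   # systems that have been cut off

    def allows(self, channel: str) -> bool:
        # Assumption: anything not explicitly allowed is denied.
        return self.rules.get(channel, False)

    def request(self, system: str, channel: str) -> bool:
        """A system asks whether it may use a channel. Blocked systems always get no."""
        if system in self.blocked:
            return False
        return self.allows(channel)

    def report_violation(self, system: str) -> None:
        """Second axiom: a system that ignores the preferences gets cut off."""
        self.blocked.add(system)

store = PreferenceStore(rules={"friend_updates": True, "ads": False})
print(store.request("network_a", "friend_updates"))
store.report_violation("network_a")
print(store.request("network_a", "friend_updates"))
```

The owner is the only one who writes to `rules`, which covers the first axiom; every system only gets to ask.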
