Data Mining and Doctor Who

What do data mining and Doctor Who have in common?


Start with data mining. Some companies sell different versions of a database or subscription service, based on your location. They determine your location based on your computer’s IP address. For example, if you live in the UK and you want full access to the UK version of the database, then you pay full price, but if you want to add other locations, you can get them at a fraction of the cost. In other cases, due to licensing restrictions, if you have an IP address outside a specific country, you simply can’t access the website.

It is the same for Doctor Who.  If you live inside the UK, you can watch Doctor Who live and streaming via the BBC.  However, if you try to access from the United States, you get this message.

BBC message

Basically, you can’t view it without having an IP address in the UK.

But you really want to access a particular website and crunch all that data… or you really want to watch Doctor Who live, the day it comes out. What do you do?

Simple:  Sign up for a VPN service.

Not all VPN options are the same. Some are point-to-point. For example, from your laptop you can securely connect to your company’s office router. This is what most telecommuters are used to. This is not the type of VPN I am talking about. The VPN I am talking about is a location-selectable VPN. How it works: I buy access to the service. Next, I can browse the Internet securely, appearing as if I am in San Francisco, Dallas, Amsterdam or London (London is in the UK).

So go out and get your private VPN.  It will take a few minutes to install the VPN client on your Mac or PC.   Next,  data mine or watch Doctor Who to your heart’s content.

Some VPN services:

http://witopia.net

https://www.privateinternetaccess.com/

http://www.ipvanish.com/

https://privacy.io/

The 12th Doctor’s first episode premieres live on the BBC Saturday August 23rd at 7:50 GMT.

 

Deep Dive into hidden and protected pages – new sourcing technique “cache windowing”

I’ve tried this and it works every time.  How long it will work, who knows.

If you google a specific page and then search the page again… but this time using the last few words from the excerpt of the page results… you can use this technique to actually scroll through an entire web page.

It’s like peering through a small cache window, one section of the page at a time.

It works on resume sites, LinkedIn, and many others. Maybe call it a recursion search?
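The windowing step described above can be sketched in a few lines of Python. This is only an illustration: the function name `next_window_query`, the use of the `site:` operator, and the choice of five trailing words are assumptions, not part of the original technique description.

```python
def next_window_query(site: str, snippet: str, n_words: int = 5) -> str:
    """Build the follow-up query from the last few words of a result excerpt."""
    # Excerpts often end with an ellipsis; strip it before splitting into words.
    words = snippet.replace("…", " ").replace("...", " ").split()
    tail = " ".join(words[-n_words:])
    # Quote the tail so it is searched as an exact phrase on the same site.
    return f'site:{site} "{tail}"'


# Hypothetical excerpt copied from a search result page.
query = next_window_query(
    "example.com",
    "Senior engineer with ten years of experience in embedded systems and ...",
)
print(query)
```

Each new excerpt that comes back becomes the input for the next call, which is what moves the "window" down the page.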

 

What you know about Internet Resume searching is wrong

Do you use search engines to look for resumes on the Internet?   Do you use exclusions such as “-jobs”  or  “-submit”?

If you do, stop it. Read on and I’ll tell you why.

First a story about Easter hams.

To understand what I am going to say about searching for resumes, you will need to be in the right frame of mind. Here we go…

A little girl was closely watching her mother prepare the Easter Ham. She was five years old, a great age for asking questions about the world. She watched her mother prepare the glaze, preheat the oven and bring out the large roasting pan. In an automatic fashion, her mother took a large knife and sliced off 2 full inches of meat from each end of the ham.

The little girl,  Sarah, smiled as a question came to mind.

“Mommy, why do you cut the ends off the ham?”  she asked.

As if startled, the mother replied unconvincingly, “I don’t know, Sarah, my mom always did it. Maybe it is so the glaze gets inside.”

Not being satisfied with the answer, Sarah tracked down Grandma.

“Grandma,” she asked. “I just saw mommy cut off the two ends of the Easter Ham. She said that she learned it from you. Why did you make the Easter Ham that way?”

Grandma answered.  “That is a good question, Sarah, but I learned it from my mom, your great grandma.  I always thought that it was so the Ham cooked faster.”

Again unsatisfied, Sarah tracked down Great Grandma, the family matriarch.

“Great Grandma,” she asked as she crawled up on her lap. “Mommy cut the ends off the Easter Ham. She thought it was so the glaze flavor got into the ham. She did this because Grandma did it. Grandma thought it was so the Ham would cook faster. Grandma learned it from you.”

With anticipation, Sarah asked her Great Grandma, “Great Grandma, why did you cut the ends off the Easter Ham?”

Great Grandma, as wise as she was old, chuckled and answered. “Sarah, when I married your great grandfather, the roasting pan we got for our wedding was too small for an Easter Ham.”

“We cut the ends off the Ham so it would fit in the pan”.

Such is the progression of knowledge. There is no fault when we inherit a practical idea that worked in the past, yet is anachronistic. In the case of the Easter Ham, a practical, real world solution should have lived and died within a single generation, a single iteration. However, it continued until one with a child’s mind, a questioning mind, wanted to know why. When she was not satisfied with the answer, she went on a journey of discovery.

Looking at resume search with a “Beginner’s Mind”

In the past 2 years I’ve taken a bit of a journey in questioning how people use search engines to search the Internet.

Observation:  Top Internet searchers, myself included, had an innate set of beliefs that they held.  These observations eventually evolved into The 8 Laws of Internet Search,  which are a set of axioms for searching the Internet.

At this point I want to make a disclaimer:  I am really, really good at finding things on the Internet. This is not due to any formal training, nor did I have the advantage of a teacher or mentor.  I am self-taught.  I have literally been immersed in searching the Internet for the last 15 years.

Second disclaimer: I do not include myself as one of the search-string gurus out there. To be a search string guru, you need to be current, know the latest websites that are out there, as well as the latest capabilities of each of the search engines; you need to be immersed in the searching. My immersion is in the underlying rules.

I recently had a conversation with a search string guru. We agreed that the best analogy was that I design the aircraft and the search string gurus are the pilots. Works for me.

So what about resumes and searching the Internet?

If I had attempted to research the state of resume search without a basis or set of axioms to work from, I would not have known where to start. Fortunately, I decided to use the 8 Laws of Internet Search as a starting point, with a special emphasis on the first 3.

So the question I decided to ask myself is: How do the commonly taught practices of resume search stack up to the Laws of Internet Search?  This was a definable goal.   Caveat: My focus is “Open web” resume searches and not searches within a controlled environment like Monster.com or CareerBuilder.com.

The Law of Environment. Trainers do an excellent job talking about the various search engines, their capabilities and limitations.

Industry score on the Law of Environment: A+

In taking The Law of Permutation into consideration, I found 2 areas that were very different.

1.  Boolean search methods

Sub-score:  B.   Trainers are clear on the concepts that you must search using multiple permutations such as “VP of Sales”, “Vice President of Sales”,  “VP Sales”, etc.  However, the reality is that you may need 15-20 title combinations to reach all possible results.

2.  Semantic search methods

Sub-score: C. A good deal of misinformation is being spread about semantic search. Some of this stems from irresponsible vendors that are trying to make a buck. It would not be a big deal if trainers actually tested, scientifically, what they started teaching. The funny thing is that the value proposition is significant with semantic search. Say what it can (and cannot) do and those vendors will have happy customers with proper expectations. I shouldn’t be too harsh here; in the early days, I believed the software from Broadlook was meant for everyone. It is not. Setting clear expectations of technology capabilities is the mark of a mature vendor.

Semantic search is great when you have a type of resume that is well identified and the rules have been built. However, throw it a niche area that has not been cataloged and it will fall flat. Advice: If you are looking for a commodity position like a .NET programmer, semantic search can work marvels. If you are working in a niche area, pick a semantic search engine that can be trained by inputting sample resume data. In the latter case, you may have to do the leg work with good old Boolean search first. Also, ask your semantic search vendor if they use exclusions when they mine search engines. If they do, twist their arm until they stop. It’s an old Easter Ham.

Industry score on the Law of Permutation: C+
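The Boolean side of the Law of Permutation (many title variants for one role) lends itself to a small script. The sketch below is an assumption about how one might generate the variants; the function names and the three-part split into rank, connector and function are illustrative, not a method from the study.

```python
from itertools import product


def title_permutations(ranks, connectors, functions):
    """Combine title fragments into every distinct variant, preserving order."""
    variants = []
    for rank, connector, function in product(ranks, connectors, functions):
        # An empty connector yields forms like "VP Sales".
        title = " ".join(part for part in (rank, connector, function) if part)
        if title not in variants:
            variants.append(title)
    return variants


def or_query(variants):
    """Join quoted variants into one Boolean OR group."""
    return "(" + " OR ".join(f'"{v}"' for v in variants) + ")"


titles = title_permutations(
    ranks=["VP", "Vice President"],
    connectors=["of", ""],
    functions=["Sales"],
)
print(or_query(titles))
```

Adding a few more ranks ("V.P.", "SVP") or connectors (",", "-") quickly reaches the 15-20 combinations mentioned above.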


The Law of Completeness. Widely taught methodologies that have not been questioned in years (like the Easter Ham) are yielding approximately 65%. If you get 65% on a math test, that is not a good grade. The first example is not using the full available results from a search string query. If a Google search yields 380 results, the Law of Completeness states that you must work with the entire set of results for maximum yield.

Completeness is not being reached. Why?  When trainers first started teaching how to use search engines (before google),  there were limitations in the technology.  Those limitations were:

(1) No high accuracy method to screen out page results that were NOT a resume.  Therefore search strings needed to be modified to exclude results that were not resumes.

(2) No method to extract all results from a search query. Therefore search strings needed to be modified to reduce results to a manageable quantity.

In both cases, the strategy worked. Unfortunately, there was a side effect: Many good results were also thrown out.

Industry score on the Law of Completeness: D

Dropping the bomb on search string exclusions.

So where is the proof, where is the science?

First, I want to thank Cory Dickenson at Broadlook Technologies for leading the team of researchers on search string exclusion metrics. Looking through tens of thousands of resumes, by hand, and then doing it two more times, is not a fun task. The reality is that someone had to do it. Hopefully when this study is reviewed both recruiters and technology vendors will have a better foundation on which to build. I basically hate inefficiency.

Resume Exclusion Metrics (Broadlook project: FRET, Frikken Resume Exclusion Test)

The study was simple.  What was the effect of using exclusions on a resume search string?

The first thing we did for the study was to mine a bunch of social networks and sites that had advice on resume search strings. We wanted examples, over the past 10 years, that experts were using. From a few hundred examples, we made a list of all the popular resume search string exclusions that were being used (e.g. -job -jobs -you -your -submit).

Creating the resume data set

To set up the study, we created search strings for about 50 job positions. The positions covered a wide range: IT, biotechnology, health care, sales, business development, financial, etc. Next, for each search, we made sure that the search string was specific enough that the number of results from the search engine was <1000. We did not use any exclusions. Last step: Hand verification of every single search engine result. Each result was classified in one of 4 categories: (1) Resume (2) Resume sample page (3) Resume book page (4) Junk: Not a resume.

At this point, we could bring automation into the equation. Using Broadlook’s Eclipse tool, we automated each of the 50 searches with one of the exclusion terms. We then repeated each of the 50 searches with each of the remaining exclusion terms. Since we had already hand-identified which search engine result pages were resumes, we were able to calculate, for each search-exclusion combination, how many REAL resumes were skipped by using each exclusion term. When the searching was done, we had average percentages, across many industries and titles. We know, with high precision, what percentage of resumes you will lose by using an exclusion term.
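The per-exclusion calculation described above can be sketched as a set difference. This is not Broadlook’s code; the function name, the URL-like identifiers, and the tiny sample data are all made up for illustration.

```python
from statistics import mean


def percent_missed(verified_resumes: set, results_with_exclusion: set) -> float:
    """Percentage of hand-verified resumes absent once an exclusion is applied."""
    if not verified_resumes:
        raise ValueError("need at least one hand-verified resume")
    missed = verified_resumes - results_with_exclusion
    return 100.0 * len(missed) / len(verified_resumes)


# Hypothetical data: one (verified, results-with-exclusion) pair per search.
runs = [
    ({"r1", "r2", "r3", "r4"}, {"r1", "r3", "junk1"}),  # two real resumes lost
    ({"a", "b"}, {"a", "b"}),                            # nothing lost
]
per_search = [percent_missed(verified, kept) for verified, kept in runs]
average = mean(per_search)
print(per_search, average)
```

Averaging the per-search figures across all positions gives one number per exclusion term, which is the shape of the metrics table later in this article.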

Why did I do this study?  Too much time on my hands?..no.  I was interested in making the best open web resume search tool possible.  To accomplish that goal, the tool needs to work within the framework of the Laws of Internet Search.  Specifically the first 3:  Environment, Permutation, Completeness.  The end result was Broadlook Diver 3.0.  The resume search part of the tool *automatically* screens out pages that are not resumes.  In addition, since it is an automation tool, it allows the user to work with complete results from a search engine.   While you can only get Diver from Broadlook, the Resume Exclusion Metrics are free to all.  Enjoy.

The Axioms of Internet Resume Search

1.  Seek <1000 results per search.

You should conduct your search with enough specificity that the search engine reports that there are fewer than 1000 results. If you are doing a search that yields many thousands, break up the search into a few separate searches.

2.  Never use single-phrase exclusions

Otherwise you will miss a good percentage of resumes.  It is reasonable to use multi-word exclusions, as the level of ambiguity is low.

3.  Use multiple search engines.

There are varying reports of the cross over being as low as 20%.   (Happy to get comments from additional sources on this)

4.  Use automation to screen out non-resumes

Don’t do it by hand and don’t ignore the data below and use exclusions.  This is not 1998 anymore.  Let automation technology screen out Search Engine Result Pages (SERPS) that are not resumes. This includes sample resume pages, job pages, etc.

And now for the Exclusion metrics.

From a pool of about 50 job descriptions, 100+ searches, 75,000 search engine results, 28,200 resumes, hand verified. The sort order is based on the worst offending term. These exclusion terms were pulled from top experts’ answers on forums about resume search. Remember the Easter Ham: it is not my intention to reduce the tremendous contribution of those people that freely answer questions (every day) about internet resume search. It is my intention to give more data so that the entire industry has more facts with which to work.

Exclusion % REAL Resumes Missed
-job 49.78%
-jobs 40.89%
-summary 37.33%
-intext:resumes 34.37%
-about 34.07%
-writing 32.74%
-your 29.19%
-you 27.41%
-example 25.78%
-required 25.19%
-require 23.70%
-free 23.26%
-list 19.11%
-“how to” 17.04%
-template 16.15%
-library 14.96%
-intitle:jobs 14.37%
-professor 13.48%
-intitle:job 13.19%
-inurl:aspx 12.74%
-send 12.44%
-write 11.56%
-inurl:php 11.41%
-requirement 10.22%
-apply 9.78%
-intitle:apply 9.78%
-sample 9.78%
-intitle:sample 9.48%
-“resume service” 9.19%
-intitle:career 9.04%
-intitle:example 9.04%
-careers 8.89%
-submit 8.89%
-intitle:examples 8.59%
-intitle:write 8.59%
-intitle:how 8.44%
-intitle:submit 8.44%
-inurl:books 8.44%
-trainings 8.00%
-wizard 7.70%
-samples 7.41%
-inanchor:apply 6.67%
-opening 6.37%
-reply 6.22%
-wanted 6.07%
-applicant 4.89%
-inanchor:sample 4.59%
-inanchor:submit 4.00%
-eoe 3.70%

This resume research project yielded many other interesting facts, such as percentages of doc files vs. pdf, etc.  In the coming weeks, I will be publishing a white paper that breaks down the data in a bunch of categories… after I get back from DisneyWorld!

The future death of social networking

Social networking is going to die.  This article is about how it will happen.

The focus for this article will be business social networking. If you are worried about your Facebook friends and photos and the life sucking that goes on in personal social networks, don’t worry, they will be around for a while. They will be dying a totally different death. That will have to be a future blog posting. Ask me over a beer and I will explain it.

Ask three people to define business social networking and you will get three different answers. Try it. Going even further, I hypothesize that if you ask ten different people about the benefits of business social networking, you will get ten different answers. I was recently inspired by a quote attributed to Steve Jobs about dogma as “Being satisfied with the results of other people’s thinking.” This article will be as dogma free as possible. While I can’t help being influenced by everything that is being written about social networking, I have come up with a few unique conclusions.

1.  LinkedIn is not a social network. Most of my contacts are either in a sales or recruiting role.  In the early days, the premise behind LinkedIn was that you can connect to many people through a chain of trusted referrals. It does not matter what the creators of LinkedIn claim it to be.  LinkedIn was founded on the idea that you can go through a series of trusted connections to network with a target person.  It was a noble idea, however, LinkedIn is now controlled by the mob.  The real question is… how are the majority of people using LinkedIn?   The answer:  Get as many connections as possible, build as big a network as possible.  Next, when you find someone in LinkedIn that you want to connect with, read their background and connect directly.

LinkedIN is a social database.

2.  Social CRM is a buzz word.

The community aspect of SocialCRM is aptly named.  Unfortunately, the average person confuses the community, group and collaboration aspects of SocialCRM with popular social networking sites like LinkedIn. They are different.

SocialCRM is not concisely defined.

When everyone is copying what everyone else is thinking, you get a buzz word.  Fun to report, you don’t need to think too much to find other articles to read, alter and republish.  Read about Social CRM and then write about Social Recruiting. It goes both ways.  But what is Social CRM?  SOCIAL is the base part of the equation.

Unfortunately SocialCRM is being used as a catch-all phrase and it is confusing the consumer. For clarity,  SocialCRM should be broken into 2 distinct terms.  Here is a way to clarify thinking and talking about it.

CollaborationCRM – Denoting the functions within a CRM that allow group collaboration, community connection and project sharing.  Salesforce chatter is a good example.

SocialCRM –  Connectivity to existing social networks like LinkedIn.  This is the definition, when polled,  that most people believe social CRM to be.  (Straw poll yielded 9 out of 10 assuming this definition).

Social Linkage – defined below

The current implementations of Social CRM (as defined above) defeat the purpose of having a CRM. The best implementation of a CRM is when the CRM is self-contained. Art Papas, CEO of Bullhorn, an Applicant Tracking System (recruiter CRM), describes it well: “Our clients live inside Bullhorn”. The best CRM should have everything the users need, inside the CRM.

Example: you click on a LinkedIN link next to a contact record in your CRM.  What happens?  A browser page opens and you are in a separate web page, disconnected technology, outside your CRM.  This is Social Linkage, not social CRM.  Bad process.

If a CRM is implemented correctly, you should not have to leave the CRM to perform important tasks.

Most of what is touted as Social CRM today is simply Social Linkage.  Social CRM sounds better, sounds integrated, but in every case I have seen…it is not.   What is the challenge here?  Until LinkedIN and Facebook and all the other networks allow tighter integrations,  social linkage will be all that we have.   LinkedIN wants you to stay on LinkedIN,  Facebook wants you on Facebook.  Salesforce wants to be able to say they have connection to LinkedIN.

3. Marketing, not sales, is driving “the idea” of Social CRM

If you look at who is pushing the SocialCRM idea, it is marketing. The dream: Having EVERY contact in your CRM mashed up with all social network information. This would be great for marketing and market segmentation, but unnecessary for sales. The reality: Click, click, and more clicks. The current state of SocialCRM is, at best, Social Linkage. The reality does not match the dream. Marketing pushes the dream and leaves sales stuck with the reality.

If you have a question about what sales thinks about “Social CRM” as it relates to social network data, look at the ratings the LinkedIn plugin got on Salesforce CRM. Don’t get me wrong, I am a fan of LinkedIn. Visionary concept, great source of data; however, it is not seamless with CRM. If anything, the combination is anti-social CRM.

Attn marketers: Your focus should be social media, let sales people worry about and define SocialCRM

4.  Social Agents will replace Social CRM. Social CRM/Social Linkage tries to solve the problem of having “an answer” for every contact in your CRM.  Every contact that you can view in your CRM will, if available, have a link to external social network profile(s).  Services like RapLeaf aggregate multiple social network links associated with a specific person.  Due to the sheer volume of information, mashups are not always correct due to the ambiguous nature of contact information.  The end result:  You click on multiple different links in your CRM and open multiple disparate sources of information.  Even when the links are correct you get Another Bad process.

Enter social agents.

The best products are built from dreaming an ultimate scenario, then working backwards to what is possible. If there were no constraints… What is the ultimate potential of Social CRM? Answer: Every CRM contact has real-time social network information from all social networks. This information would not be linked, but mashed up inside the CRM. This is not happening. Why? (1) It is not in the interest of the Social Network (really social database) to make the information free and fully available. (2) The incentive chain of $ is not there.

So if it is a bad idea to pre-populate social network information for every contact in your CRM, what should be done?  On demand, social agents.

The average sales rep engages 10-20 contacts per day.  A real-time, on-demand social agent is fully capable of making a real time extraction of social network information, mashing that information up inside the CRM and presenting it in a usable format for a sales rep.   This is what sales wants.

Conversely, I have seen sales reps presented with a CRM that has Linkage to social networks. While the potential is exciting to the sales reps, and they are fired up about the available information, usage drops off dramatically.

As soon as marketing starts thinking and stops listening to reporters & consultants (who listen to reporters), demand for social agents will proliferate.

5. Social Data comes in 2 distinct flavors

Where someone went to college will never change.  It is a fact, fixed in time.  Where someone currently works is a fluid social data point.   A fixed social data point only needs to be found and stored once in a CRM, whereas fluid data points require social agents to keep them updated.

Fixed and fluid social data points should be treated differently.  Why is this important to understand?  Treating  fluid and fixed data points, with different agents reduces the refresh and load on the technology infrastructure that empowers social agents.  In addition, what can be done with the result of social agents varies based on the information being fixed or fluid.

Last thought. Adding a human-verification element, to cement data accuracy, is realistic on a fixed data point.  Scan once, verify and store forever.
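The fixed versus fluid distinction can be made concrete with a small data structure. The sketch below is an assumption about one possible design; the class name `SocialDataPoint`, its fields, and the 30-day refresh window are hypothetical and chosen only to show why the two kinds of data points place different loads on social agents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SocialDataPoint:
    name: str
    value: str
    fluid: bool            # True if the value can change over time
    fetched_at: datetime   # when an agent last retrieved it

    def needs_refresh(self, now: datetime, max_age: timedelta) -> bool:
        # Fixed points are scanned once and stored; only fluid points go stale.
        return self.fluid and (now - self.fetched_at) > max_age


# Hypothetical contact record with one fixed and one fluid data point.
fetched = datetime(2024, 1, 1)
points = [
    SocialDataPoint("college", "University of Miami", fluid=False, fetched_at=fetched),
    SocialDataPoint("employer", "Example Corp", fluid=True, fetched_at=fetched),
]

now = datetime(2024, 6, 1)
stale = [p.name for p in points if p.needs_refresh(now, timedelta(days=30))]
print(stale)
```

Only the fluid point is queued for another agent run, which is the load reduction the section argues for.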

6.  Social Intuition will evolve from social agents

Once we have on-demand social agents, then what?  Take a mind walk:  We now have a CRM, where, on-demand, or slightly before  (predictive system)  social network information is extracted, parsed and mashed up inside the CRM.  No need to live anywhere but the CRM.  A dream of efficiency.

Now that I have all this information about someone, how do I leverage it? The fact that someone went to the University of Miami (The Hurricanes) is something that would be in a social network profile. Thus, via a social agent, I would have the University of Miami as a data point in the CRM. However, would I know the UM mascot is the Hurricanes? Would I know the score from the football team the night before? Would I know the weather in Miami that day? The answer to all these questions is no.

Enter Social Intuition

Social intuition is a combination of social network data points combined with real-time agents to gather additional talking points.   The prerequisite for performing this type of mash-up is (1) Aggregated & scored data from Social Networks (2) Highly accurate fixed data points (i.e.  Mascots for every college)  and (3) Intelligent agents that leverage, fixed data points with social data points to “intuit” additional information.
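As a rough illustration of the three prerequisites above, the sketch below joins one social data point (a college) with a table of fixed data points (mascots) through pluggable agents. Everything here is hypothetical: the `Agent` type, the function names, and the idea that each agent is a simple callable. Real-time agents for scores or weather would need external data sources that are not shown.

```python
from typing import Callable, Dict, List, Optional

# An agent takes a social data point and may return one talking point.
Agent = Callable[[str], Optional[str]]


def mascot_agent(mascots: Dict[str, str]) -> Agent:
    """Agent backed by a table of fixed data points (college -> mascot)."""
    def lookup(college: str) -> Optional[str]:
        mascot = mascots.get(college)
        return f"Mascot: the {mascot}" if mascot else None
    return lookup


def intuit(contact: Dict[str, str], agents: List[Agent]) -> List[str]:
    """Run every agent on the contact's college and keep the non-empty answers."""
    college = contact.get("college")
    if not college:
        return []
    results = (agent(college) for agent in agents)
    return [r for r in results if r]


agents = [mascot_agent({"University of Miami": "Hurricanes"})]
talking_points = intuit({"college": "University of Miami"}, agents)
print(talking_points)
```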

7.  Company-centric (NOT contact-centric) social mash-ups will prevail

Even with the proliferation of social networks, the average person has just a few, if any, data points about them. Multiply that by the number of people at a company and patterns emerge. Patterns that would not be apparent in the microcosm of one person. The best approach in sales is to engage multiple points of contact (people) at a company at the onset of first contact. This approach is called Sphere of Influence Selling and is well documented in The Sphere of Influence Selling webinar.

Remember:  You talk to people, but the company writes the check.

8.  CRM Socialbases become the ultimate silos

The most valuable list is the list that no one else has.  Think about it.

The most unique set of data is inside your CRM.  Don’t worry about the world,  just about your clients and the companies you want to sell to.  Gather rich data from social networks and other sources and combine it with your CRM.  The future king of all data sets will not be inside social networks.  Companies will mash data from social networks and combine it with conversation history, notes, purchasing habits, etc.

CRM Socialbases will be built on a combination of Fixed and Fluid social data points.

The value of any list can be scored based on data quality & competitive advantage. For example, LinkedIn has great data, but is it exclusive? No. Anyone with a bunch of connections can get to the names of almost everyone.

9.  Things to watch

Bleeding edge: Watson. An IBM supercomputer that will, in the coming months, be competing with top Jeopardy players. In initial testing, it beat average players who had been winners on the Jeopardy TV show. 5 years ago this was not possible. Watson is an answer machine. What happens when you connect an answer machine with your CRM SocialBase?

Hot: Salesforce chatter: I like this technology.  Nothing that can’t be copied.  Expect to see it in every CRM within a few years. Brings another aspect of social into CRM, in terms of work teams and projects.

Fun: Proximity based social networks – Not a primary technology, but something that should be eventually mashed up. FourSquare is a good example.   (Yes, I am the mayor of Broadlook).

Practical: CRM Profiler – The next iteration of the technology is cloud-based, lives inside the CRM, jumps over social linkage and includes social agents.  Build your own social knowledge-base.


10.  Black swans emerging?

Black swan theory: something that changes everything in a space. It denotes an occurrence that no one thought of.

LinkedIn CRM – It makes sense, but would they alienate CRMs that currently mash up with them? It has happened before. In the recruiting space, AIRS, a recruiter add-on tool, created their own applicant tracking system. Guess who integrates with AIRS today? Nothing of importance. Next, AIRS was acquired by an RPO (recruitment process outsourcing) company… how many competing RPOs will continue to use them? The number is declining.

Facebook CRM – That would be real scary, however, a spin-off without the facebook label might fly.  The yo-yo ethics of their privacy policy is comical.  Can’t ignore them.

Salesforce acquisition of LinkedIn:  More likely to be Oracle, SAP, Microsoft or a company that has deep pockets.  Salesforce already acquired Jigsaw.

Scariest combo: Google acquires LinkedIn, creates the Google CRM and makes it free. It actually makes total sense. If Google wants to push ads all day long, while people are at work, this is the way. Gmail is already the best web-based email system. They have Google Docs. They have a mobile platform. All the components are there. If you take a step further and look at the talent they have hired, patterns emerge. Nuff said.

Recap:

Social Network -> Social Database -> Social CRM  ->  Social Linkage -> Social Agents -> CRM SocialBase.

You heard it here first!

The 8th law of Internet Search;  The Law of Environment

Stephen Covey published The 7 Habits of Highly Effective People and it was a great book.

When Dr. Covey came out with a new book, The 8th Habit, I was skeptical. Why didn’t he think up the 8th habit right from the start?

Now I understand it. Ideas evolve. We are the sum total of our experiences at any point in time. You create a set of rules that you believe are universal. In my case, I am the author of The Seven Laws of Internet Search.

The Original Laws …
1. Permutation
2. Completeness
3. Iteration
4. Frequency
5.  Process
6. Taxonomy
7. Measurable Results

It has been about a year and a half and now, guess what?  I came up with another Law of Internet Search.  The 8th law could not have been created by me…unless I was able to observe people learning and implementing the first seven laws in their Internet search activity.

Here is what I observed:  The Internet is “non-homogeneous”.  The idea of homogeneity  also resonated with me as I wrote the original seven laws.  I played with the idea of a Law of Non-homogeneity.  This means that the Internet exists in many different formats and there is no way to query everything, with a single method or game plan.

“Non-Homogeneous” sounds ugly.  To define something with “non” in front of it…it would be like cheating.  Each of the seven laws of Internet Search is meant to be a simple axiom of advice.    I failed to get my concept of Homogeneity into the laws.

Why did I fail?  It is simple.  Each of the seven laws is a solution.  Whereas “non-homogeneous” or “non-homogeneity” was talking about a problem.

What was I trying to get at?  It is also simple.  The Internet is not homogeneous, therefore, many different methods are needed to search it.  It is those very search mechanisms that the 8th Law takes into account.  The 8th law is  The Law of Environment.

In fact, the 8th Law is so important, I have moved it to the top spot in The Laws of Internet Search. It is now The 1st Law of Internet Search.

To understand the Law of Environment, get your mind around the concept of the Internet having many modalities: many sites, each with its own set of rules or search environment.

Next, there are some simple questions to ask. What is the access method? What are the site’s restrictions? Etc.

In addition to the simple questions about the environment, the more advanced Internet searcher may want to dive in to further understand the full capabilities of the search environment.

in depth environment questions

Once the simple questions about the environment are answered, the Internet searcher can proceed with quantifiable expectations of their chosen search medium.

an ordered vision

For example, it is important to understand that Google will only give you a maximum of 1000 results from any search.  Even if Google reports that there are 2450 results, you only have access to the first 1000.  Understanding this is understanding the limitation of the environment.
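
To make that limitation concrete, here is a minimal Python sketch.  The SearchEnvironment class, its fields, and the helper functions are hypothetical names invented for illustration, and the 1000-result cap is simply the figure from the example above, not a number taken from any official documentation.

```python
from dataclasses import dataclass


@dataclass
class SearchEnvironment:
    """Hypothetical record of what we know about one search environment."""
    name: str
    access_method: str  # e.g. "web form", "API", "login required"
    max_results: int    # the most results the environment will actually return


def reachable(env: SearchEnvironment, reported_total: int) -> int:
    """Results you can actually page through, given the reported total."""
    return min(reported_total, env.max_results)


def unreachable(env: SearchEnvironment, reported_total: int) -> int:
    """Results the environment reports but never lets you see."""
    return reported_total - reachable(env, reported_total)


# Assumed cap of 1000, taken from the example in the text.
google = SearchEnvironment("Google", "web form", max_results=1000)
print(reachable(google, 2450))    # 1000
print(unreachable(google, 2450))  # 1450
```

Writing the answers to the environment questions down like this, even roughly, tells you up front how many results are out of reach and that the search has to be split into narrower queries.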

google environment

Here are The Laws of Internet Search, Reloaded:

1.  Environment
2. Permutation
3. Completeness
4. Iteration
5. Frequency
6. Process
7. Taxonomy
8. Measurable Results

Dr. Stephen Covey, now I understand. Looking forward to the ninth law.

Finding the right place for semantic search

Semantic search is a fantastic technology, if used correctly.  I am not talking about users of semantic search technology; I am talking about the technology vendors that make it part of a system.

I was inspired to write this blog after reading Glen Cathey’s (The Boolean Black Belt) article on Why Do So Many ATS Vendors Offer Poor Search Capability.  The article made me think about search engines (Google, Yahoo, etc.) and how semantic search is being used with them.

What is semantic search?  To put it simply: semantic search can take, as input, a word like “Java” and offer up other related terms like “J2EE” or “Beans” (both are related to Java).  This allows the user to type in a few terms but match many, many terms.

The matching terms are built into an “expert system” that is continually built over time.  Many fancy names are given to these systems, based on how they are built, but basically they are sets of rules.

Semantic search is not AI (artificial intelligence).  If you hear that, it probably started in a marketing department somewhere.

Companies that have built semantic search engines, while they have not created AI, have spent a tremendous amount of time and resources to build these sets of rules.  The better engines can build rules on the fly from a new set of data, like resumes.  This is very cool stuff.

Overall, I like semantic search.  It has great potential; however, it has great weaknesses if used incorrectly.   If built into the engine itself, semantic search can be very powerful, because the semantic processing is done on the search engine side, without any limitations or constraints.  However, if bolted onto a search engine, it can do more harm than good.

Here is what I mean.  I’ll try to keep my logic simple.

1. The Google search engine has a limit on how many terms can be submitted to it.

2. Semantic search, by its nature, creates permutations upon given terms. For example:

“Senior VP of Sales”  can be “SVP Sales” or “Senior Vice President of Sales”

Translate that into a Boolean expression and you get

“senior vp of sales” OR “SVP sales” OR “senior vice president of sales”

3.  After creating permutations upon several concepts, you are out of search terms.
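
The three steps above can be sketched in a few lines of Python.  The synonym table, the expand and count_terms helpers, and the TERM_LIMIT value are all assumptions made up for illustration; the real limit, and whether operators count toward it, depends on the search engine.

```python
# Hypothetical synonym table, standing in for a semantic engine's rules.
SYNONYMS = {
    "senior vp of sales": ["SVP sales", "senior vice president of sales"],
    "java": ["J2EE", "beans"],
}

TERM_LIMIT = 12  # assumed limit, for illustration only


def expand(concept: str) -> str:
    """Turn one concept into a quoted Boolean OR group of its permutations."""
    variants = [concept] + SYNONYMS.get(concept.lower(), [])
    return "(" + " OR ".join(f'"{v}"' for v in variants) + ")"


def count_terms(query: str) -> int:
    """Count the words in a query, ignoring OR operators and punctuation."""
    cleaned = query.replace("(", " ").replace(")", " ").replace('"', " ")
    return sum(1 for word in cleaned.split() if word != "OR")


query = " ".join(expand(c) for c in ["senior vp of sales", "java"])
print(query)
print(count_terms(query), "terms; over limit:", count_terms(query) > TERM_LIMIT)
```

Two concepts already produce 14 words here, which is over the assumed limit of 12.  Every additional concept multiplies the problem.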

I’m a big believer in laws (maybe not speed-limit laws), but more the “laws of the universe” type stuff.  I like to understand and deconstruct the rules and see whether each one stands alone, or whether I need to recheck my premises.  In this spirit, just before the first sourceCon conference, I developed the Seven Laws of Internet Research.  I felt there was too much emphasis on memorizing search strings and the latest search engines or sites, but not enough fundamental thought leadership on how to think about searching the Internet.

The first two laws are

1. The Law of Permutation
2. The Law of Completeness

The Law of Permutation simply states that when searching the Internet, as it is not a homogeneous source of data, you must describe what you are looking for in the language of the many vs. the language of the one.  (YES, this is what Semantic search is doing).

The Law of Completeness states that you must strive for completeness of search engine results in order to have a superior outcome.

Big Question:  What happens if semantic search is applied before you reach completeness of results?

Answer:  Missing data. Competitors eat your lunch.  If you are a sales person, it means missed sales leads; if you are a recruiter, it means missed resumes or passive candidates.

Does this mean that I am anti-semantic search?  No way.  I think it has great potential.

Here are my take-aways:

-Semantic search should be inside the search engine for optimal results.

-Semantic search bolted onto a standard search engine is severely limited.

-Semantic search will cause data to be missed if applied before reaching completeness of possible results.

-When combining a standard search engine and semantic search, it is best to apply the semantic processing AFTER completeness of data has been reached.  In reality, this would not be semantic search, but semantic filtering.
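
Here is a minimal Python sketch of that last take-away: collect first, filter second.  Everything in it is invented for illustration.  The fetch_results function is a hypothetical stand-in for whatever standard search engine you are using, and the tiny in-memory document set and related-terms table are made-up sample data.

```python
# Invented sample documents standing in for a search engine's index.
DOCUMENTS = {
    1: "Senior VP of Sales with enterprise software background",
    2: "SVP Sales, ten years in medical devices",
    3: "Java developer with J2EE experience",
    4: "Senior Vice President of Sales, retail",
}

# Hypothetical related-terms table, standing in for a semantic engine's rules.
RELATED = {"senior vp of sales": ["svp sales", "senior vice president of sales"]}


def fetch_results(term: str) -> set:
    """Hypothetical plain search: ids of documents containing the term."""
    return {doc_id for doc_id, text in DOCUMENTS.items()
            if term.lower() in text.lower()}


def collect_complete(terms: list) -> set:
    """Step 1: run broad, simple queries and union the results."""
    found = set()
    for term in terms:
        found |= fetch_results(term)
    return found


def semantic_filter(doc_ids: set, concept: str) -> set:
    """Step 2: apply the semantic rules locally, after the data is collected."""
    variants = [concept] + RELATED.get(concept, [])
    return {doc_id for doc_id in doc_ids
            if any(v in DOCUMENTS[doc_id].lower() for v in variants)}


everything = collect_complete(["sales", "java"])
print(sorted(everything))
print(sorted(semantic_filter(everything, "senior vp of sales")))
```

Because the semantic rules run on your side, after the results are in hand, they are not bound by the search engine's limit on query terms.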
