What you know about Internet Resume searching is wrong

Do you use search engines to look for resumes on the Internet?   Do you use exclusions such as “-jobs”  or  “-submit”?

If you do, stop it. Read on and I’ll tell you why.

First, a story about Easter hams.

To understand what I am going to say about searching for resumes, you will need to be in the right frame of mind. Here we go…

A little girl was closely watching her mother prepare the Easter Ham. She was five years old, a great age for asking questions about the world.  She watched her mother prepare the glaze, preheat the oven and bring out the large roasting pan.   In an automatic fashion, her mother took a large knife and sliced off 2 full inches of meat from each end of the ham.

The little girl,  Sarah, smiled as a question came to mind.

“Mommy, why do you cut the ends off the ham?”  she asked.

As if startled, the mother replied unconvincingly, “I don’t know, Sarah, my mom always did it.  Maybe it is so the glaze gets inside.”

Not being satisfied with the answer, Sarah tracked down Grandma.

“Grandma,” she asked.  “I just saw Mommy cut off the two ends of the Easter Ham.  She said that she learned it from you.  Why did you make the Easter Ham that way?”

Grandma answered.  “That is a good question, Sarah, but I learned it from my mom, your great grandma.  I always thought that it was so the Ham cooked faster.”

Again unsatisfied, Sarah tracked down Great Grandma, the family matriarch.

“Great Grandma,” she asked as she crawled up on her lap.  “Mommy cut the ends off the Easter Ham. She thought it was so the glaze flavor got into the ham.  She did this because Grandma did it.  Grandma thought it was so the Ham would cook faster.  Grandma learned it from you.”

With anticipation, Sarah asked her Great Grandma. “Grandma, why did you cut the ends off the Easter Ham?”

Grandma, wise as she was old, chuckled and answered.  “Sarah, when I married your great grandfather, the roasting pan we got for our wedding was too small for a Christmas Ham.”

“We cut the ends off the Ham so it would fit in the pan”.

Such is the progression of knowledge.  There is no fault when we inherit a practical idea that worked in the past, yet is now anachronistic.  In the case of the Easter Ham, a practical, real-world solution should have lived and died within a single generation, a single iteration.  However, it continued until one with a child’s mind, a questioning mind, wanted to know why.  When she was not satisfied with the answer, she went on a journey of discovery.

Looking at resume search with a “Beginners Mind”

In the past 2 years I’ve taken a bit of a journey in questioning how people use search engines to search the Internet.

Observation:  Top Internet searchers, myself included, had an innate set of beliefs that they held.  These observations eventually evolved into The 8 Laws of Internet Search,  which are a set of axioms for searching the Internet.

At this point I want to make a disclaimer:  I am really, really good at finding things on the Internet. This is not due to any formal training, nor did I have the advantage of a teacher or mentor.  I am self-taught.  I have literally been immersed in searching the Internet for the last 15 years.

Second disclaimer:  I do not include myself as one of the search string gurus out there.  To be a search string guru, you need to be current, know the latest websites that are out there, as well as the latest capabilities of each of the search engines; you need to be immersed in the searching.  My immersion is in the underlying rules.

I recently had a conversation with a search string guru.  We agreed that the best analogy was that I design the aircraft and the search string gurus are the pilots.  Works for me.

So what about resumes and searching the Internet?

If I had attempted to research the state of resume search without a basis or set of axioms to work from, I would not have known where to start.  Fortunately, I decided to use the 8 Laws of Internet Search as a starting point, with a special emphasis on the first 3.

So the question I decided to ask myself is: How do the commonly taught practices of resume search stack up to the Laws of Internet Search?  This was a definable goal.   Caveat: My focus is “Open web” resume searches and not searches within a controlled environment like Monster.com or CareerBuilder.com.

The Law of Environment. Trainers do an excellent job talking about the various search engines, their capabilities and limitations.

Industry score on the Law of Environment: A+

In taking The Law of Permutation into consideration, I found 2 areas that were very different.

1.  Boolean search methods

Sub-score:  B.   Trainers are clear on the concepts that you must search using multiple permutations such as “VP of Sales”, “Vice President of Sales”,  “VP Sales”, etc.  However, the reality is that you may need 15-20 title combinations to reach all possible results.

2.  Semantic search methods

Sub-score: C.  A good deal of misinformation is being spread about semantic search.  Some of this stems from irresponsible vendors that are trying to make a buck.  It would not be a big deal if trainers actually tested, scientifically, what they started teaching.  The funny thing is that the value proposition of semantic search is significant.  Say what it can (and cannot) do, and those vendors will have happy customers with proper expectations.  I shouldn’t be too harsh here; in the early days, I believed the software from Broadlook was meant for everyone.  It is not.  Setting clear expectations of technology capabilities is the mark of a mature vendor.

Semantic search is great when you have a type of resume that is well identified and the rules have been built.  However, throw it a niche area that has not been cataloged and it will fall flat.  Advice:  If you are looking for a commodity position like a .NET programmer, semantic search can work marvels.  If you are working in a niche area, pick a semantic search engine that can be trained by inputting sample resume data.  In the latter case, you may have to do the legwork with good old Boolean search first.  Also, ask your semantic search vendor if they use exclusions when they mine search engines.  If they do, twist their arm until they stop.  It’s an old Easter Ham.

Industry score on the Law of Permutation: C+


The Law of Completeness.    Widely taught methodologies that have not been questioned in years (like the Easter Ham) are yielding approximately 65%.  If you get 65% on a math test, that is not a good grade.   The first example is not using the full available results from a search string query.  If a Google search yields 380 results,  the Law of Completeness states that you must work with the entire set of results for maximum yield.

Completeness is not being reached. Why?  When trainers first started teaching how to use search engines (before Google),  there were limitations in the technology.  Those limitations were:

(1) No high accuracy method to screen out page results that were NOT a resume.  Therefore search strings needed to be modified to exclude results that were not resumes.

(2) No method to extract all results from a search query.  Therefore search strings needed to be modified to reduce results to a manageable quantity.

In both cases, the strategy worked. Unfortunately, there was a side effect:  many good results were also thrown out.

Industry score on the Law of Completeness: D

Dropping the bomb on search string exclusions.

So where is the proof, where is the science?

First, I want to thank Cory Dickenson at Broadlook Technologies for leading the team of researchers on search string exclusion metrics.  Looking through tens of thousands of resumes, by hand, and then doing it two more times, is not a fun task.  The reality is that someone had to do it.  Hopefully, when this study is reviewed, both recruiters and technology vendors will have a better foundation on which to build.  I basically hate inefficiency.

Resume Exclusion Metrics (Broadlook project: FRET, Frikken Resume Exclusion Test)

The study was simple.  What was the effect of using exclusions on a resume search string?

The first thing we did for the study was to mine a bunch of social networks and sites that had advice on resume search strings.  We wanted examples that experts had been using over the past 10 years.  From a few hundred examples, we made a list of all the popular resume search string exclusions that were being used (e.g. -job -jobs -you -your -submit).

Creating the resume data set

To set up the study, we created search strings for about 50 job positions.  The positions covered a wide range: IT, biotechnology, health care, sales, business development, financial, etc.  Next, for each search,  we made sure that the search string was specific enough that the search engine returned <1000 results. We did not use any exclusions.  Last step:  Hand verification of every single search engine result.  Each result was classified in one of 4 categories: (1) Resume (2) Resume sample page  (3) Resume book page (4) Junk: Not a resume.

At this point, we could bring automation into the equation.  Using Broadlook’s Eclipse tool, we automated each of the 50 searches with one of the exclusion terms.  We then repeated each of the 50 searches with each of the remaining exclusion terms.  Since we had already hand-identified which search engine result pages were resumes, we were able to calculate, for each search-exclusion combination, how many REAL resumes were skipped by using each exclusion term. When the searching was done, we had average percentages across many industries and titles.  We know, with high precision, what percentage of resumes you will lose by using an exclusion term.
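The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Broadlook's actual tooling: the function names and the placeholder URLs are hypothetical, and it assumes the per-term figure is a simple average of per-search percentages, which the study description does not spell out.

```python
def percent_resumes_missed(verified_resumes, results_with_exclusion):
    """Percentage of hand-verified resumes absent from the exclusion search results."""
    verified = set(verified_resumes)
    if not verified:
        raise ValueError("need at least one verified resume")
    missed = verified - set(results_with_exclusion)
    return 100.0 * len(missed) / len(verified)


def average_missed(per_search):
    """Average the missed percentage over (verified, with_exclusion) pairs.

    Assumption: one pair per search, all searches weighted equally.
    """
    rates = [percent_resumes_missed(v, r) for v, r in per_search]
    return sum(rates) / len(rates)


# Hypothetical data: placeholder URLs, not results from the study.
baseline = {"a.example/cv1", "b.example/cv2", "c.example/cv3", "d.example/cv4"}
with_minus_job = {"a.example/cv1", "c.example/cv3", "z.example/job-ad"}

print(percent_resumes_missed(baseline, with_minus_job))  # 50.0
```

Only pages that were hand-verified as resumes count toward the denominator, so junk pages that the exclusion happens to remove do not flatter the result.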

Why did I do this study?  Too much time on my hands? No.  I was interested in making the best open web resume search tool possible.  To accomplish that goal, the tool needs to work within the framework of the Laws of Internet Search, specifically the first 3:  Environment, Permutation, Completeness.  The end result was Broadlook Diver 3.0.  The resume search part of the tool *automatically* screens out pages that are not resumes.  In addition, since it is an automation tool, it allows the user to work with complete results from a search engine.   While you can only get Diver from Broadlook, the Resume Exclusion Metrics are free to all.  Enjoy.

The Axioms of Internet Resume Search

1.  Seek <1000 results per search.

You should conduct your search with enough specificity that the search engine reports fewer than 1000 results.  If you are doing a search that yields many thousands, break up the search into a few separate searches.

2.  Never use single-phrase exclusions

Otherwise you will miss a good percentage of resumes.  It is reasonable to use multi-word exclusions, as the level of ambiguity is low.

3.  Use multiple search engines.

Reports vary, but the overlap in results between search engines may be as low as 20%.   (Happy to get comments from additional sources on this.)

4.  Use automation to screen out non-resumes

Don’t do it by hand, and don’t ignore the data below and use exclusions.  This is not 1998 anymore.  Let automation technology screen out Search Engine Result Pages (SERPs) that are not resumes. This includes sample resume pages, job pages, etc.

And now for the Exclusion metrics.

The data comes from a pool of about 50 job descriptions,  100+ searches,  75,000 search engine results and 28,200 hand-verified resumes.  The table is sorted with the worst offending term first.  These exclusion terms were pulled from top experts’ answers on forums about resume search.  Remember the Easter Ham: it is not my intention to diminish the tremendous contribution of those people who freely answer questions (every day) about Internet resume search.  It is my intention to give more data so that the entire industry has more facts to work with.

Exclusion % REAL Resumes Missed
-job 49.78%
-jobs 40.89%
-summary 37.33%
-intext:resumes 34.37%
-about 34.07%
-writing 32.74%
-your 29.19%
-you 27.41%
-example 25.78%
-required 25.19%
-require 23.70%
-free 23.26%
-list 19.11%
-“how to” 17.04%
-template 16.15%
-library 14.96%
-intitle:jobs 14.37%
-professor 13.48%
-intitle:job 13.19%
-inurl:aspx 12.74%
-send 12.44%
-write 11.56%
-inurl:php 11.41%
-requirement 10.22%
-apply 9.78%
-intitle:apply 9.78%
-sample 9.78%
-intitle:sample 9.48%
-“resume service” 9.19%
-intitle:career 9.04%
-intitle:example 9.04%
-careers 8.89%
-submit 8.89%
-intitle:examples 8.59%
-intitle:write 8.59%
-intitle:how 8.44%
-intitle:submit 8.44%
-inurl:books 8.44%
-trainings 8.00%
-wizard 7.70%
-samples 7.41%
-inanchor:apply 6.67%
-opening 6.37%
-reply 6.22%
-wanted 6.07%
-applicant 4.89%
-inanchor:sample 4.59%
-inanchor:submit 4.00%
-eoe 3.70%

This resume research project yielded many other interesting facts, such as percentages of doc files vs. pdf, etc.  In the coming weeks, I will be publishing a white paper that breaks down the data in a bunch of categories… after I get back from DisneyWorld!

Finding the right place for semantic search

Semantic search is a fantastic technology, if used correctly.  I am not talking about users of semantic search technology; I am talking about the technology vendors that make it part of a system.

I was inspired to write this blog after reading Glen Cathey’s (The Boolean Black Belt) article on Why Do So Many ATS Vendors Offer Poor Search Capability.  The article made me think about search engines (Google, Yahoo, etc.) and how semantic search is being used with them.

What is semantic search?  To put it simply: semantic search can take, as input, a word like “Java” and offer up other related terms like “J2EE” or “Beans” (both are related to Java).  This allows the user to type in a few terms but match many, many terms.

The matching terms are built into an “expert system” that is continually built over time.  Many fancy names are given to these systems, based on how they are built, but basically they are sets of rules.

Semantic search is not AI (artificial intelligence).  If you hear that, it probably started in a marketing department somewhere.

Companies that have built semantic search engines, while they have not created AI, have spent a tremendous amount of time and resources to build these sets of rules.  The better engines can build rules on the fly from a new set of data, like resumes.  This is very cool stuff.

Overall, I like semantic search.  It has great potential; however, it has great weaknesses if used incorrectly.   If built into the engine itself, semantic search can be very powerful, because the semantic processing is done on the search engine side, without any limitations or constraints.  However, if bolted onto a search engine, it can do more harm than good.

Here is what I mean.  I’ll try to keep my logic simple.

1. The Google search engine has a limit in how many terms can be submitted to it.

2. Semantic search, by its nature, creates permutations upon given terms. For example:

“Senior VP of Sales”  can be “SVP Sales” or “Senior Vice President of Sales”

Translated into a Boolean expression, you get:

“senior vp of sales” OR “SVP sales” OR “senior vice president of sales”

3.  After creating permutations upon several concepts, you are out of search terms.
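The three steps above can be sketched as a small Python example. It is only an illustration under stated assumptions: the helper names are hypothetical, the `max_words` value of 20 is an arbitrary stand-in rather than Google's actual limit, and the way words are counted (every word in every variant, OR operators ignored) is a guess at how an engine might budget terms.

```python
def or_group(variants):
    """Join phrase variants into one parenthesized Boolean OR group."""
    return "(" + " OR ".join(f'"{v}"' for v in variants) + ")"


def count_words(variants):
    """Count the individual words across all variants (OR operators not counted)."""
    return sum(len(v.split()) for v in variants)


def build_query(concepts, max_words):
    """Add concept groups in order, skipping any group that would exceed max_words."""
    groups, used, dropped = [], 0, []
    for name, variants in concepts.items():
        cost = count_words(variants)
        if used + cost > max_words:
            dropped.append(name)  # this concept no longer fits in the budget
            continue
        groups.append(or_group(variants))
        used += cost
    return " ".join(groups), used, dropped


# Hypothetical permutations; only the "title" variants come from the text above.
concepts = {
    "title": ["senior vp of sales", "SVP sales", "senior vice president of sales"],
    "skill": ["software as a service", "SaaS", "cloud software"],
    "document": ["resume", "curriculum vitae", "CV"],
}

query, used, dropped = build_query(concepts, max_words=20)  # 20 is an assumed limit
print(query)
print(used, dropped)  # 18 ['document']
```

With only two concepts expanded, 18 of the assumed 20 words are spent and the third concept has to be dropped, which is the "out of search terms" problem in miniature.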

I’m a big believer in laws (maybe not speed-limit laws), but more the “laws of the universe” type stuff.  I like to understand and deconstruct the rules and see whether each one stands alone, or whether I need to recheck my premises.  In this spirit, just before the first SourceCon conference, I developed the Seven Laws of Internet Research.  I felt there was too much emphasis on memorizing search strings and the latest search engines or sites, but not enough fundamental thought leadership on how to think about searching the Internet.

The first two laws are

1. The Law of Permutation
2. The Law of Completeness

The Law of Permutation simply states that when searching the Internet, as it is not a homogeneous source of data, you must describe what you are looking for in the language of the many vs. the language of the one.  (YES, this is what Semantic search is doing).

The Law of Completeness states that you must strive for completeness of search engine results in order to have the superior outcome.

Big Question:  What happens if semantic search is applied before you reach completeness of results?

Answer:  Missing data. Competitors eat your lunch.  If you are a salesperson, it means missed sales leads; if you are a recruiter, it means missed resumes or passive candidates.

Does this mean that I am anti-semantic search?  No way.  I think it has great potential.

Here are my take-aways:

-Semantic search should be inside the search engine for optimal results.

-Semantic search bolted onto a standard search engine is severely limited.

-Semantic search will cause data to be missed if applied before reaching completeness of possible results.

-When combining a standard search engine and semantic search, it is best to apply the semantic processing AFTER completeness of data has been reached.  In reality, this would not be semantic search, but semantic filtering.
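A minimal sketch of that last take-away, semantic filtering applied after the complete result set has been collected, might look like the following. The rule set, function name and sample pages are all hypothetical, and the substring matching is a deliberate simplification (it would also match “JavaScript” for “Java”); a real engine would use far richer rules.

```python
# Hypothetical rule set: one concept mapped to its related terms.
RELATED_TERMS = {
    "java": ["java", "j2ee", "beans"],
}


def semantic_filter(pages, concept, related=RELATED_TERMS):
    """Keep pages whose text mentions the concept or any related term."""
    terms = [t.lower() for t in related.get(concept.lower(), [concept])]
    return [p for p in pages if any(t in p["text"].lower() for t in terms)]


# Assume this list is the COMPLETE set of results from a plain Boolean search.
pages = [
    {"url": "a.example/cv1", "text": "Built J2EE services for billing"},
    {"url": "b.example/cv2", "text": "Managed regional sales team"},
    {"url": "c.example/cv3", "text": "Java developer, Spring and Beans"},
]

matches = semantic_filter(pages, "Java")
print([p["url"] for p in matches])  # ['a.example/cv1', 'c.example/cv3']
```

Because the filtering runs locally on results already retrieved, the related-term list can be as long as needed without consuming any of the search engine's term budget.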