Here is an experiment to help develop taxonomies of keywords.
The first engine is http://keywords.broadlook.com. It works from the keywords and meta tags within websites and returns two lists. The first is a list of keyword extensions of your input. Example: if you type in the word “research”, the engine returns the top results from thousands of phrases that start with “research”, like “research and development”, “research papers” and “research triangle”. The second is a list of keywords found in closest proximity to your input term.
The second engine is http://keywords2.broadlook.com. It does the same as engine #1, except that it works with the body text of HTML pages rather than the keywords and meta tags.
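To make the idea of “keyword extensions” concrete, here is a minimal sketch in Python. It is not how the Broadlook engines are implemented; the function name phrase_extensions, the sample documents and the simple word-splitting rule are all assumptions made for illustration. It counts every phrase of two or three words that starts with the input term and returns the most frequent ones.

```python
from collections import Counter
import re


def phrase_extensions(documents, term, max_words=3, top_n=5):
    """Count phrases of 2..max_words words that start with `term`.

    Hypothetical helper for illustration only; a real engine would
    work from a prebuilt index rather than rescanning raw text.
    """
    term = term.lower()
    counts = Counter()
    for text in documents:
        # Naive tokenizer (an assumption): lowercase letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i, word in enumerate(words):
            if word != term:
                continue
            # Record each phrase that begins at this occurrence of the term.
            for n in range(2, max_words + 1):
                if i + n <= len(words):
                    counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(top_n)


# Made-up sample text standing in for keyword/meta tag or page body content.
sample_docs = [
    "research and development budgets grew",
    "students buy research papers online",
    "research and development in research triangle park",
]

for phrase, count in phrase_extensions(sample_docs, "research", top_n=10):
    print(count, phrase)
```

The same function could be pointed at either source of text: keyword and meta tag content for something like engine #1, or page body text for something like engine #2.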
Building an ultra-fast proximity engine with entity recognition could yield some interesting results.
Examples:
-Who are the top 10 people on the web who are mentioned in closest proximity to Bill Gates or Barack Obama? How does that compare to the previous month?
-What are the top 10 companies mentioned near Broadlook or Salesforce.com?
-Who are the top 10 people mentioned in conjunction with an event, company or date?
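Questions like these could be answered, in a very simplified form, along the following lines. This is only a sketch under stated assumptions: real entity recognition is replaced by a hand-supplied list of names (known_entities), “proximity” is taken to mean “within a fixed number of words”, and the function names and sample sentences are invented for the example.

```python
from collections import Counter
import re


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def find_positions(tokens, entity_tokens):
    """Return every index where entity_tokens occurs as a run inside tokens."""
    n = len(entity_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == entity_tokens]


def nearby_entities(documents, target, known_entities, window=10, top_n=10):
    """Count mentions of known entities within `window` words of `target`.

    Hypothetical helper: the entity list stands in for a real
    entity recognition step, which is out of scope here.
    """
    counts = Counter()
    target_tokens = tokenize(target)
    for text in documents:
        tokens = tokenize(text)
        target_positions = find_positions(tokens, target_tokens)
        if not target_positions:
            continue
        for entity in known_entities:
            if entity.lower() == target.lower():
                continue
            for pos in find_positions(tokens, tokenize(entity)):
                # Count the mention if it is close to any mention of the target.
                if any(abs(pos - t) <= window for t in target_positions):
                    counts[entity] += 1
    return counts.most_common(top_n)


# Made-up sentences and names, used purely as sample input.
sample_pages = [
    "Bill Gates and Steve Ballmer spoke at the event.",
    "Barack Obama met Bill Gates; Warren Buffett joined later.",
    "Steve Ballmer praised Bill Gates.",
]
people = ["Bill Gates", "Steve Ballmer", "Barack Obama", "Warren Buffett"]

for name, count in nearby_entities(sample_pages, "Bill Gates", people, window=5):
    print(count, name)
```

Running the same function over pages collected in two different months and comparing the two result lists would give a rough version of the month-over-month comparison.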
The combinations are endless. What applications can be developed from this type of information? I can think of many in the research and analytics space. The two main components are entity recognition and proximity indexing. For questions about the engines, email donato dot diorio at gmail com with the subject “keyword engine”.
Please keep in mind that this is pure research, and we are not yet sure ourselves what the core technology will be used for.
Enjoy!