Beyond the keyword search breaking point


By: Penny Herscher

A gauntlet was thrown down in the recent TechCrunch post, “Is Keyword Search About to Hit Its Breaking Point?” With a provocative up-and-to-the-right curve of Web generations, the article claimed that the semantic web is the solution to the limitations of keyword search, and that it’s coming in 2010, if we’d just adopt the standards that would help computers extract meaning from the web.

I’m a believer in the general case, eventually. But in many important domains, the solution is in production use today. Keyword search is a fantastic technology when a few words and popularity drive you to the right answer. It is a lousy solution when the answer involves an obscure concept, or a relationship expressed in common keywords. Unfortunately, that is the case for most qualitative (text-based) business information. The semantic web is held up as a solution but, the prior article aside, most in the field believe we’re a long way from this goal. Even if semantic tagging is enough, relying on people to do it is neither scalable nor precise, and relying on machines is not yet reliable.

A real world solution
But there is a way to solve the problem. Decide on a domain and then solve within that domain.

This is being done effectively today for the investment and market research community, using what we call search-driven research. In the world of the professional investor or the executive, business decisions rely on fresh, conceptual information. Because it is both subtle and obscure, that information is extremely hard to find. Keyword search is not effective, and a new paradigm is required. Truly valuable insights typically involve entities and their relationships: complex concepts that can’t be captured in keywords. Fortunately, they can be discovered once you’ve correctly identified and formalized the relevant concepts in each domain. The process demands detailed models that go well beyond keyword and tagging exercises.

Investment and market research is a classic long tail problem; the high end of the Pareto curve contains the least empowering information. The search solution requires a shift from using prominence and popularity as a proxy for value to using business impact instead.

For example, consider the portfolio manager who holds a pharmaceutical stock and wants to stay current on the market trends that affect his portfolio. A keyword search for, say, “drug discovery” or its synonyms produces no significant results beyond general education and company references. However, when conceptual knowledge about the domain is modeled, the search results can be high in precision and in value for the user. Actionable information discovery must happen at a much higher conceptual level.

Three step solution
Modeling markets requires three technology families:

First - modeling the entities in a market. For business and investment decision makers the key entities, often colloquially called “topics”, include concepts like companies, market trends, management, brands or supply chain members, to name a few. The models of these entities drive detection based on a combination of keywords, grammar and natural language processing, and the web results are organized and tagged with the entities or business topics.

Second - modeling relationships between entities. In addition to modeling the entities, modeling the relationships between entities reveals the meaning. Relationships come in many types: for example, status relationships such as competition between products or, in contrast, action relationships such as a merger between two companies or an executive moving from one company to another.

Third - detecting events by identifying temporal patterns in the relationships between entities, that is, detecting changes in how entities are arranged relative to each other over time.
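
To make the three steps concrete, here is a minimal Python sketch. It is an illustration only, not a description of any production system: the company names, phrase lists, relationship cues and function names are all hypothetical, simple substring matching stands in for the grammar and natural language processing a real model would need, and an "event" is reduced to a relationship that shows up in a later snapshot but not an earlier one.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical entity model: a name, a kind, and phrases that signal it.
@dataclass(frozen=True)
class Entity:
    name: str
    kind: str        # e.g. "company" or "trend"
    phrases: tuple   # lowercase phrases that indicate the entity

# Hypothetical relationship model: a typed link between two entities.
@dataclass(frozen=True)
class Relationship:
    kind: str        # e.g. "merger" (action) or "competition" (status)
    left: str
    right: str
    seen_on: date

# Invented example entities; a real domain model would be far richer.
ENTITIES = [
    Entity("Acme Pharma", "company", ("acme pharma",)),
    Entity("Bolt Biotech", "company", ("bolt biotech",)),
    Entity("Drug discovery", "trend", ("drug discovery", "lead compound")),
]

# Invented cue phrases; substring matching is a stand-in for real NLP.
RELATION_CUES = {
    "merger": ("merge", "acquire"),
    "competition": ("competes with", "rival"),
}

def detect_entities(text):
    """Step 1: tag a document with the entities it mentions."""
    lowered = text.lower()
    return [e for e in ENTITIES if any(p in lowered for p in e.phrases)]

def detect_relationships(text, seen_on):
    """Step 2: link the first two companies found when a cue phrase appears."""
    lowered = text.lower()
    companies = sorted(e.name for e in detect_entities(text) if e.kind == "company")
    found = []
    for kind, cues in RELATION_CUES.items():
        if len(companies) >= 2 and any(c in lowered for c in cues):
            found.append(Relationship(kind, companies[0], companies[1], seen_on))
    return found

def detect_events(earlier, later):
    """Step 3: report relationships present later that were absent earlier."""
    known = {(r.kind, r.left, r.right) for r in earlier}
    return [r for r in later if (r.kind, r.left, r.right) not in known]

old = detect_relationships(
    "Acme Pharma competes with Bolt Biotech in drug discovery.", date(2008, 1, 1))
new = detect_relationships(
    "Acme Pharma to acquire rival Bolt Biotech.", date(2008, 6, 1))
for event in detect_events(old, new):
    print(event.kind, event.left, event.right, event.seen_on)
```

In this toy run the competition relationship exists in both snapshots, so only the merger is reported as an event. That mirrors the point of the third step: the signal is the change in how the entities relate, not the mention itself.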

Of course, all this wizardry needs to be combined with pragmatism to be useful in business: removing duplication, de-junking, factoring in source authority, normalizing and ranking results, all to produce meaningful results from the unruly beast.
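
The pragmatic cleanup can be sketched in a few lines of Python as well. Everything here is an assumption made for illustration: the source names, the authority weights, the minimum-length junk filter and the scoring formula (relevance multiplied by source authority) are invented, and a real system would tune each of them.

```python
import re

# Hypothetical authority weights per source; unknown sources get a low default.
SOURCE_AUTHORITY = {"sec.gov": 1.0, "example-news.com": 0.7}
DEFAULT_AUTHORITY = 0.3
MIN_WORDS = 5  # assumed threshold: shorter snippets are treated as junk

def normalize(text):
    """Lowercase, strip punctuation and collapse whitespace."""
    stripped = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", stripped).strip()

def clean_and_rank(results):
    """results: dicts with 'text', 'source' and 'relevance' (0 to 1)."""
    best = {}
    for r in results:
        key = normalize(r["text"])
        if len(key.split()) < MIN_WORDS:
            continue  # de-junk: drop very short snippets
        score = r["relevance"] * SOURCE_AUTHORITY.get(r["source"], DEFAULT_AUTHORITY)
        # De-duplicate: keep only the best-scoring copy of each normalized text.
        if key not in best or score > best[key][0]:
            best[key] = (score, r)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [r for _, r in ranked]

results = [
    {"text": "Acme Pharma to acquire Bolt Biotech", "source": "example-news.com", "relevance": 0.9},
    {"text": "ACME Pharma to acquire Bolt Biotech!", "source": "sec.gov", "relevance": 0.8},
    {"text": "Click here", "source": "spam.example", "relevance": 1.0},
    {"text": "Analysts expect drug discovery spending to rise", "source": "blog.example", "relevance": 0.9},
]
for r in clean_and_rank(results):
    print(r["source"], "|", r["text"])
```

Here the two near-identical headlines collapse into the copy from the more authoritative source, the two-word snippet is discarded, and the remaining items are ordered by the combined score.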

But the end result achieves the objective of the semantic web: to be able to use the web as a vast database of rapidly changing, useful, decision-making information. So rather than describe keyword search as reaching a breaking point, I’d describe it as useful for the problems it’s solving today, but unable to scale to the next class of business problems, which a domain-specific approach can already solve.


One Response to “Beyond the keyword search breaking point”

  1. Darrell W. Gunter says:

    Dear Penny,
    I fully agree with your comments with one exception. The semantic search wave is upon us now. Here at Collexis we have a number of prestigious organizations utilizing our knowledge discovery “Fingerprinting” Technology. If you visit our website collexis.com you will see the list of customers such as the NIH, Johns Hopkins, the Mayo Clinic, Harvard Dana Farber, Univ. of San Francisco to name a few.

    Take a look at Mills Davis of Project 10X and what he writes about the semantic wave
    http://www.project10x.com/ . I am sure that you will find it compelling.

    If you ever would like to discuss these and other developments please call me +1.973.454.3475.

    Best regards,

    Darrell W. Gunter
    EVP/Chief Marketing Officer
    Collexis Holdings, Inc.
    CLXS on the OTCBB.com

 

© 2008 altsearchengines.com