The Semantic Web’s Tagging System


The Semantic Web’s Tagging System Already Exists – And It’s Not RDF or OWL

Notice: Contains programming language.   Non-techies (like me) please use appropriate caution.

Dr. Kathleen Dahlgren is the founder and Chief Technology Officer of Cognition Technologies, Inc. Dr. Dahlgren has a Ph.D. in Linguistics and a post-doctorate in Computer Science from the University of California, Los Angeles. Currently, she is also an adjunct professor of Linguistics at the University of California, Los Angeles.

Over the course of the past several years, there has been significant discussion on how to move the Web towards the vision of Web 3.0 (the Semantic Web). At the time, it appeared that the only course of action was to develop a common tagging language (RDF and OWL) and then attempt to impose these standards on content creators across the Web. Granted, there weren’t many alternatives to this approach at the time, and it appeared that a “brute force” approach to the problem was the only way to get the process moving. That was then – this is now: RDF and OWL are insufficient to meet the needs of the Semantic Web (Web 3.0). There are much better alternatives today, many of which have only recently made themselves known (e.g. Powerset, Hakia, Cognition Technologies), which employ new semantic technologies and capabilities that render tagging obsolete. Tagging is unnecessary for Natural Language Processing (NLP) systems, is extremely labor intensive, requires a broad consensus amongst users and content creators, and is unenforceable. The way the Web becomes semantic is to employ the only foundational standard necessary: the English language.

First, let’s define the terms: The commonly-known Semantic Web creates hierarchical relationships and descriptions of Web content using a special markup language in XML. The initial language used for this was RDF (Resource Description Framework), which was later augmented with a higher-level language, OWL (Web Ontology Language). An example of RDF tags as applied to the description of music CDs:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description
      rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
  <rdf:Description
      rdf:about="http://www.recshop.fake/cd/Hide your heart">
    <cd:artist>Bonnie Tyler</cd:artist>
    <cd:country>UK</cd:country>
    <cd:company>CBS Records</cd:company>
    <cd:price>9.90</cd:price>
    <cd:year>1988</cd:year>
  </rdf:Description>
</rdf:RDF>

As you can see, information about the page is marked up in a formal language, indicating the artist, country, company, price etc. of the CD.  Such mark-up is a form of document tagging.
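To make the idea of document tagging concrete, here is a minimal Python sketch of how a program might read those tags back out of the RDF/XML. It uses only the standard library's xml.etree.ElementTree; the function name read_cd_tags and the shortened sample data are illustrative assumptions, not part of any RDF toolkit.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the RDF example (two tags per CD, for brevity).
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:price>10.90</cd:price>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Hide your heart">
    <cd:artist>Bonnie Tyler</cd:artist>
    <cd:price>9.90</cd:price>
  </rdf:Description>
</rdf:RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CD_NS = "http://www.recshop.fake/cd#"


def read_cd_tags(rdf_xml):
    """Return {resource URI: {tag name: value}} for every rdf:Description."""
    root = ET.fromstring(rdf_xml)
    records = {}
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        about = description.get(f"{{{RDF_NS}}}about")
        tags = {}
        for child in description:
            # ElementTree stores qualified names as "{namespace}localname".
            namespace, _, local_name = child.tag[1:].partition("}")
            if namespace == CD_NS:
                tags[local_name] = child.text
        records[about] = tags
    return records


records = read_cd_tags(RDF_XML)
for uri, tags in records.items():
    print(uri, tags)
```

Everything the program learns about each CD comes from the tags; any fact that was not marked up is invisible to it.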

The complication isn’t just in the fact that someone has to create these tags for all Web content.  A more significant issue is that to find documents by these tags, users have to know what the system of tags is. To the extent that all coders of RDF or OWL can coordinate to use the same system of tags, then the tagged pages could become accessible to a wide spectrum of Internet users.  A typical way of accessing tags is to offer users a menu of choices for values of the tags.  Unfortunately, this creates more work and inefficiency.  Also, it injects a certain amount of subjectivity into the process because the content creator or manager is the arbiter of which tags will or will not be used.  As a result, if the Search process is left simply to using pre-defined and subjective RDF or OWL tags, then the searcher may miss what he or she is searching for.  Being able to Search, understand and utilize all of the text on a Web page naturally results in higher relevancy and a more satisfying Search experience.
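The dependence on a shared tag vocabulary can be sketched in a few lines of Python. The tag store below is a hypothetical stand-in that mirrors the RDF example; tag_menu and find_by_tag are illustrative names, not functions from any real search system.

```python
# Hypothetical tag store: page URL -> tags chosen by the content creator.
TAGGED_PAGES = {
    "http://www.recshop.fake/cd/Empire Burlesque": {
        "artist": "Bob Dylan", "country": "USA", "company": "Columbia",
        "price": "10.90", "year": "1985",
    },
    "http://www.recshop.fake/cd/Hide your heart": {
        "artist": "Bonnie Tyler", "country": "UK", "company": "CBS Records",
        "price": "9.90", "year": "1988",
    },
}


def tag_menu(pages):
    """Return the tag names a search form would have to offer the user."""
    return sorted({name for tags in pages.values() for name in tags})


def find_by_tag(pages, name, value):
    """Return pages whose tag `name` is exactly `value`."""
    return [url for url, tags in pages.items() if tags.get(name) == value]


print(tag_menu(TAGGED_PAGES))
print(find_by_tag(TAGGED_PAGES, "artist", "Bob Dylan"))
# A user who guesses a tag name the creator did not choose finds nothing.
print(find_by_tag(TAGGED_PAGES, "performer", "Bob Dylan"))
```

The lookup only works when the searcher uses the same tag names the content creator picked, which is why a menu of tags has to be shown in the first place.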

So, rather than use a structured, subjective and incomplete dictionary of tags, how about we use an already established, agreed upon and universal “tagging system” – the English language itself?  Semantic NLP uses English as the means of communication – a system of symbols, if you will, that has already been agreed upon by English speakers.  No new system of symbols needs to be created, calibrated, propagated and used.

Cognition’s Semantic NLP™ offers free-text, Semantic search in the body of documents and Web pages.  The advantages of searching semantically in free-text are:

No labor to create tags as all of the content is indexed and used;

Any information in a document can be searched for;

Users don’t need to know the exact way a concept was expressed (alternative ways of expressing a concept are recognized as meaning the same thing and as relevant to the query; for example, the pages in the RDF example above could be found with “cost” as well as “price”);

Words are disambiguated within the context of how they are used, so “price” meaning “consequences of an action” would not be retrieved in response to a query about “price” meaning “monetary cost”.
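The last two points can be illustrated with a toy Python sketch. This is not Cognition's technology: the two-word sense lexicon, the cue words, and the overlap-counting rule are all assumptions made up for illustration, and a real semantic NLP system would be far richer.

```python
import re

# Toy sense lexicon (an assumption): word -> list of (concept, context cue words).
SENSES = {
    "price": [
        ("MONETARY_COST", {"cd", "album", "buy", "dollars", "store"}),
        ("CONSEQUENCE", {"fame", "mistake", "paid", "heavy"}),
    ],
    "cost": [
        ("MONETARY_COST", {"cd", "album", "buy", "dollars", "store"}),
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def concepts(text):
    """Map each word to a concept, picking the sense whose cues best fit the context."""
    words = tokenize(text)
    context = set(words)
    found = set()
    for word in words:
        senses = SENSES.get(word)
        if senses is None:
            found.add(word)  # unknown words stand for themselves
        else:
            # max() keeps the first sense on ties, so the first listed sense is the default.
            best = max(senses, key=lambda sense: len(sense[1] & context))
            found.add(best[0])
    return found


def search(query, documents):
    """Return names of documents containing every concept in the query."""
    wanted = concepts(query)
    return [name for name, text in documents.items() if wanted <= concepts(text)]


DOCUMENTS = {
    "cd_page": "Hide your heart by Bonnie Tyler, album price 9.90",
    "essay": "Fame has a heavy price for any artist",
}

print(search("cost", DOCUMENTS))         # matches the CD page via the shared concept
print(search("heavy price", DOCUMENTS))  # matches only the essay's sense of "price"
```

In this sketch “cost” and “price” meet at the same concept, while the essay's use of “price” resolves to a different concept and is not returned for a query about monetary cost.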

For categorized (tagged) content already in place, Cognition’s Semantic NLP can be used in conjunction with tags to create a versatile free-text Search and a structured data Search.  This can be seen using the Advanced Search on Cognition’s SemanticMEDLINE™ Website (www.SemanticMEDLINE.com), where users can search by author, date and journal of abstracts along with free concept Search.

With Microsoft’s acquisition of Powerset, Yahoo!’s decision to open its platform to RDF (tagging) and micro-formats, and Hakia’s ontological (categorization) vision, the Semantic Web is becoming more of a reality.  Cognition’s Semantic NLP, which is built on a vast and complete Semantic Map of the English language, has done the majority of the work that RDF and OWL are intended to do: render the content on a Web page semantically retrievable.


4 Responses to “The Semantic Web’s Tagging System”

  1. Kingsley Idehen says:

    The issue isn’t about searching opaque web pages and then emitting nicer looking, or more coherent, opaque web pages. The issue is about exposing the structured data behind the nice looking opaque web pages, giving us the option to use our cognitive skills to view the data behind the pages differently, i.e. multiple views of the same thing.

    RDF and Linked Data [1] facilitate what I describe above.

    Cognition needs a little increment in the form of exposing its intelligent insights in structured data form using dereferenceable URIs [2] for each of the entities that comprise its analysis graphs.

    Again, it’s not about smarter opaque web pages from search engines that apply semantic search inside (no matter how sophisticated the NLP may be). It’s about transparent access to the data entities in the smart output (web page).

    Links:

    1. http://en.wikipedia.org/wiki/Linked_Data
    2. http://en.wikipedia.org/wiki/Dereferenceable_URIs
    3. https://addons.mozilla.org/en-US/firefox/addon/8062
    4. http://ode.openlinksw.com/

  2. Andrew McKnight says:

    I agree that if you can extract semantics from natural language you avoid an incredible amount of tagging work and can do some amazing things, but despite that I disagree with much of what you said.

    “…[Tagging] requires a broad consensus amongst users and content creators…”. Do you mean that they need to agree upon which tags to use in the vocabulary? If so, this shouldn’t be a problem if you offer a very large number of tags that are descriptive enough, though it may be difficult.

    “A typical way of accessing tags is to offer users a menu of choices for values of the tags. Unfortunately, this creates more work and inefficiency. Also, it injects a certain amount of subjectivity into the process because the content creator or manager is the arbiter of which tags will or will not be used.” This isn’t necessarily true either, if tags are generated through non-subjective means with mass collaboration.

    “Words are disambiguated within the context of how they are used, so “price” meaning “consequences of an action” would not be retrieved in response to a query about “price” meaning “monetary cost”.” To get the context you require, your natural language tags would probably need to be at least a full sentence long, if not longer. Tags can have disambiguation associated with them.

    My co-workers have created Entity Describer, which uses Freebase topics as tags, and is an example of how these problems can be solved with semantic tags. Freebase can be edited by anyone and is very large with disambiguation built in.

  3. Hope Leman says:

    This is a fascinating discussion and one I am attempting to understand.

    Could somehow compare Cognition with:

    http://altsearchengines.com/2008/07/31/the-true-knowledge-answer-machine/

    in terms of the underlying technologies and performance?

    I very much enjoy Dr. Dahlgren’s disquisitions on this site. As I read the responses, it seems to me that they don’t really address Dr. Dahlgren’s point that tagging systems, however multifaceted and easy to use they may be, are simply too labor intensive and unstandardized to be of practicable or lasting value. Or am I wrong here, gentlemen?

    Anyway, you are all obviously brilliant people and whatever you all can do to improve search, more power to you.

  4. Hope Leman says:

    Oops–meant “someone.”

 

© 2008 altsearchengines.com