<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Note! Defining Semantics, NLP, LSI and AI</title>
	<atom:link href="http://www.altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/</link>
	<description>The most wonderful search engines you've never seen!</description>
	<pubDate>Tue, 02 Dec 2008 02:46:04 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.5</generator>
		<item>
		<title>By: Sam</title>
		<link>http://www.altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-71369</link>
		<dc:creator>Sam</dc:creator>
		<pubDate>Fri, 20 Jun 2008 01:50:52 +0000</pubDate>
		<guid isPermaLink="false">http://altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-71369</guid>
		<description>Hi Kathleen,

As you know, for your numbers to be compared to the TREC legal track, it would be helpful and more accurate to run your test against the same data, as that is the only way to do a proper apples-to-apples comparison.

Even if the data you used is similar to the TREC data, it is not a valid comparison, which is why TREC participants all use the same baseline data for the competition.  The numbers you cite are, as far as I am concerned, not valid, as they do not use the same data.

Your technology sounds very interesting, but all the posts I have seen from your company seem to be simply "look at me too," invoking the names of your "competitors" to get hits, rather than instructive posts about your technology and where it fits into the wider world.

The above post is an example; it really wasn't that helpful.

Cheers,

Sam</description>
		<content:encoded><![CDATA[<p>Hi Kathleen,</p>
<p>As you know, for your numbers to be compared to the TREC legal track, it would be helpful and more accurate to run your test against the same data, as that is the only way to do a proper apples-to-apples comparison.</p>
<p>Even if the data you used is similar to the TREC data, it is not a valid comparison, which is why TREC participants all use the same baseline data for the competition.  The numbers you cite are, as far as I am concerned, not valid, as they do not use the same data.</p>
<p>Your technology sounds very interesting, but all the posts I have seen from your company seem to be simply &#8220;look at me too,&#8221; invoking the names of your &#8220;competitors&#8221; to get hits, rather than instructive posts about your technology and where it fits into the wider world.</p>
<p>The above post is an example; it really wasn&#8217;t that helpful.</p>
<p>Cheers,</p>
<p>Sam</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kathleen Dahlgren</title>
		<link>http://www.altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-71317</link>
		<dc:creator>Kathleen Dahlgren</dc:creator>
		<pubDate>Thu, 19 Jun 2008 23:18:51 +0000</pubDate>
		<guid isPermaLink="false">http://altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-71317</guid>
		<description>My blog post was intended to be a high-level (i.e. simple) description of various approaches to Search.  Readers will find greater detail in Cognition’s Technical Overview whitepaper, which can be downloaded from our Website at www.cognition.com.  

In response to the concerns about higher computation costs, statistical semantic approaches, such as LSI, do indeed experience exponential growth in computation resources because they have to compare all the co-occurrences in an entire document base.  Semantic NLP based upon linguistic approaches, on the other hand, does not have that problem.  CognitionSearch, for example, indexes documents about half as fast as a typical pattern-matcher technology, and its indexing cost grows linearly as the document base scales.  Its algorithms use bottom-up interpretation, word by word and sentence by sentence.

To answer the concerns about scientific measures, precision and recall are standard measures of Search engine performance. Precision is a measure of retrieval accuracy calculated by dividing the total number of relevant retrievals by the number of all retrievals generated by the Search.  Recall is a measure of the extent to which relevant material in the total document base is found.  It is calculated by dividing the number of relevant retrievals by the total number of potentially relevant retrievals in the document base.  
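
As a rough illustration of the two definitions above, here is a minimal Python sketch. It is not taken from any search product; the function name and the document IDs are hypothetical and chosen only for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: IDs of the documents returned by the search.
    relevant:  IDs of all relevant documents in the document base.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved.intersection(relevant))  # relevant retrievals
    # Precision: relevant retrievals divided by all retrievals.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: relevant retrievals divided by all relevant documents.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 of them relevant,
# and 6 relevant documents exist in the whole document base.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(p, r)
```

In practice the hard part is the second argument: knowing every relevant document in a large document base is rarely possible, which is why shared test collections and pooled judgments are used.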

Pattern-matching technologies perform with both low precision and low recall (typically under 20% for both).  The TREC (Text Retrieval Conference), sponsored by the National Institute of Standards and Technology (NIST), is a recognized source of precision/recall testing for various technologies, including pattern-matching and statistical approaches.  In TREC’s legal track competition in 2007, there were 13 technologies participating.  Their precision performance ranged from under 1% to 23% and their recall performance ranged from under 1% to 22%.

While Cognition did not participate in the TREC competition in 2007 (but is participating in 2008), it did conduct its own internal precision/recall tests on a wide variety of document bases (similar to the TREC data) and Websites.  These included the National Library of Medicine’s MEDLINE™, the public domain Enron fraud case, the public domain Microsoft anti-trust case, the BBC World News Website (http://news.bbc.co.uk/), and the Global Issues Website (http://www.globalissues.com), among others.  For each test, 50 queries that users of the data/Website were considered likely to ask were formulated and posed to a CognitionSearch function on the site’s documents.  Relevancy was judged for a sample of 20 or fewer retrievals and extrapolated.  Cognition’s precision exceeded 90%.  Recall was measured relatively.  In other words, full recall was taken to be the total of all relevant retrievals returned by any of the Search engines used in the particular test.  Cognition’s relative recall in these tests exceeded 90%.
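
The relative recall measure described above can be sketched as follows. This is only an illustration under stated assumptions: the engine names, the document IDs, and the function name are hypothetical, and the sketch is not the procedure or code actually used in the tests.

```python
def relative_recall(relevant_by_engine, engine):
    """Relative recall of one engine for a single query.

    relevant_by_engine maps an engine name to the set of relevant
    documents that engine returned. "Full recall" is taken to be the
    union of those sets across every engine in the test.
    """
    pool = set().union(*relevant_by_engine.values())
    if not pool:
        return 0.0
    return len(set(relevant_by_engine[engine])) / len(pool)


# Hypothetical judgments for one query across two engines.
judged = {
    "engine_a": {"d1", "d2", "d3"},
    "engine_b": {"d2", "d4"},
}
print(relative_recall(judged, "engine_a"))
print(relative_recall(judged, "engine_b"))
```

Because the pool contains only documents that some tested engine found, relative recall is an upper bound on true recall: relevant documents that no engine retrieved are never counted.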

In summary, Search engines that employ Semantic NLP technology will achieve significantly better precision and recall than those using pattern-matching or statistical approaches.

 1 [TREC Conference 2007]</description>
		<content:encoded><![CDATA[<p>My blog post was intended to be a high-level (i.e. simple) description of various approaches to Search.  Readers will find greater detail in Cognition’s Technical Overview whitepaper, which can be downloaded from our Website at <a href="http://www.cognition.com" rel="nofollow">http://www.cognition.com</a>.  </p>
<p>In response to the concerns about higher computation costs, statistical semantic approaches, such as LSI, do indeed experience exponential growth in computation resources because they have to compare all the co-occurrences in an entire document base.  Semantic NLP based upon linguistic approaches, on the other hand, does not have that problem.  CognitionSearch, for example, indexes documents about half as fast as a typical pattern-matcher technology, and its indexing cost grows linearly as the document base scales.  Its algorithms use bottom-up interpretation, word by word and sentence by sentence.</p>
<p>To answer the concerns about scientific measures, precision and recall are standard measures of Search engine performance. Precision is a measure of retrieval accuracy calculated by dividing the total number of relevant retrievals by the number of all retrievals generated by the Search.  Recall is a measure of the extent to which relevant material in the total document base is found.  It is calculated by dividing the number of relevant retrievals by the total number of potentially relevant retrievals in the document base.  </p>
<p>Pattern-matching technologies perform with both low precision and low recall (typically under 20% for both).  The TREC (Text Retrieval Conference), sponsored by the National Institute of Standards and Technology (NIST), is a recognized source of precision/recall testing for various technologies, including pattern-matching and statistical approaches.  In TREC’s legal track competition in 2007, there were 13 technologies participating.  Their precision performance ranged from under 1% to 23% and their recall performance ranged from under 1% to 22%.</p>
<p>While Cognition did not participate in the TREC competition in 2007 (but is participating in 2008), it did conduct its own internal precision/recall tests on a wide variety of document bases (similar to the TREC data) and Websites.  These included the National Library of Medicine’s MEDLINE™, the public domain Enron fraud case, the public domain Microsoft anti-trust case, the BBC World News Website (http://news.bbc.co.uk/), and the Global Issues Website (http://www.globalissues.com), among others.  For each test, 50 queries that users of the data/Website were considered likely to ask were formulated and posed to a CognitionSearch function on the site’s documents.  Relevancy was judged for a sample of 20 or fewer retrievals and extrapolated.  Cognition’s precision exceeded 90%.  Recall was measured relatively.  In other words, full recall was taken to be the total of all relevant retrievals returned by any of the Search engines used in the particular test.  Cognition’s relative recall in these tests exceeded 90%.</p>
<p>In summary, Search engines that employ Semantic NLP technology will achieve significantly better precision and recall than those using pattern-matching or statistical approaches.</p>
<p> 1 [TREC Conference 2007]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Christian Hempelmann</title>
		<link>http://www.altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-71194</link>
		<dc:creator>Christian Hempelmann</dc:creator>
		<pubDate>Thu, 19 Jun 2008 16:11:58 +0000</pubDate>
		<guid isPermaLink="false">http://altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-71194</guid>
		<description>Dear Kathleen - 

Thanks for mentioning hakia here. I just wanted to clarify one point. Our semantic resources are actually transparent. You can interact with a limited dataset at http://labs.hakia.com/hakia-lab-onto.html. There is more information about the underlying theory, Ontological Semantics, and its application to Web search here: http://www.ontologicalsemantics.com/. 

Best, 
Dr. Kiki Hempelmann, Chief Scientific Officer, hakia</description>
		<content:encoded><![CDATA[<p>Dear Kathleen - </p>
<p>Thanks for mentioning hakia here. I just wanted to clarify one point. Our semantic resources are actually transparent. You can interact with a limited dataset at <a href="http://labs.hakia.com/hakia-lab-onto.html" rel="nofollow">http://labs.hakia.com/hakia-lab-onto.html</a>. There is more information about the underlying theory, Ontological Semantics, and its application to Web search here: <a href="http://www.ontologicalsemantics.com/" rel="nofollow">http://www.ontologicalsemantics.com/</a>. </p>
<p>Best,<br />
Dr. Kiki Hempelmann, Chief Scientific Officer, hakia</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: vicaya</title>
		<link>http://www.altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-70954</link>
		<dc:creator>vicaya</dc:creator>
		<pubDate>Wed, 18 Jun 2008 22:49:37 +0000</pubDate>
		<guid isPermaLink="false">http://altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-70954</guid>
		<description>Geez, how about demonstrating some understanding of ROC and/or precision/recall next time you write about how NLP/LSI/AI would help? Using these anecdotal examples insults the intelligence of readers who actually care about alt search engines. Most studies show that NLP/LSI or whatever semantics stuff doesn't help *much* with the overall ROC for web search, and comes with much higher computation cost.</description>
		<content:encoded><![CDATA[<p>Geez, how about demonstrating some understanding of ROC and/or precision/recall next time you write about how NLP/LSI/AI would help? Using these anecdotal examples insults the intelligence of readers who actually care about alt search engines. Most studies show that NLP/LSI or whatever semantics stuff doesn&#8217;t help *much* with the overall ROC for web search, and comes with much higher computation cost.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Charles Knight</title>
		<link>http://www.altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-70922</link>
		<dc:creator>Charles Knight</dc:creator>
		<pubDate>Wed, 18 Jun 2008 19:42:40 +0000</pubDate>
		<guid isPermaLink="false">http://altsearchengines.com/2008/06/18/note-defining-semantics-nlp-lsi-and-ai/#comment-70922</guid>
		<description>Paul writes:

The example query, “strike up an intimate friendship”, goes too far, IMHO. Even if the search engine understands the meanings of the words, the phrase is still too vague to be useful. The functionality implied here seems to skip a level that would be much more useful. If you could make a query like "forums that do not charge a fee on which I can meet smart people who like to talk about personal relationships" return relevant results, that would be better.</description>
		<content:encoded><![CDATA[<p>Paul writes:</p>
<p>The example query, “strike up an intimate friendship”, goes too far, IMHO. Even if the search engine understands the meanings of the words, the phrase is still too vague to be useful. The functionality implied here seems to skip a level that would be much more useful. If you could make a query like &#8220;forums that do not charge a fee on which I can meet smart people who like to talk about personal relationships&#8221; return relevant results, that would be better.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
