Great Debate: Peer-to-Peer (P2P) Search Part I

Every Tuesday night on AltSearchEngines, we invite two Alternative Search Engines to discuss a common topic for our benefit - and yours.  The topic tonight, Peer-to-Peer or P2P Search, is more complex than most other categories, and we are fortunate indeed to have two experts to explain this important area to us: Wolf Garbe of FAROO, and Jeremie Miller of Wikia.


1) Architecture: How is your search engine different from today’s general search engines? Briefly, what does the architecture of your search engine look like?

FAROO: FAROO is a web search engine based on peer-to-peer technology.

Users connect their computers to build a worldwide, distributed P2P web search engine. A centralized index and crawler are no longer required. Every web page visited is automatically included in the distributed index of the search engine. By installing our software, you immediately become part of the distributed search engine. FAROO’s distributed core architecture is fundamentally different from the centralized approach of today’s search engines.

ATLAS: Atlas breaks search into three distinct groupings: crawling (the Factory), indexing/ranking (the Collector), and query handling (the Broker). The primary difference is that there can be multiples of any of these groups, all working together or competing.
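As a rough illustration of the three roles described above, here is a minimal Python sketch. Atlas is presented in this debate only as a protocol, so every class, method, and field name below (Factory.crawl, Collector.ingest, Collector.search, Broker.query, and the toy in-memory pages) is a hypothetical stand-in and not part of any published Atlas interface.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    text: str
    hints: dict = field(default_factory=dict)  # contextual metadata added by a Factory


class Factory:
    """Crawls a segment of the web and annotates what it finds (hypothetical)."""

    def __init__(self, pages):
        self.pages = pages  # toy stand-in for real crawling: url -> text

    def crawl(self):
        for url, text in self.pages.items():
            yield Document(url, text, hints={"word_count": len(text.split())})


class Collector:
    """Indexes documents from any number of Factories (hypothetical)."""

    def __init__(self):
        self.index = {}  # term -> set of urls

    def ingest(self, factory):
        for doc in factory.crawl():
            for term in doc.text.lower().split():
                self.index.setdefault(term, set()).add(doc.url)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


class Broker:
    """Handles user queries by asking one or more Collectors (hypothetical)."""

    def __init__(self, collectors):
        self.collectors = collectors

    def query(self, term):
        results = []
        for collector in self.collectors:
            for url in collector.search(term):
                if url not in results:  # keep first occurrence, drop duplicates
                    results.append(url)
        return results


news = Factory({"http://example.com/a": "peer to peer search"})
blogs = Factory({"http://example.org/b": "distributed search engines"})
c1, c2 = Collector(), Collector()
c1.ingest(news)
c2.ingest(blogs)
broker = Broker([c1, c2])
print(broker.query("search"))
```

Because each role only talks to the next through a small interface, any of them could be swapped for an independent implementation, which is the property the answer above emphasizes.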

2) Distribution/P2P: In what aspect is the architecture distributed?  What are the benefits of this?

FAROO: FAROO uses a fully distributed architecture: distributed index, distributed crawler, distributed ranking, and distributed search.

Search, the most frequently used Internet application, becomes distributed and thus follows the same principle on which the whole Internet has been successfully built. The distributed architecture provides cost advantages, better scaling, less intrusive crawling, democratic ranking and improved privacy protection.

  * Each of the major search engines requires hundreds of thousands of servers. We don’t need any hardware at all. This means huge savings in infrastructure costs, allowing us to share revenues with our users.
  * The Internet is growing steadily, and so is the amount of hardware required to index all the new web pages and to serve the new users. In FAROO’s distributed architecture the users become part of the solution to this problem. Therefore FAROO scales with the growth of the Internet.
  * FAROO indexes web pages without a dedicated crawler, so additional traffic for users and web servers is avoided.

ATLAS:  Each entity within Atlas, whether it be the Factory, Collector, or Broker, can be entirely distinct and independent, and will likely be different companies or groups altogether. There is nothing more distributed or beneficial on the Internet than many independent services connecting through an open protocol.

3) Crawler: How does your distributed crawler work?

FAROO: We changed the way a crawler works. There is no traditional crawler at all. Every web page visited by one of our users is automatically included in our distributed index, and is instantly searchable by all other users.
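The idea of indexing pages as a side effect of browsing, rather than through a separate crawl, can be sketched as follows. This is a minimal single-process Python illustration, not FAROO's implementation: the class VisitIndex, the method on_page_visited, and the in-memory dictionary standing in for the distributed index are all assumptions made for the example.

```python
class VisitIndex:
    """Toy index that is updated only when a user visits a page.

    Hypothetical sketch: a real P2P engine would spread this index
    across peers; here a single dict stands in for it.
    """

    def __init__(self):
        self.postings = {}    # term -> set of urls
        self.page_terms = {}  # url -> terms seen on the most recent visit

    def on_page_visited(self, url, text):
        """Index content the browser already fetched; no extra request is made."""
        new_terms = set(text.lower().split())
        old_terms = self.page_terms.get(url, set())
        # Drop terms that disappeared since the last visit (re-indexing).
        for term in old_terms - new_terms:
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]
        for term in new_terms:
            self.postings.setdefault(term, set()).add(url)
        self.page_terms[url] = new_terms

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


index = VisitIndex()
index.on_page_visited("http://example.com/news", "election results tonight")
index.on_page_visited("http://example.com/news", "storm warning tonight")
print(index.search("tonight"))
print(index.search("election"))
```

In this sketch a second visit to the same URL replaces the page's old terms, so pages that are visited often are also refreshed often.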

ATLAS: Atlas supports any number of crawling projects, either web-wide or just specific segments. The Factory that exposes any crawled information must also add value by understanding the content it is crawling and providing contextual hints and meta-data to the Collector for indexing. A Collector should therefore have many relationships with Factories, and choose the ones doing the best job crawling and adding the most value.

4) Ranking: How does your ranking algorithm work?

FAROO: FAROO uses attention-based ranking. If users spend a long time on a page, visit it often, bookmark it, or print it out, the page goes up in the ranking. For the first time, the ranking of web pages is done automatically by the target audience itself. This leads to a more democratic, user-centric ranking that is also resistant to rank manipulation. Additional ranking parameters ensure proper ranking during the start-up phase with relatively few users, and for freshly indexed pages.
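To make the signals named above concrete, here is a small Python sketch of an attention score. Only the four signals (time on page, visit count, bookmarks, prints) come from the answer; the weights, the logarithmic damping, and the names Attention and attention_score are assumptions for illustration, and FAROO's actual formula is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Attention:
    seconds_on_page: float = 0.0
    visits: int = 0
    bookmarks: int = 0
    prints: int = 0


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"seconds_on_page": 1.0, "visits": 2.0, "bookmarks": 5.0, "prints": 5.0}


def attention_score(a: Attention) -> float:
    """Weighted sum of the signals; log1p dampens any single very large value."""
    return sum(
        weight * math.log1p(getattr(a, name)) for name, weight in WEIGHTS.items()
    )


pages = {
    "http://example.com/deep-article": Attention(seconds_on_page=600, visits=40, bookmarks=5),
    "http://example.com/landing": Attention(seconds_on_page=20, visits=45),
}
# Highest attention first.
ranked = sorted(pages, key=lambda url: attention_score(pages[url]), reverse=True)
print(ranked)
```

Damping each signal is one possible way to make a single inflated counter less useful for manipulation; whether FAROO does anything similar is not stated.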

ATLAS: Since Atlas is just a protocol that any Collector can support, everyone must compete to provide the best rankings for the data they are indexing. There isn’t any single best algorithm; instead there will likely be many thousands, each working on different types of content, media, or locality. An Atlas Broker must select the best Collectors (and it can get rankings from multiple ones and merge them) for the users and queries it represents.
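The answer above does not say how a Broker would merge rankings from several Collectors. As one possible approach, the Python sketch below uses reciprocal rank fusion, a generic rank-merging technique substituted here purely for illustration; the function name merge_rankings, the constant k=60, and the example URLs are assumptions.

```python
from collections import defaultdict


def merge_rankings(rankings, k=60):
    """Merge ranked URL lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per URL, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    # Sort by score descending, then by URL for a deterministic order.
    return sorted(scores, key=lambda url: (-scores[url], url))


collector_a = ["http://example.com/1", "http://example.com/2", "http://example.com/3"]
collector_b = ["http://example.com/2", "http://example.com/4"]
print(merge_rankings([collector_a, collector_b]))
```

A URL that appears in both lists accumulates score from each, so agreement between Collectors pushes a result upward without requiring their internal scores to be comparable.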

5) Do you use the “wisdom of the crowds”? If so, how?

FAROO: When it comes to understanding, evaluating, and rating content, the human mind is still unsurpassed. Therefore FAROO uses the “wisdom of the crowds” in two ways: for ranking and for crawling. An algorithm may distinguish between original content, trivial content and spam. But when it comes to more subtle distinctions, we are better off trusting our own species! And it’s no surprise that even the well-known PageRank uses indirect human judgment, as it is based on the popularity of a page among webmasters.

FAROO’s user-generated ranking goes a step further, as it is based on the popularity of a page among all users. And this is done automatically. In this way many more people get involved than with current ranking methods, where either only webmasters are entitled to vote or manual voting is required.

FAROO also uses user-powered crawling. Pages that change often, such as news pages, are visited frequently by users, and with FAROO they are therefore also re-indexed more often. The FAROO users thus implicitly control the distributed crawler: frequently changing pages are kept fresh in the distributed index, while unnecessary traffic on rather static pages is avoided.

ATLAS: Within Atlas there will likely be some general feedback mechanisms, particularly for handling spam. Every entity within Atlas is encouraged to become more intelligent by involving end users in some way, but there are no specific mandated features for doing so.

If you have any questions about Peer-to-Peer (P2P) Search so far, please leave a comment now.

Next Tuesday night we will bring you Part II - the conclusion of this interesting debate!


10 Responses to “Great Debate: Peer-to-Peer (P2P) Search Part I”

  1. Greg Lindahl says:

    What search engines are the ones doing distributed crawling with the “Java” and “-” user agents?

  2. wolf says:

    > What search engines are the ones doing distributed crawling with
    > the “Java” and “-” user agents?

    How do you know that these are distributed crawlers?
    I would guess these are just some of the traditional java crawlers (http://java-source.net/open-source/crawlers), publicly available and used by many different parties.
    In this respect they are of course widely “distributed”.

    Of course, it is bad behavior not to show a proper and informative user agent.
    The distributed crawlers I’m aware of are all showing user agents:

    Majestic: MJ12bot http://www.majestic12.co.uk/projects/dsearch/mj12bot.php (Crawler written in C#)

    Grub: grub-client http://www.grub.org/html/help.php?op=robots-faq (Crawler written in C++)

    FAROO: Does not use a traditional crawler. If a user visits a web page with his browser, FAROO re-uses the information that the browser has already requested. There is no request conducted or initiated by FAROO itself. (“Crawler” written in C#)

    Yacy: yacybot http://yacy.net/yacy/bot.html (Crawler written in Java)

  3. Alt Search Engines » Blog Archive » View from the Corner Office, FAROO says:

    [...] Wednesday on AltSearchEngines, we visit the CEO of one of our Alternative search engines.  Since last night’s debate featured FAROO (and Atlas), we headed over there for a chat with CEO Malgorzata [...]

  4. FAROO Blog » Blog Archive » Great Debate: Peer-to-Peer (P2P) Search Part I says:

    [...] In an interesting post on the AltSearchEngines blog FAROO and Wikia are discussing Peer-to-Peer Web Search and providing some insights to their architectures. You can read part I of the debate here. [...]

  5. Great Debate: Peer-to-Peer (P2P) Search Part I | TorrentPimp says:

    [...] two experts to explain this important area to us: Wolf Garbe of FAROO, and Jeremie Miller of Wikia. [...]

  6. Alt Search Engines » Blog Archive » Peer-to-Peer (P2P) search debate, Part II says:

    [...] part two of the debate that began last week on Peer-to-Peer (P2P) search.  Here is the link to part one. 6) Participation: How do your users participate (By way of contribution and [...]

  7. Usersky Daily News Network » Weekly Wrapup, 8-12 October 2007 says:

    [...] next year. Alt Search Engines AltSearchEngines this week had a Peer-to-Peer (P2P) Search debate - Part one and Part two. It featured Wolf Garbe of FAROO and Jeremie Miller of Wikia/Atlas. One of the [...]

  8. iAdvert.mobi » Weekly Wrapup, 8-12 October 2007 says:

    [...] this week had a Peer-to-Peer (P2P) Search debate - Part one and Part two. It featured Wolf Garbe of FAROO and Jeremie Miller of Wikia/Atlas. One of the [...]

  9. antischokke » P2P-Websuche mit FAROO says:

    [...] or blog posts about Faroo have been published, e.g. on AltSearchEngines (Part 1 and Part 2, as well as a “Private Interview with Faroo”), on Mashable or Startup [...]

  10. MSNBC Debate says:

    Tonight’s debate wasn’t much better, a lot of mudslinging and what-if questions.

 

© 2008 altsearchengines.com