Guest Author: Crawling and Indexing the Web
Today we are very fortunate to have another blogger’s perspective on the post that we did on alternative search engines, asking which of the “Alts” crawled the web and why: “To crawl or not to crawl? The Alts speak out!”
Александра (Alexandra) with SE la vie (get it?) sent this to us from beautiful Belarus!
She translated the quotes of different alternative search engines discussing the problem of crawling and indexing the web from my English post to Russian for her blog, and then reversed the process for this post!
Why is this problem acute? Probably mostly because “alternative search engine” does not yet have a settled definition (to say nothing of plain “search engine”). There are a lot of opinions, and all of them are right in some way. And we all know this one: a search engine is a service that absolutely must have its own index and crawler. I already spoke my mind in my previous article, “Alternatives for newbies”: “I will not set up any conditions like having its own index, algorithm or a crawler.”
And I still hold to that point. With all the variety of alternative search engines, it’s not hard to find ones that, by their very design, do not need an index (and that is without counting meta-search engines).
But at the same time, saying that a search engine is an index and a crawler is quite correct. It is the principle of the majors to crawl and index everything in their path :) But if there’s a vertical real estate search engine that should not crawl anything unrelated to real estate, then why should it care about crawling, say, a gadget blog? Just to be among those 11 alts which really crawl? That is why I would not be so quick to say that this or that search engine has no crawler and so is not a search engine at all. Maybe it does. You just don’t know about it. And maybe you are not supposed to.
Another point: many Alts actually are building their own index. How they do it is another question. A blog search engine will quite naturally crawl feeds. A meta search engine will crawl databases, not websites. And we all know that many Alts use open source crawlers, which doesn’t embarrass them at all.
Services which use a licensed index rather than a proprietary crawler: are they search engines at all? I would say this is a middle stage: not a search engine in its “guts”, but for the user it is a search engine.
I do not think that services without their own index are defective. This may sound a bit strange, but maybe the obstacle is in the name. Search engines are the ones which search, naturally. So I can understand the desire to define them by index or crawler.
I think we can also speak of find (discovery) engines as the middle stage: the ones that do not have their own indexes but find results in a ready-made database (built by an actual search engine), rework them, and show the results in their own way according to their own algorithm.
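To make the find engine idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `licensed_index_search` stands in for whatever third-party index a service might license and simply returns canned results, and the scoring rule (count title words that match a topic vocabulary) is only an illustration, not how any of the Alts mentioned here actually works.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    title: str
    upstream_rank: int  # position assigned by the licensed index


def licensed_index_search(query: str) -> list[Result]:
    """Hypothetical stand-in for a licensed third-party index.

    A real find engine would call an external service here; this
    version ignores the query and returns canned data.
    """
    return [
        Result("https://example.com/gadgets", "Gadget blog", 1),
        Result("https://example.com/homes", "Homes for sale in Minsk", 2),
        Result("https://example.com/rentals", "Apartment rentals", 3),
    ]


def find_engine(query: str, topic_words: set[str]) -> list[Result]:
    """Rework someone else's results with our own (toy) algorithm.

    No crawling and no index of our own: we only re-order what the
    upstream index returned, preferring titles that match our topic.
    """
    def sort_key(result: Result) -> tuple[int, int]:
        hits = sum(1 for word in result.title.lower().split()
                   if word in topic_words)
        # More topic hits first; ties keep the upstream ordering.
        return (-hits, result.upstream_rank)

    return sorted(licensed_index_search(query), key=sort_key)


if __name__ == "__main__":
    real_estate = {"homes", "sale", "apartment", "rentals"}
    for result in find_engine("minsk", real_estate):
        print(result.title)
```

The point of the sketch is only the division of labor: the index belongs to someone else, while the ordering and presentation belong to the find engine.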
Surely, most Alts want to build their own index (Quintura, for example, is already building and testing one). But in some cases a find engine is a self-sufficient service and is comfortable with that. A fresh example is MSN’s new Tafiti search engine.
Not that finding another name for such systems will in itself do a lot of good for humanity. But it will make it easier for me to analyze start-ups, for example. Besides, it might help reduce the negativity towards AltSEs when people say they are not search engines. Maybe they aren’t, but at least they’re find engines :) And maybe this will be a good incentive for the Alts.










