Every Picture Tells a Story

D.W. McClary, Ph.D.
President/C.E.O.
img surf, LLC

While all vertical searches present unique challenges, “image-based search” differs drastically from other specialized searches. In some senses, image-based search departs radically from the traditional notion of internet search. This article aims to provide a brief overview of image-based search as well as illustrate some of the many challenges inherent in this paradigm-bending vertical.

Before we can talk about image-based search, we need to get our terms straight. Specifically, we need to differentiate between image-based search and more traditional image search. Image search is becoming quite common: established companies (Google) and startups (Pixy) provide services that allow users to search for images. This leads us to the distinction: image searches use text to provide results which are images, while image-based searches use images to provide results. At some level, image-based methods must search inside an image in order to provide results.

But, how different is that? Suppose we summarize the purpose of search in a single statement:

Given input x, return results y that describe or are related to x.

In general search, x is a string of text and y is a set of pages or links. Many verticals focus on y: increasing the relevance of results, returning multimedia, ranking results according to social data, etc. Image-based search operates on x. The pixel data that constitutes the image is used to generate relevant results.

That’s a pretty big change to the input set. In fact, it runs up against what is currently an utterly insurmountable problem. There is no algorithm, no person, nothing, that can correctly identify the contents of any and all images. Such a task would push us well into Kurzweil’s “Age of Spiritual Machines.” Admitting that such an age is a long time in the future, current attempts at image-based search must pick and choose their domains.

One way to choose a domain for image-based search is to limit the type of noun that’s searched: people, places, or things. Sites such as Like.com focus on products, while others, like Mugr.com, focus on human faces. Since my experience lies mainly in image-based search for faces, the examples here are taken from face-based searching. However, while solution strategies may vary, most of the same issues exist in all image-based searches. We might classify these challenges any number of ways, but let’s break them into two simple activities: finding the information to be searched, and searching it.

A Face in the Crowd

Because we’re talking about image-based search, we can assume at some point we’re going to have to deal with submitted image data. This could be a file submitted by a user, a hyperlink to an image that is already online, or an image gathered via web services from another site. No matter how we get the image, we have to conduct a search before we can conduct an image-based search. Specifically, we have to find the region in the image that is the search subject.

Consider a photo taken at the last wedding you attended. That photo could contain many people and many objects. While it’s easy for you and me to say, “Oh, this is my Great Uncle Bob,” an image-based search provider is only provided with raw pixel data. Before any computer can identify that Bob’s your uncle, it must determine that there’s a portion of the pixel data that could be a human face. As an example, let’s look at some of the pixel data for an image.

  • Generally speaking, faces have a similar shape. This shape is unlike that of a box, the Eiffel Tower, or an aeroplane.
  • Typically, human skin has a particular texture, which can be detected in images. This texture is unlike that of a brick wall, the bark of a tree, or a wool sweater.
  • The Blue Man Group aside, faces tend to have a distinct color. People are rarely green and purple.
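To make the color cue concrete, here is a minimal Python sketch. It assumes the image has already been decoded into rows of (r, g, b) tuples. The thresholds are one commonly quoted rule of thumb for skin tones in daylight, not the method used by any product named in this article, and all function names are hypothetical.

```python
def looks_like_skin(r, g, b):
    """Crude RGB rule of thumb for skin tones; thresholds are illustrative assumptions."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels):
    """Fraction of (r, g, b) pixels that pass the skin test."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(looks_like_skin(*p) for p in pixels) / len(pixels)


def candidate_windows(image, size, threshold=0.6):
    """Slide a size x size window over the image (a list of rows of RGB
    tuples) and return the (top, left) corner of every window whose
    pixels are mostly skin-colored."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    hits = []
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            window = [
                image[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            ]
            if skin_fraction(window) >= threshold:
                hits.append((top, left))
    return hits
```

A cue like this only narrows the candidates. A tan pixel such as (181, 141, 94) passes the same test whether it comes from a cheek or a cardboard box, which is why color is normally combined with shape and texture.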

Thinking about it that way, we can really narrow the parts of photos that might be our search subject. But what about all the things that meet those criteria but aren’t what we’re looking for? Suppose we’re looking for human faces: do potatoes have similar shape, colour, and texture? We’re still left with heuristics. No matter how good these heuristics are, we can never say with absolute certainty that we’ve found what we’re looking for. In fact, let’s return to the example data from before.


OMG indeed, there wasn’t a face there at all. But was there a face-like object? Consider what happens when we run this image through face detection software from PittPatt (http://www.pittpatt.com), a recognition and detection company spun off from Carnegie Mellon University.




The software managed to find the “face,” or at least the object most like a face. In fact, it does a very good job of this. But it still presents an interesting problem: before we can even begin to search an image, we must trust the user. If we’re searching for faces, we must trust that the user will submit a picture containing at least one face. If we’re searching for products, we must trust that the product is clearly the subject of the photo.

While it’s a very big assumption to make, let’s suppose that we’re given an image that contains at least one “searchable” subject (e.g. a face), and that we can isolate and extract the section of the image that contains this subject. Once all that’s done, we can finally get down to the business of recognizing the subject.

Well, Who Are You? (I Really Wanna Know)

Once we’ve found a face with which to search, we’re presented with the most obvious challenge in image-based search. Given this blob of pixel data, which bears no labels, determine what labels should be attached to it. In face recognition, this equates to determining which of your previously identified subjects the face belongs to. In any image-based search, this is a question of mapping the pixels that make up the image to the set of subjects that are already known.
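One of the simplest ways to picture this mapping is a nearest-neighbor lookup. The sketch below assumes each face has already been reduced to a short vector of numbers (how that is done is a separate question), and that we hold a gallery of labeled vectors for known subjects. The names `identify`, `gallery`, and `max_distance` are hypothetical, and this is an illustration of the idea rather than the approach of any particular service.

```python
import math


def identify(probe, gallery, max_distance=float("inf")):
    """Return (label, distance) for the gallery example closest to probe.

    gallery maps each known label to a list of feature vectors.
    If nothing lies closer than max_distance, the label is None.
    """
    best_label, best_dist = None, max_distance
    for label, examples in gallery.items():
        for example in examples:
            d = math.dist(probe, example)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist
```

The `max_distance` cutoff matters in practice: without it, every submitted face is matched to somebody, even when the person is not among the known subjects at all.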

Fortunately, there are many, many ways to do this. Techniques from machine learning, machine vision, data mining, and statistics have all been used to successfully perform face recognition with reasonable accuracy. The more challenging question is which techniques to use. Each of the well-explored directions in face recognition has its own particular pitfalls. Some techniques are sensitive to changes in lighting; others perform poorly when faced with many subjects.
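The lighting pitfall is easy to demonstrate with a toy example. The sketch below compares raw pixel intensities before and after a uniform brightening, then repeats the comparison after rescaling each patch to zero mean and unit variance. The pixel values are made up for illustration, and real lighting changes (shadows, direction, color casts) are not a uniform gain and offset, so this handles only the simplest case.

```python
import math
from statistics import mean, pstdev


def normalize(pixels):
    """Rescale intensities to zero mean and unit variance."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        return [0.0] * len(pixels)
    return [(p - mu) / sigma for p in pixels]


# Hypothetical grayscale patch, and the same patch under brighter light
# modeled as a uniform gain and offset.
face = [52, 60, 180, 200, 90, 75]
brighter = [1.2 * p + 10 for p in face]

raw_gap = math.dist(face, brighter)
normalized_gap = math.dist(normalize(face), normalize(brighter))
```

Here `raw_gap` is large even though both patches show the same face, while `normalized_gap` is zero up to rounding error. A method that compares raw pixels would treat the two patches as different subjects.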

Choosing one or more techniques with which to perform the recognition aspect of the search requires a deep investigation. While many techniques perform well, they have often not been tested against the sort of pictures people actually take. Correctly identifying most subjects when given a series of mugshots does not necessarily translate to correctly identifying your significant other in photos from a Hawaiian vacation.

Once we’ve identified a means by which to perform recognition, we need only to transform the pixel data into a suitable input, and map the output to the people (or products, places, etc.) we’re trying to search. If we tie all of this together, we’ve got an image-based search: take submitted photo, find the regions to be searched, use some recognition methods to map the relevant pixels to labeled data, return matching results.
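The four steps can be sketched as a single function that takes each stage as a pluggable piece. Everything here is hypothetical: the function name, the stage names, and the toy stand-ins that follow, which treat an "image" as a list of rows of grayscale values purely to keep the example runnable.

```python
def image_based_search(image, find_regions, describe, match, index):
    """Run the pipeline: find regions, describe each one, match it
    against labeled data, and collect the regions that matched."""
    results = []
    for region in find_regions(image):
        features = describe(region)
        label = match(features, index)
        if label is not None:
            results.append((region, label))
    return results


# Toy stand-ins for the three stages (assumptions, not real detectors).
def find_bright_rows(image):
    """Pretend every row with a bright pixel is a region of interest."""
    return [row for row in image if max(row) > 128]


def mean_intensity(region):
    """Pretend the average brightness is a usable feature."""
    return sum(region) / len(region)


def closest_label(features, index, tolerance=20):
    """Pick the label whose stored value is nearest, within a tolerance."""
    label, value = min(index.items(), key=lambda item: abs(item[1] - features))
    return label if abs(value - features) <= tolerance else None
```

Keeping the stages separate means a face detector can be swapped for a product detector, or one recognition method for another, without touching the rest of the search.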

Now all we have to do is build a system that can perform these tasks.

Putting It All Together

We’ve outlined how an image-based search works, but we still haven’t said anything about actually building something that brings this concept to the web. If we look at the steps necessary to perform the search, it’s obvious that there’s a lot of image processing going on. There may also be a tremendous amount of mathematics involved in the recognition method. The whole process is computationally expensive and undoubtedly time consuming.

These issues raise a number of questions for anyone in the image-based search sphere. How long are users willing to sit and wait for processing to take place before they receive search results? Does processing need to take place asynchronously? Can image processing coexist with databases and web servers, or does it require separate hardware? What parts of the process can be done in a distributed manner?
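As one possible answer to the asynchronous question, the sketch below hands each submitted image to a background worker and gives the caller a job id to check later, so a web request can return immediately. The class and method names are hypothetical, and `search_fn` stands in for whatever slow search routine is being wrapped. A thread pool is used only to keep the example short; heavy image processing would more likely run in separate processes or on separate machines.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class SearchJobs:
    """Run slow searches in the background and let callers poll for results."""

    def __init__(self, search_fn, workers=2):
        self._search_fn = search_fn
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}

    def submit(self, image):
        """Queue a search and return a job id right away."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._search_fn, image)
        return job_id

    def status(self, job_id):
        """Report whether the job is pending, done, or failed."""
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "pending"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "results": future.result()}

    def wait(self, job_id, timeout=None):
        """Block until the job finishes (or the timeout expires), then report."""
        self._jobs[job_id].exception(timeout=timeout)
        return self.status(job_id)
```

A page could call `submit` when the photo is uploaded and then poll `status` every second or two, showing a progress message until the state changes from "pending".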

However, creating a compelling and useful service is far more important than all of these design challenges. Even if we can perform an image-based search, not everyone is going to want to identify strangers using photographs. Are people going to want to use photographs to find products for purchase (e.g. Like.com)? Are automatic tagging and a search index for personal photographs compelling (e.g. Mugr.com)? As it stands, image-based search is perhaps too new to have found its most compelling direction. Yet, it’s safe to say such a radical departure from the traditional search problem is bound to produce exciting and truly killer applications in the future.

© 2008 altsearchengines.com