Anatomy of Social Search According to Aardvark
In this post I’ll continue exploring social search and the concept of relevance in the social media context. Yesterday we heard from various sources, e.g. TechCrunch, that Google will acquire the social search start-up Aardvark. I will go through the main ideas behind Aardvark as presented in a forthcoming conference paper by the Aardvark team. The paper, Anatomy of a Large-Scale Social Search Engine, has been made available as a preview version on the Aardvark blog. It is ambitiously named after the classic paper by Brin and Page, The anatomy of a large-scale hypertextual Web search engine. The paper presents a social search engine which, according to the authors Damon Horowitz and Sepandar D. Kamvar, changes the whole paradigm of search from the old “Library paradigm” to a “Village paradigm”, where information is passed from person to person rather than from documents to persons.
The basic idea behind the social search engine is simple question-answer routing. A user poses a question to Aardvark, which locates the best possible person(s) to answer it. The question is put to the identified persons one by one until the asker receives a satisfactory answer. The process consists of four parts:
- Indexing: First, when a user signs up for Aardvark, the crawler crawls the user’s extended social network, which can be formed from the user’s existing networks in e.g. Facebook, Twitter and LinkedIn. Aardvark extracts topics from the crawled data and weights each topic according to how much expertise the user has in it. Topics are stored in an inverted index that lists each topic together with a scored list of users for that topic. Additionally, various quality features, like response time and answer quality for each user, are stored in the index when available.
- Queries: The user asks Aardvark a question through one of several user interfaces (web UI, Twitter, IM, SMS, etc.). The question is then processed to identify its main topic. The user is given an opportunity to review and edit the topic.
- Ranking and Relevance: Once the topic of the question is identified, Aardvark checks the index for the users most likely to give a satisfactory answer to the question. This is done by calculating two scores: a relevance score (Topic Expertise), the probability that a user can provide an answer on the topic of the question, and a quality score (Connectedness), the probability that the user provides a satisfactory answer regardless of the question. The potential answerers are then ranked by taking into account the relevance score, the quality score and the availability of the users.
- Question Routing and Answering: Once the best potential answerers have been identified, Aardvark contacts one or more of them and poses the question until a satisfactory answer is given. The answerer and the asker are given an opportunity to continue the discussion after the answer is given.
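To make the steps above a bit more concrete, here is a minimal sketch of an inverted topic index with ranking and one-by-one routing. It is my own illustration, not Aardvark's implementation: the names `TopicIndex`, `Candidate`, `rank` and `route` are hypothetical, multiplying the expertise score by the quality score is an assumed way of combining them, and treating availability as a simple on/off filter is likewise an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    user: str
    quality: float    # query-independent score in [0, 1] (assumed scale)
    available: bool   # assumption: availability is a simple yes/no flag


class TopicIndex:
    """Inverted index: topic -> {user: expertise score}."""

    def __init__(self):
        self._postings = defaultdict(dict)
        self._users = {}

    def add_user(self, user, topic_scores, quality, available=True):
        self._users[user] = Candidate(user, quality, available)
        for topic, score in topic_scores.items():
            self._postings[topic][user] = score

    def rank(self, topic, asker):
        """Return available users for a topic, best first."""
        scored = []
        for user, expertise in self._postings.get(topic, {}).items():
            cand = self._users[user]
            if user == asker or not cand.available:
                continue
            # Assumed combination: expertise times quality.
            scored.append((expertise * cand.quality, user))
        # Highest score first; ties broken alphabetically by user name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [user for _, user in scored]


def route(index, topic, asker, ask):
    """Ask candidates one by one until `ask` returns an answer."""
    for user in index.rank(topic, asker):
        answer = ask(user)
        if answer is not None:
            return user, answer
    return None


index = TopicIndex()
index.add_user("alice", {"hiking": 0.9}, quality=0.5)
index.add_user("bob", {"hiking": 0.6, "cooking": 0.8}, quality=0.9)
index.add_user("carol", {"hiking": 0.95}, quality=0.9, available=False)
print(index.rank("hiking", asker="dave"))  # ['bob', 'alice']
```

In this toy data bob outranks alice despite lower topic expertise, because his quality score is higher, and carol is skipped because she is not available. The `ask` callback in `route` stands in for whatever channel (IM, SMS, etc.) actually delivers the question.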
There are a number of challenges that Aardvark faces. First, the possibility of finding a satisfactory answer to a question depends on the size of the user’s social network that Aardvark is able to index. For someone with a limited number of connections, and especially connections that are not very active in social media themselves, finding a satisfactory answer on any topic can be hard. Maybe the forthcoming acquisition by Google can bring some form of solution to this problem, given the large amount of user data Google has gathered. Second, the possibility of a satisfactory answer depends on the goodwill of the potential answerers. There needs to be some form of benefit for the answerer in providing a good answer. For some, a good online reputation is enough, but for others something more concrete is needed.
On paper, Aardvark seems like a very interesting concept, and the statistics seem to show that its user base is constantly increasing (see the paper, p. 8). One of the questions that immediately came to my mind is whether the same concept could be successfully utilized in enterprise search as well.
Do you think Google has the potential to turn Aardvark into a success? Or will Aardvark development be discontinued and merged into Google’s other products, as happened to the microblogging service Jaiku?
Socially Relevant Search Results
I’ve been thinking quite a lot lately about how search engines could use information about users’ social networks to enhance the relevance of search results. As people’s social networks keep growing and the social media sites keep getting filled with information, search engines need to develop more intelligent ways to leverage the growing asset. As Michael Arrington wrote in his recent blog post:
“[...] the amount of spam and just general nonsense that is flooding all of these services is crippling. As a user, I spend far too much time weeding it all out to find the few gems of real content from people I care about.
And I end up missing a lot of important content that I want to know about.”
Arrington predicts that someone will eventually sort out the mess. But who? My guess would be Search.
As we have already seen over the past year, many web search companies, like Google and Yahoo, have rolled out implementations of real-time or social search engines. The idea behind most of these is to bring near real-time results from services like Twitter into the search results. I think this is a good start. Google has recently taken the idea a bit further by adding results from your social network to its search results when you search using their social search (see this post for more details). But why should search engines know about your social networks?
There are a number of reasons why your social network matters to relevance:
- Location: Many people live in several different places during their lives and are likely to have connections from many of these places in their social network. If I, for example, search for good bookstores, I would want to see results from my current hometown, Helsinki, Finland. But I might also be interested in bookstores in London, since I used to live there. By taking my social networks into consideration, search engines could infer that since I have connections in London, I’m also interested in that location.
- Interests: Many of the connections in my social network share some interests with me. Let’s assume that most of my friends are interested in basketball but the search engine doesn’t know whether I like basketball or not. It is quite likely that I’m also interested in basketball. In fact, if I search for sports articles, I would expect to get results about basketball. People tend to share interests with their friends.
- Contacts: Searching for people on the internet can be a pain. If I, for example, want to hire a building contractor, I would probably try searching for one with a web search engine. But how do I know if the people found with search are good and trustworthy? If, on the other hand, one of my contacts knows a contractor, it would be great if search returned that contractor at the top of the result list. At least I could then ask my friend for an opinion about the contractor.
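As a rough illustration of the three signals above, here is a small sketch of how a search engine might re-rank ordinary results using a user's social network. Everything in it is hypothetical: the `Result` and `SocialProfile` classes, the `social_score` function and the flat `boost` value are assumptions made for the example, not a description of how any real search engine works.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    title: str
    base_score: float                                # ordinary relevance score
    location: str = ""
    topics: set = field(default_factory=set)
    endorsed_by: set = field(default_factory=set)    # people who know the result


@dataclass
class SocialProfile:
    friend_locations: set    # places where my connections live
    friend_interests: set    # interests common among my connections
    contacts: set            # names of my direct contacts


def social_score(result, profile, boost=0.2):
    """Add a flat (assumed) boost for each social signal that matches."""
    score = result.base_score
    if result.location in profile.friend_locations:
        score += boost
    if result.topics & profile.friend_interests:
        score += boost
    if result.endorsed_by & profile.contacts:
        score += boost
    return score


def rerank(results, profile):
    """Order results by their socially adjusted score, best first."""
    return sorted(results, key=lambda r: social_score(r, profile), reverse=True)


me = SocialProfile(
    friend_locations={"Helsinki", "London"},
    friend_interests={"basketball"},
    contacts={"mikko"},
)
results = [
    Result("Bookstore in Paris", 0.7, location="Paris"),
    Result("Bookstore in London", 0.6, location="London"),
]
print([r.title for r in rerank(results, me)])
# ['Bookstore in London', 'Bookstore in Paris']
```

The London bookstore starts with a lower base score but ends up first, because I have connections in London. A real system would presumably learn how much weight each signal deserves rather than use one fixed boost.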
I could probably think of more examples of how social networks can enhance the relevance of search results, but I think you get the point. It will be interesting to see how these social search engines develop and how they are adopted by users.
What do you think?
