Wednesday, February 6, 2008

Overtagging is not a virtue

Recently I used Flickr to search for some beautiful photos. I tried different tags and did indeed find amazing pictures. What I also noticed was that the photographs Flickr returned didn't always correspond to the tag I used to find them (for instance, try the tag "search").

This got me wondering. If you tag an article, photo, blog, et cetera, there are always tags that hit the bull's-eye and tags that land in the outer ring. The underlying principle: the more tags you add, the greater the chance that the tagged item is found.

This is true, but in information retrieval there's always a trade-off between precision and recall. What you want is to score high on both (get exactly what you want, and a lot of it), but that's difficult to achieve. As a matter of fact, the more outer-ring tags there are, the more noise you get. If every user adds an abundance of tags, the noise grows. Tom Gruber used two pictures in a presentation that explain this quite beautifully.

"Noisy" Tagging

"Clear" Tagging

Folksonomies thrive on an abundance of tagging, but can there be such a thing as "overtagging"? Is there a zero-sum game in tagging that leads to higher recall but lower precision?
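To make the trade-off concrete, here is a minimal Python sketch. The photo collection, the tags, and the helper names (`search`, `precision_recall`) are all made up for illustration; this is a plain exact-match tag search, not how Flickr actually works.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def search(collection, tag):
    """Return every photo that carries the given tag (exact match)."""
    return {photo for photo, tags in collection.items() if tag in tags}


# Hypothetical collection, tagged with bull's-eye tags only.
clear = {
    "sunset_beach": {"sunset", "beach"},
    "sunset_city": {"sunset", "skyline"},
    "sunset_hills": {"dusk", "hills"},  # a sunset photo, tagged differently
    "cat_portrait": {"cat"},
    "dog_park": {"dog", "park"},
}

# The same photos after their owners added outer-ring tags as well.
noisy = {
    "sunset_beach": {"sunset", "beach"},
    "sunset_city": {"sunset", "skyline"},
    "sunset_hills": {"dusk", "hills", "sunset"},
    "cat_portrait": {"cat", "sunset"},        # merely taken around sunset
    "dog_park": {"dog", "park", "sunset"},    # merely taken around sunset
}

# Photos a person would say are really about a sunset.
relevant = {"sunset_beach", "sunset_city", "sunset_hills"}

clear_scores = precision_recall(search(clear, "sunset"), relevant)
noisy_scores = precision_recall(search(noisy, "sunset"), relevant)
print("clear tagging:", clear_scores)
print("noisy tagging:", noisy_scores)
```

In this toy collection the sparse tagging returns only sunset photos but misses one of them, while the generous tagging finds all three at the price of two photos that are not really about a sunset.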

Conceptual search engines like Collexis give you the opportunity to score a tag for relevance, thus putting the user in the driver's seat for the weighting factors. I'm not familiar with the algorithm used by Flickr, but whether or not it weights tags for relevance, I do think that overtagging is not a virtue. If each user tags their items "as spot on as possible", the whole tagosphere would prosper from it.
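As a rough illustration of what scoring tags for relevance could buy you, here is a small Python sketch. It is an assumption-laden toy: the weights, the photo names, and the `ranked_search` function are invented for this example, and it does not represent the actual algorithm of Collexis or Flickr.

```python
def ranked_search(collection, tag, min_weight=0.0):
    """Return (photo, weight) pairs for a tag, most relevant first.

    Photos whose weight for the tag is below min_weight are dropped.
    """
    hits = [
        (photo, tags[tag])
        for photo, tags in collection.items()
        if tag in tags and tags[tag] >= min_weight
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical collection: each tag carries a relevance weight from 0 to 1.
weighted = {
    "sunset_beach": {"sunset": 1.0, "beach": 0.8},
    "cat_portrait": {"cat": 1.0, "sunset": 0.2},
    "dog_park": {"dog": 1.0, "park": 0.7, "sunset": 0.1},
}

# Everything tagged "sunset", bull's-eye hits first.
everything = ranked_search(weighted, "sunset")
print(everything)

# Only photos for which "sunset" is a bull's-eye tag.
bulls_eye = ranked_search(weighted, "sunset", min_weight=0.5)
print(bulls_eye)
```

With weights, outer-ring tags no longer have to be forbidden: they can sink to the bottom of the results, or be filtered out when the searcher asks for precision.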

Does this mean that Flickr should build a taxonomy of tags? No, it doesn't (that's old-paradigm thinking); it's just that too much of something is never a good thing. What it does mean is that there should be a governance structure for the tagosphere that lets it grow as emergently as possible, but not out of bounds.

4 comments:

Israel said...

Are you aware of ?

I think that bridging the gap between Taxonomy and Folksonomy is one of the main challenges facing the Semantic Web. It seems like Reuters' new API is aiming there.

Israel said...

oops, it turned out really messy. let me rephrase:

Are you aware of Flickr's Machine Tags: http://www.flickr.com/groups/api/discuss/72157594497877875/

I think that bridging the gap between Taxonomy and Folksonomy is one of the main challenges facing the Semantic Web. It seems like Reuters' new API is aiming there: http://www.readwriteweb.com/archives/reuters_calais.php

Vincent said...

Indeed,

I recently saw a good presentation on this issue by Tom Gruber: http://tomgruber.org/writing/social-meets-semantic-web.htm

I see many "or/or discussions", but it's an and/and one. In my opinion, it's all about finding the correct balance and using an information retrieval strategy wisely.

If you switch the domain from the internet to the enterprise, the need for this strategy becomes even more apparent (http://ynno.blogspot.com/2008/01/enterprise-2.html)

Israel said...

Thank you Vincent for the useful and enlightening links. Just got an invitation to this Forrester teleconference: http://www.forrester.com/Teleconference/Overview/1,5158,2181,00.html

Will you attend it by any chance?