Thursday, May 11, 2006

Google plug-ins have arrived

Six months ago, Dave Winer suggested a plug-in architecture for search engines. He wanted to mix results from specialized search engines like Sphere and memeorandum in with all his searches.


Well, Google apparently listened. It just rolled out a plug-in interface, called "Subscribed Links." A Web publisher can point Google at a special XML feed containing a series of "ResultSpecs." Each ResultSpec (example) is a user query string ("extraordinary rendition") plus the URL, title etc. to return when that text is entered ("Outsourcing torture," http://www.newyorker.com/fact/content/?050214fa_fact6, " ... had been sent to Syria on orders from the U.S. government, under a secretive program known as 'extraordinary rendition.' This program had been devised ... ").
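
To make that concrete, here is a rough Python sketch of what a ResultSpec amounts to as described above: one query string mapped to one canned result. The class, its field names, and the `lookup` helper are invented for illustration. The real feed is XML, and this is not Google's actual format or matching logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResultSpec:
    # Hypothetical field names; the real XML feed may name these differently.
    query: str
    title: str
    url: str
    snippet: str


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def lookup(specs: List[ResultSpec], user_query: str) -> Optional[ResultSpec]:
    """Return the first spec whose query equals the user's query, or None.

    Exact matching is an assumption here; it mirrors how the feature
    appears to behave, not a documented rule.
    """
    wanted = normalize(user_query)
    for spec in specs:
        if normalize(spec.query) == wanted:
            return spec
    return None


specs = [
    ResultSpec(
        query="extraordinary rendition",
        title="Outsourcing torture",
        url="http://www.newyorker.com/fact/content/?050214fa_fact6",
        snippet="... under a secretive program known as 'extraordinary rendition.' ...",
    )
]
```

With this sketch, `lookup(specs, "Extraordinary Rendition")` finds the New Yorker piece, while `lookup(specs, "rendition")` returns `None`, because nothing but the full query string was registered.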


This looks promising. At work, I'd really love to plug results from WSJ.com, Factiva and Lexis Nexis into my Google results, which is why I loved Dave's plug-in idea when he first proposed it.


But there are some serious limitations that give me pause about this architecture:



  • Only one result per plug-in per query, it appears. This is silly. If WSJ.com, for example, spits up three good hits, I want all of them, not just one. If WSJ.com starts spamming me with too many hits, I'll just unsubscribe, problem solved.


  • If you want a published document to be a result for more than one query term, the interface gets a lot less simple.


  • I'm not sure about this, but it appears as though you have to specify the exact query terms for each result, instead of just telling Google the various keywords associated with a particular result. So if a subscription newspaper, for example, tokenizes a typical news story, it would have to associate that story not just with the dozens of keywords contained in the story, but also with the vastly larger number of possible combinations of those keywords. For two keywords that already means four entries: one for "torture," one for "citizen," one for "torture citizen" and one for "citizen torture." The publisher may be able to get around this with a clever regular expression -- Perl regexes are supported -- but that's a little funky.
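
To put a number on that last worry, here is a small Python sketch that enumerates every exact query string a publisher would need for a set of keywords, next to the kind of regular expression that might stand in for all of them. Both function names are made up for this example. Python's `re` module stands in for the Perl regexes mentioned above, and whether Subscribed Links would accept a pattern like this one is an assumption I have not tested.

```python
import re
from itertools import permutations


def exact_queries(keywords):
    """List every ordered, non-empty selection of distinct keywords as a query string."""
    queries = []
    for size in range(1, len(keywords) + 1):
        for combo in permutations(keywords, size):
            queries.append(" ".join(combo))
    return queries


def any_order_pattern(keywords):
    """Build a regex matching queries made up only of the given keywords, in any order.

    Note this is looser than the exact list: it also accepts repeats
    such as "torture torture".
    """
    alternatives = "|".join(re.escape(word) for word in keywords)
    return re.compile(
        rf"^(?:{alternatives})(?:\s+(?:{alternatives}))*$",
        re.IGNORECASE,
    )


print(exact_queries(["torture", "citizen"]))
print(len(exact_queries(["torture", "citizen", "syria", "program", "government"])))
```

Two keywords produce the four entries from the example above. Five keywords already produce 325, and a story with dozens of keywords is hopeless to enumerate by hand, which is why the regex route is tempting even if it is funky.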



By the way, in my original October post I noted as an aside: "search has become social software and we just have not noticed it yet. PageRank is social software in a crude form." As it turns out, Google's plug-in architecture was rolled out as part of a social search system called Google Co-op.
