Thursday, May 11, 2006

Google plug-ins have arrived

Six months ago, Dave Winer suggested a plug-in architecture for search engines. He wanted to mix results from specialized search engines like Sphere and memeorandum in with all his searches.


Well, Google apparently listened. It just rolled out a plug-in interface, called "Subscribed Links." A Web publisher can point Google at a special XML feed containing a series of "ResultSpecs." Each ResultSpec (example) is a user query string ("extraordinary rendition") plus the URL, title etc. to return when that text is entered ("Outsourcing torture," http://www.newyorker.com/fact/content/?050214fa_fact6, " ... had been sent to Syria on orders from the U.S. government, under a secretive program known as 'extraordinary rendition.' This program had been devised ... ").


This looks promising. At work, I'd really love to pull results from WSJ.com, Factiva and LexisNexis into my Google results, which is why I loved Dave's plug-in idea when he first proposed it.


But there are some serious limitations that give me pause about this architecture:



  • Only one result per plug-in per query, it appears. This is silly. If WSJ.com, for example, spits up three good hits, I want all of them, not just one. If WSJ.com starts spamming me with too many hits, I'll just unsubscribe, problem solved.


  • If you want a published document to be a result for more than one query term, the interface gets considerably more complicated.


  • I'm not sure about this, but it appears that you have to specify the exact query string for each result, instead of just telling Google the various keywords associated with a particular result. So if a subscription newspaper, for example, tokenizes a typical news story, it would have to associate that story not just with the dozens of keywords contained in the story, but also with the vastly larger set of possible keyword combinations, in every order. For example, it would need one entry for "torture," one for "citizen," one for "torture citizen" and one for "citizen torture." The publisher may be able to get around this with a clever regular expression (Perl regexes are supported), but that's a little funky.
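To put rough numbers on that last point, here is a minimal sketch in Python. It is not Google's interface or feed format; the function names (exact_queries, keyword_pattern) are made up for illustration, and it assumes a query is just keywords separated by whitespace. It lists every ordered keyword combination a publisher would need under exact matching, then builds the kind of single regular expression that might stand in for all of them.

```python
import re
from itertools import permutations


def exact_queries(keywords, max_terms):
    """List every ordered combination of 1..max_terms distinct keywords.

    Hypothetical helper: each string is one exact query a publisher
    would have to register if only exact matches were allowed.
    """
    queries = []
    for n in range(1, max_terms + 1):
        for combo in permutations(keywords, n):
            queries.append(" ".join(combo))
    return queries


def keyword_pattern(keywords):
    """Build one regex matching any whitespace-separated run of the keywords.

    Assumption: a single pattern can replace the exact strings. Unlike
    exact_queries, it also accepts repeats such as "torture torture".
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"(?:{alternatives})(?:\s+(?:{alternatives}))*")


print(exact_queries(["torture", "citizen"], 2))
# ['torture', 'citizen', 'torture citizen', 'citizen torture']

# With a dozen keywords the list grows quickly, even capped at three terms.
story_keywords = [f"kw{i}" for i in range(12)]
print(len(exact_queries(story_keywords, 3)))
# 1464

pattern = keyword_pattern(["torture", "citizen"])
print(bool(pattern.fullmatch("citizen torture")))  # True
print(bool(pattern.fullmatch("torture policy")))   # False
```

Twelve keywords capped at three-term queries already means 1,464 exact strings, while the regex stays one line per story. Whether a pattern like this would actually behave the same way inside Google's matcher is something I have not tested.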



By the way, in my original October post I noted as an aside: "search has become social software and we just have not noticed it yet. PageRank is social software in a crude form." As it turns out, Google's plug-in architecture was rolled out as part of a social search system called Google Co-op.
