Finding ISBNs

All Consuming has the unenviable task of finding ISBNs in weblogs' links. It spiders the weblogs.com list and looks for different URL schemes, like Amazon's (http://www.amazon.com/exec/obidos/asin/ISBN/), removing the [...] and producing the list of top [...].

Originally it only looked for Amazon URLs, but fairly recently Erik made it also look for a plain isbn=ISBN in the link. That adds a few more sites, including links directly to an All Consuming product page. However, there are other sites, like isbn.nu and Powell's, that still aren't included.
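To make the two URL schemes concrete, here is a rough Python sketch of how a link could be matched against them. This is not Erik's actual code: the regular expressions, the function name extract_isbn, and the assumption that an ISBN is nine digits followed by a digit or X are all mine.

```python
import re

# Hypothetical patterns, written only to illustrate the two schemes above.
# Assumption: an ISBN is ten characters, nine digits plus a digit or X.
AMAZON_ASIN = re.compile(
    r"amazon\.com/exec/obidos/asin/([0-9]{9}[0-9Xx])(?![0-9A-Za-z])",
    re.IGNORECASE,
)
ISBN_PARAM = re.compile(
    r"isbn=([0-9]{9}[0-9Xx])(?![0-9A-Za-z])",
    re.IGNORECASE,
)


def extract_isbn(url):
    """Return the ISBN found in a link by either scheme, or None."""
    for pattern in (AMAZON_ASIN, ISBN_PARAM):
        match = pattern.search(url)
        if match:
            # Normalise a lowercase check character to "X".
            return match.group(1).upper()
    return None
```

Covering another site such as isbn.nu or Powell's would mean adding one more pattern to the tuple, once its URL layout is known.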

All Consuming doesn't simply look for ISBN-shaped numbers because that would catch a lot of extra things, like item or comment IDs. Still, I'm curious how many false positives there would be, and whether they can be filtered out according to the rules for assigned ISBNs, or by what Amazon carries. So I think I might write a little Python script or something that would look for all ISBN-shaped numbers in links, and figure out whether they're actual ISBNs or not using Amazon's web service API.
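As a first cut at that script, here is a sketch that covers only the "ISBN rules" half, using the standard ISBN-10 check digit: the ten characters, weighted 10 down to 1, must sum to a multiple of 11, with X standing for 10. The function names and the regular expression are my own, and the Amazon lookup is left out entirely because I haven't worked out that side yet.

```python
import re

# Any run of nine digits plus a digit or X that is not part of a longer
# alphanumeric run. Assumption: this is what "ISBN-shaped" means here.
ISBN_SHAPED = re.compile(r"(?<![0-9A-Za-z])([0-9]{9}[0-9Xx])(?![0-9A-Za-z])")


def is_valid_isbn10(candidate):
    """Check the ISBN-10 check digit (weighted sum divisible by 11)."""
    if not re.fullmatch(r"[0-9]{9}[0-9Xx]", candidate):
        return False
    total = 0
    for position, char in enumerate(candidate):
        value = 10 if char in "Xx" else int(char)
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


def isbn_candidates(url):
    """Return the ISBN-shaped numbers in a link that pass the check digit."""
    return [c.upper() for c in ISBN_SHAPED.findall(url) if is_valid_isbn10(c)]
```

The check digit alone can't settle it, since roughly one in eleven arbitrary ten-digit numbers will pass by chance. Whatever survives this filter would still need to be looked up at Amazon.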