In my last post, I discussed a database overhaul that significantly improved access times for the Technorati link tracking service. The result is faster page loads and faster indexing by the web spider, which means fresher information on the site.
But that's not all that I did in my recent weekend reengineering at Technorati. I also:
- Added <guid> fields to the RSS feeds. This RSS 2.0 element gives each item in a feed a unique identifier, which makes it a perfect fit for the feeds that Technorati produces: it lets RSS aggregators identify each post and link uniquely. When you get an RSS watchlist, it is filled with up-to-the-second information on who is linking to you. Each item also includes text noting when the link was created and when the blog was created, so the text inside each item changes every time you check your feed. Without a stable identifier, an aggregator that compares item text could treat every changed item as a new one. The <guid> field lets aggregators keep track of these items and, for example, mark them as read. If you've got a Link Cosmos as big as Dave's or Doc's, that seriously helps cut through the clutter.
- Fixed the blog indexing engine so that a blog reachable at two or more addresses is recognized as a single blog. For example, take a look at the awesome bOingbOing blog. Some people link to it at www.boingboing.net, and some link to it at boingboing.net. The links go to the same place, but the old Technorati code thought there were two blogs there. That's fixed now, which also makes the Technorati Top 100 and Interesting Recent Blogs lists more accurate.
- Fixed the Link Cosmos display engine so that links you create to your own blog don't show up in your Link Cosmos. I got lots of complaints about that bug. I think it is most prevalent among people who use Radio with its Categories option to publish multiple blog channels to different directories on the same site, because that setup generates lots of self-referential links whenever the blog is updated.
- Added a Creative Commons license. You (and your browsing tools) will now see the Creative Commons copyright license for each page at the bottom of every Technorati page. The license permits others to copy, distribute, display, and perform the work, provided they give attribution and do not use it for commercial purposes. In other words, you can't make a knock-off Technorati site by pulling all the content and replacing Technorati with your name, and you can't use Technorati results for commercial purposes unless we work out a deal. Of course, that license doesn't apply to the RSS feeds that you get when you purchase a watchlist. You can use those for commercial purposes all you like.
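To illustrate why the <guid> change in the first item above matters, here is a minimal Python sketch of how an aggregator might track read items by guid. The feed contents, the guid values (link-12345, link-12346), and the function name unread_items are hypothetical, not Technorati's actual feed format; the point is only that an item's description can change between fetches while its guid stays the same.

```python
import xml.etree.ElementTree as ET

# Two snapshots of the same HYPOTHETICAL watchlist feed, fetched an hour apart.
# The <description> text changes between fetches, but each <guid> stays the same.
FEED_V1 = """<rss version="2.0"><channel><title>Watchlist</title>
<item><title>New link from example.org</title>
<description>Link created 5 minutes ago</description>
<guid isPermaLink="false">link-12345</guid></item>
</channel></rss>"""

FEED_V2 = """<rss version="2.0"><channel><title>Watchlist</title>
<item><title>New link from example.org</title>
<description>Link created 65 minutes ago</description>
<guid isPermaLink="false">link-12345</guid></item>
<item><title>New link from example.net</title>
<description>Link created 2 minutes ago</description>
<guid isPermaLink="false">link-12346</guid></item>
</channel></rss>"""


def unread_items(feed_xml, seen_guids):
    """Return (guid, title) pairs for items whose guid has not been seen yet,
    and record them in seen_guids (the aggregator's 'mark as read')."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid is None:
            # No guid: fall back to the item text, which in this feed
            # changes on every fetch, so the item would look new each time.
            guid = (item.findtext("title") or "") + (item.findtext("description") or "")
        if guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append((guid, item.findtext("title")))
    return fresh


seen = set()
print(unread_items(FEED_V1, seen))  # [('link-12345', 'New link from example.org')]
print(unread_items(FEED_V2, seen))  # [('link-12346', 'New link from example.net')]
```

An aggregator that keyed on the description text instead would report the example.org link as new again on the second fetch, even though nothing about the link itself had changed.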
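The duplicate-address fix in the second item above can be sketched as mapping every address to one canonical key before counting blogs. This Python sketch is an assumption, not the rule Technorati's indexer actually uses: the function name canonical_blog_key is hypothetical, and it assumes that the www. and bare forms of a host always serve the same blog, which holds for bOingbOing but is not guaranteed for every site.

```python
from urllib.parse import urlsplit


def canonical_blog_key(url):
    """Map different addresses for the same blog onto one key (HYPOTHETICAL rule).

    Assumes that 'www.' and the bare domain always serve the same blog, and
    ignores the scheme, host case, port, and any trailing slash.
    """
    if "://" not in url:
        url = "http://" + url  # tolerate links written without a scheme
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # .hostname drops the port
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    return host + path


links = [
    "http://www.boingboing.net/",
    "http://boingboing.net",
    "boingboing.net/",
    "http://BoingBoing.net/",
]
blogs = {canonical_blog_key(u) for u in links}
print(blogs)  # {'boingboing.net'}
```

A real indexer would probably also need to confirm that both addresses actually serve the same content before merging them, rather than relying on the hostname alone.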
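For the Link Cosmos fix in the third item above, one way to detect self-referential links is to treat a link as coming from your own blog when either address lies inside the other. The Python sketch below is hypothetical: the function names, the radio.example.com and other.example.org addresses, and the path-prefix rule are illustrative assumptions, not the actual Technorati code. Comparing path prefixes rather than whole hosts matters on shared hosts, where many unrelated blogs can live in different directories of the same server.

```python
from urllib.parse import urlsplit


def blog_path(url):
    """Split a URL into [host, dir1, dir2, ...], ignoring the scheme, host case,
    a leading 'www.', and empty segments (HYPOTHETICAL normalization)."""
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""  # already lowercased by urlsplit
    if host.startswith("www."):
        host = host[len("www."):]
    return [host] + [seg for seg in parts.path.split("/") if seg]


def is_self_link(source_url, target_url):
    """True if one address lies inside the other, e.g. a category channel
    or archive page published under the same blog's directory."""
    a, b = blog_path(source_url), blog_path(target_url)
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    return longer[:len(shorter)] == shorter


def link_cosmos(target_blog, inbound_sources):
    """Drop sources that are really the target blog linking to itself."""
    return [src for src in inbound_sources if not is_self_link(src, target_blog)]


sources = [
    "http://radio.example.com/0100001/categories/tech/",   # author's own category channel
    "http://radio.example.com/0100001/2003/05/12.html",    # author's own archive page
    "http://radio.example.com/0100002/",                   # different blog, same host
    "http://other.example.org/",                           # unrelated blog
]
print(link_cosmos("http://radio.example.com/0100001/", sources))
# ['http://radio.example.com/0100002/', 'http://other.example.org/']
```

The prefix rule keeps the link from 0100002 even though it shares a host with the target, while still hiding the author's category channels and archive pages.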
Here's one idea I've been toying with: Would you be interested in viewing graphs of the number of incoming blogs/links to a site over time? It would be a great way to track the interest and authority of a site as time passes. Would you be willing to help subsidize the work necessary to build it and store all the data? It's not something that I could work on right away (Sputnik work is priorities #1, #2, and #3 for me right now), but I'd be interested in your thoughts. Leave a comment below and let me know.