New and Improved! Technorati Keyword Search...

Technorati just released to the world the completely rewritten keyword search system we've been working on, and we're soliciting feedback!
The major improvements of this release are:

  • Ability to keep up with the ever-increasing number of new weblogs and weblog updates (over 11,000 new weblogs created and over 100,000 updates every day). Median time from post to live index is 7 minutes. Over 1.6 million sources tracked. I know of no other engine that even comes close.
  • Post-level search, which means we can add synthetic RSS/Atom/whatever feeds whenever we're ready
  • Keyword highlighting of search terms
  • Permalinks to each post (well, they *should* work)
  • Full integration with the current Link Cosmos search - it uses a pretty smart algorithm to figure out if you're searching on a URL or not - and if it sees a URL, it will do a Link Cosmos search by default, otherwise it performs a keyword search. You can also force a search type using a link on the results page, in case we guess wrong.

  • Some advanced search capabilities, including using double quotes to search phrases, + as the AND operator, and - as the NOT operator:

    e.g. "Janet Jackson" +timberlake -grammy
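
For the curious, here is a minimal sketch of how the URL-versus-keyword guess and the operator parsing described above could work. It is written in Python and is not Technorati's actual algorithm: the regular expressions and the names looks_like_url, parse_keyword_query, and route_query are all assumptions made up for illustration, and the treatment of terms with no operator as plain terms is a guess as well.

```python
import re

# Heuristic only (an assumption, not Technorati's real rule): an optional
# http(s):// scheme, a dotted hostname, then an optional path or query part.
_URL_RE = re.compile(
    r"^(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:[/:?#]\S*)?$",
    re.IGNORECASE,
)

# One token: an optional + or - operator, then a "quoted phrase" or a bare word.
_TOKEN_RE = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def looks_like_url(query):
    """Guess whether the whole query is a URL rather than keywords."""
    return bool(_URL_RE.match(query.strip()))


def parse_keyword_query(query):
    """Split a keyword query into plain, required (+), and excluded (-) terms."""
    parsed = {"terms": [], "required": [], "excluded": []}
    for op, phrase, word in _TOKEN_RE.findall(query):
        term = phrase or word
        if not term:
            continue  # skip an empty quoted phrase such as ""
        if op == "+":
            parsed["required"].append(term)
        elif op == "-":
            parsed["excluded"].append(term)
        else:
            parsed["terms"].append(term)
    return parsed


def route_query(query):
    """Return ("cosmos", url) for URL-like input, else ("keyword", parsed terms)."""
    if looks_like_url(query):
        return ("cosmos", query.strip())
    return ("keyword", parse_keyword_query(query))
```

With this sketch, route_query("www.cnn.com") picks the cosmos search, while the example query above comes back as a keyword search with "Janet Jackson" as a phrase, timberlake required, and grammy excluded. A heuristic like this will sometimes guess wrong, which is why a link to force the search type is a sensible escape hatch.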

Oh, and we've made some significant speedups on getting the cosmos of big sites, like the New York Times, CNN, Google, Yahoo, and bOingbOing. Give 'em a whirl. Amazon is my favorite: it is like getting a quick glimpse into people's buying habits at any given time of the day.

We're looking for your feedback! Some areas where we already know we need improvement: Response time is important, and our goal is to get to under one second search times. For now, most keyword searches run in under 10 seconds. There are occasional duplicate posts in the database, and we're working on cleaning those out as well. We're also getting the search XML APIs ready to go, but they aren't quite ready to launch today.

Please let us know what else we can do to improve!

The new search is up on the Technorati site, so go and give it a try, and by all means, let us know what you think.

Technorati is hiring!

Technorati is looking for great engineers and product folks, including an Infrastructure Engineer/Lead, UI Engineer, Operations Infrastructure Engineer, Product Engineer, Director of Product Marketing / Production, and Director of Sales / Syndication. If you have the attitude to lay your ego at the door and work your ass off and the chops to make incredible things happen with a great team in San Francisco, we're looking for you. Don't email me directly - send all inquiries to jobs@technorati.com.

Giving Ecto a try

Just downloaded Ado's update to Kung-Log, called Ecto. Seems to have a nice and easy setup, and the posting interface appears pretty clean. Automatic spell-checking is there as well, which is a nice feature too. If only it weren't available only for this lousy Mac hardware. I've said it before and I'll say it again: If IBM and Apple teamed up and released OSX on a Thinkpad T40, I'd buy one in a heartbeat.

Anyway, that's not Ecto's fault. Kudos to Adriaan on a job well done.

Technorati Hacks at ETCon

I'm speaking at the O'Reilly Emerging Technologies Conference! My talk is called "Technorati Hacks", and it is at 11:00 AM on Tuesday, February 10 in the Plaza room. This is right after the opening keynote, so I'm really excited to be in a "lead-off" position. And I'll be followed by the excellent Liz Lawley at 11:45 AM, talking about "Breaking Into the Boys' Club: How Diversifying Your Team Can Expand Your Market". If you're coming to the show, leave a comment or a trackback here - are there any areas or topics you'd like me to cover or explain? I've got a few fun surprises up my sleeve...

What is Technorati?

If you're one of the tens of thousands of people who use Technorati every day, you'll notice that most of our changes (on the new beta site so far) have been under the hood. Changes to the body have been minimal. As a result, we've been scratching our heads because we've never explained exactly what Technorati *is*. For that matter, we've never explained much about what a "cosmos" is, either -- even though that's what Technorati finds in its searches.

So I thought it would make sense to ask you what Technorati is. Is it a search engine for blogs? A conversation engine? Or something else again?

Same with "cosmos." Is there a more self-explanatory word for what Technorati finds? Or a better way to say exactly what "cosmos" means?

Let us know. We'd like to hear from you. Thanks!

New Technorati Infrastructure beta test!

Folks,

After 2 months of painstaking effort, I'm proud to announce the new Technorati infrastructure is up and ready for use.

Please have a look, and tell us what you think:

http://beta.technorati.com/

We focused 100% of our time on completely refurbishing our underlying event engine - essentially taking a Volkswagen engine out and putting a Ferrari engine in. This new engine sports:

1) Much faster indexing - the median amount of time it takes from when someone posts something on their weblog to when it is captured and searchable via our live database is 7 minutes.

2) Much faster querying - our goal is to have every search query take less than a second, even as the database is being continuously updated. We added a query timer at the top of every results page so you can judge for yourself.

3) Much more scalable - We built this distributed database system to scale. As we track more events, we add more machines to scale. As our user traffic increases, we add more machines to scale. This should continue to work for quite some time, so we're eager to test under load.

4) Much better internationalization support - The database is entirely in UTF-8, a Unicode encoding that covers a significant number (well, all) of non-English languages, including Japanese, Farsi, Hebrew, and many others. You can see results in multiple languages all on the same page. Localization should be significantly easier.

5) A new, smarter spider/crawler, which understands weblog posts and blogrolls much better than our old spider. You'll note that on our results pages, many results offer a "Read Full Post" capability, which takes you directly to the entire microcontent post that created the link.

6) A redone results page, which should load faster, and is designed for non-browser usage as well. Lots has been moved to CSS, and we've added a nifty pager widget at the top and bottom of each page of results.
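
If you want to check numbers like these against your own setup, the two measurements mentioned above (per-query time and median post-to-index latency) are easy to sketch. The Python below is only an illustration and not our production code: timed_search, median_index_latency_minutes, and the (posted_at, indexed_at) pairs are hypothetical names and data shapes assumed for the example.

```python
import time
from datetime import datetime
from statistics import median


def timed_search(search_fn, query):
    """Run search_fn(query) and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = search_fn(query)
    elapsed = time.perf_counter() - start
    return results, elapsed


def median_index_latency_minutes(events):
    """Median minutes between posting and indexing.

    events is an iterable of (posted_at, indexed_at) datetime pairs.
    """
    latencies = [
        (indexed_at - posted_at).total_seconds() / 60.0
        for posted_at, indexed_at in events
    ]
    return median(latencies)
```

The elapsed value is the kind of figure a query timer on a results page would display. The median is a reasonable statistic to quote for indexing delay because a handful of very slow crawls won't drag it around the way they would an average.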

Please go and use the site - and send us feedback.

Some known issues: There are a few areas where we're still filling out content, fixing bugs and layout, like in the top 100 page, breaking news, current events, and other pages. We're looking to find showstopper bugs or problems before we move this beta infrastructure over to the production site. So, don't fret if a page you like is currently missing or if the top 100 is messed up; we're fixing that. You may also see a change in your inbound blogs/links numbers, but that is primarily because we're still bringing the new database up to speed, so we expect some of the numbers to differ.

Thanks again for your time and patience, and on behalf of the entire Technorati team, we thank you for all of your support. We're really looking forward to your feedback.

How to limit comments to one week after posting

I've been thinking a lot recently about the issues we'll be facing as blogging continues to grow and be successful. One of the spam fighting tools (metaphor: club, not lojack) I implemented was a restriction on comment posting to posts that have been created in the past week. A few people have written and asked how I implemented the time restriction on comments.

It is actually very simple, once you've got a SQL backend (see previous post) for your blog. Here's the script I have running out of my crontab:

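#!/usr/bin/perl
# Turn off comments on Movable Type entries that are more than 7 days old.

use strict;
use warnings;
use DBI;

my $dbname = 'DATABASENAME';
my $hostname = 'localhost'; # Change this if the db is on another box
my $dbuser = 'DBUSER';
my $dbpass = 'DBPASS';

# RaiseError makes DBI die on any failure, so cron will report the problem.
my $dbh = DBI->connect("dbi:mysql:database=$dbname;host=$hostname", $dbuser, $dbpass, { RaiseError => 1 });

$dbh->do("update mt_entry set entry_allow_comments='0' where entry_created_on < date_sub(NOW(),interval 7 day)");

$dbh->disconnect;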

I run this every hour. Basically, it makes a single SQL call, which updates the entry_allow_comments column on mt_entry to turn off comment posting on entries that are older than 7 days. Use and enjoy.
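
If you'd like to see what that update does before pointing it at your real blog, here is a minimal sketch of the same idea in Python (not the Perl I actually run), using an in-memory SQLite database as a stand-in for MySQL. The table here is a trimmed-down assumption with only three columns, the sample rows are made up, and the cutoff is computed in Python because SQLite has no date_sub(). The names make_demo_db and close_old_comments are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def make_demo_db():
    """Build a tiny stand-in for mt_entry with one old and one recent post."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mt_entry ("
        "entry_id INTEGER PRIMARY KEY, "
        "entry_created_on TEXT, "
        "entry_allow_comments INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mt_entry VALUES (?, ?, ?)",
        [
            (1, "2004-01-29 09:00:00", 1),  # made-up older post
            (2, "2004-02-06 09:00:00", 1),  # made-up recent post
        ],
    )
    conn.commit()
    return conn


def close_old_comments(conn, now, max_age_days=7):
    """Disable comments on entries older than max_age_days; return rows changed."""
    cutoff = (now - timedelta(days=max_age_days)).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute(
        "UPDATE mt_entry SET entry_allow_comments = 0 "
        "WHERE entry_created_on < ? AND entry_allow_comments <> 0",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = make_demo_db()
    changed = close_old_comments(db, datetime(2004, 2, 8, 12, 0, 0))
    print("entries closed:", changed)
```

The extra "entry_allow_comments <> 0" condition is my addition for the sketch: it makes the returned count reflect only newly closed entries, so an hourly run that has nothing to do reports zero.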

Revamped Blog design

I can't believe it - I actually had a few hours today to work on the weblog. There's been so much going on lately that just about the last thing that I've had time to do was work on the blog, write a new entry, or play with some new software (heck, I was still running MT 2.21!) So, enjoy the new site. I tried to keep things looking about the same, but there's a lot changed on the backend, and some new features available:

  • Technorati Integration (using Adam Kalsey's Technorati Plugin)
  • Snippets sidebar mini-blog, powered by del.icio.us, and some fun RSS hacking
  • ATOM feed, RSS 2.0 feed, RSS 1.0 feed
  • MT-Blacklist to kill off all that nasty blog spam. Man, it was getting nasty!
  • Disabling comments once a post is more than one week old. Sorry, if you don't like it, post on your own blog.

Anyway, those are just a few of the new features. I also put the entries into MySQL so it'll make a whole bunch of scripting/munging easier as well. I hope that this'll make it easier for me to blog more again, but I'm afraid that the bigger issue is lack of time. Soooo busy busy busy...