I noticed this morning that, starting somewhere around Dec. 20th, the bandwidth consumed by my blog spiked sharply. A quick browse of my server log suggests some culprits. It looks like someone(s) or something(s) have been making automated sweeps of my blog from all kinds of different IP addresses. It’s not a search engine like Google or anything; periodically, every link on the front page of my blog just gets hit.
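For anyone who wants to run the same kind of check, here is a minimal sketch of tallying requests and bytes per client IP. It is not the tooling used for this post: it assumes Common Log Format-style lines (this server's actual log format may differ), and the names `LOG_LINE` and `traffic_by_client` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Common Log Format-style lines, e.g.
#   ip - - [timestamp] "GET /path HTTP/1.1" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)

def traffic_by_client(lines):
    """Return (hits, bytes_sent) Counters keyed by client IP."""
    hits = Counter()
    bytes_sent = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        ip, _timestamp, _method, _path, _status, size = match.groups()
        hits[ip] += 1
        if size.isdigit():  # the size field is "-" when no body was sent
            bytes_sent[ip] += int(size)
    return hits, bytes_sent
```

Calling `bytes_sent.most_common(10)` on the result lists the heaviest clients first. A sweep spread across many addresses will not stand out per IP, so grouping by timestamp or requested path may be more revealing in that case.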
I don’t have time to track this down before I leave, but I am taking a few minor steps to try to limit the amount of bandwidth this is consuming. Forthwith, the month view and category view of my blog will show only titles rather than the entire entries. (I also fixed a bug that resulted in the category view not being sorted by date/time; that it went unnoticed suggests to me that most people aren’t using that view anyway.) I also reined in the number of entries returned by a category RSS feed. I was returning 50 items, which appears to be a number left over from the original BlogX codebase. I lowered it to a much more reasonable 10 items, which is what you get on the main RSS feed.
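The feed change boils down to sorting entries newest-first and cutting the list off at a fixed size. The sketch below shows the idea only. It is not the BlogX code, and the `(published, title)` tuple layout, `MAX_FEED_ITEMS`, and `feed_entries` are hypothetical names chosen for illustration.

```python
from datetime import datetime

MAX_FEED_ITEMS = 10  # the new limit described above; the old value was 50

def feed_entries(entries, limit=MAX_FEED_ITEMS):
    """Return at most `limit` entries, newest first.

    Each entry is assumed to be a (published: datetime, title: str) tuple.
    """
    newest_first = sorted(entries, key=lambda entry: entry[0], reverse=True)
    return newest_first[:limit]
```

Sorting before slicing matters: cutting an unsorted list to 10 items would return an arbitrary 10 rather than the most recent ones.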
Anyway, hopefully nobody will notice anything. When I get back I’ll delve further into the mystery of who’s scanning my site so regularly…
I seem to recall this same thing happening to Dave Winer at Scripting.com … If I remember correctly, one of those sites that attempt to display ‘most recently changed’ weblogs was responsible for endlessly looping through links on his blog. I may be wrong … just wanted to throw it at ya’