Finally installed the very excellent reCAPTCHA plugin for WordPress. I like it when my posts get comments; I don’t like it when the comments are spam. So, if you look at the comment field on this blog, it’s got the reCAPTCHA interface hanging out, keeping us safe. Also, reCAPTCHA helps decode books. Awesome!

You should post a comment — try it out! :)

Bayes, and the Mythical Viagra Spam

OK. I’ve been running a test where I’ve been attempting to gather Viagra spam into a Gmail mailbox.

I started the test in mid-May, posting the email address to questionable and shady mailing lists, as well as linking the address in plain text on my blog.

And 6 weeks in, how is it going? Well…it’s not. There’s not even a hint of Viagra spam. Nothing in the spam folder, nothing in the Inbox. Just all legitimate mailings from newsletters.

Possibly, it takes time to get onto shady mailing lists. I imagine lists of emails get hacked and resold to spammers, but that it takes a while for a given email list to work its way down to the spammers.

Also possible: there’s no more Viagra spam, as in spam mentioning Viagra by name. Spammers are either advertising whole pharmacies, advertising a class of drugs (blood pressure, E.D.), or putting the ad in an image that gets embedded into the email.

Maybe I should retry this project with the word “pharmacy”. Viagra spam is *sooo* 2002, anyways.

Viagra spam filtering

My buddy Kevin saw my previous post on training Gmail to deliver only Viagra spam, as well as the part about how Pfizer must handle their spam filtering.

Being an enterprising person, he emailed Pfizer. Here’s their reply:

Date: April 15, 2008 12:53:23 PM EDT

Subject: RE:Email Validation

This email is sent by the Pfizer server. In order for us to

respond to your inquiry, we need to verify your email address.

Please complete this process by clicking on the link below.

Once you have completed this process, you will receive

a confirmation email

Thank you for contacting Pfizer.


A bit of a pain, but I can understand why. Good job.

Training Gmail: Sit. Stay. Good Gmail.

At the end of my presentation on Bayes’ Theorem at BarCampOrlando, there was some Q&A time.

I was asked a question about automatically training a spam filter, and I got into explaining how Bayesian filtering isn’t a “spam test” per se. The simplest way to think about Bayesian filtering is that you sort email you’ve already received into two piles: email you don’t consider spam, and email you do consider spam. Then, through the magic of Bayes, new emails automatically get put in one of the two piles, based on which pile the new email most resembles.
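Here’s a rough Python sketch of that two-pile idea, for anyone who wants to see it in code. It’s a toy naive Bayes classifier with add-one smoothing, and it is not how Gmail or any real filter is implemented. The class name `TwoPileFilter`, the pile names, and the whitespace tokenizer are all made up for illustration, and it assumes each pile has been trained with at least one message before you classify anything.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


class TwoPileFilter:
    """Toy naive Bayes classifier over two piles (hypothetical names)."""

    PILES = ("keep", "junk")

    def __init__(self):
        self.word_counts = {pile: Counter() for pile in self.PILES}
        self.message_counts = {pile: 0 for pile in self.PILES}

    def train(self, text, pile):
        # Sorting a message into a pile just means counting its words there.
        self.word_counts[pile].update(tokenize(text))
        self.message_counts[pile] += 1

    def classify(self, text):
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_messages = sum(self.message_counts.values())

        scores = {}
        for pile in self.PILES:
            # Start from log P(pile), then add log P(word | pile) per word.
            score = math.log(self.message_counts[pile] / total_messages)
            total_words = sum(self.word_counts[pile].values())
            for word in tokenize(text):
                # Add-one smoothing so unseen words don't zero out a pile.
                count = self.word_counts[pile][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            scores[pile] = score

        # The new message goes to whichever pile it most resembles.
        return max(scores, key=scores.get)


f = TwoPileFilter()
f.train("cheap pills buy now", "junk")
f.train("buy cheap meds online now", "junk")
f.train("lunch meeting tomorrow at noon", "keep")
f.train("notes from the meeting attached", "keep")
print(f.classify("buy cheap pills"))
print(f.classify("meeting notes tomorrow"))
```

Notice the code has no idea what “spam” means. The piles are whatever you say they are, which is why you can train it backwards.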

Then I mentioned — as a bit of an oddity — that you could theoretically train Gmail to deliver nothing but Viagra spam to your Inbox. “Heh,” I thought, “that would be a neat trick.”


I’m trying to sign up for as many shady email newsletters and web forms as possible. I’m posting the email address here, as a fully-qualified mailto: link. Anything I can do to start getting spam as fast as possible. I’m planning on marking everything that mentions Viagra as “not spam”, even “1337-speak” emails like “V1agra”. Depending on how it goes, I hope to post results here.
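If I ever get tired of eyeballing every message, a small helper could flag the ones that mention Viagra, 1337-speak included. This is only a sketch: the function name `mentions_viagra`, the substitution table, and the list of separator characters are my own guesses at what spammers do, not an exhaustive list, and I’d still be marking the messages in Gmail by hand.

```python
import re

# Hypothetical substitution table: common look-alike characters mapped
# back to the letters they stand in for.
LEET_MAP = str.maketrans({
    "1": "i", "!": "i", "|": "i",
    "4": "a", "@": "a",
    "0": "o", "3": "e",
    "$": "s", "5": "s",
})

# Allow one optional separator between letters, e.g. "v.i.a.g.r.a".
# Spaces are deliberately not separators, so "via grapes" does not match.
SEPARATOR = r"[._*\-]?"
VIAGRA_PATTERN = re.compile(SEPARATOR.join("viagra"))


def mentions_viagra(text):
    # Lowercase, undo the look-alike substitutions, then search.
    normalized = text.lower().translate(LEET_MAP)
    return VIAGRA_PATTERN.search(normalized) is not None


print(mentions_viagra("Cheap V1agra here"))
print(mentions_viagra("Weekly newsletter"))
```

Anything this returns true for would get marked “not spam”, and everything else would go in the spam pile.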

(On a side note: I wonder how the IT dept at Pfizer handles spam. They must get a ton of false negatives for Viagra spam.)