Bayes, and the Mythical Viagra Spam

OK. I’ve been running a test to gather Viagra spam into a Gmail mailbox: viagraspamtest@gmail.com.

I started the test in mid-May, posting the email address to questionable and shady mailing lists, as well as linking the address in plain text on my blog.

And six weeks in, how is it going? Well…it’s not. There’s not even a hint of Viagra spam. Nothing in the spam folder, nothing in the Inbox. Just legitimate mailings from newsletters.

Possibly, it takes time to get onto shady mailing lists. I imagine lists of email addresses get hacked and resold to spammers, but it takes a while for a given list to work its way down to them.

Also possible: there’s no more Viagra spam. As in, spam mentioning Viagra by name. Spammers now either advertise whole pharmacies or a class of drugs (blood pressure, E.D.), or they put the ad in an image that gets embedded into the email.

Maybe I should retry this project with the word “pharmacy”. Viagra spam is *sooo* 2002, anyways.

Training Gmail: Sit. Stay. Good Gmail.

At the end of my presentation on Bayes’ Theorem at BarCampOrlando, there was some Q&A time.

I was asked a question about automatically training a spam filter, and I got into explaining how Bayesian filtering isn’t a “spam test” per se. The simplest way to think about Bayesian filtering is that you sort email you’ve already received into two piles: email you don’t consider spam, and email you do consider spam. Then, through the magic of Bayes, new emails automatically get put in one of the two piles, based on which pile the new email most resembles.
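To make the two-pile idea concrete, here is a minimal sketch of a naive Bayes filter in Python. It is a toy, not how Gmail actually works: the class name TwoPileFilter, the whitespace tokenizer, the add-one smoothing, and the four sample messages are all assumptions made up for illustration.

```python
import math
from collections import Counter

PILES = ("spam", "not_spam")


def tokenize(text):
    """Split a message into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


class TwoPileFilter:
    """Toy naive Bayes filter: sort known mail into two piles, then compare."""

    def __init__(self):
        self.word_counts = {pile: Counter() for pile in PILES}
        self.message_counts = {pile: 0 for pile in PILES}

    def train(self, text, pile):
        """Put an already-received message on one of the two piles."""
        self.word_counts[pile].update(tokenize(text))
        self.message_counts[pile] += 1

    def score(self, text, pile):
        """Return log P(pile) + sum of log P(word | pile), with add-one smoothing."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["not_spam"])
        total_words = sum(self.word_counts[pile].values())
        total_messages = sum(self.message_counts.values())
        # Smoothed prior, so an empty pile never produces log(0).
        log_prob = math.log((self.message_counts[pile] + 1) / (total_messages + 2))
        for word in tokenize(text):
            count = self.word_counts[pile][word]
            # The extra +1 in the denominator reserves room for unseen words.
            log_prob += math.log((count + 1) / (total_words + len(vocab) + 1))
        return log_prob

    def classify(self, text):
        """Pick the pile the new message most resembles."""
        return max(PILES, key=lambda pile: self.score(text, pile))


# Hypothetical training mail, invented for this example.
spam_filter = TwoPileFilter()
spam_filter.train("cheap viagra online now", "spam")
spam_filter.train("buy viagra no prescription", "spam")
spam_filter.train("meeting notes for barcamp talk", "not_spam")
spam_filter.train("lunch on friday after the talk", "not_spam")

print(spam_filter.classify("viagra cheap"))
print(spam_filter.classify("notes from the talk"))
```

Notice there is no rule anywhere that says “Viagra means spam.” The filter only knows which pile each word has shown up in more often, and that is the whole trick.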

Then I mentioned — as a bit of an oddity — that you could theoretically train Gmail to deliver nothing but Viagra spam to your Inbox. “Heh,” I thought, “that would be a neat trick.”
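Here is a rough sketch of why that trick should work in principle, assuming a plain word-count naive Bayes filter (Gmail’s real filter is surely more involved). The function names train and classify, the pile names, and the sample messages are hypothetical. The same mail is used twice; only the labels are swapped.

```python
import math
from collections import Counter


def train(piles):
    """Map each pile name to word counts over its example messages."""
    return {
        name: Counter(word for msg in messages for word in msg.lower().split())
        for name, messages in piles.items()
    }


def classify(counts, text):
    """Return the pile whose words best match the text (equal priors assumed)."""
    vocab = set().union(*counts.values())

    def score(name):
        total = sum(counts[name].values())
        # Add-one smoothing; the extra +1 leaves room for unseen words.
        return sum(
            math.log((counts[name][word] + 1) / (total + len(vocab) + 1))
            for word in text.lower().split()
        )

    return max(counts, key=score)


# Hypothetical mail, invented for this example.
viagra_mail = ["cheap viagra online", "v1agra discount today"]
other_mail = ["weekly newsletter digest", "your account statement"]

# The usual training: Viagra mail goes on the spam pile.
usual = train({"inbox": other_mail, "spam": viagra_mail})
# The flipped training: Viagra mail is marked "not spam".
flipped = train({"inbox": viagra_mail, "spam": other_mail})

print(classify(usual, "viagra discount"))
print(classify(flipped, "viagra discount"))
```

The filter has no opinion about Viagra itself. It follows whatever labels it was trained on, so flipping the labels flips where the mail lands.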

Hence: viagraspamtest@gmail.com

I’m trying to sign up for as many shady email newsletters and web forms as possible. I’m posting the email address here, as a fully-qualified mailto: link. Anything I can do to start getting spam as fast as possible. I’m planning on marking everything that mentions Viagra as “not spam”, even “1337-speak” emails like “V1agra”. Depending on how it goes, I hope to post results here.

(On a side note: I wonder how the IT dept at Pfizer handles spam. They must get a ton of false negatives for Viagra spam.)