I released the first public version of yesterday, and awoke to 4 new patches in my inbox! Impressive, go open source! I guess. I'll be integrating the patches, adding support for HINFO, TXT, and SRV records, and releasing a new version very soon.


Amavisd-lite 1.3 is out!

At long last, amavisd-lite supports SpamAssassin 3.0! It is still backwards compatible with SA 2.xx if you wish to keep using the older version. There is more information at the official amavisd-lite homepage. I also released my SA ruleset and configuration, as well as a shell script for pruning your bayes database at regular intervals. Both of those are available from the downloads page.

Begone, DSPAM!

I have dumped DSPAM and gone happily running back to SpamAssassin (I also just upgraded to 3.0). My thoughts on the subject are best described in this post, but for simplicity, I will mirror them here:

I totally agree. I used DSPAM for a while, gave it a fair shot, participated on the mailing list. I even, at times, got encouraging results. Ultimately, DSPAM required way too much nursemaid work to make it work for my installation and I scrapped it and went back to SA. The general feel I got from the DSPAM crowd was a big dick waving contest with other products, particularly, but not limited to, SA. A typical mailing list message looked like:

Me: "SA is able to do X accurately. I cannot seem to achieve this with DSPAM. Am I doing something wrong, is there something I need to configure further?"

Response: "SA is inferior. You don't want X. Besides, DSPAM has Y, which approximates X. No, I can't tell you how to do it specifically, but know that you need only DSPAM."

Personally, I found that DSPAM is blatantly unable to train itself properly. You might have to train it 4 or 5 times with the same message to get it to classify that message as spam. It doesn't recursively train like SA does. This leads to users getting the EXACT SAME spam multiple times, despite their best efforts to train the filter. In addition, DSPAM's group features are sparsely documented and somewhat magical in their behavior. And of course, without these features, DSPAM is useless to an installation of people who really do NOT want to have to train their spam filter for months to get it to work right.

I'm all for competing software/products, but both projects are OSS and there is no money involved here, and I can't see how bashing the other product, while concurrently not being able to do better or even match it, can be viewed as a step forward...

But hey, that's just me, your mileage may vary.

I'd also like to thank the posters who helped me to prove my point ever so eloquently. I'd like to note, for the record, that I made every effort possible to work with the DSPAM people. I DID configure DSPAM correctly, and like I said, I did get some good results at times.

These people keep overlooking what I have stated to be the huge underlying problem with DSPAM: No one wants to have to do that much work to block spam. Not users, not admins. DSPAM ships with no out-of-the-box rules, no pre-training. Sure, you can use group features to give a bunch of people the same bayes database. But this is highly inaccurate, as each person receives different types of spam and ham. It's flatly ineffective. You can also use their web interface, but then again, no, that's just one more program for users to have to remember to use, or remember how to find, or remember a password for. It's NOT a good solution. It's probably fine for Linux zealots and tech-savvy folks. It DID work fine for me. It does not, however, work for, say, my mom, or my brother and sister. Or my friends who are non-technical. It was an abomination to them.

I really did give DSPAM a good shot, despite my initial prejudice against the idea and the amount of work involved. Ultimately, I found it to give unacceptably bad results, particularly with regard to false positives, and at the end of the day, on my server, SpamAssassin provides the best possible results, with minimal training and upkeep. If DSPAM works for you, I'm happy for you; I'm always happy to see spam affect fewer people. Just know that I will not be using it.

Spam returns

Latest Update: I have dumped DSPAM. I thought I was doing too much work before with SA? HA. DSPAM was infinitely worse overall. In addition, SA has had a few releases since the issues I was having, and they seem to have cleared everything up. I've also gone to SA 3.0 with good results.

Update: I am still experiencing FPs at the rate of 1 or 2 a day. While not horrible, this is unacceptable. Dspam seems to work under a guilty until proven innocent paradigm when it comes to classifying emails, unlike SA, which does the opposite. I haven't enabled whitelisting yet, but I would have thought that dspam would have figured out by now that whenever a message contains certain addresses it is ALWAYS good mail. I'm still diligently training it. It has only missed 1 or 2 spams in the last 2 weeks, which is pretty good, given my volume. We'll see what happens in the coming weeks...

My live dspam stats are available here.

SpamAssassin is having some trouble with my current spam influx. I've used it for years now, added rules, re-scored rules, trained the Bayes filter repeatedly, and generally spent way too much time on something that should not ever waste my time.

So, I decided to try out dspam. After some training and a bit of adjustment to the way of doing things, it seems to be pretty smart. It also seems to get visibly better each time I train it. Where with SA I could run a message through as ham 2 or 3 times before it would survive the bayes filter, dspam doesn't seem to have this problem; it learns right away (and yes, I WAS cleaning out my bayes db regularly).

Update: This seems to be just dumb luck on DSPAM's part. Later on in testing, it began to exhibit the same symptoms as my old SA install.

So far, including some early gaffes with training, my stats are as follows (formatted for web):

kilika# dspam_stats jwbozzy
  TS:  5833 # Total Spam
  TI:  2585 # Total Innocent
  SM:    95 # Spam Missed
  IM:    21 # Innocent Missed
  SC:  8452 # Spam Corpusfed
  IC:    25 # Innocent Corpusfed

Well, let's see. Total messages = 8418 (TS + TI). Total screwups = 116 (SM + IM). That looks bad on the surface, but that actually works out to about 98.62% correct diagnosis (116 / 8418 is roughly a 1.38% error rate). Not bad for only about 4 days' training.
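For anyone who wants to redo this arithmetic on their own numbers, here is a minimal Python sketch. It assumes the output format shown above (a two-letter counter, a colon, then a number) and uses the same definitions as this post: total = TS + TI, screwups = SM + IM. The function names `parse_stats` and `summarize` are made up for illustration; they are not part of DSPAM.

```python
import re

# Sample text copied from the dspam_stats output in this post.
SAMPLE = """\
kilika# dspam_stats jwbozzy
  TS:  5833 # Total Spam
  TI:  2585 # Total Innocent
  SM:    95 # Spam Missed
  IM:    21 # Innocent Missed
  SC:  8452 # Spam Corpusfed
  IC:    25 # Innocent Corpusfed
"""

def parse_stats(text):
    """Return a dict such as {'TS': 5833, ...} from lines like '  TS:  5833 # ...'."""
    pairs = re.findall(r"^\s*([A-Z]{2}):\s*(\d+)", text, flags=re.MULTILINE)
    return {name: int(value) for name, value in pairs}

def summarize(stats):
    """Apply the post's own definitions: total = TS + TI, errors = SM + IM."""
    total = stats["TS"] + stats["TI"]
    errors = stats["SM"] + stats["IM"]
    return {
        "total": total,
        "errors": errors,
        "accuracy_pct": 100.0 * (total - errors) / total,
    }

if __name__ == "__main__":
    summary = summarize(parse_stats(SAMPLE))
    print("total messages:", summary["total"])
    print("total screwups:", summary["errors"])
    print("correct: %.2f%%" % summary["accuracy_pct"])
```

The shell prompt line is skipped automatically, since it does not start with a two-letter uppercase counter.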

The one problem I have with this is the number of Innocent Missed. False Positives are what can kill a spam filter faster than any amount of spam let through. If I were a normal user, that's 21 important messages I would not have received over the course of a week, roughly 3 a day. That isn't statistically significant, but it is nevertheless bad. I'm going to move around my procmail rules to deliver mailing lists before checking X-DSPAM-Result to see what impact this has, since, to my recollection, half of the FPs were mailing list traffic.
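The reordering idea is easy to show outside of procmail. Below is a rough Python sketch of the decision order only, not my actual procmail recipes. It assumes mailing list traffic can be recognized by a `List-Id` header (or `Precedence: list`), and that DSPAM marks spam with `X-DSPAM-Result: Spam`. The folder names `lists`, `spam`, and `inbox` are placeholders.

```python
from email import message_from_string

def route(raw_message):
    """Pick a folder name; mailing lists are checked BEFORE the DSPAM verdict."""
    msg = message_from_string(raw_message)

    # 1. Mailing list traffic is delivered first, whatever DSPAM said about it.
    #    (Assumption: lists set List-Id or Precedence: list.)
    is_list = bool(msg.get("List-Id")) or \
        msg.get("Precedence", "").strip().lower() == "list"
    if is_list:
        return "lists"

    # 2. Only non-list mail is filtered on the X-DSPAM-Result header.
    if msg.get("X-DSPAM-Result", "").strip().lower() == "spam":
        return "spam"

    # 3. Everything else goes to the inbox.
    return "inbox"

if __name__ == "__main__":
    list_fp = ("List-Id: <users.example.org>\n"
               "X-DSPAM-Result: Spam\n"
               "Subject: list post\n\nbody\n")
    print(route(list_fp))
```

With this ordering, a list message that DSPAM wrongly tagged as spam still lands in the list folder, which is exactly the class of FP I am trying to take out of the numbers.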

I am waiting to see what happens for the next month or so. I will diligently check my spam bin to make sure good stuff hasn't gotten dumped. I can only hope that this gets better. If it does, I hope to not have to investigate my spam inflow for a long time to come.