Testing new anti-spam system, news at 11.0.0.0

Ben Okopnik [ben at linuxgazette.net]

Sun, 9 May 2010 17:41:43 -0400

Hi, all -

I'm currently trying out a new anti-spam regime on my machine; it's a sea-change from what I've been trying up until now (SpamAssassin, etc.) I'm tired of "enumerating badness" - i.e., trying to figure out who the Bad Guys are and block them. Instead, I've hacked up a procmail-based challenge-and-response system.

The operation of this gadget isn't all that complicated:

0) Copy all emails to a backup mailbox.
1) Archive mail from any of my bots, list-reminders, etc.
2) Deliver mail from any lists I'm on.
3) Dump any blacklisted senders.
4) Deliver any whitelisted ones.
5) Check headers to see if it's actually from me; deliver if so...
6) ...and dump any remaining email purporting to be from me into /dev/null.
7) Mail that doesn't fit the above criteria gets held and the sender is
notified of this. If they respond to this verification message, they
automatically get added to the whitelist. Held email automatically get
dumped when it's a month old.

So far, over the past few hours since I've implemented this, it seems to be working fine: zero spam (once I tuned #5/#6 a little more), and the valid messages seem to be coming through just fine. I'm still watching it carefully to make sure it doesn't blow up in some odd way, but so far, so good.

In about a month - depending on where I am and a number of other factors - I just might write this up. Having to manually go through and delete 500-1500 emails per day... I'm just totally over that.

-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *

Top Back

Neil Youngman [ny at youngman.org.uk]

Mon, 10 May 2010 09:08:58 +0100

On Sunday 09 May 2010 22:41:43 Ben Okopnik wrote:

> Hi, all -
>
> I'm currently trying out a new anti-spam regime on my machine; it's a
> sea-change from what I've been trying up until now (SpamAssassin, etc.)
> I'm tired of "enumerating badness" - i.e., trying to figure out who the
> Bad Guys are and block them. Instead, I've hacked up a procmail-based
> challenge-and-response system.

In related news the spam-l mailing list is going through another thorough trashing of the challenge-response idea. it's also regarded as a bad idea by most posters on the mailop mailing list. I don't claim expertise, but the arguments pro and con are all here https://spamlinks.net/filter-cr.htm.

My own personal anti-spam technique of choice is greylisting. It lets some spam through, but not enough to be a big deal and it doesn't get people hot under the collar in the way that challenge-response seems to.

Neil

Top Back

René Pfeiffer [lynx at luchs.at]

Mon, 10 May 2010 11:57:38 +0200

On May 10, 2010 at 0908 +0100, Neil Youngman appeared and said:

> [...]
> My own personal anti-spam technique of choice is greylisting. It lets some 
> spam through, but not enough to be a big deal and it doesn't get people hot 
> under the collar in the way that challenge-response seems to.

I use a combination of Postfix options, greylisting, DSPAM and SpamAssassin. DSPAM is a superb tool for filtering: https://dspam.sourceforge.net/

Best, Ren?, who is thinking about writing a LG article about DSPAM.

Top Back

Neil Youngman [ny at youngman.org.uk]

Mon, 10 May 2010 11:04:47 +0100

On Monday 10 May 2010 10:57:38 Ren? Pfeiffer wrote:

> I use a combination of Postfix options, greylisting, DSPAM and
> SpamAssassin. DSPAM is a superb tool for filtering:
> https://dspam.sourceforge.net/

AIUI dspam is bayesian filter and SpamAssassin includes a bayesian filter, so I don't see what it adds, unless it's significantly better than the version in SpamAssassin?

Neil

Top Back

René Pfeiffer [lynx at luchs.at]

Mon, 10 May 2010 12:16:09 +0200

On May 10, 2010 at 1104 +0100, Neil Youngman appeared and said:

> On Monday 10 May 2010 10:57:38 Ren? Pfeiffer wrote:
> > I use a combination of Postfix options, greylisting, DSPAM and
> > SpamAssassin. DSPAM is a superb tool for filtering:
> > https://dspam.sourceforge.net/
> 
> AIUI dspam is bayesian filter and SpamAssassin includes a bayesian filter, so 
> I don't see what it adds, unless it's significantly better than the version 
> in SpamAssassin?

Bayesian filtering is not the only strategy that DSPAM implements: https://dspam.irontec.com/faq.shtml#1.7

Best, Ren?.

Top Back

Ben Okopnik [ben at okopnik.com]

Mon, 10 May 2010 12:35:38 -0400

On Mon, May 10, 2010 at 09:08:58AM +0100, Neil Youngman wrote:

> On Sunday 09 May 2010 22:41:43 Ben Okopnik wrote:
> > Hi, all -
> >
> > I'm currently trying out a new anti-spam regime on my machine; it's a
> > sea-change from what I've been trying up until now (SpamAssassin, etc.)
> > I'm tired of "enumerating badness" - i.e., trying to figure out who the
> > Bad Guys are and block them. Instead, I've hacked up a procmail-based
> > challenge-and-response system.
> 
> In related news the spam-l mailing list is going through another thorough 
> trashing of the challenge-response idea. it's also regarded as a bad idea by 
> most posters on the mailop mailing list. I don't claim expertise, but the 
> arguments pro and con are all here https://spamlinks.net/filter-cr.htm.

I'm pretty familiar with those arguments; in fact, much of the reason that I hadn't implemented CR years before was based on them. However, decisions of this sort are rarely (if ever) binary: there are a number of factors that go into making them, and many of these factors vary with time, etc. In this case, I had to weigh the fact that some sysadmins/ individuals might be annoyed enough that they'll block my mail against the literally hundreds of hours that I've so far wasted dealing with spam, and the hundreds (or thousands) more that I'd be wasting if I didn't implement something truly effective.

As I've mentioned, at this point, I'm getting 500-1500 emails per day, with a maximum of 20-25 of those messages being non-spam. Despite the best that SpamAssassin can do, even with Bayesian filtering, training, and user prefs configured to the max, I'm still getting a flood of spam in my inbox. The thing that makes it all completely untenable, and prevents me from "playing by the rules" as espoused by all the self-proclaimed anti-spam experts on lists of whatever description, is that I have

a) A relatively slow, fragile, connection to the Net (sometimes, I have none at all; this is, in fact, the case right at this very moment);

b) no control over/root access to my incoming mail server.

Game over. See, every single one of the wonderful solutions proposed by the boffins starts with, "first, obtain a spherical cow..." (a mail server that I can configure attached to an always-on high-speed net connection.) Well, I don't have a spherical cow. Based on the way that I've chosen to live (aboard a cruising sailboat), I have no way to feed and house a spherical cow. Therefore I CANNOT implement any of their wonderful, brilliant, and above all, non-geek-annoying solutions.

I've been looking for one for, oh, some ten years now, with no success; as far as I can tell, a solution that takes my case into account does not exist. So, I've been putting up with losing - again, let me emphasize - hundreds of hours of my productive time - of my life - to handling spam. This, mostly, because the Great Spam Council has decreed that this is what I'm supposed to do.

The time has come for me to say, Blow THAT For A Game Of Soldiers.

(Nothing against you, Neil; this is just my whole point and/or rant on the subject. Thanks for bringing it up, though.

> My own personal anti-spam technique of choice is greylisting. It lets some 
> spam through, but not enough to be a big deal and it doesn't get people hot 
> under the collar in the way that challenge-response seems to.

I've looked at a number of greylisting solutions; none of them work for my situation.

I'm going to be refining my CR setup - e.g., I'm going to keep a database of addresses to which I've sent verfication emails, and will not respond to them more than, say, once per week if they haven't been whitelisted; this should eliminate the "backscatter spam" argument. Perhaps I'll also limit the total number of verfication emails sent to a given address; after all, if someone hasn't responded after, say, three tries, chances are, they're never going to. I'm also going to tweak my MUA so that all the recipients from my outgoing emails will get automatically added to the whitelist.

Other than that - well, some people just chuck verification emails; too bad for them, I guess. I don't propose to spend my time coddling their preferences while they choose to get huffy about mine.

-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *

Top Back

Thomas Adam [thomas at xteddy.org]

Mon, 10 May 2010 19:35:47 +0100

On Mon, May 10, 2010 at 12:35:38PM -0400, Ben Okopnik wrote:

> Other than that - well, some people just chuck verification emails; too
> bad for them, I guess. I don't propose to spend my time coddling their
> preferences while they choose to get huffy about mine.

Seems a tad anti-social to me. Mind you, the rest of us who've been brain-washed, switched to using gmail, whereby all of these problems go away.

Works For Me (tm). Welcome to the revolution.

-- Thomas "I used to run my own mailserver, not anymore. Yay, free time!" Adam

-- "It was the cruelest game I've ever played and it's played inside my head." -- "Hush The Warmth", Gorky's Zygotic Mynci.

Top Back

Neil Youngman [ny at youngman.org.uk]

Mon, 10 May 2010 20:07:12 +0100

On Monday 10 May 2010 17:35:38 Ben Okopnik wrote:

> The thing that makes it all completely untenable, and
> prevents me from "playing by the rules" as espoused by all the
> self-proclaimed anti-spam experts on lists of whatever description, is
> that I have
>
> a) A relatively slow, fragile, connection to the Net (sometimes, I have
> none at all; this is, in fact, the case right at this very moment);
>
> b) no control over/root access to my incoming mail server.

It seems to me that b is the major issue here and C/R is solving the wrong problem. Might a better vhost solve the problem?

I know my vhost doesn't give me root access, but I can configure the anti-spam measures to a point where they do a pretty good job of keeping the spam out. The last time anyone got some spam past my vhost was over a week ago.

Neil

Top Back

Ben Okopnik [ben at linuxgazette.net]

Mon, 10 May 2010 17:04:05 -0400

On Mon, May 10, 2010 at 07:35:47PM +0100, Thomas Adam wrote:

> On Mon, May 10, 2010 at 12:35:38PM -0400, Ben Okopnik wrote:
> > Other than that - well, some people just chuck verification emails; too
> > bad for them, I guess. I don't propose to spend my time coddling their
> > preferences while they choose to get huffy about mine.
> 
> Seems a tad anti-social to me.

What, them or me? (Mind you, "yes" is a perfectly fine answer.

> Mind you, the rest of us who've been
> brain-washed, switched to using gmail, whereby all of these problems go
> away.  
> 
> Works For Me (tm).  Welcome to the revolution.

I've thought about doing that; essentially using GMail as a filter. That is, setting up all my .forward files to point to it, then just collect my mail from them. I might yet do that if this CR test doesn't prove out in one way or another.

GMail, of course, brings its own raft of problems - e.g., the privacy issues, etc. The thing is, I don't know that the average user [1] can afford to have that kind of scruples. ((

I don't like thinking that way... because for many people, that's the slippery slope to staying with Windows. "What do you mean, I have to learn to do things a new/different way to use Linux? I can't afford the time; I've got a business to run."

[1] In this respect, I unfortunately have to place myself in that category - since it appears that you do indeed need to have your own, erm, spherical cow to have any chance of battling spam on your own.

-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *

Top Back

Steve Brown [steve.stevebrown at gmail.com]

Mon, 10 May 2010 22:29:45 +0100

On Mon, May 10, 2010 at 05:04:05PM -0400, Ben Okopnik wrote:

> On Mon, May 10, 2010 at 07:35:47PM +0100, Thomas Adam wrote:
> > On Mon, May 10, 2010 at 12:35:38PM -0400, Ben Okopnik wrote:

> > Mind you, the rest of us who've been
> > brain-washed, switched to using gmail, whereby all of these problems go
> > away.  
> > 
> > Works For Me (tm).  Welcome to the revolution.  
> 
> I've thought about doing that; essentially using GMail as a filter. That
> is, setting up all my .forward files to point to it, then just collect
> my mail from them. I might yet do that if this CR test doesn't prove out
> in one way or another.
> 
> GMail, of course, brings its own raft of problems - e.g., the privacy
> issues, etc. The thing is, I don't know that the average user [1] can
> afford to have that kind of scruples. ((

I was going to suggest this as a potential solution, with one minor change:

Use Gmail to manage your grey list, everything that you've whitelisted is handled as normal. Everything else goes to Gmail. As you add every recipient of a reply to your whitelist, only one email gets sent to Gmail.

-- Steve

Top Back

Ben Okopnik [ben at linuxgazette.net]

Mon, 10 May 2010 19:30:41 -0400

On Mon, May 10, 2010 at 10:29:45PM +0100, Steve Brown wrote:

> On Mon, May 10, 2010 at 05:04:05PM -0400, Ben Okopnik wrote:
> > 
> > I've thought about doing that; essentially using GMail as a filter. That
> > is, setting up all my .forward files to point to it, then just collect
> > my mail from them. I might yet do that if this CR test doesn't prove out
> > in one way or another.
> 
> I was going to suggest this as a potential solution, with one minor change:
> 
> Use Gmail to manage your grey list, everything that you've whitelisted
> is handled as normal. Everything else goes to Gmail. As you add every
> recipient of a reply to your whitelist, only one email gets sent to
> Gmail.

That sounds rather intriguing, although I'm not really clear on the details (I'm having trouble parsing that last sentence, for some reason.) Let me see if I have it right:

1) Set up procmail to dump blacklisted email, route whitelisted email directly to me, and forward anything that remains to GMail.

2) Retrieve mail from GMail, and auto-add anything that they've passed through to the whitelist.

3) Optionally, once in a while, pull down the "Spam" folder from GMail and add it to the blacklist.

How did I do?

-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *

Top Back

Ben Okopnik [ben at okopnik.com]

Mon, 10 May 2010 19:39:32 -0400

On Mon, May 10, 2010 at 08:07:12PM +0100, Neil Youngman wrote:

> On Monday 10 May 2010 17:35:38 Ben Okopnik wrote:
> > The thing that makes it all completely untenable, and
> > prevents me from "playing by the rules" as espoused by all the
> > self-proclaimed anti-spam experts on lists of whatever description, is
> > that I have
> >
> > a) A relatively slow, fragile, connection to the Net (sometimes, I have
> > none at all; this is, in fact, the case right at this very moment);
> >
> > b) no control over/root access to my incoming mail server.
> 
> It seems to me that b is the major issue here and C/R is solving the wrong 
> problem.

Sorry, I'm not clear on what you mean. The problem is that I'm having to spend lots of hours dealing with spam, and C/R handles that.

> Might a better vhost solve the problem?

"Better" in this case meaning "more expensive" - well, yes, it would solve one problem... while creating another. I really don't see paying a hundred bucks a month for a dedicated vhost just so I can solve my spam problem. At that point, using either C/R or GMail seems like a huge win.

I guess my underlying premise here is, "I've got Linux. I've got pretty good programming skills. Why can't I have an inexpensive, relatively easy, effective anti-spam solution?" It just doesn't seem right.

> I know my vhost doesn't give me root access, but I can configure the anti-spam 
> measures to a point where they do a pretty good job of keeping the spam out. 
> The last time anyone got some spam past my vhost was over a week ago.

Perhaps I'm missing a Large Clue somewhere, then - despite all my study on this subject. Could you describe, in more or less general terms, how your greylisting setup works?

-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *

Top Back

Neil Youngman [ny at youngman.org.uk]

Tue, 11 May 2010 08:29:33 +0100

On Tuesday 11 May 2010 00:39:32 Ben Okopnik wrote:

> "Better" in this case meaning "more expensive" - well, yes, it would
> solve one problem... while creating another. I really don't see paying a
> hundred bucks a month for a dedicated vhost just so I can solve my spam
> problem. At that point, using either C/R or GMail seems like a huge
> win.

OK, vhost was the wrong term, I have a hosted email setup that costs me about ?15 p.a.

> > I know my vhost doesn't give me root access, but I can configure the
> > anti-spam measures to a point where they do a pretty good job of keeping
> > the spam out. The last time anyone got some spam past my vhost was over a
> > week ago.
>
> Perhaps I'm missing a Large Clue somewhere, then - despite all my study
> on this subject. Could you describe, in more or less general terms, how
> your greylisting setup works?

I leave the details to my email host. It allows me to tune a few details like the spam score at which I reject and the use of different blacklists. I have some fairly aggressive blacklisting and it all just works.

Neil

Top Back

Steve Brown [steve.stevebrown at gmail.com]

Tue, 11 May 2010 08:48:41 +0100

On 11/05/2010 00:30, Ben Okopnik wrote:

> On Mon, May 10, 2010 at 10:29:45PM +0100, Steve Brown wrote:
>>  On Mon, May 10, 2010 at 05:04:05PM -0400, Ben Okopnik wrote:
>>  >
>>  >  I've thought about doing that; essentially using GMail as a filter. That
>>  >  is, setting up all my .forward files to point to it, then just collect
>>  >  my mail from them. I might yet do that if this CR test doesn't prove out
>>  >  in one way or another.
>>
>>  I was going to suggest this as a potential solution, with one minor change:
>>
>>  Use Gmail to manage your grey list, everything that you've whitelisted
>>  is handled as normal. Everything else goes to Gmail. As you add every
>>  recipient of a reply to your whitelist, only one email gets sent to
>>  Gmail.
>
> That sounds rather intriguing, although I'm not really clear on the
> details (I'm having trouble parsing that last sentence, for some
> reason.) Let me see if I have it right:
>
> 1) Set up procmail to dump blacklisted email, route whitelisted email
> directly to me, and forward anything that remains to GMail.
>
> 2) Retrieve mail from GMail, and auto-add anything that they've passed
> through to the whitelist.
>
> 3) Optionally, once in a while, pull down the "Spam" folder from GMail
> and add it to the blacklist.
>
> How did I do?
>
>

Pretty good Ben.

Only one point to make - I wouldn't auto-add everything that GMail passes. Some rare spam does get past the GMail filters.

My last sentence does need some explanation in the cold light of day.

You expressed some concerns over the privacy of emails passed through the GMail servers. My thinking was this:

Given that you already add every recipient of an email from you to your whitelist, and that every email conversation that you have will pass through your whitelist and not be delivered to GMail.

If someone writes to you for the first time, their email gets passed to the GMail account, passes the spam filter and waits in the inbox for your reply. At this point GMail only holds one email in an email chain, all subsequent emails will be caught by your whitelist and will not be sent to GMail at all. Thus the effect on your privacy is limited to the initiating email. Would that represent a minimal intrusion on your privacy?

-- Steve

Top Back

Ben Okopnik [ben at linuxgazette.net]

Tue, 11 May 2010 23:01:57 -0400

On Tue, May 11, 2010 at 08:48:41AM +0100, Steve Brown wrote:

> On 11/05/2010 00:30, Ben Okopnik wrote:
> >
> >That sounds rather intriguing, although I'm not really clear on the
> >details (I'm having trouble parsing that last sentence, for some
> >reason.) Let me see if I have it right:
> >
> >1) Set up procmail to dump blacklisted email, route whitelisted email
> >directly to me, and forward anything that remains to GMail.
> >
> >2) Retrieve mail from GMail, and auto-add anything that they've passed
> >through to the whitelist.
> >
> >3) Optionally, once in a while, pull down the "Spam" folder from GMail
> >and add it to the blacklist.
> >
> >How did I do?
> >
> 
> Pretty good Ben.
> 
> Only one point to make - I wouldn't auto-add everything that GMail
> passes. Some rare spam does get past the GMail filters.

I'm finding that with the C/R system, getting a single piece of spam is actually OK: I can instantly see how it got in, and move that address from the 'allow' to the 'deny' database. I can't think of an easy way to first vet and then add all the GMail-filtered emails to the whitelist - so perhaps this is still a pretty good approach, especially if I have a macro in my .muttrc that moves an entry from the whitelist to the blacklist.

> My last sentence does need some explanation in the cold light of day.
> 
> You expressed some concerns over the privacy of emails passed
> through the GMail servers. My thinking was this:
> 
> Given that you already add every recipient of an email from you to
> your whitelist, and that every email conversation that you have will
> pass through your whitelist and not be delivered to GMail.
> 
> If someone writes to you for the first time, their email gets passed
> to the GMail account, passes the spam filter and waits in the inbox
> for your reply. At this point GMail only holds one email in an email
> chain, all subsequent emails will be caught by your whitelist and
> will not be sent to GMail at all. Thus the effect on your privacy is
> limited to the initiating email. Would that represent a minimal
> intrusion on your privacy?

I see what you mean; that makes a lot of sense. Sure, that does alleviate some of my concerns.

The only thing I'd have to figure out is how to sync everything between my mail server (where the mail triage is going to happen), my machine (where the spam/ham decisions are going to be made), and GMail. Hmmm... have to give that some thought.

Thanks for helping stir up my idea mill!

-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *

Top Back

Steve Brown [steve.stevebrown at gmail.com]

Wed, 12 May 2010 09:07:52 +0100

On 12/05/2010 04:01, Ben Okopnik wrote:

> On Tue, May 11, 2010 at 08:48:41AM +0100, Steve Brown wrote:

> The only thing I'd have to figure out is how to sync everything between
> my mail server (where the mail triage is going to happen), my machine
> (where the spam/ham decisions are going to be made), and GMail. Hmmm...
> have to give that some thought.
>
> Thanks for helping stir up my idea mill!

I couldn't resist adding some more grist ;-)

You can either collect the emails using GMail's POP and IMAP interfaces or have GMail forward the emails on to you.

Both options allow you to delete emails remaining on GMail's servers if you wish. Spam will be left at GMail for them to look after, and you get the greylist free of cruft.

Top Back