...making Linux just a little more fun!
Kapil Hari Paranjape [kapil at imsc.res.in]
Thu, 1 Feb 2007 17:58:37 -0800
Hello,
I was just wondering if someone has thought about what is the purpose of SPAM messages which only contain some mish-mash text. The qualifying criteria are:
1. Only text without attachments.
2. Text that contains incoherent sentences or disconnected sentences. [*]
3. Definitely messages from addresses not known to the recipient. (The last is to exclude e-mail messages written by friends who are not quite sober at the time of writing.)
Here are some possibilities that occurred to me.
A. These are messages that are designed to test/mar the efficiency of the spam detection systems currently employed by servers.
B. These messages contain coded messages that are flooded across the internet in an attempt to disguise their true origin/destination. The real message could be a short one, such as "the machine from which this has been sent has serious security holes".
C. This is generated for someone's research project.
D. This is the result of some spam generating software/virus which has bugs.
I don't know if this is worth wondering about ... except ... why is someone going to some trouble to make (a program which is making) life difficult for everyone?
Pointers to prior discussion welcome.
Thanks and regards,
Kapil.

[*] Clearly (2) is subjective, which makes such spam hard to detect automatically.
Rick Moen [rick at linuxmafia.com]
Thu, 1 Feb 2007 18:10:43 -0800
Quoting Kapil Hari Paranjape (kapil at imsc.res.in):
> I was just wondering if someone has thought about what is the purpose
> of SPAM messages which only contain some mish-mash text.
I'm guessing it's intended to poison ("untrain") Bayesian databases used for spam detection, specifically those that are set to "auto-learn".

The lesson is that you don't want auto-learn; instead, you want to periodically feed the Bayesian classifier hand-sorted spam and "ham" (non-spam mail) to keep it on track.

That, of course, will not per se do anything to intercept the gibberish mails, but it will prevent them from doing any harm beyond merely being there.
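To make the auto-learn hazard concrete, here is a toy sketch in Python. It is not SpamAssassin's or any other real filter's code: the class name `ToyBayesFilter`, the 0.4 score for unseen tokens, the 0.01/0.99 clamp and the 0.9 auto-learn threshold are all illustrative assumptions, loosely in the style of Paul Graham's "A Plan for Spam" scoring. A gibberish message that mixes a couple of spammy words with many innocent ones scores as ham, so an auto-learning filter files it as ham and thereby weakens its own evidence against those spammy words.

```python
from collections import Counter
from math import prod


class ToyBayesFilter:
    """Hypothetical toy filter for illustration; not any real product's code."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Manual training: a human supplies the correct label."""
        tokens = set(text.lower().split())  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_spamminess(self, token):
        """Share of a token's per-message frequency that comes from spam."""
        s = self.spam_counts[token] / max(self.n_spam, 1)
        h = self.ham_counts[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.4  # unseen token: assumed mildly innocent
        return s / (s + h)

    def spam_probability(self, text):
        """Clamp each token score to [0.01, 0.99] and combine naive-Bayes style."""
        ps = [min(max(self.token_spamminess(t), 0.01), 0.99)
              for t in set(text.lower().split())]
        a = prod(ps)
        b = prod(1 - p for p in ps)
        return a / (a + b)

    def auto_learn(self, text, threshold=0.9):
        """Auto-learn: the filter trusts its own verdict and trains on it."""
        is_spam = self.spam_probability(text) >= threshold
        self.train(text, is_spam)
        return is_spam


if __name__ == "__main__":
    f = ToyBayesFilter()
    f.train("cheap pills buy now", True)
    f.train("buy cheap watches now", True)
    f.train("project meeting notes", False)
    f.train("lunch meeting tomorrow", False)

    print(f.token_spamminess("cheap"))  # 1.0
    gibberish = "buy meeting lunch project notes cheap"
    print(f.auto_learn(gibberish))      # False: filed, and learned, as ham
    print(f.token_spamminess("cheap"))  # 0.75: "cheap" now looks less spammy
```

Had the same gibberish been hand-fed with `train(gibberish, True)`, the ham counts would be untouched and "cheap" would keep its 1.0 score, which is the point of manual training.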
-- Cheers, Rick Moen (rick at linuxmafia.com)
"If these walls could talk... they'd probably say 'No! Not the nails again! Not the hammer! NOT THE HAMMER!!!!'" -- Jennifer A. Ford
Thomas Adam [thomas.adam22 at gmail.com]
Fri, 2 Feb 2007 02:14:32 +0000
On Thu, Feb 01, 2007 at 05:58:37PM -0800, Kapil Hari Paranjape wrote:
> Hello,
>
> I was just wondering if someone has thought about what is the purpose
> of SPAM messages which only contain some mish-mash text. The
> qualifying criteria are:
> 1. Only text without attachments.
> 2. Text that contains incoherent sentences or disconnected
> sentences. [*]
> 3. Definitely messages from addresses not known to the
> recipient.
> (The last is to exclude e-mail messages written by friends who are
> not quite sober at the time of writing.)

You're forgetting so-called "image spam", where the message is sent as an image: a kind of CAPTCHA, but not.

> Here are some possibilities that occurred to me.
>
> A. These are messages that are designed to test/mar the efficiency of
> the spam detection systems currently employed by servers.
That's always the sole purpose of spam obfuscation.
> B. These messages contain coded messages that are flooded across the
> internet in an attempt to disguise their true origin/destination. The
> real message could be a short one, such as "the machine from which this
> has been sent has serious security holes".
That's a little bit of fiction, isn't it?
> C. This is generated for someone's research project.
Heh.
> D. This is the result of some spam generating software/virus which has
> bugs.

Unlikely. Note that some spam messages don't even bother to hide the fact that they're spam anymore. Take the classic 'replica watches' spam. Years ago, you'd only get away with selling fake watches if you pretended they were real and genuine. Now, spam doesn't even bother to do that; instead, making the person aware they're fake is a bonus. ;)

> I don't know if this is worth wondering about ... except ... why is someone
> going to some trouble to make (a program which is making) life difficult
> for everyone?
Because they're appealing to the mass of AOL users who generally don't know one plug-end from another. The law of averages says that at least one of them will be taken in by it.
-- Thomas Adam
-- "Wanting to feel; to know what is real. Living is a lie." -- Porpoise Song, by The Monkees.
Benjamin A. Okopnik [ben at linuxgazette.net]
Thu, 1 Feb 2007 23:11:53 -0700
On Thu, Feb 01, 2007 at 05:58:37PM -0800, Kapil Hari Paranjape wrote:
> Hello,
>
> I was just wondering if someone has thought about what is the purpose
> of SPAM messages which only contain some mish-mash text. The
> qualifying criteria are:
> 1. Only text without attachments.
> 2. Text that contains incoherent sentences or disconnected
> sentences. [*]
> 3. Definitely messages from addresses not known to the
> recipient.
> (The last is to exclude e-mail messages written by friends who are
> not quite sober at the time of writing.)
...or are going for a degree in philosophy or politics.
> Here are some possibilities that occurred to me.
>
> A. These are messages that are designed to test/mar the efficiency of
> the spam detection systems currently employed by servers.
I don't think that the "testing" idea plays out very well, actually; most of the possibilities seem to be non-winning ones.
1) A server such as Qmail, as described by Rick in https://linuxgazette.net/131/moen.html is going to do the Great White shark act and swallow anything directed to it, thus producing no useful data for the spammers.
2) The average Joe's mail server isn't going to be configured in any sensible way even though it could be - same response, same lack of data.
3) A server set up to use the Sender Policy Framework (SPF) will usually reject the message (given that spammers usually try to forge their addresses) - but this is not content dependent, so the rejection still gives no useful data. In addition, some of these smart folks are going to be teergrubing the idiots spamming them - so that firing blanks of this sort just accumulates data on the receiving end, which will then be used to tarpit the buggers. To put it mildly, that will still not be a useful result to the spammers.
4) Some small percentage of servers will indeed be configured to a) check for compressed pork content and b) reject it with a "polite note" (e.g., a 554.) This might allow spammers to tune their garbage... but that data would be poisoned by the fact that the 554 is just as likely to be based on the source IP as on the content.
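Point 3's observation that an SPF rejection is not content dependent is easy to see in code. Below is a minimal Python sketch that evaluates only the `ip4`, `ip6` and `all` mechanisms of an SPF record; RFC 7208 also defines `a`, `mx`, `include`, `exists`, `ptr` and the `redirect` modifier, all of which need DNS lookups and are skipped here. The function name `check_spf_subset` and the example record (which uses documentation address ranges) are illustrative assumptions. Its only inputs are the published record and the connecting client's IP address, so the message body, gibberish or not, never enters the decision.

```python
import ipaddress

# RFC 7208 qualifiers; a mechanism without one defaults to "+" (pass).
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf_subset(record, client_ip):
    """Evaluate an SPF record's ip4:, ip6: and all mechanisms for client_ip.

    Hypothetical helper: real SPF also needs DNS for a, mx, include, etc.
    Note that no message content is consulted anywhere.
    """
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"  # not an SPF record
    for term in terms[1:]:
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        # Anything else (a, mx, include, modifiers, ...) is skipped in this sketch.
    return "neutral"  # nothing matched and no redirect: RFC 7208 says "neutral"


if __name__ == "__main__":
    record = "v=spf1 ip4:192.0.2.0/24 -all"
    print(check_spf_subset(record, "192.0.2.10"))   # pass
    print(check_spf_subset(record, "203.0.113.5"))  # fail: likely a forged sender
```

Because the verdict depends only on who is connecting, a spammer probing with gibberish learns nothing about how the recipient treats message content.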
If we postulate, however, that one of the above-mentioned drunks or philosophers (it can be difficult to tell, sometimes...) uses Qmail - those facts may even be related - and decides to answer the gibberish, perhaps by arguing with it, then some spammer-usable data could result. You must admit, though, that this is a low-percentage possibility; there must be some other reason. Perhaps, as has been mentioned here, the "Bayesian poisoning" effect is what they're trying for.

As far as I recall, SpamAssassin at least has auto-learning disabled by default. Does anyone here have any data on other Bayesian filters' policies? Perhaps most importantly, what about Bayesian filters on Wind0ws machines and (e.g.) Yahoo's spam filters? I suspect the latter is intentionally shrouded in Deep Mystery - given how "effective" the 'security through obscurity' tactic has proven to be.

(The practice of "always fighting the last war" is, alas, not solely the province of the Pentagon; the habit of reacting rather than acting in time with intent and intelligence is a far too common characteristic. Hell, I guess if Breitenfeld and Kasserine Pass aren't enough of a clue for the brass hats, then we certainly shouldn't expect the average Micr0s0ft-certifried /soi-disant/ "system administrator" to do any better...)
> I don't know if this is worth wondering about ... except ... why is someone
> going to some trouble to make (a program which is making) life difficult
> for everyone?
I've just taken a look at the spam in my spam bucket; out of the rather large pile of messages in there, not a single one matches your description. The ones that contain gibberish either have an image attachment (the majority of these appear to be stock promotion schemes), some "real" text following the gibberish (pharmaceutical/herbal ads), or - and this is rather cute - a bunch of fake HTML/XML tags interspersed with the ad text (e.g., '<this>Bu<is>y ou<fake>r Vi<html>ag<crap>ra'). The latter tend to be rather rare, which is a data point in figuring out the level of sophistication of the average spammer (still fairly low.) If you have one of these emails handy, and could make it available by posting it somewhere and providing a link, I'd like to examine it.
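The fake-tag trick above is weak obfuscation: the junk tags can be deleted before tokenizing, and the hidden words reappear. The sketch below is a deliberately naive regex normalizer (the name `strip_fake_tags` is hypothetical, not taken from any real filter); it simply removes anything of the form `<...>`. A real HTML-aware filter would want a proper parser, since a bare `<` and `>` can legitimately appear in ordinary text.

```python
import re

# Anything shaped like a tag: '<', then characters other than angle brackets, then '>'.
TAG_RE = re.compile(r"<[^<>]*>")


def strip_fake_tags(text):
    """Delete tag-like junk so the words it was hiding become visible again."""
    return TAG_RE.sub("", text)


if __name__ == "__main__":
    sample = "<this>Bu<is>y ou<fake>r Vi<html>ag<crap>ra"
    print(strip_fake_tags(sample))  # Buy our Viagra
```

After this normalization the message tokenizes to the same words as the unobfuscated ad, so a filter that has already learned those words can score it accordingly.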
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *
Kapil Hari Paranjape [kapil at imsc.res.in]
Fri, 2 Feb 2007 10:51:35 -0800
Hello,
Thanks for all the suggestions.
On Thu, 01 Feb 2007, Benjamin A. Okopnik wrote:
> If you have one of these emails handy, and could make it available by
> posting it somewhere and providing a link, I'd like to examine it.
Oops. Just cleaned out the "junk" folder. Doubtless more will accumulate over the weekend!
Regards,
Kapil.
Kapil Hari Paranjape [kapil at imsc.res.in]
Wed, 21 Feb 2007 12:53:13 -0800
Hello,
On Thu, 01 Feb 2007, Benjamin A. Okopnik wrote:
> If you have one of these emails handy, and could make it available by
> posting it somewhere and providing a link, I'd like to examine it.
This kind of spam is really quite rare but here is an example:
https://www.imsc.res.in/~kapil/worthless_spam.txt
Regards,
Kapil.
Ben Okopnik [ben at linuxgazette.net]
Sat, 24 Feb 2007 21:05:52 -0500
On Wed, Feb 21, 2007 at 12:53:13PM -0800, Kapil Hari Paranjape wrote:
> Hello,
>
> On Thu, 01 Feb 2007, Benjamin A. Okopnik wrote:
> > If you have one of these emails handy, and could make it available by
> > posting it somewhere and providing a link, I'd like to examine it.
>
> This kind of spam is really quite rare but here is an example:
>
> https://www.imsc.res.in/~kapil/worthless_spam.txt
I suspect that it's a broken attempt by a spammer that didn't quite work out - much like the ones titled "[VAR114]" (or something similar) that end up in my spambucket. Spammers don't do their experiments in a sandbox; their crap, whether successful (i.e., carrying their payload) or not, ends up scattered all over the Net.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *