Ever wondered where your spam comes from? A partial answer may be found below. We will also explain why over 98% of the Internet is a good place to be.
I let the SpamAssassin plugin which I wrote in response to SpamAssassin Bug 4770 run for a couple of days on my spamtrap. Then I wrote a small script to group the received spams by ASN (Autonomous System Number) — see the full report data.
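The grouping script itself is not reproduced here. A minimal sketch of the idea, assuming each received spam has already been reduced to a (source IP, ASN) pair, might look like this. The function name and the sample records (documentation addresses and reserved ASNs) are made up for illustration and are not taken from the report data.

```python
from collections import Counter

def count_by_asn(records):
    """Count spams per ASN.

    `records` is assumed to be an iterable of (source_ip, asn) tuples;
    this input format is an assumption, not the plugin's real output.
    Returns (asn, count) pairs, highest count first.
    """
    counts = Counter(asn for _ip, asn in records)
    return counts.most_common()

# Made-up sample using documentation IP ranges and reserved ASNs.
sample = [
    ("192.0.2.10", "AS64500"),
    ("192.0.2.77", "AS64500"),
    ("198.51.100.5", "AS64501"),
]

for asn, n in count_by_asn(sample):
    print(asn, n)
```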
Origin by ASN
Not surprisingly, the top spots are large end-user netblocks. “Country-bashing” or “provider-bashing” is not very helpful, because there is a very long tail of ASNs if data is aggregated by country or provider: those with the highest absolute number of computers have by far the highest count of originating spam (Mr. Obvious just called…). Overall, 6’477 spams originated from 982 unique ASNs.
Origin by Prefix
There is an even greater variety if we consider not only the ASNs, but also the prefixes (“IP blocks”) within these ASNs. For example, spam from the top spam source (AS3352, Telefonica España) originated from no fewer than 70 different prefixes; the runner-up (AS3320, Deutsche Telekom) shows up with only 7 different prefixes. However, this comparison is not really meaningful, since providers announce prefixes of remarkably different sizes (Deutsche Telekom /10 to /14, Telefonica España mostly /16s).
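To get a feeling for how different these prefix sizes are, here is a small sketch that computes the number of IPv4 addresses covered by a given prefix length. The helper name is my own; the arithmetic is simply 2 to the power of (32 minus the prefix length).

```python
def addresses_in_prefix(prefix_length):
    """Number of IPv4 addresses covered by a prefix of the given length."""
    return 2 ** (32 - prefix_length)

# Prefix lengths mentioned in the text, plus /24 for comparison.
for length in (10, 14, 16, 24):
    print(f"/{length}: {addresses_in_prefix(length):,} addresses")
```

A single /10 covers as many addresses as 64 /16s, so counting prefixes per provider says little without also looking at their sizes.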
Overall, spam was received from 3’686 different prefixes; the distribution is somewhat less skewed than that of the ASNs, as only 40 prefixes delivered more than 10 spams. If the data is normalized by the size of the prefixes, the distribution becomes almost “flat”. It’s interesting to note that there is a correlation between smaller prefixes (e.g. /24) and higher “penetration” of spam sources. This may indicate “dirty networks” rented out to spammers directly, which tend to be smaller than end-user provider prefixes.
The worst non-ISP prefix happens to be 204.14.0.0/21. Senderbase.org shows that the domain name “sls-hosting.com” (registered to a postal address in Bulgaria with phone numbers in the US and connectivity by Time Warner Telecom) is used for rDNS in that prefix, which has a track record in news.admin.net-abuse.sightings. Combined with the token “optin” on all their rDNS names, this looks highly suspicious. Currently only a subset (204.14.1.0/25) of that prefix seems to be used, but I guess we will hear more from them in the future.
You can download the prefix distribution data here
Conclusions
Besides being a possible input to Bayesian filtering, AS and prefix data are a useful tool to identify “hotspots” of spam sources. They are especially useful for identifying “dirty netblocks” which are big enough that activity out of these prefixes remains under the trigger threshold for blacklists.
It would be too dangerous to rely on the gross number of spams out of a given ASN as the sole indicator; it is necessary to also take the size of individual prefixes within a given AS into account and to apply professional judgement by consulting additional sources. The “ratio” column (count * 100 / number of IP addresses) in the above linked prefix distribution data may be a first approximation.
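The “ratio” described above (count * 100 / number of IP addresses) can be sketched with Python's standard `ipaddress` module. The function name and the sample prefixes and counts below are invented for illustration and do not come from the report data.

```python
import ipaddress

def spam_ratio(prefix, count):
    """Spams per 100 addresses: count * 100 / number of IP addresses."""
    network = ipaddress.ip_network(prefix)
    return count * 100 / network.num_addresses

# Made-up example: the same spam count in a small and a large prefix.
samples = [
    ("198.51.100.0/24", 10),
    ("10.0.0.0/16", 10),
]

for prefix, count in samples:
    print(f"{prefix}: {spam_ratio(prefix, count):.4f}")
```

With equal counts, the /24 gets a ratio 256 times higher than the /16, which is why small prefixes stand out once the data is normalized by size.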
And finally: according to Team Cymru, there are 234’168 announced prefixes on the Internet as of the time of writing this article. We received spam from 3’686 such prefixes, meaning that we did not receive spam from 98.4% of the Internet. Not so bad, after all :-)
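For those who want to check the arithmetic, here is the calculation using the two figures quoted above. The variable names are my own.

```python
announced_prefixes = 234_168  # announced prefixes, per Team Cymru
spam_prefixes = 3_686         # prefixes we received spam from

# Share of announced prefixes from which no spam was received.
clean_share = 100 * (announced_prefixes - spam_prefixes) / announced_prefixes

print(f"{clean_share:.1f}% of announced prefixes sent us no spam")
```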