Friday, November 11. 2005

A day in the life of a spammer

In this article, I investigate one particular wave of spam received on a medium-sized mail system. The first part of the article analyses how the spam was sent and shows some identifying characteristics. The second part aggregates this information into a “current best practice” from a spammer’s point of view, while the third part suggests effective countermeasures against this “best practice”. Most of what I write here is not new knowledge. However, this text may help connect some dots and form a sharper picture of worthwhile technical measures against spam on the recipient side.

Data Analysis

The sample consists of over 21’000 unambiguously identified spam messages, sent by the same (unknown) entity and targeted at over 4’000 addresses over a period of five days. The mails were sent in five distinct waves, each between 18 and 24 hours apart. For the 21’000 messages, the sender (ab)used 150 domains for the From: addresses. Of these 150, only 3 were invalid (meaning: they had no associated A or MX records, so the receiving mail servers refused the transaction). Each of the 21’000 messages was delivered in its own SMTP transaction. From one wave to the next, the spammer delivered messages to two additional valid addresses.

Within a wave, delivery was sorted alphabetically by domain, i.e. each wave actually consisted of a number of peaks, each peak corresponding to one specific domain. Because of the alphabetical order of the domains and their respective sizes, the first two peaks overlapped, while the third peak followed shortly after. Within a given peak, and from a given IP address, the addresses were sorted alphabetically by localpart. As far as we can tell from the data at hand, there were no invalid addresses in the spammer’s list. However, the data on this point is thin, and no premature conclusions should be drawn. Delivery was spread over 852 distinct IP addresses.
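The split into waves can be recovered from delivery timestamps alone. The following is a minimal sketch, assuming the mail log has already been reduced to a list of `datetime` objects; the function name and the 6-hour gap threshold are hypothetical choices, picked to sit well below the observed 18 to 24 hour spacing between waves.

```python
from datetime import datetime, timedelta

def split_into_waves(timestamps, gap=timedelta(hours=6)):
    """Group delivery timestamps into waves: a new wave starts whenever
    the pause since the previous delivery exceeds `gap` (hypothetical default)."""
    waves = []
    for ts in sorted(timestamps):
        if waves and ts - waves[-1][-1] <= gap:
            waves[-1].append(ts)  # short pause: still inside the current wave
        else:
            waves.append([ts])    # long pause (or first message): start a new wave

    return waves

# Made-up log: two deliveries, a 20-hour pause, then two more deliveries.
t0 = datetime(2005, 11, 6, 22, 0)
log = [t0, t0 + timedelta(minutes=5),
       t0 + timedelta(hours=20), t0 + timedelta(hours=20, minutes=3)]
print([len(w) for w in split_into_waves(log)])  # [2, 2]
```

The same grouping, applied per recipient domain within a wave, would expose the individual peaks.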
At the time of delivery, none of these IP addresses was listed on any of the DNSBLs used (Spamhaus SBL and Spamhaus XBL, which consists of opm.blitzed.org, cbl.abuseat.org and njabl.org). However, on the fifth day, when all IP addresses were re-tested, 750 of the 852 IP addresses (88%) were listed on at least one DNSBL. On the same day, a random sample of thirty IP addresses was chosen for closer inspection. All but 6 of them were already “dead” (i.e. reachable neither through an ICMP ping nor through an “nmap -sT” scan). The remaining 6 all had a listening service on port 6667/tcp plus a number of services common to Microsoft Windows networking (e.g. 445/tcp). Despite the small sample, it is safe to assume that the pool consisted mainly (exclusively?) of hijacked end-user computers. Although port 6667/tcp suggests some IRC-like functionality, none of these machines accepted regular IRC client connections on this port; most likely, some basic access control is in place. Investigating the suspected botnet would be worth a separate study.

The distribution of IP addresses over time yielded interesting data. Only 15 addresses were used in more than one wave. The remaining 837 IP addresses had an average time-to-live (TTL) of 4.3 minutes, during which they delivered 25.9 messages on average (median TTL: 2.5 minutes, 9 messages). 100 IP addresses were seen only once (i.e. only one message delivered), and 262 IP addresses were seen for less than one minute. 110 IP addresses were seen for ten minutes or more.

[Figure: Distribution of number of messages vs. time to live. Well over 50% are below 10 minutes, more than 95% under 25 minutes.]

Target Analysis

All spam messages had essentially the same structure, one that has been seen a thousand times: a bit of advertising text, a URL, and some trailing hash-buster garbage.
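The analysis of the targets starts from the URL in each message body. A minimal sketch of pulling the spamvertized hostnames out of a plain-text body, assuming plain, unobfuscated http(s) URLs (real spam often obfuscates links, and HTML or encoded parts would need decoding first). The regular expression and the function name are illustrative, not taken from any particular filter. The sketch returns full hostnames; reducing them to the registered domain would need something like the Public Suffix List, which is not shown.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple: an http(s) URL runs up to the next whitespace, bracket or quote.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def spamvertized_hosts(body):
    """Return the distinct hostnames of http(s) URLs in a plain-text body,
    in order of first appearance."""
    hosts = []
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname  # lower-cased, port and userinfo removed
        except ValueError:  # e.g. a malformed IPv6 literal
            continue
        if host and host not in hosts:
            hosts.append(host)
    return hosts

body = ("Cheap meds! Visit http://Example.COM/shop?id=1 today\n"
        "xqzv plomk vrtw http://example.com:8080/x")
print(spamvertized_hosts(body))  # ['example.com']
```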
The spamvertized URLs (i.e. the targets of the spam messages) pointed at domains that had usually been registered two days before the wave in which they were used. Each wave had its own URL. The domains were registered through different registrars; domains from the same registrar carried very similar (if not identical) whois data, although it can safely be assumed that all of this data is faked, with the exception of the e-mail addresses, which are most likely throw-away accounts. The nameservers for these domains remained mostly stable across the five waves. Over time, however, the nameservers drifted steadily, either across different providers or within the same network neighborhood. On the fifth day (when we conducted the tests), all IP addresses of the nameservers were listed on the Spamhaus SBL. Later follow-up tests showed that for subsequently used domains, at least one nameserver had already been “SBLed”. During the observation period, the nameservers were located in IP space managed by the Korean ISP epnetworks.co.kr; they were later moved to other parts of Korea. According to Spamhaus data, those later nameserver listings are attributed to the ROKSO spammer Michael Lindsay.

On the seventh day (i.e. two days after the initial tests), some of the domains in use began to behave erratically: some nameserver IP addresses stopped responding, some domains suddenly pointed to Verisign nameservers, and some disappeared from DNS completely (although they were still visible through whois, which usually lags behind DNS by some time).

The actual web pages were initially hosted in China (CNC Group HuNan, uplink through Savvis.net). Later, some of the DNS moved to the CNC Group HuNan space, while the web pages moved to CNC Group Guilin. However, all these web hosting machines are merely proxies that fetch their content from an offsite location. The proxies return a “Server:” header value of “Apache/1.3.31 (Unix)”.
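Checks like the SBL test on the nameserver addresses follow the usual DNSBL convention: reverse the octets of the IPv4 address, append the list’s zone, and look up an A record; any answer (typically in 127.0.0.0/8) means the address is listed. A minimal standard-library sketch; `dnsbl.example.org` is a placeholder zone and the function names are hypothetical. Finding the nameserver addresses in the first place requires an NS lookup, which Python’s standard library does not provide, so that step is left out.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: IPv4 octets in reverse order,
    followed by the list's zone, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def is_listed(ip, zone):
    """Live DNS check: an A-record answer means 'listed'.

    Any resolver error raised as socket.gaierror (NXDOMAIN, but also e.g.
    a temporary failure) is treated as 'not listed', i.e. it fails open.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

# Offline usage: only the query name is built, no lookup is made.
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
# 1.2.0.192.dnsbl.example.org
```

Failing open on resolver errors is a design choice: an unreachable list should not cause legitimate mail to be rejected.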
The assumption that this is a genuine Linux machine is supported by the output of nmap, which additionally shows ports 21/tcp, 22/tcp, 25/tcp, 53/tcp and 80/tcp open and, in one instance, an uptime of 170 days. Although the mail server identifies itself as a Microsoft product, it looks very much like a sendmail installation. The FTP server identifies itself as ProFTPD. However, the web pages returned by the proxies contain references to .asp files, which is unusual, though not unthinkable, for a Unix machine. We can now safely assume that the spammer uses a setup of several easily disposable “rented” Linux boxes, which act as proxies for at least one Windows machine. According to the timestamp in its SMTP greeting, the SMTP server is located in GMT +03:00. According to the timezone map on Wikipedia, GMT +03:00 covers only a handful of countries: parts of western Russia, the Arabian Peninsula, eastern Africa and Madagascar.

Highlighting the Spammer’s operation

This spammer (or, more likely, gang of spammers) is no freshman. He has assembled quite an impressive machinery:
Additionally, there is most likely dedicated machinery to control the botnet and for other purposes (such as seed proxies). Most likely, this cannot be done by a single individual or a single group alone. It is to be feared that there are specialists for the various parts, and that those specialists collaborate more or less closely. Spamming is of course not an end in itself but a means to sell whatever goods are on offer. If we suppose that this operation is not a complete scam (i.e. getting users to pay and then failing to deliver), there must be some supply organisation and a money trail.

In the early days of spamming (which I would put at around 1994 to 1999), it was reassuring to know that spammers were dumb and/or socially inept. The high degree of organisation that became obvious while investigating this incident, however, requires more skill than simply abusing a poorly configured mail server. Still, one must be morally challenged to work in such an organisation. Anecdotal evidence from newsgroups and web boards also shows that some of those engaged in spamming have other deficits as well, but there is no doubt that we are dealing with highly motivated and clearly focused individuals and groups. As with any organised crime, the most worrying thought is that these groups extend their reach into more-or-less legal areas. This is not only about money laundering, but also about leveraging the skills they have acquired.

Effective Security Controls

As I already noted, the effectiveness of simple blacklisting based on general-purpose lists is declining. One reason is the short time-to-live of the sending proxies shown above: 90% of the proxies were used for less than 25 minutes. Given the inevitable delay in adding IP addresses to blacklists, it is hard to imagine that blacklists can keep up the accuracy of a few months back, with catch rates above 30%.
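How much a listing delay costs against short-lived proxies can be estimated from the same delivery log. A minimal sketch with hypothetical function names, under the optimistic assumption that every IP is listed exactly `delay` after its first message arrived here (in reality a list may see a proxy earlier or later through its own traps). The log below is made up for illustration and is not the data from this article.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def ttl_minutes(deliveries):
    """Map each sending IP to its observed TTL in minutes, i.e. the time
    between the first and the last message received from it."""
    seen = defaultdict(list)
    for ip, ts in deliveries:
        seen[ip].append(ts)
    return {ip: (max(t) - min(t)).total_seconds() / 60 for ip, t in seen.items()}

def share_before_listing(deliveries, delay):
    """Share of messages delivered before the sending IP would be listed,
    ASSUMING every IP is listed exactly `delay` after its first message here."""
    first_seen = {}
    for ip, ts in deliveries:
        if ip not in first_seen or ts < first_seen[ip]:
            first_seen[ip] = ts
    missed = sum(1 for ip, ts in deliveries if ts < first_seen[ip] + delay)
    return missed / len(deliveries)

# Made-up log of (ip, timestamp): one short-lived proxy, one longer-lived one.
t0 = datetime(2005, 11, 7, 12, 0)
log = [
    ("192.0.2.1", t0),
    ("192.0.2.1", t0 + timedelta(minutes=4)),
    ("198.51.100.7", t0 + timedelta(minutes=1)),
    ("198.51.100.7", t0 + timedelta(minutes=40)),
]
ttls = ttl_minutes(log)
under_25 = sum(1 for m in ttls.values() if m < 25) / len(ttls)
print(under_25)  # 0.5
print(share_before_listing(log, timedelta(minutes=30)))  # 0.75
```

Even with a 30-minute delay, three of the four messages in this toy log would get through before the list caught up, which is the effect described above.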
Blocking based on sending patterns (e.g. detecting “waves” and “peaks” and dynamically adapting filters) carries a high risk of false positives, while other connection-based characteristics (e.g. the HELO or MAIL FROM domain) lost most of their usefulness long ago. In effect, this shifts the burden of spam filtering to the next layer, based on characteristics that are not connection-oriented. “Simple” characteristics like keyword lists do not scale and are prone to false positives, as every spam filter administrator of a reasonably large and diverse user community will confirm. Statistical approaches (local or collaborative) still help to a certain extent, but may be expensive to run and maintain (e.g. in terms of I/O and CPU load).

This leaves the chain URL → domain → nameservers → registration information as the third pillar of effective spam filtering. Currently, filtering based on blacklisting the IP addresses “behind” a given URL is still effective. In the near future, additional reputation measures will be necessary, such as the “age” of a domain or the registrar through which it was registered. In essence, a filter would then be able to apply stricter rules to domains of bad or unknown reputation.

Domain Reputation

Manually assigning reputation to domains does not scale, given the number of domains in use. Additionally, smaller receivers in particular would have only a very thin base of “trust reports” for their inbound mail flow. On the other hand, privacy concerns and the risk of abusive reputation reports rule out a fully open collaborative system for sharing reputation. A simple algorithm to assign and share reputation on domains could be built as follows:
Everything in this list can be done with existing mechanisms and tools, with the exception of the first item: determining the age of a domain. In general, this can only be queried through whois, which is notoriously unsuitable for automatic parsing. A better approach would be to use, once again, the well-established and lightweight DNS. Standards for such queries would need to be established, but should not be too difficult or controversial. To query the age of the domain “leisi.net” at the “domain age list” dal.example.org, the query could be “leisi.net.dal.example.org”, which would return either a TXT record with the registration date in an easily parsable format (e.g. “20030926”) or an A record of 127.0.x.y, where x would be the age in days and y the number of days since the last change (e.g. “127.0.255.64”). The maximum of 255 inherent in x and y should not be a problem, since older domains should already be either “known good” or “known bad”.

The complete setup

Despite their decreasing effectiveness, SBL- and XBL-type DNSBLs remain an important first line of defense. The same goes for keyword-based filters (even more so if you need to fulfill compliance guidelines) and for statistical approaches (if you can afford the load). Domain reputation will become even more important than it already is today, and it can be enhanced with a certain “dynamisation” that remains fully automated. Spamtraps become more important as well, since they provide a clearer picture than the typical user, who will more often than not be confused by his legitimate, paid-for investor newsletter and report it as spam. Finally, the feedback loop with the users should rather strengthen “positive scoring”, either by having them report false positives or by transparently using the outgoing mail stream as a basis for whitelisting the incoming stream. This can of course work on senders/recipients, keywords or domain reputation.
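The answer format proposed for the domain age list is simple enough to sketch. The following assumes the 127.0.x.y layout described above, with both values capped at 255; `dal.example.org` remains a placeholder, the function names are hypothetical, and no such list is implied to exist. The dates in the usage example are made up, chosen so that the result reproduces the example answer from the text.

```python
import ipaddress
from datetime import date

CAP = 255  # one octet; older domains should already be "known good" or "known bad"

def encode_dal_answer(registered, last_changed, today):
    """Build the proposed A-record answer 127.0.x.y, where
    x = age in days and y = days since the last change, each clamped to 0..255."""
    x = max(0, min((today - registered).days, CAP))
    y = max(0, min((today - last_changed).days, CAP))
    return f"127.0.{x}.{y}"

def decode_dal_answer(answer):
    """Return (age_days, days_since_change) from a 127.0.x.y answer."""
    octets = ipaddress.IPv4Address(answer).packed  # 4 bytes; indexing yields ints
    if octets[0] != 127 or octets[1] != 0:
        raise ValueError(f"not a domain-age answer: {answer}")
    return octets[2], octets[3]

def encode_dal_txt(registered):
    """TXT variant: the registration date as YYYYMMDD."""
    return registered.strftime("%Y%m%d")

def dal_query_name(domain, zone="dal.example.org"):
    """Query name as proposed in the text: <domain>.<zone>."""
    return f"{domain.rstrip('.')}.{zone}"

# Made-up dates: registered 2003-09-26, last changed 64 days before "today".
answer = encode_dal_answer(date(2003, 9, 26), date(2005, 9, 8), date(2005, 11, 11))
print(dal_query_name("leisi.net"), answer, decode_dal_answer(answer))
# leisi.net.dal.example.org 127.0.255.64 (255, 64)
print(encode_dal_txt(date(2003, 9, 26)))  # 20030926
```

A filter could then, for instance, add to a message’s spam score whenever the decoded age is only a few days, matching the observation that the spamvertized domains were about two days old.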
