Thursday 19 December 2013

Get Off My Server! Geocoding Attack Clients

So this year I had the joy of helping manage a WordPress (WP) instance for the FOSS4G conference in Nottingham. That introduced me to the delight that is the paranoia of the WP manager. The justified paranoia.

WP managers lose sleep over exploits. Not sleeping is the only way they can be sure of never waking up to discover that their site has been cracked and is now serving up malware, scraping users' credentials, or taking part in a vast Bitcoin-mining botnet. Patch everything, often.

There are also plenty of security plugins for WordPress, but I figured we ought to have something at a lower level, and my favourite first-line tool is fail2ban. You set up pattern-matching expressions, and when a host triggers those patterns often enough in the log files, fail2ban adds rules to the iptables ruleset to block that host's IP address.
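To make that mechanism concrete, here is a minimal Python sketch of what a jail does, under simplifying assumptions. A pattern containing fail2ban's `<HOST>` tag is turned into a regex, matches are counted per host, and a host is banned once it reaches `maxretry` matches within `findtime` seconds. The `Jail` class and the simplified IPv4-only host regex are hypothetical illustrations, not fail2ban's own code. The real tool hands the ban to an action such as an iptables rule, which this sketch only notes in a comment.

```python
import re
from collections import defaultdict, deque

# Simplified IPv4-only stand-in for fail2ban's <HOST> substitution (an assumption;
# the real tool's host regex is more elaborate).
HOST_RE = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"


class Jail:
    """Toy jail: ban a host after `maxretry` matching lines within `findtime` seconds."""

    def __init__(self, failregex, maxretry=5, findtime=600):
        self.pattern = re.compile(failregex.replace("<HOST>", HOST_RE))
        self.maxretry = maxretry
        self.findtime = findtime
        self.hits = defaultdict(deque)  # host -> timestamps of recent matches
        self.banned = set()

    def feed(self, timestamp, line):
        """Process one log line; return the host if this line triggers a ban, else None."""
        m = self.pattern.search(line)
        if not m:
            return None
        host = m.group("host")
        if host in self.banned:
            return None
        window = self.hits[host]
        window.append(timestamp)
        # Drop matches that have fallen out of the findtime window.
        while timestamp - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned.add(host)
            window.clear()
            # Real fail2ban would now run its ban action, e.g. inserting an iptables rule.
            return host
        return None
```

For example, `Jail(r'^<HOST> .*"POST /wp-login\.php', maxretry=3, findtime=60)` would ban a host on its third matching Apache access-log line within a minute. That `wp-login.php` pattern is purely illustrative and is not the rule used on the FOSS4G server.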

After watching the log files and seeing the server slow down as WP tried to process hundreds of invalid requests, I came up with a rule that seemed to match most of them. My suspicion was that a lot of the WP exploit attempts used a kit, and that kit had a fairly clear signature. So I added my rule alongside the other handy rules in my fail2ban config.

fail2ban logs the IP address of each host it bans, so I thought it might be nice to geocode them via the GeoIP database and see where they have all been coming from. "China" and "Russia" are the answers most people give when you ask them to speculate on the source of these attacks. Are they right?

So first, I took the log files that I had and extracted the IP address and timestamp of each ban. Then, using the Python GeoIP module, I translated all the IP addresses to lat-long and country code. That gave me about 1200 locations from one month of retained log files.
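The extraction step might look something like this minimal sketch. It assumes fail2ban's own log, with ban lines of the form `2013-11-03 06:25:13,452 fail2ban.actions: WARNING [wordpress] Ban 192.0.2.1`. Newer fail2ban versions add a PID and log bans at NOTICE level, and the regex tolerates that too. The `wordpress` jail name and the addresses are illustrative. The geocoding lookup is passed in as a function rather than tied to a library, so this does not reproduce the API of the older Python GeoIP module used here.

```python
import re
from datetime import datetime

# Assumed fail2ban log format (0.8-style shown; 0.9+ adds "[pid]:" and logs
# bans at NOTICE level, which the lazy ".*?" below also skips over):
#   2013-11-03 06:25:13,452 fail2ban.actions: WARNING [wordpress] Ban 192.0.2.1
BAN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"fail2ban\.actions.*?\[(?P<jail>[^\]]+)\]\s+Ban\s+(?P<ip>\S+)"
)


def parse_bans(lines):
    """Return (timestamp, jail, ip) for every 'Ban' line; Unban lines are ignored."""
    bans = []
    for line in lines:
        m = BAN_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            bans.append((ts, m.group("jail"), m.group("ip")))
    return bans


def geocode(bans, lookup):
    """Attach a location to each ban.

    `lookup` is any function mapping an IP string to (lat, lon, country_code),
    or returning None when the address isn't in the database (a hypothetical interface).
    """
    located = []
    for ts, jail, ip in bans:
        hit = lookup(ip)
        if hit is None:
            continue  # unlocatable addresses are simply dropped
        lat, lon, country = hit
        located.append({"time": ts, "jail": jail, "ip": ip,
                        "lat": lat, "lon": lon, "country": country})
    return located
```

With MaxMind's current `geoip2` package, one possible `lookup` would wrap `geoip2.database.Reader(path).city(ip)`, read `.location.latitude`, `.location.longitude` and `.country.iso_code` from the result, and return None on `geoip2.errors.AddressNotFoundError`. The database path depends on where your GeoLite2 City file lives.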

Here's a table of the number of bans for the top fifteen countries.

Country        Ban Count
USA                  573
Germany               76
France                50
Japan                 49
Poland                39
Netherlands           37
Turkey                29
Spain                 27
Indonesia             27
Vietnam               24
Great Britain         21
Myanmar               21
India                 20
Russia                19
Austria               15
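Counting bans per country and taking the top entries is a natural fit for Python's `collections.Counter`. This is a small sketch that assumes each geocoded ban is a dict with a `country` key (a hypothetical record layout). GeoIP returns ISO codes such as `US` and `GB`, so turning them into names like "USA" and "Great Britain" would be a separate mapping step, not shown here.

```python
from collections import Counter


def top_countries(located, n=15):
    """Return the n most-banned country codes as (code, count) pairs, highest first."""
    return Counter(rec["country"] for rec in located).most_common(n)


def format_table(rows):
    """Render (country, count) rows as a plain-text table with aligned columns."""
    width = max([len("Country")] + [len(country) for country, _ in rows])
    lines = [f"{'Country':<{width}}  Ban Count"]
    lines += [f"{country:<{width}}  {count:>9}" for country, count in rows]
    return "\n".join(lines)
```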

So the USA is clearly the big trouble here, with China not even making the top fifteen. Of course, that doesn't rule out many of these US PCs being controlled by Chinese or Russian botnets.

Now that we have lat-long, we can save all this as a shapefile, load it into QGIS, and plot it on an OpenStreetMap background.
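Writing a shapefile from Python usually means a third-party library such as pyshp or GDAL/OGR. This sketch therefore swaps in GeoJSON, which needs only the standard library and which QGIS also opens directly as a vector layer. It assumes a hypothetical record layout with `ip`, `country`, `lat`, `lon`, and `time` stored as a `datetime`. Note that GeoJSON puts longitude before latitude.

```python
import json
from datetime import datetime


def to_geojson(located):
    """Build a GeoJSON FeatureCollection of ban points from hypothetical ban records."""
    features = []
    for rec in located:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {
                "ip": rec["ip"],
                "country": rec["country"],
                "time": rec["time"].isoformat(),
            },
        })
    return {"type": "FeatureCollection", "features": features}


def write_geojson(located, path):
    """Write the records to a .geojson file that QGIS can add as a vector layer."""
    with open(path, "w") as f:
        json.dump(to_geojson(located), f)


if __name__ == "__main__":
    # Illustrative record only (documentation-range IP, approximate Nottingham coordinates).
    sample = [{"ip": "192.0.2.1", "country": "GB", "lat": 52.95, "lon": -1.15,
               "time": datetime(2013, 11, 3, 6, 25, 13)}]
    print(json.dumps(to_geojson(sample), indent=2))
```

Keeping the ban time as an ISO 8601 string in the feature properties leaves room for time-based styling later.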
Note that this map contains overlapping points and so isn't a perfect representation of density. Also, the spatial precision of the MaxMind GeoIP database varies wildly.

First I'd like to thank Australia and New Zealand for not bothering to try and hack our server. Much appreciated. Let's look east first:

Quite a good representation here, including Iran and most of South-East Asia. I don't know why Vietnam scored so highly in the table. Let's look at Europe:
Europe has a good spread of banned IP addresses, but Portugal, Greece, and Ireland have nothing. Maybe everyone unplugs their machines in the countries hit hardest by the financial crisis? Off to the biggie now, let's check out the USA:
Mostly an east coast thing here. Examining the west coast in more detail shows a lot more activity from LA than from San Fran or points further north, up into Vancouver. Thanks, hipsters! What's going on on the east coast, though? As they say on CSI, "enhance that area"...
Time to switch to Stamen BW maps here, just because I can, and because it's a bit less distracting. Quite a few attackers around the state, but let's go closer. Take me into Lower Manhattan and enhance, with Bing Aerial Maps...
At this point the CSI team head off to the NYU tennis courts and find a guy with a laptop sitting in the middle of that patch of grass, trying to hack into the FOSS4G server. Of course, our data is nowhere near that precise, so it's quite meaningless. Only the NSA (and Mac Taylor and friends) can track you down that closely!

I don't know if there's any value in doing more analysis of this particular data set, but it is at least handy for overturning the prejudice that all cyber attacks come from China or Russia. I haven't used the timestamps here, but they could be used to create an animation of attack points. If you'd like a copy of the data, get in touch.
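A first step towards such an animation would be to group the ban locations into consecutive time bins, one per frame. Here is a minimal sketch under stated assumptions. It uses the same hypothetical record layout (`time` as a naive `datetime`, plus `lon` and `lat`). The `frames` function and its default one-day `period` are illustrative choices, and actually rendering the frames, in QGIS or elsewhere, is not shown.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def frames(located, period=timedelta(days=1)):
    """Group ban locations into fixed-length time bins, one per animation frame.

    Returns a chronologically sorted list of (bin_start, [(lon, lat), ...]).
    """
    bins = defaultdict(list)
    for rec in located:
        # Floor each timestamp to the start of its bin, counting whole periods from EPOCH.
        start = EPOCH + ((rec["time"] - EPOCH) // period) * period
        bins[start].append((rec["lon"], rec["lat"]))
    return sorted(bins.items())
```

Passing `period=timedelta(hours=1)` would give hourly frames instead, at the cost of many sparse ones.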

Update!

I found another monthly tranche of fail2ban logs. This one looks very different, and this time we can point a finger at the Russians. Here's the top table:
Country        Ban Count
USA                  872
Russia               795
Japan                375
Peru                 314
Thailand             308
Mexico               265
Philippines          206
Ukraine              182
Turkey               179
Ecuador              154
Kazakhstan           151
Iran                 140
India                138
Vietnam              130
Indonesia            118
That's a bit different! How did Peru get up there?

I had a quick play with some of QGIS's plotting functions and discovered a way to get a much better impression of the density, including where overlapping points create hotspots. I used an SVG symbol with a few dots on it, rotated it randomly with data-driven symbology, and set the opacity fairly low. There's probably a density estimation plugin for QGIS somewhere, but until I find one, or until I can load the data into R and do a proper kernel density estimation, this will have to do.
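For reference, a kernel density estimate replaces each point with a smooth bump and sums the bumps. With a Gaussian kernel of bandwidth h and n points, the density at (x, y) is 1/(2πh²n) times the sum over points of exp(-((x - xᵢ)² + (y - yᵢ)²) / (2h²)). This pure-Python sketch evaluates that on a regular lon/lat grid. It treats degrees as planar coordinates, which distorts things away from the equator, and the bandwidth is an arbitrary assumption. It is also slow for large grids, since every grid cell visits every point. In R, `MASS::kde2d` would be the more practical route.

```python
import math


def gaussian_kde_grid(points, bandwidth, xmin, xmax, ymin, ymax, nx, ny):
    """Evaluate a 2-D Gaussian kernel density estimate on an nx-by-ny grid.

    points: non-empty list of (lon, lat) pairs, treated as planar (x, y) coordinates.
    bandwidth: kernel standard deviation h, in the same units (degrees here).
    Returns ny rows of nx density values; row j is y = ymin + j*dy.
    Requires nx, ny >= 2.
    """
    n = len(points)
    two_h2 = 2.0 * bandwidth ** 2
    norm = 1.0 / (math.pi * two_h2 * n)  # equals 1 / (2*pi*h^2*n)
    dx = (xmax - xmin) / (nx - 1)
    dy = (ymax - ymin) / (ny - 1)
    grid = []
    for j in range(ny):
        y = ymin + j * dy
        row = []
        for i in range(nx):
            x = xmin + i * dx
            # Sum one Gaussian bump per point, centred on that point.
            total = sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / two_h2)
                        for px, py in points)
            row.append(norm * total)
        grid.append(row)
    return grid
```

Unlike the randomly rotated symbols, the result is a proper density surface: overlapping points add together exactly instead of relying on how transparency happens to stack.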