Monday, 16 November 2009

Choropleth mapping challenge in Qgis

Making choropleth maps - how hard can it be? It's just a bunch of data colouring another bunch of polygons, right?

This all started over at Flowing Data where Nathan used Python to hack an existing SVG file to colour the counties of the US according to data read in from a file of county unemployment levels. Dave Smith of Revolution Computing issued a challenge: do this in R. I'm always up for a challenge so I found a shapefile of the US counties with the FIPS county codes, and then using the rgdal and sp packages had a plot up in about twenty minutes. The next day Dave posted the challenge results.

Today on the R-Sig-Geo mailing list there has been a bit of traffic on using R or GIS for cartography. Various people arguing one way or the other about how what constitutes 'publication quality' and various opinions bouncing around. So I thought I'd revisit the challenge - this time using open-source GIS software.

Problem one is getting the unemployment data and map data - the unemployment data is linked as a CSV file from the Flowing Data blog entry, and shapefiles of US counties are downloadable after a quick google search.

There are various variations of the US counties depending on the time they were defined, whether Alaska and Hawaii are in their proper place or tucked neatly around the 'lower 48' to make mapping easier, and whether the data has the right FIPS codes. I found a shapefile with county data from 2000, full FIPS codes, and with AK and HI slotted in.

The next step is to create a geographic data set with the unemployment data as attributes. But first I had to fiddle with the CSV file. It had the FIPS codes in two parts - one for the state and one for the county. The full FIPS code is five digits, made up from the two parts with leading zeroes. I didn't want to use R or Python for this, so I loaded the CSV into Calc, the spreadsheet component of OpenOffice.

The first thing I did was to shift all the data down one row and add some headers. The only important ones are the two parts of the FIPS codes and the unemployment percentage.

To create the full FIPS code I have to concatenate the two parts (in columns B and C) with the leading zeroes - I put this formula into J2 and copy it down the column:

 =CONCATENATE(TEXT(B2;"00");TEXT(C2;"000"))

then I add the word 'FIPS' as the header in J1 for this column.

Now I have a spreadsheet with a FIPS and an UNEMPLOYMENT column, and a shapefile with a FIPS attribute. How do I join them?

A couple of ways spring to mind:
  1. Load everything into PostGIS and do a join or create a view
  2. Create a shapefile with the Unemployment data in the DBF
I think doing (1) isn't too difficult, but because I had OpenOffice open I figured I'd save the data as a DBF and see what I could do with Quantum GIS

In the Tools menu of Qgis I found the Data Management options and the Join Attributes function. Exactly what I needed! I set my Join Data to be the DBF of the unemployment data and FIPS to be the field to join with. Hit OK, and in under a minute I had a shapefile with the DBF data in place.

 
The new layer was loaded. Now all that was left was a bit of clicking in the Layer.. Properties.. Symbology dialog to colour the map in six classes with the same intervals as in the challenge, and using the same colours from the Color Brewer system. I also overlaid a state map with transparent fill and a thick solid white outline to get the same effect as the challenge map.

Here's how it looks in Qgis:

Note the large chunk of Alaska missing - that's a non-matching FIPS code. Oh well, nothing's perfect!

Qgis also has a map composer where you can do basic page layout for cartography. Here's one rendering of the unemployment map:


Here you can add layers, legends, and text, and also play with fonts and spacing and sizes. The maps can be output to SVG and PDF.

I'd call these publication-quality - at least they're a better quality than many maps I've seen published (wacky handwriting font excepted).

This has taken me about 45 minutes while waiting for the rain to stop so I can go home. Any gvSIG orArcGIS people out there want to have a go?

2 comments:

  1. Hi Barry,

    I just stumbled upon this article. I'm actually creating maps with R, and I've been having a tough time finding a shape file of the US with Alaska and Hawaii slotted in, like the one you have. You said you found it but did not indicate a source. I was hoping you could send me the file or a link to its location?

    Thanks,

    Nemrasp

    ReplyDelete
  2. **bump**

    I also like how your map looks -- could you reference your source for the county and state shapefile used in this tutorial? Thanks in advance -- Kevin.

    ReplyDelete