Monday 16 November 2009

Choropleth mapping challenge in Qgis

Making choropleth maps - how hard can it be? It's just a bunch of data colouring another bunch of polygons, right?

This all started over at Flowing Data where Nathan used Python to hack an existing SVG file to colour the counties of the US according to data read in from a file of county unemployment levels. Dave Smith of Revolution Computing issued a challenge: do this in R. I'm always up for a challenge so I found a shapefile of the US counties with the FIPS county codes, and then using the rgdal and sp packages had a plot up in about twenty minutes. The next day Dave posted the challenge results.

Today on the R-Sig-Geo mailing list there has been a bit of traffic on using R or GIS for cartography. Various people arguing one way or the other about how what constitutes 'publication quality' and various opinions bouncing around. So I thought I'd revisit the challenge - this time using open-source GIS software.

Problem one is getting the unemployment data and map data - the unemployment data is linked as a CSV file from the Flowing Data blog entry, and shapefiles of US counties are downloadable after a quick google search.

There are various variations of the US counties depending on the time they were defined, whether Alaska and Hawaii are in their proper place or tucked neatly around the 'lower 48' to make mapping easier, and whether the data has the right FIPS codes. I found a shapefile with county data from 2000, full FIPS codes, and with AK and HI slotted in.

The next step is to create a geographic data set with the unemployment data as attributes. But first I had to fiddle with the CSV file. It had the FIPS codes in two parts - one for the state and one for the county. The full FIPS code is five digits, made up from the two parts with leading zeroes. I didn't want to use R or Python for this, so I loaded the CSV into Calc, the spreadsheet component of OpenOffice.

The first thing I did was to shift all the data down one row and add some headers. The only important ones are the two parts of the FIPS codes and the unemployment percentage.

To create the full FIPS code I have to concatenate the two parts (in columns B and C) with the leading zeroes - I put this formula into J2 and copy it down the column:

 =CONCATENATE(TEXT(B2;"00");TEXT(C2;"000"))

then I add the word 'FIPS' as the header in J1 for this column.

Now I have a spreadsheet with a FIPS and an UNEMPLOYMENT column, and a shapefile with a FIPS attribute. How do I join them?

A couple of ways spring to mind:
  1. Load everything into PostGIS and do a join or create a view
  2. Create a shapefile with the Unemployment data in the DBF
I think doing (1) isn't too difficult, but because I had OpenOffice open I figured I'd save the data as a DBF and see what I could do with Quantum GIS

In the Tools menu of Qgis I found the Data Management options and the Join Attributes function. Exactly what I needed! I set my Join Data to be the DBF of the unemployment data and FIPS to be the field to join with. Hit OK, and in under a minute I had a shapefile with the DBF data in place.

 
The new layer was loaded. Now all that was left was a bit of clicking in the Layer.. Properties.. Symbology dialog to colour the map in six classes with the same intervals as in the challenge, and using the same colours from the Color Brewer system. I also overlaid a state map with transparent fill and a thick solid white outline to get the same effect as the challenge map.

Here's how it looks in Qgis:

Note the large chunk of Alaska missing - that's a non-matching FIPS code. Oh well, nothing's perfect!

Qgis also has a map composer where you can do basic page layout for cartography. Here's one rendering of the unemployment map:


Here you can add layers, legends, and text, and also play with fonts and spacing and sizes. The maps can be output to SVG and PDF.

I'd call these publication-quality - at least they're a better quality than many maps I've seen published (wacky handwriting font excepted).

This has taken me about 45 minutes while waiting for the rain to stop so I can go home. Any gvSIG orArcGIS people out there want to have a go?

Thursday 5 November 2009

Workshop Report

I've just got back from London after a trip for a one-day workshop on mapping software in health research organised by The Infectious Disease Research Network. It was held at the Royal Geographical Society buildings - a wonderful place filled with history from RGS expeditions of the past.

The day was arranged by Mike Head of the IDRN and the meeting chaired by Prof. Graham Moon of Southampton University. The talks were varied enough to sustain interest for over a hundred people stuck in one room for the day. The applied talks before lunch combined the usual stories of fieldwork adventures with the gritty details of database servers and web mapping platforms.

During lunch I set up my e-presentation. It was a series of animated slides talking about the use of open-source geospatial software. I also took the opportunity to distribute some flyers for my course in January and the OSGeo group. I talked through my case-studies several times to interested people and was losing my voice by the time I was finishing off my little sticky chocolate cake.

After lunch two talks grabbed the audience - Tim Fendley's amusing tales of new map systems for pedestrians around London was illustrated by screenfuls of comic signage, many of which I encountered on my London wanderings the day after. Chris Phillips from MapAction talked about the work of getting maps and GIS technology out to disaster areas to help co-ordinate search, rescue, and aid deployment.

 The final talk was from Mikaela Keller on the Healthmap.org project. This is a system that takes news feeds and health data and maps it in real time. She talked about how a combination of computer language processing and human scanning produce reliable maps.  I'm still not sure how you could use this data to do rigorous statistical analysis but I don't think that's the point of it.

 After a summing-up from Graham we adjourned for wine and juice outside the hall, and then as numbered dwindled we remaining few jumped into a couple of taxis and headed for the John Snow pub in Soho. Dr John Snow was perhaps the first person to combine health and mapping when he plotted cases of cholera in 1854 and concluded the source was a water pump near where the pub that bears his name now stands. We raised our glasses to the good doctor after a day of discussion of the sort of work made possible by standing on his shoulders.