Sunday, 27 December 2009

Making Interactive Plots with Flot

I've been working on a demonstration web site of a meningitis case reporting system in Africa. The current demo uses OpenLayers to show a map of the spatial distribution of cases, and a couple of PNG plots generated from R to show the cases in time for a particular area. One plot showed the cases for the past year, and one was a plot of the previous four weeks with our prediction of the next four weeks cases (plus 95% confidence interval).

But I thought these were a bit static. So I thought I'd add a bit more functionality. Options? Well, I could use my imagemap package for R to create hotspot areas on the plot so users can hover over points and get information, or click on points to zoom to epidemic predictions and so on. But I decided to try a Javascript plotting library instead. There's a few out there, and I settled on using Flot - although jqPlot looks pretty capable too.

Flot lets you plot lines, points and bars, and you can make filled areas by constructing a data series that goes along the top and back along the bottom, completing a polygon. You can shade and style the points and lines, and you get a legend. If your X-axis is time, you can feed it milliseconds and get dates. Here's what I've got to start with:

The buttons at the bottom control the time period. By clicking on 'Prediction', the user can zoom in to see the most recent few weeks, plus our two- and four-week predictions with 50% and 95% confidence interval. Flot also lets you add hover and click events to points:

Here we're showing that our 2-week ahead prediction has a 92% chance of exceeding our 10-case threshold.

Flot also allows zooming in on plots. See that earlier peak around July? We can select that area:

and Flot can show that time period in detail:

All very nice. I've note tested this on Internet Explorer yet, but jQuery and Flot should hide browser-specifics from the author so it should work. This all worked out pretty smoothly, except for a few problems. The biggest one is that if the user resizes the browser window, the plot doesn't resize properly. Not a problem if your DIV element has a fixed width, but I had width: 80% in my CSS. The fix was to redo the plot on a resize event of the window. In jQuery, that's:

                      plot = $.plot(plotDiv, d, options );

If anyone wants some more code, or the full example, then just ask!


Thursday, 17 December 2009

Polygon Shapefile Editing with QGIS

This is a little excursion into editing polygons with some of the new features in Qgis. I'm using a fresh-out-of-SVN of Qgis 1.4. Thanks to whoever fixed the right-click crash bug one hour after I reported it!

I was given a bitmap file representing the new boundaries of regions we are supposed to be working on for a project. I couldn't get this in a spatial format no matter how hard I tried. I did have some similar boundaries but there were a few changes here and there. I loaded the bitmap into the Qgis georeferencer and using my old boundary shapefile matched some points up and created a world file. I could then load the bitmap into Qgis. Here it is:

There's no way I want to redigitize the boundaries when I have something pretty close, and all the overlapping labels on this mean it would be tricky to automatically redigitize the boundaries. Zoom in and see the resolution for yourself.

So I loaded the nearest shapefile and overlaid it, set the fill colour to something transparent (if you set a fill to 'no fill' then you can't see the selection highlight), and activated the magic super labelling from the labelling plugin:

They match up pretty well, but zooming around shows a few problems:

I want to fix the border between Loga and Dosso to better match the underlying raster. First up, activate the editing mode:

Now I hit a problem. Since polygon shapefiles are doubly-digitised, I couldn't see how to move the points on both polygonal boundaries together. And if I moved first one, and then the other, I would probably end up with slivers or polygons that didn't quite fit. I hit on another strategy.

First, select Loga region:

then use the 'Split Feature' tool to cut off a little bit of Loga in the corner away from the edge. This is just to keep the attributes of Loga somewhere, since in a second we're going to merge it with Dosso:

Now we have two features with the same attributes. It's possible that ID numbers have changed here, so don't rely on them. That might explain a bug to appear later... So now, select Dosso and the main part of Loga, and use the Merge Selected Features tool:

This pops up the dialog asking you where the merged feature should get its attributes from. In this case we want the feature to take the attributes of Dosso:

This leaves a few stray bits lying around on the border:

So use the Delete Ring tool to get rid of them:

Now we are going to split Dosso along the new border. First select it:

Then use the Split Feature tool to split by following the border on the underlying raster:

Here's the bug. It's labelled Dosso with 'Loga'. If you check the attribute table at this point it still says 'Dosso', so maybe everything is okay. We'll press on. What we do next is to select the two parts that will make up the new Loga in preparation for another 'Merge Selected Features':

We do the merge, and get the attributes from the little corner of Loga that we cut off earlier:

Now we have the geometry we want:

Except it's still labelled Dosso as Loga. The attribute table seems fine though - if I select Dosso the right line is highlighted in the attribute table:

Never mind, let's turn off the editing and see what happens:

Ooh! It's all looking good! Dosso is Dosso and Loga is Loga.

Now I hope there is an easier way of doing this - a way of editing points along common boundaries such that both polygons are modified. But I can't find it, and I've asked on #qgis and nobody seems to know. If not, then this might be the best way to do it!

Friday, 11 December 2009

Why Screencasts Can Suck

A screencast is an instructional video where you see someone's computer screen as they do something, together with a voice-over of them telling you what they are doing. Google trends shows them starting in 2006 and increasing in search volume ever since. Everyone wants to do a screencast.

But are they being used for the right thing? They have their place, just as a screwdriver is the right tool for putting in screws, but there seems to me to be a lot of people using them in the wrong place.

Here's a few reasons why they suck, together with fixes for ameliorating the suckage:
  • Speech and graphics aren't searchable in a browser or indexable in a search engine
    • Fix: include a transcript or at least a number of keywords and keyphrases on the page.
  • Video is hard to jump to a precise point. If I didn't follow a point, or I want to see how you set something up near the start, I'd like to be able to jump back and forth to key points. This also makes it hard to go through a screencast at your own pace.
    • Fix: I think YouTube videos can have chapter points. It may also be possible that screencasting software can give you some links for this. Include them.
    • Fix: Alternatively, overlay your screencast with big chunky step numbers or text in the corner. Then as I scroll the fiddly little control back I can see it change from (3) to (2).
  • Speech and text can't be cut and pasted. This is a pain with command-line videos, less so with gui clickage.
    • Fix: supply a text transcript of commands and speech where appropriate.
  • I'm listening to music.  The voiceover doesn't go with my tunes.
    • Fix: include subtitles. Also helps people with hearing difficulties, which will include me if I carry on listening to Muse at these volumes.
Those problems apply to all screencasts. Here are some things that make bad screencasts:
  • Loss of resolution. If you are squeezing your entire monitor onto a little YouTube video something is going to get lost. Zoom in if you need to show detail.
  • Two Many Misteaks. I don't want to see you mistyping things. Or going "oops". Or stuttering. Or forgetting something. Shout "cut" to yourself and do another take. Or do several takes and edit. Remember you only have to get this right once, thousands of people may have to watch it.
  • Nothing happening. Perhaps Andy Warhol would enjoy an unchanging screen with some rambling banter over the top. But I'm not sure everyone else does. This is supposed to be instructable, not minimalist.
Most of these problems can be worked round by creating a series of screenshots and annotating them. I've used Wink for this in the past. It creates a Flash file which you can either play or step through one stage at a time. You can add callout annotations and make the mouse cursor sparkle when it clicks something. The user can step back and forth.

It's extra work to do this, but then it's extra work to do a screencast properly with transcript, subtitles, or key point markers. Terse and precise text is much better than a rambling voice-over. Screencasts may be great for demonstrating anything that has lots of animation in it, but to demonstrate the operation of a piece of point-and-click software it has it's limitations.

Just sayin'.

Tuesday, 1 December 2009

Converting an mdb to spatialite

There's loads of data on the European Environment Agency's web site. And I'm looking for data for my Open Source Geospatial course in January. That's next month now. A lot of the data is in shapefile or geoTiff form, which you can read right into Qgis, R, and with a bit of fiddling, OpenLayers. However there are some databases on there in .mdb format. That's Microsoft Access format. Oh great.

Luckily there's a bunch of command-line tools for extracting data from mdb files. Then I can import them into spatialite and add the geographic data to the tables.

Here's what I did. First get the mdb file from the EPER data set downloads. I'm not sure why the March 2008 version is considered later than the August version, but never mind. We unzip and now have EPER_dataset_27-03-2008.mdb to play with. I'm using the bash shell here for my unix commands.

Let's just set a variable to the filename because it's such a fiddle to type:


Now fire up spatialite-gui and create a new database called eper.sqlite. It should have some tables in it, namely: geom_cols_ref_sys, geometry_columns, and spatial_ref_sys . Quit spatialite-gui.

The next job is to generate the tables in the spatialite file. You can dump the schema from the mdb file and read it into spatialite easy enough, and I found that the sybase version of the schema was accepted better by spatialite than the default access option, or the oracle version:

mdb-schema $mdb sybase | spatialite eper.sqlite

Ignore the errors, they occur because the schema script tries to drop all tables before creating them.

Next up, read all the table data from the mdb and load into sqlite. I use the -I option of mdb-export to dump SQL INSERT statements, and the -R option to add the semicolons that spatialite needs:

for t in `mdb-tables $mdb` ; do
 echo $t
 mdb-export -I -R ";\n" $mdb $t |spatialite eper.sqlite

All my table names and column names are single words, so I don't need any quotes or the -S option to sanitize names. That's not always the case.

Now we have to spatially enable the Facility table. Fire up the gui again on the eper.sqlite file. You should see the new tables.

The Facility table has Longitude and Latitude columns which look like WGS84 coords, and indeed there is a Geographic Coordinate System column. Let's just check:

SELECT * FROM "Facility" where GeographicCoordinateSystem <> 'WGS84'

Oh dear. 28 of them appear to be in ETRS89. We'll ignore that for now, but it's always good to check.

The table needs a geometry column, so we add it thus:

SELECT AddGeometryColumn('Facility', 'the_geom', 4326, 'POINT', 2);

And now we fill that column by making points from the Lat/Long columns:

UPDATE Facility SET the_geom = MakePoint(Longitude,Latitude,4326)

From that point we can load the database in Qgis 1.4.x using the spatialite layer and plot the points, query them etc etc.

The database is a complex beast if you wish to relate the other emissions data to the facility points. I've constructed what I think is an approximate picture of the relations between the tables, but I may be wrong!

 The PDF/DBDesigner XML of this is available from me if you ask nicely! You could probably use this now to do stuff like creating a view in spatialite that mapped all emissions of a certain substance in various countries, or summarise emissions counts in countries and so on. There's a Flash Interface to the EPER data, but it could easily be done in OpenLayers... Anyone?

Anyway, that's how to get some mdb database into spatialite and plot it in Qgis!