At the OSGIS UK conference last week the prize for best presentation went to Ken Arroyo Ohori of TU Delft for his talk on "Automatically repairing polygons and planar partitions". The presentation, jointly credited to Hugo Ledoux and Martijn Meijers, detailed a method for cleaning up the kind of dodgy polygons we've all probably seen in our geo-lifetimes.
I've noticed that computational geometry theorists seem averse to publishing code - it's as if they don't want to sully the purity of their theory with dirty code, dealing with edge cases and arithmetic precision and all that. Not these guys. The code is on GitHub, and I was eager to try it out. There was a lot of buzz about integrating this with PostGIS or QGIS or R. The stumbling block seems to be the requirement for the CGAL library.
Anyway, back at Lancaster, I checked out the code. Easy as:
git clone https://github.com/tudelft-gist/pprepair.git
cd pprepair
make -f Makefile.linux
There were a couple of problems, which I submitted as issues, but I managed to fix them myself before getting any response. You can check these on the GitHub issue tracker.
Then I wanted a sample dataset to test it on. From the workshop I gave at the UseR! conference I remembered some awful polygons outlining local police neighbourhood regions. I'd used them as an example of bad data. Some crime locations had fallen into cracks between the neighbourhood polygons, and some were in locations covered by more than one polygon. Maybe you could escape justice by claiming you were outside a police jurisdiction this way?
Here's what a small region looks like in QGIS:
Notice the white spaces between some polygons, and also the overlapping made visible by setting transparency.
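To make the two failure modes concrete, here is a minimal Python sketch that counts how many polygons cover a point: zero means the point has fallen into a gap, more than one means it sits in an overlap. The three toy squares and the names point_in_ring, coverage_count and classify are my own inventions for illustration, not the police data and not anything from pprepair. It uses a plain ray-casting test and ignores holes and points lying exactly on an edge.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies strictly inside the ring.

    `ring` is a list of (x, y) vertices; the last vertex is joined
    back to the first automatically.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def coverage_count(x, y, polygons):
    """Number of polygons in the dict that contain the point."""
    return sum(point_in_ring(x, y, ring) for ring in polygons.values())


def classify(x, y, polygons):
    """Label a point as 'gap', 'ok' or 'overlap' by its coverage count."""
    count = coverage_count(x, y, polygons)
    if count == 0:
        return "gap"
    if count == 1:
        return "ok"
    return "overlap"


# Hypothetical toy "neighbourhoods": A and B leave a crack between
# x=4 and x=5, while B and C overlap between x=8 and x=9.
neighbourhoods = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(5, 0), (9, 0), (9, 4), (5, 4)],
    "C": [(8, 0), (12, 0), (12, 4), (8, 4)],
}

for point in [(2, 2), (4.5, 2), (8.5, 2)]:
    print(point, classify(point[0], point[1], neighbourhoods))
```

In a valid planar partition every point inside the study area should come back as "ok", which is the property the repair is meant to restore.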
The pprepair process is currently a command-line tool, but its options are simple: an input shapefile, an output shapefile and some repair parameters. To start, all you need is "-fix".
pprepair -i neighbourhoods.shp -o fixed.shp -fix
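If you want to run this over several shapefiles from a script, a thin Python wrapper around the command line is one way to do it. This is only a sketch: it uses just the -i, -o and -fix options shown above, assumes the pprepair executable is on your PATH, and the function names build_pprepair_command and repair are made up for this example.

```python
import shutil
import subprocess


def build_pprepair_command(input_shp, output_shp, executable="pprepair"):
    """Build the argument list for the basic -fix invocation shown above."""
    return [executable, "-i", input_shp, "-o", output_shp, "-fix"]


def repair(input_shp, output_shp, executable="pprepair"):
    """Run pprepair on one shapefile, raising an error if it fails."""
    cmd = build_pprepair_command(input_shp, output_shp, executable)
    # Assumption: the binary built by make has been put somewhere on PATH.
    if shutil.which(executable) is None:
        raise FileNotFoundError(executable + " not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True)
```

Calling repair("neighbourhoods.shp", "fixed.shp") would then be equivalent to the command above.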
Loaded into QGIS, the result looks much nicer:
Ken's presentation claimed big wins on speed and memory usage over similar algorithms in both open-source and proprietary GIS packages. Once the licensing requirements of the CGAL library are solved, it would be great to see this algorithm pop up in all the open-source geospatial systems.