Tuesday, 11 September 2012

Fixing Polygons with pprepair

At the OSGIS UK conference last week the prize for best presentation went to Ken Arroyo Ohori of TU Delft for his talk on "Automatically repairing polygons and planar partitions". The presentation, jointly credited to Hugo Ledoux and Martijn Meijers, detailed a method for cleaning up the kind of dodgy polygons we've all probably seen in our geo-lifetimes.

I've noticed that computational geometry theorists seem averse to publishing code - it's as if they don't want to sully the purity of their theory with dirty code that deals with edge cases and arithmetic precision and all that. Not these guys. The code is on github, and I was eager to try it out. There was a lot of buzz about integrating this with PostGIS or QGIS or R; the stumbling block seems to be the requirement for the CGAL library.

Anyway, back at Lancaster, I checked out the code. Easy as:

git clone https://github.com/tudelft-gist/pprepair.git

and then 

make -f Makefile.linux

There were a couple of problems which I submitted as issues, but I managed to fix them before any response. Check these on the github issue tracker.

Then I wanted a sample dataset to test it on. From the workshop I gave at the UseR! conference I remembered some awful polygons outlining local police neighbourhood regions. I'd used them as an example of bad data. Some crime locations had fallen into cracks between the neighbourhood polygons, and some were in locations covered by more than one polygon. Maybe you could escape justice by claiming you were outside a police jurisdiction this way?

Here's what a small region looks like in QGIS:
Notice the white spaces between some polygons, and also the overlapping made visible by setting transparency.

The pprepair process is currently a command-line tool, but its options are simple: an input shapefile, an output shapefile and some repair parameters. To start, all you need is "-fix".

pprepair -i neighbourhoods.shp -o fixed.shp -fix
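
Before running the repair it can help to confirm that the input shapefile is complete, since the shapefile format expects .shx and .dbf companions next to the .shp. This is a minimal sketch for a POSIX shell. The function name check_shapefile is made up for illustration and is not part of pprepair.

```shell
# check_shapefile is a hypothetical helper, not part of pprepair.
# It succeeds only if the .shp file and its .shx and .dbf companions all exist.
check_shapefile() {
    base="${1%.shp}"
    for ext in shp shx dbf; do
        if [ ! -f "$base.$ext" ]; then
            echo "missing: $base.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Usage: only attempt the repair when the input looks complete.
# check_shapefile neighbourhoods.shp && pprepair -i neighbourhoods.shp -o fixed.shp -fix
```

The check only looks at file names. It says nothing about whether the geometry inside is valid, which is the part pprepair deals with.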

Loading fixed.shp into QGIS looks much nicer:
Everything is now clean: there are no overlaps and no holes (note the colours are generated by my semi-random topological colouring system, so they are different). You now have a consistent topology that won't cause odd errors when using other analysis techniques.

Ken's presentation claimed big wins on speed and memory usage over similar algorithms in both open-source and proprietary GIS packages. Once the licensing requirements of the CGAL library are solved, it would be great to see this algorithm pop up in all the open-source geospatial systems.


  1. Hi, I'm on Ubuntu and getting /usr/bin/ld: cannot find -lgdal1.6.0

    Does it compile only with gdal 1.6?

  2. Lucky you,
    I'm currently blocked on:
    make -f Makefile.linux

  3. > toirao : just edit "Makefile.linux" and set -lgdal1.6.0 to -lgdal

    it should now pass

    I ran into a problem after that:
    CGAL::Default>, false>)]+0x448): undefined reference to `CGAL::precondition_fail(char const*, char const*, int, char const*)'
    collect2: ld returned 1 exit status
    make: *** [pprepair] Error 1

  4. Guys, suggest you take any problems to the github site and report an issue. Ken has now acknowledged and fixed the two I reported. Since it is just using 'make' there will be problems with dependencies not being checked which is probably why gdal1.6 is hard coded, and Herizo's last problem looks like missing CGAL headers...
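
The Makefile edit suggested in comment 3 can be sketched as a one-line sed substitution. The function name fix_gdal_flag is hypothetical. The two flag spellings are taken from the comments above, and the file name defaults to the Makefile.linux used in the build step. Which GDAL library name your system actually provides is an assumption you would need to check.

```shell
# fix_gdal_flag is a hypothetical helper name; the flags come from the comments above.
# It rewrites -lgdal1.6.0 to -lgdal in the given makefile and keeps a .bak copy.
fix_gdal_flag() {
    makefile="${1:-Makefile.linux}"
    sed -i.bak 's/-lgdal1\.6\.0/-lgdal/g' "$makefile"
}

# Usage, from the pprepair source directory:
# fix_gdal_flag Makefile.linux && make -f Makefile.linux
```

Keeping the .bak copy makes it easy to diff or restore the original if the generic -lgdal name does not resolve on your machine.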