Saturday, 29 September 2012

The Taarifa Project

I seem to have been drawn into helping a bit with The Taarifa Project for one reason or another: mostly the enthusiasm of Mark Iliffe, but also the chance to help out with an interesting project.

For those who don't know and are too lazy to click the link I thoughtfully provided, here's the elevator pitch:

The Taarifa Platform is an open source web application for information collection, visualization and interactive mapping. It allows people to collect and share their own stories using various mediums such as SMS, Web Forms, Email or Twitter, placing these reports into a workflow where they can be followed up and acted upon, while engaging citizens and communities.

So far it has been deployed in Uganda, and further deployments are planned. The current codebase is a heavily modified fork of Ushahidi, but the application has outgrown that now. The new project is a reboot, rewritten as a Django application.

Which suits me fine. I've done a lot of work with Django, including content management systems, applications for disease modelling and monitoring (that's my day job), and a social site that built a linked system of interesting technological news items.

The Django version of Taarifa, djangorifa, is open source and hosted on GitHub. Getting the source and the requirements took about ten minutes. I first tried to get it running with Spatialite, but currently that doesn't work because Spatialite has some limitations that mean it can't work out distances given lat-long coordinates. Seems a bit odd, since any schoolkid can do it with a calculator with sin and cos buttons. But never mind.
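
For the record, here is roughly what the schoolkid-with-a-calculator version looks like. This is only a sketch using the haversine formula on a spherical Earth; the function name and the 6371 km mean radius are my own assumptions, and it says nothing about how PostGIS or Spatialite compute distances internally.

```python
import math

# Assumption: treat the Earth as a sphere with its mean radius.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Distance in km between two lat-long points given in degrees (haversine)."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Calling great_circle_km(0, 0, 0, 180) gives half the circumference of the assumed sphere, which is a handy sanity check. A sphere is only an approximation, so expect small differences from an ellipsoidal calculation.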

So I stuck PostGIS on my virtual box and away it went. Using the sample OpenStreetMap dump file supplied I had a demo site working, and I could start pretend-reporting things.

There's clearly a lot to do. I think at the top level there's still plenty of functionality to re-think. I believe the PHP version had a complicated workflow for dealing with reports, so maybe that needs doing. There's also a lot of basic web site functionality to add. I thought of a few things:

  1. Translation. For something intended for developing countries it has to be multilingual. Django supports this out of the box, and translations are supported in templates, python code, and even in URLs themselves, so a site could have URLs in several languages that go to the same effective page, and that page would be translated into that language. For example, /voiture/12 would show the page for car number 12 in French, and /car/12 would show it in English, and /auto/12 would show it in German. Not sure how you handle pages where the same word is used in two languages...
  2. Search. Vital for this system. I've used Haystack integrated into Django-CMS talking to a Whoosh backend. Pretty heavy, but it works well. I had a quick look for new Django search projects and gave django-watson a test-drive. Worked straight out of the box. It uses Postgres' search facilities, so indexing is all handled there. Might be my new go-to Django search system.
  3. REST API. Data in is one thing, data out is another. For a community-centric system such as this, it would be great to let people get the data and do that web 2.0 mash-up thing that you may remember from a few years ago.
  4. Spatial Analytics. Take the report data and look for hotspots, time trends and so on. Present it all as nice graphs and maps. Make that available via a WMS server for mash-up purposes.
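
As a taste of the hotspot idea in item 4, here is a minimal sketch that bins report coordinates into square grid cells and returns the busiest ones. It is plain Python with no connection to the djangorifa code; the function name, the cell_size parameter and the sample points are all hypothetical, and a real analysis would want something better than raw counts.

```python
from collections import Counter


def grid_hotspots(points, cell_size, top=3):
    """Count reports per square grid cell and return the busiest cells.

    points: iterable of (x, y) pairs in any consistent units.
    cell_size: edge length of a grid cell in the same units (hypothetical).
    Returns a list of ((column, row), count), busiest first.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return counts.most_common(top)


# Made-up report locations, purely for illustration
reports = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.5, 0.2)]
busiest = grid_hotspots(reports, cell_size=1.0, top=1)
```

With these made-up points, three of the four reports land in cell (0, 0), so that is the cell returned.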

Check the project out if it looks interesting to you! http://taarifa.org/

Tuesday, 11 September 2012

Fixing Polygons with pprepair

At the OSGIS UK conference last week the prize for best presentation went to Ken Arroyo Ohori of TU Delft and his talk on "Automatically repairing polygons and planar partitions". The presentation, jointly credited to Hugo Ledoux and Martijn Meijers, detailed a method for cleaning up the kind of dodgy polygons we've all probably seen in our geo-lifetimes.

I've noticed that computational geometry theorists seem averse to publishing code. It's as if they don't want to sully the purity of their theory with dirty code, dealing with edge cases and arithmetic precision and all that. Not these guys. The code is on GitHub, and I was eager to try it out. There was a lot of buzz about integrating this with PostGIS or QGIS or R; the stumbling block seemed to be the requirement for the CGAL library.

Anyway, back at Lancaster, I checked out the code. Easy as:

git clone https://github.com/tudelft-gist/pprepair.git

and then 

make -f Makefile.linux

There were a couple of problems which I submitted as issues, but I managed to fix them before getting any response. You can find them on the GitHub issue tracker.

Then I wanted a sample dataset to test it on. From the workshop I gave at the UseR! conference I remembered some awful polygons outlining local police neighbourhood regions. I'd used them as an example of bad data. Some crime locations had fallen into cracks between the neighbourhood polygons, and some were in locations covered by more than one polygon. Maybe you could escape justice by claiming you were outside a police jurisdiction this way?
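
If you want to see that problem for yourself, count how many polygons contain each point: zero means the point fell into a crack, and two or more means it sits in an overlap. The sketch below does this with a plain ray-casting test. It is not what I ran at the workshop; the function names and toy squares are made up, and it only handles simple exterior rings (no holes, no special treatment of points exactly on a boundary).

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def coverage_count(x, y, polygons):
    """Number of polygons (exterior rings only) that contain the point."""
    return sum(point_in_ring(x, y, ring) for ring in polygons)


# Toy neighbourhoods: A and B overlap, and there is a gap between B and C
A = [(0, 0), (2, 0), (2, 2), (0, 2)]
B = [(1, 0), (3, 0), (3, 2), (1, 2)]
C = [(4, 0), (5, 0), (5, 2), (4, 2)]
neighbourhoods = [A, B, C]
```

Here coverage_count(1.5, 1, neighbourhoods) is 2 (an overlap) and coverage_count(3.5, 1, neighbourhoods) is 0 (a crack), which is exactly the kind of mess a clean planar partition should not have.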

Here's what a small region looks like in QGIS:
Notice the white spaces between some polygons, and also the overlaps made visible by setting transparency.

The pprepair process is currently a command-line tool, but its options are simple: an input shapefile, an output shapefile and some repair parameters. To start, all you need is "-fix".

pprepair -i neighbourhoods.shp -o fixed.shp -fix

Loading this into QGIS looks much nicer:
Everything is now clean: there are no overlaps and no holes (note the colours are generated by my semi-random topological colouring system, so they are different). You now have a consistent topology that won't cause odd errors when using other analysis techniques.
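
If you have a folder full of shapefiles to clean, the command above is easy to wrap in a few lines of Python. This is just a sketch: it assumes pprepair is on your PATH, it uses only the -i, -o and -fix options shown above, and the function names and the "_fixed" suffix are my own inventions.

```python
import subprocess
from pathlib import Path


def pprepair_command(src, dst):
    """Build the argument list for the basic repair command shown above."""
    return ["pprepair", "-i", str(src), "-o", str(dst), "-fix"]


def repair_all(folder, suffix="_fixed"):
    """Run pprepair on every .shp file in folder (assumes pprepair is on PATH)."""
    for src in sorted(Path(folder).glob("*.shp")):
        if src.stem.endswith(suffix):
            continue  # skip outputs left over from an earlier run
        dst = src.with_name(src.stem + suffix + ".shp")
        # check=True raises CalledProcessError if pprepair exits with an error
        subprocess.run(pprepair_command(src, dst), check=True)
```

Calling repair_all("data") would write data/neighbourhoods_fixed.shp next to data/neighbourhoods.shp, and so on for each shapefile in the folder.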

Ken's presentation claimed big wins on speed and memory usage over similar algorithms in both open-source and proprietary GIS packages. Once the licensing requirements of the CGAL library are solved, it would be great to see this algorithm pop up in all the open-source geospatial systems.