Tuesday 1 December 2009

Converting an mdb to spatialite

There's loads of data on the European Environment Agency's web site. And I'm looking for data for my Open Source Geospatial course in January. That's next month now. A lot of the data is in shapefile or geoTiff form, which you can read right into Qgis, R, and with a bit of fiddling, OpenLayers. However there are some databases on there in .mdb format. That's Microsoft Access format. Oh great.

Luckily there's a bunch of command-line tools for extracting data from mdb files. Then I can import them into spatialite and add the geographic data to the tables.

Here's what I did. First get the mdb file from the EPER data set downloads. I'm not sure why the March 2008 version is considered later than the August version, but never mind. We unzip and now have EPER_dataset_27-03-2008.mdb to play with. I'm using the bash shell here for my unix commands.

Let's just set a variable to the filename because it's such a fiddle to type:

 mdb=EPER_dataset_27-03-2008.mdb

Now fire up spatialite-gui and create a new database called eper.sqlite. It should have some tables in it, namely: geom_cols_ref_sys, geometry_columns, and spatial_ref_sys . Quit spatialite-gui.

The next job is to generate the tables in the spatialite file. You can dump the schema from the mdb file and read it into spatialite easy enough, and I found that the sybase version of the schema was accepted better by spatialite than the default access option, or the oracle version:

mdb-schema $mdb sybase | spatialite eper.sqlite

Ignore the errors, they occur because the schema script tries to drop all tables before creating them.

Next up, read all the table data from the mdb and load into sqlite. I use the -I option of mdb-export to dump SQL INSERT statements, and the -R option to add the semicolons that spatialite needs:

for t in `mdb-tables $mdb` ; do
 echo $t
 mdb-export -I -R ";\n" $mdb $t |spatialite eper.sqlite
done


All my table names and column names are single words, so I don't need any quotes or the -S option to sanitize names. That's not always the case.

Now we have to spatially enable the Facility table. Fire up the gui again on the eper.sqlite file. You should see the new tables.

The Facility table has Longitude and Latitude columns which look like WGS84 coords, and indeed there is a Geographic Coordinate System column. Let's just check:

SELECT * FROM "Facility" where GeographicCoordinateSystem <> 'WGS84'

Oh dear. 28 of them appear to be in ETRS89. We'll ignore that for now, but it's always good to check.

The table needs a geometry column, so we add it thus:

SELECT AddGeometryColumn('Facility', 'the_geom', 4326, 'POINT', 2);

And now we fill that column by making points from the Lat/Long columns:

UPDATE Facility SET the_geom = MakePoint(Longitude,Latitude,4326)

From that point we can load the database in Qgis 1.4.x using the spatialite layer and plot the points, query them etc etc.

The database is a complex beast if you wish to relate the other emissions data to the facility points. I've constructed what I think is an approximate picture of the relations between the tables, but I may be wrong!




 The PDF/DBDesigner XML of this is available from me if you ask nicely! You could probably use this now to do stuff like creating a view in spatialite that mapped all emissions of a certain substance in various countries, or summarise emissions counts in countries and so on. There's a Flash Interface to the EPER data, but it could easily be done in OpenLayers... Anyone?

Anyway, that's how to get some mdb database into spatialite and plot it in Qgis!

5 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. All my comments does not apply since I am using Windows XP to follow the instructions. Need mdbtools which is not available for Windows. Noli

    ReplyDelete
  4. I know a lot of other types of files. But one of them,such as access files I din't know what to do next. Yesterday something happened with its. And I used the next software - Recovery Access. The utiltiy solved my issue for seconds and for free as far as I remembered.

    ReplyDelete
  5. Hi
    Great post. I was actually doing the exact same thing as you with the EPRTR dataset, as I wanted to get some pollutant data for educational uses. I'll try out your workflow for converting .mdb to .sqlite. My previous solution was to work one table at a time, exporting it to csv and then importing in sqlite (quite tedious).

    Thank you

    ReplyDelete