<h2>
Get Off My Server! Geocoding Attack Clients (2013-12-19)</h2>
So this year I had the joy of helping manage a WordPress (WP) instance for the FOSS4G conference in Nottingham. That introduced me to the joy that is the paranoia of the WP manager. The justified paranoia.<br />
<br />
WP managers lose sleep over exploits. Not sleeping is the only way they can be sure of never waking up to discover that their site has been cracked, and is now serving up malware, scraping users' credentials, or acting as part of a vast Bitcoin-mining botnet. Patch everything, often.<br />
<br />
There are also a lot of security plugins for WordPress, but I figured we ought to have something at a lower level, and my favourite first-line tool is fail2ban. You set up pattern-matching expressions, and when log entries match those patterns, fail2ban adds rules to the iptables ruleset to block the offending host.<br />
<br />
After watching the log files and seeing the server slow down as WP tried to process hundreds of invalid requests, I worked out a rule that seemed to match most of them. My suspicion was that a lot of the WP exploit attempts used a kit, and that kit had a fairly clear signature. So along with the other handy rules in my fail2ban config, I added my own.<br />
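For readers who haven't used fail2ban, a rule has two halves: a filter with a <code>failregex</code>, and a jail that points that filter at a log file. The sketch below shows the general shape only - the regexes, filenames, and thresholds are illustrative, not the actual signature I matched:<br />

```ini
# /etc/fail2ban/filter.d/wp-probe.conf -- illustrative, not the real filter
[Definition]
# <HOST> is fail2ban's placeholder for the client IP address
failregex = ^<HOST> .* "POST /wp-login\.php
            ^<HOST> .* "POST /xmlrpc\.php
ignoreregex =

# and a matching jail, e.g. in jail.local:
[wp-probe]
enabled  = true
filter   = wp-probe
logpath  = /var/log/apache2/access.log
maxretry = 5
bantime  = 3600
```

With something like this in place, a host tripping the pattern five times earns itself an iptables ban for an hour.<br />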
<br />
One of the outputs of fail2ban's logs is the IP address of each banned host. So I thought it might be nice to geocode them via the GeoIP database and see where they have all been coming from. "China" and "Russia" are the answers that most people seem to give when you ask them to speculate on the source of these attacks. Are they right?<br />
<br />
So first I took the log files that I had and extracted the IP address and timestamp of each ban. Then, using the Python GeoIP module, I translated all the IP addresses to lat-long and country code. That gave me about 1200 locations from one month of retained log files.<br />
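The extraction step can be sketched like this. It's a minimal, hedged example: the log line format is the "Ban" line my fail2ban produced (check against your own version's output), and the GeoIP lookup, shown only as a comment, assumes the pygeoip module and MaxMind's GeoLiteCity database:<br />

```python
import re
from collections import Counter

# Assumed fail2ban log format; the jail name and layout may differ on
# your system, so adjust the pattern to match your own log output.
BAN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?\bBan (?P<ip>\d+\.\d+\.\d+\.\d+)"
)

def extract_bans(lines):
    """Yield (timestamp, ip) for every 'Ban' entry in a fail2ban log."""
    for line in lines:
        m = BAN_RE.match(line)
        if m:
            yield m.group("ts"), m.group("ip")

log = [
    "2013-11-02 10:15:01,123 fail2ban.actions: WARNING [wordpress] Ban 203.0.113.7",
    "2013-11-02 10:15:05,456 fail2ban.actions: WARNING [wordpress] Unban 203.0.113.7",
    "2013-11-02 11:42:30,789 fail2ban.actions: WARNING [wordpress] Ban 198.51.100.23",
]
bans = list(extract_bans(log))  # 'Unban' lines don't match \bBan

# Each banned IP then goes through GeoIP, e.g. with the pygeoip module
# and the MaxMind GeoLiteCity database (not run here):
#   gi = pygeoip.GeoIP("GeoLiteCity.dat")
#   rec = gi.record_by_addr(ip)  # rec["latitude"], rec["longitude"], rec["country_code"]

per_ip = Counter(ip for _, ip in bans)
```

Counting the resulting country codes with the same `Counter` idiom gives the tables below.<br />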
<br />
Here's a table of the number of bans for the top few countries.<br />
<br />
<table>
<tbody>
<tr> <th>Country</th><th>Ban Count</th> </tr>
<tr> <td align="right">USA</td> <td align="right">573</td> </tr>
<tr> <td align="right">Germany</td> <td align="right">76</td> </tr>
<tr> <td align="right">France</td> <td align="right">50</td> </tr>
<tr> <td align="right">Japan</td> <td align="right">49</td> </tr>
<tr> <td align="right">Poland</td> <td align="right">39</td> </tr>
<tr> <td align="right">Netherlands</td> <td align="right">37</td> </tr>
<tr> <td align="right">Turkey</td> <td align="right">29</td> </tr>
<tr> <td align="right">Spain</td> <td align="right">27</td> </tr>
<tr> <td align="right">Indonesia</td> <td align="right">27</td> </tr>
<tr> <td align="right">Vietnam</td> <td align="right">24</td> </tr>
<tr> <td align="right">Great Britain</td> <td align="right">21</td> </tr>
<tr> <td align="right">Myanmar</td> <td align="right">21</td> </tr>
<tr> <td align="right">India</td> <td align="right">20</td> </tr>
<tr> <td align="right">Russia</td> <td align="right">19</td> </tr>
<tr> <td align="right">Austria</td> <td align="right">15</td> </tr>
</tbody></table>
<br />
So the USA is clearly the big trouble here, with China way down the list. Of course, that's not to say all these US PCs aren't being controlled by Chinese or Russian botnets.<br />
<br />
Now that we have lat-longs, we can save it all as a shapefile, load it into QGIS, and plot it over an OpenStreetMap background.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWC1j0ocz8J77kRU4aCD6MxuFl3E91sc9Anim6rFgTV181pMffXv_b47v9BQuHW4bJw49r-zTWbGYf3Sp6unnaU3yMBG6JtA4ULCfOzGrC5x1L-aRKF6XjTjhFuDqvHsi21Kt3XDkVUz8/s1600/global.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWC1j0ocz8J77kRU4aCD6MxuFl3E91sc9Anim6rFgTV181pMffXv_b47v9BQuHW4bJw49r-zTWbGYf3Sp6unnaU3yMBG6JtA4ULCfOzGrC5x1L-aRKF6XjTjhFuDqvHsi21Kt3XDkVUz8/s320/global.png" width="320" /></a></div>
Note that this map contains overlapping points and so isn't a perfect representation of density. Also, the spatial precision of the MaxMind GeoIP database varies wildly.<br />
<br />
First I'd like to thank Australia and New Zealand for not bothering to try and hack our server. Much appreciated. Let's look east first:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaUERuxpywGRfsldK-BpmfcH6AXsLCGULnfA-j8-QMGoGeWzap7K8LC9EzmXVmfhaQt2oVel46KOp7wT9zwFg2NBBOJXaA6cPc7zCOG-xsbwuHSIxl8BPRn4yesWL6eOUM45BYqvBteVY/s1600/east.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaUERuxpywGRfsldK-BpmfcH6AXsLCGULnfA-j8-QMGoGeWzap7K8LC9EzmXVmfhaQt2oVel46KOp7wT9zwFg2NBBOJXaA6cPc7zCOG-xsbwuHSIxl8BPRn4yesWL6eOUM45BYqvBteVY/s320/east.png" width="320" /></a></div>
<br />
Quite a good representation here, including Iran and most of south-east Asia. I don't know why Vietnam scored so highly in the table. Let's look at Europe:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaPtORLCFhp0GVLZVYwNpAqBlpNM3vkNM2zUYZuXzDV_HSGKm4E3y7-gBFPrjUHjb5RqisPlIJgEK-Jc3K3FgZ0c_wPZFKM4bAp9EjGwFZ3oh_ipLY5jY1aiamPdYSNZEPZjIyYOK-l3Q/s1600/europe.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaPtORLCFhp0GVLZVYwNpAqBlpNM3vkNM2zUYZuXzDV_HSGKm4E3y7-gBFPrjUHjb5RqisPlIJgEK-Jc3K3FgZ0c_wPZFKM4bAp9EjGwFZ3oh_ipLY5jY1aiamPdYSNZEPZjIyYOK-l3Q/s320/europe.png" width="320" /></a></div>
Europe has a good spread of banned IP addresses, but Portugal, Greece, and Ireland have nothing. Maybe everyone unplugs their machines in the countries hit hardest by the financial crisis? Off to the biggie now; let's check out the USA:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWreAdwypPZYu0Ydq26tTfe0IgT50XjBXEshVnscauHxypAGVf2uv-HTgT8AbvIEdbrhlgrPHqUzCfKNGfLGynwh1j2_8ykfDbbVIl0tp5PRJQf3CaLqLkcEmKOKt4LEIuLnV3Yi03Z4k/s1600/northamerica.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWreAdwypPZYu0Ydq26tTfe0IgT50XjBXEshVnscauHxypAGVf2uv-HTgT8AbvIEdbrhlgrPHqUzCfKNGfLGynwh1j2_8ykfDbbVIl0tp5PRJQf3CaLqLkcEmKOKt4LEIuLnV3Yi03Z4k/s320/northamerica.png" width="320" /></a></div>
Mostly an east coast thing here. Examining the west coast in more detail shows a lot more activity from LA than from San Francisco or points further north, up into Vancouver - thanks, hipsters! What's going on on the east coast, though? As they say on CSI, "enhance that area"...<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVbtkYHGAQgnpOyljQDKU1yJ3Q1DhY42ev-WvFuG6ZnMQme0N5Lbtswl6Sy6why5O3dElS7JtZ3rTt_lPE6Pf2pzwWq243HRUdO9kezcMMFaR2YyY1Q5-jre_ADdLZv-zg6P8_eYU4kTQ/s1600/ny.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVbtkYHGAQgnpOyljQDKU1yJ3Q1DhY42ev-WvFuG6ZnMQme0N5Lbtswl6Sy6why5O3dElS7JtZ3rTt_lPE6Pf2pzwWq243HRUdO9kezcMMFaR2YyY1Q5-jre_ADdLZv-zg6P8_eYU4kTQ/s320/ny.png" width="320" /></a></div>
Time to switch to Stamen's black-and-white Toner maps here, just because I can, and because it's a bit less distracting. Quite a few attackers around the state, but let's go closer. Take me into Lower Manhattan and enhance - with Bing Aerial Maps...<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDce9MZB-J9ZYFuvRbvraU5TSimt7iv5a83D7A8P4aNxkmWchAzTNQy8ffM4G8oFX08Br5FV4wIQsnB-2I9D1cZC2MRn4HkTfd5HVGnrlPNMh2A4GV0qL32DcKudKMcuVCkklZjNFvHGA/s1600/nyzoomsat.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDce9MZB-J9ZYFuvRbvraU5TSimt7iv5a83D7A8P4aNxkmWchAzTNQy8ffM4G8oFX08Br5FV4wIQsnB-2I9D1cZC2MRn4HkTfd5HVGnrlPNMh2A4GV0qL32DcKudKMcuVCkklZjNFvHGA/s320/nyzoomsat.png" width="320" /></a></div>
At this point the CSI team head off to the NYU tennis courts and find a guy with a laptop sitting in the middle of that patch of grass, trying to hack into the FOSS4G server. Of course, we don't really have data at anything like that precision, so it's quite meaningless. Only the NSA (and Mac Taylor and friends) can track you down that closely!<br />
<br />
I don't know if there's any value in doing more analysis of this particular data set, but it at least overturns the prejudice of people who say all the cyber attacks come from China or Russia. I've not used the timestamps of the data here; they would make it possible to create an animation of attack points. If you'd like a copy of the data, get in touch.<br />
<br />
<h2>
Update!</h2>
<div>
I found another monthly tranche of fail2ban logs. This one looks very different, and now we can point a finger at the Russians. Here's the top table:</div>
<table>
<tbody>
<tr> <th>Country </th> <th>Ban Count </th> </tr>
<tr> <td align="right">USA </td> <td align="right">872 </td> </tr>
<tr> <td align="right">Russia </td> <td align="right">795</td> </tr>
<tr> <td align="right">Japan </td> <td align="right">375</td> </tr>
<tr> <td align="right">Peru </td> <td align="right">314</td> </tr>
<tr> <td align="right">Thailand </td> <td align="right">308</td> </tr>
<tr> <td align="right">Mexico </td> <td align="right">265</td> </tr>
<tr> <td align="right">Philippines </td> <td align="right">206</td> </tr>
<tr> <td align="right">Ukraine </td> <td align="right">182</td> </tr>
<tr> <td align="right">Turkey </td> <td align="right">179</td> </tr>
<tr> <td align="right">Ecuador </td> <td align="right">154</td> </tr>
<tr> <td align="right">Kazakhstan </td> <td align="right">151</td> </tr>
<tr> <td align="right">Iran </td> <td align="right">140</td> </tr>
<tr> <td align="right">India </td> <td align="right">138</td> </tr>
<tr> <td align="right">Vietnam </td> <td align="right">130</td> </tr>
<tr> <td align="right">Indonesia </td> <td align="right">118</td> </tr>
</tbody></table>
- which is a bit different! How did Peru get up there?<div>
<br /></div>
<div>
I had a quick play with some of QGIS's plotting functions, and discovered that if I used an SVG symbol with a few dots on it, used the data-driven symbology to rotate it randomly, and set the opacity to something fairly small, I could get a much better impression of the density, including where overlapping points create hotspots. There's probably a density-estimation plugin for QGIS somewhere, but until I find it, or until I can load the data into R and do a proper kernel density estimation, this will have to do.</div>
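In the same spirit, here's a crude stand-in for a proper kernel density estimate: bin the lat-long points into a regular grid and count points per cell. This is just an illustration in plain Python, not what QGIS or R would do, and the bounding-box-based binning is an assumption of this sketch:<br />

```python
def grid_density(points, nx=10, ny=10):
    """Count points per cell of an nx-by-ny grid over the points' bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # clamp the top edge into the last cell
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        counts[j][i] += 1
    return counts

# three points clustered in one corner, one in the opposite corner
cells = grid_density([(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (9.9, 9.9)], nx=2, ny=2)
```

Styling cells by count gives a blocky but honest density picture; a real KDE just replaces the hard cell boundaries with a smooth kernel.<br />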
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgj1EamnBXxE6X1kDLJfY4ra3TgbvlLrQhiYubOCQRplDT3TwN_y1FXgZmSkzvqpzR-p7D5CM5BW-Qeppn07fMFajPSL5aUlBy_IkNQUMMa9uZfVwRNK4h3PMht9kgS-G0wfC8vrrA-9yc/s1600/earlydataspatter.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgj1EamnBXxE6X1kDLJfY4ra3TgbvlLrQhiYubOCQRplDT3TwN_y1FXgZmSkzvqpzR-p7D5CM5BW-Qeppn07fMFajPSL5aUlBy_IkNQUMMa9uZfVwRNK4h3PMht9kgS-G0wfC8vrrA-9yc/s320/earlydataspatter.png" width="320" /></a></div>
<div>
<br /></div>
<h2>
Refugee Camp Spatial Analysis (2013-10-07)</h2>
The Zaatari refugee camp was established in 2012 to host Syrian refugees fleeing the fighting in the Syrian civil war [<a href="http://en.wikipedia.org/wiki/Zaatari_refugee_camp" target="_blank">Wikipedia</a>]. After a tweet and a blog post from <a href="http://www.statisticsviews.com/details/feature/5322231/Environmental-Geospatial-Statistics-of-Zaatari-Refugee-Camp.html" target="_blank">Lillian Pierson</a>, I thought I'd see what data was around.<br />
<br />
The UNOSAT web site has <a href="http://www.unitar.org/unosat/node/44/1832" target="_blank">shelter locations</a> in the camp as a shapefile, but I also had a look at OpenStreetMap to see what was there. As well as shelter locations, it has the locations of other facilities such as toilets, kitchens, and mosques.<br />
<br />
So for starters I thought a simple kernel density estimate of shelters might be something to do. The plan was to get the OSM data using the R package <b>osmar</b>, then create a 2d smoothing, then write that as a raster and map it in QGIS. Here's the R code:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"><b># need these packages:</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>require(osmar)</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>require(KernSmooth)</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>require(raster)</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b># define the source for OSM data</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>api = osmsource_api()</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b># define the area and get the data</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>box=corner_bbox(36.310,32.281,36.338,32.303)</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>camp=get_osm(box,source=api)</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b><br /></b></span>
<span style="font-family: Courier New, Courier, monospace;"><b># subset the shelters - first find shelter ids:</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>shelters=find(camp,node(tags(v == "shelter")))</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b># then subset:</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>shelters=subset(camp,node_ids=shelters)</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b><br /></b></span>
<span style="font-family: Courier New, Courier, monospace;"><b># convert to spatial object for kernel smoothing</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>sh=as_sp(shelters)</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b># set bandwidth by trial and error</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>k1 = bkde2D(coordinates(sh$points),bandwidth=0.0002,gridsize=c(200,200))</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b># convert to raster</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>rsh = raster(list(x=k1$x1,y=k1$x2,z=k1$fhat))</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b># set CRS to lat-long</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>projection(rsh)=CRS("+init=epsg:4326")</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b># write a GeoTIFF file</b></span><br />
<span style="font-family: Courier New, Courier, monospace;"><b>writeRaster(rsh,"shelters.0002.tiff","GTiff")</b></span><br />
<div>
<br /></div>
<div>
With that done, you can load the tiff into QGIS and plot it over OSM basemap data. Stick that in a Map Composition and get this:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9INjRNLb_Pd7NBZaJvVMpYTgMjHdTE606v8Rvwh4Sj72G7qQIMv39svMijPacNOGhTbA3VqIiest7fZzZ6J_w9BIN-VfryXXHKXFk7R3p4v0QeIFEpRGzIqi9ch55aTo4glObqXhPap4/s1600/density.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="219" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9INjRNLb_Pd7NBZaJvVMpYTgMjHdTE606v8Rvwh4Sj72G7qQIMv39svMijPacNOGhTbA3VqIiest7fZzZ6J_w9BIN-VfryXXHKXFk7R3p4v0QeIFEpRGzIqi9ch55aTo4glObqXhPap4/s320/density.jpg" width="320" /></a></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
Obviously there are some problems with this - I should probably convert everything to a proper Cartesian coordinate system so the units can be in people per square metre rather than people per square degree (which varies with latitude...), but this was a quick analysis on a Monday morning. The bounding box is also a bit small on the left and the top, the bandwidth was chosen to make the map look good, and so on.</div>
<div>
<br /></div>
<div>
I'm not sure this analysis is of any practical use in itself. I don't know how much GIS analysis the camp management are using, but it illustrates what can be done with open data and free and open-source software. Anyone can do this.</div>
<div>
<br /></div>
<div>
Next steps might be to see how shelters relate to facilities, which is a first step to planning new facilities and is a classic GIS problem. With Zaatari becoming one of the largest settlements in Jordan, there is clearly a need for expansion planning in the camp.</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<h2>
My Contribution to FOSS4G (2013-09-23)</h2>
The purpose of this post is to log all the stuff I did for FOSS4G 2013 in Nottingham. I'm excluding stuff we all did, like writing the proposal, choosing presentations, proofreading and general help. We all did lots of that. I tried to keep my tasks to those that required more communication with computers than with human beings, and left that messy business to others on the committee.<br />
It's in roughly chronological order, but plenty of these tasks overlap.<br />
<br />
<h3>
Setting up the WP skin</h3>
After Jo and Barry K had set up the Amazon server and installed WP and MySQL, we all messed around styling it for a bit until I worked out the basics of WP skinning, and built the current skin. That involved some CSS and understanding the WP method for getting page templates. I found a simple WP plugin for handling the sponsors which we thought might be useful for more of the conference data systems. I did the image carousel for the home page. Some PHP was written. Sorry.<br />
<br />
<h3>
Sysadmin duties</h3>
<div>
Jo set up email backup of the MySQL DB. I set up incremental network backups of the database and sections of the filestore to a large external USB device on my work desktop.</div>
<div>
I requisitioned a spare server from work and used it to set up a development environment with WP and MySQL on it. Changes tested there were pushed to the live server.<br />
<br /></div>
<h3>
OJS selection setup and admin</h3>
<div>
Before deciding on OJS for the Academic Track submissions, I looked at other possibilities, including hosted solutions such as EasyChair. Eventually I installed, administered, and lightly themed an OJS instance alongside WP on the Amazon box, and our academic chairs managed the process from there.<br />
<br /></div>
<br /></div>
<h3>
Graphic design and logo from contest winner</h3>
<div>
Naomi Gale's logo was selected as the winner, and I tweaked it a bit to make some usable SVG and PDF files. I also produced some sample style guides, as well as B+W and inverse colour versions. I also made some A5 flyers (laid out 2 on an A4 page for slicing) early on for publicity.<br />
<br /></div>
<h3>
OSGeo-discuss mailings</h3>
<div>
Somehow I got the job of keeping the OSGeo-discuss list up to date with progress. We felt this necessary after the previous year's FOSS4G fell apart - one of the criticisms was lack of communication between that local team and OSGeo. So every two weeks or so, after our committee meetings, I'd summarise progress and write up a little note for the mailing list. These mailings were linked on the OSGeo wiki site.<br />
<br /></div>
<h3>
Maptember - concept, website, logo, shirt design</h3>
<div>
I can't remember who first noticed all these good things going on in September. I think the first phrase was 'Geotember', which was a bit jarring to my ears. So I coined 'maptember'. I designed a simple web site, animated logo, and hosted it on the same Amazon server. I handled incoming event requests and found some myself to add to the page. I designed the t-shirts (and Rollo sorted the printing).<br />
<br /></div>
<h3>
Workshop booking system</h3>
<div>
We opened registration before we had the workshops arranged, or even chosen. At that point, people could book one or two days of workshops. Given that they might have ended up wanting to attend two half-day workshops on different days, we decided to go for a one- or two-day credit scheme, and allow flexibility.</div>
<div>
So I built the booking system. At this time I was also investigating conference management solutions, including hosted offerings from Eldarion (using the open-source Symposion system). However, most of what they do we had already done (conference registration, paper submission), so instead I took large chunks from the German Python user group's Django site (pyconde) and built the workshop reservation system on top. This was developed on a home Linux box and managed via GitHub - changes were simply pushed to the live server.</div>
<div>
Most of the user handling stuff was already there (logins, passwords etc). I created a new Django app to create a few tables for the workshops in the database. The user front-end for booking carefully made sure the user didn't book more than they'd paid for or book overlapping workshops. In all it registered about 300 people.</div>
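Those two booking rules can be sketched as a plain function - the field names and structures here are illustrative, not the actual Django models:<br />

```python
def can_book(existing, new, credits):
    """existing/new: dicts with 'day', 'start', 'end' (hours) and 'cost' in credits."""
    spent = sum(w["cost"] for w in existing)
    if spent + new["cost"] > credits:
        return False  # would exceed the paid credit limit
    for w in existing:
        # two bookings clash if they're on the same day and their times overlap
        if w["day"] == new["day"] and w["start"] < new["end"] and new["start"] < w["end"]:
            return False
    return True

booked = [{"day": 1, "start": 9, "end": 13, "cost": 1}]
ok_afternoon = can_book(booked, {"day": 1, "start": 14, "end": 18, "cost": 1}, credits=2)
clash = can_book(booked, {"day": 1, "start": 11, "end": 15, "cost": 1}, credits=2)
```

The real front-end did the same checks server-side so nobody could sneak past the form.<br />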
<div>
User data came into the system via a spreadsheet emailed daily by Claire, taken out of the main registration system. I ran this through a Python script that updated the database and then printed a list of new user email addresses, to whom I would then send an announcement. This step could have been automated completely, but I wanted to keep a close check on the process, so I ran it by hand most days.</div>
<div>
Some custom reports were written for the workshop system so that the admin desk staff and workshop presenters could check off attendees and possibly open up new spaces if people didn't show.<br />
<br /></div>
<br /></div>
<h3>
Conference timetable system - data integration from submissions and OJS. </h3>
<div>
The program selection was done via shared google docs spreadsheets on which the committee recorded their votes and the community votes were added. After the process Rollo arranged the selected presentations on a spreadsheet in timetable format with sessions. I created the first provisional timetable on the web site by basically saving that as HTML and tweaking it greatly to make it a bit more usable. However this was not easily maintainable so I set about thinking of a better system.</div>
<div>
Meanwhile Rollo was refining the order of presentations on the timetable. To make this easier for him I developed 'slotdrop', so he could drag-and-drop presentations between slots on the web page, with the changes in slot assignments being reflected in the database. The database now had presentations, sessions, people, locations, tags and so on.</div>
<div>
Parts of the pyconde system already being used for bookings could have handled this, since it had facilities for submissions, rooms, and schedules. However, I considered that we already had a lot of this functionality sorted, and that it would be easier to simply build a few more Django database tables and construct it all in custom views.</div>
<div>
We were now treating the Django database as the master record of presentations and sessions, and all changes were being made against it. The web site's provisional timetable was getting out of sync with it, and people were starting to notice. So I spent a few days developing the database-driven timetable pages, including plenary sessions and other special events. These pages included a python-whoosh search index, tags to highlight certain talks, and hyperlinks between sessions, presentations, authors, buildings, etc. A simple cookie-based favourites system was implemented, with an ICS calendar file download option.</div>
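The ICS download boils down to serialising each favourited talk as an iCalendar VEVENT (RFC 5545). A minimal sketch, with illustrative field names rather than the real Django model:<br />

```python
def to_ics(events):
    """Serialise a list of talk dicts as a minimal iCalendar file."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//FOSS4G 2013//timetable//EN"]
    for ev in events:
        lines += [
            "BEGIN:VEVENT",
            "UID:%s@foss4g2013" % ev["id"],
            "DTSTART:%s" % ev["start"],  # e.g. 20130919T110000
            "DTEND:%s" % ev["end"],
            "SUMMARY:%s" % ev["title"],
            "LOCATION:%s" % ev["room"],
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines)  # RFC 5545 wants CRLF line endings

ics = to_ics([{"id": 42, "start": "20130919T110000", "end": "20130919T113000",
               "title": "Example talk", "room": "EMCC Room 1"}])
```

Serve that with a `text/calendar` content type and most calendar apps will import it.<br />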
<div>
I used Leaflet.js for the first time to produce building location map pages.<br />
<br /></div>
<h3>
Mobile App</h3>
<div>
Although the timetable was usable on mobile devices, it wasn't optimal - for one thing, it needed a live internet connection. I went looking for suitable mobile apps with disconnected operation, although not having an iOS device meant I knew it would be hard to find one for that platform.</div>
<div>
There are a number of companies who will take your conference and produce a stylish mobile conference app for it - for a price (and when the price isn't specified, you know it's a big price). I looked for free, and preferably open, solutions.</div>
<div>
"Giggity" provided that solution for Android . It is an open-source app that reads schedule files and keeps them for off-line use. It can show a timetable, room streams, give reminders and so on. I wrote a Django view to convert our conference programme into the right XML format and publicised the URL for it.</div>
<div>
For iOS, I did find a similar application, but the file format wasn't documented. It was plain text, and I started a reverse-engineering effort, but without a device to test on I didn't want to waste too much time. The app came with a Windows program for creating schedules, but for a programme of our size that wasn't an option. Recently that app's web site has been down, so I'm not sure how well-supported it is.</div>
<div>
I also looked at using HTML5 offline storage for a mobile solution. The Chaos Computer Club (a group of hackers I first encountered in 1985, but that's another story) have an HTML/JS/CSS solution for their conference schedule, which I attempted to convert to use the HTML5 Application Cache. I did have this working, but I was unsure about exactly how the cache worked. And anyway, we planned to have good connectivity at the conference.<br />
<br /></div>
<br /></div>
<h3>
Volunteer system</h3>
<div>
Abi Page took on the job of volunteer wrangler for the conference. I created some Django database tables for recording volunteer activity and some reports so she could see what volunteers were signed up for. This included daily volunteer roster sheets and tables showing where volunteers were needed. Abi entered the data into the system.<br />
<br /></div>
<h3>
Pledge site, concept, implementation, admin</h3>
<div>
During one of our fortnightly conference calls I had the pledge page idea. I pitched it to Steven and he loved it. At first I thought I could implement it using Google Forms, and investigated ways to customise Google Forms pages and collect pledges that way. However, at the time I was also setting up Django for the workshop database, and decided it would be easier done that way. The site is one form with some client- and server-side validation and a simple mathematical CAPTCHA field. New pledges are notified by email, and I can then use the Django admin to accept, reject, or delete each pledge. The page of pledges shows ten random pledges and allows the viewer to 'like' a pledge; a simple cookie is used to stop trivial multiple voting. For the end of the conference I quickly re-purposed a JavaScript 'Star Wars'-style scroller found online to display all the pledges in the run-up to the closing session.<br />
<br /></div>
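The mathematical CAPTCHA amounts to very little code. A sketch of the idea - the real pledge form's field names and wording aren't reproduced here:<br />

```python
import random

def make_challenge(rng=random):
    """Build a small-sum question and its expected answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return "What is %d + %d?" % (a, b), a + b

def check_answer(expected, submitted):
    """Server-side check; tolerate junk input rather than raising."""
    try:
        return int(submitted) == expected
    except (TypeError, ValueError):
        return False

question, expected = make_challenge()
```

In the Django form the expected answer lives server-side (e.g. in the session), so bots scraping the page can't just echo it back.<br />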
<h3>
Map Exhibit</h3>
<div>
I created the web page with the voting system, and the screens for the iPad wall. Ken supplied some static graphics from each map entry and produced the movie for the large plasma screens. I wrote a web page that showed thumbnails of all the entries in random order, using Isotope for a dynamic box layout and fancybox to pop up a larger preview. A link went to the URL for web entries, the PDF for static maps, or YouTube for movie entries. Voting used a similar system to the pledge voting to count the popular vote. I wrote a report to list the current vote count, and checked for obvious ballot-stuffing.<br />
The wall of iPads was driven by a web page that took a range of map entries and displayed each one, with a title and author overlay and a 'Ken Burns'-style transition between them. This was all made easy with various jQuery plugins. With 77 entries we had 7 maps on each of 11 iPads, and a 12th iPad displayed a set of credits and information.<br />
<br /></div>
<h3>
Nerds</h3>
<div>
I suggested we try and hire The Festival of the Spoken Nerd for the Gala Night after seeing some of their performances on TV and on YouTube. I contacted their agent and despite only two of them being available we decided they could still put on a great show. And they did.</div>
<br />
<br />Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0tag:blogger.com,1999:blog-2573156439353370360.post-15412860633729032352013-06-12T17:15:00.001+01:002013-06-24T17:28:03.687+01:00 Formation of a Research Computing Users Group at LancasterAs part of my Fellowship with the Software Sustainability Institute, I want to encourage better communication and collaboration between people using computing in their research. Anyone who spends a chunk of their research time writing programs, manipulating data or producing interesting graphs - as well as those who have an interest in improving their skills in those areas - are welcome to the inaugural meeting of the totally informal Research Computing Users Group. This will bring together researchers across faculties to share best practice and new skills in research computing.<br />
<br />
The initial meeting will be at 1pm on the 5th of July in the A54 Lecture Theatre of the Postgraduate Statistics Centre. As I have no idea of numbers at this stage, please fill in this <a href="http://www.doodle.com/xkpvaxwsq8k8xhdd" target="_blank">doodle poll</a> if you are interested in attending. A light lunch will be supplied, sponsored by the <a href="http://www.software.ac.uk/" target="_blank">Software Sustainability Institute</a>.<br />
<br />
I've also prepared a short <a href="https://docs.google.com/forms/d/1iBrpGK7RkxLYfuKEZ7q9YsvUGE2nVUloSXOW3JfXxkY/viewform" target="_blank">questionnaire</a> on aspects of research computing.<br />
<br />
For the first meeting we'll probably spend 20 minutes in general discussion, and I'll spend 20 minutes with a short presentation on magically turning your data into publications. If anyone would like to do a short presentation on any aspect of their research computing processes, please email me with details.<br />
<br />
Please encourage your fellow researchers and PhD students to attend.Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0tag:blogger.com,1999:blog-2573156439353370360.post-31279606704612031912013-04-04T14:56:00.000+01:002013-04-04T14:56:03.976+01:00Why You Should Not Use WinBUGS (or OpenBUGS)The BUGS project did a lot to bring Gibbs sampling to the masses. But by using a non-mainstream development environment, it has painted itself into a corner.
<br />
<br />
My initial objection to using BUGS was that WinBUGS only ran on Windows PCs. That meant I couldn't run it on my desktop, and I couldn't run it on the university high-end cluster. The only way I could run it on a Linux box would be to set up a Windows Virtual Machine.
<br />
<br />
Attempts to run it using Wine on my desktop fail with odd errors - the current one is "trap #101" plus a traceback list, though I've also seen assorted memory errors when it does start. I can't trust it to be doing anything right.
<br />
<br />
Trust was another reason I was far from keen on using WinBUGS. It was released under a closed-source license. There was no right to modify or redistribute WinBUGS. There was even a fee for a registration code that unlocked some features - although that fee is currently $0 and the unlocking key is freely available.
<br />
<br />
The last patch for WinBUGS was dated August 2007, over five years ago. The most recent update on the WinBUGS development page talks about a beta-test period ending nearly nine years ago. But things have moved on. In 2004, OpenBUGS was born. The code was made available under the GNU GPL, a license that allows modifications and redistributions, and most of all allows - indeed insists on - access to the source code.<br />
<br />
So let's look at the source code. The Download page on the OpenBUGS site even has a Linux Download section with a link to a 'source package'. But this turns out to be misleading - it's actually a binary shared library plus a short C file that links against it and calls its main routine. So where is the source code? It's supposed to be on SourceForge, but although there appear to be code check-ins, I find nothing in the SVN repo: http://openbugs.svn.sourceforge.net/viewvc/openbugs/
<br />
<br />
I'm sure this has all worked before. So what's going on now? In desperation, I emailed one of the developers. He reports they are ironing out a few wrinkles in the source code, and everything will be back to normal within a few weeks.
<br />
<br />
Even when I do have the source code, development is problematic. The code is written in Component Pascal, and can only be read with the BlackBox Component Builder from Oberon Microsystems. The files are not plain text. The Component Builder only runs on Windows. An email on the OpenBUGS mailing list in July 2012 claims that the BlackBox tools have been abandoned by their own developers, which is why there's no true 64-bit OpenBUGS.
<br />
<br />
How the Linux binary is produced is also a mystery. I can't find any clues on the OpenBUGS developer pages about how to build it. Clearly some kind of cross-compilation or embedding of Windows code is needed, but I cannot find a handy guide to doing it.
<br />
<br />
Other open-source software is not this difficult. Most R packages install from source with a one-line build command. Installing Python packages can be as simple as 'pip install packagename'. I can configure, compile, and build a major open-source GIS in two command lines. Why has OpenBUGS become so difficult?
<br />
<br />
<ul>
<li>It used a very niche language and toolset. Pascal - and Component Pascal especially - was a minority interest even back when the BUGS project started, but they got in deep pretty soon. Even though the language is perhaps richer than the C and C++ systems of its time, they've been hit by the problem outlined in <a href="http://prog21.dadgum.com/168.html" target="_blank">"Don't be Distracted By Superior Technology"</a> by James Hague.</li>
<li>It targeted Windows only, providing a low barrier to entry for simple users. However, by not building a cross-platform solution it alienated more technical users running Unix variants, the sort of people who form the majority of developers.</li>
<li>It failed to build a community. By not attracting developers, development was reliant on the paid staff. The current page of "Future Developments" has plenty of things that people could be getting on with, but the esoteric and restricted development environment shuts out many people.</li>
<li>It opened up too late. Now we have true open-source alternatives that can run many of the BUGS models - such as JAGS, stan, PyMC, and several specialised R packages. These packages are also easy to extend with custom distributions and models. Few people are going to put development effort into OpenBUGS now.</li>
</ul>
<div>
So what of the future? If the BlackBox Component Builder is dead, then there is a big brick wall ahead once it stops working with the current version of Windows. BlackBox is open source, but probably requires an elite skill set to get working. And if it goes, not only will you be unable to run your models, you won't even be able to open your model files. That's so important I'll repeat it: you will not be able to read your model files. Imagine not being able to open your old Word files. Oh wait, you probably can't.</div>
<div>
<br /></div>
<div>
There's a paper in Statistics in Medicine (2009) by David Lunn et al that talks about a foundation for BUGS code, but there's no sign of this existing as of now. Plenty of other projects thrive without a foundation: open up the code, make working with it easy, and people will come. I'm not sure this is possible without re-implementing OpenBUGS in a mainstream language such as C++ or Python. Without that, OpenBUGS is a dead end.</div>
<div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com6tag:blogger.com,1999:blog-2573156439353370360.post-12344559512378501482013-02-17T23:24:00.000+00:002013-02-17T23:24:11.396+00:00An NHS Twitter Sentiment MapOne of the project proposals at the NHS Hack Day in Oxford was about doing some kind of sentiment analysis using twitter and NHS hashtags and ids. I had a brief word with the proposer since I'd recently seen something similar done in R, but he was already on another project.<br />
<br />
But this weekend I thought I'd have a go at doing something. The main idea came from Jeffrey Breen's blog <a href="http://www.inside-r.org/howto/mining-twitter-airline-consumer-sentiment" target="_blank">Mining Twitter for Airline Consumer Sentiment</a> - there he writes some simple R functions that look for positive and negative words in tweets to give a score. I used his scoring code pretty much as-is.<br />
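Breen's scoring trick is simple enough to sketch in a few lines - here in Python rather than R, with tiny made-up word lists standing in for the full opinion lexicons his code loads:

```python
import re

# Tiny stand-in lexicons -- Breen's version loads full opinion word
# lists with a few thousand entries each.
POSITIVE = {"good", "great", "excellent", "thanks", "helpful", "caring"}
NEGATIVE = {"bad", "awful", "rude", "slow", "dirty", "painful"}

def score_sentiment(tweet):
    """Score one tweet: (# positive words) - (# negative words)."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
```

One tweet in, one integer out: positives minus negatives, which is exactly the score mapped below.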
<br />
I found a <a href="https://twitter.com/devices4/nhs/members" target="_blank">twitter list of NHS</a> accounts which I quickly pulled out of Firefox. This could probably be done with the twitteR package in R but I found it just as quick to save it from Firebug and parse the HTML with Python and BeautifulSoup. Make sure you've scrolled down to see all the members, then save the list from the HTML tab in Firebug. That gave me a parsable list of accounts with id, description, and image URL for good measure.<br />
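The scrape itself amounts to no more than this kind of thing - a Python/BeautifulSoup sketch with invented markup, since I haven't kept the real list HTML (the 2013 Twitter page used different class names):

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the saved Twitter list page.
SAVED_HTML = """
<ul>
  <li class="member"><a class="id">@nhsdirect</a>
      <p class="bio">Health advice line</p>
      <img src="http://example.com/nhsdirect.png"/></li>
  <li class="member"><a class="id">@nhsuk</a>
      <p class="bio">Official NHS account</p>
      <img src="http://example.com/nhsuk.png"/></li>
</ul>
"""

def parse_members(page):
    """Pull id, description, and image URL out of each list member."""
    soup = BeautifulSoup(page, "html.parser")
    return [{"id": li.find("a", class_="id").get_text(strip=True),
             "description": li.find("p", class_="bio").get_text(strip=True),
             "image": li.img["src"]}
            for li in soup.select("li.member")]
```

Swap the selectors for whatever the saved page actually contains and you get the same id/description/image table.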
<br />
Then I could run the sentiment analysis, looking for tweets that mention the NHS accounts, and computing the score. This would hopefully be tweets from people mentioning those accounts, and expressing their opinion of the service.
<br />
There were some problems with escape characters (the RJSONIO package handled them better than the rjson package) and other encodings, but I managed to work around them. Soon I had a table where I could order the NHS accounts by sentiment.<br />
<br />
But I'm a map geek. How could I map this? Most of the twitter accounts were geographic, but some were individual trusts or hospitals, and some were entire PCTs, SHAs, or even the new CCGs. There were also a few nationwide NHS accounts.<br />
<br />
So I adopted a manual approach to geocoding. First I generated a shapefile with all the accounts located in the North Sea. I loaded this into QGIS with an OpenStreetMap background layer, and then dragged each account around until it was roughly in place. I left the national accounts floating in the sea. 156 drags later, they all had a place.<br />
<br />
Back in R, I put the sentiment scores and counts of number of tweets back onto the geodata and saved a shapefile. But how to distribute this? I could save as a KML and people could use Google Earth, but I thought I'd give <a href="http://www.cartodb.com/" target="_blank">CartoDB</a> a try. This is a cloud mapping service where you can upload shapefiles and make pretty maps out of them. With a bit of styling, I had it. Here it is, blue for positive sentiments and reds for negative ones (to the nearest whole number):
<br />
<br />
<iframe width='400' height='400' frameborder='0' src='http://barryrowlingson.cartodb.com/tables/nhs_twitter_sentiment_analysis/embed_map?title=true&description=true&search=false&shareable=false&cartodb_logo=true&sql=&sw_lat=49.3537557183099&sw_lon=-13.403320312499998&ne_lat=55.3791104480105&ne_lon=7.690429687499999'></iframe>
<br />
Or use the <a href="http://cdb.io/YrbUAi">direct link</a> to the map.
<br />
Of course you need to take a lot of this with a pinch of salt. The locations are approximate. The sentiment scoring system is quite naive. The scores are based on fairly small numbers of tweets (click on a point to see). And the scores were a snapshot of when I ran the analysis.
Nevertheless, it's a start. It would be interesting to automate this and see how sentiment changes over time, but I think it requires a lot more tweets to get something amenable to statistical analysis. Once I figure out a statistical distribution for these scores (difference of two Poissons, maybe) I could map the significance of sentiment scores, which would take the small samples into account. Exeter may look red and angry, but that's from one single tweet!
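For what it's worth, if the positive and negative word counts really were independent Poissons, the score would follow the Skellam distribution, and a tail probability is just a double sum - a quick standard-library Python sketch (the Poisson assumption is exactly the speculation above, not something I've verified against the data):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def prob_score_at_least(s, lam_pos, lam_neg, nmax=60):
    """P(Npos - Nneg >= s) for independent Poisson counts,
    by direct (truncated) summation over the joint distribution."""
    return sum(poisson_pmf(np_, lam_pos) * poisson_pmf(nn, lam_neg)
               for np_ in range(nmax)
               for nn in range(nmax)
               if np_ - nn >= s)
```

A score of +1 from an account averaging two positive and two negative words per tweet has a tail probability around 0.4 - nowhere near significant, which is the point about small samples.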
Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0tag:blogger.com,1999:blog-2573156439353370360.post-78936700291371175812013-02-06T18:10:00.000+00:002013-02-06T18:10:08.073+00:00A new paradigm for spatial classesI have a love-hate relationship with R. So many things annoy me - even things not in the R Inferno. But we can fix some of them.<br />
<br />
For example, <span style="font-family: Courier New, Courier, monospace;">SpatialPolygonsDataFrames </span>(and the other spatial data frame classes) aren't really data frames. They don't inherit from <span style="font-family: Courier New, Courier, monospace;">data.frame</span>, but they almost behave like data frames. And when they don't, that's an annoyance, and you end up having to get the <span style="font-family: Courier New, Courier, monospace;">foo@data</span> member which really is a data frame.<br />
<br />
So why can't data frames have a spatial component? Partly because the columns of a data frame can only be vectors of atomic objects. You can't have a column of list objects. Or a column of model fits. Or a column of dates... wait, yes you can:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">> ds = seq(as.Date("1910/1/1"), as.Date("1999/1/1"), "years")</span><br />
<br />
<span style="font-family: Courier New, Courier, monospace;">> df = data.frame(date=ds,i=1:90)</span><br />
<span style="font-family: Courier New, Courier, monospace;">> head(df)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> date i</span><br />
<span style="font-family: Courier New, Courier, monospace;">1 1910-01-01 1</span><br />
<span style="font-family: Courier New, Courier, monospace;">2 1911-01-01 2</span><br />
<span style="font-family: Courier New, Courier, monospace;">3 1912-01-01 3</span><br />
<span style="font-family: Courier New, Courier, monospace;">4 1913-01-01 4</span><br />
<span style="font-family: Courier New, Courier, monospace;">5 1914-01-01 5</span><br />
<span style="font-family: Courier New, Courier, monospace;">6 1915-01-01 6</span><br />
<br />
Dates like this are atomic. Strip away the class and a date object is just a number:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">> unclass(ds[1])</span><br />
<span style="font-family: Courier New, Courier, monospace;">[1] -21915</span><br />
<br />
and it's the S3 OO system that prints it nicely. So how can we store geometry in an atomic data item? We can use WKT (well-known text), a text representation of points, polygons and so on.<br />
<br />
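To see how little magic there is in WKT, here's a toy Python parser for the simplest case - a point - using nothing but a regular expression. (Real code should use a proper WKT reader: rgeos here, GEOS or shapely elsewhere.)

```python
import re

def parse_wkt_point(wkt):
    """Parse a WKT string like 'POINT (x y)' into a (float, float) tuple."""
    m = re.fullmatch(r"POINT\s*\(\s*(\S+)\s+(\S+)\s*\)", wkt.strip())
    if m is None:
        raise ValueError("not a WKT point: %r" % wkt)
    return float(m.group(1)), float(m.group(2))
```

The geometry is just text, which is why it can sit happily in a character column.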
So here's a function to take a <span style="font-family: Courier New, Courier, monospace;">SpatialPolygonsDataFrame</span> and return a data frame with a geometry column. It adds a column called <span style="font-family: Courier New, Courier, monospace;">the_geom</span> of class <span style="font-family: Courier New, Courier, monospace;">"geom"</span>, and attaches an attribute to the data frame so that we know where the geometry column is. Anyone who has used PostGIS will recognise this.<br />
<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">as.newsp.SpatialPolygonsDataFrame = function(spdf){</span><br />
<span style="font-family: Courier New, Courier, monospace;"> require(rgeos)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> spdf$the_geom = writeWKT(spdf,byid=TRUE)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> class(spdf$the_geom) = c("geom")</span><br />
<span style="font-family: Courier New, Courier, monospace;"> df = spdf@data</span><br />
<span style="font-family: Courier New, Courier, monospace;"> attr(df,"the_geom") = "the_geom"</span><br />
<span style="font-family: Courier New, Courier, monospace;"> class(df) = c("newsp","data.frame")</span><br />
<span style="font-family: Courier New, Courier, monospace;"> df</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<br />
<div>
<br /></div>
<div>
Now if you convert a SPDF to a <span style="font-family: Courier New, Courier, monospace;">"newsp"</span> object you get a data frame. With geometry. What can you do with it? Well, anything you can do with a data frame, because that's exactly what it is. You want to do spatial things with it? Ah, well, for now here's some code that gets the geometry column and turns it back into a SPDF:</div>
<div>
<br /></div>
<div>
<div>
<br />
<span style="font-family: Courier New, Courier, monospace;">as.SpatialPolygonsDataFrame.newsp = function(nsp){</span><br />
<span style="font-family: Courier New, Courier, monospace;"> geom_column= attr(nsp,"the_geom")</span><br />
<span style="font-family: Courier New, Courier, monospace;"> geom = nsp[[geom_column]]</span><br />
<span style="font-family: Courier New, Courier, monospace;"> class(nsp) = "data.frame"</span><br />
<span style="font-family: Courier New, Courier, monospace;"> geom = lapply(geom,readWKT)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> glist = lapply(geom,function(p){p@polygons[[1]]})</span><br />
<span style="font-family: Courier New, Courier, monospace;"> for(i in 1:length(glist)){</span><br />
<span style="font-family: Courier New, Courier, monospace;"> glist[[i]]@ID=as.character(i)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"> SpatialPolygonsDataFrame(SpatialPolygons(glist),nsp,match.ID=FALSE)</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<div>
<br /></div>
</div>
</div>
<div>
This just does the reverse of the previous function. So you can write a plot method for these <span style="font-family: Courier New, Courier, monospace;">newsp</span> objects:</div>
<div>
<br /></div>
<div>
<br />
<span style="font-family: Courier New, Courier, monospace;">plot.newsp = function(x,...){</span><br />
<span style="font-family: Courier New, Courier, monospace;"> d = as.SpatialPolygonsDataFrame.newsp(x)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> plot(d,...)</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<div>
<br /></div>
</div>
<div>
So far, so pointless? Eventually if you developed this you'd write code to interpret the WKT (or better still, use WKB for compactness) and draw from that. This is just proof-of-concept.</div>
<div>
<br /></div>
<div>
These WKT strings can be very long, and that messes up printing of these data frames. In order to neaten this up, let's write some methods for the geom class to truncate the output:</div>
<div>
<br /></div>
<div>
<br />
<span style="font-family: Courier New, Courier, monospace;">as.character.geom = function(x,...){</span><br />
<span style="font-family: Courier New, Courier, monospace;"> paste0(substr(x,1,10),"...")</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">format.geom = function(x,...){</span><br />
<span style="font-family: Courier New, Courier, monospace;"> as.character.geom(x)</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<div>
<br /></div>
</div>
<div>
But now we hit a problem with subsetting any classed vector. Facepalm. R drops the class. </div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> z=1:10</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> class(z)="foo"</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> z</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> [1] 1 2 3 4 5 6 7 8 9 10</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">attr(,"class")</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">[1] "foo"</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> z[3:10]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">[1] 3 4 5 6 7 8 9 10</span></div>
<div>
<br /></div>
</div>
<div>
Annoying. But how does a vector of dates get around this? Well, base R defines a "[" method for date vectors. We'll just copy that, line for line:</div>
<div>
<br /></div>
<div>
<br />
<span style="font-family: Courier New, Courier, monospace;">"[.geom" = function (x, ..., drop = TRUE) </span><br />
<span style="font-family: Courier New, Courier, monospace;">{</span><br />
<span style="font-family: Courier New, Courier, monospace;"> cl = oldClass(x)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> class(x) = NULL</span><br />
<span style="font-family: Courier New, Courier, monospace;"> val = NextMethod("[")</span><br />
<span style="font-family: Courier New, Courier, monospace;"> class(val) = cl</span><br />
<span style="font-family: Courier New, Courier, monospace;"> val</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<div>
<br /></div>
</div>
<div>
<br /></div>
<div>
and while we're at it, we need a print method for the geom class too:</div>
<div>
<br /></div>
<div>
<br />
<span style="font-family: Courier New, Courier, monospace;">print.geom = function(x,...){</span><br />
<span style="font-family: Courier New, Courier, monospace;"> print(paste0(substr(x,1,10),"..."))</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
</div>
<div>
<br /></div>
<div>
Now look what I can do (using the Columbus data from one of the <span style="font-family: Courier New, Courier, monospace;">spdep</span> examples):</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> cnew = as.newsp.SpatialPolygonsDataFrame(columbus[,1:4])</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> head(cnew)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> AREA PERIMETER COLUMBUS_ COLUMBUS_I the_geom</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">0 0.309441 2.440629 2 5 POLYGON ((...</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">1 0.259329 2.236939 3 1 POLYGON ((...</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">2 0.192468 2.187547 4 6 POLYGON ((...</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">3 0.083841 1.427635 5 2 POLYGON ((...</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">4 0.488888 2.997133 6 7 POLYGON ((...</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">5 0.283079 2.335634 7 8 POLYGON ((...</span></div>
</div>
<div>
<br /></div>
<div>
This is what kicked this all off. Someone couldn't understand why <span style="font-family: Courier New, Courier, monospace;">head(spdf)</span> didn't do what he expected - in some cases:</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> pp=data.frame(x=1:10,y=1:10,z=1:10)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> coordinates(pp)=~x+y</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">> head(pp)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Error in `[.data.frame`(x@data, i, j, ..., drop = FALSE) : </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> undefined columns selected</span></div>
</div>
<div>
<br /></div>
<div>
Try it. Understand it. Then tell me that spatial data frames that inherit from data.frame, with the spatial data in a column, aren't a better idea. It's the principle of least **BOO!** surprise. Of course it would take a lot of work to re-tool all the R spatial stuff to use this, so it's not going to happen.</div>
<div>
<br /></div>
<div>
Another possibility may be to drop <span style="font-family: Courier New, Courier, monospace;">data.frames</span> and use <span style="font-family: Courier New, Courier, monospace;">data.tables</span> instead... Anyway, just an afternoon hack when I realised you could put WKT in a column. </div>
<div>
<br /></div>
<br />
<br />Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0tag:blogger.com,1999:blog-2573156439353370360.post-45569541752414169182013-01-29T17:46:00.001+00:002013-01-29T18:05:33.997+00:00NHS Hack Day Oxford - The Good, The Bad, and The Ugly<h2>
My Experience at <a href="http://nhshackday.com/">NHS Hack Day Oxford</a></h2>
<div>
Let's get the "Ugly" out of the way. By the end of day two (and possibly earlier) - the first stall had the seat broken and strewn on the floor, the second had no paper, and the third had been filled by a mysterious bulging carrier bag. Given this was a hospital, nobody had dared investigate its contents.</div>
<div>
<br /></div>
<div>
The "Bad"?. The networking. Getting reliable internet connections for an event like this is a must. There had been some effort to get BT Wireless connectivity, but this was costing £20 for the weekend per person, and even then there were reliability problems. I connected freely using the Eduroam system for academics but the connection would regularly drop. With an average of 2.1 WiFi devices per person at this event (that's a rough guess!) it can be a deal breaker. At least one person arrived late on Sunday to take advantage of the unlimited fast broadband at their friend's house where they were staying - although he also admitted the unlimited toast and Australian tennis action was another factor. I wonder if wired network connections and a few switches and hubs scattered round the venue might be a better solution. For the <a href="http://2013.foss4g.org/">FOSS4G Conference</a> in Nottingham this year (shameless plug) the network connectivity has been top of our agenda since our first meeting. </div>
<div>
<br /></div>
<div>
For future Hack Day events, it might be worth putting a reminder out for power extension leads a bit further in advance. Also, most of us have a couple of lanyards kicking around from previous conferences and a reminder to bring and share lanyards might be an idea too. Sticky labels were okay, but sub-optimal!</div>
<div>
<br /></div>
<div>
So, on to the work itself. Given the huge number of people who turned up, I was surprised at how little traffic there had been on the email list and the google group. Perhaps a dozen or so individuals spoke up on the pre-conference channels, yet at least ten times that number walked through the doors on the day. What were the other 90% thinking beforehand?</div>
<div>
<br /></div>
<div>
Partly I suspect some of the work groups were formed in advance. There seems to be a classification of hack day work groups:</div>
<div>
<ul>
<li>The group of people who have met and worked before. They have a plan for the hack day worked out already, and are perhaps using the weekend to set aside two days from the sound of ceaseless door-knocking (or the ping of arriving emails) in their day jobs to get this done. These groups can be hard for an outsider to help out with, since their work programme is decided and the division of labour is understood. They get down and get on.</li>
<li>The small group that has a plan, and knows what it needs, and knows it can find it at the hack day. Perhaps one or two people have the kernel of an idea together, but need a database expert, or a web scraper. They get them, and a designer and maybe a doctor and a patient jump in too as representative end users. Soon a group forms that adds value to the original idea and the development snowballs.</li>
<li>The randoms. A group of people with no great agenda, but assorted skill sets, who get together and come up with both a problem and a solution on the day. It may be nothing to do with any of the individual ideas, but emerges from the sum of their parts.</li>
</ul>
<div>
I skipped the presentations and judging, partly because I'm slightly uncomfortable with judging and prizes, especially for works of art (don't talk to me about the Oscars, the Man Booker prize, or the Queen's Honours list), and partly because I was getting a bit of cabin fever and fancied some fresh air. I wandered down to the river and was rewarded by a refreshing downpour followed by a <a href="http://www.youtube.com/watch?v=OQSNhk5ICTI" target="_blank">DOUBLE RAINBOW!</a> which inspired some new ideas. I find a combination of solitary thinking time and group work most inspirational, and a giant double rainbow arcing right across the sky can only help that process.</div>
</div>
<div>
<br /></div>
<div>
The time spent with other people is the real "Good" of the NHS Hack Day. Even if none of the code cut at the weekend goes into production, the value of making those connections makes the weekend worthwhile. What this means is that even if the WiFi doesn't work and there's no electricity, you can just stick everyone in a room, lock the doors, push sandwiches and cake in at midday, keep the coffee flowing, and good things will still happen.</div>
<div>
<br /></div>
<div>
The big challenge is getting these things continuing, which requires funding, further collaboration, acceptance and adoption. The greater challenge is integrating the spirit of the hack day into the processes that produced, for example, the ePortfolio that was so hated that a hack day group spent the weekend conspiring to produce a better one. So-called solutions are seemingly often imposed from above, based on marketing promises from big business, and the vast majority of users have no input to the problem. This is not a problem confined to the NHS, and there is a need to "loop back up" so that managers have a better awareness of user requirements, something they could learn from modern agile computer science development methods and open-source practices. </div>
<div>
<br /></div>
<div>
So overall a great weekend - and I'll probably keep an eye out on some of the projects to see how they develop. All the details are available from the <a href="http://www.nhshackday.com/">NHS Hack Day</a> web site.</div>
<div>
<br /></div>
<div>
Comments etc to @geospacedman on twitter</div>
Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0tag:blogger.com,1999:blog-2573156439353370360.post-3175234480239007482012-09-29T18:29:00.001+01:002012-09-29T18:37:47.596+01:00The Taarifa ProjectI seem to have been drawn in to helping a bit with <a href="http://taarifa.org/">The Taarifa Project</a> for one reason or another. Mostly the enthusiasm of Mark Iliffe, but also the possibility of helping out with an interesting project.<br />
<br />
For those who don't know and are too lazy to click the link I thoughtfully provided, here's the elevator pitch:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">The Taarifa Platform is an open source web application for information collection, visualization and interactive mapping. It allows people to collect and share their own stories using various mediums such as SMS, Web Forms, Email or Twitter, placing these reports into a workflow. Where these reports can be followed up and acted upon, while engaging citizens and communities.</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: inherit;">Currently it has been deployed in Uganda, and further deployments are planned. The current codebase is a heavily modified fork of Ushahidi, but the application has grown out of that now. The new project is a reboot and a new existence as a <a href="http://www.djangoproject.com/">Django</a> application.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Which suits me fine. I've done a lot of work with Django, including content management systems, applications for disease modelling and monitoring (that's my day job), and a social site that built a linked system of interesting technological new items. </span><br />
<span style="font-family: inherit;"><br /></span>
The Django version of Taarifa, djangorifa, is open source and hosted on github. Getting the source and the requirements took about ten minutes. I first tried to get it running with Spatialite, but currently that doesn't work because Spatialite has some limitations that mean it can't work out distances given lat-long coordinates. That seems a bit odd, since any schoolkid can do it with a calculator with sin and cos buttons. But never mind.<br />
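For reference, the lat-long distance Spatialite can't manage really is just sines and cosines - the haversine formula in a few lines of standard-library Python, using the usual spherical approximation with a 6371 km Earth radius:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km between two lat-long points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * asin(sqrt(a))
```

One degree of longitude at the equator comes out at about 111 km, which is the schoolkid sanity check.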
<br />
So I stuck PostGIS on my virtual box and away it went. Using the sample OpenStreetMap dump file supplied I had a demo site working, and I could start pretend-reporting things.<br />
<br />
There's clearly a lot to do. I think from a top-level there's still plenty of functionality to re-think. I believe the PHP version had a complicated workflow for dealing with reports, so maybe that needs doing. There's also a lot of basic web site functionality too. I thought of a few things:<br />
<br />
<ol>
<li><b>Translation.</b> For something intended for developing countries it has to be multilingual. Django supports this out of the box, and translations are supported in templates, python code, and even in URLs themselves, so a site could have URLs in several languages that go to the same effective page, and that page would be translated into that language. For example, /voiture/12 would show the page for car number 12 in French, and /car/12 would show it in English, and /auto/12 would show it in German. Not sure how you handle pages where the same word is used in two languages...</li>
<li><b>Search.</b> Vital for this system. I've used Haystack integrated into Django-CMS talking to a Whoosh backend - pretty heavyweight, but it works well. I had a quick look for newer django search projects and gave <a href="https://github.com/etianen/django-watson/">django-watson</a> a test-drive. It worked straight out of the box, and uses Postgres' search facilities so indexing is all handled there. It might be my new go-to Django search system.</li>
<li><b>REST API.</b> Data in is one thing, data out is another. For a community-centric system such as this, it would be great to let people get the data and do that web 2.0 mash-up thing you may remember from a few years ago. </li>
<li><b>Spatial Analytics.</b> Take the report data and look for hotspots, time trends and so on. Present it all as nice graphs and maps. Make that available via a WMS server for mash-up purposes.</li>
</ol>
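The translated-URLs idea in the first point is something Django supports natively, via i18n_patterns and translatable URL patterns. A sketch in current Django syntax, with a hypothetical app and view name (not actual Taarifa code); note that Django's own mechanism also puts a language prefix such as /fr/ on these URLs by default:

```python
# urls.py -- translatable URL patterns (app and view names are illustrative)
from django.conf.urls.i18n import i18n_patterns
from django.urls import path
from django.utils.translation import gettext_lazy as _

from reports import views  # hypothetical app

urlpatterns = i18n_patterns(
    # Wrapping the URL string in _() makes it translatable, so the French
    # catalogue can render 'car/' as 'voiture/' and the German one as 'auto/'.
    path(_("car/<int:pk>/"), views.car_detail, name="car-detail"),
)
```

The same-word-in-two-languages problem mostly goes away with the language prefix, since /fr/auto/12 and /de/auto/12 are distinct URLs even when the translated word collides.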
<div>
Check the project out if it looks interesting to you! <a href="http://taarifa.org/">http://taarifa.org/</a></div>
Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1tag:blogger.com,1999:blog-2573156439353370360.post-82375013059141968272012-09-11T14:56:00.000+01:002012-09-11T14:56:08.655+01:00Fixing Polygons with pprepairAt the OSGIS UK conference last week the prize for best presentation went to Ken Arroyo Ohori of TU Delft for his talk on &quot;Automatically repairing polygons and planar partitions&quot;. The presentation, jointly credited to Hugo Ledoux and Martijn Meijers, detailed a method for cleaning up the dodgy polygons we've all probably seen in our geo-lifetimes.<br />
<br />
I've noticed that computational geometry theorists seem averse to publishing code - it's as if they don't want to sully the purity of their theory with dirty code, dealing with edge cases and arithmetic precision and all that. Not these guys. The code is on GitHub, and I was eager to try it out. There was a lot of buzz about integrating this with PostGIS or QGIS or R, the stumbling block being the requirement for the CGAL library.<br />
<br />
Anyway, back at Lancaster, I checked out the code. Easy as:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">git clone https://github.com/tudelft-gist/pprepair.git</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: inherit;">and then </span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">make -f Makefile.linux</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: inherit;">There were a couple of problems, which I submitted as issues, but I managed to fix them myself before any response came. The details are on the GitHub issue tracker.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Then I wanted a sample dataset to test it on. From the workshop I gave at the UseR! conference I remembered some awful polygons outlining local police neighbourhood regions. I'd used it as an example of bad data. Some crime locations had fallen into cracks between the neighbourhood polygons, and some were in locations covered by more than one polygon. Maybe you could escape justice by claiming you were outside a police jurisdiction this way? </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Here's what a small region looks like in QGIS:</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcpzE7XU6HRW8rBT_r8SsaI3b_6vcIIryMg_ZZlKnJ_-OW4PeFYufQKwtTqwxxra-uYjymcQJvWlVyyVlz-8BZhpPEdLQh3Fzrm2d6L9xdOmZntnkPuRpM6IuppKrgG_gId2PowsF1wQ4/s1600/screen_52.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="275" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcpzE7XU6HRW8rBT_r8SsaI3b_6vcIIryMg_ZZlKnJ_-OW4PeFYufQKwtTqwxxra-uYjymcQJvWlVyyVlz-8BZhpPEdLQh3Fzrm2d6L9xdOmZntnkPuRpM6IuppKrgG_gId2PowsF1wQ4/s320/screen_52.png" width="320" /></a></div>
<span style="font-family: inherit;">Notice the white spaces between some polygons, and also the overlapping made visible by setting transparency.</span><br />
<span style="font-family: inherit;"><br /></span>
The <span style="font-family: Courier New, Courier, monospace;">pprepair</span> process is currently a command-line tool, but its options are simple: an input shapefile, an output shapefile and some repair parameters. To start, all you need is "-fix".<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">pprepair -i neighbourhoods.shp -o fixed.shp -fix</span><br />
<br />
Loading this into QGIS looks much nicer:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBhtEsphfjgL9SBZxPWDK_CjSZ_CJE4m9Je15N2tK83XPfowMmGjlBSQApFwWkFcqoeGU1bI6KhJGgX-vYl2fIn4RTOI6H6Oy_J0fhZi6OSDwTKAO7hRfDO86mlTpV0BrgDX7MEsYhTP8/s1600/screen_51.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="275" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBhtEsphfjgL9SBZxPWDK_CjSZ_CJE4m9Je15N2tK83XPfowMmGjlBSQApFwWkFcqoeGU1bI6KhJGgX-vYl2fIn4RTOI6H6Oy_J0fhZi6OSDwTKAO7hRfDO86mlTpV0BrgDX7MEsYhTP8/s320/screen_51.png" width="320" /></a></div>
Everything is now clean: there are no overlaps and no holes (note the colours, generated by my semi-random topological colouring system, are different). You now have a consistent topology that won't cause odd errors when you use other analysis techniques.<br />
<br />
Ken's presentation claimed big wins in speed and memory usage over similar algorithms in both open-source and proprietary GIS packages. Once the licensing requirements of the CGAL library are sorted out, it would be great to see this algorithm pop up in all the open-source geospatial systems.<br />
<br />
<br />Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com4tag:blogger.com,1999:blog-2573156439353370360.post-59743580778062636072012-07-10T19:27:00.003+01:002012-07-10T19:28:24.786+01:00The World Wine WebThe ggmap package caused a buzz at UseR! last month. Well, anything gg-something plus anything to do with maps will do that. It lets you download map tiles to use as backgrounds to ggplot graphics, and has some other useful functions such as interfacing to Google's routing engine and geocoding.<br />
<br />
Dave Smith of Revolution Analytics used it to map his <a href="http://blog.revolutionanalytics.com/2012/07/making-beautiful-maps-in-r-with-ggmap.html#comments">local wineries</a>, a map someone on Twitter called 'useful'. I disagree...<br />
<br />
That map shows me one thing - that the wineries you don't need an appointment for tend to be close to the main roads, and the ones you do need an appointment for are up in the hills. It's a static, dead map. If you decide to visit a winery, you've then got to do a spatial query just to find out its name and address. That's not very useful.<br />
<br />
One poster on Dave's blog asked if it could be linked to Google Vis to produce a dynamic map. Well, we can do better than that: we can use R to create a web map with all open-source tools and open data.<br />
<br />
My off-CRAN package, <a href="https://r-forge.r-project.org/projects/webmaps/">webmaps</a>, has two capabilities: it can download map tiles like ggmap, and it can create slippy web maps using OpenLayers. Recent changes to OpenLayers have broken the slippy-map functionality, though, and I'm working on a dedicated replacement package called openlayr. Still, enough of the webmaps functionality exists to give me a good start on making a dynamic map of the wineries.<br />
<br />
If it were all working, all I'd need to do is divide the wineries into those that need appointments and those that don't, and then do:<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">osmMap(layer(Appt,name="Yes",style=s1),layer(NoAppt,name="No",style=s2),outputDir="WineMap")</span><br />
<br />
and I'd get something like this:
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirCK6caisuUyl66s66wCxfeJZZ8zfF-eLsyaZWbkrzGWlJhiJR176dXyC2jM9ViP_JmIKc1SqjJGPDSHQYxRFj4JSvcop1shopPrt4C2rMlCEVuiAVOQPhCys03OgVCWIE9buiPSK_N8s/s1600/Screenshot-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirCK6caisuUyl66s66wCxfeJZZ8zfF-eLsyaZWbkrzGWlJhiJR176dXyC2jM9ViP_JmIKc1SqjJGPDSHQYxRFj4JSvcop1shopPrt4C2rMlCEVuiAVOQPhCys03OgVCWIE9buiPSK_N8s/s200/Screenshot-1.png" width="184" /></a></div>
- a web map with two layers of points (the objects s1 and s2 define the colours). Each layer can be toggled on or off via the checkboxes in the pullout menu, and you can zoom and pan.
And if you click on a winery...
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8ZpqtXVA2FVoqqfi_I3dEFynJ6TrBnY9TqYCNHOdxo8JW9CY6_SOaKdhGz5B6TuNIrhPV75gi3rB_OmGOvROqoSDc5rvmjxGZ34O1nfOQ83_66WROoXib7FVI4Q0baDRLt7wT36C-Zj8/s1600/Screenshot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8ZpqtXVA2FVoqqfi_I3dEFynJ6TrBnY9TqYCNHOdxo8JW9CY6_SOaKdhGz5B6TuNIrhPV75gi3rB_OmGOvROqoSDc5rvmjxGZ34O1nfOQ83_66WROoXib7FVI4Q0baDRLt7wT36C-Zj8/s200/Screenshot.png" width="184" /></a></div>
up pops the information. Remember, apart from those s1 and s2 objects and a bit of standard dataframe manipulation, this is one line of R.<br />
<br />
The background imagery is an <a href="http://www.openstreetmap.org/">OpenStreetMap</a> tile layer - so I really should add an attribution to this. Copyright OpenStreetMap Contributors, CC BY-SA. It's also possible to add other base layers, such as the oooh-inducing Stamen.com watercolour maps.<br />
<br />
Once I get openlayr working nicely I can probably give you a quick script to do all this. For now, you can check out the <a href="http://www.maths.lancs.ac.uk/~rowlings/R/Wine/">map online</a> and see for yourself. What happens is that each layer is converted to GML (I might change to GeoJSON for openlayr) and a simple JavaScript and HTML wrapper is created to load the layers.<br />
<br />
With a bit more customisation you could make the popups link straight to each winery's web page! That's how easy it is to make a web map with R - and remember, you can push this onto a web server and anyone can see it.<br />
<br />
I'm off now for a glass of... oh, my wine cellar is empty...Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com3tag:blogger.com,1999:blog-2573156439353370360.post-45436967475918759972012-05-30T07:48:00.000+01:002012-05-30T07:48:07.547+01:00Crop CirclesThere was a question on the GIS group on LinkedIn about creating circular buffers that are enclosed within a polygonal parcel yet have a given area. I reckon the way to do it is to try various values of the radius, computing the overlay, and repeating until you find the radius that gives the target area. The area is strictly increasing with radius, and as long as the parcel is big enough there will be a single unique solution between the min and max values. Simple. Here's the R code that does it:
<br />
<pre>library(sp)    ### provides the SpatialPolygons classes
library(rgeos) ### provides gIntersection

clipCir <- function(parcel,x,y,area){
### if the parcel is smaller than the target area, we're stuffed:
if(parea(parcel)&lt;area){
stop("can't be done, parcel too small")
}
### radius is somewhere between a bit less than sqrt(a/pi)
### and the diagonal of the parcel.
bb = bbox(parcel)
rmin = sqrt(area/pi)*0.99
rmax = sqrt(diff(bb[1,])^2+diff(bb[2,])^2)
### this function returns the difference between the area
### of the clipped poly and the target area:
f = function(r){
area-clipAreaR(parcel,x,y,r)
}
### uniroot computes f for r between rmin and rmax until f~=0,
### which is when the area of the clipped poly = the target area
soln = uniroot(f,c(rmin,rmax))
### return the polygon
poly = makeClipP(parcel,x,y,soln$root)
poly
}
clipAreaR <- function(parcel,x,y,r){
return(parea(makeClipP(parcel,x,y,r)))
}
makeClipP <- function(parcel,x,y,r){
### create a polygon of a circle clipped to a polygon.
### the circle has 360 points round it, radius r, centre (x,y)
theta = seq(0,2*pi,len=360)
### hoop jumping to make a SpatialPolygons object
c = Polygon(cbind(x+r*sin(theta),y+r*cos(theta)))
pc = Polygons(list(c),ID="A")
spc = SpatialPolygons(list(pc))
proj4string(spc)=proj4string(parcel)
### intersect them
gIntersection(spc,parcel)
}
parea <- function(sp){
### compute the area of a spatial object by summing the
### area of the components
sum(unlist(lapply(sp@polygons,function(x){x@area})))
}
</pre>
The code relies on sp objects and the rgeos package. The uniroot function, from the stats package, does the root finding.
<br />
<pre>library(rgdal) ### provides readOGR
parcels = readOGR("Data/","counties")
p = clipCir(parcels[2,],xy[1],xy[2],0.3) ### xy holds the chosen centre point
plot(parcels[2,])
plot(p,add=TRUE,lwd=4)
</pre>
Here's one solution:
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbvQJb7Yr1BRaCqresT1dxp9M31v7F4zYlzMUsRdRR-H1qGqFqLl5ZgPSfRx7xAsewjzYGP2GxReyeii2TWFG-Puk6tLVDc20v8JKCiEMbZcrGr5lP8x4-gW7yfWlXwKF2QeFXsIeblQU/s1600/clip1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="152" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbvQJb7Yr1BRaCqresT1dxp9M31v7F4zYlzMUsRdRR-H1qGqFqLl5ZgPSfRx7xAsewjzYGP2GxReyeii2TWFG-Puk6tLVDc20v8JKCiEMbZcrGr5lP8x4-gW7yfWlXwKF2QeFXsIeblQU/s200/clip1.png" width="200" /></a></div>
The thin outline is the county boundary, and the thick line is a polygon centred at xy[1],xy[2] with area 0.3 (actually 0.299984).
It should be easy enough to convert this to any language that can handle geometry objects - the only tricky part might be writing a uniroot function - the easiest way is to start at the min and max, and do interval bisection - compute the function at (min+max)/2 and then decide whether the zero is left or right of the midpoint. This used to be an exercise for our first-year Fortran programming students...<br />
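As a sketch of that interval-bisection idea, in plain Python with no geometry involved (the tolerance and iteration cap are arbitrary choices of mine, not anything from uniroot):

```python
import math

def bisect(f, lo, hi, tol=1e-9, maxit=200):
    """Crude uniroot stand-in: assumes f is continuous and that
    f(lo) and f(hi) have opposite signs, so a root lies between them."""
    flo = f(lo)
    mid = (lo + hi) / 2.0
    for _ in range(maxit):
        mid = (lo + hi) / 2.0
        fmid = f(mid)
        if abs(fmid) < tol or (hi - lo) / 2.0 < tol:
            break
        # keep whichever half-interval still brackets the sign change
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return mid

# e.g. the radius of an (unclipped) circle with area 0.3
r = bisect(lambda x: math.pi * x * x - 0.3, 0.0, 10.0)
```

In the crop-circle setting, f(r) would be the clipped-polygon area minus the target area, exactly as in the R code above.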
<br />
Here's another example with a weirder-shaped parcel - here the required area was 0.005, and the returned polygon was 0.004999925 units in area.
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN-JqYxws-GKFu5okFgtAHAtSKeKa91m4lSO4LvDbGmstpHOMTFSXwkwMgRr76wcr5Sw6VKsTpZYs9KTfhlMAH834tYdLBrRH0WRzrs2Xe7lkQ-3TSzDOcFnRSA7Xx64j4OYd-YKrYgXY/s1600/Screenshot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="130" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN-JqYxws-GKFu5okFgtAHAtSKeKa91m4lSO4LvDbGmstpHOMTFSXwkwMgRr76wcr5Sw6VKsTpZYs9KTfhlMAH834tYdLBrRH0WRzrs2Xe7lkQ-3TSzDOcFnRSA7Xx64j4OYd-YKrYgXY/s200/Screenshot.png" width="200" /></a></div>
Some brief timings tell me you can do about 3000 of these a minute on my two-year-old home PC.<br />
<br />
This is all good for a fixed centre, but someone mentioned the possibility that the centre isn't fixed, in which case you need some other condition for the shape you are finding - maximising some compactness parameter perhaps. Given that the parcel could be a complex, concave polygon (with holes?) this is a very hard problem. But I think I've solved the original question.Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1tag:blogger.com,1999:blog-2573156439353370360.post-15089858930775380402012-05-20T23:09:00.001+01:002012-05-20T23:10:40.816+01:00knitr + cactus + TwitterBootstrap + JqueryA few notes and tips from a week of preparing some course notes.<br />
<br />
I switched from Sweave to knitr because all the cool kids are doing it. And it's better. The caching has already saved me more time than I spent switching from Sweave to knitr - which wasn't much time at all. Win.<br />
<br />
At first I was building a PDF from .Rnw sources, but then thought maybe HTML would be a better delivery platform. No need to open a PDF reader. So I figured out the .Rhtml syntax and it was done. Win.<br />
<br />
But how to do my equations? MathJax. Problem solved. Almost pixel-perfect equations using LaTeX syntax. Users without JavaScript get the LaTeX source, so it's not all lost on them. Win.<br />
<br />
Making a simple HTML file is easy enough, but I want to make a few tutorial sessions and some other info, so I want a template engine. My current fave is cactus.py: you design Django templates, write your pages to fill in sections such as 'title' or 'content' in the template, and then cactus.py builds the static pages. These can then be served by a simple web server or straight from the file system. Win.<br />
<br />
So instead of a full HTML file, I just got knitr to process a Django template chunk. However, this didn't give me the CSS for the source-code syntax highlighting, because there's no HEAD section in the HTML. Solved by copying the knitr CSS into a file and serving it from my template. Works fine, and serving CSS from files rather than inline in HTML is a win because the server or client can cache it. Win.<br />
<br />
For style and structure, I just write my templates to use styles from the Twitter Bootstrap framework. Include the CSS and JS in your template base and you have good-looking responsive pages. Sure, they look like every other Twitter Bootstrap site, but the flexibility to style them is there too. Win.<br />
<br />
Then I figured out something really cool. The R syntax highlighting wraps all functions in a span of class &quot;functioncall&quot;. One line of jQuery can turn all those spans into hyperlinks to information from my current favourite R documentation - the R Graphical Manual. Here's the code (which might get mangled by Blogger's formatting and/or my inexperience sticking hypertext tags into a Blogger doc):<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">$("span.functioncall").replaceWith(function(){return '&lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/search.php?query='+$(this).text()+'"&gt;'+$(this).text()+'&lt;/a&gt;'})</span><br />
<br />
Now every function links to the search results for that function name. It's probably hard to link directly to exactly the right help page, since the system would have to know which package each function came from to reach the right entry on the RGM, but this is a big help. Win.<br />
<br />
So that was my weekend. Also, Chelsea FC. Win.<br />
<br />Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1tag:blogger.com,1999:blog-2573156439353370360.post-19618375468365352192012-02-26T22:38:00.001+00:002012-02-26T22:38:21.932+00:00Stopwatch in RThere was a discussion the other day on Stack Overflow started by someone who wanted a stopwatch timer in R. It would start when they hit return, and stop when that line finished executing. I guess they often have long-running jobs going and frequently think, &quot;hmm, how long has that been running?&quot;.<br />
<br />
I figured something could be done using R's mechanisms for hooking code into various places, but there seems to be no way to hook in before code runs. There's the task callback mechanism, but that fires at the end of the code, just before your prompt reappears. I wasted a bit of time trying to figure something out with that, and gave up.<br />
<br />
So, the solution: a stopwatch function that evaluates its argument. You can do:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
stopwatch(foo(a))</div>
<div style="font-family: "Courier New",Courier,monospace;">
stopwatch({z=foo(a)})</div>
<div style="font-family: "Courier New",Courier,monospace;">
stopwatch({x=foo(a);y=foo(b)})</div>
<br />
and a handy little HH:MM:SS dialog will tick along until your code is finished. Then it will stop, and you can close it.<br />
<br />
The dialog is done with the tcltk package, using a one-second timer tick. There's just the one main function, and a timeForm function that produces the HH:MM:SS display from a time in seconds:<br />
<br />
<pre>stopwatch <- function(expr){
    # tcltk R expression timer function, Barry Rowlingson Feb 2012
    require(tcltk)

    start=Sys.time()
    tt <- tktoplevel()

    tkwm.title(tt,gsub(" ","",paste(deparse(substitute(expr)),sep="",collapse=";")))

    closeme <- function(){
        tkdestroy(tt)
    }

    labelText <- tclVar("Running...")
    label1 <- tklabel(tt,text=tclvalue(labelText))
    tkconfigure(label1,textvariable=labelText)
    tkgrid(label1)
    e = environment()
    z <- function(){
        tclvalue(labelText)=paste("Running: ",timeForm(Sys.time()-start))
        assign("sw", tcl("after", 1000, z), envir=e)
    }
    quit.but <- tkbutton(tt,text="Close",command=closeme)
    tkgrid(quit.but)
    sw = tcl("after", 1000, z)

    finished=function(){
        tcl("after","cancel",sw)
        tclvalue(labelText)=paste("Done after: ",timeForm(Sys.time()-start))
    }

    tryCatch(eval(expr),finally=finished())
}

timeForm <- function(t){
    t=floor(as.numeric(t)) ### whole seconds, so %02d formatting works
    s=t %% 60
    m=((t-s)/60) %% 60
    h = t %/% (60*60)
    stopifnot(s+m*60+h*60*60==t)
    tstring = sprintf("%02d",c(h,m,s))
    return(paste(tstring,collapse=":"))
}
</pre>
<br />
This code is hereby released for anyone to do anything with, but keep the credit in the comment unless you've really changed a lot and it's unrecognizable from my original...<br />
<br />
I can think of a few improvements. You might want to time a few things on different lines and keep a cumulative stopwatch - a bit like split lap times. You'd then need a different paradigm, something like:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<span style="font-size: x-small;">s = stopwatch()</span></div>
<div style="font-family: "Courier New",Courier,monospace;">
<span style="font-size: x-small;">timeOn(s,foo(a))</span></div>
<div style="font-family: "Courier New",Courier,monospace;">
<span style="font-size: x-small;">timeOn(s,foo(b))</span></div>
<div style="font-family: "Courier New",Courier,monospace;">
<span style="font-size: x-small;">foo(c)</span></div>
<div style="font-family: "Courier New",Courier,monospace;">
<span style="font-size: x-small;">timeOn(s,foo(c))</span></div>
<br />
This would create a Tcl/Tk dialog with a timer that only runs during the timeOn evaluations.<br />
<br />
Another possibility is a text-based timer rather than a GUI. You might also want it to return the time taken - currently it returns the value of the expression. Go ahead, knock yourself out.<br />
<br />Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1tag:blogger.com,1999:blog-2573156439353370360.post-34004395996449434582012-01-01T13:57:00.000+00:002012-01-01T13:57:03.294+00:00Document Management System review - Mayan EDMSWe're redoing all our web pages in Maths and Stats. This means getting rid of our old Plone 2.1 CMS and replacing it with a groovy new Django-based system using Arkestra - see previous blog entry for more on that.<br />
<br />
There's a casualty though. We have an archive of old exam papers going back to 1997. These sit on our Plone server (technically as Zope File objects in the ZODB) and are accessed via some custom page templates I wrote, so students can get them by calendar year, year of study, or by searching for a string.<br />
<br />
It's a mess. I've been wanting to put them into a proper document management system for a while. Here are my requirements:<br />
<ul>
<li>Users should be able to search by course code, year of study, and calendar year.</li>
<li>Users should be able to bulk download a zip file of selected PDFs</li>
<li>Admins should be able to bulk upload and enter metadata</li>
</ul>
<div>
I imagine SharePoint can do this, but I'm no SharePoint wrangler and I don't really want to be one. There are a few open-source solutions around, such as <a href="http://www.logicaldoc.com/">LogicalDoc</a>, and these all seem to be Java-based, Enterprisey solutions. Often there's an open-source, or 'community', version, and then a supported Enterprise version with integration features such as drag-and-drop, OCR, versioning and so on.</div>
<div>
<br /></div>
<div>
Then the other day I noticed <a href="http://rosarior.github.com/mayan/">Mayan EDMS</a>. It's a Django-based DMS which looks like it might well do everything I want. And being open source, if it doesn't, I can have a go at making it do so.</div>
<div>
<br /></div>
<div>
<span class="Apple-style-span" style="font-size: large;">Installation</span><br />
Smooth as you like. The documentation was spot on: just download, grab some dependencies with pip, and it all goes nicely into a virtual environment. The whole virtualenv ends up about 90M on my system, which is comparable to Java-based systems when they bundle the JVM with themselves (which they often seem to do).<br />
<br />
<span class="Apple-style-span" style="font-size: large;">Setup</span><br />
The included <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">settings.py</span> file has options for development and production servers. For development you can run from the usual Django dev server, and for production you can collect all your static files into <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">/static</span> for serving from a web server. For test purposes I set the development mode and did the usual Django syncdb/migrate dance. I couldn't see what the default superuser name and password were, so I just created one with <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">./manage.py createsuperuser</span> and I was in. I was uploading documents in no time.<br />
<br />
<span class="Apple-style-span" style="font-size: large;">Features</span><br />
Can it do what I want? The version I downloaded currently can't. The big missing thing: anonymous read-only access. We don't want our students to have to log in to download exam papers, but the permission system in Mayan EDMS doesn't allow anonymous access to anything.<br />
<br />
I had a look at creating a user with read-only permissions. The system has a very fine level of permission control, with lots of capabilities that can be set on or off. Access is granted via the usual Role/Group/User arrangement: you create a Role with permissions, then give that Role to a User (or a Group of Users). So I created a 'Reader' role and a 'guest' user with that role, thinking we could use it for anonymous access. But the guest user could still create Folders. At that point I had a look at the code.<br />
<br />
<span class="Apple-style-span" style="font-size: large;">Code</span><br />
The code is all on <a href="https://github.com/rosarior/mayan">Github</a> and it's incredibly well written, seemingly by one person. It uses lots of existing Django apps for some of its functionality, which is a good thing. It appears to comply with Python and Django best practices, is commented, and is easy to find your way around. I found the Folder-creation code and noticed there was no permission check like there was for, say, Document creation. I popped a quick issue on the issue tracker.<br />
<br />
<span class="Apple-style-span" style="font-size: large;">Author</span></div>
<div>
The next morning I had a response from Roberto. And that was between New Year's Eve and New Year's Day. He'd already added a permission check for Folder creation on a development branch, coming out in the next version. I'm not sure if that will also support fully anonymous users who never log in (Django returns AnonymousUser in that case, rather than a real User object), but it should be sufficient for us. I don't think I've had such a quick response from an open-source dev in a long while - and certainly never over New Year celebrations! Credit also goes to Github for its fantastic tracking and notification system, but most of the credit goes to Roberto himself!<br />
<br />
<span class="Apple-style-span" style="font-size: large;">Metadata</span><br />
You can define metadata sets in Mayan EDMS, so I defined a set for our exam papers and added the properties we want to classify our papers with - calendar year, year of study, and course code. Documents can also be tagged, so we could tag very old exams with 'obsolete' - since sometimes course codes are re-used for different courses altogether.<br />
<br />
<span class="Apple-style-span" style="font-size: large;">Problems</span><br />
I'm not sure some of the features are working properly yet. In the document details view there's a 'Contents' box that I thought would show the extracted text of the document (PDF, OCR'd image etc), but it's blank even for a text document, and searches for text that occurs in that text document don't return the document. Then I noticed that one of my PDFs did have text in the box - but only one, and its text was all run together with no spaces. The search engine could still find 'scenario' in the text containing 'modelscenariowas', but it would equally match run-on strings like 'wehavealsofoundthat'. This might be a problem, though I know it is sometimes hard to extract words from PDF files.<br />
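The run-on text problem is easy to demonstrate: plain substring matching still finds words inside a space-stripped blob, but any whole-word (tokenised) search does not. The strings below are from the examples above:

```python
# Why run-together extracted text is awkward for search: substring matching
# still 'works' on a run-on blob, but word-based matching does not.

good_text = "the model scenario was run"
runon_text = "modelscenariowas"   # PDF text extracted with the spaces lost

# Substring search finds the word in both...
print("scenario" in good_text)    # True
print("scenario" in runon_text)   # True

# ...but whole-word search fails on the run-on text:
print("scenario" in good_text.split())   # True
print("scenario" in runon_text.split())  # False -- it's one giant token

# And substring search will also 'find' words nobody wrote as words:
print("also" in "wehavealsofoundthat")   # True
```

So a search engine built on the run-on extractions is stuck choosing between missing documents entirely and over-matching inside unrelated run-on strings.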
<br />
I also tried the OCR functionality on an image, but that returned no text either. Looking in the Django admin showed me the OCR Document queue, with an entry in it carrying an error message: "get_image_cache_name() takes exactly 3 non-keyword argument (2 given)". I can run tesseract from the command line successfully, so something isn't quite right. I'll have a look at these issues before reporting them on Github (and check whether they are fixed in the dev versions!)<br />
<br />
The other function I need that seems to be missing is bulk download. From a search I'd like to tick a few boxes and have a 'Download All' option that fetches everything in a ZIP (or tar.gz) file.<br />
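Server-side, such a 'Download All' is not much code. This is not Mayan EDMS code, just a sketch of the standard-library approach a Django view could take (filenames are invented for the example):

```python
# Sketch of the missing 'Download All' feature: bundle a list of selected
# documents into one ZIP, built in memory, ready to stream as a response.
import io
import zipfile

def bundle_documents(docs):
    """docs: list of (filename, bytes) pairs. Returns the ZIP as bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in docs:
            zf.writestr(name, data)
    return buf.getvalue()

# Hypothetical selection of two exam papers:
selected = [("exam2010.pdf", b"%PDF-1.4 ..."), ("exam2011.pdf", b"%PDF-1.4 ...")]
archive = bundle_documents(selected)

# The archive round-trips:
print(zipfile.ZipFile(io.BytesIO(archive)).namelist())
```

In a real view you'd return the bytes with a `Content-Type: application/zip` header and an appropriate filename.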
<br />
<span class="Apple-style-span" style="font-size: large;">More Features?</span><br />
<br />
The documentation for some of the features is a bit thin. But clicking around and looking at the source code revealed that it has:<br />
<br />
<ul>
<li>Themes: change the look with a config option, several themes supplied</li>
<li>Versioning: keep old docs around. Not sure we need this.</li>
<li>History: show all the changes to files and metadata</li>
<li>Multilingual: with English, Spanish, Portuguese and Russian.</li>
<li>Smart Links, Indexes: I don't know what these are and I couldn't make them do anything. I'm sure they are wonderful.</li>
<li>Signatures: documents can be signed by uploading a signature file. I don't know how this works.</li>
<li>Public Keys: In the setup, you can upload GPG public keys which you can easily get by querying a keyserver. I'm not sure what these keys then let you do.</li>
</ul>
<br />
So overall I'm very impressed with it - it does a lot of what the Enterprisey, Java-based solutions try to give you, but in a light Python package. Just about the only Enterprise feature I've seen that's missing is true filesystem integration via SMB or WebDAV, even though there's "Filesystem Integration" on the features list. I suppose we need to see what that means. I'm sure it's good!<br />
<br />
<br /></div>Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com5tag:blogger.com,1999:blog-2573156439353370360.post-8203024014305141322011-12-07T10:56:00.001+00:002011-12-07T13:10:09.572+00:00Reading AfriPop dataThe <a href="http://www.afripop.org/">AfriPop</a> project aims to produce high-resolution population density estimates for Africa. This could be very useful for some of our spatial epi projects. So I thought I'd have a look.<br />
<br />
Their methods for computing the population estimates are explained on the site, but I can't find any licensing or usage restrictions on the data, so I'll have to clear that up before doing anything serious with it. Meanwhile, let's download something and read it in.<br />
<br />
The download form asks for a few details, but there's no email check (or 'I have read your terms and conditions' checkbox) so fill in as honestly as your conscience requires. You'll get a zip file to download.<br />
<br />
When unzipped you'll have three files. For Benin these are called <span style="font-family: 'Courier New', Courier, monospace;">apben10v3 </span><span style="font-family: inherit;">with extensions </span><span style="font-family: 'Courier New', Courier, monospace;">hdr</span><span style="font-family: inherit;">, </span><span style="font-family: 'Courier New', Courier, monospace;">flt</span><span style="font-family: inherit;"> and </span><span style="font-family: 'Courier New', Courier, monospace;">prj.</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span><br />
<span style="font-family: inherit;">The </span><span style="font-family: 'Courier New', Courier, monospace;">flt</span><span style="font-family: inherit;"> file is the data, the </span><span style="font-family: 'Courier New', Courier, monospace;">hdr</span><span style="font-family: inherit;"> is header info, and </span><span style="font-family: 'Courier New', Courier, monospace;">prj</span><span style="font-family: inherit;"> is projection info. Use gdalinfo to get a summary - run it on the </span><span style="font-family: 'Courier New', Courier, monospace;">flt</span><span style="font-family: inherit;"> file:</span><br />
<span style="font-family: inherit;"><br /></span><br />
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$ gdalinfo apben10v3.flt </span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">Driver: EHdr/ESRI .hdr Labelled</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">Files: apben10v3.flt</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> apben10v3.hdr</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> apben10v3.prj</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[WGS84 coordinate system info here]</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<div style="font-family: 'Courier New', Courier, monospace;">
Origin = (0.773989957311480,12.409206711176100)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
Pixel Size = (0.000833300000000,-0.000833300000000)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
Corner Coordinates:</div>
<div style="font-family: 'Courier New', Courier, monospace;">
Upper Left (0.7739900,12.4092067) (0d46'26.36"E,12d24'33.14"N)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
Lower Left (0.7739900,6.2252874) (0d46'26.36"E,6d13'31.03"N)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
Upper Right (3.8522002,12.4092067)(3d51' 7.92"E,12d24'33.14"N)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
Lower Right (3.8522002,6.2252874) (3d51' 7.92"E,6d13'31.03"N)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
Center (2.3130951,9.3172471) (2d18'47.14"E, 9d19' 2.09"N)</div>
<div style="font-family: 'Courier New', Courier, monospace;">
Band 1 Block=3694x1 Type=Byte, ColorInterp=Undefined</div>
<div style="font-family: 'Courier New', Courier, monospace;">
NoData Value=-9999</div>
<div style="font-family: 'Courier New', Courier, monospace;">
<br /></div>
<div>
<span style="font-family: inherit;">All pretty good. A standard ESRI file (except there's often nothing really standard about ESRI file formats). Let's read it and plot it using the raster package:</span></div>
</div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> bpop=raster("./apben10v3.flt")</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> image(bpop)</span></div>
<div style="font-family: inherit;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdl5fUixAub935kZtzndpgctDgJ9AHV67ERN5O-2-5Zg-6ey24BfzdKzo3umewl0Lnyw0kglvLcy3H_WXopBdmVFSkLIoQDtR70wMbmzRcUdV65zeXmd7NC0uKFXQIYPRXiSSRR4bSmV0/s1600/screen_04.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="305" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdl5fUixAub935kZtzndpgctDgJ9AHV67ERN5O-2-5Zg-6ey24BfzdKzo3umewl0Lnyw0kglvLcy3H_WXopBdmVFSkLIoQDtR70wMbmzRcUdV65zeXmd7NC0uKFXQIYPRXiSSRR4bSmV0/s320/screen_04.png" width="320" /></a></div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
<br /></div>
<div>
<span style="font-family: inherit;">Oh dear. Reading the docs about these files tells me that sometimes </span><span style="font-family: 'Courier New', Courier, monospace;">gdal</span><span style="font-family: inherit;"> just doesn't get it right. It said "Type=Byte", but the file is 109652696 bytes long, which is exactly 7421x3694x4 bytes. These are 4 byte values. In fact the AfriPop page says these files are </span><span style="font-family: 'Courier New', Courier, monospace;">"ESRI Float"</span><span style="font-family: inherit;">. Luckily there's an easy way to persuade gdal these are floats. Open up the </span><span style="font-family: 'Courier New', Courier, monospace;">hdr</span><span style="font-family: inherit;"> file (NOT the </span><span style="font-family: 'Courier New', Courier, monospace;">flt</span><span style="font-family: inherit;"> file. Don't open the </span><span style="font-family: 'Courier New', Courier, monospace;">flt </span><span style="font-family: inherit;">file!) in a text editor (</span><span style="font-family: 'Courier New', Courier, monospace;">emacs, vi, notepad++, notepad</span><span style="font-family: inherit;">) and add:</span></div>
</div>
<div style="font-family: inherit;">
<br /></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> pixeltype float</span></div>
<div>
<br /></div>
<div>
to the items. Then try again:</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> bpop=raster("./apben10v3.flt")</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> plot(bpop,asp=1)</span></div>
</div>
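Incidentally, if you have a stack of these files to fix, that one-line header edit is easy to script. A sketch in Python - the demo writes a throwaway header with the Benin file's dimensions; in practice you'd point fix_hdr at the real .hdr (never the .flt!):

```python
# Append the missing 'pixeltype float' line to an ESRI .hdr file so gdal
# reads the accompanying .flt as 4-byte floats. Demonstrated on a made-up
# header; for real data call e.g. fix_hdr("apben10v3.hdr").
import os
import tempfile

def fix_hdr(path):
    with open(path) as f:
        lines = f.read().splitlines()
    # Only add the entry if it isn't already there (safe to re-run).
    if not any(l.strip().lower().startswith("pixeltype") for l in lines):
        lines.append("pixeltype float")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Minimal header like the one in the AfriPop zip:
path = os.path.join(tempfile.mkdtemp(), "demo.hdr")
with open(path, "w") as f:
    f.write("ncols 3694\nnrows 7421\ncellsize 0.0008333\nnodata_value -9999\n")

fix_hdr(path)
print(open(path).read().splitlines()[-1])   # pixeltype float
```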
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIQfVP9EEVxikbwaaETbCv4Jhzkf_iOq1HOIh6ZOWyU4dxjDIybCpyuhiK2asAMIKpfmLPVcI6qAyMBJ_rehPdmBmN3HvfqxJy1903rZ08CksPUvWxNRWnwIKsB3k9n7WwMpTFpZy1j8Q/s1600/screen_08.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIQfVP9EEVxikbwaaETbCv4Jhzkf_iOq1HOIh6ZOWyU4dxjDIybCpyuhiK2asAMIKpfmLPVcI6qAyMBJ_rehPdmBmN3HvfqxJy1903rZ08CksPUvWxNRWnwIKsB3k9n7WwMpTFpZy1j8Q/s320/screen_08.png" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Looks good. Note we can get away with asp=1 since we're close to the equator. Or we can give it a projection:</div>
<div>
<br /></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> projection(bpop)="+proj=tmerc"</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> plot(bpop)</span></div>
<div>
<br /></div>
<div>
and that's just as good. We can also zoom in using the very nifty <span style="font-family: 'Courier New', Courier, monospace;">zoom</span> function from the <span style="font-family: 'Courier New', Courier, monospace;">raster</span> package:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjweQoVg_ihOdIUolMyuQ_XkQFD9YmocFQ9r_jfpUxn-WkKFecsCm4qfHvjrwENgc7idjYjRJT4gGVsa-_yoz5mT1isQxjNFnmSVpLEn_qz0SeRt0oncs9ntw1DNDjpsX9eWww_T15z7d4/s1600/screen_07.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjweQoVg_ihOdIUolMyuQ_XkQFD9YmocFQ9r_jfpUxn-WkKFecsCm4qfHvjrwENgc7idjYjRJT4gGVsa-_yoz5mT1isQxjNFnmSVpLEn_qz0SeRt0oncs9ntw1DNDjpsX9eWww_T15z7d4/s1600/screen_07.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhttx0-lso0xckBdh3EJLpoRKC2RsGQ3c_xQr8ShW5VMczOdXjIOHdSfvY3ByHC9wQVZBVmHFJp_l6W2EADV1PmdbHTiRnNWg3uGjmW3arP2lpXWBNKgl1fbRHl-ZicS0tV-NyAFOWLU-M/s1600/screen_06.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="291" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhttx0-lso0xckBdh3EJLpoRKC2RsGQ3c_xQr8ShW5VMczOdXjIOHdSfvY3ByHC9wQVZBVmHFJp_l6W2EADV1PmdbHTiRnNWg3uGjmW3arP2lpXWBNKgl1fbRHl-ZicS0tV-NyAFOWLU-M/s320/screen_06.png" width="320" /></a></div>
and inspect the data further.<br />
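A quick aside on the asp=1 shortcut: a degree of longitude spans cos(latitude) times the distance of a degree of latitude, so the "correct" aspect for plotting raw lat-long is roughly 1/cos(latitude). Checking at Benin's centre latitude (taken from the gdalinfo output above) shows why asp=1 is fine here - this is my own back-of-envelope sum, not anything the raster package computes:

```python
# Aspect-ratio check for plotting unprojected lat-long data.
import math

centre_lat = 9.3172471   # Benin's centre latitude, from gdalinfo
aspect = 1.0 / math.cos(math.radians(centre_lat))
print(round(aspect, 4))   # 1.0134 -- barely different from asp=1

# Contrast with the UK, where asp=1 would visibly squash the map:
print(round(1.0 / math.cos(math.radians(54.0)), 4))   # 1.7013
```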
<br />
Note that the raster package does a really good job of dealing with large rasters. When plotting these things it samples the raster file, so not all of it is read into memory. I don't have a monitor capable of showing 7421x3694 pixels (yet), so raster just grabs enough to look good. But if you try to do anything with the full-size raster, you will strangle your machine and it may die. Don't do this:<br />
<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">plot(log(bpop))</span><br />
<div>
<br /></div>
<div>
for example. Let's sample our raster down a bit for some more exploring. First I need to restart R after the last line crashed out. I'll sample down to something of the order of a few hundred pixels by a few hundred pixels by specifying the size as 500*500 - <span style="font-family: 'Courier New', Courier, monospace;">sampleRegular</span> will work out the exact resolution, keeping the aspect ratio. It's smart like that:</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> library(raster)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> bpop=raster("apben10v3.flt")</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> projection(bpop)="+proj=tmerc"</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> plot(bpop)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> bpop2 = sampleRegular(bpop, 500*500, asRaster=TRUE)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> bpop2</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">class : RasterLayer </span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">dimensions : 708, 352, 249216 (nrow, ncol, ncell)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">resolution : 0.008744915, 0.008734349 (x, y)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">extent : 0.77399, 3.8522, 6.225287, 12.40921 (xmin, xmax, ymin, ymax)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">coord. ref. : +proj=tmerc +ellps=WGS84 </span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">values : in memory</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">min value : 0 </span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">max value : 336.4539 </span></div>
<div>
<br /></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> plot(bpop2)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> plot(log(bpop2))</span></div>
</div>
<div>
<br /></div>
<div>
[Perhaps we should use 'resample' here?] Now the log plot is doable:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhp26JAeXXLRj14qNfKRkzyQ4QPyCwC6jxphNRQ3-1KB_VtOLrtZZsOVcqWiGbmrN_QMnoEjmZfpRs5sHwccdPKqt2jA2n88kMJ2eW8lWHRLY2EESwsDUHyODAkgW5oIGU8zZgwnuA_7YE/s1600/screen_09.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="302" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhp26JAeXXLRj14qNfKRkzyQ4QPyCwC6jxphNRQ3-1KB_VtOLrtZZsOVcqWiGbmrN_QMnoEjmZfpRs5sHwccdPKqt2jA2n88kMJ2eW8lWHRLY2EESwsDUHyODAkgW5oIGU8zZgwnuA_7YE/s320/screen_09.png" width="320" /></a></div>
<div>
<br /></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
BUT these values are now wrong. The units of the original data are persons per grid square. Our grid squares are now bigger than the original grid squares, so more people live in each of them. We can correct this by scaling by the ratio of the cell sizes:</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> bpop2= bpop2 * prod(res(bpop2))/prod(res(bpop))</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> plot(log(bpop2))</span></div>
</div>
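The correction is worth sanity-checking on a toy grid. A back-of-envelope version in plain Python (pure arithmetic, nothing to do with the raster package) - it also shows the point, made below, that regular sampling plus area-scaling does not preserve the total:

```python
# Toy check of the cell-size correction. Start with a 4x4 grid of
# persons-per-cell, mimic regular sampling by keeping every 2nd cell,
# then scale by the ratio of cell areas (one coarse cell = 4 fine cells).

fine = [
    [1, 2, 1, 0],
    [3, 5, 2, 1],
    [0, 1, 8, 2],
    [1, 0, 2, 4],
]
fine_total = sum(sum(row) for row in fine)          # 33 people

# Regular sampling: keep cells (0,0), (0,2), (2,0), (2,2)
sampled = [[fine[r][c] for c in (0, 2)] for r in (0, 2)]

# Each sampled value is still persons per *fine* cell; scale by area ratio,
# i.e. prod(res(coarse)) / prod(res(fine)):
area_ratio = 4
coarse = [[v * area_ratio for v in row] for row in sampled]
coarse_total = sum(sum(row) for row in coarse)

print(fine_total, coarse_total)   # 33 40 -- right units, but the total drifted

# If preserving the total matters, renormalise the coarse grid:
scale = fine_total / coarse_total
coarse = [[v * scale for v in row] for row in coarse]
print(round(sum(sum(row) for row in coarse), 6))   # 33.0
```

The drift happens because the sampled cells aren't representative of their neighbourhoods; averaging over each block before scaling (what a proper resample/aggregate does) would behave better.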
<div>
<br /></div>
<div>
and now we get a more correct map:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjMuFqkuXwr0IdqEn88OdDLPadDE82I-YlLfSOp37aejziNJomVylK8xQVqMz6wK5eBULZY2g9QRph98UjAFd6yTfiSpsOcHG5e-CoFJwXN-A8SJZaM-RTF6Jie-HTBuxG6A2vtMr8w9o/s1600/screen_10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="304" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjMuFqkuXwr0IdqEn88OdDLPadDE82I-YlLfSOp37aejziNJomVylK8xQVqMz6wK5eBULZY2g9QRph98UjAFd6yTfiSpsOcHG5e-CoFJwXN-A8SJZaM-RTF6Jie-HTBuxG6A2vtMr8w9o/s320/screen_10.png" width="320" /></a></div>
<div>
I'm not sure how 'correct' or optimal this way of downsampling population density grids is - for example the total population (sum of grid cells) will be slightly different. Perhaps it is better to rescale the sampled grid to keep the total population the same? Thoughts and musings in comments for discussion please!</div>Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0University of Lancaster, Physics Ave, Lancashire LA1, UK54.011400326470948 -2.783489227294921954.00207032647095 -2.8032302272949217 54.020730326470947 -2.7637482272949221tag:blogger.com,1999:blog-2573156439353370360.post-72013775619477986232011-10-18T14:24:00.000+01:002011-10-18T14:24:50.425+01:00optifix - optim with fixed valuesSo you've written your objective function, fr, and want to optimise it over the two parameters. Fire <span style="font-family: inherit;">up optim and away you go:</span><div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> fr <- function(x) { ## Rosenbrock Banana function</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> x1 <- x[1]</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> x2 <- x[2]</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> 100 * (x2 - x1 * x1)^2 + (1 - x1)^2</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> }</span></div>
<div style="font-family: inherit;">
<br /></div>
</div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> optim(c(-1.2,1), fr)[c("par","value")]</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$par</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] 1.000260 1.000506</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$value</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] 8.825241e-08</span></div>
</div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
Great. But sometimes you only want to optimise over some of your parameters, keeping the others fixed. Let's consider optimising the banana function for x2=0.3</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> fr <- function(x) { ## Rosenbrock Banana function</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> x1 <- x[1]</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> x2 <- 0.3</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> 100 * (x2 - x1 * x1)^2 + (1 - x1)^2</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> }</span></div>
</div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<div style="font-family: 'Courier New', Courier, monospace;">
> optim(-1.2, fr,method="BFGS")[c("par","value")]</div>
<div style="font-family: 'Courier New', Courier, monospace;">
$par</div>
<div style="font-family: 'Courier New', Courier, monospace;">
[1] -0.5344572</div>
<div style="font-family: 'Courier New', Courier, monospace;">
<br /></div>
<div style="font-family: 'Courier New', Courier, monospace;">
$value</div>
<div style="font-family: 'Courier New', Courier, monospace;">
[1] 2.375167</div>
<div style="font-family: 'Courier New', Courier, monospace;">
<br /></div>
<div>
<span style="font-family: inherit;">(R gripes about doing 1-d optimisation with Nelder-Mead, so I switch to BFGS)</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">Okay, but we had to write a new objective function with 0.3 coded into it. Now repeat for a number of values. That's a pain.</span></div>
</div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">The conventional solution would be to make x2 an extra parameter of fr:</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> fr <- function(x,x2) {</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> x1 <- x[1]</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> 100 * (x2 - x1 * x1)^2 + (1 - x1)^2</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> }</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">and call optim with this passed in the dot-dot-dots:</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> optim(-1.2,fr,method="BFGS",x2=0.3)[c("par","value")]</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$par</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] -0.5344572</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$value</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] 2.375167</span></div>
<div style="font-family: inherit;">
<br /></div>
</div>
<div style="font-family: inherit;">
Okay great. But suppose now we want to fix x1 and optimise over x2. You're going to need another objective function. Suppose your function has twelve parameters... This is becoming a pain.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
You need optifix. Go back to the original 2 parameter banana function, and do:</div>
<div style="font-family: inherit;">
<br /></div>
<div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">> optifix(c(-1.2,0.3),c(FALSE,TRUE), fr,method="BFGS")</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$par</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] -0.5344572</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$value</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] 2.375167</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$counts</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">function gradient </span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> 27 9 </span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$convergence</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] 0</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$message</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">NULL</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$fullpars</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] -0.5344572 0.3000000</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">$fixed</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;">[1] FALSE TRUE</span></div>
<div style="font-family: inherit;">
<br /></div>
</div>
<div style="font-family: inherit;">
Here the function call is almost the same as the 2-parameter optimisation, except for a second argument that says which parameters to fix - and they are fixed at the values given in the first argument. The values in the first argument for the non-fixed parameters are starting values for the optimiser as usual.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
The output is the output from the reduced optimiser. In this case it is one parameter value, but there are also a couple of extra components, $fullpars and $fixed, to tell you what the overall parameter values were and which ones were fixed.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
Optifix takes your objective function and your specification of what's fixed, and creates a wrapper round your function that takes only the variable parameters. The wrapper inserts the fixed values in the right places and calls your original function. Optifix then calls optim with the reduced vector of starting parameters and the wrapped objective function. Hence optim only ever sees the variable parameters, while the original function is always called with the variables plus the fixed values.</div>
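The wrapping trick is small enough to sketch in a few lines of Python (the banana function again; this mirrors the technique, not optifix's actual R code):

```python
# The optifix idea in miniature: build a reduced objective that takes only
# the free parameters, splicing the fixed values back in before calling the
# original function.

def banana(x):
    x1, x2 = x
    return 100 * (x2 - x1 * x1) ** 2 + (1 - x1) ** 2

def make_reduced(fn, start, fixed):
    """start: full parameter vector; fixed: matching list of True/False."""
    def reduced(free):
        free = list(free)
        # Rebuild the full vector: fixed slots from 'start', the rest
        # consumed in order from the optimiser's reduced vector.
        full = [start[i] if fix else free.pop(0)
                for i, fix in enumerate(fixed)]
        return fn(full)
    return reduced

# Fix x2 = 0.3, leave x1 free -- the same setup as the optifix call above:
reduced = make_reduced(banana, [-1.2, 0.3], [False, True])

# The reduced function takes just [x1]; x2=0.3 is inserted automatically,
# so at optim's answer x1=-0.5344572 it reproduces $value (~2.375167):
print(reduced([-0.5344572]))
print(banana([-0.5344572, 0.3]))   # identical by construction
```

An optimiser is then pointed at `reduced` with the shortened starting vector; the same wrapping applies to a supplied gradient function, dropping the fixed components.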
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
Optifix should work on optim calls that use a supplied gradient function - it wraps that as well to produce a reduced-dimensional gradient function for optim to work on.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
Code? Oh, you want the code? Okay. I've put a file in a <a href="http://www.maths.lancs.ac.uk/~rowlings/R/Optifix/">folder on a server</a> for you to download. It's very beta, needs testing, and should probably find a home in a package. All things to do.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
<br /></div>Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0Lancaster University, County Ave, Lancaster, Lancashire LA1 4, UK54.012295559310274 -2.784433364868164154.011129059310278 -2.7869008648681639 54.013462059310271 -2.7819658648681642tag:blogger.com,1999:blog-2573156439353370360.post-12090019770353328732011-07-26T18:41:00.000+01:002011-07-26T18:41:43.377+01:00Mind The GapEvery so often Prof D watches Hans Rosling's TED talk or TV show and thinks to himself, "Barry could do that". And so the other week he said to me, "Could we do graphics like that in R?". Because he only really understands R, and Ben had recently cooked up some interactive R plots using tk widgets. Here's the kind of thing that www.gapminder.org can serve up with their Adobe Flash widget:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_vBrqmTY58AOFM61lmPA4gsCrx81M6enwzVA3FvpgVsTi78s9juxbIYDf9EhzSBxi5Mnx0PdkVPve9HlTegXPZn_yISCy57mlX-AVU626gJyhb4YaiRAxI-AuRXnBre9H6GrqTcmKsiQ/s1600/screen_28.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="152" width="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_vBrqmTY58AOFM61lmPA4gsCrx81M6enwzVA3FvpgVsTi78s9juxbIYDf9EhzSBxi5Mnx0PdkVPve9HlTegXPZn_yISCy57mlX-AVU626gJyhb4YaiRAxI-AuRXnBre9H6GrqTcmKsiQ/s200/screen_28.png" /></a></div><br />
Well obviously you could do this in R. Or something like it. But why? On CRAN there's the googleVis package, which pumps your data into Google's visualisation library and can produce just this kind of chart. Doing it in vanilla R would be painful - the whole chart would have to be redrawn at every step, and R's graphics system isn't really designed for that.<br />
<br />
So I looked for another solution. Javascript in the browser is pretty swift now, and there are visualisation packages for it. I recently heard of D3 and thought I'd give it a go. Pretty soon I'd knocked out a space-time visualisation plot, and then spent another day or so mimicking Gapminder's charts.<br />
<br />
Here then is preview number 1.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqjWYBOuMpovRVucq_jw0somxSDs3JDRV33XrH2oBGZhJaeHZsx26vuy_jXoE76XnTJbsP76P1Cqf8XZ-GRg13HeRUNrn-yd9U6olYnrAlT9V7DnOpeBAFQQMW8iwV2EAUpomDpQ0zrTY/s1600/screen_27.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="200" width="195" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqjWYBOuMpovRVucq_jw0somxSDs3JDRV33XrH2oBGZhJaeHZsx26vuy_jXoE76XnTJbsP76P1Cqf8XZ-GRg13HeRUNrn-yd9U6olYnrAlT9V7DnOpeBAFQQMW8iwV2EAUpomDpQ0zrTY/s200/screen_27.png" /></a></div><br />
This is created by a pretty tiny (but at the moment untidy) chunk of Javascript, plus some R to write out the data in JSON format. The heavy lifting is done by D3, which has nice data-driven ways of controlling the visualisation. The user can click on the back/fwd links to advance the data a year, and the chart transitions smoothly between the data points. The colouring is by geographic region, and the circle size is driven by one of the variables. It's easy to switch the axis or size variables from the Firefox Javascript console, and it will be even easier when a proper GUI is wired up.<br />
<br />
I'll never have the time to make it as polished as Gapminder or googleVis, but here is the start of an open-source version of these charts.<br />
<br />
Ooh did I say 'open-source'? Anyone want the code?Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1tag:blogger.com,1999:blog-2573156439353370360.post-783205114250883882011-07-25T16:45:00.000+01:002011-07-25T16:45:15.113+01:00A Middle-aged Geek's Guide To The Arkestra : First Movement<h1>Intro</h1><br />
"Arkestra is an intelligent semantic Django-based CMS for organisations and institutions". Which is seemingly exactly what we want for our department web pages.<br />
<br />
The showcase site, Cardiff University's School of Medicine, is rather impressive.<br />
<br />
Arkestra is built on Django-CMS, and is open source. So I decided to play around with it. This blog entry will relate my progress. All this work is being done on an Ubuntu 10.04 box.<br />
<br />
<h1>Installation Packages</h1><br />
First, create a new virtualenv and get into it. Install the required packages using pip, and the cutting-edge stuff from source. Some of these packages are necessary for django-cms and Arkestra, and some are just necessary for the Arkestra example. I'm not sure which is which, so I grab everything.<br />
<br />
<pre>virtualenv --no-site-packages Test
. Test/bin/activate
cd Test
pip install PIL
pip install django
pip install BeautifulSoup
pip install django-typogrify
pip install pyquery
pip install -e git+http://github.com/stefanfoulis/django-widgetry#egg=django-widgetry
pip install django-filer # gets easy-thumbnails, django-mptt
pip uninstall -y django-mptt
pip install django-cms
pip install django-polymorphic
pip install django-appmedia
pip install -e hg+https://bitbucket.org/spookylukey/semanticeditor#egg=semanticeditor
pip install -e git+git://github.com/ojii/django-classy-tags.git#egg=classytags
git clone git://github.com/evildmp/Arkestra.git src/arkestra
</pre><br />
I tried to do all this with a pip requirements file but failed - the problem seems to be that django-filer pulls in django-mptt, which django-cms can't work with (hence the explicit uninstall step above).<br />
<br />
For reference, here's the output of pip freeze telling you what versions of what I have this working with:<br />
<br />
<pre>BeautifulSoup==3.2.0
Django==1.3
PIL==1.1.7
South==0.7.3
distribute==0.6.10
django-appmedia==1.0.1
-e git://github.com/ojii/django-classy-tags.git@8ad613b3bc0310ba9afb206136cfadbdfa8e6b01#egg=django_classy_tags-dev
django-cms==2.1.3
django-filer==0.8.2
django-polymorphic==0.2
django-typogrify==1.2.2
-e git+http://github.com/stefanfoulis/django-widgetry@57bbf529509e805a74b3e8f36e6938bef591e505#egg=django_widgetry-dev
easy-thumbnails==1.0-alpha-17
lxml==2.3
parti-all==0.0.7.22
pyquery==1.0
-e hg+https://bitbucket.org/spookylukey/semanticeditor@3a40b4766f32262be2cfe131cf20a19703bfc339#egg=semanticeditor-dev
smartypants==1.6.0.3
textile==2.1.5
wsgiref==0.1.2
</pre><br />
<h1>Config and Run</h1><br />
Head into the arkestra directory that git got for you, and try running the server:<br />
<br />
<pre>cd src/arkestra/example
export PYTHONPATH=.. # to find the arkestra modules
python manage.py runserver
</pre><br />
It should run, but you won't be able to log in because there are no user accounts yet. Also, the supplied example.db file isn't compatible with this version of django-cms (and possibly other packages), so move it aside and run syncdb again:<br />
<br />
<pre>mv example.db example.db.orig
python manage.py syncdb # create db tables, create admin user
python manage.py runserver
</pre><br />
Now you can log in and create Page objects, and view them on the site. Don't forget to tick the published and navigation boxes if needed. To add content, you add a Text/Layout plugin and put the content in there.<br />
<br />
<h1>Some 404s</h1><br />
Watch your log output. At some point you may see some 404s caused by some magic autocomplete jQuery javascripts not being found. Try this from the example/ folder:<br />
<br />
<pre>cd media
ln -s ../../../arkestra/arkestra_utilities/static/ javascript
</pre><br />
That should stop the 404s.<br />
<br />
<h1>Loading the Demo Site</h1><br />
Note that there's a JSON-encoded version of the database supplied in example.json. In theory all you need to do is:<br />
<br />
<pre>python manage.py loaddata example.json
</pre><br />
over an empty database and get it all going. But there's those pesky database differences to deal with. I did go through it editing the changes and got it mostly working, but you'll probably want to build your Arkestra site from scratch. Which is what the second movement in this piece will probably be about.Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0tag:blogger.com,1999:blog-2573156439353370360.post-36176824404833485192011-06-26T16:44:00.000+01:002011-06-26T16:44:01.008+01:00Thoughts on OSGIS 2011OSGIS is the UK Open-Source GIS conference. This is the third year it has run, and the first time I've made it there. Nottingham University has been its home every time, mostly due to the presence of the Centre for Geospatial Science on campus, and the enthusiasm of Suchith Anand and Jeremy Morley, who seem to be running on free energy all the time.<br />
<br />
The meeting follows the standard template for scientific and technical conferences - there's some workshops, some invited keynote papers, and some general presentations. There's no poster session, and no novel ideas like lightning talks or group discussions - except perhaps in the Open Gov Workshop, but I wasn't there. Not that group discussions don't happen elsewhere, but only over coffee, great lunches, and beers.<br />
<br />
Links to the agenda and presentations are all on the <a href="http://cgs.nottingham.ac.uk/~osgis11/os_committee_scientific.html">conference web site</a>, so I won't go into details here. The talks were also webcast live and recorded, and should be available soon.<br />
<br />
The workshop sessions on <a href="http://www.qgis.org/">QGIS</a> were run by <a href="http://www.faunalia.co.uk/">Faunalia</a>, who let us take away their 8GB memory sticks packed with a bootable Linux. Very handy. There were also lots of OSGeo Live DVDs kicking around, used for the workshop on Ordnance Survey data. Clearly open-sourcers are finding more ways of working in labs full of Windows boxes. I once ran a workshop where we used VirtualBox and an OSGeo Live DVD image to hide the horrors of Windows XP.<br />
<br />
So what was hot or not at the conference? Here's my impressions: <br />
<br />
On the desktop:<br />
<ul><li>Hot hot hot: Lots of mentions of QGIS.</li>
<li>Also Hot: gvSIG (a day of workshops), GRASS here and there, OpenJump poked its nose in.</li>
<li>Not Hot: No-show GIS packages: uDig, SAGA.</li>
<li>Hot: OpenLayers and GeoExt.</li>
<li>Not hot: The Cloud. It's not the answer to everything.</li>
</ul><br />
On the server:<br />
<ul><li>Hot: PostGIS continues its dominance.</li>
<li>Not Hot: Mentions of Oracle Spatial were usually accompanied by apologies. Usual excuse was 'we had the license anyway'.</li>
<li>Not Hot: No sign of NoSQL DB technology - where's the MongoDB Spatial people?</li>
<li>Hot: WFS, WMS, WCS and even WPS becoming almost commonplace.</li>
</ul><br />
In the hands of the programmer:<br />
<ul><li>Warm: Only one statistics package mentioned - <a href="http://www.r-project.org/">R</a> of course - although the <a href="http://orange.biolab.si/">Orange Toolbox</a> gets a look-in.</li>
<li>Hot Programming languages: Python, Java, JavaScript, C, R, SQL. I think I saw some Perl.</li>
<li>Cold Programming languages: C#, Ruby, Visual Basic/C.</li>
<li>Warm and fuzzy feelings from: Drupal, Django, ExtJS.</li>
<li>Cold and dead to us: Sharepoint. IIS. .net. Silverlight. Flash.</li>
</ul><br />
Hardware:<br />
<ul><li>Hot: Android phones - nearly everywhere!</li>
<li>Cold: iPhones - hard to spot. Either nobody had them or were ashamed to show them.</li>
<li>Cold: Tablets. Think I saw one Samsung Galaxy Tab but nobody was showing off iPads. Or iPhone Maxis as they are sometimes known.</li>
</ul><br />
No great surprises there.<br />
<br />
A few people were tweeting useful things during the conference, and so I should apologise to Jeremy Morley for not saying hello IRL and only communicating via twitter. I will try and submit a paper on doing GIS in R for next year. It will be more than 140 characters.<br />
<br />
Also great to finally meet Suchith - the force behind lots of interesting things going on at Nottingham.<br />
<br />
Tyler Mitchell rolled in from across the pond and did well against the jet lag to present a keynote speech and network like crazy for two days. OSGEO are very busy both globally and locally.<br />
<br />
All in all, the conference did feel like a mini <a href="http://2011.foss4g.org/">FOSS4G</a> with a local flavour - but not one that cut down the quality of the talks, the participants, the coffee chat, the lunch discussions or the beer conversations. See you next year!Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1tag:blogger.com,1999:blog-2573156439353370360.post-12212421549097413092011-06-01T18:27:00.001+01:002011-06-01T18:31:03.970+01:00Reading Layers from a Qgis Project File.Some notes on reading individual layers out of a Qgis Project file. This should end up in an 'import layers from a project file' plugin.<br />
<br />
1. Get the XML into some text:<br />
<br />
<pre>xml = file("project.qgs").read()
</pre><br />
2. Make a blank QDomDocument and set the content:<br />
<br />
<pre>d = PyQt4.QtXml.QDomDocument()
d.setContent(xml)
</pre><br />
3. Now find the maplayers:<br />
<br />
<pre>maps = d.elementsByTagName("maplayer")
</pre><br />
4. How many?<br />
<br />
<pre>maps.length()
</pre><br />
5. Now read the fourth layer (0-indexed) into the running Qgis:<br />
<br />
<pre>QgsProject.instance().read(maps.item(3))
</pre>
<br />
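Pulling steps 1 to 4 together: here's a self-contained sketch using the standard library's xml.dom.minidom in place of Qt's QDomDocument, so it runs outside QGIS. The project snippet is a made-up skeleton, not a real .qgs file, and step 5 (QgsProject.instance().read) still needs a live QGIS session:<br />

```python
from xml.dom.minidom import parseString

# Minimal stand-in for a .qgs project file - real ones carry much more.
project_xml = """<qgis>
  <projectlayers>
    <maplayer><layername>roads</layername></maplayer>
    <maplayer><layername>rivers</layername></maplayer>
  </projectlayers>
</qgis>"""

doc = parseString(project_xml)                   # steps 1 and 2
maps = doc.getElementsByTagName("maplayer")      # step 3
names = [m.getElementsByTagName("layername")[0].firstChild.data
         for m in maps]
print(len(maps), names)                          # step 4: how many, and which
```

The layer names are what the plugin's GUI would show next to the checkboxes.<br />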
Wrap this in a nice gui that tells the user what the map layers are, lets them check boxes to choose layers, imports the chosen layers, and marks the current project as dirty. Jobzagoodun.Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0tag:blogger.com,1999:blog-2573156439353370360.post-43349999906881597202011-04-12T16:24:00.000+01:002011-04-12T16:24:37.204+01:00OpenLayers 2.10 Beginners Guide Book ReviewA man is out driving in unfamiliar countryside. Inevitably he gets lost, so he pulls over to ask a local farmer for directions. "Oh no", says the farmer, "I wouldn't start from here".<br />
<br />
It's an old joke (and not a very funny one), but one that springs to mind when people ask certain questions on mailing lists. Someone with very little computing experience will appear on the OpenLayers mailing list (or IRC) and say "How do I put a map on my web page?". My answer would be "Well, I wouldn't start from here".<br />
<br />
First, I'd say, learn programming using a nice friendly language - python perhaps. Then get a basic understanding of how client-server systems work, and how the web is implemented via the HTTP protocol. Then follow the 20-something year evolution of HTML - from its humble beginnings through to its happy marriage with CSS and their successful menage-a-trois together with Javascript. Ignore its affair with Java - that was a moment of crazed passion, and Java is now out of the HTML/browser party and busily running the "enterprisey" affairs of corporations. <br />
<br />
With a knowledge of HTTP, HTML, CSS and Javascript, you are ready to put maps on your web pages. And I'd choose OpenLayers to do it.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6Ci_V7CEE2h4Sd932zwcFz1_BoT8jH0FC20G4iZYygZ3F1D23QMLm3tbWiWtrG5XH8CAbnKTE0Uvo8IVEQZZMxCoyXaXrJnsmp0P-1RxvVYhf2QXIVO-wWhFSB4UAdGjbX24TDN5R9E4/s1600/4125OS_OpenLayers+2.10+Beginner%2527s+Guidecov_0.jpg.png" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="152" width="125" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6Ci_V7CEE2h4Sd932zwcFz1_BoT8jH0FC20G4iZYygZ3F1D23QMLm3tbWiWtrG5XH8CAbnKTE0Uvo8IVEQZZMxCoyXaXrJnsmp0P-1RxvVYhf2QXIVO-wWhFSB4UAdGjbX24TDN5R9E4/s200/4125OS_OpenLayers+2.10+Beginner%2527s+Guidecov_0.jpg.png" /></a></div><br />
<a href="http://vasir.net/">Erik Hazzard</a>'s book, <a href="http://www.packtpub.com/openlayers-2-1-javascript-web-mapping-library-beginners-guide/book">"OpenLayers 2.10 Beginner's Guide"</a>, published by Packt Publishing, is designed to get people with very little computing experience all the way through that process to the point of quite complex map production. He has to cover everything, and does so pretty well (although I'm not sure about his exposition on Object-oriented programming). Although he doesn't teach programming, he encourages users to cut-and-paste code and then experiment. <br />
<br />
The sample chapter will show you how the book works - simple explanations, example code, and open-ended questions and mini-assignments. The Firebug debugger is introduced early on in the book, and I think this is a great idea. Firebug didn't take long to repay me the time I took to grok what it could do. Erik sprinkles it liberally on the examples, since interactive inspection of a program is a great way to really see what it's doing.<br />
<br />
Inevitably with a book like this, large chunks of its 372 pages will be similar to dumps of the API documentation from the OpenLayers web site. Erik has fleshed out a lot of the terse API information to include fuller examples to make the book more than just that - I can imagine hardcopies of this book with a multitude of flapping yellow sticky notes for their owners to get rapid access to useful pages. Packt Publishing will sell you the PDF version for those who like to search, and the paper version for those who like to stick.<br />
<br />
By the end of the book you should have the knowledge to understand how the sample Flickr map application developed in the last chapter works, and you'll also have a handy reference to a large percentage of the features of OpenLayers. The one glaring omission is that there is no mention of the OpenLayers "Popup" class, typically used for "info-balloons" when clicking on map features. But if you can work through the book, then it should be no problem to work out how to use them from the API documentation. <br />
<br />
There seem to be very few books on OpenLayers on the market at the moment, so if you want to make maps and would like a high-quality step-by-step guide to it, then this is the book for you. <br />
<br />
Note that OpenLayers, like all web technologies, is evolving fast, so watch out for updates and get the latest version of the book. A recent OpenLayers hackfest concentrated on developing functionality for mobile devices, so expect to see more OpenLayers applications for your smartphone soon. With Google insisting on presenting advertising along with their map images, it might be a good time to switch to OpenStreetMap for your map background and OpenLayers for your mapping client.<br />
<br />
Barry<br />
<br />
<br />
Note: I was provided with a PDF copy of the book for review, and I have no connection with the author or publisher.Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1tag:blogger.com,1999:blog-2573156439353370360.post-41080270075003323572011-03-25T12:28:00.000+00:002011-03-25T12:28:07.930+00:00Tolerance and bridgesSo I'm modelling the H1N1 Swine Flu outbreak in Glasgow. One of the questions is how different regions (specifically 'datazones') within the study area interact with each other. You'd expect zones next to each other to interact with each other more than those on opposite sides of the city, for example. So our first idea is to use the distance between centroids as an explanatory variable and a distance-decay parameterisation for the effect. That works quite nicely.<br />
<br />
But the real world isn't quite like that. Glasgow has a river running through it, splitting our study region into north and south parts. I wanted a new distance metric to reflect this partition.<br />
<br />
So I came up with this idea: the distance between two datazones is the shortest path between their centroids through the graph formed by datazone adjacency, with each edge weighted by the straight-line distance between the centroids it joins.<br />
<br />
The first problem is computing the adjacencies between datazones. In R, I can just use poly2nb from the spdep package to get that. But I found that the graph it created wasn't connected over the study region. Further investigation showed that polygons which clearly looked adjacent weren't being treated as adjacent. Here's a screenshot showing three regions that look adjacent:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJFw4xK3BmDEgMTCD5HNP9_hXbIcnoahSXkGsHlJnyeFgMgNTpPiqJKMAu7VurVnTYa-UKAJRChP28mzRRpaWGpyTIOD1y_S7EYU4rkmLzBGKRxHwCdhgKzVdjKUOVJUy0uLna64MyztU/s1600/regions1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="266" width="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJFw4xK3BmDEgMTCD5HNP9_hXbIcnoahSXkGsHlJnyeFgMgNTpPiqJKMAu7VurVnTYa-UKAJRChP28mzRRpaWGpyTIOD1y_S7EYU4rkmLzBGKRxHwCdhgKzVdjKUOVJUy0uLna64MyztU/s320/regions1.png" /></a></div><br />
Now why wasn't poly2nb saying they were connected? I zoomed in for the answer:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXUQHDhAokAz9-0HsGucuBIENi4BF2G1AFnpGqWyn6v9bBKvBpgMX1OAPYQliJaKVTMc0Gwamokk7iEGMU0mNUpy4ww8DR1bG3W7ubpmJ7_jkhEikNlpukwTADkcQhS6TRYoLwvc4PJV0/s1600/regions2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="266" width="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXUQHDhAokAz9-0HsGucuBIENi4BF2G1AFnpGqWyn6v9bBKvBpgMX1OAPYQliJaKVTMc0Gwamokk7iEGMU0mNUpy4ww8DR1bG3W7ubpmJ7_jkhEikNlpukwTADkcQhS6TRYoLwvc4PJV0/s320/regions2.png" /></a></div><br />
Check the scale bar. These polygons are about a hundredth of a millimetre apart. That's a very tiny no-man's land. More like a "no-fly zone".<br />
<br />
I figured I needed some kind of buffer on the polygons, and started looking into the rgeos package, recently released onto CRAN. It'll do that, and all sorts of things, but then when I tried working out adjacencies using its gIntersects function I discovered it worked to those tolerances already. By some magic the apparently adjacent regions were adjacent again.<br />
<br />
However the graph was still not fully connected - it was composed of two subgraphs. The river is wide enough, and runs fully from east to west, so no fuzzy tolerance could realistically make the polygons on opposite banks overlap. And that would probably be the wrong thing to do anyway. <br />
<br />
So I loaded the regions into Quantum GIS and then loaded some Google Maps as underlays using the marvellous Google Maps plugin. I created a new vector layer of lines and started adding extra connections where bridges cross the Clyde:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDhSx1zmaRBRuJWoMUey6urTzJoKNRRP_ZB45KVRGC4RPDwhGaLZ-lPT5cxEgTfNFz0uefK1E9-0m88WM_4mzNzj_I1G42Dj3KhyMuX2e7G9yXxsJGxUq33SoSo9O_pc6JVXgwqglxV3s/s1600/bridges.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="266" width="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDhSx1zmaRBRuJWoMUey6urTzJoKNRRP_ZB45KVRGC4RPDwhGaLZ-lPT5cxEgTfNFz0uefK1E9-0m88WM_4mzNzj_I1G42Dj3KhyMuX2e7G9yXxsJGxUq33SoSo9O_pc6JVXgwqglxV3s/s320/bridges.png" /></a></div><br />
These lines, shown in red above, were saved as a shapefile, then loaded into R and their start and endpoints overlaid on the datazones. That gave me some extra edges to add to the adjacency graph, with all the graph work handled by R's igraph package (including the distance-weighted shortest-path algorithm along adjacent datazones).<br />
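The metric can be sketched with a toy graph: within-bank adjacency edges plus the hand-digitised bridge edges, and Dijkstra over centroid-to-centroid weights. The coordinates and adjacencies below are made up for illustration - the real work used spdep and igraph in R:<br />

```python
import heapq
from math import hypot

# Datazone centroids as nodes; edges between adjacent zones weighted
# by centroid-to-centroid distance, plus an extra edge at a bridge.
centroids = {"N1": (0, 10), "N2": (5, 10), "S1": (0, 0), "S2": (5, 0)}
adjacent = [("N1", "N2"), ("S1", "S2")]   # within-bank adjacency
bridges = [("N2", "S2")]                  # hand-digitised bridge edge

def edge_weight(a, b):
    (x1, y1), (x2, y2) = centroids[a], centroids[b]
    return hypot(x2 - x1, y2 - y1)

graph = {n: [] for n in centroids}
for a, b in adjacent + bridges:
    w = edge_weight(a, b)
    graph[a].append((b, w))
    graph[b].append((a, w))

def shortest_dist(src, dst):
    """Dijkstra over the weighted adjacency graph."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nbr, w in graph[node]:
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

# N1 to S1 goes along the bank, over the bridge, and back along the
# other bank (5 + 10 + 5), rather than straight across the river.
print(shortest_dist("N1", "S1"))  # 20.0
```

Running shortest_dist between every pair of zones gives the full distance matrix.<br />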
<br />
So now I have a completely-connected graph of datazones, which I can convert into a full distance matrix that reflects the geometry of the study region. The university's High-End Cluster is currently churning out the numbers for the model fit, and the results will hopefully be published in a journal sometime... <br />
<br />
Thanks to: R, Quantum GIS, Google Maps plugin authors Ivan Mincik and Nungyao Lin, R package developers for sp, spdep, rgeos, igraph.Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com0tag:blogger.com,1999:blog-2573156439353370360.post-21225349339103767812010-12-22T19:43:00.001+00:002010-12-22T20:45:16.637+00:00Mapping The Kendal Mint QuakeAbout 11pm last night my wall of photographs rattled. It's not unusual, it seems to happen whenever anyone is having a washing machine delivered and a truck crawls past my house. But a minute after that I was on the computer and someone local was tweeting about her house shaking. Earthquake? A few twitter searches revealed twitterers in Cumbria and North Lancashire had indeed experienced a small earthquake.<br />
<br />
So I did what all statisticians would do - go looking for data. Tweets can be geo-coded, but getting reliable info might be hard. I checked the USGS earthquake site, but it's not real-time. But then I found the <a href="http://www.earthquakes.bgs.ac.uk/">British Geological Survey</a> earthquakes page, with true real-time reports from a bunch of sensors around the country. Under the Monitoring menu there's the Real-Time Data option.<br />
<br />
Here you can choose the station and a day and get a seismogram. Here's what the one in Keswick (KESW) was showing:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnoelhSeEebusLcUDxiNUA176d7HgiK6b2r-MhSaflvH6bjJxI0lb67RoxjgdJJcMJ54afHbXVLjNrH-GbLEULPX48lBWDJZ5lbi0VM4LdBE4Poa5zucaQb7VFdaG7o_idqb2FSeVeIHI/s1600/screenshot_003.png" imageanchor="1" ><img border="0" height="292" width="314" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnoelhSeEebusLcUDxiNUA176d7HgiK6b2r-MhSaflvH6bjJxI0lb67RoxjgdJJcMJ54afHbXVLjNrH-GbLEULPX48lBWDJZ5lbi0VM4LdBE4Poa5zucaQb7VFdaG7o_idqb2FSeVeIHI/s320/screenshot_003.png" /></a></div><br />
BIG spike. It went off the scale of the plot. I looked at a few other plots and discovered most of them had recorded it too. And I noticed the quake had arrived at each station at slightly different times. Maybe I could use that information to work out its location.<br />
<br />
At this point the BGS site was getting slow as worried Cumbrians checked to see if it was an earthquake or if the local nuclear power station had exploded. Anyway, it was bedtime, so I left it to the morning, as something to do in my lunch hour. <br />
<br />
I couldn't find a way to get the raw data, so I decided I'd get what I could from the images. A quick script pulled all the GIF images from the stations, except for a couple that seemed to be off-line. Another script pulled all the station information book HTML files, and from there I got the station locations.<br />
<br />
I then stepped through each of the 24 images, identifying the point near 11pm where the tremors started, and entered the data. <br />
<br />
Now I had the location and time of the points where the quake had been sensed. This was all converted to OSGB coordinates using rgdal in R.<br />
<br />
I didn't have a zero-time for the quake or a location. A quick plot did show the quake had to be somewhere in Cumbria.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjw-sN10NruvtV_iMVQ5ijaBR-LNzawad9Fi8sjd-j8vfp6kFWaCOBl8JG7NoXkwJ18B-7XUAbZNrLllZ_82uN52MJSUKhgvVrm3pc7FA4GYrF7m_DYDCR060cwUJShJ98bigepRxB_FsE/s1600/screenshot_001.png" imageanchor="1" ><img border="0" height="320" width="207" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjw-sN10NruvtV_iMVQ5ijaBR-LNzawad9Fi8sjd-j8vfp6kFWaCOBl8JG7NoXkwJ18B-7XUAbZNrLllZ_82uN52MJSUKhgvVrm3pc7FA4GYrF7m_DYDCR060cwUJShJ98bigepRxB_FsE/s320/screenshot_001.png" /></a></div><br />
I don't know how professional seismologists work out earthquake epicentres, so I came up with something quick and dirty.<br />
<br />
Assume the velocity of the quake waves is constant across the UK. Then if the quake was at location (x,y), a plot of distance from (x,y) against arrival time should be a straight line, with all the points on it. Some quick experiments showed I could fit a straight line and use the sum of squared residuals as a measure to be minimised over (x,y). A few lines of code later and I had a coordinate!<br />
<br />
<pre>> fit$par
[1] 307499.9 506035.7
</pre><br />
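The objective behind that fit is easy to re-create. Here's a minimal Python sketch with synthetic station data and a coarse grid search standing in for R's optimiser - the real thing scraped the positions and times from the BGS seismograms:<br />

```python
from math import hypot

# Synthetic example: stations (km), a known epicentre, and arrival
# times that are exactly linear in distance (speed v, origin time t0).
stations = [(0, 0), (100, 0), (0, 100), (80, 90), (150, 40)]
true_epi, v, t0 = (60, 30), 7.5, 0.0
times = [t0 + hypot(sx - true_epi[0], sy - true_epi[1]) / v
         for sx, sy in stations]

def rss(x, y):
    """Residual sum of squares of a least-squares line of time vs
    distance-to-station, for a candidate epicentre (x, y)."""
    d = [hypot(sx - x, sy - y) for sx, sy in stations]
    n = len(d)
    dbar, tbar = sum(d) / n, sum(times) / n
    b = (sum((di - dbar) * (ti - tbar) for di, ti in zip(d, times))
         / sum((di - dbar) ** 2 for di in d))
    a = tbar - b * dbar
    return sum((ti - (a + b * di)) ** 2 for di, ti in zip(d, times))

# Coarse grid search standing in for a proper optimiser; the fitted
# slope at the best point is 1/v, so the wave speed falls out too.
best = min((rss(x, y), (x, y))
           for x in range(0, 151, 5) for y in range(0, 101, 5))
print(best[1])  # (60, 30)
```

With noise-free synthetic times the search recovers the true epicentre exactly; real seismogram readings would scatter around the fitted line.<br />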
By now the quake had hit the news and the USGS web site was showing it:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4qFsEtDDRzZiXZPoaVhzDzs16_zs-E2qGgkMZVtsamvvPo9GBtA7dA_Wz6qcK_wGxRjJLwVWa5IN0xtV9Sr2xPllhAAu7c87t3gnNR8_Z3ydlowKpCW5A-PnxYB946wEilGIoZqmAH8o/s1600/neic_c0000sdh_2.gif" imageanchor="1" ><img border="0" height="320" width="285" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4qFsEtDDRzZiXZPoaVhzDzs16_zs-E2qGgkMZVtsamvvPo9GBtA7dA_Wz6qcK_wGxRjJLwVWa5IN0xtV9Sr2xPllhAAu7c87t3gnNR8_Z3ydlowKpCW5A-PnxYB946wEilGIoZqmAH8o/s320/neic_c0000sdh_2.gif" /></a></div><br />
How did that compare to my point? I converted it to OSGB coordinates:<br />
<br />
<pre>> coordinates(usgsos)
coords.x1 coords.x2
[1,] 328994.6 500054.7
</pre><br />
20km further east, and 6km south. Not bad considering I had to eyeball the time from the seismograms and various other assumptions. I don't know what kind of uncertainty the USGS put on their predictions, and I can't put an uncertainty on my location either since there's not a proper statistical model behind this. I can give you the data if you have a better idea - or you can scrape it again from the web site! Here's a comparison of the epicentre predictions on a map:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihNgLuFJLlCuLiEERQVTZ58xauFs26vvnW6UlsCBZwBO7A4wnEp-30ziEH3vw9z3QLIjc9Wow5k3ljEJHgMgoNGiLi8eBwFSWXtRuLbXBQahl1Lqpc6qSDeQ4R7_BuoDAmb1FSQw9uYWM/s1600/screenshot_005.png" imageanchor="1" ><img border="0" height="288" width="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihNgLuFJLlCuLiEERQVTZ58xauFs26vvnW6UlsCBZwBO7A4wnEp-30ziEH3vw9z3QLIjc9Wow5k3ljEJHgMgoNGiLi8eBwFSWXtRuLbXBQahl1Lqpc6qSDeQ4R7_BuoDAmb1FSQw9uYWM/s320/screenshot_005.png" /></a></div><br />
Here's what the distance/time plot looks like for my predicted epicentre:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyC_Z16QfSRjU6RojZ8-yxsverIjhIzgYd0-t0JKEihObyDOdFFGQThsGs8b24eD_spVFtiUpvE4rj04QJTGD0dV0U0OGI6ej76k4lSuSnMHKr-ZDC-Bjdfw0dxczw8mZS9gQBha2e5BI/s1600/screenshot_002.png" imageanchor="1" ><img border="0" height="200" width="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyC_Z16QfSRjU6RojZ8-yxsverIjhIzgYd0-t0JKEihObyDOdFFGQThsGs8b24eD_spVFtiUpvE4rj04QJTGD0dV0U0OGI6ej76k4lSuSnMHKr-ZDC-Bjdfw0dxczw8mZS9gQBha2e5BI/s320/screenshot_002.png" /></a></div><br />
And from the slope of this we can get the speed of travel of the quake waves - about 7.5 km/s - which is about what a quick google search on seismic wave speeds tells me.<br />
<br />
Footnote: I think on the bike home tonight I worked out a way to do this statistically. Suppose the earthquake is at (x,y), at time t0, and the waves move with speed v. Then suppose the time of arrival of the quake at a point at distance d is Normally distributed with mean time t0+d/v and variance k(d;theta), where k() is some increasing function of d. This allows us to consider the increasing noise in the system for points further away. Fit the whole thing by maximum likelihood and we get estimates (with errors) for location, time of origin, and wave speed. Something for next lunchtime...Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1tag:blogger.com,1999:blog-2573156439353370360.post-43456134526627130492010-12-21T16:03:00.000+00:002010-12-21T16:03:40.202+00:00Speeding up some R CodePart of my job involves looking at code written by research students and trying to make it do other things. Often a student is focused on their application area or particular dataset, and so the code and data are inextricably linked. Add another observation to the dataset and you'll find yourself having to change 3467 to 3468 in a dozen places. And maybe find some 6934s and change them to 6936s. The other problem is that research students are mostly new to serious programming, and so still need to develop both intuition and formal skills.<br />
<br />
Today I've been working on such code for predicting epidemic outcomes. It was taking 14 seconds to run, which isn't too bad (especially considering some other work I'm doing takes 6 hours using OpenMP on an 8-core machine), but in my inspection I'd spotted some odd R code.<br />
<br />
<code>exp(var%*%t(coef[1,]))</code><br />
<br />
The oddity here is that var is one-dimensional and so is coef[1,]. This matrix multiply is really just something like exp(sum(var*coef[1,])). So I tried that, and the code took 6 times longer to run. Matrix multiplications are efficient in R, even in one dimension. <br />
<br />
So I switched back to the original code. Time to get profiling. This is easy in R.<br />
<br />
<pre>Rprof("run.out")
dostuff()
Rprof("")
summaryRprof("run.out")
</pre>
<br />
This told me most of the time was spent in the <code>t</code> function, within the function I was playing with earlier. Then I noticed that pretty much wherever <code>coef</code> was used, it was transposed with the <code>t</code> function.<br />
<br />
I refactored the code to do <code>t(coef)</code> once, and then pass the transposed matrix around. That sped the code up from the original 14 seconds to 3 seconds. Sweet.<br />
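The same refactoring translates to any array language. Here's a numpy sketch of the pattern - my own illustration, not the student's actual R code - where the transpose is hoisted out of the hot function and done once:<br />

```python
import numpy as np

# Shapes are arbitrary stand-ins for the R code's var and coef.
rng = np.random.default_rng(0)
var = rng.normal(size=(1, 8))
coef = rng.normal(size=(500, 8))

def predict_slow(var, coef, i):
    # transposes a slice of coef on every single call
    return np.exp(var @ coef[[i], :].T)

coef_t = coef.T  # hoisted: transpose once, up front

def predict_fast(var, coef_t, i):
    # same arithmetic, no per-call transpose
    return np.exp(var @ coef_t[:, [i]])

# the refactor must not change the answer
assert np.allclose(predict_slow(var, coef, 3),
                   predict_fast(var, coef_t, 3))
```

The win in the R code came from exactly this: t(coef) was being recomputed everywhere coef was used, inside the innermost loop.<br />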
<br />
Note that at each stage I check that the new output is the same as the old output - it can be useful to write some little test functions at this point, but doing formal test-driven development can be a bit tricky for statistical applications in R.<br />
<br />
There's still a bit of work to do to make this code a bit more general purpose. Currently it works on predictions for one particular country - we want to do predictions over half of Africa!Anonymoushttp://www.blogger.com/profile/03263677417814151768noreply@blogger.com1