Just for fun I thought I'd write a very (very) basic geocoder for OpenStreetMap.org data.

OSM data is very easy to work with, which I think is a good reflection of the project's pragmatic approach to mapping.

The approach I've used is pretty straightforward. Place names and the search address are both normalized a bit: non-alphanumeric characters and extra whitespace are stripped, and everything is converted to lower case. If the normalized search address exactly matches a place name, that result is returned immediately. Otherwise...
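A minimal sketch of that normalization and exact-match step might look like the following. It assumes Python and a plain list of place names; the function names `normalize` and `exact_match` are made up for illustration and are not taken from the actual source.

```python
def normalize(text):
    """Drop non-alphanumeric characters, lower-case, and collapse whitespace."""
    # Keep letters, digits and whitespace only (assumption: Unicode letters count as alphanumeric).
    kept = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    # split() with no arguments discards leading, trailing and repeated whitespace.
    return " ".join(kept.lower().split())


def exact_match(query, place_names):
    """Return the original place name whose normalized form equals the normalized query, or None."""
    index = {normalize(name): name for name in place_names}
    return index.get(normalize(query))


if __name__ == "__main__":
    places = ["O'Connell Street", "St. Stephen's Green"]
    print(exact_match("st stephens GREEN", places))
```

In a real geocoder the index would be built once when the map data is loaded, rather than on every lookup as it is here.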

Words from place names in the map are used to build the word frequency table for a spelling correction algorithm (devised by Peter Norvig). This is used to correct any misspellings in each word of the search address, and if the corrected address matches a place name directly, that place is returned.
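The spelling step can be sketched roughly as below. This is a simplified take on Norvig's corrector, not the code from the repository: it only considers candidates one edit away (Norvig's original also tries two edits), it assumes lower-case ASCII letters, and the names `build_frequencies`, `correct` and `correct_query` are hypothetical.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def build_frequencies(place_names):
    """Count how often each word appears across all (already normalized) place names."""
    return Counter(word for name in place_names for word in name.lower().split())


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, freq):
    """Return word if known, else the most frequent known word one edit away, else word unchanged."""
    if word in freq:
        return word
    candidates = [w for w in edits1(word) if w in freq]
    if not candidates:
        return word
    return max(candidates, key=freq.get)


def correct_query(query, freq):
    """Correct each word of the search address independently."""
    return " ".join(correct(word, freq) for word in query.lower().split())


if __name__ == "__main__":
    freq = build_frequencies(["grafton street", "dawson street", "grafton lane"])
    print(correct_query("graftno stret", freq))
```

Because the frequency table is built only from place names, a misspelled word can only ever be corrected to a word that actually appears on the map.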

Finally, if there is still no match, the place name with the most words in common with the corrected search address is returned.
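That last fallback could be sketched like this, again as an illustration under assumptions rather than the actual implementation: names are compared as sets of words, ties go to whichever place name comes first, and `best_overlap` is a hypothetical name.

```python
def best_overlap(query, place_names):
    """Return the place name sharing the most words with query, or None if nothing overlaps."""
    query_words = set(query.lower().split())
    best_name, best_score = None, 0
    for name in place_names:
        # Score is the number of distinct words the two strings have in common.
        score = len(query_words & set(name.lower().split()))
        if score > best_score:  # strict comparison, so the earliest best match wins ties
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    places = ["grafton street", "dawson street", "grafton lane dublin"]
    print(best_overlap("grafton street dublin", places))
```

Counting shared words ignores word order and gives every word the same weight, so a common word like "street" counts as much as a distinctive one, which is part of why the approach is naive.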

It's a naive approach, but still fun to write and a nice way to play around with OSM data.

You can check out the source on GitHub.
