Just for fun I thought I'd write a very (very) basic geocoder for OpenStreetMap.org data.
OSM data is very easy to work with, which I think is a good reflection of their pragmatic approach to mapping.
The approach I've used is pretty straightforward. Place names and the search address are normalized a bit: extra whitespace and non-alphanumeric characters are stripped, and everything is converted to lower case. If there's an exact match between the search address and a place name, that result is returned immediately. Otherwise...
Words from place names in the map are used to build the word frequency table for the spelling correction algorithm (devised by Peter Norvig). This is used to correct any misspellings in each word of the search address, and if the corrected address is a direct match for a place name, it's returned.
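To make the spelling correction step a bit more concrete, here's a rough sketch in Python, in the style of Norvig's well-known corrector. It isn't the code from the repository: the PLACE_NAMES list is a made-up stand-in for names pulled out of OSM data, and the function names are my own for illustration.

```python
import re
from collections import Counter

# Hypothetical sample; real names would be extracted from OSM data.
PLACE_NAMES = ["High Street", "Station Road", "Church Lane", "High Road"]

LETTERS = "abcdefghijklmnopqrstuvwxyz0123456789"


def words(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Word frequency table built from every word of every place name.
WORD_COUNTS = Counter(w for name in PLACE_NAMES for w in words(name))


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(candidates):
    """Keep only candidates that appear in some place name."""
    return {w for w in candidates if w in WORD_COUNTS}


def correct(word):
    """Pick the most frequent known word within two edits, else the word itself."""
    candidates = (
        known([word])
        or known(edits1(word))
        or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
        or {word}
    )
    return max(candidates, key=lambda w: WORD_COUNTS[w])


def correct_address(address):
    """Correct each word of the search address independently."""
    return " ".join(correct(w) for w in words(address))


print(correct_address("Hgih Stret"))
```

Because the frequency table only contains words that occur in place names, a misspelled word can only ever be corrected to something that exists on the map, which is what makes the follow-up direct match possible.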
Finally, if there is still no match, the address with the most words in common with the corrected search address is returned.
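The normalization, exact match and word overlap fallback could be sketched roughly like this. Again, this is a minimal illustration rather than the actual source: the place names are invented, the names normalize and best_match are hypothetical, only ASCII letters and digits are kept, and returning None when no words overlap at all is my own assumption. It also assumes the search address has already been spelling-corrected.

```python
import re

# Hypothetical sample; real names would be extracted from OSM data.
PLACE_NAMES = ["High Street", "Station Road", "Church Lane", "Old Station Yard"]


def normalize(text):
    """Lower-case, drop non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


# Map each normalized name back to the original place name.
INDEX = {normalize(name): name for name in PLACE_NAMES}


def best_match(address):
    """Return an exact match if there is one, else the name sharing the most words."""
    query = normalize(address)
    if query in INDEX:
        return INDEX[query]

    query_words = set(query.split())

    def overlap(key):
        # Number of distinct words shared with the search address.
        return len(query_words & set(key.split()))

    best = max(INDEX, key=overlap)
    # Assumption: with no words in common, report no match at all.
    return INDEX[best] if overlap(best) > 0 else None


print(best_match("high street!"))
print(best_match("old station"))
```

Ties go to whichever place name was indexed first, which is arbitrary. A less naive version would probably weight rarer words more heavily.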
It's a naive approach, but still fun to write and a nice way to play around with OSM data.
You can check out the source on GitHub.
