Geocoding: Turning text into numbers

One of my favorite libraries for doing geo in Python is GeoPY.

It’s able to do one of the next tools we need in our toolbox, plus a few others.

What is this new tool? Geocoding.

Geocoding is the process of taking user-provided information and turning it into a geometry that we can use in our application. It’s how we turn “402 S. Main St. Joplin MO” into 37.08764, -94.513324999999995.

And it’s hard. TIGER/Line provides some of the basics but isn’t easy or forgiving. When PostGIS 2.0 ships it will actually have a built-in TIGER geocoder, which will be great for rough shots at determining where things are. But because TIGER/Line isn’t all that great, and the proliferation of GPS devices there is a small army of address verification vehicles on the roads determining where this data falls short and determining how to fix it then selling the data back to companies who need accurate navigation data.

Google, Yahoo, Bing and a few (much smaller) others provide this as a service. Read the TOS and determine if it is compatible with your usage. I am not a lawyer, nor am I recommending you do anything with this data. For this particular example we’re not (yet – but will) displaying the geocoded result on their maps so may be out of compliance with certain providers.

Geocoder.us is free for non-commercial use (or you can pay for commercial access), which suits us just fine – but has its shortcomings.

>>> from geopy import geocoders
>>> dotus = geocoders.GeocoderDotUS()
>>> place, (lat, lon) = dotus.geocode("609 New Hampshire Lawrence KS 66044")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable
>>> place, (lat, lon) = dotus.geocode("609 New Hampshire St. Lawrence KS 66044")
>>> place
u'609 New Hampshire St, Lawrence, KS 66044'
>>> lat
38.972679999999997
>>> lon
-95.234657999999996

Check out what happens if we (or the user) “forget” to include “St.”. Nothing found.

The great thing about GeoPY is it’s a full abstraction of the various APIs. If you pass a geocoder instance into whatever function geocodes, you can quickly swap to a different one if your needs change.

>>> def get_location(geocoder, text):
...     place, (lat, lon) = geocoder.geocode(text)
...     return (lat, lon)
...
>>> get_location(g, "609 New Hampshire St. Lawrence KS 66044")
(38.972327999999997, -95.235273000000007)
>>> get_location(dotus, "609 New Hampshire St. Lawrence KS 66044")
(38.972679999999997, -95.234657999999996)

The geocoder class always takes any required arguments (API keys etc) in the constructor so you can set it outside and keep your internal code generic.

Now we can interpret input from our users and figure out where they are – OR add missing geographic metadata to records we already have.

But this isn’t ALL GeoPY has available for us. It also has a very usable, no dependency method of calculating distances to locations (yes, PostGIS will do it but does require setup) regardless of your database backend.

Note that GeoPY takes its coordinates in the way we’re used to seeing them; latitude, longitude instead of GEOS/GDAL style longitude, latitude. Since both classes are named Point I foresee this biting you at some point (it has me).

>>> from geopy.point import Point
>>> jln = Point(37.1518136, -94.4982683)
>>> osh = Point(43.9843528, -88.5570417)

Now that we’ve defined points, let’s see how far between them.

>>> from geopy import distance
>>> distance.GreatCircleDistance(jln, osh).miles
565.5739219367847
>>> distance.VincentyDistance(jln, osh).miles
565.3214657121755

There’s a difference between these two – Great Circle assumes Earth is perfectly round. Vincenty came up with a formula that uses ellipsoids instead. Fed with the WGS-84 default ellipsoid it comes much closer to reality – but as you can see even over large distances the inaccuracy isn’t startling.

>>> starcity = Point(55.878333, 38.061667)
>>> distance.GreatCircleDistance(jln, starcity).miles
5432.6569119874948
>>> distance.VincentyDistance(jln, starcity).miles
5445.2435752403135

Leave a Reply

Your email address will not be published.