Mapping better – Google Maps API v3

I tend to do my Google v3 work purely in my templates.

For this example I’ve got a view that will return the objects to be displayed in `object_list` in the context – same dataset as last time, only instead of building things inside the view we’ll do it in our Django template.

from uscampgrounds.models import Campground

def gmap3(request):
    campgrounds = Campground.objects.filter(point__distance_lte=(Point((-94, 37)), D(mi=100)))

    return render_to_response('gmap3.html', {
        'object_list': campgrounds,
    }, context_instance=RequestContext(request))

For our base template – defining a map area, loading the initial Google JavaScript: (I named it gmap3base.html because there are several base and normal templates in the project I’m building from)

<html>
    <head>
        <title>{% block page-title %}Simple Google Maps API v3{% endblock %}</title>
        <script type="text/javascript" src="http://maps.google.com/maps/api/js?sensor=false"></script>
        {% block head_override %}{% endblock head_override %}
    </head>
    <body{% block body_override %}{% endblock body_override %}>
        {% block content %}
        {% endblock content %}
    </body>
</html>

Next our map creating template (I named mine gmap3.html):

{% extends "gmap3base.html" %}

{% block page-title %}Requested Items{% endblock %}

{% block head_override %}
<script type="text/javascript">
  var bounds = new google.maps.LatLngBounds();

  function buildMarker(map, latitude, longitude, name, color) {
    var latlng = new google.maps.LatLng(latitude, longitude);
    var marker = new google.maps.Marker({
          position: latlng,
          map: map,
          title:name,
    });
    marker.setIcon('http://maps.google.com/mapfiles/ms/icons/' + color + '-dot.png');
    bounds.extend(latlng);
    return marker
  }

  function mapInitialize() {
    var myOptions = {
      zoom: 6,
      center: new google.maps.LatLng(0, 0),
      mapTypeId: google.maps.MapTypeId.SATELLITE
    };
    var map = new google.maps.Map(document.getElementById("map_canvas"),
        myOptions);

    {% for object in object_list %}marker{{ object.pk }} = buildMarker(map, {{ object.point.y }}, {{ object.point.x }}, "{{ object }}", 'red');
    {% endfor %}

    map.fitBounds(bounds);
  }

</script>
{% endblock head_override %}

{% block body_override %} onload="mapInitialize()"{% endblock body_override %}

{% block content %}
<div id="map_canvas" style='width:600px;height:400px;'></div>
{% endblock content %}

This doesn’t do anything fancy – just the simple behavior we got from using GeoDjango’s map generator – but since we generated the JavaScript ourselves, we know what things will be named and can add on our own JavaScript much easier – and this works without modification. No API key creating, switching based on what site we’re on, it just works.

Since we generated normal JavaScript we also have the benefit of being able to use tutorials and examples found all over the internet to fine tune things. Want an onClick event that opens an infoWindow? It’s quite easy. Want to do zooming differently, say to a range from the search point instead of making every point visible? You can do that too. [you will need another geometry passed into the context for the search location in order to do this]

On the Map

So far we’ve done all kinds of wearying and information – but never left a Python console. I suppose it’s time we fall into the web mapping world and look at our options.

On the left of the easy-to-hard scale is Google Maps, API v2. This is the deprecated version, but is the only frontend generation available to us out of the box. It’s not mentioned in the documentation, but is well represented in the class’ docstring. (django.contrib.gis.maps.google)

Follow the doctoring to get your template set up appropriately.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
{{ google.xhtml }}
<head>
 <title>Google Maps via GeoDjango</title>
 {{ google.style }}
 {{ google.scripts }}
</head>
{{ google.body }}
<div id="{{ google.dom_id }}" style="width:600px;height:400px;"></div>
</body>
</html>

We create markers from the points we have with this:

Then instantiate the map with a markers kwarg (you can include key=’api_key_here’ in the class instantiation or it will fall back to settings.GOOGLE_MAPS_API_KEY), pass it into the template and watch in amazement as a map materializes before your eyes!

from django.contrib.gis.geos import Point
from django.contrib.gis.maps.google import GoogleMap, GMarker, GPolygon
from django.contrib.gis.measure import D

from uscampgrounds.models import Campground

def gmap2(request):
	campgrounds = Campground.objects.filter(point__distance_lte=(Point((-94, 37)), D(mi=100)))

	markers = []
	for campground in campgrounds:
		markers.append(GMarker(campground.point, title=campground.name))

	the_map = GoogleMap(markers=markers)

	return render_to_response('googlemap.html', {
		'google': the_map,
	}, context_instance=RequestContext(request))

You can do this with polygons too, as this code illustrates.

from django.contrib.gis.geos import Point
from django.contrib.gis.maps.google import GoogleMap, GMarker, GPolygon
from django.contrib.gis.measure import D

from tigerline.models import State

def gmap2_poly(request):
	states = State.objects.filter(mpoly__distance_lte=(Point((-94, 37)), D(mi=100)))

	polygons = []
	for state in states:
		polygons.append(GPolygon(state.mpoly[0].simplify()))

	the_map = GoogleMap(polygons=polygons)

	return render_to_response('googlemap.html', {
		'google': the_map,
	}, context_instance=RequestContext(request))

The resulting HTML, even after running simplify() on the geometry, is 2.4MB so I won’t display it here – but here’s a link. Warning: displaying polygons gets big. I’m asking for trouble here, after all – every single point that makes up the boundary of four states is going to be a bunch.

django.contrib.gis.maps.google is great for quick, don’t need a lot of details or fancy overlays maps. But it leaves out an easy way to do click events on markers and labeling, and API keys are inconvenient.

IF you do need onClick events, infoWindows from markers or any of that stuff there are hacks – but it involves looping over the items in the template as well and generating JavaScript that infers what it knows this map generation code will name things. Best to bite the bullet and build API v3 code in a template on your own (coming soon!)

Faking it: Geospatial search without the prereqs or infrastructure

First, a public service announcement: tomorrow, Nov. 16th, is “GIS Day” which means universities all over will be holding day lectures, in most cases for free. Many of them will be over our heads since we’re but neogeographers, but inspiration can come from anywhere – and it’s a good opportunity to meet others interested in this stuff. I’ll be at the one held at the University of Kansas here in Lawrence – why not look it up and see if your nearby university is having a program? My apologies on the short notice for this though, hopefully you can still make something happen as a professional development day.

I’ve shown more than a few ways we can use the ORM with GeoDjango to do real geographic searches of various types. But what if your limitations aren’t so generous and geospatial libraries and databases are off limits?

What if there were a way we could FAKE it?

There’s the incredibly nasty way which I won’t even put in code. Seriously, don’t do this. For completeness I’ll put it in semi-pseudocode that at least conveys the basic idea.

Create a set, loop over all objects in the database and use our
multi-tool GeoPy's vincentydistance (correct capitalization) method
to calculate the distance to each of them. Store the distance and a
record identifier in a dictionary and append it as a new "row" to our
set. Sort the set, and WOW, we have a really slow, potentially
disastrous in memory usage, but pure Python and minimal prerequisite
implementation of geographic searching.

Still listening? Please don’t do this.

Somebody smarter at math than me had this problem and thought about a way to solve it. It’s called Geohash and essentially it’s goal is to have a way of defining geography with varying degrees of accuracy with the variance in the number of characters. So 9yum8 will match 9yum8yef3vds6 (Lawrence, KS) and we can use a CharField to store, and query with the __startswith operator.

It’s not perfect, though – items JUST across the line which do overlap, but are not centered inside the hash we search for won’t be retured. To get them we need to calculate the neighbors of our hash. (in the case of python-geohash a method called “expand” will return neighbors + our origin box)

There’s a good library called python-geohash (use that name to pip install it, “geohash” on pip is not the right thing) that will do both of these calculations for us – pure python with C for speed if that works on your system.

>>> import geohash
>>> geohash.encode(38.9716689, -95.2352501)
'9yum8yef3vds'
>>> geohash.expand('9yum8yef3vds')
['9yum8yef3vdk', '9yum8yef3vdu', '9yum8yef3vde', '9yum8yef3vd7', '9yum8yef3vdg', '9yum8yef3vdt', '9yum8yef3vdm', '9yum8yef3vdv', '9yum8yef3vds']
>>> from django.db.models import Q
q_object = Q()
>>> for hash in geohash.expand('9yum8yef3vds'):
...     q_object.add(Q(geohash__startswith=hash), Q.OR)
>>> print(q_object)
(OR: ('geohash__startswith', '9yum8yef3vdk'), ('geohash__startswith', '9yum8yef3vdu'), ('geohash__startswith', '9yum8yef3vde'), ('geohash__startswith', '9yum8yef3vd7'), ('geohash__startswith', '9yum8yef3vdg'), ('geohash__startswith', '9yum8yef3vdt'), ('geohash__startswith', '9yum8yef3vdm'), ('geohash__startswith', '9yum8yef3vdv'), ('geohash__startswith', '9yum8yef3vds'))

So what we’ve done is create a Q object you can use on a queryset to fake a geographic search around the general area. This isn’t a very wide area, though. Trial and error will help. Passing a “precision=” kwarg into .encode() to request less accuracy (and a wider search area) like this:

>>> geohash.encode(38.9716689, -95.2352501, precision=4)
'9yum'
>>> geohash.expand('9yum')
['9yuj', '9yut', '9yuk', '9yuh', '9yus', '9yuq', '9yun', '9yuw', '9yum']
>>> q_object = Q()
>>> for hash in geohash.expand('9yum'):
...     q_object.add(Q(geohash__startswith=hash), Q.OR)
...
>>> print(q_object)
(OR: ('geohash__startswith', '9yuj'), ('geohash__startswith', '9yut'), ('geohash__startswith', '9yuk'), ('geohash__startswith', '9yuh'), ('geohash__startswith', '9yus'), ('geohash__startswith', '9yuq'), ('geohash__startswith', '9yun'), ('geohash__startswith', '9yuw'), ('geohash__startswith', '9yum'))

According to Wikipedia’s explanation of the algorithm (see under “Worked Example”) accuracy ranges from +/- 2500km at 1 character to +/- 0.019km at 8 characters. A length of 5 is +/- 2.4km, close to what we were using for circle radius generation in previous work. But drop to 4 and you’re grabbing +/- 20km worth of records.

It’s far from perfect, but when there’s nothing available but plain text search, it’s better than nothing!

Where is my user? Part 2, Browser Geolocation

As we saw last week GeoIP can be pretty inaccurate for mobile users – the exact audience we may be trying hardest to serve with a geographically aware website. But the W3C saw, or was made to see, the writing on the wall and built a set of standard APIs into HTML5 for just this case and most modern browsers have picked it up.

The draft for the spec is http://dev.w3.org/geo/api/spec-source.html if you want to read it through or need further info.

The API is pretty marvelously simple. This implementation changes the URL to return latitude and longitude when they are available, which we can use in our Django view. Plus, the same code works on mobile devices (at least the iOS ones I carry) with no changes.

So let’s dive right in, and make the campgrounds dataset grab the nearest results to the user.

The view:

from django.contrib.gis.geos import Point
from django.shortcuts import render_to_response
from django.template import RequestContext

from uscampgrounds.models import *

def nearby_campgrounds(request):
    if request.GET.get('lat') and request.GET.get('lon'):
        origin = Point(float(request.GET.get('lon')), float(request.GET.get('lat')))
        camps = Campground.objects.all().distance(origin).order_by('distance')
    else:
        camps = Campground.objects.all().order_by('name')

    return render_to_response('uscampgrounds/nearby.html', {
        'object_list': camps
    }, context_instance=RequestContext(request))

uscampgrounds/nearby.html template:

<html>
<head>
  <script src='http://media.adamfast.com/QueryData.compressed.js'></script>
  <script language='JavaScript'>

    function checkLocation() {
      var getData = new QueryData();
      if ('lat' in getData) { }
      else {
        if (navigator.geolocation) {
          navigator.geolocation.getCurrentPosition(
            function (ppos) {
              window.location.href = window.location.href + '?lat=' + ppos.coords.latitude + '&lon=' + ppos.coords.longitude;
            },
            function (err) {
              switch(err.code) {

                case err.TIMEOUT:
                  alert('Attempts to retrieve location timed out.')
                  break;

                case err.POSITION_UNAVAILABLE:
                  alert("Your browser doesn't know where you are.")
                  break;

                case err.PERMISSION_DENIED:
                  alert('You have to give us permission!')
                  break;

                case err.UNKNOWN_ERROR:
                  alert('Unknown error returned.')
                  break;

                default:
                  alert(err + ' ' + err.code)

              }
            }
          );
        }
      }
    }
  </script>
</head>
<body{% block body_override %} onLoad="javascript:checkLocation();"{% endblock body_override %}>
{% block content %}

{% endblock content %}
</body>
</html>

It’s using QueryData.js, a small library to make getting data out of the current page’s querystring easier.

Other than that we check to see if the querystring has location info available – if not we request it from the browser and register a callback to bring the user back to the page with the right querystring args. The second function passed into getCurrentPosition() is the error-handling callback. In this case I’ve just set it to alert in the various cases for simplicity’s sake.

Where is my user? Part 1, GeoIP

There are a a few techniques we can use to make our apps more “magical” for our end users – showing them things that we have reason to believe may be spatially relevant to them. The most reliable but least accurate of these techniques is the combination of a hidden GeoDjango battery with a free (as in beer) data set from MaxMind.

GeoDjango includes ctypes bindings against their GeoIP C API to give us an easier interface to the data. You’ll need to install the C API and download a copy of GeoLiteCity.dat.gz to your server and set up the correct paths as mentioned in the docs.

Once that’s done it’s very simple to interact with.

>>> from django.contrib.gis.utils.geoip import GeoIP
>>> GeoIP().city("amazon.com")
{'city': 'Seattle', 'region': 'WA', 'area_code': 206, 'longitude': -122.32839965820312, 'country_code3': 'USA', 'latitude': 47.60260009765625, 'postal_code': '98104', 'dma_code': 819, 'country_code': 'US', 'country_name': 'United States'}

For the ultimate in handiness I’ve turned it into a context processor. Context processors don’t always get the limelight they deserve, they’re insanely handy for things you want on EVERY view (well, every view you remember to use a RequestContext for) like MEDIA_URL. In our case, we’re going to add to every template’s context the location of our user.

And we’re going to do it with less than 10 lines of code (though a bit more for comments and spacing)

from django.contrib.gis.utils.geoip import GeoIP

def addgeoip(request):
    # check x-forwarded-for first in case there's a reverse proxy in front of the webserver
    if request.META.has_key('HTTP_X_FORWARDED_FOR'):
        ip = request.META['HTTP_X_FORWARDED_FOR']
    else:
        ip = request.META.get('REMOTE_ADDR', False)

    if ip:
        return {'geoip': GeoIP().city(ip)}
    else:
        return {}

If you have a moderate or greater traffic site, you probably run behind a reverse proxy like perlbal, nginx or Varnish – and your REMOTE_ADDR meta headers are incorrect as a result of this. You’ve seen it in your comments or other places that log IP address and probably heard of HTTP_X_FORWARDED_FOR. This HTTP header is set by these reverse proxies so we can see in our code who their client is. The first four lines here determine if we need to deal with this. NOTE: Django 1.3 includes a Reverse proxy middleware  that does what the first four lines of this context processor do if you enable it. Please note it’s security notice as it applies here – if you’re not behind a reverse proxy, listening to HTTP_X_FORWARDED_FOR will allow users to spoof their IPs.

Drop this code anywhere on your PYTHONPATH (in a utils.py or processors.py might be nice?) and set settings.TEMPLATE_CONTEXT_PROCESSORS to include it. But be careful you include the defaults too – you don’t want to wipe out the defaults so copy/paste to make sure you don’t miss any.

If you don’t want to add RequestContext to your view, or need it inside the view and not just the template, it’s a normal function – call it with your request object. If you’re not in a view, don’t use the processor and just call it normally (unless you want to go to the effort of mocking the request but that seems silly.)

So there we are – we know a decent amount of reasonably detailed info on our users’ location and can do pre-filtering of data.

If you want better accuracy, you can buy better data. But know that it’s never going to be perfect, the location they will have is the location the telecom provider says. The IP address info for my grandparents’ home is about 80 miles east, has been for years. My iPhone? Well, it says I’m in California but I’m actually in Kansas. So this method is a fall-back at best.

Radius limited searching with the ORM

A handy abstraction inside GeoDjango is the D object. It converts from more units than you knew existed into other units. (complete list) We’ll be using it quite a bit as we do various queries.

>>> from django.contrib.gis.measure import D
>>> D(mi=5).km
8.0467200000000005

Another type of search (and the one you’ve most likely seen all over the internet) is the radius search. If we want to search in a given radius – say the user has requested only results within 20 miles, we can do it a couple ways.

>>> Campground.objects.filter(point__distance_lte=(lawrence, D(mi=20)))
[<Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Clinton State Park>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Perry State Park>, <Campground: Slough Creek - Perry Lake>]

[This query took 5.529ms.]

But if you know the area you’ll notice this isn’t in any particular order, and doesn’t provide us with the distance figure. All we know is it’s less than 20 miles. So we go back to the method from yesterday, .distance() on the queryset to ask for that information, and order by it.

>>> Campground.objects.filter(point__distance_lte=(lawrence, D(mi=20))).distance(lawrence).order_by('distance')
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>]

[Adding distance took us to 6.3ms, and then ordering we stayed at 6.3ms]

But again at larger data sets this becomes slow. A better method is instead to create a geometry and go back to using the __within query.

Take our origin point and let’s transform it into a different coordinate system, 900913. This is the “Google Maps mercator” projection but the important thing is unlike WGS84 and its degree unit of measure that’s inconsistent, a degree one way does not equal a degree in the other, this mercator is in meters and completely consistent. Next we buffer (increase, and in the case of a point turn into a circle) the geometry into a polygon with a 20 mile radius.

>>> lawrence.transform(900913)
>>> lawrence_poly = lawrence.buffer(20 * 2172.344)

Now pass that polygon into our queryset, and we get our results back [0.125ms!] – again not in any order, so we apply distance and ordering to them.

>>> Campground.objects.filter(point__within=lawrence_poly)
[<Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Clinton State Park>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Old Town - Perry Lake>, <Campground: Perry State Park>, <Campground: Slough Creek - Perry Lake>]
>>> Campground.objects.filter(point__within=lawrence_poly).distance(lawrence).order_by('distance')
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>, <Campground: Old Town - Perry Lake>]

And yikes, is this method fast. 6ish ms to 0.125.

There’s a problem though. Fans of the metric system are gathering outside my door with pitchforks and torches – “a mile is 1609 kilometers! what’s with this 2172.344 bullcrap?”

I wish I had an answer for them. The truth of the matter is 900913 isn’t all that accurate at some things, but we need a coordinate system in meters. Doing it the RIGHT way with D(mi=20).m gives me back TWO fewer results than I know I should have, one at 15.9 and another at 17.9 mi. Those two are returned (plus another record, at 20.7 mi) by the 2172.344 number.

Google searching around the number doesn’t bring up much, but I have a gut feeling it came from a conversation a long time ago in the GeoDjango IRC room (#geodjango on irc.freenode.net if you’re interested) and has been in my notes ever since. It’s close enough for me, and many others.

EDIT: Marc has sent in a good comment with a good ESRI post on the problem (thanks Marc!). While I knew it had to do with optimization for things OTHER than measuring in the coordinate system it goes into much more depth. The bad part is since we’re constructing and not measuring here transforming into a different coordinate system after construction doesn’t help (tried it many times). The REAL answer is to use a better coordinate system but I try to trade off absolute accuracy for being more capable across the globe.

And that’s what neogeography is all about. Things that would make a surveyor have a fit but work for us, and our users. (as long as it’s not intentionally used to mislead. My degree’s in Journalism, so I take that element of telling the best truth with the numbers I can seriously)

Finding the nearest with the ORM

We have data, we know how to take what our user is providing us, it’s time to stitch the two together and find things nearby.

The easiest option is to simply let the database order them.

>>> from django.contrib.gis.geos import Point
>>> lawrence = Point((-95.234657999999996, 38.972679999999997))
>>> Campground.objects.all().distance(lawrence).order_by('distance')
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>, <Campground: Old Town - Perry Lake>, <Campground: Michigan Valley - Pomona Reservoir>, <Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Hillsdale State Park>, <Campground: Outlet Park - Pomona Reservoir>, <Campground: Pomona State Park>, <Campground: Carbolyn Park - Pomona Lake>, <Campground: Warnock Lake Park>, <Campground: Outlet Park - Melvern>, <Campground: Coeur DAlene - Melvern Lake>, <Campground: Longview Lake County Campground>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Louisburg Middle Creek State Fishing Lake>, <Campground: Atchison State Fishing Lake>, '...(remaining elements truncated)...']

This results in the query:

SELECT (ST_distance_sphere("uscampgrounds_campground"."point",ST_GeomFromEWKB(E'\\001\\001\\000\\000 \\346\\020\\000\\000\\251\\357\\374\\242\\004\\317W\\300\\224\\274:\\307\\200|C@'::bytea))) AS "distance", "uscampgrounds_campground"."id", "uscampgrounds_campground"."campground_code", "uscampgrounds_campground"."name", "uscampgrounds_campground"."campground_type", "uscampgrounds_campground"."phone", "uscampgrounds_campground"."comments", "uscampgrounds_campground"."sites", "uscampgrounds_campground"."elevation", "uscampgrounds_campground"."hookups", "uscampgrounds_campground"."amenities", "uscampgrounds_campground"."point" FROM "uscampgrounds_campground" ORDER BY "distance" ASC

which for this dataset EXPLAIN ANALYZE tells me runs in about 55ms with a sequence scan. (Don’t forget we can get the SQL to any query with print(queryset.query) for analysis to see if we can fine tune it.)

Slicing to get only the nearest 20 adds a LIMIT to the above query and takes less than 9 milliseconds.

>>> Campground.objects.all().distance(lawrence).order_by('distance')[:20]
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>, <Campground: Old Town - Perry Lake>, <Campground: Michigan Valley - Pomona Reservoir>, <Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Hillsdale State Park>, <Campground: Outlet Park - Pomona Reservoir>, <Campground: Pomona State Park>, <Campground: Carbolyn Park - Pomona Lake>, <Campground: Warnock Lake Park>, <Campground: Outlet Park - Melvern>, <Campground: Coeur DAlene - Melvern Lake>, <Campground: Longview Lake County Campground>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Louisburg Middle Creek State Fishing Lake>, <Campground: Atchison State Fishing Lake>]

Once you get to any scale of data this gets pretty slow, but right now it’s the only way to get “nearest _x_ regardless of distance”.

It will only get faster – right now it’s a sequence scan through all records which isn’t very efficient. PostGIS 2.0 will have nearest neighbor searching with indexes which will make it WAY faster.

Geocoding: Turning text into numbers

One of my favorite libraries for doing geo in Python is GeoPY.

It’s able to do one of the next tools we need in our toolbox, plus a few others.

What is this new tool? Geocoding.

Geocoding is the process of taking user-provided information and turning it into a geometry that we can use in our application. It’s how we turn “402 S. Main St. Joplin MO” into 37.08764, -94.513324999999995.

And it’s hard. TIGER/Line provides some of the basics but isn’t easy or forgiving. When PostGIS 2.0 ships it will actually have a built-in TIGER geocoder, which will be great for rough shots at determining where things are. But because TIGER/Line isn’t all that great, and the proliferation of GPS devices there is a small army of address verification vehicles on the roads determining where this data falls short and determining how to fix it then selling the data back to companies who need accurate navigation data.

Google, Yahoo, Bing and a few (much smaller) others provide this as a service. Read the TOS and determine if it is compatible with your usage. I am not a lawyer, nor am I recommending you do anything with this data. For this particular example we’re not (yet – but will) displaying the geocoded result on their maps so may be out of compliance with certain providers.

Geocoder.us is free for non-commercial use (or you can pay for commercial access), which suits us just fine – but has its shortcomings.

>>> from geopy import geocoders
>>> dotus = geocoders.GeocoderDotUS()
>>> place, (lat, lon) = dotus.geocode("609 New Hampshire Lawrence KS 66044")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable
>>> place, (lat, lon) = dotus.geocode("609 New Hampshire St. Lawrence KS 66044")
>>> place
u'609 New Hampshire St, Lawrence, KS 66044'
>>> lat
38.972679999999997
>>> lon
-95.234657999999996

Check out what happens if we (or the user) “forget” to include “St.”. Nothing found.

The great thing about GeoPY is it’s a full abstraction of the various APIs. If you pass a geocoder instance into whatever function geocodes, you can quickly swap to a different one if your needs change.

>>> def get_location(geocoder, text):
...     place, (lat, lon) = geocoder.geocode(text)
...     return (lat, lon)
...
>>> get_location(g, "609 New Hampshire St. Lawrence KS 66044")
(38.972327999999997, -95.235273000000007)
>>> get_location(dotus, "609 New Hampshire St. Lawrence KS 66044")
(38.972679999999997, -95.234657999999996)

The geocoder class always takes any required arguments (API keys etc) in the constructor so you can set it outside and keep your internal code generic.

Now we can interpret input from our users and figure out where they are – OR add missing geographic metadata to records we already have.

But this isn’t ALL GeoPY has available for us. It also has a very usable, no dependency method of calculating distances to locations (yes, PostGIS will do it but does require setup) regardless of your database backend.

Note that GeoPY takes its coordinates in the way we’re used to seeing them; latitude, longitude instead of GEOS/GDAL style longitude, latitude. Since both classes are named Point I foresee this biting you at some point (it has me).

>>> from geopy.point import Point
>>> jln = Point(37.1518136, -94.4982683)
>>> osh = Point(43.9843528, -88.5570417)

Now that we’ve defined points, let’s see how far between them.

>>> from geopy import distance
>>> distance.GreatCircleDistance(jln, osh).miles
565.5739219367847
>>> distance.VincentyDistance(jln, osh).miles
565.3214657121755

There’s a difference between these two – Great Circle assumes Earth is perfectly round. Vincenty came up with a formula that uses ellipsoids instead. Fed with the WGS-84 default ellipsoid it comes much closer to reality – but as you can see even over large distances the inaccuracy isn’t startling.

>>> starcity = Point(55.878333, 38.061667)
>>> distance.GreatCircleDistance(jln, starcity).miles
5432.6569119874948
>>> distance.VincentyDistance(jln, starcity).miles
5445.2435752403135

Basic poly/point spatial queries using the Django ORM

Now that we have a couple sources of information in our database, we’ll look at mashing up the data.

If I need to determine all of the campgrounds available during a trip to Kansas, I can do that with the __within lookup.

>>> from tigerline.models import State
>>> from uscampgrounds.models import *
>>> ks = State.objects.get(usps_code='KS')
>>> kscamp = Campground.objects.filter(point__within=ks.mpoly)
>>> kscamp
[<Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Atchison State Fishing Lake>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Canning Creek - Council Grove Lake>, <Campground: Carbolyn Park - Pomona Lake>, <Campground: Card Creek - Elk City Lake>, <Campground: Cedar Bluff State Park - North Shore>, <Campground: Cedar Bluff State Park - South Shore>, <Campground: Clinton State Park>, <Campground: Coeur DAlene - Melvern Lake>, <Campground: Curtis Creek - Milford Lake>, <Campground: Damsite - Fall River Lake>, <Campground: Elk City State Park>, <Campground: Ellis Lakeside City Campground>, <Campground: Fall River State Park>, <Campground: Farnum Creek - Milford Lake>, <Campground: French Creek Cove - Marion Lake>, <Campground: Geary State Fishing Lake>, <Campground: Gunn Park>, '...(remaining elements truncated)...']
>>> kscamp.count()
77

Or, I’ve got a campground and it’s state field in the database is blank – how do I find it? __contains to the rescue.

>>> milford = kscamp[11]
>>> State.objects.get(mpoly__contains=milford.point)
<State: Kansas>

It doesn’t matter as much with points, but if you’re doing __contains queries it’s worth knowing about __contains_properly. What’s the difference? There’s a city on the Texas/Arkansas border called “Texarkana” – so close that the two cities’ Police Departments share the same building – with a hallway where the “state line” was (at least this was the case when I was around 10.) Imagine we have a polygon of Texarkana and want to know which state it’s in. __contains will (first) return a MultipleObjectsReturned exception because it’s contained in two states. __contains_properly will return no results – no state contains ALL of that city.

If you’re planning a trip and absolutely don’t want to camp in Arizona (sorry Arizona, I’m not judging you but had to choose somebody!)

>>> Campground.objects.filter(point__disjoint=az.mpoly)
[<Campground: 110 Mile Park - Pomona Reservoir>, <Campground: A.W. Marion State Park>, <Campground: Ackley Creek County Park>, <Campground: Acorn Valley - Saylorville Lake>, <Campground: Ada Lake>, <Campground: Adair City Park>, <Campground: Addison Oaks County Park>, <Campground: Adrian City Park>, <Campground: Afton State Park>, <Campground: Airport Lake Park>, <Campground: Aitkin County Campground>, <Campground: Akeley City Campground>, <Campground: Akers>, <Campground: Alcona Park 1>, <Campground: Alexander Ramsey Park>, <Campground: Alexandria Lakes  State Rec Area>, <Campground: Algonac State Park>, <Campground: Alley Spring - Ozark National Scenic River>, <Campground: Aloha State Park>, <Campground: Alum Creek State Park>, '...(remaining elements truncated)...']
>>> Campground.objects.filter(point__disjoint=az.mpoly).count()
6065

Now we’ve got every campground not in Arizona.

We can stack these queries, too. How about all campgrounds except Arizona and New Mexico (again, no judgment on the state, after all they host the Albuquerque Balloon Fiesta which is a fantastic event)

>>> nm = State.objects.get(usps_code='NM')
>>> Campground.objects.filter(point__disjoint=az.mpoly).filter(point__disjoint=nm.mpoly).count()
5915

This syntax won’t work with a __within query though because stacking ANDs, so we need to use Q objects and OR like this.

>>> from django.db.models import Q
>>> in_ks = Q(point__within=ks.mpoly)
>>> in_mo = Q(point__within=mo.mpoly)
>>> Campground.objects.filter(in_ks | in_mo)
[<Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Akers>, <Campground: Alley Spring - Ozark National Scenic River>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Atchison State Fishing Lake>, <Campground: Aunts Creek>, <Campground: Babler Memorial State Park>, <Campground: Baxter>, <Campground: Beaver Creek>, <Campground: Bennett Spring State Park>, <Campground: Berry Bend - Harry S. Truman Lake>, <Campground: Big M>, <Campground: Big Spring - Ozark National Scenic River>, <Campground: Binder Park>, <Campground: Blue Springs County Campground>, <Campground: Branson City Campground>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Bucksaw - Harry S. Truman Lake>, <Campground: Campbell Point>, <Campground: Canning Creek - Council Grove Lake>, '...(remaining elements truncated)...']
>>> Campground.objects.filter(in_ks | in_mo).count()
168

There are certainly other types of spatial lookups, but these are the ones I use regularly for my neogeography projects.

If you’ve noticed we haven’t done anything with distance yet, good catch! We will eventually, but there are more foundations to lay first.

Getting the data in, part 2 – CSV and “manual” importing

Before we move on to querying and displaying, let’s look at how to get data into the database from “bare metal” – as nice as LayerMapping is not everything is in a shapefile, and using other formats has a few snares that are easy to avoid but you need to know them.

Our data source is going to be comma separated from www.uscampgrounds.info – a nice dataset of public places available for primitive camping.

We built the model for a campground back on November 2, and it’s available in full form from my library https://github.com/adamfast/geodjango-uscampgrounds

So let’s pick up in load.py https://github.com/adamfast/geodjango-uscampgrounds/blob/master/uscampgrounds/load.py I’m only going to post relevant code here, I’m assuming you know how to build a CSV importer.

try:
    camp = Campground.objects.get(campground_code=row[CAMPGROUND_CODE].lower(), campground_type=row[TYPE])
except Campground.DoesNotExist:
    camp = Campground(campground_code=row[CAMPGROUND_CODE].lower(), campground_type=row[TYPE])

These 4 lines may look odd to you, so let me explain what’s happening. I didn’t forget about get_or_create(), in this case I want to match ONLY on code and type. If I find one, I’m going to replace every other field on it with the latest data. Otherwise I create it.

camp.name = scrub_chars(row[CAMPGROUND_NAME])
camp.phone = scrub_chars(row[PHONE])
camp.comments = row[COMMENTS]
camp.sites = row[SITES]
camp.elevation = row[ELEVATION]
camp.hookups = row[HOOKUPS]
camp.point = Point((Decimal(row[LON]), Decimal(row[LAT])))
camp.save()

Now it’s time to set the only geospatial piece of the model – the point.

Snare #1: the order is not what you expect. I’ve been using Google Earth, GPSes and all this stuff for a long time and points are always referred to as latitude, longitude. Not here. Make sure you put longitude first or your maps are going to look very funny.

Snare #2: The extra tuple is needed, at least in this case when I’m passing in Decimals. If you pass in floats, the tuple isn’t necessary but I’m in the habit of always using Decimal.

Now we have polygon data and point data available, so we can start to combine the two.