Beyond the “logbook” – recording flight memories

One of the more memorable flights where I was not PIC happened the day after the launch of Space Shuttle Atlantis on STS-129. I went out to Titusville airport with Chris Floyd, who was also an attendee of the first NASA launch tweet up. He had flown his beautiful Mooney all the way from California and we were going to do some flying. We climbed into the airplane, N37MG with another non-pilot friend of mine who had come down to Florida to see the launch as well in the backseat, I was in the right seat.

After we reached 1500 AGL or so Chris handed over the plane and let me start to get the feel of it, which was amazing. Remember, I’ve got about 200 hours in Cherokee 140s, Archers and a couple hours in 172s at this point and here we are going 200 knots. I was behind the airplane like you would not believe – a fact I told him pretty quickly, but everything was OK – he could tell and had it under control.

It was a fun flight, but what happened after the flight sticks in my mind and it’s worth discussing after talking about lost logbooks. This idea isn’t really for the professional pilots out there on every flight, but if you’re in flying because you just absolutely love it, think about it.

He handed me a logbook – but it wasn’t lined paper, it was a full-size book that had travel-themed color blank pages. I would consider it a scrapbook, but anybody serious about their scrapbooking probably wouldn’t agree. He also carried a portable Polaroid printer and had taken a couple photos in flight. After landing he printed them out, prints out two copies and handed us one set and kept the others. So we put his set on the next empty page and wrote about the flight from our perspective.

This is a great way to record aspects of your flying that you either don’t notice or would never think to record. Your logbook might say “cut off in the pattern, the controllers vectored me all over where I didn’t want to be” and your passenger’s entry in your OTHER logbook may say “the sunrise out the window was the most beautiful thing I’ve ever seen – I’ll never think of transportation the same way.” You may never hear this type of stuff from your passengers but when you make them write it it’s recorded.

To be honest I’ve yet to implement this idea, as great as I think it is just because I haven’t found the right book, laziness, everything.

So my advice to myself, and you, is to start soon. Go back and write down your favorite stories so they’re recorded. Add photos. In some ways, it’s a hard copy blog. But there’s just something about handwritten notes and records that feels superior to “all binary” data.

Where is my user? Part 1, GeoIP

There are a a few techniques we can use to make our apps more “magical” for our end users – showing them things that we have reason to believe may be spatially relevant to them. The most reliable but least accurate of these techniques is the combination of a hidden GeoDjango battery with a free (as in beer) data set from MaxMind.

GeoDjango includes ctypes bindings against their GeoIP C API to give us an easier interface to the data. You’ll need to install the C API and download a copy of GeoLiteCity.dat.gz to your server and set up the correct paths as mentioned in the docs.

Once that’s done it’s very simple to interact with.

>>> from django.contrib.gis.utils.geoip import GeoIP
>>> GeoIP().city("amazon.com")
{'city': 'Seattle', 'region': 'WA', 'area_code': 206, 'longitude': -122.32839965820312, 'country_code3': 'USA', 'latitude': 47.60260009765625, 'postal_code': '98104', 'dma_code': 819, 'country_code': 'US', 'country_name': 'United States'}

For the ultimate in handiness I’ve turned it into a context processor. Context processors don’t always get the limelight they deserve, they’re insanely handy for things you want on EVERY view (well, every view you remember to use a RequestContext for) like MEDIA_URL. In our case, we’re going to add to every template’s context the location of our user.

And we’re going to do it with less than 10 lines of code (though a bit more for comments and spacing)

from django.contrib.gis.utils.geoip import GeoIP

def addgeoip(request):
    # check x-forwarded-for first in case there's a reverse proxy in front of the webserver
    if request.META.has_key('HTTP_X_FORWARDED_FOR'):
        ip = request.META['HTTP_X_FORWARDED_FOR']
    else:
        ip = request.META.get('REMOTE_ADDR', False)

    if ip:
        return {'geoip': GeoIP().city(ip)}
    else:
        return {}

If you have a moderate or greater traffic site, you probably run behind a reverse proxy like perlbal, nginx or Varnish – and your REMOTE_ADDR meta headers are incorrect as a result of this. You’ve seen it in your comments or other places that log IP address and probably heard of HTTP_X_FORWARDED_FOR. This HTTP header is set by these reverse proxies so we can see in our code who their client is. The first four lines here determine if we need to deal with this. NOTE: Django 1.3 includes a Reverse proxy middleware  that does what the first four lines of this context processor do if you enable it. Please note it’s security notice as it applies here – if you’re not behind a reverse proxy, listening to HTTP_X_FORWARDED_FOR will allow users to spoof their IPs.

Drop this code anywhere on your PYTHONPATH (in a utils.py or processors.py might be nice?) and set settings.TEMPLATE_CONTEXT_PROCESSORS to include it. But be careful you include the defaults too – you don’t want to wipe out the defaults so copy/paste to make sure you don’t miss any.

If you don’t want to add RequestContext to your view, or need it inside the view and not just the template, it’s a normal function – call it with your request object. If you’re not in a view, don’t use the processor and just call it normally (unless you want to go to the effort of mocking the request but that seems silly.)

So there we are – we know a decent amount of reasonably detailed info on our users’ location and can do pre-filtering of data.

If you want better accuracy, you can buy better data. But know that it’s never going to be perfect, the location they will have is the location the telecom provider says. The IP address info for my grandparents’ home is about 80 miles east, has been for years. My iPhone? Well, it says I’m in California but I’m actually in Kansas. So this method is a fall-back at best.

Radius limited searching with the ORM

A handy abstraction inside GeoDjango is the D object. It converts from more units than you knew existed into other units. (complete list) We’ll be using it quite a bit as we do various queries.

>>> from django.contrib.gis.measure import D
>>> D(mi=5).km
8.0467200000000005

Another type of search (and the one you’ve most likely seen all over the internet) is the radius search. If we want to search in a given radius – say the user has requested only results within 20 miles, we can do it a couple ways.

>>> Campground.objects.filter(point__distance_lte=(lawrence, D(mi=20)))
[<Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Clinton State Park>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Perry State Park>, <Campground: Slough Creek - Perry Lake>]

[This query took 5.529ms.]

But if you know the area you’ll notice this isn’t in any particular order, and doesn’t provide us with the distance figure. All we know is it’s less than 20 miles. So we go back to the method from yesterday, .distance() on the queryset to ask for that information, and order by it.

>>> Campground.objects.filter(point__distance_lte=(lawrence, D(mi=20))).distance(lawrence).order_by('distance')
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>]

[Adding distance took us to 6.3ms, and then ordering we stayed at 6.3ms]

But again at larger data sets this becomes slow. A better method is instead to create a geometry and go back to using the __within query.

Take our origin point and let’s transform it into a different coordinate system, 900913. This is the “Google Maps mercator” projection but the important thing is unlike WGS84 and its degree unit of measure that’s inconsistent, a degree one way does not equal a degree in the other, this mercator is in meters and completely consistent. Next we buffer (increase, and in the case of a point turn into a circle) the geometry into a polygon with a 20 mile radius.

>>> lawrence.transform(900913)
>>> lawrence_poly = lawrence.buffer(20 * 2172.344)

Now pass that polygon into our queryset, and we get our results back [0.125ms!] – again not in any order, so we apply distance and ordering to them.

>>> Campground.objects.filter(point__within=lawrence_poly)
[<Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Clinton State Park>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Old Town - Perry Lake>, <Campground: Perry State Park>, <Campground: Slough Creek - Perry Lake>]
>>> Campground.objects.filter(point__within=lawrence_poly).distance(lawrence).order_by('distance')
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>, <Campground: Old Town - Perry Lake>]

And yikes, is this method fast. 6ish ms to 0.125.

There’s a problem though. Fans of the metric system are gathering outside my door with pitchforks and torches – “a mile is 1609 kilometers! what’s with this 2172.344 bullcrap?”

I wish I had an answer for them. The truth of the matter is 900913 isn’t all that accurate at some things, but we need a coordinate system in meters. Doing it the RIGHT way with D(mi=20).m gives me back TWO fewer results than I know I should have, one at 15.9 and another at 17.9 mi. Those two are returned (plus another record, at 20.7 mi) by the 2172.344 number.

Google searching around the number doesn’t bring up much, but I have a gut feeling it came from a conversation a long time ago in the GeoDjango IRC room (#geodjango on irc.freenode.net if you’re interested) and has been in my notes ever since. It’s close enough for me, and many others.

EDIT: Marc has sent in a good comment with a good ESRI post on the problem (thanks Marc!). While I knew it had to do with optimization for things OTHER than measuring in the coordinate system it goes into much more depth. The bad part is since we’re constructing and not measuring here transforming into a different coordinate system after construction doesn’t help (tried it many times). The REAL answer is to use a better coordinate system but I try to trade off absolute accuracy for being more capable across the globe.

And that’s what neogeography is all about. Things that would make a surveyor have a fit but work for us, and our users. (as long as it’s not intentionally used to mislead. My degree’s in Journalism, so I take that element of telling the best truth with the numbers I can seriously)

Finding the nearest with the ORM

We have data, we know how to take what our user is providing us, it’s time to stitch the two together and find things nearby.

The easiest option is to simply let the database order them.

>>> from django.contrib.gis.geos import Point
>>> lawrence = Point((-95.234657999999996, 38.972679999999997))
>>> Campground.objects.all().distance(lawrence).order_by('distance')
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>, <Campground: Old Town - Perry Lake>, <Campground: Michigan Valley - Pomona Reservoir>, <Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Hillsdale State Park>, <Campground: Outlet Park - Pomona Reservoir>, <Campground: Pomona State Park>, <Campground: Carbolyn Park - Pomona Lake>, <Campground: Warnock Lake Park>, <Campground: Outlet Park - Melvern>, <Campground: Coeur DAlene - Melvern Lake>, <Campground: Longview Lake County Campground>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Louisburg Middle Creek State Fishing Lake>, <Campground: Atchison State Fishing Lake>, '...(remaining elements truncated)...']

This results in the query:

SELECT (ST_distance_sphere("uscampgrounds_campground"."point",ST_GeomFromEWKB(E'\\001\\001\\000\\000 \\346\\020\\000\\000\\251\\357\\374\\242\\004\\317W\\300\\224\\274:\\307\\200|C@'::bytea))) AS "distance", "uscampgrounds_campground"."id", "uscampgrounds_campground"."campground_code", "uscampgrounds_campground"."name", "uscampgrounds_campground"."campground_type", "uscampgrounds_campground"."phone", "uscampgrounds_campground"."comments", "uscampgrounds_campground"."sites", "uscampgrounds_campground"."elevation", "uscampgrounds_campground"."hookups", "uscampgrounds_campground"."amenities", "uscampgrounds_campground"."point" FROM "uscampgrounds_campground" ORDER BY "distance" ASC

which for this dataset EXPLAIN ANALYZE tells me runs in about 55ms with a sequence scan. (Don’t forget we can get the SQL to any query with print(queryset.query) for analysis to see if we can fine tune it.)

Slicing to get only the nearest 20 adds a LIMIT to the above query and takes less than 9 milliseconds.

>>> Campground.objects.all().distance(lawrence).order_by('distance')[:20]
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>, <Campground: Old Town - Perry Lake>, <Campground: Michigan Valley - Pomona Reservoir>, <Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Hillsdale State Park>, <Campground: Outlet Park - Pomona Reservoir>, <Campground: Pomona State Park>, <Campground: Carbolyn Park - Pomona Lake>, <Campground: Warnock Lake Park>, <Campground: Outlet Park - Melvern>, <Campground: Coeur DAlene - Melvern Lake>, <Campground: Longview Lake County Campground>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Louisburg Middle Creek State Fishing Lake>, <Campground: Atchison State Fishing Lake>]

Once you get to any scale of data this gets pretty slow, but right now it’s the only way to get “nearest _x_ regardless of distance”.

It will only get faster – right now it’s a sequence scan through all records which isn’t very efficient. PostGIS 2.0 will have nearest neighbor searching with indexes which will make it WAY faster.

Geocoding: Turning text into numbers

One of my favorite libraries for doing geo in Python is GeoPY.

It’s able to do one of the next tools we need in our toolbox, plus a few others.

What is this new tool? Geocoding.

Geocoding is the process of taking user-provided information and turning it into a geometry that we can use in our application. It’s how we turn “402 S. Main St. Joplin MO” into 37.08764, -94.513324999999995.

And it’s hard. TIGER/Line provides some of the basics but isn’t easy or forgiving. When PostGIS 2.0 ships it will actually have a built-in TIGER geocoder, which will be great for rough shots at determining where things are. But because TIGER/Line isn’t all that great, and the proliferation of GPS devices there is a small army of address verification vehicles on the roads determining where this data falls short and determining how to fix it then selling the data back to companies who need accurate navigation data.

Google, Yahoo, Bing and a few (much smaller) others provide this as a service. Read the TOS and determine if it is compatible with your usage. I am not a lawyer, nor am I recommending you do anything with this data. For this particular example we’re not (yet – but will) displaying the geocoded result on their maps so may be out of compliance with certain providers.

Geocoder.us is free for non-commercial use (or you can pay for commercial access), which suits us just fine – but has its shortcomings.

>>> from geopy import geocoders
>>> dotus = geocoders.GeocoderDotUS()
>>> place, (lat, lon) = dotus.geocode("609 New Hampshire Lawrence KS 66044")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable
>>> place, (lat, lon) = dotus.geocode("609 New Hampshire St. Lawrence KS 66044")
>>> place
u'609 New Hampshire St, Lawrence, KS 66044'
>>> lat
38.972679999999997
>>> lon
-95.234657999999996

Check out what happens if we (or the user) “forget” to include “St.”. Nothing found.

The great thing about GeoPY is it’s a full abstraction of the various APIs. If you pass a geocoder instance into whatever function geocodes, you can quickly swap to a different one if your needs change.

>>> def get_location(geocoder, text):
...     place, (lat, lon) = geocoder.geocode(text)
...     return (lat, lon)
...
>>> get_location(g, "609 New Hampshire St. Lawrence KS 66044")
(38.972327999999997, -95.235273000000007)
>>> get_location(dotus, "609 New Hampshire St. Lawrence KS 66044")
(38.972679999999997, -95.234657999999996)

The geocoder class always takes any required arguments (API keys etc) in the constructor so you can set it outside and keep your internal code generic.

Now we can interpret input from our users and figure out where they are – OR add missing geographic metadata to records we already have.

But this isn’t ALL GeoPY has available for us. It also has a very usable, no dependency method of calculating distances to locations (yes, PostGIS will do it but does require setup) regardless of your database backend.

Note that GeoPY takes its coordinates in the way we’re used to seeing them; latitude, longitude instead of GEOS/GDAL style longitude, latitude. Since both classes are named Point I foresee this biting you at some point (it has me).

>>> from geopy.point import Point
>>> jln = Point(37.1518136, -94.4982683)
>>> osh = Point(43.9843528, -88.5570417)

Now that we’ve defined points, let’s see how far between them.

>>> from geopy import distance
>>> distance.GreatCircleDistance(jln, osh).miles
565.5739219367847
>>> distance.VincentyDistance(jln, osh).miles
565.3214657121755

There’s a difference between these two – Great Circle assumes Earth is perfectly round. Vincenty came up with a formula that uses ellipsoids instead. Fed with the WGS-84 default ellipsoid it comes much closer to reality – but as you can see even over large distances the inaccuracy isn’t startling.

>>> starcity = Point(55.878333, 38.061667)
>>> distance.GreatCircleDistance(jln, starcity).miles
5432.6569119874948
>>> distance.VincentyDistance(jln, starcity).miles
5445.2435752403135

Basic poly/point spatial queries using the Django ORM

Now that we have a couple sources of information in our database, we’ll look at mashing up the data.

If I need to determine all of the campgrounds available during a trip to Kansas, I can do that with the __within lookup.

>>> from tigerline.models import State
>>> from uscampgrounds.models import *
>>> ks = State.objects.get(usps_code='KS')
>>> kscamp = Campground.objects.filter(point__within=ks.mpoly)
>>> kscamp
[<Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Atchison State Fishing Lake>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Canning Creek - Council Grove Lake>, <Campground: Carbolyn Park - Pomona Lake>, <Campground: Card Creek - Elk City Lake>, <Campground: Cedar Bluff State Park - North Shore>, <Campground: Cedar Bluff State Park - South Shore>, <Campground: Clinton State Park>, <Campground: Coeur DAlene - Melvern Lake>, <Campground: Curtis Creek - Milford Lake>, <Campground: Damsite - Fall River Lake>, <Campground: Elk City State Park>, <Campground: Ellis Lakeside City Campground>, <Campground: Fall River State Park>, <Campground: Farnum Creek - Milford Lake>, <Campground: French Creek Cove - Marion Lake>, <Campground: Geary State Fishing Lake>, <Campground: Gunn Park>, '...(remaining elements truncated)...']
>>> kscamp.count()
77

Or, I’ve got a campground and it’s state field in the database is blank – how do I find it? __contains to the rescue.

>>> milford = kscamp[11]
>>> State.objects.get(mpoly__contains=milford.point)
<State: Kansas>

It doesn’t matter as much with points, but if you’re doing __contains queries it’s worth knowing about __contains_properly. What’s the difference? There’s a city on the Texas/Arkansas border called “Texarkana” – so close that the two cities’ Police Departments share the same building – with a hallway where the “state line” was (at least this was the case when I was around 10.) Imagine we have a polygon of Texarkana and want to know which state it’s in. __contains will (first) return a MultipleObjectsReturned exception because it’s contained in two states. __contains_properly will return no results – no state contains ALL of that city.

If you’re planning a trip and absolutely don’t want to camp in Arizona (sorry Arizona, I’m not judging you but had to choose somebody!)

>>> Campground.objects.filter(point__disjoint=az.mpoly)
[<Campground: 110 Mile Park - Pomona Reservoir>, <Campground: A.W. Marion State Park>, <Campground: Ackley Creek County Park>, <Campground: Acorn Valley - Saylorville Lake>, <Campground: Ada Lake>, <Campground: Adair City Park>, <Campground: Addison Oaks County Park>, <Campground: Adrian City Park>, <Campground: Afton State Park>, <Campground: Airport Lake Park>, <Campground: Aitkin County Campground>, <Campground: Akeley City Campground>, <Campground: Akers>, <Campground: Alcona Park 1>, <Campground: Alexander Ramsey Park>, <Campground: Alexandria Lakes  State Rec Area>, <Campground: Algonac State Park>, <Campground: Alley Spring - Ozark National Scenic River>, <Campground: Aloha State Park>, <Campground: Alum Creek State Park>, '...(remaining elements truncated)...']
>>> Campground.objects.filter(point__disjoint=az.mpoly).count()
6065

Now we’ve got every campground not in Arizona.

We can stack these queries, too. How about all campgrounds except Arizona and New Mexico (again, no judgment on the state, after all they host the Albuquerque Balloon Fiesta which is a fantastic event)

>>> nm = State.objects.get(usps_code='NM')
>>> Campground.objects.filter(point__disjoint=az.mpoly).filter(point__disjoint=nm.mpoly).count()
5915

This syntax won’t work with a __within query though because stacking ANDs, so we need to use Q objects and OR like this.

>>> from django.db.models import Q
>>> in_ks = Q(point__within=ks.mpoly)
>>> in_mo = Q(point__within=mo.mpoly)
>>> Campground.objects.filter(in_ks | in_mo)
[<Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Akers>, <Campground: Alley Spring - Ozark National Scenic River>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Atchison State Fishing Lake>, <Campground: Aunts Creek>, <Campground: Babler Memorial State Park>, <Campground: Baxter>, <Campground: Beaver Creek>, <Campground: Bennett Spring State Park>, <Campground: Berry Bend - Harry S. Truman Lake>, <Campground: Big M>, <Campground: Big Spring - Ozark National Scenic River>, <Campground: Binder Park>, <Campground: Blue Springs County Campground>, <Campground: Branson City Campground>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Bucksaw - Harry S. Truman Lake>, <Campground: Campbell Point>, <Campground: Canning Creek - Council Grove Lake>, '...(remaining elements truncated)...']
>>> Campground.objects.filter(in_ks | in_mo).count()
168

There are certainly other types of spatial lookups, but these are the ones I use regularly for my neogeography projects.

If you’ve noticed we haven’t done anything with distance yet, good catch! We will eventually, but there are more foundations to lay first.

Getting the data in, part 2 – CSV and “manual” importing

Before we move on to querying and displaying, let’s look at how to get data into the database from “bare metal” – as nice as LayerMapping is not everything is in a shapefile, and using other formats has a few snares that are easy to avoid but you need to know them.

Our data source is going to be comma separated from www.uscampgrounds.info – a nice dataset of public places available for primitive camping.

We built the model for a campground back on November 2, and it’s available in full form from my library https://github.com/adamfast/geodjango-uscampgrounds

So let’s pick up in load.py https://github.com/adamfast/geodjango-uscampgrounds/blob/master/uscampgrounds/load.py I’m only going to post relevant code here, I’m assuming you know how to build a CSV importer.

try:
    camp = Campground.objects.get(campground_code=row[CAMPGROUND_CODE].lower(), campground_type=row[TYPE])
except Campground.DoesNotExist:
    camp = Campground(campground_code=row[CAMPGROUND_CODE].lower(), campground_type=row[TYPE])

These 4 lines may look odd to you, so let me explain what’s happening. I didn’t forget about get_or_create(), in this case I want to match ONLY on code and type. If I find one, I’m going to replace every other field on it with the latest data. Otherwise I create it.

camp.name = scrub_chars(row[CAMPGROUND_NAME])
camp.phone = scrub_chars(row[PHONE])
camp.comments = row[COMMENTS]
camp.sites = row[SITES]
camp.elevation = row[ELEVATION]
camp.hookups = row[HOOKUPS]
camp.point = Point((Decimal(row[LON]), Decimal(row[LAT])))
camp.save()

Now it’s time to set the only geospatial piece of the model – the point.

Snare #1: the order is not what you expect. I’ve been using Google Earth, GPSes and all this stuff for a long time and points are always referred to as latitude, longitude. Not here. Make sure you put longitude first or your maps are going to look very funny.

Snare #2: The extra tuple is needed, at least in this case when I’m passing in Decimals. If you pass in floats, the tuple isn’t necessary but I’m in the habit of always using Decimal.

Now we have polygon data and point data available, so we can start to combine the two.

Getting the data in – Shapefiles with LayerMapping

There’s a lot of fun geospatial data out there once you start looking, and the biggest format you’ll find (particularly when dealing with government sources) is the Shapefile.

Shapefiles are a proprietary but documented standard created by ESRI, the giant of geospatial software. Since they’re so common, they’re well supported in just about everything and Django is no exception here.

So let’s take another model, from geodjango-tigerline – the US state model and populate it with data from the US Census Bureau’s TIGER/LINE product. You can find TIGER/LINE at http://www.census.gov/geo/www/tiger/shp.html and the state file we need is named tl_2010_us_state10.zip at ftp://ftp2.census.gov/geo/tiger/TIGER2010/STATE/2010/.

To do the import we can use my reusable app https://github.com/adamfast/geodjango-tigerline.

In models.py:

class State(models.Model):
    fips_code = models.CharField(‘FIPS Code’, max_length=2)
    usps_code = models.CharField(‘USPS state abbreviation’, max_length=2)
    name = models.CharField(max_length=100)
    area_description_code = models.CharField(max_length=2)
    feature_class_code = models.CharField(max_length=5)
    functional_status = models.CharField(max_length=1)
    mpoly = models.MultiPolygonField()

 

    objects = models.GeoManager()

 

    def __unicode__(self):
        return self.name

 

In load.py:

def state_import(path=’/root/tiger-line/’):
    state_mapping = {
        ‘fips_code’: ‘STATEFP10’,
        ‘usps_code’: ‘STUSPS10’,
        ‘name’: ‘NAME10’,
        ‘area_description_code’: ‘LSAD10’,
        ‘feature_class_code’: ‘MTFCC10’,
        ‘functional_status’: ‘FUNCSTAT10’,
        ‘mpoly’: ‘POLYGON’,
    }
    state_shp = os.path.join(path, ‘tl_2010_us_state10.shp’)
    lm = LayerMapping(State, state_shp, state_mapping)
    lm.save(verbose=True)

 

The dictionary defines (on the left side) the name on the model, and the name inside the shapefile’s metadata (on the right side). If you don’t want to import a field, leave it out. If you need to do calculation or processing into a field on the model based on incoming data, set another field not coming from the shapefile, or something else advanced, you need to define a pre-save signal as there’s no way to do this inside a layer map. https://docs.djangoproject.com/en/dev/ref/signals/#pre-save If you run into invalid characters, save() on the layer mapping does take an encoding kwarg. Verbose=True as here will print a line for each object it creates (and any failures).

Now let’s look at creating a new one. Relevant Django docs (which are quite good so a lot of this is copy/paste/adapt): https://docs.djangoproject.com/en/1.3/ref/contrib/gis/layermapping/#example This is all inside of a shell.

from django.contrib.gis.gdal import DataSource

ds = DataSource(‘tl_2010_us_state10.shp’)

print ds[0].fields

I’ll get back this:

[‘REGION10’, ‘DIVISION10’, ‘STATEFP10’, ‘STATENS10’, ‘GEOID10’, ‘STUSPS10’, ‘NAME10’, ‘LSAD10’, ‘MTFCC10’, ‘FUNCSTAT10’, ‘ALAND10’, ‘AWATER10’, ‘INTPTLAT10’, ‘INTPTLON10’]

Based on the documentation the census bureau publishes, we can figure out what each element is and decide on a case-by-case basis what we need. I’m personally not very consistent in how I name fields, unfortunately – in some cases it’s by what the file calls it, in others by what’s actually there. Don’t be like me, decide one way and stick to it.

As the docs say, if you’re unsure the data type ds[0].geom_type will tell you what the shapefile provides. The attribute for your layer mapping dictionary will always be the OGC name: POLYGON, POINT, etc. (there are other geometries such as LINESTRING we haven’t discussed.)

But what’s OGC? It’s the Open Geospatial Consortium, a group advocating for open source in this field. They define the standards and support the community.

 

It all begins: Geographic models

Before we can store geographic information about or query for an object we have to know what we’re storing or querying by.

So what do we choose?

If we’re storing the location of an object, a models.PointField is your best bet. Could it be a three-dimensional models.MultiPolygonField? Sure. But I doubt your average user wants to go to that extent of effort. Thanks to geocoding (a topic for another day) we can translate their street address into a point with little effort.

If we’re storing political or geographic boundaries, models.MultiPolygonField is our best bet. Whether it’s a US State, county, time zone or country of the world it’s very likely the source data will be of this type.

If you really don’t know, and want to be lazy, there is models.GeometryCollectionField – the NoSQL of geographic fields. Sure you don’t have to tell anybody what you’re going to put there, but good luck finding what you need when it comes time to retrieve. The fields are fine, I hold no grudges against them – I just prefer to store my data more strongly typed.

Below are two snippets from different projects, both defining geographic models. One comes from my geodjango-uscampgrounds app and the other from my geodjango-tigerline app. Note we kill off the old-faithful “from django.db import models” in favor of “from django.contrib.gis.db import models” in order to have the geo fields available. We then override the default manager with a geospatially-aware one through “objects = models.GeoManager()” which is sort of optional, but you should know that if you ever attempt to do a geo query, or join from non-geospatial model to a geospatial one and try to do a geo query, it’s not going to work. You’re far better off just setting it on all models in the project whether they’re geographic or not now while you remember.

Note the PointField() definition has a kwarg you’ve likely not seen before – srid=4326. What is this? It’s a spatial reference system. I’m not going to explain it because I always have to look them up. And while that would shame any real geographer, it doesn’t shame me nor should it you. If you need to look one up, spatialreference.org is a great (and Django powered!) resource for you. If the data is coming from the TIGER/Line, a GPS receiver, or most anything else, it’s 4326 aka WGS 84 which is the GeoDjango default. I include it on my models just because.

from django.conf import settings
from django.contrib.gis.db import models

 

class Campground(models.Model):
    campground_code = models.CharField(max_length=64)
    name = models.CharField(max_length=128)
    campground_type = models.CharField(max_length=128)
    phone = models.CharField(max_length=128)
    comments = models.TextField()
    sites = models.CharField(max_length=128)
    elevation = models.CharField(max_length=128)
    hookups = models.CharField(max_length=128)
    amenities = models.TextField()
    point = models.PointField(srid=4326)

 

    objects = models.GeoManager()

 

    def __unicode__(self):
        return self.name

 

class Zipcode(models.Model):
    code = models.CharField(max_length=5)
    mpoly = models.MultiPolygonField()

 

    objects = models.GeoManager()

 

    def __unicode__(self):
        return self.code

 

Tomorrow we’ll start putting data in these models and continue down the road of getting geospatial information back from the database.

The Installation

Please join me this month as we dive head first into building things with Python (and Django) emphasizing location information.

I hope at the end of this you’re up to speed on building basic location-aware web apps and no longer scared that you suck too much at math to do this kind of stuff. It’s not scary or hard, I promise.

I’m going to be focusing on Python most of the time, and the Django framework some of the time. For a database I’ll be using PostGIS because it’s what I use in production, and I’ve never taken the time to install and configure spatiality. Don’t even ask about MySQL, its spatial support sucks.

Unlike Django itself, which is very easy on prereqs, GeoDjango needs a bunch of support before it works properly. This scares people off, but you shouldn’t be afraid – it’s very easy to get over this speed bump.

I personally use Homebrew on my Macs and a list of apt-get-able stuff on Linux. The Django Docs talk about installation quite extensively, in https://docs.djangoproject.com/en/1.3/ref/contrib/gis/install/#ref-gis-install and https://docs.djangoproject.com/en/1.3/ref/contrib/gis/tutorial/#introduction but I’m adding my “quickstarts” below.

Mac:

  • brew install git
  • brew install bazaar exiftool gdal geos imagemagick libtiff readline redis
  • brew install postgresql postgis
  • sudo easy_install pip
  • sudo pip install virtualenv virtualenvwrapper

Linux: (this is for Ubuntu 11.10, and this is more than you need bare minimum. It’s just what I install by default to get a server ready for me.)

  • apt-get install libgeos-3.2.2 proj postgis gdal-bin postgresql-9.1-postgis postgresql-9.1 libgdal1-dev subversion make python-httplib2 python-lxml python-setuptools python-yaml apache2 apache2.2-common apache2-mpm-worker apache2-threaded-dev apache2-utils libexpat1 ssl-cert libapache2-mod-rpaf python-dev python-imaging python-docutils python-markdown python-dateutil flex bzr gettext screen imagemagick mercurial python-simplejson g++ libperl-dev unzip sl locate git
  • easy_install pip
  • pip install virtualenv virtualenvwrapper

One hangup I ran into with this install tactic on 11.10 is that in Postgres 9.1 the Postgres setting “standard_conforming_strings” is now on by default. This causes an issue with the WKB (well-known binary, a format used for geospatial information) and must be turned off. You’ll find it near the bottom of /etc/postgresql/9.1/main/postgresql.conf

Everything I’ll be doing will be in clean virtualenvs. If you’re not familiar with them, they are definitely worth a read. Essentially, no more Python path issues. It sequesters all of your code for a given project off on its own. I recommend you use virtualenvwrapper as well, it makes working with them much easier. The only tricky part is making sure `source /usr/local/bin/virtualenvwrapper.sh` gets placed in your .bashrc or .bash_profile. You also need to restart your shell to pick up the changes.