Getting the data in – Shapefiles with LayerMapping

There’s a lot of fun geospatial data out there once you start looking, and the most common format you’ll find (particularly when dealing with government sources) is the Shapefile.

Shapefiles are a proprietary but documented standard created by ESRI, the giant of geospatial software. Since they’re so common, they’re well supported in just about everything and Django is no exception here.

So let’s take another model from geodjango-tigerline, the US state model, and populate it with data from the US Census Bureau’s TIGER/Line product. You can find TIGER/Line at http://www.census.gov/geo/www/tiger/shp.html and the state file we need is named tl_2010_us_state10.zip at ftp://ftp2.census.gov/geo/tiger/TIGER2010/STATE/2010/.
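
The download is a zip archive, while the loader further down expects the unpacked .shp file (plus the files that travel with it) in a directory. Here is a minimal sketch of the unpacking step using only the standard library. The helper name extract_shapefile is my own invention for illustration and is not part of geodjango-tigerline.

```python
import os
import zipfile


def extract_shapefile(zip_path, dest_dir):
    """Unpack a zipped shapefile into dest_dir and return the path to the .shp.

    Hypothetical helper: a shapefile is several files side by side (.shp,
    .shx, .dbf, .prj, ...), so everything in the archive is extracted.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        shp_names = [name for name in archive.namelist()
                     if name.lower().endswith('.shp')]
    if not shp_names:
        raise ValueError('no .shp file found in %s' % zip_path)
    return os.path.join(dest_dir, shp_names[0])
```

Calling extract_shapefile('tl_2010_us_state10.zip', '/root/tiger-line/') would leave tl_2010_us_state10.shp where the import function below looks for it, assuming that is where you want the data to live.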

To do the import we can use my reusable app https://github.com/adamfast/geodjango-tigerline.

In models.py:

from django.contrib.gis.db import models


class State(models.Model):
    fips_code = models.CharField('FIPS Code', max_length=2)
    usps_code = models.CharField('USPS state abbreviation', max_length=2)
    name = models.CharField(max_length=100)
    area_description_code = models.CharField(max_length=2)
    feature_class_code = models.CharField(max_length=5)
    functional_status = models.CharField(max_length=1)
    mpoly = models.MultiPolygonField()

    # Geo-aware manager so spatial lookups work on this model
    objects = models.GeoManager()

    def __unicode__(self):
        return self.name

In load.py:

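import os

from django.contrib.gis.utils import LayerMapping

from .models import State  # the State model from models.py above


def state_import(path='/root/tiger-line/'):
    # Model field name on the left, shapefile field name on the right
    state_mapping = {
        'fips_code': 'STATEFP10',
        'usps_code': 'STUSPS10',
        'name': 'NAME10',
        'area_description_code': 'LSAD10',
        'feature_class_code': 'MTFCC10',
        'functional_status': 'FUNCSTAT10',
        'mpoly': 'POLYGON',  # geometry field: OGC type name, not a column
    }
    state_shp = os.path.join(path, 'tl_2010_us_state10.shp')
    lm = LayerMapping(State, state_shp, state_mapping)
    lm.save(verbose=True)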

The dictionary maps the field name on the model (left side) to the field name inside the shapefile’s metadata (right side). If you don’t want to import a field, leave it out. There’s no way to do calculations inside a layer map, so if you need to compute a model field from the incoming data, set a field that doesn’t come from the shapefile, or do anything else advanced, define a pre-save signal handler: https://docs.djangoproject.com/en/dev/ref/signals/#pre-save

If you run into invalid characters, LayerMapping takes an encoding kwarg (it goes on the LayerMapping constructor, not on save()). Passing verbose=True to save(), as here, prints a line for each object it creates (and any failures).

Now let’s look at creating a new layer mapping from scratch. Relevant Django docs (which are quite good so a lot of this is copy/paste/adapt): https://docs.djangoproject.com/en/1.3/ref/contrib/gis/layermapping/#example Everything below runs inside a Python shell.

from django.contrib.gis.gdal import DataSource

ds = DataSource('tl_2010_us_state10.shp')

# ds[0] is the first (and only) layer in the shapefile
print(ds[0].fields)

I’ll get back this:

['REGION10', 'DIVISION10', 'STATEFP10', 'STATENS10', 'GEOID10', 'STUSPS10', 'NAME10', 'LSAD10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10']

Based on the documentation the Census Bureau publishes, we can figure out what each element is and decide on a case-by-case basis what we need. I’m personally not very consistent in how I name fields, unfortunately – in some cases it’s by what the file calls it, in others by what’s actually there. Don’t be like me: decide one way and stick to it.

As the docs say, if you’re unsure of the geometry type, ds[0].geom_type will tell you what the shapefile provides. The value for the geometry field in your layer mapping dictionary will always be the OGC name: POLYGON, POINT, etc. (There are other geometries, such as LINESTRING, that we haven’t discussed.)
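
Before running an import, it can help to confirm that every right-hand value in the mapping is either a field the shapefile really has or an OGC geometry name. The sketch below does that in plain Python against the field list printed earlier. The function name unknown_source_fields and the OGC_GEOMETRY_NAMES set are made up for this example and are not part of Django.

```python
# OGC geometry type names that may appear as the geometry field's value.
OGC_GEOMETRY_NAMES = set([
    'POINT', 'LINESTRING', 'POLYGON',
    'MULTIPOINT', 'MULTILINESTRING', 'MULTIPOLYGON',
    'GEOMETRYCOLLECTION',
])


def unknown_source_fields(mapping, layer_fields):
    """Return mapping values that are neither layer fields nor OGC names."""
    available = set(layer_fields)
    return sorted(
        source for source in mapping.values()
        if source not in available
        and source.upper() not in OGC_GEOMETRY_NAMES
    )


# The field list reported by ds[0].fields for the state shapefile.
layer_fields = ['REGION10', 'DIVISION10', 'STATEFP10', 'STATENS10',
                'GEOID10', 'STUSPS10', 'NAME10', 'LSAD10', 'MTFCC10',
                'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10',
                'INTPTLON10']

state_mapping = {
    'fips_code': 'STATEFP10',
    'usps_code': 'STUSPS10',
    'name': 'NAME10',
    'area_description_code': 'LSAD10',
    'feature_class_code': 'MTFCC10',
    'functional_status': 'FUNCSTAT10',
    'mpoly': 'POLYGON',
}

print(unknown_source_fields(state_mapping, layer_fields))  # prints []
```

A typo such as 'NAME' instead of 'NAME10' would show up in the returned list, which is a cheaper way to find it than a failed import.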

But what’s OGC? It’s the Open Geospatial Consortium, an international group that develops open standards for geospatial data and services. The geometry names above come from those standards.

 

It all begins: Geographic models

Before we can store geographic information about or query for an object we have to know what we’re storing or querying by.

So what do we choose?

If we’re storing the location of an object, a models.PointField is your best bet. Could it be a three-dimensional models.MultiPolygonField? Sure. But I doubt your average user wants to go to that extent of effort. Thanks to geocoding (a topic for another day) we can translate their street address into a point with little effort.

If we’re storing political or geographic boundaries, models.MultiPolygonField is our best bet. Whether it’s a US State, county, time zone or country of the world it’s very likely the source data will be of this type.

If you really don’t know, and want to be lazy, there is models.GeometryCollectionField – the NoSQL of geographic fields. Sure you don’t have to tell anybody what you’re going to put there, but good luck finding what you need when it comes time to retrieve. The fields are fine, I hold no grudges against them – I just prefer to store my data more strongly typed.

Below are two snippets from different projects, both defining geographic models. One comes from my geodjango-uscampgrounds app and the other from my geodjango-tigerline app. Note we kill off the old-faithful “from django.db import models” in favor of “from django.contrib.gis.db import models” in order to have the geo fields available. We then override the default manager with a geospatially-aware one through “objects = models.GeoManager()”. This is sort of optional, but you should know that without it, a geo query on that model, or a geo query that joins from a non-geospatial model to a geospatial one, is not going to work. You’re far better off just setting it on all models in the project, whether they’re geographic or not, now while you remember.

Note the PointField() definition has a kwarg you’ve likely not seen before – srid=4326. What is this? It’s a spatial reference system identifier. I’m not going to explain it because I always have to look them up. And while that would shame any real geographer, it doesn’t shame me nor should it you. If you need to look one up, spatialreference.org is a great (and Django powered!) resource for you. If the data is coming from a GPS receiver, or most anything else, it’s 4326 aka WGS 84, which is the GeoDjango default. TIGER/Line shapefiles actually ship in NAD83 (SRID 4269), which is close enough to WGS 84 for most purposes, and LayerMapping transforms incoming geometries to the model field’s SRID by default. I include srid=4326 on my models just because.
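
To make the SRID a little more concrete: in 4326 a point is just a longitude and a latitude in degrees, and the SRID is a label saying how to read those two numbers. The sketch below builds an EWKT string (the PostGIS "extended well-known text" form, which GEOSGeometry also accepts) for such a point. The helper point_ewkt is hypothetical, written only for this illustration, and its range checks are only meaningful for degree-based systems like 4326.

```python
def point_ewkt(longitude, latitude, srid=4326):
    """Format a longitude/latitude pair as an EWKT point string.

    Hypothetical helper; the range checks assume coordinates in degrees,
    as in SRID 4326 (WGS 84).
    """
    if not -180 <= longitude <= 180:
        raise ValueError('longitude out of range: %r' % (longitude,))
    if not -90 <= latitude <= 90:
        raise ValueError('latitude out of range: %r' % (latitude,))
    # x is longitude and y is latitude, so longitude goes first.
    return 'SRID=%d;POINT (%s %s)' % (srid, longitude, latitude)


print(point_ewkt(-95.2353, 38.9717))  # prints SRID=4326;POINT (-95.2353 38.9717)
```

The most common mistake with these is swapping the order: it is longitude first, then latitude, which is the opposite of how most people say coordinates out loud.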

from django.conf import settings
from django.contrib.gis.db import models


class Campground(models.Model):
    campground_code = models.CharField(max_length=64)
    name = models.CharField(max_length=128)
    campground_type = models.CharField(max_length=128)
    phone = models.CharField(max_length=128)
    comments = models.TextField()
    sites = models.CharField(max_length=128)
    elevation = models.CharField(max_length=128)
    hookups = models.CharField(max_length=128)
    amenities = models.TextField()
    point = models.PointField(srid=4326)

    objects = models.GeoManager()

    def __unicode__(self):
        return self.name


class Zipcode(models.Model):
    code = models.CharField(max_length=5)
    mpoly = models.MultiPolygonField()

    objects = models.GeoManager()

    def __unicode__(self):
        return self.code

Tomorrow we’ll start putting data in these models and continue down the road of getting geospatial information back from the database.

The Installation

Please join me this month as we dive head first into building things with Python (and Django) emphasizing location information.

I hope at the end of this you’re up to speed on building basic location-aware web apps and no longer scared that you suck too much at math to do this kind of stuff. It’s not scary or hard, I promise.

I’m going to be focusing on Python most of the time, and the Django framework some of the time. For a database I’ll be using PostGIS because it’s what I use in production, and I’ve never taken the time to install and configure SpatiaLite. Don’t even ask about MySQL, its spatial support sucks.

Unlike Django itself, which is very easy on prereqs, GeoDjango needs a bunch of support before it works properly. This scares people off, but you shouldn’t be afraid – it’s very easy to get over this speed bump.

I personally use Homebrew on my Macs and a list of apt-get-able stuff on Linux. The Django Docs talk about installation quite extensively, in https://docs.djangoproject.com/en/1.3/ref/contrib/gis/install/#ref-gis-install and https://docs.djangoproject.com/en/1.3/ref/contrib/gis/tutorial/#introduction but I’m adding my “quickstarts” below.

Mac:

  • brew install git
  • brew install bazaar exiftool gdal geos imagemagick libtiff readline redis
  • brew install postgresql postgis
  • sudo easy_install pip
  • sudo pip install virtualenv virtualenvwrapper

Linux: (this is for Ubuntu 11.10, and it’s more than the bare minimum you need. It’s just what I install by default to get a server ready for me.)

  • apt-get install libgeos-3.2.2 proj postgis gdal-bin postgresql-9.1-postgis postgresql-9.1 libgdal1-dev subversion make python-httplib2 python-lxml python-setuptools python-yaml apache2 apache2.2-common apache2-mpm-worker apache2-threaded-dev apache2-utils libexpat1 ssl-cert libapache2-mod-rpaf python-dev python-imaging python-docutils python-markdown python-dateutil flex bzr gettext screen imagemagick mercurial python-simplejson g++ libperl-dev unzip sl locate git
  • easy_install pip
  • pip install virtualenv virtualenvwrapper

One hangup I ran into with this install tactic on 11.10 is that in Postgres 9.1 the Postgres setting “standard_conforming_strings” is now on by default. This causes an issue with WKB (well-known binary, a format used for geospatial information), so the setting must be turned off. You’ll find it near the bottom of /etc/postgresql/9.1/main/postgresql.conf
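
If you want to double-check what your config file currently says, here is a small sketch that reads a setting out of postgresql.conf-style text. It is a deliberately simplified parser (name, optional equals sign, value, # comments) and read_setting is a name I made up for this post. It returns None when the setting is commented out or absent, which on 9.1 means the default of on applies.

```python
import re

# name, then "=" or whitespace, then a quoted or bare value.
_SETTING_RE = re.compile(
    r"^\s*([A-Za-z_][\w.]*)(?:\s*=\s*|\s+)('(?:[^']|'')*'|[^\s#]+)"
)


def read_setting(conf_text, name):
    """Return the value of `name` in postgresql.conf-style text, or None.

    Simplified sketch: commented-out lines are ignored, and if a setting
    appears more than once the last occurrence wins.
    """
    value = None
    for line in conf_text.splitlines():
        match = _SETTING_RE.match(line)
        if match and match.group(1).lower() == name.lower():
            value = match.group(2).strip("'")
    return value


sample = """
#standard_conforming_strings = on
max_connections = 100            # (change requires restart)
standard_conforming_strings = off
"""

print(read_setting(sample, 'standard_conforming_strings'))  # prints off
```

To check the real file, read /etc/postgresql/9.1/main/postgresql.conf into a string and pass it in. Remember that Postgres needs to reload its configuration before a change takes effect.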

Everything I’ll be doing will be in clean virtualenvs. If you’re not familiar with them, they are definitely worth a read. Essentially, no more Python path issues. It sequesters all of your code for a given project off on its own. I recommend you use virtualenvwrapper as well, it makes working with them much easier. The only tricky part is making sure `source /usr/local/bin/virtualenvwrapper.sh` gets placed in your .bashrc or .bash_profile. You also need to restart your shell to pick up the changes.

Blog Intro / November Attempt

Welcome (a little early) to November, a month where many technical authors around the internet will be attempting to write one blog post per day.

I intend to join them – on something very interesting to me and useful for all as applications become more location-aware – geospatial. I’m only trying for weekdays, and I may not get them all.

Everything we’ll do here is neogeography – many elements of true GIS will be completely ignored. But if you want to take your Python (and Django) into the realm of “where” this should help.

Hopefully this is useful for everybody – but the emphasis is going to be on Python, and certain topics will include Django. I’ll attempt to keep assumptions of knowledge to a minimum to be helpful.

Also, to facilitate this, I’ve set up a WordPress instance. I have my own site, but it’s specialized for things that aren’t blogging…and I decided time was better spent on content than website renovation. (Yes, I’ve written a Django blog. But it doesn’t have everything I want for this endeavor) Others in the Open Source community are solving the blogging problems, so I’ll work on the location ones.

See you tomorrow, when we’ll get this show on the road!