Finding the nearest with the ORM

We have data, we know how to take what our user is providing us, it’s time to stitch the two together and find things nearby.

The easiest option is to simply let the database order them.

>>> from django.contrib.gis.geos import Point
>>> lawrence = Point((-95.234657999999996, 38.972679999999997))
>>> Campground.objects.all().distance(lawrence).order_by('distance')
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>, <Campground: Old Town - Perry Lake>, <Campground: Michigan Valley - Pomona Reservoir>, <Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Hillsdale State Park>, <Campground: Outlet Park - Pomona Reservoir>, <Campground: Pomona State Park>, <Campground: Carbolyn Park - Pomona Lake>, <Campground: Warnock Lake Park>, <Campground: Outlet Park - Melvern>, <Campground: Coeur DAlene - Melvern Lake>, <Campground: Longview Lake County Campground>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Louisburg Middle Creek State Fishing Lake>, <Campground: Atchison State Fishing Lake>, '...(remaining elements truncated)...']

This results in the query:

SELECT (ST_distance_sphere("uscampgrounds_campground"."point",ST_GeomFromEWKB(E'\\001\\001\\000\\000 \\346\\020\\000\\000\\251\\357\\374\\242\\004\\317W\\300\\224\\274:\\307\\200|C@'::bytea))) AS "distance", "uscampgrounds_campground"."id", "uscampgrounds_campground"."campground_code", "uscampgrounds_campground"."name", "uscampgrounds_campground"."campground_type", "uscampgrounds_campground"."phone", "uscampgrounds_campground"."comments", "uscampgrounds_campground"."sites", "uscampgrounds_campground"."elevation", "uscampgrounds_campground"."hookups", "uscampgrounds_campground"."amenities", "uscampgrounds_campground"."point" FROM "uscampgrounds_campground" ORDER BY "distance" ASC

which for this dataset EXPLAIN ANALYZE tells me runs in about 55ms with a sequence scan. (Don’t forget we can get the SQL to any query with print(queryset.query) for analysis to see if we can fine tune it.)

Slicing to get only the nearest 20 adds a LIMIT to the above query and takes less than 9 milliseconds.

>>> Campground.objects.all().distance(lawrence).order_by('distance')[:20]
[<Campground: Clinton State Park>, <Campground: Bloomington Pubilc Use Area - Clinton Lake>, <Campground: Rockhaven - Clinton Lake>, <Campground: Lone Star Lake Park>, <Campground: Slough Creek - Perry Lake>, <Campground: Perry State Park>, <Campground: Old Town - Perry Lake>, <Campground: Michigan Valley - Pomona Reservoir>, <Campground: 110 Mile Park - Pomona Reservoir>, <Campground: Hillsdale State Park>, <Campground: Outlet Park - Pomona Reservoir>, <Campground: Pomona State Park>, <Campground: Carbolyn Park - Pomona Lake>, <Campground: Warnock Lake Park>, <Campground: Outlet Park - Melvern>, <Campground: Coeur DAlene - Melvern Lake>, <Campground: Longview Lake County Campground>, <Campground: Arrow Rock - Melvern Reservoir>, <Campground: Louisburg Middle Creek State Fishing Lake>, <Campground: Atchison State Fishing Lake>]

Once you get to any scale of data this gets pretty slow, but right now it’s the only way to get “nearest _x_ regardless of distance”.

It will only get faster – right now it’s a sequence scan through all records which isn’t very efficient. PostGIS 2.0 will have nearest neighbor searching with indexes which will make it WAY faster.

Leave a Reply

Your email address will not be published. Required fields are marked *