Putting it all together, part 2

We’ve got our quakes app installed (update it if you’re following along at home; I committed a bunch of code changes for this entry) and our initial data imported.

However, we need this data to be updated somewhat regularly for the service to be useful. Sean shipped django-quakes with Celery support, which I commented out of the requirements file because I’m running this app on the free tier of the neat ep.io auto-scaling webapp service and don’t have Celery access. If you run Celery, just re-enable it. If you don’t use Celery, or don’t even know what it is (check it out!), you can use cron to schedule it just as well.

To avoid polling too often and bugging the USGS, I’d set the interval at an hour or longer under normal circumstances. We only care about quakes when our users do. I’d still run it at least daily, though, so you keep history.

Since we care when our users do and earthquakes can’t be convinced to ONLY hit at certain minutes after the hour, what do we do? How about we check the latest quake we have and see if it’s beyond a certain age to force a refresh?

This is more of that “please don’t do this” code – yes, it works, but it’s bad. If you’re like me and want the site to auto-retrieve on a faster schedule when users are using it, you might be tempted to do this:

latest_quake_ago = datetime.datetime.now() - quakes[0].datetime
if latest_quake_ago > datetime.timedelta(minutes=5):
    from django.core import management
    management.call_command('load_quakes')

This is bad. Why? Our webapp isn’t a single-file line. When there’s an earthquake, people start going nuts. The resulting traffic will be somewhat of a flood, so imagine a bunch of these views all running at once. USGS will not be happy with us, and we’re doing all kinds of extra processing that just gets thrown out. (NOTE: If you’re trying this in the default dev server, you won’t see the problem, because it’s single-threaded and two requests will never be running at once.)

So what’s our next thought? Lock files (virtually, in cache instead of filesystem)!

checking = cache.get('usgs-poll-in-progress', False)
if not checking:
    print('spawning check')
    cache.set('usgs-poll-in-progress', True)
    latest_quake_ago = datetime.datetime.now() - quakes[0].datetime
    if latest_quake_ago > datetime.timedelta(minutes=5):
        from django.core import management
        management.call_command('load_quakes')
    cache.delete('usgs-poll-in-progress')
    print('check done.')
else:
    print('in progress')
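
One caveat with the snippet above: the get followed by the set is two separate steps, so two requests can both see "no flag" and both start polling, and an exception inside load_quakes would leave the flag set for good. Here is a minimal sketch of one way around both, assuming a cache whose add() only stores a key when it is absent and tells you whether it did (Django's cache API has an add() with that contract, though how atomic it is depends on the backend). DictCache and poll_once are hypothetical names, made up so the example runs without Django.

```python
import threading


class DictCache:
    """Hypothetical in-memory stand-in for a cache offering add() and delete()."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value):
        # Store the value only if the key is absent; report whether we stored it.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def poll_once(cache, do_poll, key='usgs-poll-in-progress'):
    """Run do_poll() only if nobody else holds the flag; return True if we ran it."""
    if not cache.add(key, True):
        return False  # another request is already polling
    try:
        do_poll()
    finally:
        # Always release the flag, even if the poll raised.
        cache.delete(key)
    return True
```

In the view, the idea would be to pass Django's cache object and a small function that calls management.call_command('load_quakes'). Giving the flag a timeout as well would be a sensible safety net in case a process dies mid-poll.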

We’re still not quite there. After all, at the time of this writing there hasn’t been an earthquake in 45 minutes – so every access kicks off a poll. Let’s add caching to store the last time a poll was done.

last_check = cache.get('usgs-poll-last-finished', datetime.datetime(2000, 1, 1))
checking = cache.get('usgs-poll-in-progress', False)
if not checking:
    cache.set('usgs-poll-in-progress', True)
    latest_quake_ago = datetime.datetime.now() - quakes[0].datetime
    latest_check_ago = datetime.datetime.now() - last_check
    if latest_quake_ago > datetime.timedelta(minutes=5) and latest_check_ago > datetime.timedelta(minutes=5):
        print('spawning check')
        from django.core import management
        management.call_command('load_quakes')
        cache.set('usgs-poll-last-finished', datetime.datetime.now())
        print('check done.')
    cache.delete('usgs-poll-in-progress')
else:
    print('in progress')
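
The condition in that snippet can also be pulled out into a small function that takes the current time as an argument, which makes the throttle easy to check on its own. This is only a sketch: should_poll and MIN_INTERVAL are names invented for illustration, and the five-minute value is the one used above.

```python
import datetime

MIN_INTERVAL = datetime.timedelta(minutes=5)


def should_poll(now, latest_quake_time, last_check, min_interval=MIN_INTERVAL):
    """Return True when both the newest quake and the last poll are older than min_interval."""
    latest_quake_ago = now - latest_quake_time
    latest_check_ago = now - last_check
    return latest_quake_ago > min_interval and latest_check_ago > min_interval


# Example: the newest quake is 45 minutes old.
now = datetime.datetime(2011, 1, 1, 12, 0)
quake_time = now - datetime.timedelta(minutes=45)

# We polled 2 minutes ago, so we leave the USGS alone.
polled_recently = should_poll(now, quake_time, now - datetime.timedelta(minutes=2))

# We polled 10 minutes ago, so it is time to check again.
polled_long_ago = should_poll(now, quake_time, now - datetime.timedelta(minutes=10))
```

With those inputs, polled_recently is False and polled_long_ago is True, which is the behavior we want during a quiet stretch: one poll every five minutes at most, no matter how many people load the page.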

But what about people who hit the site while a poll is in progress? How about a time.sleep(0.5) loop?

Did you just throw your mouse? I dodged it. Yeah, that’s a bad idea. After all, it does tie up our Python processes at the moment when we’ve got hordes of eager, shaken (pun only slightly intended) users ringing the doorbell.

So what’s a programmer to do? Let’s set a flag in the template context and use it to show a “we just heard, we’re checking into it!” message in the template, building on what we already have.

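import datetime

from django.core.cache import cache
from django.shortcuts import render_to_response
from django.template import RequestContext

# Quake is the model that ships with django-quakes; import it from wherever
# the app lives in your project.

def earthquake_display(request):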
    weekago = datetime.datetime.now() - datetime.timedelta(days=7)
    quakes = Quake.objects.filter(datetime__gte=weekago).order_by('-datetime')

    last_check = cache.get('usgs-poll-last-finished', datetime.datetime(2000, 1, 1))
    checking = cache.get('usgs-poll-in-progress', False)
    if not checking:
        cache.set('usgs-poll-in-progress', True)
        latest_quake_ago = datetime.datetime.now() - quakes[0].datetime
        latest_check_ago = datetime.datetime.now() - last_check
        if latest_quake_ago > datetime.timedelta(minutes=5) and latest_check_ago > datetime.timedelta(minutes=5):
            from django.core import management
            management.call_command('load_quakes')
            cache.set('usgs-poll-last-finished', datetime.datetime.now())
        cache.delete('usgs-poll-in-progress')
        checking = False

    return render_to_response('earthquakes.html', {
        'object_list': quakes,
        'checking': checking,
    }, context_instance=RequestContext(request))

We’re getting there! We still need geolocation, geographic filtering, and a frontend.
