Finding Data: Often harder than using it

One of the hardest parts of doing geo projects is getting the data you need in the first place. In the US at least, there are mountains of data at the federal level, some at the state level and who knows what at the local level. There isn’t a single place I can go to for anything outside of census-type stuff. Which school would a child living at this point attend for a given grade level? Maybe it’s published, maybe it’s not.

Most cities or counties will have a GIS department. Some of them will be great and helpful towards your goal as a developer. Others won’t. Our world of tools and technologies is leapfrogging traditional methods.

So here are some of my favorite sources of different kinds of data, worth looking at for your next project.

TIGER/Line: Gigabytes of shapefiles published by the US Census Bureau, covering things like roads, boundaries and water features. I tend to look here first if I need basic location data.

SimpleGeo Places DB: SimpleGeo put a dataset of 21 million places (12ish million in the US) into the public domain last summer, and links to the file here. I have played with the data some, and it’s pretty clean but often categorized inconsistently (but hey, it’s free for all to use). There are discussions online about various ways of getting it imported. Shapefile conversion didn’t work for me, and neither did wrapping it in a GeoJSON feature collection: at nearly 8GB of JSON, it’s just too big. The method that actually worked was taking it line by line, deserializing each line and then processing it. I split it into a TON of files containing 20,000 places each and ran several processes to get it imported. It’s a BIG database, and pretty slow in PostGIS, so be warned. I have no plans to put anything in production from it, so speed isn’t that big of an issue. As a side note, the pain of importing this dataset is probably SimpleGeo’s best sales tactic for their paid SaaS version.
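For the curious, here is a minimal sketch of the line-by-line split described above. It assumes the dump holds one JSON feature per line; the function names, the output file naming and the paths are made up for illustration, and the actual PostGIS import step is left out.

```python
import json
from itertools import islice
from pathlib import Path


def iter_places(path):
    """Yield one parsed feature per non-empty line of the dump."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                # Assumption: each line is a complete JSON object.
                yield json.loads(line)


def split_into_chunks(src, out_dir, chunk_size=20_000):
    """Write the features from src into numbered files of chunk_size lines each.

    Returns the list of files written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    places = iter_places(src)
    written = []
    while True:
        # Pull at most chunk_size features, so memory use stays bounded.
        chunk = list(islice(places, chunk_size))
        if not chunk:
            break
        target = out_dir / f"places_{len(written):05d}.json"  # hypothetical naming
        with open(target, "w", encoding="utf-8") as out:
            for place in chunk:
                out.write(json.dumps(place) + "\n")
        written.append(target)
    return written
```

Calling something like `split_into_chunks("places_dump.json", "chunks")` (both names hypothetical) leaves you with a directory of small files, and each one can be handed to a separate import process.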

Flickr Shapefiles: Also public domain, potentially useful if you need to bring photos into the mix.

Timezones: Full shapefile of timezones of the world. Useful for auto-detecting your users’ time by their location and auto-shifting times to them. Public domain.

Free IP/Location database: Similar to GeoIP but community built. Likely not as comprehensive; it had no idea where I was, for instance.

data.gov: It’s hard to find what you want and sometimes hard to figure out how to use the format it’s in, but data.gov is as close to a one-stop shop as you’re likely to get.

data.nasa.gov: Datasets published by NASA. This one’s still a work in progress, but has a better search than data.gov and includes most NASA data listed on data.gov.

Ordnance Survey: Open (see licenses) data for the UK.

FCC GIS: Information on communications systems you can import.

FCC APIs: APIs to interact with the FCC’s data through their systems, including data gathered for the National Broadband Map.

FreeGIS: Links to other datasets and software related to geo.

NAIP Imagery: Aerial photos from flyovers during agricultural growing seasons.

Land Cover: Classification of land by cover type: forested, open, urban, etc.

National Hydrography Dataset: Information on US surface water, such as streams, rivers and lakes.

Cartographic Boundaries: Outlines of various types of drawn boundaries.

 

Have you got a favorite or just plain useful dataset that I missed here? Leave a comment and let us know about it!
