Big Data Vector Heatmaps with Mapbox GL

Making sense of big location data is hard, and it is a recurring bottleneck in data analytics and visualization solutions.  Not only is the data hard to render, it's also hard to turn into a useful visualization that helps answer a specific question.

Many platforms either limit how much location data can show up on screen, or shade the data into raster images before sending it to the client.  Both approaches limit the questions you can answer with your data.

So let's fix this.  How can we make large location data useful and easy to work with in any application?

Visualizing Routes & Traces

Let's visualize hundreds of thousands of traces as a worldwide heatmap using Mapbox GL JS.  The question we want to answer: where are the most traveled cycling routes in the world?

Fullscreen Demo

Step 1 - Gather data

First determine where all of your data is stored.  In this case, I have all of my data in PostGIS.  Your data can be in any data store.

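Transform your data into the WGS84 coordinate system (EPSG:4326) and store it as line-delimited GeoJSON.  In the PostGIS case, that means I can extract data using ogr2ogr with the command below.  Note the -t_srs option, which reprojects the output to the target coordinate system (-s_srs only declares the source coordinate system and does not transform anything):

ogr2ogr -f GeoJSON out.json "PG:host=localhost dbname=mydb user=myus password=mypw" -sql "select * from my_geom_table" -t_srs EPSG:4326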

Now I have an exceedingly large line-delimited out.json file that contains each of the points or linestring traces from my database.  This step is easy to parallelize using a distributed system or parallel queries.
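Depending on your GDAL version and driver, the export may arrive as a single FeatureCollection rather than one feature per line.  Here is a minimal sketch of the conversion in JavaScript.  The function name and sample data are made up for illustration, and it assumes the file fits in memory, which a truly large export will not, so treat it as a picture of the format rather than a production converter.

```javascript
// Convert a parsed GeoJSON FeatureCollection into line-delimited GeoJSON:
// one serialized Feature per line.
function toLineDelimited(featureCollection) {
  if (featureCollection.type !== 'FeatureCollection') {
    throw new Error('Expected a GeoJSON FeatureCollection');
  }
  return featureCollection.features
    .map((feature) => JSON.stringify(feature))
    .join('\n');
}

// Tiny hypothetical sample: one point and one linestring trace.
const sample = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      properties: {},
      geometry: { type: 'Point', coordinates: [-0.1276, 51.5072] },
    },
    {
      type: 'Feature',
      properties: {},
      geometry: {
        type: 'LineString',
        coordinates: [[-0.13, 51.5], [-0.12, 51.51]],
      },
    },
  ],
};

const lines = toLineDelimited(sample).split('\n');
console.log(lines.length); // 2
```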

Step 2 - Install mapbox/tile-count

Tile-Count is an open-source tool authored by Eric Fischer that creates spatially aggregated vector tile features from dense point, line, or polygon data.  We'll use tile-count to aggregate our dense route data into spatial bins that become coarser or finer depending on zoom level.  The output will be a Mapbox Vector Tileset which we can efficiently visualize in a front-end web or mobile application.

To install tile-count on macOS or Linux, follow the instructions in the tile-count readme.  Tile-count does not support Windows at this time.

Step 3 - Create Counts

We can create counts from our out.json file using tile-count.  Counts are a spatial representation of the number of features in a particular square geographic area, known as a tile, uniquely identified with a quadkey.  Read up more on tiled maps and quadkeys here.
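If tiles and quadkeys are new to you, here is a rough sketch of the standard Web Mercator tile math and the quadkey encoding popularized by Bing Maps.  The function names are my own, and I'm not claiming this is how tile-count stores its keys internally; it only illustrates how a longitude/latitude pair maps to a tile and how that tile gets a unique key.

```javascript
// Convert longitude/latitude (WGS84 degrees) to XYZ tile coordinates
// in the Web Mercator tiling scheme.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom; // number of tiles along each axis at this zoom
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so edge coordinates (e.g. lon = 180) stay inside the grid.
  const clamp = (v) => Math.min(n - 1, Math.max(0, v));
  return { x: clamp(x), y: clamp(y), z: zoom };
}

// Encode a tile as a quadkey: one base-4 digit per zoom level,
// interleaving the bits of x and y from most to least significant.
function tileToQuadkey({ x, y, z }) {
  let key = '';
  for (let i = z; i > 0; i--) {
    const mask = 1 << (i - 1);
    let digit = 0;
    if (x & mask) digit += 1;
    if (y & mask) digit += 2;
    key += digit;
  }
  return key;
}

console.log(tileToQuadkey({ x: 3, y: 5, z: 3 })); // "213"
```

A handy property of quadkeys: the key of a parent tile is always a prefix of the keys of its children, which is what makes rolling bins up to coarser zoom levels cheap.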

To create counts, use the following command:

tile-count-create -o out.count out.json

This command will output an out.count file which contains:

  1. A location identifier (known as a tile or quadkey)
  2. A `count` property for each location identifier that represents the total number of features in that location bin.
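Conceptually, a count is just a tally per bin.  The sketch below is my own illustration of that idea, not tile-count's actual file format or algorithm: it tallies features by quadkey, then rolls the tallies up to a coarser zoom by truncating each key to its first few digits.  The quadkeys in the example are made up.

```javascript
// Count how many features fall in each bin, keyed by quadkey.
function countByQuadkey(quadkeys) {
  const counts = new Map();
  for (const key of quadkeys) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Roll counts up to a coarser zoom level. The first `zoom` digits of a
// quadkey identify its ancestor tile at that zoom, so summing by prefix
// gives the count for each coarser bin.
function rollUp(counts, zoom) {
  const coarse = new Map();
  for (const [key, count] of counts) {
    const parent = key.slice(0, zoom);
    coarse.set(parent, (coarse.get(parent) || 0) + count);
  }
  return coarse;
}

// Four hypothetical features binned at zoom 4.
const fine = countByQuadkey(['0313', '0313', '0312', '1202']);
const coarse = rollUp(fine, 2);

console.log([...fine]);   // [ [ '0313', 2 ], [ '0312', 1 ], [ '1202', 1 ] ]
console.log([...coarse]); // [ [ '03', 3 ], [ '12', 1 ] ]
```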

You can use the -s option for tile-count-create to reduce the precision of your data.  I recommend a maximum precision of 5 or 6 for most applications.

Optionally you can perform the count analysis as multiple distributed jobs by chunking the out.json file into parts, and using tile-count-merge to join all of the count files together.  We won't use this in our job, but if your data is larger than ~1GB, run a distributed job and merge the results together.

Step 4 - Create Tiles

Now that we have counts, we want to pack the count data into map tiles we can use in our application.  Tile-Count supports both vector and raster tile output formats.  In this example, we're going to use vector tiles so we can design our visualization with the full capability of the Mapbox GL Style Spec.

Use the command below to create tiles from your count:

tile-count-tile -f -n "my_count_layer" -p num_cpus -P -z11 -o out.mbtiles out.count

Let's unpack the options above:

  1. -f overwrites any existing files named out.mbtiles
  2. -n specifies the layer name of the output vector tile.  Replace this with something short and unique describing your data.
  3. -p specifies the number of CPU threads to use.  Set this to the number of cores available on your machine.
  4. -z is interesting - it specifies the max zoom of our tileset.  This means that zooming past z11 in this example will not yield any new information; the map will just overzoom (scale up) the existing data from zoom 11.  Set this to a higher value to get more detailed spatial bins at higher zoom levels, at the cost of a larger tileset.
  5. -o is the name of the output tileset.  Call this something unique with the file extension .mbtiles
  6. -P outputs points instead of polygons.  I recommend points for most visual applications.  Analytical applications may prefer to omit this option and output polygons to explicitly bound the area of each bin.
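To get a feel for what a given max zoom means on the ground, here is a rough back-of-the-envelope sketch using standard Web Mercator tile math.  The function name is mine.  Keep in mind that tile-count subdivides each tile into smaller bins, so the actual bins are finer than the tile width computed here; the point is only that every extra zoom level halves the scale.

```javascript
// Equatorial circumference of the Earth in meters (WGS84 ellipsoid).
const EARTH_CIRCUMFERENCE_M = 40075016.686;

// Approximate east-west ground width of one Web Mercator tile, in meters,
// at a given zoom level and latitude (degrees).
function tileWidthMeters(zoom, latitude = 0) {
  const latRad = (latitude * Math.PI) / 180;
  return (EARTH_CIRCUMFERENCE_M * Math.cos(latRad)) / 2 ** zoom;
}

console.log(Math.round(tileWidthMeters(11))); // 19568 (meters, at the equator)
console.log(tileWidthMeters(12) / tileWidthMeters(11)); // 0.5
```

So at the equator a z11 tile spans roughly 19.6 km, and the tile is narrower on the ground at higher latitudes such as London's.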

Step 5 - Upload Tiles to Mapbox

You can upload the tiles to your Mapbox account by dragging/dropping out.mbtiles onto your Mapbox account tileset page, or using the Mapbox Uploads API.  I used the commands below to upload from the command line:

pip install mapboxcli
export MAPBOX_ACCESS_TOKEN=my-mapbox-uploads-scope-token
mapbox upload out.mbtiles my-mapbox-account-name.my-tileset-name

Alternatively, you can host and serve these tiles locally if you need to store data on-premise.  Check out an open-source vector tile server such as tegola.

Step 6 - Make a Map Style

Now the fun part - designing how the map will look!  I recommend using the Mapbox Studio Style Editor to get started.  You can directly export your style to use in any web or mobile application.

Add your new count tileset to a new base map.  I recommend the Mapbox Dark base map for data visualization.

Add your tileset to a new style as a dark base map.


Now you can use the style editor to make the map look great using data-driven styles.  I used the following data-driven style, borrowing colors from ColorBrewer to get the look just right.  ColorBrewer is also a great resource for non-designers who want to make great-looking maps!

Creating a data-driven style based on the density property in the tileset.


This part is where vector data really shines, because you can completely change the visual appearance on the fly based on the data properties in the tileset.  Here we are changing the `color` of a circle based on the value of density, which is the normalized count value that corresponds to the number of features from the tile-count process.
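If you'd rather define the same idea in code than in Studio, a data-driven circle layer might look roughly like the sketch below.  The tileset ID and layer name are the placeholders used earlier in this post; the layer ID, source ID, circle radius, and density breakpoints are my own assumptions, so inspect the density range in your tileset and adjust the stops.  The colors are the 5-class YlOrRd ramp from ColorBrewer.

```javascript
// Placeholders from the upload and tiling steps; replace with your own.
const TILESET_ID = 'my-mapbox-account-name.my-tileset-name';
const SOURCE_LAYER = 'my_count_layer';

// Vector source pointing at the uploaded tileset.
const countSource = {
  type: 'vector',
  url: `mapbox://${TILESET_ID}`,
};

// Circle layer whose color is interpolated from the `density` property.
// The breakpoints (0, 5, 10, 25, 50) are hypothetical.
const countLayer = {
  id: 'route-density',
  type: 'circle',
  source: 'counts',
  'source-layer': SOURCE_LAYER,
  paint: {
    'circle-radius': 2,
    'circle-color': [
      'interpolate', ['linear'], ['get', 'density'],
      0, '#ffffb2',
      5, '#fecc5c',
      10, '#fd8d3c',
      25, '#f03b20',
      50, '#bd0026',
    ],
  },
};

// In a page with Mapbox GL JS loaded and a `map` instance created:
// map.on('load', () => {
//   map.addSource('counts', countSource);
//   map.addLayer(countLayer);
// });
```

Because the color lives in the style rather than in the tiles, swapping the ramp or the breakpoints never requires regenerating the tileset.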

To get the satellite map to fade in at low zoom levels, I added a new layer using the mapbox.satellite tileset and specified the minzoom and maxzoom to be 0 and 5, respectively.  Then I added a zoom-driven raster-opacity setting to fade the satellite map out from zoom 2 to 5.
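Expressed as a style layer, that satellite fade could be sketched as follows.  The source ID and layer ID are hypothetical, and I'm assuming the opacity goes linearly from fully opaque at zoom 2 to fully transparent at zoom 5.  The small opacityAtZoom helper is plain JavaScript that mirrors what the expression evaluates to; it is not part of Mapbox GL JS.

```javascript
// Raster source for the Mapbox satellite tileset.
const satelliteSource = {
  type: 'raster',
  url: 'mapbox://mapbox.satellite',
  tileSize: 256,
};

// Raster layer that is only drawn between zoom 0 and 5 and fades out
// linearly between zoom 2 and zoom 5.
const satelliteLayer = {
  id: 'satellite-fade',
  type: 'raster',
  source: 'satellite',
  minzoom: 0,
  maxzoom: 5,
  paint: {
    'raster-opacity': ['interpolate', ['linear'], ['zoom'], 2, 1, 5, 0],
  },
};

// Plain-JS mirror of the expression above, for illustration only:
// clamp to the end stops, interpolate linearly in between.
function opacityAtZoom(zoom) {
  if (zoom <= 2) return 1;
  if (zoom >= 5) return 0;
  return 1 - (zoom - 2) / 3;
}

console.log(opacityAtZoom(1));   // 1
console.log(opacityAtZoom(3.5)); // 0.5
console.log(opacityAtZoom(5));   // 0
```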

Adding a zoom-driven opacity function to transition from a satellite map to a dark data viz map.


Publish your style when you're done, and it's ready to share in your web or mobile application.
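For a web page, loading the published style with Mapbox GL JS could look something like this minimal sketch.  The access token and style URL are hypothetical placeholders, and it assumes the page has loaded the Mapbox GL JS script and contains an element with the id "map".

```javascript
// Hypothetical placeholders: replace with your own token and style URL.
const ACCESS_TOKEN = 'my-mapbox-public-token';
const STYLE_URL = 'mapbox://styles/my-mapbox-account-name/my-style-id';

// Build the options object passed to the Mapbox GL JS Map constructor.
function buildMapOptions(styleUrl) {
  return {
    container: 'map',           // id of the element that will hold the map
    style: styleUrl,            // the style published from Mapbox Studio
    center: [-0.1276, 51.5072], // [longitude, latitude], central London
    zoom: 11,
  };
}

const mapOptions = buildMapOptions(STYLE_URL);

// Only runs in a browser page where Mapbox GL JS is available.
if (typeof mapboxgl !== 'undefined') {
  mapboxgl.accessToken = ACCESS_TOKEN;
  new mapboxgl.Map(mapOptions);
}
```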

Hey, What about aggregation by property value?

It's all coming soon!  Right now tile-count is a framework for density aggregations based on the number of features in a tile.  Essentially the function of tile-count is to group and aggregate data (count), and then create vector tiles.  

Imagine hooking up any old data analysis tool or database to perform this query.  Then you could aggregate, predict, or model however you like, and output the results as vector tiles.  That's a topic for a future blog post!

Boom

This is just the start - we'll dive deeper into more examples in future blog posts.  To answer our question from the start - ADV users ride the most in London.

In particular, this section of road along the Thames is quite the cycling thoroughfare.

What large location data visualizations have you built?  Drop your examples in the comments below.