Now that www.athletedataviz.com is up and running, I wanted to tell a story about our user base that would be visually engaging and genuinely revealing. The first idea that I followed through with was a customizable "ADV Global Heatmap".
How was the data aggregated and visualized? The one-two-three steps are below!
First, check out the blog post on the website. The main purpose is to "rank" each country by its respective cycling performance, using anonymous Strava data fetched by users for the ADV platform.
Here's the main visual for the blog post:
The steps to make the visual above:
STEP 1 - Extract the data
Since ADV stores all of its activity data in PostGIS, this is easy! What I want to return is an aggregated data set from all ADV users. One row per user. The data set should look like this:
Using PostGIS we can extract this information from one simple SQL query over a table with ~15 million rows.
SELECT ATH_ID,
       AVG(SPEED),
       AVG(ELEVATION),
       AVG(ALTITUDE),
       /* All remaining summary metrics here */
       AVG(LATITUDE),
       AVG(LONGITUDE)
FROM ACTIVITY_DETAILS
GROUP BY ATH_ID
OK! Now the query results are extracted and saved to an Excel file.
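To make the shape of that result concrete, here is a minimal sketch of the same one-row-per-athlete aggregation. It assumes an in-memory SQLite database as a stand-in for PostGIS, uses only a few of the columns from the query above, and runs on made-up sample rows. The CSV buffer is likewise a stand-in for the Excel export.

```python
import csv
import io
import sqlite3

# In-memory SQLite stands in for the PostGIS database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_details "
    "(ath_id INTEGER, speed REAL, elevation REAL, latitude REAL, longitude REAL)"
)

# Made-up sample rows: two activities for athlete 1, one for athlete 2.
conn.executemany(
    "INSERT INTO activity_details VALUES (?, ?, ?, ?, ?)",
    [
        (1, 20.0, 100.0, 40.0, -105.0),
        (1, 30.0, 300.0, 42.0, -103.0),
        (2, 25.0, 50.0, 51.5, -0.1),
    ],
)

query = """
    SELECT ath_id,
           AVG(speed)     AS avg_speed,
           AVG(elevation) AS avg_elevation,
           AVG(latitude)  AS avg_latitude,
           AVG(longitude) AS avg_longitude
    FROM activity_details
    GROUP BY ath_id
    ORDER BY ath_id
"""
cursor = conn.execute(query)
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Write the summary to a CSV buffer; a real script would write to a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Each athlete collapses to a single row of averages, which is the input the later steps work from.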
STEP 2 - Reverse Geocode
Reverse geocoding is the process of finding which address, city, state, and country a geographic point belongs to (usually the point is a lat/long value, or a point in a local coordinate system). The underlying logic of this process is just a "Point-In-Polygon" query: asking which "polygons" a lat/long value falls within.
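To illustrate the point-in-polygon idea, here is a minimal sketch using the classic ray-casting test. The two rectangular "regions" are hypothetical placeholders, not real borders, and the test treats coordinates as flat 2D points, which a production geocoder would not do.

```python
def point_in_polygon(lon, lat, polygon):
    """Return True if (lon, lat) is inside polygon, using ray casting.

    polygon is a list of (lon, lat) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point; odd means inside.
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular regions used only for illustration.
regions = {
    "Region A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Region B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}


def find_region(lon, lat):
    """Return the name of the first region containing the point, or None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```

Calling `find_region(5, 5)` walks the regions and returns the one whose polygon contains the point, which is the same question a reverse geocoder answers against real boundaries.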
To do this, we are going to use the Google Geocoding API and a Python library to interact with it called "pygeocode". After installing the library (pip install pygeocode), I built the following quick script, which uses the pandas "apply" function to run the Geocode.reverse function from the pygeocode library on the data set from Step 1.
OK! Now with a bit of munging we can split the reverse geocoded data into city, state, and country metadata for each athlete.
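The original script is not reproduced here, so the following is only a minimal sketch of the same row-by-row pattern. It uses plain Python dicts rather than a pandas DataFrame, and `fake_reverse_geocode` is a hypothetical stub standing in for the real geocoding call. The column names and sample values are also assumptions.

```python
def fake_reverse_geocode(lat, lon):
    """Hypothetical stand-in for a real reverse-geocoding call.

    A real implementation would query a geocoding service here.
    """
    lookup = {
        (41.0, -104.0): "City A, State A, Country A",
        (51.5, -0.1): "City B, State B, Country B",
    }
    return lookup.get((lat, lon), "Unknown")


def add_addresses(rows, geocode):
    """Return copies of rows with an 'address' field added to each."""
    result = []
    for row in rows:
        enriched = dict(row)  # copy so the input rows are left untouched
        enriched["address"] = geocode(row["avg_latitude"], row["avg_longitude"])
        result.append(enriched)
    return result


# Made-up per-athlete averages shaped like the Step 1 output.
athletes = [
    {"ath_id": 1, "avg_latitude": 41.0, "avg_longitude": -104.0},
    {"ath_id": 2, "avg_latitude": 51.5, "avg_longitude": -0.1},
    {"ath_id": 3, "avg_latitude": 0.0, "avg_longitude": 0.0},
]

geocoded = add_addresses(athletes, fake_reverse_geocode)
```

With a pandas DataFrame `df`, the equivalent would be along the lines of `df["address"] = df.apply(lambda row: geocode(row["avg_latitude"], row["avg_longitude"]), axis=1)`, where `geocode` is whichever reverse-geocoding function is in use.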
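As a rough idea of what that munging can look like, here is a minimal sketch that pulls city, state, and country out of one result shaped like a Google Geocoding API response, where each entry in `address_components` carries a `long_name` and a list of `types`. The `split_address` helper and the sample result are hand-written for illustration, and the sketch assumes the raw response is available from the library in use.

```python
def split_address(result):
    """Extract city, state, and country from one geocoding result dict."""
    # Map Google's component types to the field names we want.
    wanted = {
        "locality": "city",
        "administrative_area_level_1": "state",
        "country": "country",
    }
    parsed = {"city": None, "state": None, "country": None}
    for component in result.get("address_components", []):
        for type_name in component.get("types", []):
            field = wanted.get(type_name)
            # Keep the first match for each field.
            if field and parsed[field] is None:
                parsed[field] = component["long_name"]
    return parsed


# Hand-written sample shaped like a single geocoding result.
sample_result = {
    "address_components": [
        {"long_name": "Boulder", "short_name": "Boulder",
         "types": ["locality", "political"]},
        {"long_name": "Colorado", "short_name": "CO",
         "types": ["administrative_area_level_1", "political"]},
        {"long_name": "United States", "short_name": "US",
         "types": ["country", "political"]},
    ]
}

parsed = split_address(sample_result)
```

Missing components come back as `None`, so athletes whose averaged coordinates land somewhere without a city or state still get a country when one is present.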
STEP 3 - Create the Tableau Dashboard Visual
I chose Tableau for the visual because it's a great BI tool for the job, and because of its free Tableau Public offering for dashboard hosting and licensing. Since I anonymized the data set (removing all personally identifying information), I can comfortably use Tableau Public.
First I connected to the data set. Second, I added Country to the Marks shelf, and then selected the "Filled Map" viz style.
I wanted more customizable map backgrounds for this project, so I connected to Mapbox for my mapping layer. Tableau v9.2+ makes this really easy. There are only a few buttons to click:
I chose the Mapbox "Dark" background for strong color contrast on a filled map.
Next, I made a parameter that lets the user choose which metric to display. The goal is to change the "Selected Metric" value on the Marks, and the parameter allows for that:
Next, I created a new sheet to show the "Rank" of each country by the selected metric:
Awesome! Now all that's left is the Dashboard view and some interactivity. The main interactivity I want is a mouse-over highlight, which I set up using the Dashboard > Actions menu. I also want a link back to ADV, so I added a hyperlink area on the dashboard to direct users back to my site.
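Outside Tableau, the logic behind the metric parameter and the rank sheet can be sketched in a few lines of Python. This is not Tableau syntax, only an illustration of the idea: average the chosen metric per country, then sort and number the results. The `rank_countries` helper, the field names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_countries(rows, metric, descending=True):
    """Average `metric` per country and return (rank, country, value) tuples."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row[metric])
    averages = {country: mean(values) for country, values in by_country.items()}
    ordered = sorted(averages.items(), key=lambda item: item[1], reverse=descending)
    # Ties get consecutive ranks here, which may differ from Tableau's ranking.
    return [(rank, country, value)
            for rank, (country, value) in enumerate(ordered, start=1)]


# Made-up athlete rows with placeholder country names.
athlete_rows = [
    {"country": "Country A", "avg_speed": 20.0, "avg_elevation": 100.0},
    {"country": "Country A", "avg_speed": 30.0, "avg_elevation": 300.0},
    {"country": "Country B", "avg_speed": 28.0, "avg_elevation": 50.0},
]

speed_ranking = rank_countries(athlete_rows, "avg_speed")
elevation_ranking = rank_countries(athlete_rows, "avg_elevation")
```

Swapping the `metric` argument plays the role of the parameter: the same rows produce a different ranking depending on which metric is selected.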
That's it! Publish to Tableau and embed in your blog or website.
This is just the beginning of what we can do with athletics data in bulk. I'd like to broaden the ADV user base, and get statistically significant numbers of athletes from more countries, and develop metrics to truly compare performance across the world.
If you have any ideas on how to improve the analysis or the visual, please let me know in the comments.
PS - The USA needs to step it up. Not in the Top 5 of any category ranking!