Bulk convert GPX files from Strava to a CSV

Strava has an option to "Bulk Export GPX" data. A GPX file is an XML-based format for storing GPS data, including waypoints, tracks, and routes. We can use these files, instead of the Strava API, to generate maps and analyze our data.
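
To make that structure concrete, here is a minimal sketch that parses a tiny, hand-written GPX 1.1 document with Python 3's built-in xml.etree.ElementTree (not gpxpy, which the notebook below uses) and counts its waypoints, routes, tracks, track segments and track points. The sample document and the summarize_gpx helper are illustrative assumptions, not real Strava output.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace; every GPX element lives inside it.
NS = {'gpx': 'http://www.topografix.com/GPX/1/1'}

# A tiny hand-written GPX 1.1 document (illustrative, not real Strava output).
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="33.2011" lon="-117.2818"><name>Start</name></wpt>
  <trk>
    <name>Morning Ride</name>
    <trkseg>
      <trkpt lat="33.201111" lon="-117.281861">
        <ele>116.0</ele><time>2013-11-27T17:23:22Z</time>
      </trkpt>
      <trkpt lat="33.201166" lon="-117.281807">
        <ele>116.7</ele><time>2013-11-27T17:23:24Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>"""


def summarize_gpx(text):
    """Count the main GPX building blocks: wpt, rte, trk > trkseg > trkpt."""
    root = ET.fromstring(text)
    return {
        'waypoints': len(root.findall('gpx:wpt', NS)),
        'routes': len(root.findall('gpx:rte', NS)),
        'tracks': len(root.findall('gpx:trk', NS)),
        'segments': len(root.findall('gpx:trk/gpx:trkseg', NS)),
        'points': len(root.findall('gpx:trk/gpx:trkseg/gpx:trkpt', NS)),
    }


print(summarize_gpx(SAMPLE_GPX))
# {'waypoints': 1, 'routes': 0, 'tracks': 1, 'segments': 1, 'points': 2}
```

Each track point carries latitude and longitude as attributes and elevation and time as child elements, which is exactly the per-point data the conversion below turns into CSV rows.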

To get started, open Strava and request your GPX files: navigate to your Account Settings page and click the bulk export link in the bottom right corner of your dashboard.

Next, open your email and save the zipped file Strava sends you to a new folder on your computer. Extract its contents into a new folder named "data".
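
If you prefer to script the extraction step, a minimal Python 3 sketch using the standard-library zipfile module could pull only the .gpx files out of the archive. The function name extract_gpx and the archive name in the usage note are hypothetical, and the function keeps any folder structure inside the archive, so point INDIR at wherever the .gpx files actually end up.

```python
import os
import zipfile


def extract_gpx(archive, dest_dir='data'):
    """Extract only the .gpx members of a zip archive into dest_dir.

    archive may be a path or a binary file-like object.
    Returns the sorted list of extracted member names.
    """
    os.makedirs(dest_dir, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(archive) as zf:
        # Keep GPX files only; anything else in the export is ignored.
        members = [n for n in zf.namelist() if n.lower().endswith('.gpx')]
        zf.extractall(dest_dir, members=members)
    return sorted(members)
```

For example, extract_gpx('strava_export.zip', 'data') would unpack the GPX files, where strava_export.zip stands in for whatever the downloaded file is called.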

Using Python, we will now convert the GPX data to CSV with two libraries: pandas for data processing, and gpxpy for parsing the XML in each GPX file, which we collect as a list of Python dictionaries (one per track point).

In [10]:
# Convert a directory of GPX files to CSV of timestamp, lat, long, elevation
# Ryan Baumann
# 2015-07-18
# Requires pandas, gpxpy libraries
# Python 2.7.10
In [11]:
import gpxpy
import os
import pandas as pd

#Change these Global Vars to match your input and output data directories
INDIR = r'C:\Projects\GPX\data\parse'
OUTDIR = r'C:\Projects\GPX\data'
In [12]:
#Set the working directory to the INDIR variable
os.chdir(INDIR)
In [15]:
def parsegpx(f):
    # Parse a GPX file into a list of dictionaries.
    # Each dict is one row of the final dataset (one GPS track point).
    points2 = []
    with open(f, 'r') as gpxfile:
        gpx = gpxpy.parse(gpxfile)
        for track in gpx.tracks:
            for segment in track.segments:
                for point in segment.points:
                    # Use "row" rather than "dict" to avoid shadowing the built-in
                    row = {'Timestamp': point.time,
                           'Latitude': point.latitude,
                           'Longitude': point.longitude,
                           'Elevation': point.elevation}
                    points2.append(row)
    return points2
In [14]:
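#Parse the gpx files into a pandas dataframe
#Only read .gpx files, in case the folder contains anything else
files = [f for f in os.listdir(INDIR) if f.lower().endswith('.gpx')]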
df2 = pd.concat([pd.DataFrame(parsegpx(f)) for f in files], keys=files)
df2.head(5)
Out[14]:
                            Elevation   Latitude   Longitude            Timestamp
20131127-172322-Ride.gpx 0      116.0  33.201111 -117.281861  2013-11-27 17:23:22
                         1      116.7  33.201166 -117.281807  2013-11-27 17:23:24
                         2      117.3  33.201275 -117.281763  2013-11-27 17:23:27
                         3      117.8  33.201385 -117.281732  2013-11-27 17:23:30
                         4      117.8  33.201424 -117.281731  2013-11-27 17:23:31
In [50]:
#Write the data out to a CSV file
os.chdir(OUTDIR)
df2.to_csv('rides.csv')

Above is an overview of the parsing process in an IPython Notebook. All you have to do is change the INDIR and OUTDIR variables in the second cell to point to the folder containing your GPX files and the folder where the CSV file should be written.
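
Instead of editing global variables, the folder paths can also be passed as function arguments. The sketch below is a dependency-free Python 3 alternative, not the notebook's own method: it swaps gpxpy and pandas for the standard library's xml.etree.ElementTree and csv modules, assumes GPX 1.1 files, and writes a File column in place of the notebook's file-name index. The names gpx_rows and convert are hypothetical.

```python
import csv
import os
import xml.etree.ElementTree as ET

# Assumes GPX 1.1 files; GPX 1.0 would use 'http://www.topografix.com/GPX/1/0'.
NS = {'gpx': 'http://www.topografix.com/GPX/1/1'}
FIELDS = ['File', 'Timestamp', 'Latitude', 'Longitude', 'Elevation']


def gpx_rows(path):
    """Yield one dict per track point in the GPX file at path."""
    name = os.path.basename(path)
    root = ET.parse(path).getroot()
    for pt in root.iterfind('gpx:trk/gpx:trkseg/gpx:trkpt', NS):
        yield {
            'File': name,
            # time and ele are optional in GPX, so fall back to an empty string
            'Timestamp': pt.findtext('gpx:time', default='', namespaces=NS),
            'Latitude': float(pt.get('lat')),
            'Longitude': float(pt.get('lon')),
            'Elevation': pt.findtext('gpx:ele', default='', namespaces=NS),
        }


def convert(indir, outfile):
    """Write every track point from the .gpx files in indir to outfile.

    Returns the number of rows written.
    """
    count = 0
    with open(outfile, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for fname in sorted(os.listdir(indir)):
            if not fname.lower().endswith('.gpx'):
                continue  # skip anything that is not a GPX file
            for row in gpx_rows(os.path.join(indir, fname)):
                writer.writerow(row)
                count += 1
    return count
```

For example, convert(r'C:\Projects\GPX\data\parse', r'C:\Projects\GPX\data\rides.csv') would write one row per track point and return the row count. Timestamps stay as the raw ISO 8601 text from the file rather than being parsed into datetimes, which keeps the sketch free of third-party dependencies.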

Here is the same code as a .py script that you can run if you have the required Python libraries installed.

Now we can import the data into any program we want for analysis or mapping.
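
As one small example of analysis, the sketch below reads rides.csv back with Python 3's csv module and sums great-circle (haversine) distances between consecutive points of each ride. It assumes, as in the pandas output above, that the first column holds the GPX file name and that the header names the Latitude and Longitude columns. It ignores elevation and GPS noise, so the totals are an approximation and will not necessarily match Strava's own figures. The function names are hypothetical.

```python
import csv
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so rounding can never push asin outside its domain
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(a, 1.0)))


def distance_per_ride(lines):
    """Sum point-to-point distances per ride from CSV lines shaped like rides.csv.

    Column 0 is assumed to hold the GPX file name. Returns {file name: metres}.
    """
    reader = csv.reader(lines)
    header = next(reader)
    lat_i, lon_i = header.index('Latitude'), header.index('Longitude')
    totals, previous = {}, {}
    for row in reader:
        ride = row[0]
        point = (float(row[lat_i]), float(row[lon_i]))
        if ride in previous:
            totals[ride] += haversine_m(*previous[ride], *point)
        else:
            totals[ride] = 0.0  # first point of this ride
        previous[ride] = point
    return totals
```

Because distance_per_ride accepts any iterable of lines, you can pass it an open file, e.g. open('rides.csv', newline=''), and get back a dict mapping each GPX file name to its approximate distance in metres.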