On Caching: AthleteDataViz Segments++

What we're about to make

Try the solution live at the link below (select VizType "Segments++" from the toolbar menu):
https://design.athletedataviz.com/demodesigner

Problem

Strava's competitive platform for endurance athletes has risen to the top because of its "Strava Segment King of the Mountain (KOM)" innovation.  Athletes compete for the fastest time on any user-defined start/finish segment on a map: a race with anyone, anywhere, anytime!  The problem is that it can be hard to find segments without stumbling across them.

The critical limiter is that the Strava 3rd party API returns only the "top 10 most popular" segments in a given lat/long bounds.  What if a user wants to zoom out and view all the segments in a given area?  What if they want to see all the segments on their target ride route?  What if the user is looking for new segments that are not popular?

Solution Design

I built an extension on top of the AthleteDataViz (ADV) platform called Segments++ to address this segment explorer limitation.  My solution design has three parts:

  1. A "bisect_rectange" function, which divides a map region into many smaller sub-regions.
  2. A storage layer in between the Strava API calls and the user's Segments++ map.
  3. A RESTful API service which combines fresh API results with the cached data.

Solution Part 1 - Server-side Bisect Rectangle function

The server side of ADV uses Python's Flask web framework, so I wrote this code in Python.  The function below accepts user input - a lat/long map bounds set and the "number of splits" to make within those bounds - and returns a list of lists containing all of the sub-rectangles in the given map area.

The objective here is to take a user's map bounds and split them into a grid of smaller map bounds within the selected viewing area.  Then we make an API call for each sub-area, cache the results, and return the concatenated results to the user.  We have to be careful with the number of splits, since the number of sub-areas grows quadratically (numSplits^2 of them) - we don't want to make so many API calls that we exceed the Strava API rate limit of 600 calls per 15 minutes.


def bisect_rectange(numSplits, minlat, minlong, maxlat, maxlong):
    """Split a rectangle into numSplits^2 sub-rectangles.
       Return a list of extent arrays, each [minlat, minlong, maxlat, maxlong],
       e.g. [[40.681, -89.636, 40.775, -89.504], ...]
    """
    #initialize function variables
    longpoints = []
    latpoints = []
    extents = [] 
    
    #First get a list of the split lat/long locations in the rectangle
    for i in range(numSplits+1):
        latpoints.append( (minlat + ((maxlat-minlat)/numSplits)*i) )
        longpoints.append( (minlong + ((maxlong-minlong)/numSplits)*i) )
    
    #Now loop through the line locations and create a list of sub-rectangles
    for latindex, latmin in enumerate(latpoints):
        for longindex, longmin in enumerate(longpoints):
            if latindex<(len(latpoints)-1) and longindex<(len(longpoints)-1):
                newextent = [latmin, longmin, latpoints[latindex+1], longpoints[longindex+1]]
                extents.append(newextent)
    return extents
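
As a quick sanity check against the rate limit, here is how the function might be driven.  The bounds and split count below are purely illustrative, and the commented-out explorer call assumes a stravalib-style client rather than anything specific to ADV:


#A quick sketch (illustrative numbers only) of driving bisect_rectange
#and estimating the API cost of a viewport
bounds = (40.681, -89.636, 40.775, -89.504)  #minlat, minlong, maxlat, maxlong
num_splits = 5                               #5 splits -> 5*5 = 25 sub-rectangles

sub_extents = bisect_rectange(num_splits, *bounds)
print(len(sub_extents))  #25 explorer calls for this viewport, well under 600 per 15 minutes

for minlat, minlong, maxlat, maxlong in sub_extents:
    #one segment explorer API call per sub-rectangle, e.g. with a stravalib-style client:
    #client.explore_segments(bounds=(minlat, minlong, maxlat, maxlong), activity_type='riding')
    pass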

Solution Part 2 - Server-side storage layer (caching)

Now that we have the segment data, we need to store it in a caching layer so that we can display it back to the user at any zoom level.  ADV uses the open-source PostgreSQL database with the PostGIS extension for geospatial functions.  It will serve as the Segments++ caching layer.
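
The exact ADV schema isn't shown in this post, but judging from the columns used below, the cache table might look roughly like this sketch (the start_point geometry column is what the REST query in part 3 filters on; treat the column types as assumptions):


#A rough sketch of the "Segment" cache table - column names are inferred from
#the dataframe built below, and the exact types are assumptions
create_segment_table_sql = """
    CREATE TABLE IF NOT EXISTS "Segment" (
        seg_id       BIGINT PRIMARY KEY,   --Strava segment id
        name         TEXT,
        act_type     TEXT,                 --'ride' or 'run'
        elev_low     DOUBLE PRECISION,
        elev_high    DOUBLE PRECISION,
        start_lat    DOUBLE PRECISION,
        start_long   DOUBLE PRECISION,
        end_lat      DOUBLE PRECISION,
        end_long     DOUBLE PRECISION,
        date_created TIMESTAMP,
        effort_cnt   INTEGER,
        ath_cnt      INTEGER,
        cat          INTEGER,              --climb category
        elev_gain    DOUBLE PRECISION,     --meters
        distance     DOUBLE PRECISION,     --meters
        seg_points   TEXT,                 --encoded polyline from the explorer
        start_point  GEOMETRY(POINT)       --PostGIS point used by the query in part 3
    );
"""
engine.execute(create_segment_table_sql)  #engine is the app's SQLAlchemy engine used throughout

With a table like that in place, the function below builds a dataframe of any newly discovered segments and appends them to the cache: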


from datetime import datetime

import pandas as pd


def seg_to_df(segment_explorer, act_type, engine, startLat, startLong, endLat, endLong):
    """Cache any explorer results that are not already in the database; return them as a dataframe."""
    dflist = []
    #alias the user input activity type
    if act_type == 'riding':
        acttype = 'ride'
    else:
        acttype = 'run'
    #find existing segments already in the database
    dl_list = get_segs_in_db(engine, 'Segment', startLat, startLong, endLat, endLong, acttype).tolist()
    #Build a row for every segment we haven't cached yet
    for exp in segment_explorer:
        for seg in exp:
            seg_id = int(seg.id)
            if seg_id not in dl_list:
                newrow = {'seg_id': int(seg_id),
                          'name': unicode(seg.name),  #Python 2; use str() on Python 3
                          'act_type': str(acttype),
                          'elev_low': 0,
                          'elev_high': 0,
                          'start_lat': float(seg.start_latlng[0]),
                          'start_long': float(seg.start_latlng[1]),
                          'end_lat': float(seg.end_latlng[0]),
                          'end_long': float(seg.end_latlng[1]),
                          'date_created': datetime.utcnow(),
                          'effort_cnt': 0,
                          'ath_cnt': 0,
                          'cat': int(seg.climb_category),
                          'elev_gain': float(seg.elev_difference),
                          'distance': float(seg.distance),
                          'seg_points': str(seg.points)
                          }
                dflist.append(newrow)

    if len(dflist) > 0:
        #Use seg_id as the index so it maps onto the table's seg_id column exactly once
        seg_df = pd.DataFrame(dflist).set_index('seg_id')
        #Insert the new segments into the database from the dataframe
        seg_df.to_sql('Segment', engine, if_exists='append', index=True, index_label='seg_id')
        return seg_df
    else:
        return None
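
The get_segs_in_db helper used above isn't shown in this post; a minimal version, assuming the "Segment" cache table sketched earlier, could look something like this:


import pandas as pd

def get_segs_in_db(engine, table, startLat, startLong, endLat, endLong, acttype):
    """Return the seg_ids already cached for this activity type inside the given bounds (sketch)."""
    #normalize the bounds so BETWEEN works regardless of which corner came first
    minlat, maxlat = min(startLat, endLat), max(startLat, endLat)
    minlong, maxlong = min(startLong, endLong), max(startLong, endLong)
    sql = """SELECT seg_id FROM "%s"
             WHERE start_lat BETWEEN %s AND %s
               AND start_long BETWEEN %s AND %s
               AND act_type = '%s' """ % (table, minlat, maxlat, minlong, maxlong, acttype)
    #read_sql gives a dataframe; .values gives an array that supports .tolist()
    return pd.read_sql(sql, engine)['seg_id'].values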

Solution Part 3 - Deliver all the data to the user's map via a REST API

In part three, we want to get the aggregated "new and cached" segment data for the user and display it on their map.  Here we need to write some well-optimized SQL to keep response times quick, and expose the data to the client via an API.

The query from our database is:


geojson_sql = """
    SELECT row_to_json(fc)
    FROM (
        SELECT 'FeatureCollection' AS type,
               array_to_json(array_agg(f)) AS features
        FROM (
            SELECT 'Feature' AS type,
                   st_asgeojson(st_LineFromEncodedPolyline(lg.seg_points, 4))::json AS geometry,
                   (SELECT row_to_json(t)
                    FROM (SELECT lg.name AS name,
                                 lg.act_type AS type,
                                 round((lg.distance*0.000621371)::numeric,1) AS dist,
                                 round((lg.elev_gain*3.28084)::numeric,1) AS elev) AS t
                   ) AS properties
            FROM "Segment" AS lg
            WHERE ST_Contains(ST_Envelope(ST_GeomFromText('LINESTRING(%s %s, %s %s)')),
                              lg.start_point)
              AND lg.act_type = '%s'
              AND lg.distance BETWEEN %s AND %s
            LIMIT 500
        ) AS f
    ) AS fc""" % (startLong, startLat, endLong, endLat, acttype, distlow, disthigh)
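
The query returns a single row whose only column is the finished FeatureCollection.  A rough sketch of pulling that out with SQLAlchemy is below; this is the kind of thing the get_seg_geojson helper used later does, and the empty-result fallback is my own assumption:


#Rough sketch of executing the query above; row_to_json(fc) comes back as the single
#column of a single row, and psycopg2 typically deserializes json columns to Python dicts
result = engine.execute(geojson_sql).fetchone()
seg_geojson = result[0] if result else {"type": "FeatureCollection", "features": []}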

Our Flask API endpoint runs the query against the PostGIS database given the user's location and filter settings, then returns the data to the client as GeoJSON.


from flask_restful import Resource, reqparse

#Note: engine, seg_sp, and output_json are app-level objects/helpers defined elsewhere in ADV

class Segment_Data(Resource):
    def get(self):
        parser = reqparse.RequestParser()
        parser.add_argument(
            'startLat', type=float, required=True, help='Enter start latitude')
        parser.add_argument(
            'startLong', type=float, required=True, help='Enter start longitude')
        parser.add_argument(
            'endLat', type=float, required=True, help='Enter end latitude')
        parser.add_argument(
            'endLong', type=float, required=True, help='Enter end longitude')
        parser.add_argument(
            'act_type', type=str, required=True, help='Enter act type riding or running')
        #Optional arguments
        parser.add_argument('start_dist',
            type=float, help='Enter start distance in meters')
        parser.add_argument('end_dist',
            type=float, help='Enter end distance of activity in meters')
        parser.add_argument('newSegs',
            type=str, help='Enter True or False to get new segments from the Strava API')

        args = parser.parse_args()
        #For optional args, fall back to defaults when they are not supplied
        start_dist = args['start_dist'] if args['start_dist'] else 0
        end_dist = args['end_dist'] if args['end_dist'] else 100000
        newSegs = args['newSegs'] if args['newSegs'] else 'False'
        seg_geojson = seg_sp.get_seg_geojson(engine, args['startLat'], args['startLong'], args['endLat'], 
                                            args['endLong'], args['act_type'], start_dist, end_dist, newSegs)

        return output_json(seg_geojson, 200, 5)
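
Assuming the resource is registered under a route such as /api/v1/segments (the actual route registration isn't shown here, so treat the path as a placeholder), a client request might look like this:


import requests

#Hypothetical route - the actual URL prefix depends on how the Resource is registered
resp = requests.get('https://design.athletedataviz.com/api/v1/segments',
                    params={'startLat': 40.681, 'startLong': -89.636,
                            'endLat': 40.775, 'endLong': -89.504,
                            'act_type': 'riding',
                            'newSegs': 'False'})
seg_geojson = resp.json()  #a FeatureCollection like the sample output below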

Sample output:

{"type": "FeatureCollection", 
"features": [{"geometry": {"coordinates": [[-89.75104, 40.73507], [-89.75126, 40.73503]... 

Conclusion

We can easily extend both the Strava API and the ADV visualization platform through a hybrid caching approach.  What we need to be careful of is keeping the cached data up to date.  The good news is that the high-level segment information we cache (location, name, distance, and elevation) rarely changes, so we can safely return cached data to the user without worrying that it is stale.  For more volatile information, such as KOM leaderboards, we can make a live API call.

Want to check out the code for this project?  AthleteDataViz is on GitHub!