Time-series graphs with TempoDB and Flot - Kristian Glass

A side project I’m working on at the moment is StackCompare, an app that lets StackExchange users compare their reputation and badges with those of their friends. One feature I wanted to add was a graph of reputation over time.

Step One - Data Aggregation

First things first: get the data into some sort of database. The data is a time series (a set of tuples of the form (timestamp, data)), so my thoughts immediately went to setting up my own OpenTSDB instance. However, where possible I’d rather use a hosted solution at this stage of development, and some googling led me to TempoDB, a hosted time-series database service. Tempo currently seems quite early-stage (a warning sign to me is the lack of any mention of pricing…), but it works quite nicely and has decent documentation and a Python client.

Writing to TempoDB is nice and straightforward:

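from datetime import datetime

# TEMPODB_CLIENT is a tempodb client instance, configured elsewhere
# with the account's API key and secret.

def _reputation_key(site, user_id):
    # Encode site and user ID in the series key, e.g. "stackoverflow.1234.reputation"
    return '%s.%d.reputation' % (site, user_id)

def write_reputation(site, users):
    # One data point per user, all written with the same UTC timestamp
    data = [{'key': _reputation_key(site, user.user_id), 'v': user.reputation}
            for user in users]
    now = datetime.utcnow()
    TEMPODB_CLIENT.write_bulk(now, data)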

and then it was just a matter of wrapping this in a Django management command, using stackpy, my Python library for the StackExchange v2 API (currently of quite pre-alpha quality…):

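from django.conf import settings
from django.core.management.base import BaseCommand

import stackpy

# `tempo` is the module holding write_reputation(); _get_all_user_ids() is a
# project helper (not shown) returning every user and friend ID to track.

class Command(BaseCommand):
    help = 'Grab reputation for all users and their friends, and store in TempoDB'

    def handle(self, *args, **options):
        s = stackpy.Stackpy(settings.STACKEXCHANGE_CLIENT_KEY)
        user_ids = _get_all_user_ids()
        users = s.users(user_ids).items
        tempo.write_reputation('stackoverflow', users)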

As StackCompare is currently a Heroku app, it was trivial to hook up the Heroku Scheduler to run this every 10 minutes.

Step Two - Data Extraction

Getting things out of TempoDB is equally straightforward:

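from datetime import datetime, timedelta

def get_reputation(site, user_ids):
    # Deliberately generous ("fluffy") date range: end a day in the future so
    # the latest points are included, and look back a year from there
    end = datetime.utcnow() + timedelta(days=1)
    start = end - timedelta(weeks=52)
    keys = [_reputation_key(site, user_id) for user_id in user_ids]
    # Ask for one rolled-up point per hour for each series
    datasets = TEMPODB_CLIENT.read(start, end, keys=keys, interval='1hour')
    # _key_to_dict turns a series key back into its attributes
    return [(_key_to_dict(dataset.series.key), dataset.data) for dataset in datasets]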
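The scheduled command writes a point every 10 minutes, but the read asks for `interval='1hour'`, so what comes back is presumably a rolled-up (downsampled) series rather than the raw points. As a rough illustration of that idea, here is a pure-Python hourly downsampler. It assumes a mean aggregate; TempoDB's actual rollup function and bucketing may differ, and `rollup_hourly_mean` is a hypothetical name, so treat this as a sketch of the concept, not of the service.

```python
from collections import defaultdict
from datetime import datetime


def rollup_hourly_mean(points):
    """Downsample (timestamp, value) pairs to one mean value per hour.

    Illustrates what interval='1hour' asks for, assuming a mean
    aggregate; this is not TempoDB's implementation.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate each timestamp to the start of its hour
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return [(hour, sum(vals) / float(len(vals)))
            for hour, vals in sorted(buckets.items())]


# Readings every 10 minutes, as the scheduled command would write them
samples = [
    (datetime(2012, 6, 1, 9, 0), 100),
    (datetime(2012, 6, 1, 9, 10), 110),
    (datetime(2012, 6, 1, 9, 20), 120),
    (datetime(2012, 6, 1, 10, 0), 130),
]
print(rollup_hourly_mean(samples))
```

Rolling up also keeps the response small: a year of hourly points is at most about 8,760 values per series, versus roughly six times that at the raw 10-minute resolution.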

I’m not using series attributes just yet, as that part of the client library is still slightly in flux; instead, I’m encoding the attributes in the key.

Step Three - Display

On the graphing front, it was time to whip out Flot, a JavaScript plotting library for jQuery that is beautifully simple to use.

First, some placeholder HTML (with some slightly ugly hardcoded sizes…):

<div id="plot" style="width: 960px; height: 500px;">
    <h2 id="plot-placeholder">Loading...</h2>
</div>

Then, a little JavaScript to populate it:

    <script src="{% static "flot-0.7/jquery.flot.js" %}"></script>
    <div id="graph-data-url" data-url="{% url "graph_data_api" %}"></div>
    <script>
        $(function() {
            var url = $("#graph-data-url").data("url");
            // $.getJSON parses the response body, so no $.parseJSON call is needed
            $.getJSON(url, function(data) { //TODO Handle errors...
                $("#plot-placeholder").remove();
                var options = {
                    xaxis: {mode: "time"},  // x values are JavaScript timestamps (ms since epoch)
                    series: {
                        lines: {show: true},
                        points: {show: true}
                    }
                };
                $.plot($("#plot"), data, options);
            });
        });
    </script>

All that was needed to finish it off was some short Python code to massage the data from TempoDB into the right format for Flot (slightly paraphrased):

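import calendar
import json

from django.contrib.auth.decorators import login_required
from django.http import HttpResponse

# `tempo` is the module holding get_reputation(); _get_ids() and
# _determine_label() are project helpers (not shown).

def flotify(series_data):
    # Flot's time mode wants JavaScript timestamps (ms since the epoch). The
    # points were written with UTC timestamps, so use calendar.timegm, which
    # treats the tuple as UTC, rather than time.mktime, which assumes local time.
    return [[calendar.timegm(point.ts.utctimetuple()) * 1000, point.value]
            for point in series_data]

@login_required
def graph_data(request):
    user_profile = request.user.get_profile()
    ids = _get_ids(user_profile)

    flot_data = []
    data = tempo.get_reputation('stackoverflow', ids)
    for attr_dict, series_data in data:
        user_id = int(attr_dict['user_id'])
        label = _determine_label(user_id)
        flot_data.append({'label': label, 'data': flotify(series_data)})

    return HttpResponse(json.dumps(flot_data))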
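To check the conversion with real numbers, here is a small self-contained sketch that produces one Flot series in the same `{'label': ..., 'data': [[x, y], ...]}` shape the view returns. `Point` is a hypothetical stand-in for the TempoDB data points (it only has the `ts` and `value` attributes used above), and the label and values are made up for illustration.

```python
import calendar
import json
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a TempoDB data point (attributes ts and value)
Point = namedtuple('Point', ['ts', 'value'])


def to_js_timestamp(dt):
    # calendar.timegm treats the time tuple as UTC (unlike time.mktime,
    # which assumes local time); multiply by 1000 for milliseconds.
    return calendar.timegm(dt.utctimetuple()) * 1000


def flot_series(label, points):
    # One Flot series: a label plus a list of [x, y] pairs
    return {'label': label,
            'data': [[to_js_timestamp(p.ts), p.value] for p in points]}


series = flot_series('user 1234', [Point(datetime(2012, 1, 1), 1000),
                                   Point(datetime(2012, 1, 1, 1), 1010)])
print(json.dumps(series))
# {"label": "user 1234", "data": [[1325376000000, 1000], [1325379600000, 1010]]}
```

Midnight on 1 January 2012 UTC is 1325376000 seconds after the epoch, and the second point is exactly 3600 × 1000 ms later, which is what Flot plots one hour to the right.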

And voilà: