Last October, I began recording temperatures for my homebrew setup. It was early in my explorations of Python and scripting, and the whole thing ran in a pretty silly fashion. It went something like this:
1. Echo the output of a C program that read the temperatures into a log file
2. Parse that log file using Python to create a JSON file for Google Charts to use
3. Use PHP embedded in HTML via an AJAX callback function to parse that JSON file
4. Display the result in Google Charts
This was incredibly slow and processor-intensive once more than one week’s worth of data had been captured, so I moved the log to a backup file weekly, leaving a blank file for the next week’s data.
InfluxDB has been getting a lot of buzz on Hacker News lately. It is a time-series database, which means that the primary key of any entry is the timestamp of when the data was entered into the database. Queries use a SQL-like syntax, but everything is oriented around time. Queries like “give me the last data value of every hour” and “what is the average value of this data point for the past 3 Wednesdays” are difficult to do with large data sets in a traditional SQL database, but InfluxDB is designed from the ground up to handle them.
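To make the first of those queries concrete, here is a minimal Python sketch of what “the last data value of every hour” computes, done by hand on a small in-memory list. The sample readings and the function name `last_value_per_hour` are made up for illustration. In InfluxDB the same result should come from a single query along the lines of `select last(value) from temperature group by time(1h)`, where the series and column names are assumptions.

```python
from datetime import datetime


def last_value_per_hour(points):
    """Return {hour_start: value}, keeping the latest reading in each hour.

    `points` is an iterable of (datetime, value) pairs in any order.
    """
    latest = {}  # hour_start -> (timestamp, value)
    for ts, value in points:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour not in latest or ts > latest[hour][0]:
            latest[hour] = (ts, value)
    return {hour: value for hour, (_, value) in sorted(latest.items())}


# Hypothetical sample readings: (timestamp, degrees Fahrenheit)
readings = [
    (datetime(2014, 10, 1, 9, 5), 66.2),
    (datetime(2014, 10, 1, 9, 55), 67.0),
    (datetime(2014, 10, 1, 10, 30), 68.4),
    (datetime(2014, 10, 1, 10, 10), 67.9),
]

hourly = last_value_per_hour(readings)
for hour, value in hourly.items():
    print(hour.isoformat(), value)
```

Doing this in application code means reading every row first, which is exactly the work a time-series database is meant to take off your hands.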
One idea for the future: how do you join time-series data from InfluxDB with regular relational data in MySQL? This is relevant to another project I’m currently working on, so keep an eye out for updates on that as well.
What won me over to databases like InfluxDB and RethinkDB is the simple, no-nonsense, easy-to-use web interface that comes ready right out of the box, and the SQL-like query language. Another big win for Influx was support from Grafana, a turnkey open-source time-series graphing platform, which I highly encourage you to check out.
I set up InfluxDB on a Vagrant VM on my Mac server, with the Vagrantfile like so:
This forwards the InfluxDB ports along with port 80 for serving Grafana, installs InfluxDB, Grafana, and the Python dependencies, and sets up the cron jobs (see below).
With InfluxDB running, the first step was to capture some data! Since I bought a Nest thermostat, I figured that would be a good piece of time-series data to measure.
Running this every minute via cron pulls the current temperature as well as the target temperature (what the thermostat is set to) into InfluxDB, through its easy-to-use but peculiarly-structured API.
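For a sense of what “peculiarly-structured” means, here is a rough sketch of the write side only, assuming the 0.8-era HTTP API, where each series is sent as parallel `columns` and `points` arrays rather than key/value records. The URL, database name, credentials, and series name are all placeholders, and the two readings are hard-coded stand-ins for whatever the Nest returns. No time column is sent, on the assumption that the server stamps points on arrival.

```python
import json
import urllib.parse
import urllib.request

# Placeholders, not values from my actual setup.
INFLUX_URL = "http://localhost:8086"
DATABASE = "temperatures"
USERNAME = "nest"
PASSWORD = "secret"


def build_payload(series, values):
    """Build a 0.8-style body: a list of series, each with parallel
    `columns` and `points` arrays."""
    columns = sorted(values)
    return [{
        "name": series,
        "columns": columns,
        "points": [[values[c] for c in columns]],
    }]


def write_points(payload):
    """POST the payload (assumes the 0.8 /db/<name>/series endpoint)."""
    query = urllib.parse.urlencode({"u": USERNAME, "p": PASSWORD})
    url = "{}/db/{}/series?{}".format(INFLUX_URL, DATABASE, query)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical readings; a real script would fetch these from the Nest.
payload = build_payload("nest", {"current": 68.5, "target": 70.0})
print(json.dumps(payload))
# write_points(payload)  # uncomment once an InfluxDB instance is reachable
```

Keeping the payload builder separate from the HTTP call makes it easy to check the structure without a database running.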
I did the same thing with Forecast.io, my favorite weather API. This tells me the temperature outside my house:
This required me to get a Forecast.io API key, which comes with a limit of 1,000 calls per day. Calling once a minute would mean 1,440 calls per day, so I call it once every 2 minutes instead, for 720 calls per day.
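As a rough sketch of the fetch step, the snippet below builds a request URL in the `https://api.forecast.io/forecast/KEY/LAT,LON` form and reads `currently.temperature` out of the JSON response. The API key and coordinates are placeholders, and the sample response is a trimmed-down, made-up body that keeps only the field used here.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"            # placeholder
LATITUDE, LONGITUDE = 40.0, -75.0   # placeholder coordinates


def forecast_url(api_key, lat, lon):
    """Build the forecast request URL for one location."""
    return "https://api.forecast.io/forecast/{}/{},{}".format(api_key, lat, lon)


def current_temperature(response_text):
    """Pull the current temperature out of a forecast response body."""
    return json.loads(response_text)["currently"]["temperature"]


def fetch_outside_temperature():
    """Call the API and return the current outside temperature."""
    url = forecast_url(API_KEY, LATITUDE, LONGITUDE)
    with urllib.request.urlopen(url, timeout=10) as response:
        return current_temperature(response.read().decode("utf-8"))


# Made-up response body, reduced to the one field this script needs.
sample = '{"currently": {"temperature": 54.3}}'
print(current_temperature(sample))
```

A cron schedule of `*/2 * * * *` runs a script like this every 2 minutes.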
The last part of setting up the VM is to perform database migrations - set up users and databases for the temperature data to go into:
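In rough terms, a migration like this boils down to two HTTP calls: one to create the database and one to create a user inside it. The sketch below assumes the 0.8-era admin endpoints (`/db` and `/db/<name>/users`) and the default `root`/`root` admin login. The database name, user name, and password are placeholders.

```python
import json
import urllib.parse
import urllib.request

INFLUX_URL = "http://localhost:8086"   # placeholder
ADMIN = {"u": "root", "p": "root"}     # assumed default admin credentials


def create_database_request(name):
    """Return (url, body) for creating a database."""
    url = "{}/db?{}".format(INFLUX_URL, urllib.parse.urlencode(ADMIN))
    return url, {"name": name}


def create_user_request(database, username, password):
    """Return (url, body) for creating a user in `database`."""
    url = "{}/db/{}/users?{}".format(
        INFLUX_URL, database, urllib.parse.urlencode(ADMIN))
    return url, {"name": username, "password": password}


def post_json(url, body):
    """Send one migration step to the server."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical names; substitute your own.
migrations = [
    create_database_request("temperatures"),
    create_user_request("temperatures", "nest", "secret"),
]

for url, body in migrations:
    print("POST", url, json.dumps(body))
    # post_json(url, body)  # uncomment once InfluxDB is running
```

Listing the steps as data keeps the order explicit and lets you dry-run the migration before pointing it at a real server.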