Update: So I spelt Neal’s name wrong. Sorry Neal! I also added some other links to the bottom that he mentioned to me.
It’s been a little over 48 hours, and StatsD, Graphite, and Graphene have already paid major dividends. At Tek12, Neal Anders gave a talk on Graphite that I wanted to attend, but I was speaking on Redis at the same time, so I couldn’t. Then, a week or so after the conference, someone posted a link about StatsD on Twitter (I can’t remember who), so I started poking around, loved what I saw, and within an hour I had it running on a server for Dating DNA on Friday night (I still do some contract work for them to help keep their infrastructure running well). I quickly added about a dozen metrics to track and went to bed.
I noticed two things after looking at the data this weekend. First off, we had one API in particular that blew me away. We were measuring the time it took to complete API requests, in milliseconds (ms):
Wow… so we had requests taking 5k to 10k milliseconds (5 to 10 seconds), and spiking up to 40 seconds. After 10 minutes of looking at the code, I found the inefficiency, and if you look at it now (the arrow is where the code change was made), it’s taking less than 300 ms.
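For anyone curious what sending a timing like this involves, here is a minimal Python sketch. It is not the PHP client used on Dating DNA. The metric name `api.search`, the helper names, and the host are made up for illustration, and it assumes StatsD is listening on its default UDP port, 8125. A timer is just a short text line of the form `name:value|ms` sent over UDP.

```python
import socket
import time


def format_timer(name, elapsed_ms):
    """Build a StatsD timer line such as 'api.search:320|ms'."""
    return f"{name}:{int(round(elapsed_ms))}|ms"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send (host and port are assumptions)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


def timed_call(name, func, send=send_metric):
    """Run func, then report how long it took as a timer metric."""
    start = time.perf_counter()
    try:
        return func()
    finally:
        # Report the timing even if func raised an exception.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        send(format_timer(name, elapsed_ms))


if __name__ == "__main__":
    # Hypothetical usage: wrap the body of an API handler.
    timed_call("api.search", lambda: sum(range(100000)))
```

Because it is UDP, the send never blocks the request on the metrics server, which is a big part of why it feels safe to sprinkle timers all over an application.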
Then, I noticed another oddity. Across the board, most APIs had patterns of massive spikes for a few minutes:
Things would run optimally, but those hourly massive spikes were painful. After a few hours of investigation, I found a munin MySQL plugin that would aggressively analyze disk usage by querying internal tables on MySQL. This was highly inefficient and taxing on the server, causing massive disk I/O issues. After disabling that particular check (we now check MySQL’s disk usage by just checking the file size on disk, not how MySQL internally manages space), those spikes went away.
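The cheaper check can be sketched roughly like this in Python. This is an illustration, not the actual check we run: the data directory path and the metric name `mysql.disk_bytes` are assumptions, and the result is formatted as a StatsD gauge line (`name:value|g`). It only reads file metadata, so it never touches MySQL itself.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for filename in files:
            full_path = os.path.join(root, filename)
            try:
                total += os.path.getsize(full_path)
            except OSError:
                # A file may vanish between listing and stat; skip it.
                pass
    return total


def format_gauge(name, value):
    """Build a StatsD gauge line such as 'mysql.disk_bytes:1024|g'."""
    return f"{name}:{value}|g"


if __name__ == "__main__":
    # Assumed data directory; adjust for the actual install.
    size = directory_size_bytes("/var/lib/mysql")
    print(format_gauge("mysql.disk_bytes", size))
```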
So within 48 hours I found two serious issues that had flown under the radar until I had a chance to look at the data. Dating DNA now has a dashboard using Graphene to monitor critical metrics. I added tabs to it that auto-rotate after mouse inactivity, and we can watch dozens of metrics easily throughout the day. I have a few more things I want to add to the system, such as sending all our munin data to StatsD/Graphite so we can compare things like CPU usage against online users or other activity. I’m also really excited to add this to Deseret News; I just didn’t feel comfortable hacking on its production environment over the weekend in my first month on the job.
I’ll have a blog post later this week covering in more detail how I set everything up. But until then, here are some articles that helped me in setting up StatsD, Graphite, and Graphene:
- Measure Anything, Measure Everything – This is one of the initial posts I read that really inspired me to implement this setup.
- PHP Client Code – Simple PHP Class to send data to StatsD. I’ve made a few changes to it on my own (such as adding support for gauges). I’ll throw up a gist of it later.
- How to Setup Metric Collection Using Graphite and Statsd on Ubuntu 12.04 LTS – I followed this article pretty much word for word. If you’re testing it out for the first time, you can just use screen to run the different services. But eventually you’ll want to set up carbon and statsd as services, as well as set up Apache/Nginx to serve & protect your web GUI.
- Installing Graphite on Ubuntu 10.04 LTS – I used this one to help set up Apache to serve my Graphite web GUI.
- Obfuscurity’s Blog – Tags: Graphite – Neal sent me this link, and it is awesome. Obfuscurity has a lot of posts dealing with Graphite & many tools that work with it. Definitely check it out.
Honestly, there shouldn’t be a single reason why any company couldn’t track better metrics for its applications. Within an hour I was up and running, within a day I had it nice and polished, and within two days I had fixed two major performance issues. You’ll likely be seeing several posts from me over the next few weeks about even cooler things to do with Graphite. Big thanks to Etsy, Orbitz, and all the contributors to these projects!