This is just a quick post to summarize my thoughts on logging and monitoring. I've spent some time setting up logging and monitoring for my product at OpenTable, and I've quickly realized that it's a pretty deep topic.
I've included some resources below, mostly things that I've found helpful or seem interesting:
- StatsD / Graphite - good for business metrics and any type of summary statistic (e.g. getting alerted to production issues)
- Etsy's initial blogpost on StatsD - "Measure Anything, Measure Everything"
- Configuring StatsD for Graphite - https://github.com/etsy/statsd/blob/master/docs/graphite.md
- I think it may be a good idea to have a single "bootstrap StatsD" service that fires every StatsD event once whenever you release a new metric / product / environment. Otherwise, Graphite has no knowledge of the metric, and from what I've seen, you can't add it to your dashboard in advance (e.g. you know you will eventually get a 500 HTTP server error, but you can't select that metric specifically up front - you can only use a wildcard * to capture its group).
- Gentle introductions to the various components of Graphite and how it works with StatsD, by Digital Ocean
- The "ELK" stack: Elasticsearch / Logstash / Kibana - good for capturing detailed logs (e.g. debugging production issues)
- Overall architecture diagram below
- Brief overview of ELK (first product reviewed)
- Interesting blog posts on some of the issues of operating the ELK stack in production - There are numerous hosted ELK (SaaS) offerings, and I think these hosted solutions will generally become more and more affordable over time.
Source: Digital Ocean on ELK stack
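The "bootstrap StatsD" idea from the StatsD notes above could be sketched roughly as follows. This is only a minimal sketch: the metric names are hypothetical, the host and port assume a StatsD daemon listening on its default UDP port (8125) on the local machine, and it assumes that a zero-valued counter is enough for StatsD to start flushing the metric to Graphite, which I haven't verified against every configuration.

```python
import socket


def format_counter(name: str, value: int = 0) -> bytes:
    """Encode a StatsD counter datagram in the form <name>:<value>|c."""
    return f"{name}:{value}|c".encode("ascii")


def bootstrap_metrics(names, host="127.0.0.1", port=8125):
    """Send one zero-valued counter per metric name over UDP.

    Returns the list of payloads that were sent. UDP is fire-and-forget,
    so this does not confirm that StatsD actually received anything.
    """
    sent = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for name in names:
            payload = format_counter(name)
            sock.sendto(payload, (host, port))
            sent.append(payload)
    return sent


# Hypothetical metric names; replace with the ones your services emit.
METRICS = [
    "myapp.http.status.200",
    "myapp.http.status.404",
    "myapp.http.status.500",
]

if __name__ == "__main__":
    bootstrap_metrics(METRICS)
```

Running this once per release or per new environment should, under the assumptions above, make each metric selectable by name in Graphite before the real event ever happens.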
Paid solutions for using StatsD:
- There is a hosted Graphite / StatsD service that seems to have an affordable entry plan ($19/mo): https://www.hostedgraphite.com/hosted-statsd
- Scout: https://scoutapp.com/signup
Paid solution for the "ELK" stack:
- Elastic, the company behind the three open-source projects of the ELK stack, has a SaaS offering for Elasticsearch (which can be tricky to operate, especially as you scale): https://www.elastic.co/found/features