
At SpaceTime we’ve been enjoying the ease and flexibility that Kubernetes, particularly on Google Container Engine (GKE), gives us. With GKE or minikube, bringing up new Kubernetes clusters is quick and easy, so doing the same with the services we develop to run on Kubernetes is important. We use Helm to deploy both our services and our monitoring infrastructure to Kubernetes. In the first post of this series, we built a Docker image to launch the Erlang release Presence, which provides an HTTP interface for clients to POST the current status of games and the players in them. This post describes the Helm chart for deploying that Presence Docker image to Kubernetes and how to use charts to install Prometheus and Grafana dashboards for monitoring the cluster.

Later entries in this series will cover alerting and distributed tracing.

If you don’t already run Kubernetes, the simplest way to get started locally is minikube; spinning up a cluster on Google Container Engine is nearly as simple.
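
For example, a small GKE cluster can be created with a single gcloud command (the cluster name, zone, and size here are just placeholders):

$ gcloud container clusters create presence-demo --zone=us-central1-a --num-nodes=3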

If using minikube to try this out, first start it up and give the VM extra RAM for Prometheus:

$ minikube start --memory=4096

Exposing Elli and BEAM Metrics from Presence

Presence uses the Erlang web server library Elli. Including the Elli middleware elli_prometheus exposes metrics on the path /metrics for Prometheus to scrape, and it sets up Elli event handlers to collect metrics on incoming HTTP requests. The middleware depends on the application prometheus.erl, which collects VM metrics and provides functions for creating custom metrics. Lastly, Presence includes prometheus_process_collector for collecting information about the OS process, such as CPU, memory, file descriptor usage, and native thread count.
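
Beyond what the middleware collects automatically, prometheus.erl can also be used directly. For example, a counter for heartbeat POSTs (the metric name here is hypothetical, not something Presence defines) could be declared and incremented with:

%% declare the metric once, e.g. during callback init
prometheus_counter:declare([{name, presence_heartbeats_total},
                            {help, "Number of heartbeat POSTs received"}]),

%% bump it wherever a heartbeat request is handled
prometheus_counter:inc(presence_heartbeats_total).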

The Prometheus-related dependencies defined in rebar.config:

{deps, [elli_prometheus,
        prometheus,
        prometheus_process_collector,
        ...

In presence.app.src the Erlang applications must also be included so they are started with the release:

{applications,
  [kernel,
   stdlib,
   prometheus,
   elli_prometheus,
   prometheus_process_collector,
   ...

For our production release we want the smallest image possible, so debug_info is stripped from the BEAM files. The Prometheus Erlang app relies on that stripped information to discover collectors by the behaviour they implement, so instead of relying on the default discovery we explicitly list the collectors in the prometheus application’s environment in sys_prod.config:

{prometheus, [{collectors, [default,
                            prometheus_process_collector]}]}
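
For reference, the stripping that makes this necessary usually happens in the prod profile of rebar.config; a rough sketch (the actual profile setup is covered in the first post):

{profiles, [{prod, [{erl_opts, [no_debug_info]},
                    {relx, [{dev_mode, false},
                            {include_erts, true}]}]}]}.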

And finally in presence_sup.erl Elli is configured to use the Prometheus middleware:

Config = [{mods, [{elli_prometheus, []},
                  {presence_elli_callback, []}
                 ]}
         ],
ElliOpts = [{callback, elli_middleware}, {callback_args, Config}, {port, Port}],
ChildSpecs = [#{id       => elli_server,
                start    => {elli, start_link, [ElliOpts]},
                restart  => permanent,
                shutdown => 5000}],

Creating and Deploying the Helm Chart

We created the chart, found in the helm/ directory of the presence repo, with helm create presence. Helm generates the chart with Kubernetes Deployment and Service resources. The deployment template uses values from values.yaml at the root of the chart to fill in where the image for the container can be fetched. Presence images have already been published to tsloughter on Docker Hub, so values.yaml contains:

image:
  repository: tsloughter/presence
  tag: 0.1.0
  pullPolicy: IfNotPresent
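
For orientation, the chart ends up with roughly this layout: the skeleton generated by helm create plus the templates discussed in the rest of this post (the exact file names here are assumptions):

helm/presence/
  Chart.yaml           # chart name and version
  values.yaml          # default values referenced by the templates
  requirements.yaml    # chart dependencies (the postgresql entry below)
  charts/              # fetched dependency charts
  templates/
    deployment.yaml
    service.yaml
    ingress.yaml
    configmap.yaml
    _helpers.tpl
    NOTES.txt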

Since Presence is built on Erleans, a datastore is required to back the grains. For this, an optional dependency, conditioned on the variable postgresql.enabled, is added to requirements.yaml:

dependencies:
  - name: postgresql
    version: ^0.7.1
    condition: postgresql.enabled
    repository: https://kubernetes-charts.storage.googleapis.com/

and configured in values.yaml:

postgresql:
  postgresUser: presence
  postgresDatabase: presence
  enabled: true
  metrics:
    enabled: true

This dependency is optional because, while it is certainly useful for testing in minikube, in production a managed service like Google’s Cloud SQL would likely be used to run Postgres. In that case the requirement is disabled when installing with --set postgresql.enabled=false and the Cloud SQL proxy is used instead.
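
Because the chart now declares a dependency, it has to be fetched before installing. A sketch of installing against an external Postgres (the Cloud SQL case) could look like:

$ helm dependency update presence
$ helm install --name=presence --set postgresql.enabled=false presence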

The Presence release configures itself based on environment variables before booting. In the Deployment resource, environment variables to include for the container can be defined:

containers:
  - name: {{ .Chart.Name }}
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
    imagePullPolicy: {{ .Values.image.pullPolicy }}
    env:
    - name: NODE
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: DISCOVERY_DOMAIN
      valueFrom:
        configMapKeyRef:
          name: "{{ .Release.Name }}-presence-config"
          key: presence.discovery_domain
    - name: POSTGRES_HOST
      valueFrom:
        configMapKeyRef:
          name: "{{ .Release.Name }}-presence-config"
          key: presence.postgres_host
    - name: POSTGRES_DATABASE
      value: "{{ .Values.postgresql.postgresDatabase }}"
    - name: POSTGRES_USER
      value: "{{ .Values.postgresql.postgresUser }}"
    - name: POSTGRES_PASSWORD
      value: "{{ .Values.postgresql.postgresPassword }}"

Now when the Presence container is run by Kubernetes, the release has the values it needs to connect to the Postgres instance, and setting DISCOVERY_DOMAIN configures Erleans to discover other Erlang nodes to cluster with. The discovery domain uses the Kubernetes DNS add-on to query for SRV records of available Presence nodes. Both the Postgres and Presence service domains are defined in the ConfigMap based on the names of the release and service:

apiVersion: v1
kind: ConfigMap
metadata:
  name: "{{ .Release.Name }}-presence-config"
data:
  presence.discovery_domain: "{{ .Release.Name }}-{{ .Values.service.name }}.{{ .Release.Namespace }}.svc.cluster.local"
  presence.postgres_host: "{{ .Release.Name }}-postgresql.{{ .Release.Namespace }}.svc.cluster.local"

A Service and Ingress resource expose Presence internally and externally:

apiVersion: v1
kind: Service
metadata:
  name: {{ template "fullname" . }}
  labels:
    tier: app
    release: "{{ .Release.Name }}"
    chart: "{{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}"
spec:
  type: {{ .Values.service.type }}
  ports:
  - port: {{ .Values.service.externalPort }}
    targetPort: {{ .Values.service.internalPort }}
    protocol: TCP
    name: web
  selector:
    app: {{ template "fullname" . }}
    tier: app

and the Ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: {{ template "fullname" . }}
  annotations:
    ingress.kubernetes.io/rewrite-target: /
    ingress.kubernetes.io/ssl-redirect: "false"
  labels:
    app: {{ template "fullname" . }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}"
    release: "{{ .Release.Name }}"
    heritage: "{{ .Release.Service }}"
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: {{ template "fullname" . }}
          servicePort: {{ .Values.service.externalPort }}

To try it out on minikube, enable the ingress addon, install the chart, and POST a heartbeat:

$ minikube addons enable ingress
$ helm install --name=presence presence
$ curl -v -XPOST http://$(minikube ip):8080/heartbeat \
    -d '{"game":"game-1","status":{"players":["player-1","player-2"], "score":"100"}}'
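
The pods backing the release can be checked with kubectl; exactly which pods appear depends on whether the postgresql dependency is enabled, and their names include generated suffixes:

$ kubectl get pods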

Setting Up Prometheus

CoreOS released the Prometheus Operator, which handles creating, configuring, and managing Prometheus instances on Kubernetes. Using the resources it defines, Prometheus servers can be declaratively configured to scrape the pods in the cluster.

The operator can be installed into the cluster via a Helm chart:

$ helm repo add opsgoodness http://charts.opsgoodness.com
$ helm install --namespace=monitoring opsgoodness/prometheus-operator
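
Before installing anything that depends on the operator’s custom resources, it is worth checking that the operator pod is up:

$ kubectl --namespace=monitoring get pods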

Another chart to install is kube-prometheus. This chart installs services to expose Kubernetes and node metrics, configures a Prometheus server to scrape those metrics, and creates a three-node Alertmanager cluster with a number of default alerts.

$ helm install --namespace=monitoring --name=kubeprom opsgoodness/kube-prometheus

(Note: When running on minikube v0.20.0 and before, you’ll want to include --set rbacEnable=false as an option to the install commands.)

With the metrics exposed via a Service, the ServiceMonitor resource configures Prometheus to scrape the Presence endpoints by selecting on labels that match the Presence Service:

apiVersion: monitoring.coreos.com/v1alpha1
kind: ServiceMonitor
metadata:
  name: "{{ .Release.Name }}-service-monitor-web"
  labels:
    level: application
spec:
  selector:
    matchLabels:
      release: "{{ .Release.Name }}"
      tier: app
  endpoints:
  - port: web
    interval: 10s

A ServiceMonitor is also created for the postgres-metrics Service that points to the Postgres metrics exporter sidecar:

apiVersion: monitoring.coreos.com/v1alpha1
kind: ServiceMonitor
metadata:
  name: "{{ .Release.Name }}-service-monitor-pg"
  labels:
    tier: app
    release: "{{ .Release.Name }}"
spec:
  selector:
    matchLabels:
      app: "{{ .Release.Name }}-postgresql"
  endpoints:
  - port: metrics
    interval: 10s

A Prometheus resource in the default namespace handles scraping Presence and Postgres. (The kube-prometheus chart, when installed with --namespace=monitoring, creates a separate Prometheus resource in the monitoring namespace that scrapes the Kubernetes runtime metrics.) Its externalUrl is set so that links and redirects in the Prometheus web UI work when it is accessed through kubectl proxy:

externalUrl: "http://127.0.0.1:8001/api/v1/proxy/namespaces/default/services/{{ .Release.Name }}-prometheus:web/"
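
The rest of that Prometheus resource is not reproduced here; a minimal sketch of its likely shape (field names follow the operator’s v1alpha1 API, and the selector is only illustrative, since it must match whatever labels the chart puts on the ServiceMonitors above):

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: "{{ .Release.Name }}-prometheus"
spec:
  replicas: 1
  # illustrative selector; it has to match the labels on the ServiceMonitors
  serviceMonitorSelector:
    matchExpressions:
    - key: release
      operator: Exists
  externalUrl: "http://127.0.0.1:8001/api/v1/proxy/namespaces/default/services/{{ .Release.Name }}-prometheus:web/"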

To check that Presence and Postgres are being scraped, run kubectl proxy and open the Prometheus targets page: http://127.0.0.1:8001/api/v1/proxy/namespaces/default/services/presence-prometheus-svc:web/targets

(Screenshot: Prometheus targets page)

Grafana Dashboards

Ilya Khaprov maintains a set of Grafana dashboards for Elli and Erlang VM metrics. These, along with a Postgres dashboard template, are bundled in the Grafana Helm chart in its dashboards/ directory. Using Helm’s templating, the chart creates a ConfigMap resource in dashboards-configmap.yaml with the contents of each file in dashboards/, plus a datasource entry that tells Grafana how to connect to the Prometheus instance.

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "grafana.server.fullname" . }}
  labels:
    app: {{ template "grafana.fullname" . }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    heritage: "{{ .Release.Service }}"
    release: "{{ .Release.Name }}"
data:
  {{ .Release.Name }}-datasource.json: |
{{ include "datasource" . | indent 4 }}

{{ (.Files.Glob "dashboards/*.json").AsConfig | indent 2 }}
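
The datasource named template lives in the chart’s helpers; a rough sketch of what it might emit (the service name, namespace, and port are assumptions, the fields are standard Grafana datasource JSON):

{{- define "datasource" -}}
{
  "name": "presence-prometheus",
  "type": "prometheus",
  "access": "proxy",
  "url": "http://presence-prometheus.default.svc.cluster.local:9090",
  "isDefault": true
}
{{- end -}}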

To install Grafana and the dashboards with the release name gf in the namespace monitoring, run:

$ helm install --name=gf --namespace=monitoring ./grafana

The server_root_url value configures Grafana so that accessing it through kubectl proxy works; Grafana uses it for links and for loading resources in the pages it returns.

Assuming kubectl proxy is still running, Grafana is accessible in the browser at http://127.0.0.1:8001/api/v1/proxy/namespaces/monitoring/services/gf-grafana:80/

Navigating to the datasources page through the dropdown list in the top left will show that both Prometheus services are being used:

(Screenshot: Grafana data sources page)

Back on the dashboards page the top bar has a dropdown list of all available dashboards:

(Screenshot: Grafana dashboards dropdown)

The Postgres- and BEAM-specific dashboards are shown below:

(Screenshot: Postgres dashboard)

(Screenshot: BEAM dashboard)

In the future, the plan is for Grafana to have an Operator capable of handling datasource and dashboard configuration based on Kubernetes resources found in the cluster. This would be similar to how the ServiceMonitor resource is used to configure targets for Prometheus to scrape, and would remove the need to manually include the Helm chart and modify a copy of the dashboards before installing. The Presence repo will be updated for such changes when they are available.

Next Time

Being able to view metrics is great, but we don’t want to have to watch them 24/7 to catch problems. In an upcoming post, Prometheus’s Alertmanager will be configured through the Operator to alert on issues related to Kubernetes, Postgres, and Presence.
