January 3, 2021 03:34 pm GMT

Kubernetes Service Monitoring and Observability

A common problem that I've run across throughout my career is supporting applications once they have gone to production. When supporting a service in production, it is important to be able to identify where things went wrong and how to fix them before customers or end users are impacted.
Application logs are usually the first place I check when I'm notified of a production issue. Seems simple enough, right? Open a file, look for the words "exception" or "error", and backtrack from there. The only problem is that this approach to troubleshooting becomes unsustainable in today's environments, where a service can be composed of decentralized, distributed, constituent applications.
How would you go about monitoring multiple services (e.g. a microservice architecture) without adding cognitive load to the troubleshooting process?

Monitoring and Kubernetes

Although there are multiple ways to perform monitoring, depending on which platform you're using and what tools are available for it (e.g. GCP, Azure, AWS), I'm going to focus on how to perform monitoring on a Kubernetes cluster.
Fortunately, when working in the context of a Kubernetes cluster, there are a fair number of open source projects which can aid in providing observability and monitoring for services.
For ease of use, I'll be using Linkerd. Linkerd is a service mesh and is fairly easy to set up. Additionally, with Linkerd, you get Prometheus and Grafana included for FREE!

What is a Service Mesh?

In the context of a Kubernetes cluster, a service mesh is a collection of applications, services, and custom resources which provide observability, scalability, and resiliency for applications in your cluster. Linkerd specifically uses a pod sidecar called Linkerd Proxy, which proxies the network calls to and from your services and exposes metrics about them. These metrics are collected by Prometheus, a tool for storing and querying metrics and evaluating alert rules. Grafana uses Prometheus as a data source and presents the metrics as charts and graphs, which makes them easier to digest.
Additionally, Grafana can use Loki, a log aggregation system, as a data source, allowing you to search through logs with relative ease.
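
To make the metrics path a little more concrete, here is a minimal sketch of the kind of request that can be sent to Prometheus's HTTP query API (`/api/v1/query`). The Prometheus address is the in-cluster URL that appears in the Grafana datasource later in this article. The metric name `request_total`, the `files` and `owncloud` label values, and the helper name `instant_query_url` are my assumptions for illustration, so check them against what your own proxies actually report.

```python
from urllib.parse import urlencode

# In-cluster Prometheus address, as used by the Grafana datasource in this article.
PROMETHEUS_URL = "http://linkerd-prometheus.linkerd.svc.cluster.local:9090"


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for Prometheus's instant-query endpoint (/api/v1/query)."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


# Assumed example: per-second request rate for one deployment over the last minute.
promql = 'sum(rate(request_total{namespace="files", deployment="owncloud"}[1m]))'
print(instant_query_url(promql))
```

The snippet only builds the URL. Sending it requires network access to the cluster, for example through a port-forward.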

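The flow for application metrics will resemble the following: Linkerd Proxy -> Prometheus -> Grafana.

Additionally, an application log flow will look like the following: application pods -> Logging Operator (FluentBit, FluentD) -> Loki -> Grafana.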

Setting up Linkerd

Assuming you have your Kubernetes cluster all set up, setting up Linkerd will be relatively straightforward. Linkerd provides a command-line interface (CLI) tool which makes managing Linkerd pretty easy.

Install the CLI and deploy Linkerd to your cluster with the following commands:

curl -sL https://run.linkerd.io/install | sh
# Add Linkerd to path
export PATH=$PATH:$HOME/.linkerd2/bin
# Deploy Linkerd to your cluster
linkerd install | kubectl apply -f -

From here, you'll be able to access Grafana from the Linkerd Dashboard:

# Open a browser window to the linkerd dashboard
linkerd dashboard

Kubernetes resources can be added to the Linkerd mesh with the linkerd.io/inject annotation, applied to either a Namespace or a Deployment:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    linkerd.io/inject: enabled

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      # For a workload, the annotation goes on the pod template
      annotations:
        linkerd.io/inject: enabled
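
Where the annotation sits in the manifest matters, so here is a small sketch of the placement as I understand it: on a Namespace it goes under the object's own metadata, and on a Deployment it goes under the pod template's metadata. The function name `with_inject_annotation` and the sample names `files` and `owncloud` are hypothetical. This is only an illustration on plain dictionaries, not how the Linkerd CLI itself works.

```python
import copy

INJECT_ANNOTATION = "linkerd.io/inject"


def with_inject_annotation(manifest: dict) -> dict:
    """Return a copy of a Namespace or Deployment manifest with injection enabled."""
    result = copy.deepcopy(manifest)
    if result.get("kind") == "Deployment":
        # For a workload, the annotation belongs on the pod template.
        template = result.setdefault("spec", {}).setdefault("template", {})
        metadata = template.setdefault("metadata", {})
    else:
        # For a Namespace, the annotation belongs on the object's own metadata.
        metadata = result.setdefault("metadata", {})
    metadata.setdefault("annotations", {})[INJECT_ANNOTATION] = "enabled"
    return result


# Hypothetical manifests, reduced to the fields that matter here.
namespace = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "files"}}
deployment = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "owncloud"}}

print(with_inject_annotation(namespace)["metadata"])
print(with_inject_annotation(deployment)["spec"]["template"]["metadata"])
```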

Logging Operator

Logging Operator is a project created by Banzai Cloud which uses FluentD and FluentBit to perform log discovery.

We begin by deploying the Logging Operator onto our Kubernetes cluster. Fortunately, there's a Helm chart which makes the deployment easier.

# Create a logging namespace
apiVersion: v1
kind: Namespace
metadata:
  name: logging

# Add the helm repo
helm repo add banzaicloud-stable https://kubernetes-charts.banzaicloud.com
# Install the helm chart
helm upgrade --install --wait --create-namespace --namespace logging logging-operator banzaicloud-stable/logging-operator \
  --set createCustomResource=false

Deploying Loki

Now that Logging Operator is in place to set up log discovery across our various sources, it's time to set up Loki and Grafana.
The Grafana community maintains a Helm chart for building a Loki stack, which can be installed with Terraform:

resource "helm_release" "rel_logging_loki" {
  repository = "https://grafana.github.io/helm-charts"
  chart      = "loki-stack"
  name       = "loki"
  namespace  = "logging"

  set {
    name  = "promtail.enabled"
    value = "true"
  }

  set {
    name  = "loki.enabled"
    value = "true"
  }
}

Linkerd will need to be updated so that its Grafana instance can talk to Loki.
To perform the setup, we'll use kustomize, which is built into kubectl, to patch the new Grafana configuration into Linkerd's Grafana instance.

grafana.yml

kind: ConfigMap
apiVersion: v1
metadata:
  name: linkerd-grafana-config
data:
  datasources.yaml: |-
    apiVersion: 1
    datasources:
    - name: prometheus
      type: prometheus
      access: proxy
      orgId: 1
      url: http://linkerd-prometheus.linkerd.svc.cluster.local:9090
      isDefault: false
      jsonData:
        timeInterval: "5s"
      version: 1
      editable: true
    - name: Loki
      type: loki
      access: proxy
      editable: false
      # Grafana's provisioning key is isDefault, not default
      isDefault: true
      url: http://loki.logging:3100
      # The line limit for the Loki datasource is set under jsonData
      jsonData:
        maxLines: 300

Set up a kustomization.yml:

resources:
- linkerd.yml
patchesStrategicMerge:
- grafana.yml

Now we can dump our current Linkerd config and run kubectl kustomize to patch the Grafana configuration from the previous step into Linkerd:

linkerd upgrade > linkerd.yml
kubectl kustomize | kubectl apply -f -

Setting up Logging Operator to Stream to Loki

The last step we'll have to complete is setting up the logging operator to stream to Loki.
The Logging Operator defines the following custom resources, which are used to watch containers and transport logs to a target destination:

Logging - Defines the logging infrastructure (the FluentBit collectors and the FluentD instance) that collects and transports the logs.
Output - Specifies a destination for log outputs. These resources can also be established cluster-wide as a ClusterOutput.
Flow - Selects which logs to process, specifies the patterns used to parse log entries, and routes the results to Output resources. These resources can also be established cluster-wide as a ClusterFlow.

Specify a ClusterOutput to send logs to Loki:

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: loki-output
  namespace: logging
spec:
  loki:
    url: http://loki:3100
    configure_kubernetes_labels: true
    buffer:
      timekey: 1m
      timekey_wait: 30s
      timekey_use_utc: true

For each Pod/Deployment you want to monitor, you're going to need to set up a Logging and a Flow:

# Set up the Logging object
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: files-logger
  namespace: files
spec:
  fluentd: {}
  fluentbit: {}
  controlNamespace: logging
---
# Set up the Flow object
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: files-flow
  namespace: files
spec:
  globalOutputRefs:
  # Must match the name of the ClusterOutput defined above
  - loki-output
  filters:
    - tag_normaliser: {}
    - parser:
        remove_key_name_field: true
        reserve_data: true
        parse:
          type: multi_format
          patterns:
            - format: regexp
              expression: '/^(?<time>[^\]]*) \[(?<level>[^ ]*)\] (?<source>[^\":]*): (?<message>.*)$/'
              time_key: logtime
              time_format: '%Y-%m-%dT%H:%M:%S.%LZ'
            - format: regexp
              expression: '/^time="(?<time>[^\]]*)" level=(?<level>[^ ]*) msg="(?<message>[^\"]*)"/'
              time_key: time
              time_format: '%Y-%m-%dT%H:%M:%SZ'
            - format: regexp
              expression: '/^level=(?<level>[^ ]*) ts=(?<time>[^\]]*) caller=(?<source>.*) msg="(?<message>[^\"]*)"/'
              time_key: time
              time_format: '%Y-%m-%dT%H:%M:%S.%LZ'
            - format: regexp
              expression: '^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$'
              time_key: time
              time_format: '%d/%b/%Y:%H:%M:%S %z'
  match:
  - select:
      labels:
        app: owncloud

The Flow object in this example uses FluentD regular expressions to parse log streams. The expressions can be evaluated and debugged using Fluentular.
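
If you'd rather check a pattern locally before deploying it, a rough sketch like the one below can help. It ports the first two expressions from the Flow to Python, which means two translations on my part: Python writes named groups as `(?P<name>...)` where FluentD's Ruby regexes use `(?<name>...)`, and Ruby's `%L` (milliseconds) becomes `%f`. The sample log lines and the `parse_line` helper are made up for illustration. Trying the patterns in order and taking the first one that fits is my understanding of how multi_format behaves.

```python
import re
from datetime import datetime

# Each entry pairs a regex with the timestamp format expected in its "time" group.
PATTERNS = [
    # <time> [<level>] <source>: <message>
    (re.compile(r'^(?P<time>[^\]]*) \[(?P<level>[^ ]*)\] (?P<source>[^":]*): (?P<message>.*)$'),
     "%Y-%m-%dT%H:%M:%S.%fZ"),
    # time="<time>" level=<level> msg="<message>"
    (re.compile(r'^time="(?P<time>[^\]]*)" level=(?P<level>[^ ]*) msg="(?P<message>[^"]*)"'),
     "%Y-%m-%dT%H:%M:%SZ"),
]


def parse_line(line: str):
    """Return the fields from the first pattern that fits the line, or None."""
    for pattern, time_format in PATTERNS:
        match = pattern.match(line)
        if not match:
            continue
        record = match.groupdict()
        try:
            record["time"] = datetime.strptime(record["time"], time_format)
        except ValueError:
            continue  # the timestamp does not fit this pattern; try the next one
        return record
    return None


# Made-up sample lines, one per pattern.
print(parse_line("2021-01-03T15:34:00.123Z [INFO] main: Server started"))
print(parse_line('time="2021-01-03T15:34:00Z" level=info msg="Starting server"'))
```

A line that returns None here is a good hint that the expression or the time format needs another look.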

The globalOutputRefs section links the Flow resource to one or more ClusterOutput resources by name. The output then routes the logs to the indicated destination.

The match section indicates what Kubernetes resources the Flow resource will look for in order to obtain logs. In this example, the Flow resource will look for any pods with the label app: owncloud.
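
As a mental model for the select rule, the sketch below treats a pod as selected when it carries every label listed in the selector, and extra labels on the pod don't matter. That "all listed labels must be present" behaviour is my assumption about how the operator evaluates the rule, and the `selects` helper and the `tier` label are hypothetical.

```python
def selects(selector_labels: dict, pod_labels: dict) -> bool:
    """True when the pod carries every label listed in the selector."""
    return all(pod_labels.get(key) == value for key, value in selector_labels.items())


selector = {"app": "owncloud"}

print(selects(selector, {"app": "owncloud", "tier": "web"}))
print(selects(selector, {"app": "keycloak"}))
```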

If everything is set up right, you will be able to search the logs in Grafana.
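
The same search can be pointed at Loki directly, which is handy when checking whether logs are arriving at all. The sketch below builds a LogQL query and a URL for Loki's `query_range` HTTP endpoint, using the Loki address from the Grafana datasource above. I'm assuming the pod label `app` ends up as a Loki label, which may not hold for your setup. The helper names are hypothetical, and label values containing quotes are not escaped.

```python
from urllib.parse import urlencode

# Loki address as used by the Grafana datasource in this article.
LOKI_URL = "http://loki.logging:3100"


def build_logql(labels: dict, contains: str = "") -> str:
    """Build a LogQL stream selector, optionally filtered on a substring."""
    selector = ", ".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    query = "{" + selector + "}"
    if contains:
        query += f' |= "{contains}"'
    return query


def query_range_url(logql: str, limit: int = 300) -> str:
    """Build a URL for Loki's /loki/api/v1/query_range endpoint."""
    params = urlencode({"query": logql, "limit": limit})
    return f"{LOKI_URL}/loki/api/v1/query_range?{params}"


logql = build_logql({"app": "owncloud"}, contains="error")
print(logql)
print(query_range_url(logql))
```

The LogQL string on its own can also be pasted into Grafana's Explore view with the Loki datasource selected.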

What's Left?

So far I've covered how to set up Linkerd and stream logs to Grafana. An improvement to this setup would be deploying Prometheus Alertmanager to create and manage alerts so that you can receive notifications through third-party services like Slack and PagerDuty.

References

Customizing Linkerd's Configuration - https://linkerd.io/2/tasks/customize-install/
Logging Operator Quickstart Guide (Loki) - https://banzaicloud.com/docs/one-eye/logging-operator/quickstarts/loki-nginx/
https://itnext.io/part-4-operations-and-the-cloud-native-stack-in-action-bb17d9f0ff5

Outtakes

This article is the result of roughly 3 months of experimenting in my home lab. Instead of Grafana and Loki, I had originally attempted to perform log streaming using the Elastic Stack, EFK (Elasticsearch, FluentD, and Kibana).
I'm using Keycloak as an identity provider and had attempted an initial configuration using Kibana and OpenID Connect (OIDC) based authentication; however, the OIDC plugin is only available on the Platinum tier of Elastic. Disabling the xpack.security.enabled setting broke the Elastic applications.
Attempting to use a keycloak-kibana plugin also caused Kibana to fail to start.

With Loki and Grafana, I'm able to set up a Kubernetes Ingress with OAuth2 authentication as a way to secure access to my services.

Original Link: https://dev.to/mikeyglitz/kubernetes-service-monitoring-and-observability-3g67
