「Kubernetes」- Deploying Grafana Loki (Log Monitoring)

  CREATED BY JENKINSBOT

Problem Description

We use Promtail to collect logs and Grafana Loki to store them, so that logs are managed centrally (i.e., a single Grafana Loki instance).

This note records how to deploy Grafana Loki in a Kubernetes cluster, together with solutions to the problems we ran into.

Solution

Step 1: Deploy the Loki cluster

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm show values grafana/loki-distributed > loki.helm-values.yaml
...(1) Adjust the S3 settings: the address is the endpoint domain, and that domain must NOT include the bucket name.
...(2) Adjust the Gateway settings: enable Ingress access and enable Basic Auth.
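A minimal sketch of these two adjustments as a separate override file, assuming the `loki.structuredConfig` and `gateway.ingress` / `gateway.basicAuth` keys found in recent `loki-distributed` chart versions (check them against your own `helm show values` output). The MinIO endpoint, bucket name, hostname and credentials are hypothetical placeholders. Depending on the chart version, `schema_config` and the boltdb-shipper `shared_store` may also need to point at `aws`; that part is not shown.

```shell
#!/usr/bin/env bash
# Write a hypothetical override file for grafana/loki-distributed.
# All hostnames, bucket names and credentials below are placeholders.
set -euo pipefail

cat > loki.override-values.yaml <<'EOF'
loki:
  structuredConfig:
    storage_config:
      aws:
        # Endpoint host (and port) only; the bucket name goes in bucketnames,
        # never in front of the endpoint host.
        endpoint: minio.example.com:9000
        bucketnames: loki-chunks
        access_key_id: CHANGE_ME
        secret_access_key: CHANGE_ME
        insecure: true            # plain HTTP to MinIO; drop when using TLS
        s3forcepathstyle: true    # path-style URLs, usually needed for MinIO
gateway:
  ingress:
    enabled: true
    hosts:
      - host: loki-gateway.example.com
        paths:
          - path: /
            pathType: Prefix
  basicAuth:
    enabled: true
    username: loki
    password: CHANGE_ME
EOF
```

Helm applies multiple `-f` files left to right (the rightmost wins), so the override can simply be appended to the install command: `-f loki.helm-values.yaml -f loki.override-values.yaml`.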

helm --namespace=loki                                                          \
    upgrade --install loki grafana/loki-distributed                            \
    --create-namespace                                                         \
    -f loki.helm-values.yaml

About the Ruler component:
Problem: msg="error running loki" err="mkdir /etc/loki/rules: read-only file system…
Solution: configure the ruler.directories parameter.
[loki] msg="error running loki" err="mkdir /rules: read-only file" · Issue #577 · grafana/helm-charts
[loki-distributed] Ruler pod won't start. err="mkdir /etc/loki/rules…" · Issue #537 · grafana/helm-charts
helm-charts/charts/loki-distributed at main · grafana/helm-charts
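A sketch of a `ruler.directories` value, assuming the layout shown in the commented example of the `loki-distributed` values file (tenant name, then file name, then rule content); the chart is assumed to render these into ConfigMaps mounted under the rules directory, so Loki no longer tries to create it on a read-only filesystem. The tenant `fake` (Loki's default tenant when `auth_enabled: false`) and the alert rule are hypothetical examples.

```shell
#!/usr/bin/env bash
# Hypothetical ruler values: one tenant ("fake") with one alerting rule file.
set -euo pipefail

cat > loki.ruler-values.yaml <<'EOF'
ruler:
  enabled: true
  directories:
    fake:
      rules.yaml: |
        groups:
          - name: example
            rules:
              - alert: HighLogErrorRate
                expr: sum(rate({namespace="loki"} |= "error" [5m])) > 10
                for: 10m
                labels:
                  severity: warning
EOF
```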

Step 2: Deploy a Grafana instance

helm show values grafana/grafana > grafana.helm-values.yaml
...(1) Adjust the Ingress settings.
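A minimal sketch of the Ingress adjustment for the `grafana/grafana` chart, written as a separate override file. The hostname is a placeholder, and `ingressClassName: nginx` assumes an NGINX ingress controller; adapt both to your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical Ingress override for the grafana/grafana chart.
set -euo pipefail

cat > grafana.override-values.yaml <<'EOF'
ingress:
  enabled: true
  ingressClassName: nginx        # assumption: an NGINX ingress controller
  path: /
  pathType: Prefix
  hosts:
    - grafana.example.com
EOF
```

Append `-f grafana.override-values.yaml` after `-f grafana.helm-values.yaml` in the install command so that it takes precedence.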

helm --namespace=loki upgrade --install loki-grafana grafana/grafana -f grafana.helm-values.yaml

kubectl get secret --namespace loki loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

# Add the Loki data source in Grafana:
# Basic Auth: <gateway username>:<gateway password>
# URL: http://loki-loki-distributed-gateway.loki.svc.cluster.local/
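Instead of adding the data source by hand, it can be provisioned through the chart's `datasources` value, assuming the standard Grafana data source provisioning format. The username and password are placeholders and must match the gateway Basic Auth settings; in practice the password is better kept in a Kubernetes Secret than in a values file.

```shell
#!/usr/bin/env bash
# Hypothetical provisioning of the Loki data source via grafana/grafana values.
set -euo pipefail

cat > grafana.datasource-values.yaml <<'EOF'
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki-loki-distributed-gateway.loki.svc.cluster.local/
        basicAuth: true
        basicAuthUser: loki          # placeholder, must match the gateway
        secureJsonData:
          basicAuthPassword: CHANGE_ME
EOF
```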

Step 3: Deploy Promtail

See: Loki/Promtail/2.Installing and Upgrading
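A minimal sketch of Promtail values that push logs through the Loki gateway, assuming a `grafana/promtail` chart version that accepts a `config.clients` list (older versions used a different key). The push path is Loki's standard `/loki/api/v1/push`; the credentials are placeholders matching the gateway Basic Auth.

```shell
#!/usr/bin/env bash
# Hypothetical Promtail values: push logs through the Loki gateway with Basic Auth.
set -euo pipefail

cat > promtail.helm-values.yaml <<'EOF'
config:
  clients:
    - url: http://loki-loki-distributed-gateway.loki.svc.cluster.local/loki/api/v1/push
      basic_auth:
        username: loki
        password: CHANGE_ME
EOF
```

Then install it alongside Loki with `helm --namespace=loki upgrade --install promtail grafana/promtail -f promtail.helm-values.yaml`.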

Step 4: Query logs

(Omitted.)
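Besides Grafana Explore, logs can be queried directly over Loki's HTTP API. A hypothetical helper, assuming the gateway forwards `/loki/api/v1/query_range` and that `LOKI_URL`, `LOKI_USER` and `LOKI_PASS` are exported by you (they are placeholders, not names from the chart).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a LogQL range query through the Loki gateway.
# Export the placeholders first, e.g.
#   export LOKI_URL=http://loki-loki-distributed-gateway.loki.svc.cluster.local
loki_query() {
  local logql="$1"
  # query_range defaults to the last hour when start/end are omitted.
  curl -fsS -G -u "${LOKI_USER}:${LOKI_PASS}" \
    "${LOKI_URL%/}/loki/api/v1/query_range" \
    --data-urlencode "query=${logql}" \
    --data-urlencode "limit=20"
}
```

Usage: `loki_query '{namespace="loki"} |= "error"'` returns the raw JSON response.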

Common Issues

Loki Cannot connect to Loki NoSuchKey The specified key does not exist. status code: 404, request id: xxx, host id:

Amazon S3 exception: “The specified key does not exist” – Stack Overflow
Resolving Amazon S3 404 NoSuchKey errors

Problem description:
When saving a new data source in Grafana, the following error appeared: Loki Cannot connect to Loki NoSuchKey The specified key does not exist. status code: 404, request id: 629CD87A3652D93638D8FEE2, host id:

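Cause analysis:
We checked the Querier logs and found what looked like an S3-related error. After reviewing the S3 configuration, setting aws.s3forcepathstyle: false resolved the issue.
There can be other causes as well: on 06/09/2022 we hit the same error again. This time the Compactor's cleanup period was too short, so chunks were requested whose log data could no longer be found.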

Solution:
Set the aws.s3forcepathstyle: false property.
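For the second cause (the Compactor deleting data too early), a sketch of the retention-related settings, assuming Loki 2.x compactor-based retention set through `loki.structuredConfig`. The durations are illustrative, not the values we used: `retention_period` should cover the time range you query, and `retention_delete_delay` is the grace period before chunks marked for deletion are actually removed.

```shell
#!/usr/bin/env bash
# Hypothetical retention settings; the durations are examples only.
set -euo pipefail

cat > loki.retention-values.yaml <<'EOF'
loki:
  structuredConfig:
    compactor:
      shared_store: aws
      retention_enabled: true
      # Grace period between marking chunks for deletion and deleting them.
      retention_delete_delay: 2h
    limits_config:
      # 744h = 31 days; queries older than this will find no chunks.
      retention_period: 744h
EOF
```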

AccessDenied: S3 API Request made to Console port

Problem description:

...
AccessDenied: S3 API Request made to Console port. S3 Requests should be sent to API port.
	status code: 403, request id: , host id:
failed to get s3 object
github.com/grafana/loki/pkg/storage/chunk/aws.(*S3ObjectClient).GetObject
	/src/loki/pkg/storage/chunk/aws/s3_storage_client.go:389
...

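Environment: S3 storage service provided by MinIO.

Cause analysis: misconfiguration. Loki must connect to port 9000 (the S3 API port), not port 9001 (the MinIO Console, i.e. management port).

Solution: change the configuration to use port 9000.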
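A crude pre-deploy check to catch this mistake, as a hypothetical helper: it assumes the endpoint is written on a single `endpoint:` line in the values file and that MinIO uses its default ports (9000 API, 9001 Console).

```shell
#!/usr/bin/env bash
# Hypothetical check: fail if any `endpoint:` line in a values file points at
# port 9001 (MinIO Console) instead of the S3 API port (9000 by default).
check_s3_endpoint_port() {
  local file="$1"
  if grep -Eq '^[[:space:]]*endpoint:.*:9001([^0-9]|$)' "$file"; then
    echo "ERROR: $file uses port 9001 (MinIO Console); use the API port 9000" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_s3_endpoint_port loki.helm-values.yaml && helm ... upgrade --install ...`.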

NoSuchBucket: The specified bucket does not exist

Grafana Loki/Getting started

Problem description:

...
NoSuchBucket: The specified bucket does not exist
	status code: 404, request id: 16FE7D94B41B007D, host id:
failed to get s3 object
github.com/grafana/loki/pkg/storage/chunk/aws.(*S3ObjectClient).GetObject
	/src/loki/pkg/storage/chunk/aws/s3_storage_client.go:389
github.com/grafana/loki/pkg/storage/stores/shipper/storage.prefixedObjectClient.GetObject
	/src/loki/pkg/storage/stores/shipper/storage/prefixed_object_client.go:25
...

Environment: S3 storage service provided by MinIO.

Cause analysis: the aws.s3forcepathstyle: true property must be enabled (we only guessed this after reading Grafana Loki/Getting started).

Solution: set the aws.s3forcepathstyle: true property.
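To illustrate what `s3forcepathstyle` changes, a small hypothetical helper that prints the URL an S3 client would request for the same object in each style. With virtual-hosted style the bucket is moved into the hostname, which a single MinIO endpoint typically does not route as a bucket, which is a likely source of NoSuchBucket; with path style the bucket is the first path segment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the object URL an S3 client would request.
#   $1 = endpoint (host[:port]), $2 = bucket, $3 = object key,
#   $4 = "true" for path style (s3forcepathstyle: true), anything else for virtual-hosted style
s3_object_url() {
  local endpoint="$1" bucket="$2" key="$3" path_style="$4"
  if [ "$path_style" = "true" ]; then
    # Path style: the bucket is the first path segment.
    printf 'http://%s/%s/%s\n' "$endpoint" "$bucket" "$key"
  else
    # Virtual-hosted style: the bucket is prepended to the hostname.
    printf 'http://%s.%s/%s\n' "$bucket" "$endpoint" "$key"
  fi
}
```

For example, `s3_object_url minio.example.com:9000 loki-chunks index/x true` prints `http://minio.example.com:9000/loki-chunks/index/x`, while the same call with `false` prints `http://loki-chunks.minio.example.com:9000/index/x`.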

References

Installation | Grafana Loki documentation
Microservices deployment with Helm | Grafana Loki documentation
Promtail/Scraping | Grafana Loki documentation
Configuration | Grafana Loki documentation