grafana & prometheus production containerized monitoring - 4: monitoring k8s with kube-prometheus

Table of Contents

(1). About prometheus-operator and kube-prometheus

(2). Deploying kube-prometheus

1. Download the latest release

2. Containerized deployment

(3). Overview of the main kube-prometheus components

(4). Production-grade adaptation

1. Overview

2. Maintain your own copy of kube-prometheus

3. NodeSelector adaptation

4. grafana adaptation

5. Persistence adaptation

6. DingTalk alerting

6.1. Create a DingTalk alert robot

6.2. Configure DingTalk alerting

7. Ingress proxying

8. Project planning

9. Image localization

(5). Summary

(6). Related articles


(1). About prometheus-operator and kube-prometheus


In recent versions, the kubernetes deployment manifests that used to ship with prometheus-operator have been split out of the prometheus-operator github project into a separate project, kube-prometheus.


kube-prometheus is exactly the operator-based deployment of kubernetes cluster monitoring, so all we need to do is deploy kube-prometheus onto the cluster.


(2). Deploying kube-prometheus


1. Download the latest release


Older versions of prometheus-operator bundled kube-prometheus under contrib/kube-prometheus/manifests, but as of version 0.34 kube-prometheus has become a standalone project.




Go to the kube-prometheus releases page:

https://github.com/coreos/kube-prometheus/releases

Download the latest kube-prometheus release, v0.3.0 at the time of writing:

wget https://github.com/coreos/kube-prometheus/archive/v0.3.0.tar.gz
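
Then unpack the archive and move into it (the directory name below matches the default archive layout, which is also the prompt seen later in this article):

<code>tar -zxvf v0.3.0.tar.gz
cd kube-prometheus-0.3.0</code>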


2. Containerized deployment


From the kube-prometheus root directory, we simply need to apply every configuration file referenced by kustomization.yaml, which lists all of the related manifests:

<code>apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ./manifests/alertmanager-alertmanager.yaml
- ./manifests/alertmanager-secret.yaml
- ./manifests/alertmanager-service.yaml
- ./manifests/alertmanager-serviceAccount.yaml
- ./manifests/alertmanager-serviceMonitor.yaml
- ./manifests/grafana-dashboardDatasources.yaml
- ./manifests/grafana-dashboardDefinitions.yaml
- ./manifests/grafana-dashboardSources.yaml
- ./manifests/grafana-deployment.yaml
- ./manifests/grafana-service.yaml
- ./manifests/grafana-serviceAccount.yaml
- ./manifests/grafana-serviceMonitor.yaml
- ./manifests/kube-state-metrics-clusterRole.yaml
- ./manifests/kube-state-metrics-clusterRoleBinding.yaml
- ./manifests/kube-state-metrics-deployment.yaml
- ./manifests/kube-state-metrics-role.yaml
- ./manifests/kube-state-metrics-roleBinding.yaml
- ./manifests/kube-state-metrics-service.yaml
- ./manifests/kube-state-metrics-serviceAccount.yaml
- ./manifests/kube-state-metrics-serviceMonitor.yaml
- ./manifests/node-exporter-clusterRole.yaml
- ./manifests/node-exporter-clusterRoleBinding.yaml
- ./manifests/node-exporter-daemonset.yaml
- ./manifests/node-exporter-service.yaml
- ./manifests/node-exporter-serviceAccount.yaml
- ./manifests/node-exporter-serviceMonitor.yaml
- ./manifests/prometheus-adapter-apiService.yaml
- ./manifests/prometheus-adapter-clusterRole.yaml
- ./manifests/prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
- ./manifests/prometheus-adapter-clusterRoleBinding.yaml
- ./manifests/prometheus-adapter-clusterRoleBindingDelegator.yaml
- ./manifests/prometheus-adapter-clusterRoleServerResources.yaml
- ./manifests/prometheus-adapter-configMap.yaml
- ./manifests/prometheus-adapter-deployment.yaml
- ./manifests/prometheus-adapter-roleBindingAuthReader.yaml
- ./manifests/prometheus-adapter-service.yaml
- ./manifests/prometheus-adapter-serviceAccount.yaml
- ./manifests/prometheus-clusterRole.yaml
- ./manifests/prometheus-clusterRoleBinding.yaml
- ./manifests/prometheus-operator-serviceMonitor.yaml
- ./manifests/prometheus-prometheus.yaml
- ./manifests/prometheus-roleBindingConfig.yaml
- ./manifests/prometheus-roleBindingSpecificNamespaces.yaml
- ./manifests/prometheus-roleConfig.yaml
- ./manifests/prometheus-roleSpecificNamespaces.yaml
- ./manifests/prometheus-rules.yaml
- ./manifests/prometheus-service.yaml
- ./manifests/prometheus-serviceAccount.yaml
- ./manifests/prometheus-serviceMonitor.yaml
- ./manifests/prometheus-serviceMonitorApiserver.yaml
- ./manifests/prometheus-serviceMonitorCoreDNS.yaml
- ./manifests/prometheus-serviceMonitorKubeControllerManager.yaml
- ./manifests/prometheus-serviceMonitorKubeScheduler.yaml
- ./manifests/prometheus-serviceMonitorKubelet.yaml
- ./manifests/setup/0namespace-namespace.yaml
- ./manifests/setup/prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
- ./manifests/setup/prometheus-operator-0podmonitorCustomResourceDefinition.yaml
- ./manifests/setup/prometheus-operator-0prometheusCustomResourceDefinition.yaml
- ./manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
- ./manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
- ./manifests/setup/prometheus-operator-clusterRole.yaml
- ./manifests/setup/prometheus-operator-clusterRoleBinding.yaml
- ./manifests/setup/prometheus-operator-deployment.yaml
- ./manifests/setup/prometheus-operator-service.yaml
- ./manifests/setup/prometheus-operator-serviceAccount.yaml</code>


Run the following commands in order:

# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources

kubectl create -f manifests/setup

Then run: kubectl create -f manifests
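
For reference, the upstream kube-prometheus README suggests an explicit wait loop between the two steps, so the CRDs are registered before the rest of the stack is created; roughly:

<code># create the namespace and CRDs first
kubectl create -f manifests/setup
# wait until the ServiceMonitor CRD is served before continuing
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
# create the monitoring stack itself
kubectl create -f manifests/</code>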

[screenshot: pod status in the monitoring namespace, several pods Pending]


We can see that several pods are stuck in Pending; let's describe one of them to find out why:

kubectl describe -n monitoring pod prometheus-k8s-0

The reason is that no node satisfies the scheduling constraints; most likely the label required by nodeSelector does not exist on the node of my single-node cluster.


Verification:

The nodeSelector of prometheus-k8s-0 is: kubernetes.io/os: linux
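
This can be confirmed directly from the live pod spec, for example:

<code># print the nodeSelector of the pending pod
kubectl get pod prometheus-k8s-0 -n monitoring -o jsonpath='{.spec.nodeSelector}'</code>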

Check the node's labels:

<code>kubectl get nodes future --show-labels
NAME     STATUS   ROLES    AGE    VERSION   LABELS
future   Ready    master   107d   v1.13.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=future,node-role.kubernetes.io/master=

[root@future kube-prometheus-0.3.0]# kubectl get nodes future --show-labels | grep -i linux
future   Ready    master   107d   v1.13.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=future,node-role.kubernetes.io/master=</code>


The node has no label named "kubernetes.io/os", so we add one:

kubectl label nodes future kubernetes.io/os=linux

Alternatively you can modify the configuration files instead; in production this is a choice to make deliberately.


Check the pods again; everything is now running.


(3). Overview of the main kube-prometheus components


[Figure: main kube-prometheus components]

As the manifest list above already shows, the stack consists of the prometheus-operator itself plus Prometheus, Alertmanager, grafana, node-exporter, kube-state-metrics and prometheus-adapter.

It is also clear that as the cluster grows, the number of grafana/prometheus instances grows with it, and readable naming becomes very important.


(4). Production-grade adaptation


1. Overview


The official/open-source defaults still leave a few things to deal with before production use: keeping your own copy of the manifests, node scheduling (nodeSelector), the grafana pie-chart plugin, persistent storage, alert delivery (DingTalk), Ingress access, project/naming planning and image localization, all covered below.

There may well be more; I will extend the list as new issues come to mind.


2. Maintain your own copy of kube-prometheus


Since changes are needed to adapt it for production, there must be a place to keep the modified files and record their change history.

For example, the author's copy is kept at:

https://github.com/hepyu/k8s-app-config/tree/master/product/standard/kube-prometheus-pro/kube-prometheus-pro-0.3.0/manifests


3. NodeSelector adaptation



Add a dedicated label to the monitoring node:

kubectl label nodes future node.type=monitoring
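
For reference, the corresponding scheduling constraint in the manifests might then look roughly like this (a sketch using manifests/prometheus-prometheus.yaml; node.type: monitoring is the label just added, and the same idea applies to the other components you want to pin):

<code>spec:
  # schedule the Prometheus pods onto the dedicated monitoring node
  nodeSelector:
    node.type: monitoring</code>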


Then re-apply the files above and the pods schedule correctly.


4. grafana adaptation


Pie charts are not supported out of the box; the pie-chart plugin needs to be installed.


Modify the file:

manifests/grafana-deployment.yaml

Add the pie-chart plugin via the env section shown below:

<code>        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GF_INSTALL_PLUGINS
          value: "grafana-piechart-panel"</code>


Then redeploy grafana.
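
A minimal way to do that, assuming the env section above was added in place:

<code>kubectl apply -f manifests/grafana-deployment.yaml</code>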


5. Persistence adaptation


Modify prometheus-k8s to add PV-backed storage. Since this article runs on the author's own ECS instance, a local PV is used; in production, NAS-style cloud storage is recommended.

Initialize prometheus-k8s-pv; the configuration files are located at:

k8s-app-config/product/standard/kube-prometheus-pro/kube-prometheus-pro-0.3.0/manifests/custom_by_hepy

Make sure to create the corresponding local directories and set their permissions with chmod.
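
As an illustration only (the actual files live in the custom_by_hepy directory above), a local PV plus its no-provisioner StorageClass for this setup might look roughly like this; the host path and node name follow this article's environment:

<code>apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-k8s
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-k8s-0
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-k8s
  # local PV: pin the volume to a host directory on a specific node
  local:
    path: /datavip/k8s-data/prometheus-k8s-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - future</code>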


Then modify the file k8s-app-config/product/standard/kube-prometheus-pro/kube-prometheus-pro-0.3.0/manifests/prometheus-prometheus.yaml, adding the storage section shown below:

<code>spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  # add the following storage configuration
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-k8s
        resources:
          requests:
            storage: 100Gi</code>

Then re-apply prometheus-prometheus.yaml.


Verify the PVCs:

<code>[root@future manifests]# kubectl get pvc -n monitoring | grep -i k8s
prometheus-k8s-db-prometheus-k8s-0   Bound   prometheus-k8s-1   100Gi   RWO   prometheus-k8s   6m30s
prometheus-k8s-db-prometheus-k8s-1   Bound   prometheus-k8s-0   100Gi   RWO   prometheus-k8s</code>


Verify the data directory:

<code>ll /datavip/k8s-data/prometheus-k8s-0/
total 4
drwxrwsrwx 3 root 2000 4096 Dec 18 18:21 prometheus-db</code>


Important note:

In production, give each instance its own dedicated cloud storage volume so that shared storage cannot cause interference.


6. DingTalk alerting


6.1. Create a DingTalk alert robot


First create an ordinary DingTalk group, then open the group settings from the top-right corner.

Click "智能群助手" (the group assistant / bots entry).

Choose to add a robot.

For the robot type, choose: Custom (receive messages via Webhook).

Follow the wizard, note down the generated webhook address (and the access_token inside it), and finish.
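
To check the robot works before wiring it into alertmanager, you can post a test message to the webhook by hand (a sketch; replace the access_token with the one generated for your robot, and note that DingTalk may require the message to match the keyword/security setting configured on the robot):

<code>curl -s 'https://oapi.dingtalk.com/robot/send?access_token=xxxxxx' \
  -H 'Content-Type: application/json' \
  -d '{"msgtype": "text", "text": {"content": "monitoring: test message"}}'</code>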


6.2. Configure DingTalk alerting


By default kube-prometheus keeps the alertmanager configuration in a secret (not an approach I'm fond of), but we'll follow that convention here.


Create the DingTalk alerting bridge: dingtalk-webhook.yaml;

located at:

k8s-app-config/product/standard/kube-prometheus-pro/kube-prometheus-pro-0.3.0/manifests/custom_by_hepy

Its content is shown below; essentially it registers the DingTalk webhook address inside K8S:

<code>---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        run: dingtalk
    spec:
      containers:
      - name: dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v0.3.0
        imagePullPolicy: IfNotPresent
        # after creating the DingTalk group robot, replace the access_token below with your own
        args:
        - --ding.profile=default-webhook-dingtalk=https://oapi.dingtalk.com/robot/send?access_token=98f5b3db00fe696046c21a6eded40a94886f5e1a022e84a5d53aed371f93fa5e
        ports:
        - containerPort: 8060
          protocol: TCP

---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: monitoring
spec:
  ports:
  - port: 8060
    protocol: TCP
    targetPort: 8060
  selector:
    run: dingtalk
  sessionAffinity: None</code>


Create the alert receiver configuration alertmanager.yaml in the same directory, with the following content:

<code>global:
  resolve_timeout: 5m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: webhook
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://webhook-dingtalk.monitoring.svc.cluster.local:8060/dingtalk/default-webhook-dingtalk/send'
    send_resolved: true</code>


Go to the directory:

k8s-app-config/product/standard/kube-prometheus-pro/kube-prometheus-pro-0.3.0/manifests

Deploy the DingTalk bridge:

kubectl apply -f custom_by_hepy/dingtalk-webhook.yaml

Then replace the original alertmanager-main secret:

kubectl delete secret alertmanager-main -n monitoring

kubectl create secret generic alertmanager-main --from-file=custom_by_hepy/alertmanager.yaml -n monitoring
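
To double-check that the new configuration actually landed in the secret, you can decode it back out (the key name alertmanager.yaml comes from the --from-file above):

<code>kubectl get secret alertmanager-main -n monitoring \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d</code>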


This completes the DingTalk integration.


The figure below shows a sample DingTalk alert:

[screenshot: DingTalk alert message]


7. Ingress proxying


Expose grafana, prometheus and alertmanager through Ingress.


Go to the directory:

k8s-app-config/product/standard/kube-prometheus-pro/kube-prometheus-pro-0.3.0/manifests

Run the commands to deploy the grafana and prometheus ingress proxies:


kubectl apply -f custom_by_hepy/grafana-ingress.yaml

kubectl apply -f custom_by_hepy/prometheus-k8s-ingress.yaml
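
The actual ingress files live in custom_by_hepy; as an illustration only, a grafana Ingress for this setup might look roughly like the following (the host name comes from the URL used below, the resource name is hypothetical, and the API version matches the k8s 1.13 cluster used in this article):

<code>apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
spec:
  rules:
  - host: monitor-kubernetes.inc-inc.com
    http:
      paths:
      - path: /
        backend:
          # the grafana Service created by kube-prometheus listens on port 3000
          serviceName: grafana
          servicePort: 3000</code>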


Configure the host names locally (in /etc/hosts or your local DNS).
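
For example (a sketch; 1.2.3.4 stands in for the node IP where ingress-nginx is exposed, and 30834 below is its NodePort):

<code># /etc/hosts
1.2.3.4  monitor-kubernetes.inc-inc.com  prometheus-k8s.inc-inc.com</code>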


Access grafana:

http://monitor-kubernetes.inc-inc.com:30834/


[screenshot: grafana home with the built-in kubernetes dashboards]


Let's open one of them, for example Nodes:

[screenshot: Nodes dashboard]


The meaning of each dashboard is not covered here; a follow-up article will go through them.


Access prometheus:

http://prometheus-k8s.inc-inc.com:30834/graph


[screenshot: prometheus graph UI]


View the alert rules and their state:

[screenshot: prometheus Alerts page]


View the monitored targets; if you suspect some resource is not being scraped, this is the place to check:

[screenshot: prometheus Targets page]


8. Project planning


For larger kubernetes clusters, the monitoring topology needs planning at the project level, especially the naming conventions (a pod name should let you read off "Who, What, Why, When, Where" accurately, which in practice means preferring statefulsets).


Planning matters because different business lines run their own grafana/prometheus instances, and without planning this gets messy very quickly.


This article does not cover that planning in full, but it does touch on one point:


Namely, consolidating the kubernetes monitoring dashboards into the business grafana, so that all relevant engineers can see the state of the cluster. This matters: developers should gradually internalize the cloud-native way of working.


The approach is to export each dashboard's JSON definition as a separate file and load the files through grafana's provisioning mechanism.

This article provides a ready-made set of dashboard files (extracted from kube-prometheus v0.3.0), located at:

https://github.com/hepyu/k8s-app-config/tree/master/product/standard/grafana-prometheus-pro/grafana/provisioning/dashboards/kubernetes
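
A minimal provisioning provider that loads such a directory of JSON files might look like this (a sketch; the provider name and paths are assumptions, the mechanism itself is grafana's standard dashboard provisioning):

<code># /etc/grafana/provisioning/dashboards/kubernetes.yaml
apiVersion: 1
providers:
- name: 'kubernetes'
  orgId: 1
  folder: 'Kubernetes'
  type: file
  disableDeletion: false
  options:
    # directory containing the exported dashboard JSON files
    path: /var/lib/grafana/dashboards/kubernetes</code>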


The result looks like this:

[screenshot: kubernetes dashboards loaded into the business grafana]


For details and a hands-on walkthrough, see the related articles.


9. Image localization


This is obviously something that must be handled: the relevant docker images should be mirrored into your company's own image registry. The detailed procedure is covered in a separate article.
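
As a rough illustration of the idea (not the referenced article's exact method; registry.example.com stands in for your private registry):

<code># mirror one of the upstream images into the private registry
docker pull timonwong/prometheus-webhook-dingtalk:v0.3.0
docker tag  timonwong/prometheus-webhook-dingtalk:v0.3.0 registry.example.com/monitoring/prometheus-webhook-dingtalk:v0.3.0
docker push registry.example.com/monitoring/prometheus-webhook-dingtalk:v0.3.0

# then point the manifests at the mirrored image
sed -i 's#timonwong/prometheus-webhook-dingtalk:v0.3.0#registry.example.com/monitoring/prometheus-webhook-dingtalk:v0.3.0#' \
  custom_by_hepy/dingtalk-webhook.yaml</code>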


(5). Summary


This article provides a production-ready containerized configuration of kube-prometheus (v0.3.0), located at:

https://github.com/hepyu/k8s-app-config/tree/master/product/standard/kube-prometheus-pro/kube-prometheus-pro-0.3.0/manifests

(When using it, remember to switch the PVs to cloud storage.)

It includes all the production-grade adaptations covered in this article.


(6). Related articles




