Upgrading Elasticsearch from 5.6 to 6.3 in a Kubernetes cluster and enabling monitoring

The example in the official Kubernetes GitHub repo builds its image on the open-source OSS edition of ES, which does not include the X-Pack features. In ES 6.x, part of X-Pack is available for free, most notably monitoring and the SQL interpreter; for the differences between editions, see here. In this article we therefore build on the official ES Docker image instead. ES is an I/O-heavy application, and putting its data on block storage such as Ceph hurts performance, so we use local volumes for persistent storage.

Preparation

Preparing the images

Given the network conditions in China, and the security concerns of letting servers reach the public internet, you can first pull the images and push them to a private registry:

docker pull docker.elastic.co/elasticsearch/elasticsearch:6.3.2

docker tag docker.elastic.co/elasticsearch/elasticsearch:6.3.2 10.168.136.193:5000/elasticsearch/elasticsearch:6.3.2

docker push 10.168.136.193:5000/elasticsearch/elasticsearch:6.3.2



docker pull docker.elastic.co/kibana/kibana:6.3.2

docker tag docker.elastic.co/kibana/kibana:6.3.2 10.168.136.193:5000/kibana/kibana:6.3.2

docker push 10.168.136.193:5000/kibana/kibana:6.3.2

Reindexing data from the old version

In our system, ES only stores log files, so the reliability requirements are low and there are no API compatibility concerns. If your business system is more critical, follow the official advice and at least consider the following points:

  • Review the breaking changes for changes that affect your application.
  • Check the deprecation log to see if you are using any deprecated features.
  • If you use custom plugins, make sure compatible versions are available.
  • Test upgrades in a dev environment before upgrading your production cluster.
  • Back up your data before upgrading. You cannot roll back to an earlier version unless you have a backup of your data.

Among these points: if your cluster contains indices created under 2.x, they must be reindexed on 5.x before the upgrade. Without the reindex, 6.3 will refuse to start.
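To spot indices that still carry a 2.x creation version, you can inspect index.version.created (a sketch; filter_path just trims the response down to that one setting):

curl -s '172.30.52.2:9200/_all/_settings?filter_path=*.settings.index.version.created&pretty'

Values starting with 2 belong to indices created under 2.x.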

Use the following command to list all index names:

curl -s  172.30.52.2:9200/_cat/indices |sort -k3

Write the names of the indices that need reindexing into oldindex.txt, one per line, then fetch the original mapping:

curl -H 'Accept: application/json' "172.30.52.2:9200/logstash-2018.10.27"

Keep only the JSON node under the index name and save it to mappings.json.
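If jq is installed, extracting the node can be scripted; a sketch below keeps only the mappings, which also avoids trying to PUT back read-only settings such as index.uuid:

# hypothetical helper: wrap the index's mappings as a create-index body
curl -s '172.30.52.2:9200/logstash-2018.10.27' \
  | jq '{mappings: .["logstash-2018.10.27"].mappings}' > mappings.json

Next, create a reindex JSON template, reindex.json: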

{
  "source": {
    "index": "old_name"
  },
  "dest": {
    "index": "new_name"
  }
}

Below is the reindexing script I wrote; it creates the mapping and submits the reindex tasks:

while read i; do
  echo "$i"
  # drop any leftover target index from a previous run
  curl -XDELETE 172.30.52.2:9200/${i}_5.6
  # create the target index with the saved mapping
  curl -XPUT -H 'Content-Type: application/json' "172.30.52.2:9200/${i}_5.6?pretty" -d @./mappings.json
  # substitute the real index names into the template
  sed -e "s/old_name/${i}/g" ./reindex.json | sed -e "s/new_name/${i}_5.6/g" > reindex_instance.json
  # submit the reindex as a background task and record its task id
  curl -s -d @./reindex_instance.json -XPOST -H 'Content-Type: application/json' '172.30.52.2:9200/_reindex?wait_for_completion=false&pretty' >> tasks
done < ./oldindex.txt
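If cluster load is a concern, the _reindex API also accepts a requests_per_second throttle (a sketch; the value of 500 is illustrative):

# throttle the copy to roughly 500 docs/sec
curl -s -XPOST -H 'Content-Type: application/json' \
  '172.30.52.2:9200/_reindex?wait_for_completion=false&requests_per_second=500&pretty' \
  -d @./reindex_instance.json >> tasks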

Decide how many indices to reindex at a time based on cluster load; if the load is too high, nodes will GC frequently and the tasks will eventually time out. You can check task progress with the following script:

for t in $(grep task tasks | awk '{print $3}' | sed -e 's/"//g'); do
  echo -n "$t "
  # print the completed flag, total docs, and docs created so far
  curl -s 172.30.52.2:9200/_tasks/$t | jq '.completed, .response.total, .response.created'
done

After all the tasks have finished, be sure to re-check that the document counts of the old and new indices match.

curl -s 172.30.52.2:9200/_cat/indices |sort -k3
Check the seventh column (docs.count).
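A small sketch to compare the counts programmatically (uses _cat/count, whose third column is the document count):

while read i; do
  old=$(curl -s "172.30.52.2:9200/_cat/count/${i}" | awk '{print $3}')
  new=$(curl -s "172.30.52.2:9200/_cat/count/${i}_5.6" | awk '{print $3}')
  [ "$old" = "$new" ] && echo "$i OK ($old)" || echo "$i MISMATCH: $old vs $new"
done < ./oldindex.txt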

If everything checks out, you can delete the old indices:

while read i; do
  echo "$i"
  curl -XDELETE 172.30.52.2:9200/${i}
done < ./oldindex.txt

Finally, flush the data to disk:

curl -XPOST '172.30.52.2:9200/*/_flush'

Moving the data directory

Shut down all the nodes.

In ES 5.x the on-disk layout is ${path.data}/cluster-name/nodes/0; in 6.x it moved to ${path.data}/nodes/0, so the directory has to move up one level:

cd ${path.data}
mv kubernetes-logging/nodes ./
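A quick sanity check (assuming the default node ordinal 0 of a single data path):

# the nodes directory should now sit directly under path.data
ls ${path.data}/nodes/0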

Deploying ES

Create the Service required by the StatefulSet:

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
  selector:
    k8s-app: elasticsearch-logging

kubectl apply -f ./es-service.yaml

---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  serviceName: elasticsearch-logging
  replicas: 3
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: elasticsearch
      tolerations:
      - key: fuck
        operator: "Exists"
        effect: NoExecute
      containers:
      - image: 10.168.136.193:5000/elasticsearch/elasticsearch:6.3.2
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 16000m
            memory: 40Gi
          requests:
            cpu: 4000m
            memory: 30Gi
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: es-persistent-storage
          mountPath: /data
        #- name: jvm-options-volume
        #  mountPath: /usr/share/elasticsearch/config/jvm.options
        #  subPath: jvm.options
        #  readOnly: true
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: "ES_JAVA_OPTS"
          value: "-Xms30g -Xmx30g -XX:-UseConcMarkSweepGC -XX:-UseCMSInitiatingOccupancyOnly -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=75 -Xlog:gc*:stdout:utctime,pid,tags"
        - name: "cluster.name"
          value: "kubernetes-logging"
        - name: POD_NAME_WRAPPER
          valueFrom:
            fieldRef:
              fieldPath: "metadata.name"
        - name: "node.name"
          value: "$(POD_NAME_WRAPPER)"
        - name: "node.master"
          value: "true"
        - name: "node.data"
          value: "true"
        - name: "transport.tcp.port"
          value: "9300"
        - name: "http.port"
          value: "9200"
        - name: "path.data"
          value: "/data"
        - name: "network.host"
          value: "0.0.0.0"
        - name: "discovery.zen.minimum_master_nodes"
          value: "2"
        - name: "discovery.zen.ping.unicast.hosts"
          value: "elasticsearch-logging-0.elasticsearch-logging,elasticsearch-logging-1.elasticsearch-logging"
        - name: "processors"
          valueFrom:
            resourceFieldRef:
              containerName: elasticsearch-logging
              resource: limits.cpu
        - name: "xpack.security.enabled"
          value: "false"
        - name: "xpack.ml.enabled"
          value: "false"
        - name: "xpack.monitoring.collection.enabled"
          value: "true"
        - name: "xpack.license.self_generated.type"
          value: "basic"
      #volumes:
      #- name: es-persistent-storage
      #  hostPath:
      #    path: /home/dockerdata/elasticsearch
      #- name: jvm-options-volume
      #  configMap:
      #    name: es-jvm-options
      # Elasticsearch requires vm.max_map_count to be at least 262144.
      # If your OS already sets up this number to a higher value, feel free
      # to remove this init container.
      initContainers:
      - image: 10.168.136.193:5000/busybox:1
        command: ["/bin/sysctl", "-w", "vm.max_map_count=262144"]
        name: elasticsearch-logging-init
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: es-persistent-storage
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: local-storage
      resources:
        requests:
          storage: 512Gi
      selector:
        matchLabels:
          app: elastic
          disk-type: hdd
Unlike the official k8s config, discovery.zen.minimum_master_nodes here does not depend on the elasticsearch_logging_discovery helper; instead the cluster discovers itself through the stable network identities the StatefulSet provides, resolved by CoreDNS. (Note: per-pod DNS records of the form pod-name.service-name are only published for headless services, so depending on your DNS setup the governing Service may need clusterIP: None.) The remaining settings take the values from the official ES yaml, injected as environment variables.
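A quick way to confirm the names resolve (a sketch; assumes the pods are already running and that the image provides getent, which the CentOS-based ES image does):

# resolve a sibling pod's stable DNS name from inside pod 0
kubectl -n kube-system exec elasticsearch-logging-0 -- \
  getent hosts elasticsearch-logging-1.elasticsearch-logging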

Adjust the CPU and memory limits, and the GC-related JVM flags in ES_JAVA_OPTS, to your actual needs. Any setting from elasticsearch.yml can be injected as an environment variable and takes effect directly.

kubectl apply -f ./es-statefulset.yaml
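Once the pods are up (they also need the local volumes created below), a quick way to verify the injected settings is _cat/nodes (a sketch; the IP is the service address used throughout this article):

curl -s '172.30.52.2:9200/_cat/nodes?v&h=name,master,node.role,heap.max'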

After creating the StatefulSet, create the local volumes one at a time: wait until the first container has started before creating the second, otherwise the ES pod ordinals may end up bound to the wrong disk/host. Adjust the directory names and IP addresses to your environment.

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elastic-hdd-0
  labels:
    app: elastic
    disk-type: hdd
spec:
  capacity:
    storage: 1Ti
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/hdd0/elasticsearch-0/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - "10.174.100.6"

kibana

For Kibana, the official k8s config files can be used as a reference as-is.
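If you use the image pushed to the private registry earlier, usually the only setting that needs attention is the ES endpoint; Kibana's Docker image maps upper-cased environment variables onto kibana.yml settings. A minimal fragment (container name is an assumption, not a full manifest):

containers:
- image: 10.168.136.193:5000/kibana/kibana:6.3.2
  name: kibana-logging
  env:
  - name: ELASTICSEARCH_URL
    value: http://elasticsearch-logging:9200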

Activating the basic license

On a fresh install the basic license is activated automatically. If you upgraded from an older version, the cluster very likely already has a trial license, in which case you need to switch to basic via the REST API:

curl -XPOST  -H "Content-Type: application/json" 172.30.52.2:9200/_xpack/license/start_basic
# this asks for acknowledgement; just run the following directly
curl -XPOST -H "Content-Type: application/json" 172.30.52.2:9200/_xpack/license/start_basic?acknowledge=true
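Afterwards you can confirm that the cluster reports a basic license:

curl -s 172.30.52.2:9200/_xpack/license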