This repository has been archived by the owner on Nov 7, 2018. It is now read-only.

Re-election takes over 30 seconds when deleting master pod (but fast when killing the process directly) #231

Open
nabadger opened this issue Sep 19, 2018 · 2 comments

nabadger commented Sep 19, 2018

Hi,

I've been struggling to understand what's causing this, so I wonder if you can offer any help. I can reproduce it across various kubernetes-elasticsearch repos (including the operators) and on various clusters.

I'd really like to know if this is expected behaviour or not...

My Configuration:

I've set up ES using the example in the README (this is a 3-node Kubernetes cluster running v1.11.3):

kubectl create -f es-discovery-svc.yaml
kubectl create -f es-svc.yaml
kubectl create -f es-master.yaml
kubectl rollout status -f es-master.yaml

kubectl create -f es-ingest-svc.yaml
kubectl create -f es-ingest.yaml
kubectl rollout status -f es-ingest.yaml

kubectl create -f es-data.yaml
kubectl rollout status -f es-data.yaml

This all works fine and brings up the ES cluster as expected.

I monitor the state of the ES master by exec-ing into an ingest pod (kubectl exec ...) and running:

watch curl localhost:9200/_cat/nodes
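
Because watch only shows the latest result, it is hard to read the length of an outage from it. A minimal sketch of a timestamped poller follows, assuming Python is available somewhere that can reach port 9200 (inside the pod or through a port-forward). The helper names probe, poll and outage_windows are hypothetical, and the one-second probe timeout is an arbitrary choice.

```python
import http.client
import time
import urllib.request

def probe(url, timeout=1.0):
    """Return True if the URL answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False

def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (start, end) windows of failed probes.

    A window opens at the first failed sample and closes at the next successful
    one. A window still open when the samples run out gets end=None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows

def poll(url="http://localhost:9200/_cat/nodes", interval=0.5, duration=120.0):
    """Probe `url` every `interval` seconds for `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), probe(url)))
        time.sleep(interval)
    return samples

if __name__ == "__main__":
    for start, end in outage_windows(poll()):
        if end is None:
            print(f"unavailable from {start:.1f}, still down when polling stopped")
        else:
            print(f"unavailable for {end - start:.1f}s")
```

Start it, then kill the process or delete the pod. Each printed window is accurate only to about the poll interval plus the probe timeout.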

I then kubectl exec into the pod running the ES master and run kill 1 (the Java process).

This starts the master re-election process straight away, and a new master is typically elected in 2-3 seconds (expected, right?).

If, on the other hand, I delete the pod that is running the master (kubectl delete pod <master pod>), re-election always takes over 30 seconds.

At this point the curl command also hangs until the new master is available. I don't think this is expected, as it essentially means the cluster is unusable in the meantime.
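
One way to put numbers on the two cases is to read them off the log timestamps posted further down. The sketch below measures the gap between the earliest master_left line and the earliest zen-disco-elected-as-master line. The function names are hypothetical, and the sample lines are copies of lines from those logs, truncated after the marker word.

```python
import re
from datetime import datetime

# Matches the Elasticsearch log timestamp, e.g. [2018-09-19T17:54:56,501]
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]")

def parse_timestamp(line):
    """Return the first log timestamp on the line as a datetime, or None."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f")

def election_gap(lines):
    """Seconds from the earliest 'master_left' to the earliest election line.

    Lines from several pods may be interleaved out of order, so the earliest
    timestamp of each kind is used rather than the first one seen.
    """
    left, elected = [], []
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue
        if "master_left" in line:
            left.append(ts)
        elif "zen-disco-elected-as-master" in line:
            elected.append(ts)
    if not left or not elected:
        return None
    return (min(elected) - min(left)).total_seconds()

# Truncated copies of lines from the "kill 1" logs.
KILL_1 = [
    "es-master-d4d46765-krkhr es-master [2018-09-19T17:54:56,501][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master_left",
    "es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,612][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-v9sbw] zen-disco-elected-as-master",
]

# Truncated copies of lines from the "kubectl delete pod" logs.
DELETE_POD = [
    "es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:01,261][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-vh5tp] master_left",
    "es-master-d4d46765-krkhr es-master [2018-09-19T18:07:01,255][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master_left",
    "es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:04,384][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-vh5tp] zen-disco-elected-as-master",
]

if __name__ == "__main__":
    print("kill 1:    ", election_gap(KILL_1), "s")
    print("delete pod:", election_gap(DELETE_POD), "s")
```

On these excerpts both gaps come out a little over 3 seconds, so the extra delay in the delete case may be spent after the election is logged rather than in the election itself. That is only an observation about the lines shown here, not a diagnosis.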

I've also tried adjusting various Kubernetes pod-termination timeouts, along with the ES fault-detection timeouts, but can't work around the problem.

Do you know if this is expected behaviour? If so, how do people actually upgrade the masters with only a short period of downtime? We also run ES outside of Kubernetes, where master re-election happens in under 3s (because we're essentially just sending SIGTERM to the parent process, like kill 1), hence I feel this is a Kubernetes thing.

I've added two sets of logs:

1 - Logs with kill 1 on the ES java process

# kill process in container, failover is quick

es-master-d4d46765-v9sbw es-master    {es-ingest-84fd6b464-5dbtn}{ki0kyZrQTWGwiiwMU3FmqQ}{M6tOaextQYiNRI8vwJ4DjQ}{10.244.1.3}{10.244.1.3:9300}{xpack.installed=true}
es-master-d4d46765-v9sbw es-master
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:56,501][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master_left [{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}], reason [shut_down]
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:56,503][WARN ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master left (reason = shut_down), current nodes: nodes:
es-master-d4d46765-krkhr es-master    {es-data-b479bcbd-wx6pg}{baYZM1pUT4Os1sDESMpdyQ}{nHFhi16LTDi2O-QS0SPVEg}{10.244.3.4}{10.244.3.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-ingest-84fd6b464-d5xcs}{wmFRpodsReaBKtmOenav0A}{nExqhmupRA6zOgPDkInFVA}{10.244.3.3}{10.244.3.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-krkhr}{tdk80Ro6QH-Nz9pGd3xkvg}{cZbioHXWTFCszKiqgHxRyg}{10.244.3.2}{10.244.3.2:9300}{xpack.installed=true}, local
es-master-d4d46765-krkhr es-master    {es-ingest-84fd6b464-5dbtn}{ki0kyZrQTWGwiiwMU3FmqQ}{M6tOaextQYiNRI8vwJ4DjQ}{10.244.1.3}{10.244.1.3:9300}{xpack.installed=true}
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:56,529][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-v9sbw] stopping watch service, reason [no master node]
es-master-d4d46765-krkhr es-master    {es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}, master
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:56,566][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-krkhr] stopping watch service, reason [no master node]
es-master-d4d46765-n5646 es-master [2018-09-19T17:54:57,006][INFO ][o.e.n.Node               ] [es-master-d4d46765-n5646] stopped
es-master-d4d46765-n5646 es-master [2018-09-19T17:54:57,006][INFO ][o.e.n.Node               ] [es-master-d4d46765-n5646] closing ...
es-master-d4d46765-n5646 es-master [2018-09-19T17:54:57,054][INFO ][o.e.n.Node               ] [es-master-d4d46765-n5646] closed
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,612][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-v9sbw] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:59,667][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-krkhr] detected_master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true} committed version [20]])
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:59,687][WARN ][o.e.c.NodeConnectionsService] [es-master-d4d46765-krkhr] failed to connect to node {es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true} (tried [1] times)
es-master-d4d46765-krkhr es-master org.elasticsearch.transport.ConnectTransportException: [es-master-d4d46765-n5646][10.244.1.2:9300] connect_exception
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:631) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:530) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.cluster.NodeConnectionsService$1.doRun(NodeConnectionsService.java:106) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.244.1.2/10.244.1.2:9300
es-master-d4d46765-krkhr es-master 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-krkhr es-master 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-krkhr es-master 	... 1 more
es-master-d4d46765-krkhr es-master Caused by: java.net.ConnectException: Connection refused
es-master-d4d46765-krkhr es-master 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-krkhr es-master 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,748][WARN ][o.e.d.z.PublishClusterStateAction] [es-master-d4d46765-v9sbw] publishing cluster state with version [20] failed for the following nodes: [[{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}]]
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,750][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-v9sbw] new_master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true} committed version [20] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
es-master-d4d46765-krkhr es-master 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-krkhr es-master 	... 1 more
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,764][WARN ][o.e.c.NodeConnectionsService] [es-master-d4d46765-v9sbw] failed to connect to node {es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true} (tried [1] times)
es-master-d4d46765-v9sbw es-master org.elasticsearch.transport.ConnectTransportException: [es-master-d4d46765-n5646][10.244.1.2:9300] connect_exception
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:631) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:530) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.cluster.NodeConnectionsService$1.doRun(NodeConnectionsService.java:106) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
es-master-d4d46765-v9sbw es-master 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
es-master-d4d46765-v9sbw es-master 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
es-master-d4d46765-v9sbw es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.244.1.2/10.244.1.2:9300
es-master-d4d46765-v9sbw es-master 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-v9sbw es-master 	... 1 more
es-master-d4d46765-v9sbw es-master Caused by: java.net.ConnectException: Connection refused
es-master-d4d46765-v9sbw es-master 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-v9sbw es-master 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-v9sbw es-master 	... 1 more
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,834][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-v9sbw] zen-disco-node-failed({es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}), reason(transport disconnected), reason: removed {{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true},}
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:59,852][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-krkhr] removed {{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true} committed version [21]])
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,932][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-v9sbw] removed {{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true} committed version [21] source [zen-disco-node-failed({es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}), reason(transport disconnected)]])
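
The stack traces in the two sets of logs are long, and the part that differs is the "Caused by:" line. A small sketch for pulling those lines out so the two runs can be compared side by side follows. The function name root_causes is hypothetical, and the sample lines are copied from the traces in this issue.

```python
import re

# Captures the exception class and message from a "Caused by:" trace line.
CAUSED_BY = re.compile(r"Caused by: (\S+): (.*)$")

def root_causes(lines):
    """Return the sorted, de-duplicated (exception class, message) pairs."""
    causes = set()
    for line in lines:
        match = CAUSED_BY.search(line)
        if match:
            causes.add((match.group(1), match.group(2).strip()))
    return sorted(causes)

# "Caused by" lines from the "kill 1" logs.
KILL_1_TRACE = [
    "es-master-d4d46765-krkhr es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.244.1.2/10.244.1.2:9300",
    "es-master-d4d46765-krkhr es-master Caused by: java.net.ConnectException: Connection refused",
]

# "Caused by" lines from the "kubectl delete pod" logs.
DELETE_POD_TRACE = [
    "es-master-d4d46765-vh5tp es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.244.2.5/10.244.2.5:9300",
    "es-master-d4d46765-vh5tp es-master Caused by: java.net.NoRouteToHostException: Host is unreachable",
]

if __name__ == "__main__":
    for name, trace in (("kill 1", KILL_1_TRACE), ("delete pod", DELETE_POD_TRACE)):
        for exc, message in root_causes(trace):
            print(f"{name}: {exc}: {message}")
```

In the kill 1 run the old master's address still answers and refuses the connection, whereas in the delete run the address is reported as unreachable. Whether that difference accounts for the delay is not established by these logs.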

2 - Logs with kubectl delete pod on the pod hosting the master ES instance

es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,271][INFO ][o.e.n.Node               ] [es-master-d4d46765-6x4bp] stopping ...
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:01,261][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-vh5tp] master_left [{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}], reason [shut_down]
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:01,255][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master_left [{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}], reason [shut_down]
es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,276][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-6x4bp] stopping watch service, reason [shutdown initiated]
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:01,263][WARN ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-vh5tp] master left (reason = shut_down), current nodes: nodes:
es-master-d4d46765-vh5tp es-master    {es-data-b479bcbd-wx6pg}{baYZM1pUT4Os1sDESMpdyQ}{nHFhi16LTDi2O-QS0SPVEg}{10.244.3.4}{10.244.3.4:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-ingest-84fd6b464-d5xcs}{wmFRpodsReaBKtmOenav0A}{nExqhmupRA6zOgPDkInFVA}{10.244.3.3}{10.244.3.3:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}, master
es-master-d4d46765-vh5tp es-master    {es-ingest-84fd6b464-5dbtn}{ki0kyZrQTWGwiiwMU3FmqQ}{M6tOaextQYiNRI8vwJ4DjQ}{10.244.1.3}{10.244.1.3:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-master-d4d46765-krkhr}{tdk80Ro6QH-Nz9pGd3xkvg}{cZbioHXWTFCszKiqgHxRyg}{10.244.3.2}{10.244.3.2:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}, local
es-master-d4d46765-vh5tp es-master
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:01,276][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-vh5tp] stopping watch service, reason [no master node]
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:01,256][WARN ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master left (reason = shut_down), current nodes: nodes:
es-master-d4d46765-krkhr es-master    {es-data-b479bcbd-wx6pg}{baYZM1pUT4Os1sDESMpdyQ}{nHFhi16LTDi2O-QS0SPVEg}{10.244.3.4}{10.244.3.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}, master
es-master-d4d46765-krkhr es-master    {es-ingest-84fd6b464-d5xcs}{wmFRpodsReaBKtmOenav0A}{nExqhmupRA6zOgPDkInFVA}{10.244.3.3}{10.244.3.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-ingest-84fd6b464-5dbtn}{ki0kyZrQTWGwiiwMU3FmqQ}{M6tOaextQYiNRI8vwJ4DjQ}{10.244.1.3}{10.244.1.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-krkhr}{tdk80Ro6QH-Nz9pGd3xkvg}{cZbioHXWTFCszKiqgHxRyg}{10.244.3.2}{10.244.3.2:9300}{xpack.installed=true}, local
es-master-d4d46765-krkhr es-master
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:01,258][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-krkhr] stopping watch service, reason [no master node]
es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,847][INFO ][o.e.n.Node               ] [es-master-d4d46765-6x4bp] stopped
es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,848][INFO ][o.e.n.Node               ] [es-master-d4d46765-6x4bp] closing ...
es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,886][INFO ][o.e.n.Node               ] [es-master-d4d46765-6x4bp] closed
+ es-master-d4d46765-fpcmg › es-master
- es-master-d4d46765-6x4bp
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:04,384][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-vh5tp] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:04,408][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-krkhr] detected_master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true} committed version [35]])
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,184][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] initializing ...
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,398][INFO ][o.e.e.NodeEnvironment    ] [es-master-d4d46765-fpcmg] using [1] data paths, mounts [[/data (/dev/vda1)]], net usable_space [75gb], net total_space [77.3gb], types [ext4]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,400][INFO ][o.e.e.NodeEnvironment    ] [es-master-d4d46765-fpcmg] heap size [247.5mb], compressed ordinary object pointers [true]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,401][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] node name [es-master-d4d46765-fpcmg], node ID [L0BKnWY2RRmU8A3EtmY3VQ]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,402][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] version[6.3.2], pid[1], build[default/tar/053779d/2018-07-20T05:20:23.451332Z], OS[Linux/4.15.0-30-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_171/25.171-b11]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,402][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] JVM arguments [-XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xms256m, -Xmx256m, -Des.path.home=/elasticsearch, -Des.path.conf=/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:08,808][WARN ][o.e.c.NodeConnectionsService] [es-master-d4d46765-vh5tp] failed to connect to node {es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true} (tried [1] times)
es-master-d4d46765-vh5tp es-master org.elasticsearch.transport.ConnectTransportException: [es-master-d4d46765-6x4bp][10.244.2.5:9300] connect_exception
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:631) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:530) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.cluster.NodeConnectionsService$ConnectionChecker.doRun(NodeConnectionsService.java:180) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
es-master-d4d46765-vh5tp es-master 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
es-master-d4d46765-vh5tp es-master 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
es-master-d4d46765-vh5tp es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.244.2.5/10.244.2.5:9300
es-master-d4d46765-vh5tp es-master 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-vh5tp es-master 	... 1 more
es-master-d4d46765-vh5tp es-master Caused by: java.net.NoRouteToHostException: Host is unreachable
es-master-d4d46765-vh5tp es-master 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-vh5tp es-master 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-vh5tp es-master 	... 1 more
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:10,480][WARN ][o.e.d.c.s.Settings       ] [http.enabled] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,189][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [aggs-matrix-stats]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,193][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [analysis-common]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,194][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [ingest-common]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,194][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [lang-expression]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,195][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [lang-mustache]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [lang-painless]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [mapper-extras]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [parent-join]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [percolator]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [rank-eval]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [reindex]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [repository-url]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [transport-netty4]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [tribe]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-core]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-deprecation]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,199][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-graph]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-logstash]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-monitoring]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-rollup]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-security]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-sql]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-upgrade]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,202][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-watcher]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,202][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] no plugins loaded
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:21,888][WARN ][o.e.c.NodeConnectionsService] [es-master-d4d46765-krkhr] failed to connect to node {es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true} (tried [1] times)
es-master-d4d46765-krkhr es-master org.elasticsearch.transport.ConnectTransportException: [es-master-d4d46765-6x4bp][10.244.2.5:9300] connect_exception
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:631) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:530) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.cluster.NodeConnectionsService$1.doRun(NodeConnectionsService.java:106) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.244.2.5/10.244.2.5:9300
es-master-d4d46765-krkhr es-master 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-krkhr es-master 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-krkhr es-master 	... 1 more
es-master-d4d46765-krkhr es-master Caused by: java.net.NoRouteToHostException: Host is unreachable
es-master-d4d46765-krkhr es-master 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-krkhr es-master 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-krkhr es-master 	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-krkhr es-master 	... 1 more
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:22,474][INFO ][o.e.x.s.a.s.FileRolesStore] [es-master-d4d46765-fpcmg] parsed [0] roles from file [/elasticsearch/config/roles.yml]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:24,768][INFO ][o.e.d.DiscoveryModule    ] [es-master-d4d46765-fpcmg] using discovery type [zen]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:26,175][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] initialized
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:26,176][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] starting ...
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:26,563][INFO ][o.e.t.TransportService   ] [es-master-d4d46765-fpcmg] publish_address {10.244.2.6:9300}, bound_addresses {10.244.2.6:9300}
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:26,594][INFO ][o.e.b.BootstrapChecks    ] [es-master-d4d46765-fpcmg] bound or publishing to a non-loopback address, enforcing bootstrap checks
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:34,398][WARN ][o.e.d.z.PublishClusterStateAction] [es-master-d4d46765-vh5tp] timed out waiting for all nodes to process published state [35] (timeout [30s], pending nodes: [{es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}])
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:34,399][WARN ][o.e.d.z.PublishClusterStateAction] [es-master-d4d46765-vh5tp] publishing cluster state with version [35] failed for the following nodes: [[{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}]]
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:34,402][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-vh5tp] new_master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true} committed version [35] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:44,649][WARN ][o.e.c.s.MasterService    ] [es-master-d4d46765-vh5tp] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[, ]] took [40.2s] above the warn threshold of 30s
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:44,656][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-krkhr] removed {{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true} committed version [36]])
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:44,654][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-vh5tp] zen-disco-node-failed({es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}), reason(transport disconnected), reason: removed {{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true},}
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:44,720][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-vh5tp] removed {{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true} committed version [36] source [zen-disco-node-failed({es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}), reason(transport disconnected)]])
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:56,679][WARN ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] timed out while waiting for initial discovery state - timeout: 30s
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:56,680][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] started

In the set of logs where we run kubectl delete pod, it looks like master re-election happens twice.
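To check this without reading the whole dump, the election-related lines can be filtered out. This is a small sketch: the function name election_events is made up for illustration, and the patterns are simply strings copied from the logs above.

```shell
#!/bin/sh
# Keep only the log lines that mark election-related events.
# The patterns are literal strings taken from the logs in this issue.
election_events() {
  grep -E 'new_master|zen-disco-elected-as-master|zen-disco-node-failed|timed out waiting for all nodes'
}

# Example (the pod name is a placeholder):
#   kubectl logs <master pod> | election_events
```

Comparing the timestamps of the remaining lines makes it easier to see how long each step took.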

@nabadger changed the title from "Re-election takes over 30 seconds when deleting master pod (but fast when killing the process)" to "Re-election takes over 30 seconds when deleting master pod (but fast when killing the process directly)" on Sep 19, 2018
@nabadger
Author

There's actually an obvious clue in our logs:

[o.e.d.z.PublishClusterStateAction] [es-master-d4d46765-vh5tp] timed out waiting for all nodes to process published state [35] (timeout [30s], pending nodes: [{es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}])

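The key part is the 30s timeout: the newly elected master, es-master-d4d46765-vh5tp, waits the full 30 seconds on published state [35] before the new_master state is applied.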

We think it's related to this: https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590

@nabadger
Author

Adding an extra sleep after trapping SIGTERM seems to resolve the issue for us (see the merge referenced above if you're interested).
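For anyone who wants to try the same idea, here is a rough sketch of such a wrapper. It is not the code from the merge: the function name run_with_linger and the POST_STOP_SLEEP variable are made up for illustration, and the Elasticsearch path in the example call is an assumption. Our working assumption is that keeping the container alive briefly after Elasticsearch exits lets the other nodes see a clean disconnect rather than an unreachable host.

```shell
#!/bin/sh
# Sketch only: hypothetical wrapper, not the image's actual entrypoint script.
# POST_STOP_SLEEP (assumed name) is how long to linger, in seconds, after the
# wrapped process has exited.
POST_STOP_SLEEP="${POST_STOP_SLEEP:-5}"

run_with_linger() {
  # Run the real process in the background so this shell can react to signals.
  "$@" &
  child=$!

  # On SIGTERM: forward the signal, wait for the child to exit, then sleep
  # before exiting so the container is not torn down immediately.
  trap 'kill -TERM "$child" 2>/dev/null; wait "$child" 2>/dev/null || true; sleep "$POST_STOP_SLEEP"; exit 0' TERM

  # Normal case: block until the child exits and pass its status through.
  rc=0
  wait "$child" || rc=$?
  return "$rc"
}

# Example call (the path is an assumption):
#   run_with_linger /elasticsearch/bin/elasticsearch
```

The sleep has to fit inside the pod's termination grace period, otherwise Kubernetes kills the container before the wrapper finishes.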
