1 year ago
#126221

s666
Why could a split-brain situation happen in ResourceManager HA using ZooKeeper?
I was wondering why we need the extra ACL fencer to avoid a split-brain situation in ResourceManager HA using ZooKeeper. As I understand it, when the active RM still thinks it is active after a temporary disconnection, it will be notified that its session has expired once it re-establishes its connection to the ZooKeeper cluster. At that point there would be only one ActiveStandbyElectorLock znode, the one created by the standby RM. So it seems there could never be a split-brain situation.
This is based on the following passage from the ZooKeeper Programmer's Guide:
Session expiration is managed by the ZooKeeper cluster itself, not by the client. When the ZK client establishes a session with the cluster it provides a "timeout" value detailed above. This value is used by the cluster to determine when the client's session expires. Expirations happens when the cluster does not hear from the client within the specified session timeout period (i.e. no heartbeat). At session expiration the cluster will delete any/all ephemeral nodes owned by that session and immediately notify any/all connected clients of the change (anyone watching those znodes). At this point the client of the expired session is still disconnected from the cluster, it will not be notified of the session expiration until/unless it is able to re-establish a connection to the cluster. The client will stay in disconnected state until the TCP connection is re-established with the cluster, at which point the watcher of the expired session will receive the "session expired" notification.
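To make the timeline in that passage concrete, here is a minimal toy simulation of it. This is only a sketch under stated assumptions: `ToyCluster`, `Client`, and `run_timeline` are hypothetical names, nothing here is the real ZooKeeper or YARN API, and the old active RM is assumed to keep believing it is active until it reconnects (how the real RM reacts to a local disconnect event is not modeled).

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical stand-in for an RM holding a ZooKeeper session."""
    name: str
    connected: bool = True
    last_heartbeat: int = 0
    believes_active: bool = False


class ToyCluster:
    """Toy model of the session rules quoted above; not the real ZooKeeper API."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.lock_owner = None   # owner of the single ephemeral lock znode
        self.expired = set()     # names of clients whose session has expired

    def try_acquire(self, client, now):
        # Creating the ephemeral lock znode succeeds only if it does not exist.
        if self.lock_owner is None and client.connected and client.name not in self.expired:
            self.lock_owner = client.name
            client.believes_active = True
            client.last_heartbeat = now
        return client.believes_active

    def heartbeat(self, client, now):
        if client.connected and client.name not in self.expired:
            client.last_heartbeat = now

    def tick(self, clients, now):
        # Expiration is decided by the cluster, not by the client.
        for c in clients:
            if c.name not in self.expired and now - c.last_heartbeat > self.session_timeout:
                self.expired.add(c.name)
                if self.lock_owner == c.name:
                    self.lock_owner = None   # ephemeral znode is deleted

    def reconnect(self, client):
        # Only on reconnect does the client learn that its session expired.
        client.connected = True
        if client.name in self.expired:
            client.believes_active = False


def run_timeline():
    cluster = ToyCluster(session_timeout=10)
    rm1, rm2 = Client("rm1"), Client("rm2")
    cluster.try_acquire(rm1, now=0)   # rm1 becomes active
    rm1.connected = False             # rm1 is cut off from the cluster
    log = []
    for now in range(1, 21):
        cluster.heartbeat(rm2, now)
        cluster.tick([rm1, rm2], now)
        cluster.try_acquire(rm2, now)  # standby keeps trying to take the lock
        if now == 15:
            cluster.reconnect(rm1)     # rm1 gets "session expired" here
        both = rm1.believes_active and rm2.believes_active
        log.append((now, cluster.lock_owner, both))
    return log


if __name__ == "__main__":
    for now, owner, both in run_timeline():
        print(now, owner, "both believe active" if both else "")
```

In this toy model there is never more than one lock owner, which matches my reasoning about the single ActiveStandbyElectorLock znode. It also shows the interval described in the quote, between the cluster expiring the session and the old client reconnecting, during which the disconnected client has not yet been told that its session expired.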
hadoop
hadoop-yarn
apache-zookeeper
high-availability
resourcemanager
0 Answers