There are two cluster managers: CMAN and GULM. CMAN is an abbreviation for Cluster Manager; GULM stands for Grand Unified Lock Manager. CMAN uses DLM (Distributed Lock Manager) for lock management. The difference between them is that CMAN is a distributed cluster manager, while GULM is a client-server cluster manager.
The cluster manager keeps track of cluster quorum by monitoring the count of cluster nodes that run the cluster manager. In a CMAN cluster, all cluster nodes run the cluster manager; in a GULM cluster, only the GULM servers run it. If more than half of the nodes that run the cluster manager are active, the cluster has quorum. If half of those nodes (or fewer) are active, the cluster does not have quorum and all cluster activity is stopped.
Cluster quorum prevents the occurrence of a “split-brain” condition, in which two instances of the same cluster are running. A split-brain condition would allow each cluster instance to access cluster resources without knowledge of the other instance, resulting in corrupted cluster integrity.
In a CMAN cluster, quorum is determined by the communication of heartbeats among cluster nodes via Ethernet. Optionally, quorum can be determined by a combination of heartbeats via Ethernet and a quorum disk. For quorum via Ethernet, quorum consists of 50 percent of the node votes plus 1. For quorum via quorum disk, quorum consists of user-specified conditions.
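For example, assuming the common default of one vote per node, a five-node cluster has 5 votes in total; half of the votes rounded down is 2, so quorum is 2 + 1 = 3 votes. The cluster keeps quorum with up to two nodes down, and all cluster activity stops if a third node is lost.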
The cluster manager keeps track of membership by monitoring heartbeat messages from other cluster nodes.
If a cluster node does not transmit a heartbeat message within a prescribed amount of time, the cluster manager removes the node from the cluster and communicates to other cluster infrastructure components that the node is no longer a member. The other cluster infrastructure components determine what actions to take upon notification that the node is no longer a cluster member. For example, fencing would fence the node that is no longer a member.
Lock management is a common cluster infrastructure service that provides a mechanism for other cluster infrastructure components to synchronize their access to shared resources.
GFS and CLVM use locks from the lock manager. GFS uses them to synchronize access to file system metadata on shared storage. CLVM uses them to synchronize updates to LVM volumes and volume groups (also on shared storage).
Fencing is the disconnecting of a node from the cluster’s shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity.
The cluster infrastructure performs fencing through one of the following programs according to the type of cluster manager and lock manager that is configured:
1. Configured with CMAN/DLM: fenced, the fence daemon, performs fencing.
2. Configured with GULM servers: GULM itself performs fencing.
When the cluster manager determines that a node has failed, it communicates to other cluster infrastructure components that the node has failed. The fencing program, when notified of the failure, fences the failed node; the other cluster infrastructure components determine what actions to take, that is, they perform any recovery that needs to be done. For example, DLM and GFS (in a cluster configured with CMAN/DLM), when notified of a node failure, suspend activity until they detect that the fencing program has completed fencing the failed node. Upon confirmation that the failed node is fenced, DLM and GFS perform recovery: DLM releases the locks of the failed node; GFS recovers the journal of the failed node.
Fencing methods: power fencing, Fibre Channel switch fencing, GNBD fencing, and other fencing methods.
Specifying a fencing method consists of editing the cluster configuration file to assign a fencing-method name, a fence agent, and a fencing device for each node in the cluster.
When you configure a node with multiple fencing methods, the methods are cascaded from one to the next according to the order in which they are specified in the cluster configuration file (a configuration sketch follows the list below).
1. If a node fails, it is fenced using the first fencing method specified in the cluster configuration file for that node.
2. If the first fencing method is not successful, the next fencing method specified for that node is used.
3. If none of the fencing methods is successful, then fencing starts again with the first fencing method specified and continues looping through the fencing methods in the order specified in the cluster configuration file until the node has been fenced.
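A minimal sketch of a per-node fencing entry in /etc/cluster/cluster.conf, assuming a hypothetical APC power switch (apc1) as the primary method and a hypothetical Brocade Fibre Channel switch (brocade1) as the backup; the device names, addresses, credentials, and port numbers are illustrative placeholders, not a real configuration:

    <clusternode name="node1.example.com" nodeid="1" votes="1">
        <fence>
            <!-- method 1: power fencing via the APC switch -->
            <method name="1">
                <device name="apc1" port="1"/>
            </method>
            <!-- method 2: fabric fencing via the Fibre Channel switch -->
            <method name="2">
                <device name="brocade1" port="3"/>
            </method>
        </fence>
    </clusternode>
    ...
    <fencedevices>
        <fencedevice name="apc1" agent="fence_apc" ipaddr="10.0.0.10" login="apc" passwd="apc"/>
        <fencedevice name="brocade1" agent="fence_brocade" ipaddr="10.0.0.11" login="admin" passwd="secret"/>
    </fencedevices>

If method "1" fails for node1, the fence daemon falls through to method "2", and then loops back to "1" until one of them succeeds.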
CCS, the Cluster Configuration System. “/etc/cluster/cluster.conf” is an XML file containing the cluster name and cluster settings.
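A minimal skeleton of the file, assuming a CMAN/DLM cluster named cluster1; the node names are placeholders and the elided sections are filled in as shown elsewhere in these notes:

    <?xml version="1.0"?>
    <cluster name="cluster1" config_version="1">
        <cman/>
        <clusternodes>
            <clusternode name="node1.example.com" nodeid="1" votes="1">
                <fence> ... </fence>
            </clusternode>
            <!-- one <clusternode> entry per node -->
        </clusternodes>
        <fencedevices> ... </fencedevices>
        <rm>
            <!-- failover domains and cluster services go here -->
        </rm>
    </cluster>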
A failover domain is a subset of cluster nodes that are eligible to run a particular cluster service. A cluster service can run on only one cluster node at a time to maintain data integrity. In a failover domain, you can specify failover priority. The priority level determines the failover order, that is, which node a cluster service should fail over to. If you do not specify failover priority, a cluster service can fail over to any node in its failover domain.
Also, you can specify whether a cluster service is restricted to run only on nodes of its associated failover domain.
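A sketch of an ordered, restricted failover domain in the <rm> section of cluster.conf; the domain, node, and service names are hypothetical:

    <rm>
        <failoverdomains>
            <!-- ordered="1": honor the priority values; restricted="1": run only on the listed nodes -->
            <failoverdomain name="webdomain" ordered="1" restricted="1">
                <failoverdomainnode name="node1.example.com" priority="1"/>
                <failoverdomainnode name="node2.example.com" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <service name="webservice" domain="webdomain" autostart="1">
            <!-- resources (IP address, file system, script) go here -->
        </service>
    </rm>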
GFS is a cluster file system that allows a cluster of nodes to simultaneously access a block device shared among the nodes. GFS is a native file system that interfaces directly with the VFS layer of the Linux kernel file system interface. GFS employs distributed metadata and multiple journals for optimal operation in a cluster. To maintain file system integrity, GFS uses a lock manager to coordinate I/O. When one node changes data on a GFS file system, that change is immediately visible to the other cluster nodes using that file system.
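A minimal sketch of creating and mounting a GFS file system on a clustered logical volume, assuming a CMAN/DLM cluster named cluster1 and a volume /dev/vg01/lv01 (both names are placeholders); one journal is needed for each node that will mount the file system:

    # create a GFS file system using the DLM lock protocol, with 3 journals
    gfs_mkfs -p lock_dlm -t cluster1:gfs01 -j 3 /dev/vg01/lv01

    # mount it on each cluster node
    mount -t gfs /dev/vg01/lv01 /mnt/gfs01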
The Cluster Logical Volume Manager (CLVM) provides a cluster-wide version of LVM2. CLVM provides the same capabilities as LVM2 on a single node, but makes the volumes available to all nodes in a Red Hat cluster. The key component in CLVM is clvmd, a daemon that provides clustering extensions to the standard LVM2 tool set and allows LVM2 commands to manage shared storage. clvmd runs in each cluster node and distributes LVM metadata updates in the cluster, thereby presenting each cluster node with the same view of the logical volumes. See /etc/lvm/lvm.conf.
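A sketch of enabling cluster locking and creating a clustered volume group; the locking_type setting goes in /etc/lvm/lvm.conf on every node, and the device and volume names are placeholders:

    # in /etc/lvm/lvm.conf: use built-in cluster-wide locking via clvmd
    locking_type = 3

    # with clvmd running on every node, create a clustered volume group and a volume
    pvcreate /dev/sdb1
    vgcreate -c y vg01 /dev/sdb1
    lvcreate -L 100G -n lv01 vg01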
GNBD (Global Network Block Device) provides block-device access to Red Hat GFS over TCP/IP. GNBD is similar in concept to NBD. GNBD is useful when more robust technologies such as Fibre Channel or single-initiator SCSI are not necessary or are cost-prohibitive.
GNBD consists of two major components: GNBD clients and GNBD servers. A GNBD client runs in a node with GFS and imports a block device exported by a GNBD server. A GNBD server runs in another node and exports block-level storage from its local storage (either directly attached storage or SAN storage). Multiple GNBD clients can access a device exported by a GNBD server, thus making GNBD suitable for use by a group of nodes running GFS.
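A sketch of exporting a device on a GNBD server and importing it on a GFS node; the device path, export name, and hostname are placeholders:

    # on the GNBD server: start the server daemon and export a local device
    gnbd_serv
    gnbd_export -d /dev/sdb1 -e gfs_disk1

    # on each GNBD client (GFS node): load the module and import the server's exports
    modprobe gnbd
    gnbd_import -i gnbdserver.example.com
    # the imported device appears under /dev/gnbd/gfs_disk1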
There are many ways to synchronize data among real servers. For example, you can use shell scripts to post updated web pages to the real servers simultaneously. Also, you can use a program such as rsync to replicate changed data across all nodes at a set interval.
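A sketch of the rsync approach, assuming the web content lives in /var/www/html and the real servers are reachable over SSH as rs1 and rs2 (all names and paths are placeholders); run it from cron on the primary node at the desired interval:

    #!/bin/bash
    # push the document root to each real server, deleting files removed upstream
    for host in rs1.example.com rs2.example.com; do
        rsync -avz --delete /var/www/html/ "${host}:/var/www/html/"
    done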