<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>ctdb</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="refentry"><a name="ctdb.7"></a><div class="titlepage"></div><div class="refnamediv"><h2>Name</h2><p>ctdb — Clustered TDB</p></div><div class="refsect1"><a name="idp53709056"></a><h2>DESCRIPTION</h2><p>
CTDB is a clustered database component in clustered Samba that
provides a high-availability load-sharing CIFS server cluster.
</p><p>
The main functions of CTDB are:
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
Provide a clustered version of the TDB database with automatic
rebuild/recovery of the databases upon node failures.
</p></li><li class="listitem"><p>
Monitor nodes in the cluster and services running on each node.
</p></li><li class="listitem"><p>
Manage a pool of public IP addresses that are used to provide
services to clients. Alternatively, CTDB can be used with
LVS.
</p></li></ul></div><p>
Combined with a cluster filesystem, CTDB provides a full
high-availability (HA) environment for services such as
clustered Samba and NFS.
</p></div><div class="refsect1"><a name="idp52048160"></a><h2>ANATOMY OF A CTDB CLUSTER</h2><p>
A CTDB cluster is a collection of nodes with 2 or more network
interfaces. All nodes provide network (usually file/NAS) services
to clients. Data served by file services is stored on shared
storage (usually a cluster filesystem) that is accessible by all
nodes.
</p><p>
CTDB provides an "all active" cluster, where services are load
balanced across all nodes.
</p></div><div class="refsect1"><a name="idp50534928"></a><h2>Recovery Lock</h2><p>
CTDB uses a <span class="emphasis"><em>recovery lock</em></span> to avoid a
<span class="emphasis"><em>split brain</em></span>, where a cluster becomes
partitioned and each partition attempts to operate
independently. Issues that can result from a split brain
include file data corruption, because file locking metadata may
not be tracked correctly.
</p><p>
CTDB uses a <span class="emphasis"><em>cluster leader and follower</em></span>
model of cluster management. All nodes in a cluster elect one
node to be the leader. The leader node coordinates privileged
operations such as database recovery and IP address failover.
CTDB refers to the leader node as the <span class="emphasis"><em>recovery
master</em></span>. This node takes and holds the recovery lock
to assert its privileged role in the cluster.
</p><p>
The recovery lock is implemented using a file residing in shared
storage (usually on a cluster filesystem). To support a
recovery lock the cluster filesystem must support lock
coherence. See
<span class="citerefentry"><span class="refentrytitle">ping_pong</span>(1)</span> for more details.
</p><p>
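As a hedged sketch (the paths shown are illustrative, not defaults), the
recovery lock is typically configured by pointing the
<code class="varname">CTDB_RECOVERY_LOCK</code> configuration variable (see
<span class="citerefentry"><span class="refentrytitle">ctdbd.conf</span>(5)</span>) at a file on the
cluster filesystem, and lock coherence can be checked by running
<span class="command"><strong>ping_pong</strong></span> concurrently on several nodes:
</p><pre class="screen">
# Illustrative setting; the file must reside on coherent shared storage
CTDB_RECOVERY_LOCK=/clusterfs/.ctdb/reclock

# Run simultaneously on more than one node to test lock coherence
ping_pong /clusterfs/.ctdb/test.dat 3
</pre><p>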
If a cluster becomes partitioned (for example, due to a
communication failure) and a different recovery master is
elected by the nodes in each partition, then only one of these
recovery masters will be able to take the recovery lock. The
recovery master in the "losing" partition will not be able to
take the recovery lock and will be excluded from the cluster.
The nodes in the "losing" partition will elect each node in turn
as their recovery master so eventually all the nodes in that
partition will be excluded.
</p><p>
CTDB does sanity checks to ensure that the recovery lock is held
as expected.
</p><p>
CTDB can run without a recovery lock but this is not recommended
as there will be no protection from split brains.
</p></div><div class="refsect1"><a name="idp53056000"></a><h2>Private vs Public addresses</h2><p>
Each node in a CTDB cluster has multiple IP addresses assigned
to it:
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
A single private IP address that is used for communication
between nodes.
</p></li><li class="listitem"><p>
One or more public IP addresses that are used to provide
NAS or other services.
</p></li></ul></div><p>
</p><div class="refsect2"><a name="idp49235808"></a><h3>Private address</h3><p>
Each node is configured with a unique, permanently assigned
private address. This address is configured by the operating
system. This address uniquely identifies a physical node in
the cluster and is the address that CTDB daemons will use to
communicate with the CTDB daemons on other nodes.
</p><p>
Private addresses are listed in the file specified by the
<code class="varname">CTDB_NODES</code> configuration variable (see
<span class="citerefentry"><span class="refentrytitle">ctdbd.conf</span>(5)</span>, default
<code class="filename">/usr/local/etc/ctdb/nodes</code>). This file contains the
list of private addresses for all nodes in the cluster, one
per line. This file must be the same on all nodes in the
cluster.
</p><p>
Private addresses should not be used by clients to connect to
services provided by the cluster.
</p><p>
It is strongly recommended that the private addresses are
configured on a private network that is separate from client
networks.
</p><p>
Example <code class="filename">/usr/local/etc/ctdb/nodes</code> for a four node
cluster:
</p><pre class="screen">
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4
</pre></div><div class="refsect2"><a name="idp50980688"></a><h3>Public addresses</h3><p>
Public addresses are used to provide services to clients.
Public addresses are not configured at the operating system
level and are not permanently associated with a particular
node. Instead, they are managed by CTDB and are assigned to
interfaces on physical nodes at runtime.
</p><p>
The CTDB cluster will assign/reassign these public addresses
across the available healthy nodes in the cluster. When one
node fails, its public addresses will be taken over by one or
more other nodes in the cluster. This ensures that services
provided by all public addresses are always available to
clients, as long as there are nodes available that are capable
of hosting these addresses.
</p><p>
The public address configuration is stored in a file on each
node specified by the <code class="varname">CTDB_PUBLIC_ADDRESSES</code>
configuration variable (see
<span class="citerefentry"><span class="refentrytitle">ctdbd.conf</span>(5)</span>, recommended
<code class="filename">/usr/local/etc/ctdb/public_addresses</code>). This file
contains a list of the public addresses that the node is
capable of hosting, one per line. Each entry also contains
the netmask and the interface to which the address should be
assigned.
</p><p>
Example <code class="filename">/usr/local/etc/ctdb/public_addresses</code> for a
node that can host 4 public addresses, on 2 different
interfaces:
</p><pre class="screen">
10.1.1.1/24 eth1
10.1.1.2/24 eth1
10.1.2.1/24 eth2
10.1.2.2/24 eth2
</pre><p>
In many cases the public addresses file will be the same on
all nodes. However, it is possible to use different public
address configurations on different nodes.
</p><p>
Example: 4 nodes partitioned into two subgroups:
</p><pre class="screen">
Node 0:/usr/local/etc/ctdb/public_addresses
10.1.1.1/24 eth1
10.1.1.2/24 eth1

Node 1:/usr/local/etc/ctdb/public_addresses
10.1.1.1/24 eth1
10.1.1.2/24 eth1

Node 2:/usr/local/etc/ctdb/public_addresses
10.1.2.1/24 eth2
10.1.2.2/24 eth2

Node 3:/usr/local/etc/ctdb/public_addresses
10.1.2.1/24 eth2
10.1.2.2/24 eth2
</pre><p>
In this example nodes 0 and 1 host two public addresses on the
10.1.1.x network while nodes 2 and 3 host two public addresses
on the 10.1.2.x network.
</p><p>
Public address 10.1.1.1 can be hosted by either of nodes 0 or
1 and will be available to clients as long as at least one of
these two nodes is available.
</p><p>
If both nodes 0 and 1 become unavailable then public address
10.1.1.1 also becomes unavailable. 10.1.1.1 cannot be failed
over to node 2 or 3 since these nodes do not have this public
address configured.
</p><p>
The <span class="command"><strong>ctdb ip</strong></span> command can be used to view the
current assignment of public addresses to physical nodes.
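</p><p>
For example (a hedged sketch; the exact output format varies between
CTDB versions), the assignments can be inspected from any node:
</p><pre class="screen">
# Public addresses and the node currently hosting each of them
ctdb ip

# The same view collected from every node, using onnode(1)
onnode all ctdb ip
</pre><p>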
</p></div></div><div class="refsect1"><a name="idp49115104"></a><h2>Node status</h2><p>
The current status of each node in the cluster can be viewed by the
<span class="command"><strong>ctdb status</strong></span> command.
</p><p>
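As an abbreviated, illustrative sketch only (the exact fields differ
between CTDB versions), the output looks something like:
</p><pre class="screen">
Number of nodes:4
pnn:0 192.168.1.1     OK (THIS NODE)
pnn:1 192.168.1.2     OK
pnn:2 192.168.1.3     UNHEALTHY
pnn:3 192.168.1.4     OK
Recovery mode:NORMAL (0)
Recovery master:0
</pre><p>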
A node can be in one of the following states:
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">OK</span></dt><dd><p>
This node is healthy and fully functional. It hosts public
addresses to provide services.
</p></dd><dt><span class="term">DISCONNECTED</span></dt><dd><p>
This node is not reachable by other nodes via the private
network. It is not currently participating in the cluster.
It <span class="emphasis"><em>does not</em></span> host public addresses to
provide services. It might be shut down.
</p></dd><dt><span class="term">DISABLED</span></dt><dd><p>
This node has been administratively disabled. This node is
partially functional and participates in the cluster.
However, it <span class="emphasis"><em>does not</em></span> host public
addresses to provide services.
</p></dd><dt><span class="term">UNHEALTHY</span></dt><dd><p>
A service provided by this node has failed a health check
and should be investigated. This node is partially
functional and participates in the cluster. However, it
<span class="emphasis"><em>does not</em></span> host public addresses to
provide services. Unhealthy nodes should be investigated
and may require an administrative action to rectify.
</p></dd><dt><span class="term">BANNED</span></dt><dd><p>
CTDB is not behaving as designed on this node. For example,
it may have failed too many recovery attempts. Such nodes
are banned from participating in the cluster for a
configurable time period before they attempt to rejoin the
cluster. A banned node <span class="emphasis"><em>does not</em></span> host
public addresses to provide services. All banned nodes
should be investigated and may require an administrative
action to rectify.
</p></dd><dt><span class="term">STOPPED</span></dt><dd><p>
This node has been administratively excluded from the
cluster. A stopped node does not participate in the cluster
and <span class="emphasis"><em>does not</em></span> host public addresses to
provide services. This state can be used while performing
maintenance on a node.
</p></dd><dt><span class="term">PARTIALLYONLINE</span></dt><dd><p>
A node that is partially online participates in a cluster
like a healthy (OK) node. Some interfaces to serve public
addresses are down, but at least one interface is up. See
also <span class="command"><strong>ctdb ifaces</strong></span>.
</p></dd></dl></div></div><div class="refsect1"><a name="idp49138160"></a><h2>CAPABILITIES</h2><p>
Cluster nodes can have several different capabilities enabled.
These are listed below.
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">RECMASTER</span></dt><dd><p>
Indicates that a node can become the CTDB cluster recovery
master. The current recovery master is decided via an
election held by all active nodes with this capability.
</p><p>
Default is YES.
</p></dd><dt><span class="term">LMASTER</span></dt><dd><p>
Indicates that a node can be the location master (LMASTER)
for database records. The LMASTER always knows which node
has the latest copy of a record in a volatile database.
</p><p>
Default is YES.
</p></dd><dt><span class="term">LVS</span></dt><dd><p>
Indicates that a node is configured in Linux Virtual Server
(LVS) mode. In this mode the cluster uses a single public
address for the entire cluster instead of using multiple
public addresses in failover mode. This is an alternative
to using a load-balancing layer-4 switch.
See the <em class="citetitle">LVS</em> section for more
details.
</p></dd></dl></div><p>
The RECMASTER and LMASTER capabilities can be disabled when CTDB
is used to create a cluster spanning WAN links. In this
case CTDB acts as a WAN accelerator.
</p></div><div class="refsect1"><a name="idp49147232"></a><h2>LVS</h2><p>
LVS is a mode where CTDB presents a single IP address for the
entire cluster. This is an alternative to using public IP
addresses and round-robin DNS to load-balance clients across the
cluster.
</p><p>
This is similar to using a layer-4 load-balancing switch but with
some restrictions.
</p><p>
In this mode the cluster selects a set of nodes and load-balances
all client access to the LVS address across this set of nodes.
This set consists of all LVS-capable nodes that are HEALTHY, or,
if no HEALTHY nodes exist, all LVS-capable nodes regardless of
health status. LVS will, however, never load-balance traffic to
nodes that are BANNED, STOPPED, DISABLED or
DISCONNECTED. The <span class="command"><strong>ctdb lvs</strong></span> command is used to
show which nodes are currently load-balanced across.
</p><p>
One of these nodes is elected as the LVSMASTER. This node
receives all traffic from clients coming in to the LVS address
and multiplexes it across the internal network to one of the
nodes that LVS is using. When responding to the client, that
node will send the data back directly to the client, bypassing
the LVSMASTER node. The command <span class="command"><strong>ctdb
lvsmaster</strong></span> will show which node is the current
LVSMASTER.
</p><p>
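As a minimal sketch (both are standard ctdb tool commands), the current
LVS node set and the current LVSMASTER can be checked with:
</p><pre class="screen">
# Nodes currently used for LVS load-balancing
ctdb lvs

# Node currently acting as the LVSMASTER
ctdb lvsmaster
</pre><p>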
The path used for a client I/O is:
</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
Client sends request packet to LVSMASTER.
</p></li><li class="listitem"><p>
LVSMASTER passes the request on to one node across the
internal network.
</p></li><li class="listitem"><p>
Selected node processes the request.
</p></li><li class="listitem"><p>
Node responds back to client.
</p></li></ol></div><p>
</p><p>
This means that all incoming traffic to the cluster will pass
through one physical node, which limits scalability. You cannot
send more data to the LVS address than one physical node can
multiplex. This means that you should not use LVS if your I/O
pattern is write-intensive, since you will be limited by the
network bandwidth that node can handle. LVS works
very well for read-intensive workloads, where only smallish READ
requests go through the LVSMASTER bottleneck and the
majority of the traffic volume (the data in the read replies)
goes straight from the processing node back to the clients. For
read-intensive I/O patterns you can achieve very high throughput
rates in this mode.
</p><p>
Note: you can use LVS and public addresses at the same time.
</p><p>
If you use LVS, you must have a permanent address configured for
the public interface on each node. This address must be routable
and the cluster nodes must be configured so that all traffic
back to client hosts is routed through this interface. This is
also required in order to allow samba/winbind on the node to
talk to the domain controller. This LVS IP address cannot be
used to initiate outgoing traffic.
</p><p>
Make sure that the domain controller and the clients are
reachable from a node <span class="emphasis"><em>before</em></span> you enable
LVS. Also ensure that outgoing traffic to these hosts is routed
out through the configured public interface.
</p><div class="refsect2"><a name="idp55075776"></a><h3>Configuration</h3><p>
To activate LVS on a CTDB node you must specify the
<code class="varname">CTDB_PUBLIC_INTERFACE</code> and
<code class="varname">CTDB_LVS_PUBLIC_IP</code> configuration variables.
Setting the latter variable also enables the LVS capability on
the node at startup.
</p><p>
Example:
</p><pre class="screen">
CTDB_PUBLIC_INTERFACE=eth1
CTDB_LVS_PUBLIC_IP=10.1.1.237
</pre><p>
</p></div></div><div class="refsect1"><a name="idp55078960"></a><h2>TRACKING AND RESETTING TCP CONNECTIONS</h2><p>
CTDB tracks TCP connections from clients to public IP addresses,
on known ports. When an IP address moves from one node to
another, all existing TCP connections to that IP address are
reset. The node taking over this IP address will also send
gratuitous ARPs (for IPv4) or neighbour advertisements (for
IPv6). This allows clients to reconnect quickly, rather than
waiting for TCP timeouts, which can be very long.
</p><p>
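As a hedged example (command availability and output format vary
between CTDB versions), the connections being tracked for a given
public address can be listed with:
</p><pre class="screen">
# TCP connections ("tickles") tracked for a public address
ctdb gettickles 10.1.1.1
</pre><p>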
It is important that established TCP connections do not survive
a release and take of a public IP address on the same node.
Such connections can get out of sync with sequence and ACK
numbers, potentially causing a disruptive ACK storm.
</p></div><div class="refsect1"><a name="idp55081264"></a><h2>NAT GATEWAY</h2><p>
NAT gateway (NATGW) is an optional feature that is used to
configure fallback routing for nodes. This allows cluster nodes
to connect to external services (e.g. DNS, AD, NIS and LDAP)
when they do not host any public addresses (e.g. when they are
unhealthy).
</p><p>
This also applies to node startup because CTDB marks nodes as
UNHEALTHY until they have passed a "monitor" event. In this
context, NAT gateway helps to avoid a "chicken and egg"
situation where a node needs to access an external service to
become healthy.
</p><p>
Another way of solving this type of problem is to assign an
extra static IP address to a public interface on every node.
This is simpler but it uses an extra IP address per node, while
NAT gateway generally uses only one extra IP address.
</p><div class="refsect2"><a name="idp55083952"></a><h3>Operation</h3><p>
One extra NATGW public address is assigned on the public
network to each NATGW group. Each NATGW group is a set of
nodes in the cluster that shares the same NATGW address to
talk to the outside world. Normally there would only be one
NATGW group spanning an entire cluster, but in situations
where one CTDB cluster spans multiple physical sites it might
be useful to have one NATGW group for each site.
</p><p>
There can be multiple NATGW groups in a cluster but each node
can only be a member of one NATGW group.
</p><p>
In each NATGW group, one of the nodes is selected by CTDB to
be the NATGW master and the other nodes are considered to be
NATGW slaves. NATGW slaves establish a fallback default route
to the NATGW master via the private network. When a NATGW
slave hosts no public IP addresses then it will use this route
for outbound connections. The NATGW master hosts the NATGW
public IP address and routes outgoing connections from
slave nodes via this IP address. It also establishes a
fallback default route.
</p></div><div class="refsect2"><a name="idp55086960"></a><h3>Configuration</h3><p>
NATGW is usually configured similarly to the following example:
</p><pre class="screen">
CTDB_NATGW_NODES=/usr/local/etc/ctdb/natgw_nodes
CTDB_NATGW_PRIVATE_NETWORK=192.168.1.0/24
CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
CTDB_NATGW_PUBLIC_IFACE=eth0
CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1
</pre><p>
Normally any node in a NATGW group can act as the NATGW
master. Some configurations may have special nodes that lack
connectivity to a public network. In such cases, those nodes
can be flagged with the "slave-only" option in the
<code class="varname">CTDB_NATGW_NODES</code> file to limit the NATGW
functionality of those nodes.
</p><p>
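For example, a hypothetical <code class="filename">natgw_nodes</code> file for a
four node group, in which 192.168.1.3 lacks public network connectivity,
might look like this:
</p><pre class="screen">
192.168.1.1
192.168.1.2
192.168.1.3 slave-only
192.168.1.4
</pre><p>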
See the <em class="citetitle">NAT GATEWAY</em> section in
<span class="citerefentry"><span class="refentrytitle">ctdbd.conf</span>(5)</span> for more details of
NATGW configuration.
</p></div><div class="refsect2"><a name="idp55091728"></a><h3>Implementation details</h3><p>
When the NATGW functionality is used, one of the nodes is
selected to act as a NAT gateway for all the other nodes in
the group when they need to communicate with external
services. The NATGW master is selected to be a node that is
most likely to have usable networks.
</p><p>
The NATGW master hosts the NATGW public IP address
<code class="varname">CTDB_NATGW_PUBLIC_IP</code> on the configured public
interface <code class="varname">CTDB_NATGW_PUBLIC_IFACE</code> and acts as
a router, masquerading outgoing connections from slave nodes
via this IP address. If
<code class="varname">CTDB_NATGW_DEFAULT_GATEWAY</code> is set then it
also establishes a fallback default route to this configured
gateway with a metric of 10. A metric 10 route is used
so it can co-exist with other default routes that may be
available.
</p><p>
A NATGW slave establishes its fallback default route to the
NATGW master via the private network
<code class="varname">CTDB_NATGW_PRIVATE_NETWORK</code> with a metric of 10.
This route is used for outbound connections when no other
default route is available because the node hosts no public
addresses. A metric 10 route is used so that it can co-exist
with other default routes that may be available when the node
is hosting public addresses.
</p><p>
<code class="varname">CTDB_NATGW_STATIC_ROUTES</code> can be used to
have NATGW create more specific routes instead of just default
routes.
</p><p>
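As a hypothetical sketch (the exact value syntax is an assumption here
and should be checked against
<span class="citerefentry"><span class="refentrytitle">ctdbd.conf</span>(5)</span>), routing only two
specific networks via the NATGW, the second one through an explicit
gateway, might look like:
</p><pre class="screen">
# Assumed syntax: space-separated networks, optionally with @gateway
CTDB_NATGW_STATIC_ROUTES="10.0.0.0/24 10.2.2.0/24@10.0.0.1"
</pre><p>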
This is implemented in the <code class="filename">11.natgw</code>
eventscript. Please see the eventscript file and the
<em class="citetitle">NAT GATEWAY</em> section in
<span class="citerefentry"><span class="refentrytitle">ctdbd.conf</span>(5)</span> for more details.
</p></div></div><div class="refsect1"><a name="idp55099552"></a><h2>POLICY ROUTING</h2><p>
Policy routing is an optional CTDB feature to support complex
network topologies. Public addresses may be spread across
several different networks (or VLANs) and it may not be possible
to route packets from these public addresses via the system's
default route. Therefore, CTDB has support for policy routing
via the <code class="filename">13.per_ip_routing</code> eventscript.
This allows routing to be specified for packets sourced from
each public address. The routes are added and removed as CTDB
moves public addresses between nodes.
</p><div class="refsect2"><a name="idp55101776"></a><h3>Configuration variables</h3><p>
There are 4 configuration variables related to policy routing:
<code class="varname">CTDB_PER_IP_ROUTING_CONF</code>,
<code class="varname">CTDB_PER_IP_ROUTING_RULE_PREF</code>,
<code class="varname">CTDB_PER_IP_ROUTING_TABLE_ID_LOW</code>,
<code class="varname">CTDB_PER_IP_ROUTING_TABLE_ID_HIGH</code>. See the
<em class="citetitle">POLICY ROUTING</em> section in
<span class="citerefentry"><span class="refentrytitle">ctdbd.conf</span>(5)</span> for more details.
</p></div><div class="refsect2"><a name="idp55105744"></a><h3>Configuration</h3><p>
The format of each line of
<code class="varname">CTDB_PER_IP_ROUTING_CONF</code> is:
</p><pre class="screen">
&lt;public_address&gt; &lt;network&gt; [ &lt;gateway&gt; ]
</pre><p>
Leading whitespace is ignored and arbitrary whitespace may be
used as a separator. Lines that have a "public address" item
that doesn't match an actual public address are ignored. This
means that comment lines can be added using a leading
character such as '#', since this will never match an IP
address.
</p><p>
A line without a gateway indicates a link local route.
</p><p>
For example, consider the configuration line:
</p><pre class="screen">
192.168.1.99 192.168.1.1/24
</pre><p>
If the corresponding public_addresses line is:
</p><pre class="screen">
192.168.1.99/24 eth2,eth3
</pre><p>
<code class="varname">CTDB_PER_IP_ROUTING_RULE_PREF</code> is 100, and
CTDB adds the address to eth2 then the following routing
information is added:
</p><pre class="screen">
ip rule add from 192.168.1.99 pref 100 table ctdb.192.168.1.99
ip route add 192.168.1.0/24 dev eth2 table ctdb.192.168.1.99
</pre><p>
This causes traffic from 192.168.1.99 to 192.168.1.0/24 to go
via eth2.
</p><p>
The <span class="command"><strong>ip rule</strong></span> command will show (something
like - depending on other public addresses and other routes on
the system):
</p><pre class="screen">
0: from all lookup local
100: from 192.168.1.99 lookup ctdb.192.168.1.99
32766: from all lookup main
32767: from all lookup default
</pre><p>
<span class="command"><strong>ip route show table ctdb.192.168.1.99</strong></span> will show:
</p><pre class="screen">
192.168.1.0/24 dev eth2 scope link
</pre><p>
The usual use for a line containing a gateway is to add a
default route corresponding to a particular source address.
Consider this line of configuration:
</p><pre class="screen">
192.168.1.99 0.0.0.0/0 192.168.1.1
</pre><p>
In the situation described above this will cause an extra
routing command to be executed:
</p><pre class="screen">
ip route add 0.0.0.0/0 via 192.168.1.1 dev eth2 table ctdb.192.168.1.99
</pre><p>
With both configuration lines, <span class="command"><strong>ip route show table
ctdb.192.168.1.99</strong></span> will show:
</p><pre class="screen">
192.168.1.0/24 dev eth2 scope link
default via 192.168.1.1 dev eth2
</pre></div><div class="refsect2"><a name="idp55120960"></a><h3>Sample configuration</h3><p>
Here is a more complete example configuration.
</p><pre class="screen">
/usr/local/etc/ctdb/public_addresses:

192.168.1.98 eth2,eth3
192.168.1.99 eth2,eth3

/usr/local/etc/ctdb/policy_routing:

192.168.1.98 192.168.1.0/24
192.168.1.98 192.168.200.0/24 192.168.1.254
192.168.1.98 0.0.0.0/0 192.168.1.1
192.168.1.99 192.168.1.0/24
192.168.1.99 192.168.200.0/24 192.168.1.254
192.168.1.99 0.0.0.0/0 192.168.1.1
</pre><p>
This routes local packets as expected, the default route is as
previously discussed, and packets to 192.168.200.0/24 are
routed via the alternate gateway 192.168.1.254.
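</p><p>
For example, with this configuration the per-address routing table for
192.168.1.99 would be expected to contain something like the following
(illustrative, assuming the address was added to eth2):
</p><pre class="screen">
192.168.1.0/24 dev eth2 scope link
192.168.200.0/24 via 192.168.1.254 dev eth2
default via 192.168.1.1 dev eth2
</pre><p>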
</p></div></div><div class="refsect1"><a name="idp55124176"></a><h2>NOTIFICATION SCRIPT</h2><p>
When certain state changes occur in CTDB, it can be configured
to perform arbitrary actions via a notification script. For
example, sending SNMP traps or emails when a node becomes
unhealthy.
</p><p>
This is activated by setting the
<code class="varname">CTDB_NOTIFY_SCRIPT</code> configuration variable.
The specified script must be executable.
</p><p>
Use of the provided <code class="filename">/usr/local/etc/ctdb/notify.sh</code>
script is recommended. It executes files in
<code class="filename">/usr/local/etc/ctdb/notify.d/</code>.
</p><p>
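As a hypothetical sketch (the script name is illustrative, and it is
assumed, as with the stock <code class="filename">notify.sh</code>, that the event
name is passed as the first argument), a drop-in script could look like:
</p><pre class="screen">
#!/bin/sh
# /usr/local/etc/ctdb/notify.d/50-log-event (hypothetical example)
event="$1"
logger -t ctdb-notify "CTDB entered state: ${event}"
</pre><p>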
CTDB currently generates notifications after CTDB changes to
these states:
</p><table border="0" summary="Simple list" class="simplelist"><tr><td>init</td></tr><tr><td>setup</td></tr><tr><td>startup</td></tr><tr><td>healthy</td></tr><tr><td>unhealthy</td></tr></table></div><div class="refsect1"><a name="idp55131120"></a><h2>DEBUG LEVELS</h2><p>
Valid values for DEBUGLEVEL are:
</p><table border="0" summary="Simple list" class="simplelist"><tr><td>ERR (0)</td></tr><tr><td>WARNING (1)</td></tr><tr><td>NOTICE (2)</td></tr><tr><td>INFO (3)</td></tr><tr><td>DEBUG (4)</td></tr></table></div><div class="refsect1"><a name="idp55134816"></a><h2>REMOTE CLUSTER NODES</h2><p>
It is possible to have a CTDB cluster that spans across a WAN link.
For example, you might have a CTDB cluster in your datacentre but also
want one additional CTDB node located at a remote branch site.
This is similar to how a WAN accelerator works, but with the difference
that while a WAN accelerator often acts as a proxy or a MitM, in
the ctdb remote cluster node configuration the Samba instance at the remote site
IS the genuine server, not a proxy and not a MitM, and thus provides 100%
correct CIFS semantics to clients.
</p><p>
Think of the cluster as a single multihomed Samba server where one of
the NICs (the remote node) is very far away.
</p><p>
NOTE: This does require that the cluster filesystem you use can cope
with WAN-link latencies. Not all cluster filesystems can handle
WAN-link latencies! Whether this will provide very good WAN-accelerator
performance or perform very poorly depends entirely
on how well your cluster filesystem handles high latency
for data and metadata operations.
</p><p>
To configure a node as a remote cluster node, set
the following two parameters in /etc/sysconfig/ctdb on the remote node:
</p><pre class="screen">
CTDB_CAPABILITY_LMASTER=no
CTDB_CAPABILITY_RECMASTER=no
</pre><p>
</p><p>
Verify with the command "ctdb getcapabilities" that the node no longer
has the recmaster or the lmaster capabilities.
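</p><p>
For example, on such a node the output would be expected to include
something like the following (illustrative; the exact set of
capabilities listed varies between CTDB versions):
</p><pre class="screen">
RECMASTER: NO
LMASTER: NO
</pre><p>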
</p></div><div class="refsect1"><a name="idp55139520"></a><h2>SEE ALSO</h2><p>
<span class="citerefentry"><span class="refentrytitle">ctdb</span>(1)</span>,
<span class="citerefentry"><span class="refentrytitle">ctdbd</span>(1)</span>,
<span class="citerefentry"><span class="refentrytitle">ctdbd_wrapper</span>(1)</span>,
<span class="citerefentry"><span class="refentrytitle">ltdbtool</span>(1)</span>,
<span class="citerefentry"><span class="refentrytitle">onnode</span>(1)</span>,
<span class="citerefentry"><span class="refentrytitle">ping_pong</span>(1)</span>,
<span class="citerefentry"><span class="refentrytitle">ctdbd.conf</span>(5)</span>,
<span class="citerefentry"><span class="refentrytitle">ctdb-statistics</span>(7)</span>,
<span class="citerefentry"><span class="refentrytitle">ctdb-tunables</span>(7)</span>,
<a class="ulink" href="http://ctdb.samba.org/" target="_top">http://ctdb.samba.org/</a>
</p></div></div></body></html>