Changes in CTDB 2.5.1
=====================

Important bug fixes
-------------------

* The locking code now correctly implements a per-database limit on
  active locks. Whole-database lock requests can no longer be denied
  because there are too many active locks - this is particularly
  important for freezing databases during recovery.

* The debug_locks.sh script locks against itself. If it is already
  running then subsequent invocations will exit immediately.

* ctdb tool commands that operate on databases now work correctly when
  a database ID is given.

* Various code fixes for issues found by Coverity.
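
The per-database lock limit in the first item above can be modelled as a small scheduler. This is a minimal Python sketch under stated assumptions, not CTDB's C implementation. The `LockScheduler` class, its methods and the limit value are hypothetical. Record locks count against a limit that is kept separately for each database. A request with no key stands for a whole-database lock, and such a request is never refused because of that limit.

```python
from collections import defaultdict, deque


class LockScheduler:
    """Hypothetical model of a per-database limit on active record locks."""

    def __init__(self, max_active_per_db):
        self.max_active_per_db = max_active_per_db
        self.active = defaultdict(int)      # db name -> active record locks
        self.pending = defaultdict(deque)   # db name -> queued record keys

    def request(self, db, key=None):
        """Return True if the lock starts now, False if it is queued.

        key=None means a whole-database lock (e.g. freezing for recovery).
        Such requests bypass the per-database limit, so they are never
        refused just because many record locks are active.
        """
        if key is None:
            return True
        if self.active[db] < self.max_active_per_db:
            self.active[db] += 1
            return True
        self.pending[db].append(key)
        return False

    def release(self, db):
        """Finish one record lock on db and start the next queued one, if any."""
        self.active[db] -= 1
        if self.pending[db]:
            self.pending[db].popleft()
            self.active[db] += 1
```

Because the counters are per database, a busy database only delays its own record locks. It does not delay locks on other databases or a recovery freeze.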

Important internal changes
--------------------------

* statd-callout has been updated so that statd client information is
  always up-to-date across the cluster. This is implemented by
  storing the client information in a persistent database using a new
  "ctdb ptrans" command.

* The transaction code for persistent databases now retries until it
  is able to take the transaction lock. This makes the transaction
  semantics compatible with Samba's implementation.

* Locking helpers are created with vfork(2) instead of fork(2),
  providing a performance improvement.

* config.guess has been updated to the latest upstream version so CTDB
  should build on more platforms.
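
Retrying until the transaction lock is taken amounts to a loop around a non-blocking lock attempt. The following is a minimal sketch that assumes the lock can be tried without blocking. `take_transaction_lock` and its `retry_delay` parameter are hypothetical names, and a `threading.Lock` stands in for the real transaction lock.

```python
import threading
import time


def take_transaction_lock(try_lock, retry_delay=0.01):
    """Retry a non-blocking lock attempt until it succeeds.

    try_lock is any callable that returns True once the lock is held.
    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_lock():
            return attempts
        time.sleep(retry_delay)  # wait briefly and try again instead of failing


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()                               # another client holds the lock
    threading.Timer(0.05, lock.release).start()  # ...and releases it shortly
    tries = take_transaction_lock(lambda: lock.acquire(blocking=False))
    print(f"transaction lock taken after {tries} attempts")
```

The caller never sees a "lock busy" failure. Contention only delays the start of the transaction.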


Changes in CTDB 2.5
===================

User-visible changes
--------------------

* The default location of the ctdbd socket is now:

    /var/run/ctdb/ctdbd.socket

  If you currently set CTDB_SOCKET in configuration then unsetting it
  will probably do what you want.

* The default location of CTDB TDB databases is now:

    /var/lib/ctdb

  If you only set CTDB_DBDIR (to the old default of /var/ctdb) then
  you probably want to move your databases to /var/lib/ctdb, drop your
  setting of CTDB_DBDIR and just use the default.

  To maintain the database files in /var/ctdb you will need to set
  CTDB_DBDIR, CTDB_DBDIR_PERSISTENT and CTDB_DBDIR_STATE, since all of
  these have moved.

* Use of CTDB_OPTIONS to set ctdbd command-line options is no longer
  supported. Please use individual configuration variables instead.

* Obsolete tunables VacuumDefaultInterval, VacuumMinInterval and
  VacuumMaxInterval have been removed. Setting them had no effect, but
  if you now try to set them in a configuration file via CTDB_SET_X=Y
  then CTDB will not start.

* Much improved manual pages, including the new ctdb(7), ctdbd.conf(5)
  and ctdb-tunables(7). There is still some work to do.

* Most CTDB-specific configuration can now be set in
  /etc/ctdb/ctdbd.conf.

  This avoids cluttering distribution-specific configuration files,
  such as /etc/sysconfig/ctdb. It also means that we can say: see
  ctdbd.conf(5) for more details. :-)

* Configuration variable NFS_SERVER_MODE is deprecated and has been
  replaced by CTDB_NFS_SERVER_MODE. See ctdbd.conf(5) for more
  details.

* "ctdb reloadips" is much improved and should be used for reloading
  the public IP configuration.

  This command attempts to yield much more predictable IP allocations
  than using sequences of delip and addip commands. See ctdb(1) for
  details.

* The ability to pass a comma-separated string to ctdb(1) tool commands
  via the -n option is now documented and works for most commands. See
  ctdb(1) for details.

* "ctdb rebalancenode" is now a debugging command and should not be
  used in normal operation. See ctdb(1) for details.

* "ctdb ban 0" is now invalid.

  This was documented as causing a permanent ban. However, this was
  not implemented and caused an "unban" instead. To avoid confusion,
  0 is now an invalid ban duration. To administratively "ban" a node
  use "ctdb stop" instead.

* The systemd configuration now puts the PID file in /run/ctdb (rather
  than /run/ctdbd) for consistency with the initscript and other uses
  of /var/run/ctdb.

Important bug fixes
-------------------

* Traverse regression fixed.

* The default recovery method for persistent databases has been
  changed to use database sequence numbers instead of doing
  record-by-record recovery (using record sequence numbers). This
  fixes issues including registry corruption.

* Banned nodes are no longer told to run the "ipreallocated" event
  during a takeover run, when in fallback mode with nodes that don't
  support the IPREALLOCATED control.
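
Recovery by database sequence number can be pictured as follows. Each node reports the sequence number of its whole copy of the persistent database, and the copy with the highest number is used as-is. Picking records one at a time from different nodes can instead combine records written by different transactions. That is one plausible way for a multi-record structure such as the registry to become inconsistent. The function name, data layout and tie-break rule below are assumptions for illustration only.

```python
def recover_persistent_db(copies):
    """Choose one node's whole copy of a persistent database.

    copies maps node number -> (db_seqnum, records), where records is a
    dict of key -> value. The copy with the highest database sequence
    number wins; ties go to the lowest node number (an assumption made
    for this sketch). Returns (node, records).
    """
    if not copies:
        raise ValueError("no copies to recover from")
    node = max(copies, key=lambda n: (copies[n][0], -n))
    return node, dict(copies[node][1])
```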

Important internal changes
--------------------------

* Persistent transactions are now compatible with Samba and work
  reliably.

* The recovery master role has been made more stable by resetting the
  priority time each time a node becomes inactive. This means that
  nodes that are active for a long time are more likely to retain the
  recovery master role.

* The incomplete libctdb library has been removed.

* The test suite now starts ctdbd with the --sloppy-start option to
  speed up startup. However, this should not be done in production.
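
The effect of resetting the priority time can be shown with a toy election. As a simplification, this sketch assumes the election prefers the active node with the oldest priority time; the real election may weigh other criteria as well. `Node`, `become_inactive` and `elect_recmaster` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    pnn: int
    priority_time: float  # hypothetical seniority timestamp: older wins
    active: bool = True

    def become_inactive(self, now):
        # Resetting the timestamp makes a node that was banned or stopped
        # lose its seniority to nodes that stayed active throughout.
        self.active = False
        self.priority_time = now

    def become_active(self):
        self.active = True


def elect_recmaster(nodes):
    """Pick the active node with the oldest priority time (lowest PNN on ties)."""
    candidates = [n for n in nodes if n.active]
    if not candidates:
        raise ValueError("no active nodes")
    return min(candidates, key=lambda n: (n.priority_time, n.pnn)).pnn
```

Once node 1 in the test goes inactive and comes back, it no longer outranks node 2, which stayed active the whole time.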


Changes in CTDB 2.4
===================

User-visible changes
--------------------

* A missing network interface now causes monitoring to fail and the
  node to become unhealthy.

* The ctdb command's default control timeout has changed from 3s to
  10s.

* debug-hung-script.sh now includes the output of "ctdb scriptstatus"
  to provide more information.

Important bug fixes
-------------------

* Starting the CTDB daemon by running ctdbd directly no longer removes
  an existing Unix socket unconditionally.

* ctdbd once again successfully kills client processes on releasing
  public IPs. It was checking for them as tracked child processes
  and not finding them, so wasn't killing them.

* ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
  ctdbd (such as uses of ctdb in eventscripts) use the correct socket.

* The Jenkins hash is now always used when creating volatile
  databases. There were a few places where TDBs would be attached with
  the wrong flags.

* Vacuuming code fixes in CTDB 2.2 introduced bugs in the new code
  which led to header corruption for empty records. This resulted
  in inconsistent headers on two nodes, so a request for such a record
  would bounce between the nodes indefinitely, logging "High hopcount"
  messages. This also caused performance degradation.

* ctdbd was losing log messages at shutdown because they weren't being
  given time to flush. ctdbd now sleeps for a second during shutdown
  to allow time to flush log messages.

* Improved socket handling introduced in CTDB 2.2 caused ctdbd to
  process a large number of packets available on a single FD before
  polling other FDs. ctdbd now uses fixed-size queue buffers to allow
  fair scheduling across multiple FDs.
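
The fairness problem in the last item above can be shown with a small simulation. Draining one FD completely makes the others wait. Reading a bounded chunk from each ready FD per round lets every FD make progress. The function `service_fair` and the in-memory stand-ins for sockets are hypothetical. The sketch illustrates the round-robin idea only, not ctdbd's queue code.

```python
def service_fair(pending, chunk_size):
    """Read at most chunk_size bytes from each ready FD per polling round.

    pending maps a file descriptor number to the bytes waiting on it.
    Returns the (fd, nbytes) reads in the order they were performed, so a
    busy FD cannot monopolise the loop while other FDs wait.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffers = {fd: memoryview(data) for fd, data in pending.items() if data}
    order = []
    while buffers:
        for fd in sorted(buffers):  # one bounded read per FD per round
            chunk = buffers[fd][:chunk_size]
            order.append((fd, len(chunk)))
            buffers[fd] = buffers[fd][chunk_size:]
        # drop FDs that have been drained
        buffers = {fd: buf for fd, buf in buffers.items() if len(buf)}
    return order
```

With 10 bytes waiting on FD 3, 3 bytes on FD 4 and a 4-byte chunk, FD 4 is served in the first round. It does not have to wait until FD 3 is empty.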

Important internal changes
--------------------------

* A node that fails to take/release multiple IPs will only incur a
  single banning credit. This makes a brief failure less likely to
  cause the node to be banned.

* ctdb killtcp has been changed to read connections from stdin and
  10.interface now uses this feature to improve the time taken to kill
  connections.

* Improvements to hot records statistics in ctdb dbstatistics.

* The recovery daemon now assembles up-to-date node flags information
  from remote nodes before checking if any flags are inconsistent and
  forcing a recovery.

* ctdbd no longer creates multiple lock sub-processes for the same
  key. This reduces the number of lock sub-processes substantially.

* Changed the nfsd RPC check failure policy to fail over quickly
  instead of trying to repair a node first by restarting NFS. Such
  restarts would often hang if the cause of the RPC check failure was
  the cluster filesystem or storage.

* Logging improvements relating to high hopcounts and sticky records.

* Lower-level tdb messages are now logged correctly.

* CTDB commands disable/enable/stop/continue are now resilient to
  individual control failures and retry in case of failures.
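
Avoiding duplicate lock sub-processes per key can be sketched as a table of queued requests keyed by (database, key). A helper is started only for the first request on a key, and later requests wait on the same helper. In this model the queued callbacks run one after another once the lock is held. `LockRequests`, `helper_done` and the counter are hypothetical stand-ins for ctdbd's lock handling.

```python
from collections import defaultdict


class LockRequests:
    """Hypothetical model: at most one lock helper per (database, key)."""

    def __init__(self):
        self.waiters = defaultdict(list)  # (db, key) -> queued callbacks
        self.helpers_started = 0

    def lock(self, db, key, callback):
        """Queue a request; start a helper only for the first one on a key."""
        queue = self.waiters[(db, key)]
        if not queue:
            self.helpers_started += 1  # stand-in for spawning a lock helper
        queue.append(callback)

    def helper_done(self, db, key):
        """The helper holds the lock: run every queued callback in order."""
        for callback in self.waiters.pop((db, key), []):
            callback(db, key)
```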


Changes in CTDB 2.3
===================

User-visible changes
--------------------

* Two new configuration variables for the 60.nfs eventscript:

  - CTDB_MONITOR_NFS_THREAD_COUNT
  - CTDB_NFS_DUMP_STUCK_THREADS

  See ctdb.sysconfig for details.

* The DeadlockTimeout tunable has been removed. To enable debugging of
  locking issues, set

    CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

* In overall statistics and database statistics, lock buckets have been
  updated to use the following timings:

    < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s

* The initscript has been simplified, with most CTDB-specific
  functionality split out to ctdbd_wrapper, which is used to start and
  stop ctdbd.

* Added systemd support.

* CTDB subprocesses are now given informative names to allow them to
  be easily distinguished when using programs like "top" or "perf".
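
The following sketch shows how a measured lock wait time maps onto the eleven lock buckets listed above. The bucket boundaries come from that list. The names `BUCKET_LIMITS`, `BUCKET_LABELS` and `lock_bucket` are hypothetical. Each bucket is assumed to be half-open, so a wait of exactly 1ms counts as "< 10ms".

```python
import bisect

# Exclusive upper bounds (in seconds) of the first ten buckets;
# anything of 64s or more falls into the eleventh bucket.
BUCKET_LIMITS = [0.001, 0.01, 0.1, 1, 2, 4, 8, 16, 32, 64]
BUCKET_LABELS = ["< 1ms", "< 10ms", "< 100ms", "< 1s", "< 2s", "< 4s",
                 "< 8s", "< 16s", "< 32s", "< 64s", ">= 64s"]


def lock_bucket(seconds):
    """Return the index of the bucket that a lock wait time falls into."""
    # bisect_right counts the limits <= seconds, which is the bucket index
    return bisect.bisect_right(BUCKET_LIMITS, seconds)
```

The buckets double in width above one second. A handful of bins therefore covers everything from sub-millisecond waits to stalls of over a minute.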

Important bug fixes
-------------------

* The ctdb tool no longer exits from a retry loop if a control times
  out (e.g. under high load). Previously the loop exited on any error;
  this simple fix stops that.

* When updating flags on all nodes, the correct updated flags are now
  used. This should avoid wrong flag change messages in the logs.

* The recovery daemon will not ban other nodes if the current node
  is banned.

* The ctdb dbstatistics command now correctly outputs database
  statistics.

* Fixed a panic with overlapping shutdowns (regression in 2.2).

* Fixed 60.ganesha "monitor" event (regression in 2.2).

* Fixed a buffer overflow in the "reloadips" implementation.

* Fixed segmentation faults in ping_pong (when called with an incorrect
  argument) and test binaries (when called while ctdbd is not running).

Important internal changes
--------------------------

* The recovery daemon on a stopped or banned node no longer
  participates in any cluster activity.

* Cluster-wide database traverses have been improved by sending the
  records directly from the traverse child process to the requesting
  node.

* TDB checking and dropping of all IPs have moved from the initscript
  to the "init" event in 00.ctdb.

* To avoid "rogue IPs" the release IP callback now fails if the
  released IP is still present on an interface.


Changes in CTDB 2.2
===================

User-visible changes
--------------------

* The "stopped" event has been removed.

  The "ipreallocated" event is now run when a node is stopped. Use
  this instead of "stopped".

* New --pidfile option for ctdbd, used by the initscript.

* The 60.nfs eventscript now uses configuration files in
  /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
  hardcoding them into the script.

* Notification handler scripts can now be dropped into
  /etc/ctdb/notify.d/.

* The NoIPTakeoverOnDisabled tunable has been renamed to
  NoIPHostOnAllDisabled and now works properly when set on individual
  nodes.

* New ctdb subcommand "runstate" prints the current internal runstate.
  Runstates are used for serialising startup.

Important bug fixes
-------------------

* The Unix domain socket is now set to non-blocking after the
  connection succeeds. This avoids connections failing with EAGAIN
  and not being retried.

* Fetching from the log ringbuffer now succeeds if the buffer is full.

* Fixed a severe recovery bug that can lead to data corruption for SMB
  clients.

* The statd-callout script now runs as root via sudo.

* "ctdb delip" no longer fails if it is unable to move the IP.

* A race in the ctdb tool's ipreallocate code was fixed. This fixes
  potential bugs in the "disable", "enable", "stop", "continue",
  "ban", "unban", "ipreallocate" and "sync" commands.

* The monitor cancellation code could sometimes hang indefinitely.
  This could cause "ctdb stop" and "ctdb shutdown" to fail.
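
The connect-then-set-non-blocking order in the first item above can be reproduced with Python's standard `socket` module. This is an illustrative sketch rather than ctdbd's C code. `connect_unix` and the demo socket path are hypothetical. On Linux, connect(2) documents EAGAIN for a non-blocking Unix domain socket when the connection cannot be completed immediately. So the socket is connected in blocking mode and only switched to non-blocking afterwards.

```python
import os
import socket
import tempfile


def connect_unix(path):
    """Connect to a Unix domain socket, then make the socket non-blocking.

    A blocking connect() waits for the listener instead of failing with
    EAGAIN, which a non-blocking connect() on a Unix domain socket can do
    when the connection cannot be completed immediately.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    sock.setblocking(False)  # switch only after the connection succeeded
    return sock


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.socket")  # hypothetical socket path
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)
        client = connect_unix(path)
        print("blocking after connect:", client.getblocking())
        client.close()
        server.close()
```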

Important internal changes
--------------------------

* The socket I/O handling has been optimised to improve performance.

* IPs will not be assigned to nodes during CTDB initialisation. They
  will only be assigned to nodes that are in the "running" runstate.

* Improved database locking code. One improvement is to use a
  standalone locking helper executable - this avoids creating many
  forked copies of ctdbd and potentially running a node out of memory.

* New control CTDB_CONTROL_IPREALLOCATED is now used to generate
  "ipreallocated" events.

* Message handlers are now indexed, providing a significant
  performance improvement.
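
Indexing message handlers can be illustrated by keying them on their message ID (srvid). Delivery then becomes a lookup rather than a scan over every registered handler. The sketch assumes a hash table (a Python dict) as the index; the text above does not say which data structure CTDB uses. `MessageDispatcher` is a hypothetical name.

```python
from collections import defaultdict


class MessageDispatcher:
    """Hypothetical model: message handlers indexed by message ID (srvid)."""

    def __init__(self):
        self.handlers = defaultdict(list)  # srvid -> registered handlers

    def register(self, srvid, handler):
        self.handlers[srvid].append(handler)

    def dispatch(self, srvid, data):
        """Call only the handlers registered for srvid; return how many ran.

        The dictionary lookup replaces a scan over every registered
        handler, so the cost no longer grows with unrelated handlers.
        """
        targets = self.handlers.get(srvid, [])
        for handler in targets:
            handler(srvid, data)
        return len(targets)
```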