git.proxmox.com Git - mirror_corosync.git/log
Fabio M. Di Nitto [Tue, 9 Mar 2021 10:04:30 +0000 (11:04 +0100)]
configure: move exec_prefix sanitize

Move exec_prefix sanitize closer to prefix. This is not a
functional change; it just groups related tests together.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Fabio M. Di Nitto [Tue, 9 Mar 2021 10:03:04 +0000 (11:03 +0100)]
configure: drop dead code

prefix is already sanitized to /usr at the top of configure.ac,
hence the second instance can never trigger.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Fabio M. Di Nitto [Tue, 9 Mar 2021 10:01:50 +0000 (11:01 +0100)]
configure: detect and init pkg-config with macro

This also allows the PKG_CONFIG_* macros to be used immediately
in conditional calls.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Christine Caulfield [Wed, 3 Mar 2021 14:10:09 +0000 (14:10 +0000)]
main: Close race condition when moving to statedir

Found by covscan, which also didn't like us 'leaking' the
fd to the lockfile, so close that too.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Thu, 14 Jan 2021 13:00:42 +0000 (14:00 +0100)]
init: Use corosync-cfgtool for shutdown

... to trigger cfg shutdown callbacks.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Christine Caulfield [Wed, 13 Jan 2021 08:26:27 +0000 (08:26 +0000)]
test: Add testcfg to exercise some cfg functions

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Christine Caulfield [Mon, 11 Jan 2021 09:28:34 +0000 (09:28 +0000)]
cfg: Reinstate cfg tracking

CFG tracking was removed in 815375411e80131f31b172d7c43625769ee8b53d,
probably by mistake, as part of the tidy-up of cfg and the removal of
dynamic loading. This means that shutdown tracking (using
cfg_try_shutdown()) stopped working.

This patch restores the trackstart & trackstop API calls (renamed to be
more consistent with the existing libraries) so that shutdown tracking
can be used again.

Change cfg.shutdown_timeout to be in milliseconds rather than seconds
and use libqb macros for conversion.

Add --force option to corosync-cfgtool -H

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Tue, 24 Nov 2020 11:20:25 +0000 (12:20 +0100)]
cfg: Improve nodestatusget versioning

This patch tries to make nodestatusget truly extensible. The following
changes are implemented:
- corosync_cfg_node_status_version_t is added with (for now) a single
  value CFG_NODE_STATUS_V1
- corosync_knet_node_status is renamed to corosync_cfg_node_status_v1 (it
  isn't really knet-specific because it works as well for UDP(U))
- struct res_lib_cfg_nodestatusget_version is added, which holds only the
  IPC result header and the version, at the same position as in
  corosync_cfg_node_status_v1
- corosync_cfg_node_status_get requires a version and a pointer to one of
  the corosync_cfg_node_status_v structures
- requests are handled in switch/case statements to make adding a new
  version easier

Also fix the following bugs:
- the totempg_nodestatus_get error was cast to cs_error_t, which gave it
  no meaning.
- header.error was not checked at all in the library

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
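
The versioning scheme described in this commit (a version enum, the version stored first in each status structure, and a switch over the requested version) can be sketched in self-contained C. This is not corosync's code: the type names, fields and return values below are hypothetical stand-ins for corosync_cfg_node_status_version_t and corosync_cfg_node_status_v1.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for corosync_cfg_node_status_version_t. */
typedef enum {
	NODE_STATUS_V1 = 1,
} node_status_version_t;

/* Hypothetical stand-in for corosync_cfg_node_status_v1; fields are
 * illustrative only. The version must stay the first field so any
 * future version can be identified the same way. */
struct node_status_v1 {
	node_status_version_t version;
	uint32_t nodeid;
	uint8_t reachable;
};

/* The caller states which version it understands and passes a pointer
 * to the matching structure; unknown versions are rejected. */
static int node_status_get(uint32_t nodeid, node_status_version_t version,
			   void *out)
{
	switch (version) {
	case NODE_STATUS_V1: {
		struct node_status_v1 *st = out;

		memset(st, 0, sizeof(*st));
		st->version = NODE_STATUS_V1;
		st->nodeid = nodeid;
		st->reachable = 1; /* placeholder value for the sketch */
		return 0;
	}
	default:
		return -1; /* unsupported version */
	}
}
```

Adding a V2 in this sketch would mean a new enum value, a new structure that keeps `version` as its first field, and a new `case`; callers that pass V1 keep working unchanged.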
Christine Caulfield [Thu, 29 Oct 2020 11:07:48 +0000 (11:07 +0000)]
cfg: New API to get extended node/link information

Currently we horribly over-use totempg_ifaces_get() to
retrieve information about knet interfaces. This is an attempt to
improve on that.

All transports are supported (so not only Knet but also UDP(U)).

This patch builds best against the "onwire-upgrade" branch of knet
as that's what sparked my interest in getting more information out.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Tue, 10 Nov 2020 17:10:17 +0000 (18:10 +0100)]
totemknet: Check both cipher and hash for crypto

Previously only the crypto cipher was used to find out whether crypto is
enabled or disabled.

This usually works until the cipher is set to none and the hash to some
other value (like sha1). Such a config is perfectly valid, but it was not
supported correctly.

As a solution, check both cipher and hash.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
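
A minimal sketch of the corrected check, assuming the cipher and hash values are the strings from the totem crypto_cipher and crypto_hash options; the function name is hypothetical and this is not the totemknet code.

```c
#include <stdbool.h>
#include <string.h>

/* Crypto counts as enabled when either the cipher or the hash is set
 * to something other than "none" (a NULL value is treated as unset). */
static bool crypto_is_enabled(const char *cipher, const char *hash)
{
	bool cipher_on = cipher != NULL && strcmp(cipher, "none") != 0;
	bool hash_on = hash != NULL && strcmp(hash, "none") != 0;

	return cipher_on || hash_on;
}
```

With only the cipher checked, `cipher = none, hash = sha1` would have been treated as "crypto disabled"; checking both makes it count as enabled.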
Ferenc Wágner [Sun, 8 Nov 2020 19:49:15 +0000 (20:49 +0100)]
The ring id file needn't be executable

At the same time simplify the overwrite logic and stop clearing the
umask (which is unexpected and quite pointless here, as applications
can't really protect the users from their own pathological settings).

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
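
The file handling described here can be sketched with plain POSIX calls. This is an illustration, not corosync's code; the 8-byte binary sequence-number format and the owner read/write mode (0600) are assumptions. The process umask is left alone and O_TRUNC takes care of overwriting.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write a 64-bit ring sequence number to `path`.
 * The file is created rw for the owner only (no exec bits), and an
 * existing file is truncated and overwritten. */
static int write_ring_id(const char *path, uint64_t seq)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, &seq, sizeof(seq));
	/* Always close; then report a failed close or a short write. */
	if (close(fd) != 0 || n != (ssize_t)sizeof(seq))
		return -1;
	return 0;
}
```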
Fabio M. Di Nitto [Fri, 6 Nov 2020 04:12:19 +0000 (05:12 +0100)]
pkgconfig: export LOGDIR in corosync.pc

logdir is configurable at build time and can change
from distro to distro. Export the path for pcs to use.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Mon, 2 Nov 2020 09:53:33 +0000 (10:53 +0100)]
spec: Add isa version of corosync-devel provides

Also add the release to the version to match the autogenerated
corosynclib-devel provides.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(tag: v3.1.0)
liangxin1300 [Sun, 18 Oct 2020 14:40:25 +0000 (22:40 +0800)]
totemconfig: remove redundant nodeid error log

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Aleksei Burlakov [Tue, 13 Oct 2020 08:57:24 +0000 (10:57 +0200)]
totemsrp: More informative messages

... when token and consensus timeouts pop.

Signed-off-by: Aleksei Burlakov <aburlakov@suse.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Mon, 12 Oct 2020 12:26:10 +0000 (14:26 +0200)]
config: Increase default token timeout to 3000 ms

The default token timeout of 1000 ms was often changed by users because
other workloads on the machine may make corosync respond a bit later
than needed, resulting in token loss.

3000 ms was chosen as a compromise between increasing the token timeout
and still allowing a live cluster upgrade (other nodes should still
receive the token from a node with the new default in time).

It doesn't affect token_coefficient, so the final token timeout still
depends on the number of configured nodes (only the base is higher).

This change slows down failover a bit, so for clusters where failover
time is important, please set the token timeout in the corosync.conf
configuration file, for example:

totem {
    version: 2
    token: 1000
...

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
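
As the commit notes, token_coefficient still scales the final timeout with the node count. Per corosync.conf(5), when the nodelist contains at least 3 nodes the real token timeout is token + (number_of_nodes - 2) * token_coefficient, with a default token_coefficient of 650 ms. A small helper (hypothetical name) to work through the arithmetic:

```c
#include <stdint.h>

/* Effective token timeout in ms: the coefficient only applies when the
 * nodelist has at least 3 nodes. */
static uint32_t effective_token_ms(uint32_t token_ms, uint32_t coefficient_ms,
				   uint32_t nodes)
{
	if (nodes < 3)
		return token_ms;
	return token_ms + (nodes - 2) * coefficient_ms;
}
```

For a 3-node cluster this gives 3650 ms with the new 3000 ms default, compared with 1650 ms under the old 1000 ms default.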
Ferenc Wágner [Wed, 30 Sep 2020 08:26:42 +0000 (10:26 +0200)]
man: votequorum.5: use proper single quotes

Backtick and apostrophe are formatted as directional quotes by plain
groff, but they behave literally in the body of a man page.

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Ferenc Wágner [Sun, 25 Aug 2019 13:48:33 +0000 (15:48 +0200)]
man: fix typo: avaialable

By slightly rewording the documentation of knet_compression_model.

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Tue, 29 Sep 2020 16:44:44 +0000 (18:44 +0200)]
tests: Use CS_DISPATCH_BLOCKING instead of cycle

Some tests were calling the dispatch function in CS_DISPATCH_ALL mode
without poll/select on the fd. This leads to a busy-wait loop, because
CS_DISPATCH_ALL masks the CS_ERR_TRY_AGAIN error.

The simplest solution is to use CS_DISPATCH_BLOCKING instead and remove
the while loop, because CS_DISPATCH_BLOCKING handles CS_ERR_TRY_AGAIN
correctly.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
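
The busy wait comes from dispatching without first waiting on the descriptor. The commit's fix uses CS_DISPATCH_BLOCKING; the other option it mentions, poll/select on the fd, can be illustrated generically with POSIX poll(). This sketch does not use the corosync API, and the helper name is hypothetical.

```c
#include <poll.h>

/* Block until `fd` is readable instead of spinning. Returns 1 when
 * readable, 0 on timeout, -1 on error. A timeout of -1 waits forever. */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A dispatch call made only after `wait_readable()` returns 1 sleeps in the kernel while there is nothing to do, rather than looping on CS_ERR_TRY_AGAIN.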
Jan Friesse [Wed, 9 Sep 2020 12:16:55 +0000 (14:16 +0200)]
quorum: Add support for nodelist callback

The current quorum callback contains only the actual view list and there
is no way to find out which nodes joined or left. This cannot be emulated
by a user application, because when corosync restarts before other nodes
notice, the view list is unchanged (the ring id changes, though).

The solution is to implement a callback similar to the cpg one, which
contains the ring id, member list, joined list and left list.

To implement such a callback and keep backwards compatibility,
quorum_model_initialize is introduced. Its behavior is similar to
cpg_model_initialize. This allows passing model v1, which contains
enhanced quorum (the full ring id is passed instead of just the seq
number) and nodelist callbacks.

To find out which events should be sent by the corosync daemon, a new
message MESSAGE_REQ_QUORUM_MODEL_GETTYPE is used. On init, the quorum
library was sending MESSAGE_REQ_QUORUM_GETTYPE. When model v1 is
requested, MESSAGE_REQ_QUORUM_MODEL_GETTYPE is used instead; it contains
the model number, so corosync knows that the client is using model v1
and can send enhanced quorum and nodelist events.

The nodelist event is (for now) sent both when membership changes and
when requested with CS_TRACK_CURRENT, but in the latter case left_list
and joined_list are left empty, because they don't make much sense there.

A new test application, testquorummodel, is added as an example of new
API usage.

Also, during patch development I found a few bugs here and there, which
are also fixed:
- quorum_initialize never returned the error code returned by the
  MESSAGE_REQ_QUORUM_GETTYPE call (it always returned CS_OK)
- Allocated memory in send_library_notification was based
  on sizeof(unsigned int) instead of mar_uint32_t. That's not wrong,
  but it makes more sense to use sizeof(mar_uint32_t) instead

(big thanks to Chrissie for englishifying the man pages)

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
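
The joined and left lists are set differences between the old and new membership. A self-contained sketch (hypothetical helper, not the libquorum API) also shows why a client cannot reconstruct them from view lists alone:

```c
#include <stddef.h>
#include <stdint.h>

/* Copy into `out` every node id from `a` that is not present in `b`
 * (O(n*m), fine for small member lists). Returns the count written.
 * joined = diff(new, old), left = diff(old, new). */
static size_t nodeid_diff(const uint32_t *a, size_t a_len,
			  const uint32_t *b, size_t b_len, uint32_t *out)
{
	size_t n = 0;

	for (size_t i = 0; i < a_len; i++) {
		int found = 0;

		for (size_t j = 0; j < b_len; j++) {
			if (a[i] == b[j]) {
				found = 1;
				break;
			}
		}
		if (!found)
			out[n++] = a[i];
	}
	return n;
}
```

If a node restarts before the others notice, the previous and current lists are identical, so both differences are empty even though the ring id changed; that is the case the new nodelist callback covers.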
Christine Caulfield [Wed, 30 Sep 2020 07:49:50 +0000 (08:49 +0100)]
man: reload during rolling upgrade

Make it clear that reloads during a rolling upgrade are not
supported.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Tue, 1 Sep 2020 13:24:19 +0000 (15:24 +0200)]
totemsrp: Move token received callback

Trigger token received callback only for valid token.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Thu, 17 Sep 2020 13:30:07 +0000 (15:30 +0200)]
common_lib: Remove trailing spaces in cs_strerror

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
liangxin1300 [Thu, 17 Sep 2020 02:54:03 +0000 (10:54 +0800)]
totemconfig: improve linknumber checking

Check whether linknumber is larger than INTERFACE_MAX and display an
error if so.

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
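
A sketch of the range check, assuming that link numbers index an array of INTERFACE_MAX entries (so valid values are 0 to INTERFACE_MAX - 1) and that INTERFACE_MAX is 8; both the constant and the function are stand-ins, not corosync's totemconfig code.

```c
#include <stdio.h>

/* Assumed value for the sketch only. */
#define SKETCH_INTERFACE_MAX 8

/* Returns 0 if `linknumber` is a valid index, otherwise writes an error
 * message into `err` and returns -1. */
static int check_linknumber(long linknumber, char *err, size_t err_len)
{
	if (linknumber < 0 || linknumber >= SKETCH_INTERFACE_MAX) {
		snprintf(err, err_len, "linknumber %ld is out of range (0-%d)",
			 linknumber, SKETCH_INTERFACE_MAX - 1);
		return -1;
	}
	return 0;
}
```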
liangxin1300 [Wed, 16 Sep 2020 10:41:43 +0000 (18:41 +0800)]
totemconfig: add interface number to the error str

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Fri, 11 Sep 2020 04:10:41 +0000 (12:10 +0800)]
cfg: enhance message_handler_req_lib_cfg_killnode

When executing corosync-cfgtool -k <nodeid> to kill a node:
* Check whether the nodeid exists
* Check whether the node has joined

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Tue, 1 Sep 2020 03:02:37 +0000 (11:02 +0800)]
totemconfig: validate totem.transport value

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Thu, 20 Aug 2020 16:13:11 +0000 (00:13 +0800)]
cmapctl: return error on no result of print prefix

Return EXIT_FAILURE if nothing is printed for ACTION_PRINT_PREFIX.

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Fri, 21 Aug 2020 05:30:50 +0000 (13:30 +0800)]
cmapctl: check NULL for key type and value for -p

To avoid a segmentation fault.

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Thu, 20 Aug 2020 06:02:40 +0000 (14:02 +0800)]
quorumtool: strict check for -o option

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Wed, 19 Aug 2020 03:11:37 +0000 (11:11 +0800)]
quorumtool: Help shouldn't require running service

Do not require corosync running when usage is requested.

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Mon, 17 Aug 2020 06:25:47 +0000 (14:25 +0800)]
cfgtool: Return error when -i doesn't match

Give an error message and an EXIT_FAILURE return code when the -i
option doesn't match.

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Mon, 17 Aug 2020 09:21:15 +0000 (17:21 +0800)]
man: update output of -s and -b for cfgtool

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Mon, 17 Aug 2020 07:14:46 +0000 (15:14 +0800)]
cmapctl: return EXIT_FAILURE on failure

For -g and -d option return EXIT_FAILURE when error occurs (most often
because key does not exist).

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Fri, 7 Aug 2020 16:50:29 +0000 (00:50 +0800)]
tools: use util_strtonum for options checking

atoi() is not safe because it performs no validation.
strtol() is better, but empty strings and overflows still need handling.
util_strtonum() is a safer wrapper around strtoll().

Use util_strtonum to check the nodeid option, with stricter checking.

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
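
The problems listed for atoi() and strtol() can be seen in a small strtonum-style wrapper. This is a hypothetical helper written for illustration, not corosync's util_strtonum, whose exact signature is not shown here.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse `str` as a base-10 integer in [minval, maxval]. Rejects empty
 * input, trailing garbage and overflow. Returns 0 and stores the value
 * in *result on success, -1 on any error. */
static int parse_ll_range(const char *str, long long minval, long long maxval,
			  long long *result)
{
	char *end;
	long long val;

	if (str == NULL || *str == '\0')
		return -1;
	errno = 0;
	val = strtoll(str, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -1;
	if (val < minval || val > maxval)
		return -1;
	*result = val;
	return 0;
}
```

By contrast, `atoi("12x")` silently returns 12 and `atoi("")` returns 0, so neither error is visible to the caller.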
liangxin1300 [Mon, 10 Aug 2020 02:08:36 +0000 (10:08 +0800)]
cfgtool: enhance -a option

  * Add return code
  * Give an error message when nodeid does not exist

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
liangxin1300 [Fri, 7 Aug 2020 04:04:56 +0000 (12:04 +0800)]
cfgtool: output error messages to stderr

... and standardize the return code

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Thu, 16 Jul 2020 14:07:31 +0000 (16:07 +0200)]
configure: Use default systemd path with prefix

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Jan Friesse [Thu, 16 Jul 2020 13:50:42 +0000 (15:50 +0200)]
build: Use git-version-gen during specfile build

Instead of copying parts of git-version-gen for the spec target, use
git-version-gen directly, parse the final version into components
(rpmver, alphatag, numcomm) and use them.

The main reason is to simplify the code a bit (the sed scripts are a bit
repetitive, though), reuse the code, and also allow building an RPM from
a dist tarball generated from a non-tagged commit or a dirty git tree
(not very useful).

The code relies on the fact that a hyphen is never used in a tagged
release name.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Jan Friesse [Tue, 14 Jul 2020 13:22:55 +0000 (15:22 +0200)]
build: Update git-version-gen

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Jan Friesse [Thu, 16 Jul 2020 13:40:27 +0000 (15:40 +0200)]
spec: Require at least knet 1.18 for crypto reload

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Christine Caulfield [Wed, 8 Jul 2020 09:31:20 +0000 (10:31 +0100)]
config: Allow reconfiguration of crypto options

Needs new knet crypto API.

If it's not available, then fall back to the old
API and forbid changing crypto while running.

To avoid us being dependent on the leader node, each
node sends its own crypto_reconfig_phase messages so
we can guarantee that the reconfiguration always completes
on each node.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Christine Caulfield [Mon, 18 May 2020 12:34:07 +0000 (13:34 +0100)]
test: Fix cpgtest

... to cope with the max number of group members.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Christine Caulfield [Mon, 6 Apr 2020 12:42:47 +0000 (13:42 +0100)]
config: Fix crash when a reload fails twice

Have string values stored in char arrays in totem_config
so we don't get into a mess with the pointers.

Also remove vsftype (which hasn't been used since corosync 1)

Use strncpy even though we know the string is fine, to keep covscan happy.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
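
Storing strings in fixed arrays inside the config structure means a failed reload cannot leave dangling pointers behind. A sketch with a hypothetical struct, field and size (not totem_config's real layout); since strncpy() does not NUL-terminate a truncated copy, the terminator is written explicitly:

```c
#include <string.h>

/* Hypothetical field size for the sketch. */
#define SKETCH_NAME_LEN 32

struct sketch_config {
	char link_mode[SKETCH_NAME_LEN]; /* owned copy, not a pointer */
};

/* Copy `value` into the fixed array, truncating if it is too long and
 * always leaving the result NUL-terminated. */
static void set_link_mode(struct sketch_config *cfg, const char *value)
{
	strncpy(cfg->link_mode, value, sizeof(cfg->link_mode) - 1);
	cfg->link_mode[sizeof(cfg->link_mode) - 1] = '\0';
}
```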
Christine Caulfield [Fri, 3 Apr 2020 14:48:26 +0000 (15:48 +0100)]
config: Don't free pointers used by transports

Reload failed for UDP[U] because those transports had saved pointers
to the interfaces[] array, so memcpy into that rather than
reallocate it.

Also, move the check for different IP address families so
it also gets run at reload time.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Christine Caulfield [Thu, 2 Apr 2020 07:43:32 +0000 (08:43 +0100)]
config: don't reload vquorum if reload fails

Fix an 'error: success' style message by propagating error_string
back down the stack.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Christine Caulfield [Mon, 30 Mar 2020 10:28:28 +0000 (11:28 +0100)]
cfg: Improve error return to cfgtool -R

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Christine Caulfield [Thu, 26 Mar 2020 08:28:18 +0000 (08:28 +0000)]
config: Reorganise the config system

To be more reliable & maintainable

The basic plan here is to fix reloads to be more stable
using read/parse/verify/build/commit stages, so that any errors
will not leave corosync in an unstable state. This should
also make the code more maintainable as currently the verify/commit
stages are horribly intertwined.

Also:
- Fix local_node_pos not being updated in the new map during validation
  (broke adding and removing new nodes in the middle of the list).
- Fix reconfiguration so that nodes are indexed by nodeid and not their
  position in the list. This is an old bug that's just been carried
  over.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
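
The read/parse/verify/build/commit idea boils down to building a candidate configuration and replacing the running one only when every stage succeeds. A heavily simplified sketch with hypothetical types and stage functions (corosync's real stages are not reproduced here):

```c
#include <stddef.h>

/* Hypothetical, heavily simplified config. */
struct sketch_cfg {
	int token_ms;
};

/* A stage works on the candidate config and may set *err on failure. */
typedef int (*stage_fn)(struct sketch_cfg *candidate, const char **err);

/* Example stages standing in for parse and verify. */
static int stage_parse(struct sketch_cfg *c, const char **err)
{
	(void)err;
	c->token_ms = 3000; /* pretend this value was read from the file */
	return 0;
}

static int stage_parse_bad(struct sketch_cfg *c, const char **err)
{
	(void)err;
	c->token_ms = -1; /* pretend the file contained a bogus value */
	return 0;
}

static int stage_verify(struct sketch_cfg *c, const char **err)
{
	if (c->token_ms <= 0) {
		*err = "token must be positive";
		return -1;
	}
	return 0;
}

/* Run all stages on a copy and commit only if every stage succeeded,
 * so a failed reload leaves the running config untouched. */
static int reload(struct sketch_cfg *running, const stage_fn *stages,
		  size_t n_stages, const char **err)
{
	struct sketch_cfg candidate = *running;

	for (size_t i = 0; i < n_stages; i++) {
		if (stages[i](&candidate, err) != 0)
			return -1;
	}
	*running = candidate; /* commit */
	return 0;
}
```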
(tag: v3.0.4)
Jan Friesse [Wed, 22 Apr 2020 11:30:36 +0000 (13:30 +0200)]
Revert "totemip: compare sin6_scope_id and interface_num"

This reverts commit efd34df531d1b23d6458dca863a7517b7ac0099d to make
master compile after revert of 934c47ed4384daf2819c26306bebba3225807499.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Wed, 22 Apr 2020 11:28:57 +0000 (13:28 +0200)]
Revert "totemip: Add support for sin6_scope_id"

This reverts commit 934c47ed4384daf2819c26306bebba3225807499 which is
causing protocol incompatibility in needle. Master does not seem to be
affected, but it needs more checking.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Hideo Yamauchi [Thu, 26 Mar 2020 01:38:54 +0000 (10:38 +0900)]
cfgtool: Fix error code as described in MP

If all links are connected, 0 is returned to the shell; otherwise the
error code is 1.

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
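
The documented exit-code rule is simple enough to sketch directly; the helper and its input (one connected flag per link) are hypothetical, not corosync-cfgtool's code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Return EXIT_SUCCESS (0) when every link is connected, EXIT_FAILURE
 * (1 on POSIX systems) as soon as one link is not. */
static int links_exit_code(const int *connected, size_t n_links)
{
	for (size_t i = 0; i < n_links; i++) {
		if (!connected[i])
			return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}
```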
Christine Caulfield [Thu, 26 Mar 2020 10:26:16 +0000 (10:26 +0000)]
icmap: icmap_init_r() leaks if trie_create() fails

Thanks to Coverity for finding this

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Tue, 10 Mar 2020 16:49:27 +0000 (17:49 +0100)]
votequorum: set wfa status only on startup

Previously, reloading a configuration with wait_for_all enabled resulted
in wait_for_all_status being set, which set cluster_is_quorate to 0 but
didn't inform the quorum service, so votequorum and quorum information
could get out of sync.

An example is a 1-node cluster which is extended to 3 nodes. The quorum
service reports the cluster as quorate (incorrect) and votequorum as not
quorate (correct). Similar behavior happens when extending a cluster in
general, but some configurations are less incorrect (3->4).

The discussed solution was to inform the quorum service, but that would
mean every reload would cause a loss of quorum until all nodes were seen
again.

Such behaviour is consistent but seems to be a bit too strict.

The proposed solution sets wait_for_all_status only on startup and
doesn't touch it during reload.

This solution fulfills the requirement of "cluster will be quorate for
the first time only after all nodes have been visible at least
once at the same time." because a node clears wait_for_all_status only
after it sees all other nodes or joins a cluster which is quorate. It
also solves the problem with extending a cluster, because when the
cluster becomes unquorate (1->3) wait_for_all_status is set.

The added assert is only there to ensure that I haven't missed any case
where a quorate cluster may become unquorate.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Wed, 4 Mar 2020 07:53:41 +0000 (08:53 +0100)]
quorumtool: exit on invalid expected votes

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Wed, 4 Mar 2020 10:42:15 +0000 (11:42 +0100)]
votequorum: Change check of expected_votes

Previously the new expected_votes value was checked so that the newly
computed quorum value was in the interval <total_votes / 2, total_votes>.
The upper bound prevented the cluster from becoming unquorate, but the
lower bound was almost useless because it allowed expected_votes to be
changed to a value smaller than total_votes.

The solution is to check that expected_votes is greater than or equal to
total_votes and, for a quorate cluster, only check that the cluster
doesn't become unquorate (for an unquorate cluster the upper range can
be set freely, as is perfectly possible when using the config file).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
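
A simplified sketch of the new check. It assumes the basic votequorum formula quorum = expected_votes / 2 + 1 and ignores options such as two_node or last_man_standing; the function names are hypothetical.

```c
#include <stdbool.h>

/* Simple majority; votequorum has further special cases not modelled. */
static unsigned int quorum_for(unsigned int expected_votes)
{
	return expected_votes / 2 + 1;
}

/* Accept a new expected_votes value only if it is not below the votes
 * currently present and, for a quorate cluster, the resulting quorum
 * can still be met by those votes. */
static bool expected_votes_ok(unsigned int new_expected,
			      unsigned int total_votes, bool quorate)
{
	if (new_expected < total_votes)
		return false;
	if (quorate && quorum_for(new_expected) > total_votes)
		return false;
	return true;
}
```

For example, with total_votes = 3 on a quorate cluster, expected_votes 5 is accepted (quorum 3), 6 is rejected (quorum 4 exceeds the 3 available votes), and 2 is rejected because it is below total_votes.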
Jan Friesse [Tue, 3 Mar 2020 14:07:55 +0000 (15:07 +0100)]
cfgtool: Simplify output a bit for link status

Display the words connected/disconnected instead of 1/0, and show the
enabled status only when a link is not enabled (which shouldn't happen).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Tue, 25 Feb 2020 14:17:05 +0000 (15:17 +0100)]
man: Enhance link_mode priority description

Some users found the description of priority for passive link_mode
confusing (probably because the word "priority" is too
overloaded), so add some redundancy to make the description
unambiguous.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Mon, 24 Feb 2020 13:58:45 +0000 (14:58 +0100)]
main: Add schedmiss timestamp into message

This is useful for matching a schedmiss event in the stats map with the
logged event.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
liangxin1300 [Thu, 20 Feb 2020 05:38:49 +0000 (13:38 +0800)]
totemip: compare sin6_scope_id and interface_num

When the user configures a specific interface, such as a VLAN, with the
same IPv6 link-local address, corosync should compare sin6_scope_id
with interface_num to make sure it binds to the right interface.

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Mon, 17 Feb 2020 16:54:09 +0000 (17:54 +0100)]
totemip: Really remove totemip_copy_endian_convert

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Fri, 14 Feb 2020 08:52:40 +0000 (09:52 +0100)]
totemip: Remove unused totemip_copy_endian_convert

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Fri, 14 Feb 2020 08:42:08 +0000 (09:42 +0100)]
totemip: Add support for sin6_scope_id

sin6_scope_id was not present in the totemip structure, making it
impossible to use link-local IPv6 addresses.

The patch adds sin6_scope_id and changes the convert/copy functions to
use it (formally the comparator functions should also be changed, but
that seems to cause more harm and is not really needed).

This makes corosync work fine with link-local addresses for both the
UDPU and Knet transports, as long as an interface is specified (e.g.
fe80::xxxx:xxxx:xxxx:xxxx%eth0).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Mon, 10 Feb 2020 15:13:42 +0000 (16:13 +0100)]
cfgtool: Improve link status display

Totemknet is enhanced to use the 'n' character for localhost instead of
adding a status, because it is safe to expect that the localhost link is
always connected. corosync-cfgtool is enhanced to properly decode the
'n', '?' and 'd' characters and display their meaning in the extended
status. Special characters are also documented in the man page.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Hideo Yamauchi [Fri, 7 Feb 2020 04:02:47 +0000 (13:02 +0900)]
totemknet: Change the initial value of the status

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Thu, 23 Jan 2020 16:11:54 +0000 (17:11 +0100)]
stats: Use nanoseconds from epoch for schedmiss

Using monotonic time doesn't work because it doesn't necessarily match
time since the epoch.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
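
The difference between the two clocks can be shown with clock_gettime(). CLOCK_MONOTONIC starts from an unspecified point, so only CLOCK_REALTIME yields nanoseconds since the epoch that can be matched against logged wall-clock times. The helper name is hypothetical and this is not corosync's stats code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds since the Unix epoch from the wall clock; returns 0 if
 * the clock cannot be read. */
static uint64_t ns_since_epoch(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```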
Christine Caulfield [Fri, 17 Jan 2020 14:22:16 +0000 (14:22 +0000)]
stats: Add stats for scheduler misses

This patch adds a stats.schedmiss.* set of entries that
record the last 10 times corosync was not scheduled
in time.

These entries are kept in reverse order (so stats.schedmiss.0.* is
always the latest one kept) and the values, including the timestamp,
are in milliseconds.

It's also possible to use a cmap tracker to follow these events, which
might be useful.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
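
The "last 10 events, newest at index 0" layout can be sketched as a small shifting array. The struct, field and function names are hypothetical; only the size of 10 and the ordering come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SCHEDMISS_MAX 10 /* "last 10 times", per the commit */

struct schedmiss_entry {
	uint64_t timestamp;
	float delay;
};

struct schedmiss_log {
	struct schedmiss_entry e[SKETCH_SCHEDMISS_MAX];
	size_t count;
};

/* Insert a new event at index 0, shifting older ones down and dropping
 * the oldest once the log is full, so e[0] is always the latest. */
static void schedmiss_add(struct schedmiss_log *ml, uint64_t ts, float delay)
{
	size_t keep = ml->count < SKETCH_SCHEDMISS_MAX ?
			      ml->count : SKETCH_SCHEDMISS_MAX - 1;

	memmove(&ml->e[1], &ml->e[0], keep * sizeof(ml->e[0]));
	ml->e[0].timestamp = ts;
	ml->e[0].delay = delay;
	if (ml->count < SKETCH_SCHEDMISS_MAX)
		ml->count++;
}
```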
Jan Friesse [Thu, 16 Jan 2020 14:43:59 +0000 (15:43 +0100)]
votequorum: Reflect runtime change of 2Node to WFA

When 2Node mode is set, WFA is also set unless WFA is configured
explicitly. This behavior was not reflected on a runtime change, so a
running corosync (WFA not set) behaved differently from a restarted one.
Also, when a cluster was reduced from 3 nodes to 2 nodes at runtime, WFA
was not set, which may result in two quorate partitions.

The solution is to set WFA depending on 2Node when WFA
is not explicitly configured.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
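
The rule reduces to a small decision that is re-evaluated on every reload; a sketch with hypothetical names, not votequorum's code:

```c
#include <stdbool.h>

/* An explicit wait_for_all setting wins; otherwise wait_for_all
 * follows two_node, so a reload gives the same result as a restart. */
static bool effective_wait_for_all(bool wfa_configured, bool wfa_value,
				   bool two_node)
{
	return wfa_configured ? wfa_value : two_node;
}
```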
Hideo Yamauchi [Wed, 8 Jan 2020 23:39:55 +0000 (08:39 +0900)]
cpg: Change downlist log level

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Ferenc Wágner [Sat, 4 Jan 2020 12:38:08 +0000 (13:38 +0100)]
man: move cmap_keys man page from section 8 to 7

Section 8 is for "System administration commands", 7 is "Miscellaneous".

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Tue, 26 Nov 2019 15:47:31 +0000 (16:47 +0100)]
stats: Check return code of stats_map_get

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Tue, 26 Nov 2019 14:05:46 +0000 (15:05 +0100)]
quorumtool: Assert copied string length

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Tue, 26 Nov 2019 13:17:53 +0000 (14:17 +0100)]
notifyd: Check cmap_track_add result

And assert the length of key_name before strcpy.
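
The "assert length before strcpy" pattern used in this and several
following commits can be sketched like this. copy_key_name() and
KEY_NAME_MAX are hypothetical; the real buffers and limits differ per
call site. Note that assert() is compiled out when NDEBUG is defined, so
the check mainly documents the invariant for readers and static
analyzers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size destination, standing in for e.g. key_name. */
#define KEY_NAME_MAX 256

/*
 * Copy src into a buffer of KEY_NAME_MAX bytes. The assert states that
 * the string plus its terminating NUL fits, so strcpy cannot overflow.
 */
static void copy_key_name(char dst[KEY_NAME_MAX], const char *src)
{
	assert(strlen(src) < KEY_NAME_MAX);
	strcpy(dst, src);
}
```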

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agocmapctl: Free bin_value on error
Jan Friesse [Tue, 26 Nov 2019 13:09:14 +0000 (14:09 +0100)]
cmapctl: Free bin_value on error

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agocfgtool: Remove unused callbacks
Jan Friesse [Tue, 26 Nov 2019 12:59:23 +0000 (13:59 +0100)]
cfgtool: Remove unused callbacks

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agocpghum: Remove unused time variables and functions
Jan Friesse [Tue, 26 Nov 2019 12:56:07 +0000 (13:56 +0100)]
cpghum: Remove unused time variables and functions

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agovotequorum: Assert copied strings length
Jan Friesse [Tue, 26 Nov 2019 12:52:27 +0000 (13:52 +0100)]
votequorum: Assert copied strings length

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agototemknet: Assert strcpy length
Jan Friesse [Tue, 26 Nov 2019 12:13:53 +0000 (13:13 +0100)]
totemknet: Assert strcpy length

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agototemknet: Check result of fcntl O_NONBLOCK call
Jan Friesse [Tue, 26 Nov 2019 12:05:42 +0000 (13:05 +0100)]
totemknet: Check result of fcntl O_NONBLOCK call
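
A minimal sketch of the kind of check the subject describes, not the
totemknet code itself: set_nonblock() is a hypothetical helper that
reads the current flags with F_GETFL, adds O_NONBLOCK with F_SETFL, and
propagates a failure from either fcntl() call instead of ignoring it.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Put fd into non-blocking mode. Returns 0 on success and -1 (with
 * errno set by fcntl) on failure, so the caller can react to it.
 */
static int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1) {
		return -1;
	}
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
		return -1;
	}
	return 0;
}
```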

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agototemconfig: Initialize warnings variable
Jan Friesse [Tue, 26 Nov 2019 12:02:04 +0000 (13:02 +0100)]
totemconfig: Initialize warnings variable

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agosync: Assert sync_callbacks.name length
Jan Friesse [Tue, 26 Nov 2019 12:01:16 +0000 (13:01 +0100)]
sync: Assert sync_callbacks.name length

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agototemknet: Don't mix corosync and knet error codes
Jan Friesse [Tue, 26 Nov 2019 09:26:36 +0000 (10:26 +0100)]
totemknet: Don't mix corosync and knet error codes

And use correct return code in stats.c.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agostats: Assert value_len when value is needed
Jan Friesse [Tue, 26 Nov 2019 09:10:42 +0000 (10:10 +0100)]
stats: Assert value_len when value is needed

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agocmap: Assert copied string length
Jan Friesse [Tue, 26 Nov 2019 08:58:52 +0000 (09:58 +0100)]
cmap: Assert copied string length

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agototemconfig: Reuse already fetched pointer
Jan Friesse [Tue, 26 Nov 2019 07:27:12 +0000 (08:27 +0100)]
totemconfig: Reuse already fetched pointer

Make the code a bit more readable and easier for Coverity to process.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agologconfig: Remove double free of value
Jan Friesse [Mon, 25 Nov 2019 17:26:35 +0000 (18:26 +0100)]
logconfig: Remove double free of value

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agovotequorum: Ignore the icmap_get_* return value
Jan Friesse [Mon, 25 Nov 2019 17:21:52 +0000 (18:21 +0100)]
votequorum: Ignore the icmap_get_* return value

Express the intention to ignore the icmap_get_* return
value and rely on the default behavior of leaving the output
parameter unchanged on error.
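
The idiom can be sketched with a stand-in getter. get_uint32(),
get_uint32_or_default(), get_err_t and the example keys are all
hypothetical; only the contract (output untouched on error, return value
deliberately discarded with a (void) cast) is taken from this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes, standing in for cs_error_t values. */
typedef enum {
	GET_OK,
	GET_ERR_NOT_EXIST
} get_err_t;

/*
 * Hypothetical getter following the icmap_get_* contract described
 * above: *out is written only on success and left untouched on error.
 */
static get_err_t get_uint32(const char *key, uint32_t *out)
{
	assert(key != NULL && out != NULL);
	if (strcmp(key, "example.present") == 0) {
		*out = 3;
		return GET_OK;
	}
	return GET_ERR_NOT_EXIST;
}

static uint32_t get_uint32_or_default(const char *key, uint32_t def)
{
	uint32_t value = def;

	/* Result deliberately ignored: on error value keeps the default. */
	(void)get_uint32(key, &value);
	return value;
}
```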

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agototemconfig: Free leaks found by coverity
Jan Friesse [Mon, 25 Nov 2019 17:16:36 +0000 (18:16 +0100)]
totemconfig: Free leaks found by coverity

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agoicmap: fix the icmap_get_*_r functions v3.0.3
Christine Caulfield [Mon, 18 Nov 2019 15:19:45 +0000 (15:19 +0000)]
icmap: fix the icmap_get_*_r functions

Make the icmap*_r functions read from the specified map rather
than the global map.

Also add icmap_get_string_r(), which seems to have been missed.
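
The intended relationship between the two variants can be sketched with
a toy map: the _r function must look in the map it is given, and the
plain function is a thin wrapper passing the global map. struct map,
map_get_string_r(), map_get_string() and global_map are hypothetical
stand-ins, not the icmap API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal map: parallel arrays of key/value strings. */
struct map {
	const char *const *keys;
	const char *const *values;
	size_t n;
};

static const char *const g_keys[] = { "who" };
static const char *const g_vals[] = { "global" };
static const struct map global_map = { g_keys, g_vals, 1 };

/* Reentrant (_r) variant: searches only the map passed in. */
static const char *map_get_string_r(const struct map *m, const char *key)
{
	assert(m != NULL && key != NULL);
	for (size_t i = 0; i < m->n; i++) {
		if (strcmp(m->keys[i], key) == 0) {
			return m->values[i];
		}
	}
	return NULL;
}

/* Non-_r variant: delegates to the _r one with the global map. */
static const char *map_get_string(const char *key)
{
	return map_get_string_r(&global_map, key);
}
```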

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
4 years agopkgconfig: Add libqb dependency
Fabio M. Di Nitto [Mon, 18 Nov 2019 08:54:32 +0000 (09:54 +0100)]
pkgconfig: Add libqb dependency

This makes sure the libqb dependency is visible across all libraries.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
4 years agoInitialize stack allocated memory
Jan Friesse [Thu, 10 Oct 2019 10:33:58 +0000 (12:33 +0200)]
Initialize stack allocated memory

Some functions allocated memory on the stack without clearing it and
then sent it on the wire. This is not a real issue, but valgrind
reports it as a problem, which makes it easy to miss real problems.

The solution is to clear the stack memory.
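
A minimal sketch of the fix, using a hypothetical struct wire_msg and
build_msg() helper: clearing the whole structure first also zeroes
compiler-inserted padding and the unused tail of name[], so no
uninitialized bytes reach the wire.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire message; padding may follow 'type'. */
struct wire_msg {
	uint8_t type;
	uint32_t nodeid;
	char name[16];
};

/*
 * Fill a message that typically lives on the caller's stack.
 * memset() clears every byte, including padding, before the
 * individual fields are set.
 */
static void build_msg(struct wire_msg *msg, uint32_t nodeid,
		      const char *name)
{
	memset(msg, 0, sizeof(*msg));
	msg->type = 1;
	msg->nodeid = nodeid;
	assert(strlen(name) < sizeof(msg->name));
	strcpy(msg->name, name);
}
```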

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
4 years agoman: Fix corosync.conf knet pong count default
Thomas Lamprecht [Wed, 16 Oct 2019 16:56:15 +0000 (18:56 +0200)]
man: Fix corosync.conf knet pong count default

commit 029b8ebad60314d3daa285eb945c55355fade389 changed the default
of KNET_PONG_COUNT from the kronosnet default of 5 to 2, as
corosync bring-up was deemed too slow.

The documentation, and the comment stating that the totem config
default values match the knet ones, were not updated and are thus now
out of date.

Fix this by documenting the correct default of 2 for KNET_PONG_COUNT
and noting in the comment that all defaults except that one are in
sync with the kronosnet defaults.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
4 years agototemsrp: Reduce MTU to left room second mcast
Jan Friesse [Mon, 7 Oct 2019 13:26:22 +0000 (15:26 +0200)]
totemsrp: Reduce MTU to left room second mcast

Messages sent during the recovery phase are encapsulated, so such a
message carries the extra size of an mcast structure. This is not a
big problem for UDPU, because most switches are able to fragment and
defragment packets, but it is a problem for knet, because totempg uses
the maximum packet size (65536 bytes), and when another header is
added during retransmission, the packet becomes too large.

The solution is to reduce the MTU by 2 * sizeof (struct mcast).
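
The arithmetic can be sketched as below. Only the subtraction of
2 * sizeof (struct mcast) comes from this message; the struct layout and
usable_mtu() are hypothetical placeholders, so the concrete numbers do
not match totemsrp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header layout; the real struct mcast differs. */
struct mcast {
	unsigned int header;
	unsigned int seq;
	unsigned int ring_id[2];
	unsigned int node_id;
	unsigned int guarantee;
};

/*
 * Payload size left after reserving room for two mcast headers, so a
 * message that gets encapsulated again during recovery still fits
 * within net_mtu.
 */
static size_t usable_mtu(size_t net_mtu)
{
	assert(net_mtu > 2 * sizeof(struct mcast));
	return net_mtu - 2 * sizeof(struct mcast);
}
```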

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agototempg: Check sanity (length) of received message
Jan Friesse [Thu, 3 Oct 2019 09:35:37 +0000 (11:35 +0200)]
totempg: Check sanity (length) of received message

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agobuild: add option for enabling sanitizer builds
Fabio M. Di Nitto [Wed, 9 Oct 2019 08:46:19 +0000 (10:46 +0200)]
build: add option for enabling sanitizer builds

The --with-sanitizers= option is strictly meant for runtime debugging
purposes. Do NOT use it in production.

Please check the gcc/clang man pages for how to use ASAN/UBSAN/TSAN.

Also allow users to specify SANITIZERS_CFLAGS and SANITIZERS_LDFLAGS
for advanced use.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
5 years agototemknet: Add locking for log call
Jan Friesse [Mon, 9 Sep 2019 15:47:24 +0000 (17:47 +0200)]
totemknet: Add locking for log call

Knet callbacks may be called from a thread other than the main thread.
If this happens, log messages may be lost. The most prominent example
is when a link goes up (logged by the main thread) and
host_change_callback_fn is called.

The implemented solution adds a mutex around every log call in
totemknet.
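
A minimal sketch of the approach with POSIX threads. locked_log() and
log_mutex are hypothetical names, and writing to a FILE * is a
simplification; totemknet routes messages through corosync's own
logging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Every log call takes the same mutex, so messages emitted by the main
 * thread and by a library callback thread are serialized rather than
 * racing with each other.
 */
static void locked_log(FILE *out, const char *fmt, ...)
{
	va_list ap;

	pthread_mutex_lock(&log_mutex);
	va_start(ap, fmt);
	vfprintf(out, fmt, ap);
	va_end(ap);
	pthread_mutex_unlock(&log_mutex);
}
```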

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
5 years agoman: Fix link_mode priority description
Jan Friesse [Mon, 26 Aug 2019 13:44:18 +0000 (15:44 +0200)]
man: Fix link_mode priority description

... to match knet source code.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
5 years agonotifyd: Don't dereference NULL key_name
Jan Friesse [Tue, 30 Jul 2019 12:24:32 +0000 (14:24 +0200)]
notifyd: Don't dereference NULL key_name

This problem shouldn't really happen, but better safe than sorry.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
5 years agototem: Increase ring_id seq after load
Jan Friesse [Mon, 15 Jul 2019 12:08:39 +0000 (14:08 +0200)]
totem: Increase ring_id seq after load

This patch handles the situation where the leader
node (the node with lowest node_id) crashes and is started again
before token timeout of the rest of the cluster.
The newly restarted node restores the ringid of the old ring from
stable storage, so it has the same ringid as rest of the nodes,
but ARU is zero. If the node is able to create a singleton membership
before receiving the joinlist from rest of the cluster,
everything works as expected, because the ring id gets increased
correctly.

But if the node receives a joinlist from another cluster node before
its own joinlist, then it continues as if it had never left the
cluster. This is not correct, because the new node should always
create a singleton configuration first.

During the recovery phase, ARUs are compared and, because they differ
(the ARU of the old leader node is 0), the other nodes
try to send all of their previous messages. This is impossible
(even if it were correct), because other nodes have already freed most
of those messages. The implementation uses an assert to limit the maximum
number of messages sent during recovery (we could fix this,
but it's not really the point).

The solution here is to increase the ring_id sequence number by 1 after
loading it from storage. During creation of the commit token it is
always increased by 4, so it will not collide with an existing
sequence.
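
The sequence arithmetic can be illustrated with the sketch below. Only
the "+1 after load" and "+4 per commit token" steps come from this
message; taking the maximum over the members' sequence numbers is an
assumption made to keep the example concrete, and both function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bump the sequence number restored from stable storage by 1. */
static uint64_t ring_seq_after_load(uint64_t stored_seq)
{
	return stored_seq + 1;
}

/*
 * Hypothetical model of commit token creation: take the highest
 * sequence number among the members and add 4.
 */
static uint64_t ring_seq_for_commit_token(const uint64_t *seqs, size_t n)
{
	uint64_t max = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		if (seqs[i] > max) {
			max = seqs[i];
		}
	}
	return max + 4;
}
```

With a stored sequence of 100, the restarted node uses 101, which no
longer matches the old ring (100), and a newly formed ring would use 105.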

Thanks to Christine Caulfield <ccaulfie@redhat.com> for clarifying the
commit message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agoinit: Use cpgtool instead of cfgtool
Jan Friesse [Thu, 4 Jul 2019 13:07:44 +0000 (15:07 +0200)]
init: Use cpgtool instead of cfgtool

The init script used corosync-cfgtool -s to wait until corosync
accepts IPC connections. The problem with this approach is that an
error code is returned not only when IPC cannot be initialized, but
also when one of the rings is marked as failed, which prevents the
corosync service from starting. Corosync with one failed ring can
work just fine, so there is no need to fail startup.

This patch changes the call of corosync-cfgtool to corosync-cpgtool.
Also, to make a broken ring easier to spot, corosync-cfgtool -s is
called after cpgtool returns successfully, and a warning is issued if
cfgtool fails.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agonotifyd: Fix warning produced by 32-bit compiler
Jan Friesse [Thu, 4 Jul 2019 12:36:54 +0000 (14:36 +0200)]
notifyd: Fix warning produced by 32-bit compiler

time_t is a platform-dependent real type, which is usually long int on
64-bit platforms but only int on 32-bit platforms, so printing it with
%ld generated a warning.

The solution seems to be either to cast time_t to long int or to use
functions which work with time_t. The latter option is used in this
patch, which uses localtime and strftime to print the time_t value.

The code is also refactored to remove duplicate calls and to add the
_cs_snmp prefix, preventing a collision with the snmp_ prefix.
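
A minimal sketch of printing a time_t with localtime() and strftime(),
so no printf length modifier has to match the width of time_t.
format_time() and the "%Y-%m-%d %H:%M:%S" format are hypothetical
choices, not necessarily what notifyd uses. localtime() returns a
pointer to static storage and is not thread-safe; localtime_r() is the
POSIX alternative.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Format t as local time into buf. Returns the number of characters
 * written, or 0 if conversion fails or buf is too small.
 */
static size_t format_time(time_t t, char *buf, size_t len)
{
	const struct tm *tm = localtime(&t);

	assert(buf != NULL);
	if (tm == NULL) {
		return 0;
	}
	return strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm);
}
```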

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>