git.proxmox.com Git - mirror_corosync.git/log
6 years ago cpg: Inform clients about left nodes during pause (tag: v2.99.2)
Jan Friesse [Tue, 24 Apr 2018 15:44:48 +0000 (17:44 +0200)]
cpg: Inform clients about left nodes during pause

Patch tries to fix incorrect behaviour in the following test case:
- 3 nodes
- Node 1 is paused
- Nodes 2 and 3 detect Node 1 as failed and inform CPG clients
- Node 1 is unpaused
- Node 1 clients are informed about the new membership, but not about
  Node 1 having been paused, that is, from Node 1's point of view,
  about the failure of Nodes 2 and 3

Solution is to:
- Remove the downlist master selection and always use the local node's
  downlist. For Node 1 in the example above, the downlist contains
  Nodes 2 and 3.
- Keep the code which informs clients about left nodes
- Use the joinlist as the authoritative source of nodes/clients which
  exist in the membership
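
The "always use the local downlist" idea can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not the corosync
implementation: build_local_downlist and its array-based signature are
hypothetical. It derives the list of left nodes purely from the local node's
own old and new membership, without electing a downlist master.

```c
#include <stddef.h>

/* Hypothetical helper, not the corosync API: collect every nodeid that was
 * in the local node's old membership but is missing from the new one.
 * The caller provides a downlist array with room for old_count entries. */
static size_t build_local_downlist(const unsigned int *old_members, size_t old_count,
                                   const unsigned int *new_members, size_t new_count,
                                   unsigned int *downlist)
{
    size_t down_count = 0;

    for (size_t i = 0; i < old_count; i++) {
        int still_present = 0;

        for (size_t j = 0; j < new_count; j++) {
            if (old_members[i] == new_members[j]) {
                still_present = 1;
                break;
            }
        }
        if (!still_present) {
            downlist[down_count++] = old_members[i];
        }
    }
    return down_count;
}
```

Assuming the view of the paused Node 1 shrinks from {1, 2, 3} to {1}, this
yields a downlist of {2, 3} on Node 1, while Nodes 2 and 3 (old {1, 2, 3},
new {2, 3}) each compute {1}.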

This patch doesn't break backwards compatibility.

I've walked through all the patches which changed the behavior of cpg to
ensure this patch does not break CPG behavior. The most important were:
058f50314cd20abe67f5e8fb3c029a63b0e10cdc - Base. Code was significantly
  changed to handle double free by splitting group_info into two structures
  cpg_pd (local node clients) and process_info (all clients). Joinlist
  was
97c28ea756cdf59316b2f609103122cc678329bd - This patch removed
  confchg_fn and made CPG sync correct
feff0e8542463773207a3b2c1f6004afba1f58d5 - I've tested described
  behavior without any issues
6bbbfcb6b4af72cf35ab9fdb4412fa6c6bdacc12 - Added the idea of using
  heuristics to choose the same downlist on all nodes. Sadly, this idea
  was the beginning of the problems described in
  040fda8872a4a20340d73fa1c240b86afb2489f8,
  ac1d79ea7c14997353427e962865781d0836d9fa,
  559d4083ed8355fe83f275e53b9c8f52a91694b2,
  02c5dffa5bb8579c223006fa1587de9ba7409a3d,
  64d0e5ace025cc929e42896c5d6beb3ef75b8244 and
  b55f32fe2e1538db33a1ec584b67744c724328c6
02c5dffa5bb8579c223006fa1587de9ba7409a3d - Made joinlist the
  authoritative source of nodes/clients but left downlist_master_choose
  as the source of information about left nodes

Long story short: this patch basically reverts the
idea of using heuristics to choose the same downlist on all nodes.

(ported from needle 9c2a97f4f96b9639d07e2a9fe378c28ab1963191)

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago man: Make the manpages reproducible
Chris Lamb [Sat, 21 Apr 2018 06:13:33 +0000 (08:13 +0200)]
man: Make the manpages reproducible

Whilst working on the Reproducible Builds effort [0], we noticed
that corosync could not be built reproducibly.

This is because, whilst it uses SOURCE_DATE_EPOCH[1], the output
varies depending on the current timezone.

(The LC_ALL is not needed as we only use %Y-%m-%d)
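
The timezone dependence can be illustrated with a small C sketch. This is only
an illustration of the principle; the actual patch presumably changes how the
build generates the date, and format_build_date is a hypothetical name. The
point is that converting the epoch value with gmtime() instead of localtime()
gives the same %Y-%m-%d string regardless of the builder's timezone.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: format a SOURCE_DATE_EPOCH value (seconds since the
 * epoch, given as a decimal string) as YYYY-MM-DD in UTC.
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
static int format_build_date(const char *epoch_str, char *buf, size_t buf_len)
{
    char *end = NULL;
    long long secs = strtoll(epoch_str, &end, 10);

    if (end == epoch_str || *end != '\0') {
        return -1;
    }

    time_t t = (time_t)secs;
    struct tm *utc = gmtime(&t);    /* UTC, so the local timezone is irrelevant */

    if (utc == NULL) {
        return -1;
    }
    return strftime(buf, buf_len, "%Y-%m-%d", utc) > 0 ? 0 : -1;
}
```

With localtime() in place of gmtime(), an epoch value close to midnight UTC
could format as two different dates on two build machines.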

This was originally filed in Debian as #896441 [2].

 [0] https://reproducible-builds.org/
 [1] https://reproducible-builds.org/specs/source-date-epoch/
 [2] https://bugs.debian.org/896441

Signed-off-by: Chris Lamb <lamby@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totemsrp: Fix leave message regression
Jan Friesse [Thu, 19 Apr 2018 10:52:39 +0000 (12:52 +0200)]
totemsrp: Fix leave message regression

A leave message in totem is just a join message where the leaving member
is excluded from the member list and included in the fail list. It also
contains a special nodeid in the header.nodeid and system_from.nodeid
fields.

Before the "totem: Use nodeid ONLY in srp_addr" fix, most of the functions
were using system_from addresses and not the nodeid, which was used only
in one specific case for the memb_consensus_set function.

After the patch, addresses are gone and only the nodeid is used. As a
result, the leaving node's nodeid is not added to the local fail list
(my_faillist), so the node is unable to reach consensus until the token
timeout, which starts a new gather process.

Solution is to send the valid leaving node nodeid in system_from.nodeid
and handle the specific case for memb_consensus_set in memb_join_process.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago totemsrp: Log proc/fail lists in memb_join_process
Jan Friesse [Thu, 19 Apr 2018 10:49:12 +0000 (12:49 +0200)]
totemsrp: Log proc/fail lists in memb_join_process

This information is useful and, at trace log level, it should not be
too irritating.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago totemsrp: Fix srp_addr_compare
Jan Friesse [Wed, 18 Apr 2018 13:34:04 +0000 (15:34 +0200)]
totemsrp: Fix srp_addr_compare

There is a regression caused by the "totem: Use nodeid ONLY in srp_addr"
patch in the srp_addr_compare function. This function should be usable
with qsort, so it should return a value less than, equal to or greater
than zero. It was, however, returning only zero or the negation of zero.
As a result, nodes were unable to reach consensus in the following test
case:
- 3 node cluster
- start nodes 1, 2, 3
- shutdown node 3
- start node 3
- shutdown node 2
- start node 2
- shutdown node 1

After these steps, nodes 2 and 3 were unable to reach consensus.
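
A qsort-compatible comparator of the kind described above might look like the
following sketch. The struct and function names are simplified stand-ins
(assumptions for illustration), not the actual totemsrp code; the only point
is the three-way return value.

```c
#include <stdlib.h>

/* Simplified stand-in for srp_addr: only the nodeid is kept. */
struct srp_addr_sketch {
    unsigned int nodeid;
};

/* Three-way comparison usable with qsort(): negative, zero or positive. */
static int srp_addr_compare_sketch(const void *a, const void *b)
{
    const struct srp_addr_sketch *lhs = a;
    const struct srp_addr_sketch *rhs = b;

    /* Yields -1, 0 or 1 and avoids the wrap-around that subtracting two
     * unsigned nodeids could cause. */
    return (lhs->nodeid > rhs->nodeid) - (lhs->nodeid < rhs->nodeid);
}
```

A comparator that only ever returns 0 or 1 tells qsort() that no element sorts
before another, so two nodes can end up with differently ordered lists.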

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago tools: don't distribute what we can easily make
Ferenc Wágner [Fri, 20 Apr 2018 08:44:52 +0000 (10:44 +0200)]
tools: don't distribute what we can easily make

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago Drop all references to SECURITY file
Fabio M. Di Nitto [Mon, 23 Apr 2018 03:31:33 +0000 (05:31 +0200)]
Drop all references to SECURITY file

File was removed by 6bdf0962ad035ac659bcbf36a918fe39931ed75d.
Patch fixes master branch build again.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago SECURITY: Remove SECURITY file
Jan Friesse [Fri, 20 Apr 2018 13:28:13 +0000 (15:28 +0200)]
SECURITY: Remove SECURITY file

Basically no information from SECURITY file is valid.

Library interface and related uidgid are better described in manpages.

LibNSS is not directly used any longer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago NSS_NoDB_Init: the parameter is reserved, must be NULL
Ferenc Wágner [Wed, 18 Apr 2018 16:17:41 +0000 (18:17 +0200)]
NSS_NoDB_Init: the parameter is reserved, must be NULL

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago Fix typo: defualt -> default
Ferenc Wágner [Wed, 18 Apr 2018 17:06:50 +0000 (19:06 +0200)]
Fix typo: defualt -> default

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago Fix typo: sucesfully -> successfully
Ferenc Wágner [Tue, 10 Apr 2018 10:47:49 +0000 (12:47 +0200)]
Fix typo: sucesfully -> successfully

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totemsrp: Check join and leave msg length
Jan Friesse [Wed, 11 Apr 2018 14:15:01 +0000 (16:15 +0200)]
totemsrp: Check join and leave msg length

If the number of proc_list, failed_list or active members is too high, it
may be impossible to fit them into the message, which is allocated on the
stack, resulting in stack corruption.
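
A length check of this kind can be sketched as below. The header and entry
sizes and the function name join_msg_fits are hypothetical, chosen only for
illustration; the real message layout differs.

```c
#include <stddef.h>

/* Hypothetical sizes for illustration; the real header and entry sizes differ. */
#define SKETCH_HEADER_SIZE 32u
#define SKETCH_ENTRY_SIZE  4u   /* one nodeid per list entry */

/* Return 1 if a message with the given list lengths fits into a buffer of
 * buf_size bytes, 0 otherwise. The arithmetic is arranged so that very large
 * entry counts cannot overflow size_t. */
static int join_msg_fits(size_t proc_entries, size_t fail_entries, size_t buf_size)
{
    size_t room;

    if (buf_size < SKETCH_HEADER_SIZE) {
        return 0;
    }
    room = (buf_size - SKETCH_HEADER_SIZE) / SKETCH_ENTRY_SIZE;
    if (proc_entries > room) {
        return 0;
    }
    return fail_entries <= room - proc_entries;
}
```

The check would run before any entry is copied into the stack buffer, so an
oversized list is rejected instead of written past the end of the message.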

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago totemsrp: Implement sanity checks of received msgs
Jan Friesse [Wed, 11 Apr 2018 14:12:43 +0000 (16:12 +0200)]
totemsrp: Implement sanity checks of received msgs

Sanity checks are used to prevent crashes caused by
accessing unallocated memory.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago cpg: Handle fragmented message sending interrupt
Rytis Karpuška [Tue, 27 Mar 2018 12:01:36 +0000 (15:01 +0300)]
cpg: Handle fragmented message sending interrupt

It turns out that there are some legitimate cases where fragmented
messages might be interrupted during sending (e.g. CS_ERR_TRY_AGAIN or
as in my case: CS_ERR_INTERRUPT). This creates a situation where
LIBCPG_PARTIAL_FIRST is sent multiple times before receiving
LIBCPG_PARTIAL_LAST.

Solution is to drop the incomplete message and start assembly of a new
message, as libcpg should have reported an error during sending of that
incomplete message.
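
The receive-side behaviour can be sketched as a small state machine. All names
below (fragment_kind, assembly_sketch, assembly_feed) and the fixed 256-byte
buffer are assumptions made for illustration and do not mirror the actual cpg
code; the sketch only shows that a repeated "first" fragment discards the
incomplete message and restarts assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment markers, standing in for the partial first/last flags. */
enum fragment_kind { FRAG_FIRST, FRAG_CONTINUE, FRAG_LAST };

struct assembly_sketch {
    char data[256];
    size_t used;
    int in_progress;
};

/* Feed one fragment. Returns 1 when a complete message is in a->data,
 * 0 when more fragments are needed, -1 when the fragment is rejected. */
static int assembly_feed(struct assembly_sketch *a, enum fragment_kind kind,
                         const char *chunk, size_t len)
{
    if (kind == FRAG_FIRST) {
        /* A new first fragment drops any incomplete message: the sender was
         * interrupted and should already have seen an error for it. */
        a->used = 0;
        a->in_progress = 1;
    } else if (!a->in_progress) {
        return -1;              /* continuation without a start */
    }

    if (len > sizeof(a->data) - a->used) {
        a->used = 0;            /* would overflow: drop the whole message */
        a->in_progress = 0;
        return -1;
    }
    memcpy(a->data + a->used, chunk, len);
    a->used += len;

    if (kind == FRAG_LAST) {
        a->in_progress = 0;
        return 1;
    }
    return 0;
}
```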

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totem: Display IP of sender (tag: v2.99.1)
Jan Friesse [Wed, 14 Mar 2018 16:23:29 +0000 (17:23 +0100)]
totem: Display IP of sender

To make it easier to find the sender of incompatible messages, the IP of
the sender is logged. Propagating the IP through the layers makes the patch
slightly larger.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago totemsrp: Add magic and version into header
Jan Friesse [Wed, 14 Mar 2018 15:25:11 +0000 (16:25 +0100)]
totemsrp: Add magic and version into header

A magic number (0xC070) together with a version in every packet
is used to detect that the other node is really
Corosync 3.x.

The endian_detector field is removed and the magic number is now
used instead.

If the magic number of a received packet differs, guessing is used to
show more about the source (guesses for Corosync 2.3+ and 2.2 are quite
reliable, for Knet and unencrypted Corosync 2.1/2.0/1.x/OpenAIS
semi-reliable, and for encrypted Corosync 2.1/2.0/1.x/OpenAIS quite
unreliable).
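
How a magic number can double as an endianness detector is sketched below.
Only the value 0xC070 comes from the text above; the function and enum names
are hypothetical, and the interpretation of a byte-swapped magic as "peer with
opposite byte order" is an assumption about how such a check is typically done.

```c
#include <stdint.h>

#define SKETCH_MAGIC 0xC070u    /* value taken from the commit message */

enum magic_result {
    MAGIC_NATIVE,   /* same byte order as the local node */
    MAGIC_SWAPPED,  /* valid magic, opposite byte order */
    MAGIC_FOREIGN   /* not a packet of the expected protocol */
};

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Classify the 16-bit magic field exactly as it was read from the wire. */
static enum magic_result classify_magic(uint16_t wire_magic)
{
    if (wire_magic == SKETCH_MAGIC) {
        return MAGIC_NATIVE;
    }
    if (swap16(wire_magic) == SKETCH_MAGIC) {
        return MAGIC_SWAPPED;
    }
    return MAGIC_FOREIGN;
}
```

Because 0xC070 is not symmetric under byte swapping, a single field is enough
to answer both questions, which is why a separate detector field becomes
redundant.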

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago knet: Fix display of links with unconfigured link0
Christine Caulfield [Fri, 16 Mar 2018 10:09:39 +0000 (10:09 +0000)]
knet: Fix display of links with unconfigured link0

Because totemknet always configures link0 as loopback, even
if it's not known to corosync, we need to filter it
out when returning the link status, as things get misaligned
in cfg.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago main: Set errno before calling of strtol
Jan Friesse [Fri, 2 Mar 2018 14:50:02 +0000 (15:50 +0100)]
main: Set errno before calling of strtol
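
The general pattern behind this fix is shown in the sketch below; parse_long
is a hypothetical helper, not code from corosync. strtol() only sets errno on
failure and never clears it, so a stale ERANGE from an earlier call would
otherwise be mistaken for an overflow.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a complete decimal string into *out.
 * Returns 0 on success, -1 on overflow or malformed input. */
static int parse_long(const char *text, long *out)
{
    char *end = NULL;
    long value;

    errno = 0;                      /* clear stale errors before the call */
    value = strtol(text, &end, 10);

    if (errno == ERANGE) {
        return -1;                  /* out of range for long */
    }
    if (end == text || *end != '\0') {
        return -1;                  /* no digits, or trailing garbage */
    }
    *out = value;
    return 0;
}
```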

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago quorumtool: Don't set our_flags without v_handle
Jan Friesse [Wed, 3 Jan 2018 14:25:20 +0000 (15:25 +0100)]
quorumtool: Don't set our_flags without v_handle

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago sam_test_agent: Remove unused assignment
Jan Friesse [Tue, 2 Jan 2018 17:17:03 +0000 (18:17 +0100)]
sam_test_agent: Remove unused assignment

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago blackbox: Quote subshell result properly
Jan Friesse [Tue, 2 Jan 2018 17:02:00 +0000 (18:02 +0100)]
blackbox: Quote subshell result properly

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago init: Quote subshell result properly
Jan Friesse [Fri, 2 Mar 2018 14:59:58 +0000 (15:59 +0100)]
init: Quote subshell result properly

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago cfgtool: Don't assume link ID is a single char
Christine Caulfield [Thu, 1 Mar 2018 13:48:39 +0000 (13:48 +0000)]
cfgtool: Don't assume link ID is a single char

For the moment link-ids are a single digit, but that could change and
the tools shouldn't be quite so fragile. So parse the interface_name
properly by looking for the space between the linkID and the IP.
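
Parsing by separator rather than by fixed position can be sketched as follows.
The "<linkid> <ip>" layout is taken from the description above; the helper name
parse_interface_name and its signature are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "<linkid> <ip>" at the first space.
 * On success stores the numeric link ID and a pointer to the IP part
 * (inside name) and returns 0; returns -1 on malformed input. */
static int parse_interface_name(const char *name, long *link_id, const char **ip)
{
    const char *space = strchr(name, ' ');
    char *end = NULL;
    long id;

    if (space == NULL || space == name) {
        return -1;                  /* no separator, or empty link ID */
    }

    errno = 0;
    id = strtol(name, &end, 10);
    if (errno != 0 || end != space || id < 0) {
        return -1;                  /* link ID is not a plain number */
    }

    *link_id = id;
    *ip = space + 1;
    return 0;
}
```

This accepts multi-digit link IDs without any change, which a check of only
the first character would not.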

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago knet: Always use link0 for loopback
Christine Caulfield [Mon, 26 Feb 2018 16:00:24 +0000 (16:00 +0000)]
knet: Always use link0 for loopback

Even if it's not used for anything else.

Also, make cfgtool show the correct link ID when links are not
contiguous

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totem: Fix debug warnings printed by knet
Christine Caulfield [Mon, 26 Feb 2018 14:05:40 +0000 (14:05 +0000)]
totem: Fix debug warnings printed by knet

Fix crash introduced a couple of commits ago in iface_get

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago config: Allow use of ring0_addr
Christine Caulfield [Thu, 22 Feb 2018 13:57:50 +0000 (13:57 +0000)]
config: Allow use of ring0_addr

Allow ring0_addr to be used in place of 'name' for
backwards compatibility

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago config: Update message when local host isn't found
Christine Caulfield [Thu, 22 Feb 2018 09:56:48 +0000 (09:56 +0000)]
config: Update message when local host isn't found

Make the message more representative of what's going on.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago cfg: Fix cfg_get_node_addrs so that DLM works
Christine Caulfield [Tue, 20 Feb 2018 14:27:51 +0000 (14:27 +0000)]
cfg: Fix cfg_get_node_addrs so that DLM works

Also update copyright dates

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totem: Return interface count correctly
Christine Caulfield [Fri, 16 Feb 2018 09:43:31 +0000 (09:43 +0000)]
totem: Return interface count correctly

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totem: Use nodeid ONLY in srp_addr
Christine Caulfield [Thu, 15 Feb 2018 15:58:41 +0000 (15:58 +0000)]
totem: Use nodeid ONLY in srp_addr

This shrinks the srp_addr (and consequently every packet sent by
corosync) so that instead of containing loads of IP addresses to
identify a node, it just sends the nodeid.

This then allows us to make ring0 optional and replaceable when running
knet.

It also means that we need some other way of identifying the local
node in corosync.conf, so the nodelist.node.name entry is now mandatory
and is mapped to the local host using the same algorithm as used in
cman.

This code needs LOTS of testing as it touches a huge amount of totemsrp
and totemconfig.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago [rpm] use rpm macros to identify build distro
Fabio M. Di Nitto [Wed, 14 Feb 2018 08:39:23 +0000 (09:39 +0100)]
[rpm] use rpm macros to identify build distro

thanks Honza for spotting it

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago [rpm] fixup corosync.spec.in to build on opensuse
Fabio M. Di Nitto [Wed, 14 Feb 2018 06:05:26 +0000 (07:05 +0100)]
[rpm] fixup corosync.spec.in to build on opensuse

- move dbus-devel and nss-devel BuildRequires to file based dependency.
  Those 2 BR have different names in OpenSUSE vs Fedora/RHEL/Centos.
  This is kind of controversial as most distributions prefer a package
  based build dependency, but the rpm version that supports
  BuildRequires: foo || bar
  is only available in rawhide and tumbleweed (aka no stable releases
  are shipping it yet).
  In order to build rpms in CI and have some level of flexibility
  with upstream spec file, we need to compromise a bit.

- add explicit --docdir
  OpenSUSE does not ship docs in the normal dir and their rpm macro
  does not appear to set it for us.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totempg: Fix corrupted messages
Rytis Karpuška [Fri, 9 Feb 2018 14:00:19 +0000 (16:00 +0200)]
totempg: Fix corrupted messages

Commit 899cb299831fea479ca8bc64d99fb1fce215d795 changed copy_len
to iovec[i].iov_len, assuming that copy_len is always the same as
iovec[i].iov_len under those circumstances. It missed the possibility
of a small message being partly put at the end of a packet, which cuts
the message into two parts and therefore makes copy_len not equal to
iovec[i].iov_len.

This is a revert of 899cb299831fea479ca8bc64d99fb1fce215d795.

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totempg: use iovec[i].iov_len instead of copy_len
Rytis Karpuška [Wed, 7 Feb 2018 15:16:39 +0000 (17:16 +0200)]
totempg: use iovec[i].iov_len instead of copy_len

To be more explicit that we are copying whole message.

Related to 0ebae6b47d39940c62dcbd9185b9af2f265a47ff.

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totempg: Fix fragmentation segfault
Rytis Karpuška [Wed, 7 Feb 2018 12:44:30 +0000 (14:44 +0200)]
totempg: Fix fragmentation segfault

The problem was that two or more messages were concatenated
together during fragmentation in the mcast_msg() function. In one
specific case, a message just short of 1MB was provided to mcast_msg()
and it so happened that the remainder (212 bytes to be exact) left some
free space in the packet, therefore the branch

  if ((copy_len + fragment_size) <
    (max_packet_size - sizeof (unsigned short))) {
...

was selected and this was the last message in the provided iovec.
Then, on the second call, came another big message (about 300KB) and
during fragmentation mcast.fragmented was set to 1.

On the other end, while receiving messages, due to the missing
mcast.fragmentation==0 those two messages were concatenated and
therefore the assembly->data array overflowed, overwriting linked list
pointers and the offset (which happened to be set to 0, so that 300KB
message was being copied from the beginning again).
After the whole 300KB message had been sent, mcast.fragmentation==0
arrived and totempg_deliver_fn() tried to move the assembly structure to
the assembly_list_free list, but as the linked list pointers had been
overwritten, a segfault occurred.

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago [build] fix build with non-standard knet location
Fabio M. Di Nitto [Mon, 5 Feb 2018 14:41:08 +0000 (15:41 +0100)]
[build] fix build with non-standard knet location

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago [man] fix regression introduced by 7162e75dcf81b7e475536e3060bf5e9312cd43b8
Fabio M. Di Nitto [Mon, 5 Feb 2018 06:19:06 +0000 (07:19 +0100)]
[man] fix regression introduced by 7162e75dcf81b7e475536e3060bf5e9312cd43b8

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
6 years ago Man: Move overview mp to sections 3 and 7 from 8
Christoph Berg [Tue, 24 May 2016 16:49:26 +0000 (18:49 +0200)]
Man: Move overview mp to sections 3 and 7 from 8

The _overview manpages are not actually commands and hence do not belong
in manpage section 8. Move corosync_overview to section 7
("Miscellaneous") and the other *_overview pages to section 3 as they
contain API documentation (cf. string(3) for a precedent of
multi-function manpages).

Signed-off-by: Christoph Berg <myon@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago logging: Close before and open blackbox after fork
Jan Friesse [Mon, 22 Jan 2018 10:17:52 +0000 (11:17 +0100)]
logging: Close before and open blackbox after fork

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago logging: Make blackbox configurable
Jan Friesse [Mon, 22 Jan 2018 09:42:25 +0000 (10:42 +0100)]
logging: Make blackbox configurable

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago corosync-notifyd: improve error handling
Andrey Ter-Zakhariants [Tue, 23 Jan 2018 04:08:27 +0000 (20:08 -0800)]
corosync-notifyd: improve error handling

Better handling of errors in _cs_cmap_members_key_changed().

Signed-off-by: Andrey Ter-Zakhariants <at1984z@live.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago spec: Modernize spec to comply with newest Fedora (tag: v2.99.0)
Jan Friesse [Wed, 24 Jan 2018 15:47:01 +0000 (16:47 +0100)]
spec: Modernize spec to comply with newest Fedora

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago build: Remove support for upstart
Jan Friesse [Tue, 23 Jan 2018 15:22:26 +0000 (16:22 +0100)]
build: Remove support for upstart

Upstart files were already mostly removed but not from spec file and
configure.ac.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago build: Replace -lknet with autoconf generated vars
Jan Friesse [Tue, 23 Jan 2018 15:18:17 +0000 (16:18 +0100)]
build: Replace -lknet with autoconf generated vars

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago build: Remove rdma/ibverbs
Jan Friesse [Tue, 23 Jan 2018 15:09:58 +0000 (16:09 +0100)]
build: Remove rdma/ibverbs

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago qdevice: Remove qdevices
Jan Friesse [Tue, 23 Jan 2018 14:46:53 +0000 (15:46 +0100)]
qdevice: Remove qdevices

corosync-qdevice and corosync-qnetd now have a new home:
https://github.com/corosync/corosync-qdevice

This will allow us to better react to the actual needs of these quite
independent projects.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago config: Don't fudge port numbers
Christine Caulfield [Tue, 16 Jan 2018 11:33:09 +0000 (11:33 +0000)]
config: Don't fudge port numbers

When I was adding knet I wanted the port numbers to default to the
base port number + the linknumber.

However I seem to have messed this up such that any port number
specified in the config file has the link number added to it. Which
is almost certainly not what people would expect.

This patch sets it right. If a port number is not specified
then 5405+linknumber is used. If a port number IS specified
then that actual number is used.
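
The corrected rule can be sketched in a few lines. The base port 5405 comes
from the text above; effective_port and the convention that a non-positive
value means "not specified" are assumptions made for this illustration.

```c
#define SKETCH_BASE_PORT 5405   /* default base port from the commit message */

/* Hypothetical helper: configured_port <= 0 means "not specified in the
 * config file" in this sketch. */
static int effective_port(int configured_port, int link_number)
{
    if (configured_port > 0) {
        return configured_port;             /* use exactly what was configured */
    }
    return SKETCH_BASE_PORT + link_number;  /* default: base port + link number */
}
```

Under the earlier behaviour described above, the link number was added in both
branches, so a configured port of 6000 on link 2 would have become 6002.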

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago knet: Allow ping_timers to be auto-configured
Christine Caulfield [Fri, 12 Jan 2018 16:12:38 +0000 (16:12 +0000)]
knet: Allow ping_timers to be auto-configured

knet ping_timers are auto-configured according to token value.

This patch also fixes some knet config bugs that resulted in defaults
not being applied when values were removed from corosync.conf.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago cts: Make code compatible with Python 3
Bin Liu [Thu, 4 Jan 2018 03:12:41 +0000 (11:12 +0800)]
cts: Make code compatible with Python 3

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago corosync-notifyd: make SNMP work again
Bin Liu [Tue, 5 Dec 2017 05:47:40 +0000 (13:47 +0800)]
corosync-notifyd: make SNMP work again

rrp_faulty_fn in notify_callbacks no longer exists; it has become
link_faulty_fn. Also, link_faulty_fn needs 5 arguments while
rrp_faulty_fn needed 4.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago build: Add the headers necessary for RPM build
yuskiida [Thu, 11 Jan 2018 08:27:31 +0000 (17:27 +0900)]
build: Add the headers necessary for RPM build

Signed-off-by: yuskiida <yusk.iida@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago config: if local node addr is wrong, fail with a sensible message
Christine Caulfield [Mon, 8 Jan 2018 11:05:46 +0000 (11:05 +0000)]
config: if local node addr is wrong, fail with a sensible message

If no valid local address is found in corosync.conf then corosync
exits with: "parse error in config: No multicast port specified"

This is because of the config change for knet that always populates
the interfaces. The old error of "no interfaces found" was only
slightly better anyway IMHO.

This patch adds an explicit check that local_node_pos has been
set in icmap and uses that to determine if a valid local address
has been found.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totemknet: Drop truncated packets on receive
Jan Friesse [Fri, 5 Jan 2018 15:19:45 +0000 (16:19 +0100)]
totemknet: Drop truncated packets on receive

This is a backport of part of the "totemudpu: Scale receive buffer" patch.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago totemudp: Make use of UDP_RECEIVE_FRAME_SIZE_MAX
Jan Friesse [Fri, 5 Jan 2018 15:10:30 +0000 (16:10 +0100)]
totemudp: Make use of UDP_RECEIVE_FRAME_SIZE_MAX

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago totemudpu: Export and rename UDPU_FRAME_SIZE_MAX
Jan Friesse [Fri, 5 Jan 2018 15:05:36 +0000 (16:05 +0100)]
totemudpu: Export and rename UDPU_FRAME_SIZE_MAX

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago totemconfig: Fix UDP autogeneration of mcast addr
Jan Friesse [Fri, 5 Jan 2018 15:04:24 +0000 (16:04 +0100)]
totemconfig: Fix UDP autogeneration of mcast addr

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago totemudpu: Scale receive buffer
Jan Friesse [Thu, 4 Jan 2018 16:07:20 +0000 (17:07 +0100)]
totemudpu: Scale receive buffer

The receive buffer size should be based on PROCESSOR_COUNT_MAX and not
on a static buffer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years ago config: Allow selection of crypto_model
Christine Caulfield [Fri, 5 Jan 2018 10:13:17 +0000 (10:13 +0000)]
config: Allow selection of crypto_model

KNET has options for nss or openssl crypto libraries; make this
available to corosync.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago libcpg: Fix issue with partial big packet assembly
Rytis Karpuška [Thu, 28 Dec 2017 13:17:12 +0000 (15:17 +0200)]
libcpg: Fix issue with partial big packet assembly

Packet assembly is done separately for each (nodeid, pid) pair, therefore
multiple packets are not mixed into a single buffer.

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago qdevice: mv free(str) after port validation
Bin Liu [Wed, 27 Dec 2017 10:21:34 +0000 (18:21 +0800)]
qdevice: mv free(str) after port validation

in the previous code of qdevice_net_instance_init_from_cmap:
   host_port = strtol(str, &ep, 10);

   free(str);

   if (host_port <= 0 || host_port > ((uint16_t)~0) || *ep != '\0')

Before free, *ep is '\0'. But after free, ep points into freed memory
and *ep changed to 'U', so move free after the comparison.
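
The corrected ordering can be sketched as a small standalone function. This is
an illustration, not the qdevice code: parse_port_and_free is a hypothetical
helper that takes ownership of a heap-allocated string, and the extra errno
and empty-string checks are additions for the sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: validate a port string, then free it.
 * Returns the port number, or -1 if the string is not a valid port. */
static long parse_port_and_free(char *str)
{
    char *ep = NULL;
    long host_port;
    int valid;

    errno = 0;
    host_port = strtol(str, &ep, 10);

    /* ep points into str, so *ep must be inspected before str is freed. */
    valid = (errno == 0 && ep != str && *ep == '\0' &&
             host_port > 0 && host_port <= UINT16_MAX);

    free(str);
    return valid ? host_port : -1;
}
```

Reading *ep after free(str) is undefined behaviour; it may appear to work
until the allocator reuses or overwrites the freed block.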

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago corosync.aug: Add missing options
Toki Winter [Fri, 15 Dec 2017 00:49:35 +0000 (18:49 -0600)]
corosync.aug: Add missing options

Knet related options are not yet included.

Signed-off-by: Toki Winter <toki@linuxfoundation.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago config: Allow links to have different ip_versions
Christine Caulfield [Tue, 12 Dec 2017 14:01:57 +0000 (14:01 +0000)]
config: Allow links to have different ip_versions

knet allows links to have different IP versions, provided they
all match per link. So don't force them all to be the same.

I've added a check here to make sure that all nodes on the same
link are using the same IP version.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago Fix compile errors in qdevice and vqsim on FreeBSD
Bin Liu [Tue, 5 Dec 2017 07:28:54 +0000 (15:28 +0800)]
Fix compile errors in qdevice and vqsim on FreeBSD

Some header files need to be specified on FreeBSD, otherwise there
are compile errors. These headers do not affect Linux compilation.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago cmapctl: mention the Clear stats option in usage message
Christine Caulfield [Thu, 16 Nov 2017 14:30:05 +0000 (14:30 +0000)]
cmapctl: mention the Clear stats option in usage message

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago corosync-cfgtool: refactor cli parameters handling
Bin Liu [Wed, 29 Nov 2017 06:41:37 +0000 (14:41 +0800)]
corosync-cfgtool: refactor cli parameters handling

Use the idea from corosync-cmapctl to set ACTION and params in the first
switch, and add another switch to call a function based on ACTION and the
params.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago wd: fix snprintf warnings
Bin Liu [Fri, 1 Dec 2017 02:58:50 +0000 (10:58 +0800)]
wd: fix snprintf warnings

When running ./configure --enable-watchdog, gcc 7.2.1 will report
warnings for snprintf. This patch fixes the warnings.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totemsrp: Revert totemsrp_get_ifaces() changes
Christine Caulfield [Thu, 30 Nov 2017 14:56:35 +0000 (14:56 +0000)]
totemsrp: Revert totemsrp_get_ifaces() changes

In my enthusiasm for removing code while integrating knet I
also deleted the correct code for returning the IP addresses of a node,
so that only the IP address of the local node was ever returned.

This commit restores the previous code.

Also, because we always return INTERFACE_MAX interfaces now (they don't
have to be contiguous), set ss_family to zero if that interface is not
in use so that downstream apps know and don't display a lot of 0.0.0.0.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago man: Add note about corosync not using name option
Jan Friesse [Thu, 30 Nov 2017 15:19:21 +0000 (16:19 +0100)]
man: Add note about corosync not using name option

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
6 years ago corosync.conf: publicize nodelist.node.name
Jan Pokorný [Wed, 29 Nov 2017 19:01:48 +0000 (20:01 +0100)]
corosync.conf: publicize nodelist.node.name

It was discovered that pacemaker has been occasionally relying on
those items configured in corosync.conf (and documenting so), while
backpropagation got stuck somewhere.  As the option is deemed generally
beneficial, rectify this gap now and make it standard,
public part of the configuration space, possibly also for other
client SW to use now.

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago man: fixes for corosync.conf man page
Bin Liu [Wed, 29 Nov 2017 08:20:03 +0000 (16:20 +0800)]
man: fixes for corosync.conf man page

1. multicast address/port is only for UDP
2. change kronosnet to Kronosnet
3. nodeid must be set with KNET, not UDPU

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago cmapctl: add "-m" option into help message
Bin Liu [Wed, 29 Nov 2017 08:53:32 +0000 (16:53 +0800)]
cmapctl: add "-m" option into help message

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago totemconfig: remove duplicate aes256 test
Bin Liu [Wed, 29 Nov 2017 08:10:59 +0000 (16:10 +0800)]
totemconfig: remove duplicate aes256 test

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agofix output format for corosync-cfgtool with knet (#283)
Bin Liu [Mon, 27 Nov 2017 08:35:40 +0000 (16:35 +0800)]
fix output format for corosync-cfgtool with knet (#283)

Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agosync: Call sync_init of all services at once
Jan Friesse [Mon, 13 Nov 2017 16:38:54 +0000 (17:38 +0100)]
sync: Call sync_init of all services at once

This patch solves a situation which can happen very rarely:
- Node B is running
- Node A is started and tries to create singleton membership. It also
  initializes service S, which tries to send a message during
  initialization
- Just before node A finishes its move to operational state, it gets
  a multicast message from node B, so it moves to gather state
- Node A and B create membership and move to operational state, and
  sync is started
- Node A and B receive the message sent by node A during initialization
  of service S
- Node A exits before sync of the service is finished

In this situation, node B may never execute sync_init for
service S. So service S on node B is not aware of the existence of
node A, even though it received a message from it.

A similar situation can theoretically also happen during merge.

The solution is to change the flow of sync, so it now looks like:

- Build service_list
- Call sync_init for all local services
- Send service_list
- Receive service_list from all members and send barrier
- For all services:
  - Receive barrier
  - Call sync_activate if this is not the first service
  - Call sync_process for the next service, or finish sync if the
    previous service was the last one
  - Send barrier
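
The ordering above can be modelled with a small sketch. It is not the
corosync implementation: model_sync_order and struct sync_event are
hypothetical names, barriers are treated as completing immediately, and
it assumes that sync_activate applies to the service whose barrier has
just been received.

```c
#include <assert.h>
#include <stddef.h>

enum sync_step { STEP_INIT, STEP_PROCESS, STEP_ACTIVATE };

struct sync_event {
    enum sync_step step;
    size_t service; /* index into the service list */
};

/*
 * Record the callback order for n services into out[] (capacity cap).
 * Returns the number of events recorded.
 */
static size_t model_sync_order(size_t n, struct sync_event *out, size_t cap)
{
    size_t count = 0;

    assert(out != NULL);
    assert(cap >= 3 * n);

    /* sync_init runs for every local service before service_list is sent. */
    for (size_t i = 0; i < n; i++) {
        out[count++] = (struct sync_event){ STEP_INIT, i };
    }

    /* One barrier round per service, plus a last round for the final one. */
    for (size_t round = 0; round <= n; round++) {
        if (round > 0) {
            /* Barrier received: activate the service that just finished. */
            out[count++] = (struct sync_event){ STEP_ACTIVATE, round - 1 };
        }
        if (round < n) {
            /* Start processing the next service, then send a barrier. */
            out[count++] = (struct sync_event){ STEP_PROCESS, round };
        }
    }
    return count;
}
```

For two services the recorded order is init 0, init 1, process 0,
activate 0, process 1, activate 1. Every service is initialized before
any message can be processed, which is the point of the change.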

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agosync: Remove unneeded determine sync code
Jan Friesse [Mon, 13 Nov 2017 15:05:26 +0000 (16:05 +0100)]
sync: Remove unneeded determine sync code

Code was used for compatibility with old sync v1 (in needle this was
deleted and previous version 2 became v1), and it's no longer needed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agostats: Add some missing knet stats
Christine Caulfield [Tue, 14 Nov 2017 10:28:42 +0000 (10:28 +0000)]
stats: Add some missing knet stats

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoman: Add note about qdevice parallel cmds start
Jan Friesse [Tue, 14 Nov 2017 08:39:58 +0000 (09:39 +0100)]
man: Add note about qdevice parallel cmds start

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
6 years agoman: corosync-qdevice: some more stylistics
Jan Pokorný [Thu, 9 Nov 2017 15:55:57 +0000 (16:55 +0100)]
man: corosync-qdevice: some more stylistics

Following the well-used scheme:
- expressly given defaults: italics (underline in standard terminals)
- key cross-references: bold (as well as the originals)

+ fix missing paragraph delimiters
+ s/what/which/ and s/on/one/ where appropriate

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agosystemd: corosync-qdevice can not run without corosync
Ferenc Wágner [Thu, 9 Nov 2017 10:29:58 +0000 (11:29 +0100)]
systemd: corosync-qdevice can not run without corosync

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoman: corosync-qdevice: fix formatting vs. punctuation
Jan Pokorný [Wed, 8 Nov 2017 11:42:47 +0000 (12:42 +0100)]
man: corosync-qdevice: fix formatting vs. punctuation

Previously, some enumerations were hard to follow, as they were marked
up all at once, including punctuation and connectives. Also mark up
some expressly given defaults.

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoconfigure: kill off INITWRAPPERSDIR
Ferenc Wágner [Mon, 30 Oct 2017 20:55:06 +0000 (21:55 +0100)]
configure: kill off INITWRAPPERSDIR

When configured for systemd, don't install the SysV init scripts at all.

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agocorosync-qdevice: send startup notification to systemd
Ferenc Wágner [Tue, 8 Nov 2016 21:36:53 +0000 (22:36 +0100)]
corosync-qdevice: send startup notification to systemd

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agocorosync-qnetd: send startup notification to systemd
Ferenc Wágner [Mon, 19 Dec 2016 13:27:08 +0000 (14:27 +0100)]
corosync-qnetd: send startup notification to systemd

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoSend corosync-notifyd startup notification to systemd
Ferenc Wágner [Mon, 30 Oct 2017 21:12:14 +0000 (22:12 +0100)]
Send corosync-notifyd startup notification to systemd

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoMake systemd stop corosync-notifyd if corosync is stopped
Ferenc Wágner [Mon, 30 Oct 2017 21:12:09 +0000 (22:12 +0100)]
Make systemd stop corosync-notifyd if corosync is stopped

Otherwise it just exits successfully (which should probably be fixed
eventually), leading to confusion.

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agocorosync.spec: Add system-devel build requirement
Jan Friesse [Wed, 8 Nov 2017 14:39:34 +0000 (15:39 +0100)]
corosync.spec: Add system-devel build requirement

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Ferenc Wágner <wferi@debian.org>
6 years agoSend corosync startup notification to systemd
Ferenc Wágner [Mon, 30 Oct 2017 21:11:56 +0000 (22:11 +0100)]
Send corosync startup notification to systemd

This enables starting the daemon directly in the service file, because
dependent units won't be started until initialization is complete.

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoquorumtool: Use full buffer size in snprintf
Jan Friesse [Tue, 7 Nov 2017 14:55:30 +0000 (15:55 +0100)]
quorumtool: Use full buffer size in snprintf

Thanks Bin Liu <bliu@suse.com> for this patch.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocpghum: Mark print/log functions with printf attr
Jan Friesse [Tue, 7 Nov 2017 14:54:36 +0000 (15:54 +0100)]
cpghum: Mark print/log functions with printf attr

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocpg_test_agent: Fix snprintf compiler warnings
Jan Friesse [Tue, 7 Nov 2017 14:53:45 +0000 (15:53 +0100)]
cpg_test_agent: Fix snprintf compiler warnings

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agosam: Fix snprintf compiler warnings
Jan Friesse [Tue, 7 Nov 2017 14:53:21 +0000 (15:53 +0100)]
sam: Fix snprintf compiler warnings

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocoroparse: Do not convert empty uid, gid to 0
Jan Friesse [Mon, 6 Nov 2017 08:22:41 +0000 (09:22 +0100)]
coroparse: Do not convert empty uid, gid to 0

When the uid (or gid) value was an empty string, it was incorrectly
converted to 0. The solution is to check the input string for emptiness.
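
The kind of check described can be sketched as follows. This is not the
coroparse code: parse_numeric_id is a hypothetical helper that handles
numeric ids only, and it is shown just to illustrate rejecting an empty
string before conversion.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a decimal id. Returns 0 on success and stores the value in *out.
 * Returns -1 for an empty string, a non-digit first character, trailing
 * garbage, or an out-of-range value; *out is left untouched in that case.
 */
static int parse_numeric_id(const char *s, unsigned long *out)
{
    char *end = NULL;
    unsigned long val;

    assert(s != NULL && out != NULL);

    if (*s == '\0') {
        return -1; /* empty input must not silently become 0 */
    }
    if (!isdigit((unsigned char)s[0])) {
        return -1; /* also rejects leading whitespace and signs */
    }

    errno = 0;
    val = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;
    }

    *out = val;
    return 0;
}
```

Without the explicit empty-string check, strtoul would return 0 for ""
and the caller could mistake that for uid 0.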

Thanks Bin Liu <bliu@suse.com> for reporting the bug.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Bin Liu <bliu@suse.com>
6 years agocmapctl: Add option to clear the stats
Christine Caulfield [Thu, 2 Nov 2017 16:01:36 +0000 (16:01 +0000)]
cmapctl: Add option to clear the stats

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agostats: Don't display errors when reading knet stat
Christine Caulfield [Thu, 2 Nov 2017 13:16:00 +0000 (13:16 +0000)]
stats: Don't display errors when reading knet stat

Only add the knet handle stat keys if we are actually running knet. This
prevents errors occurring when iterating through all of the stats keys.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agomake the output of "corosync-cfgtool -s" more readable (#269)
Bin Liu [Fri, 3 Nov 2017 09:50:29 +0000 (17:50 +0800)]
make the output of "corosync-cfgtool -s" more readable (#269)

6 years agocfg: nodeid should be unsigned int
Bin Liu [Wed, 1 Nov 2017 08:23:41 +0000 (16:23 +0800)]
cfg: nodeid should be unsigned int

nodeid in struct req_lib_cfg_get_node_addrs is "unsigned int",
so the "nodeid" parameter of the function corosync_cfg_get_node_addrs
should be unsigned int as well.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoquorumtool: remove duplicated help message
Bin Liu [Wed, 1 Nov 2017 03:30:54 +0000 (11:30 +0800)]
quorumtool: remove duplicated help message

Option "-p" was included twice, so remove one of them.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoman: fix cpg_mcast_joined.3.in
Jonathan Davies [Wed, 1 Nov 2017 14:36:40 +0000 (14:36 +0000)]
man: fix cpg_mcast_joined.3.in

Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoman: Add stats.clear keys to the cmap_keys man pg
Christine Caulfield [Tue, 31 Oct 2017 11:47:41 +0000 (11:47 +0000)]
man: Add stats.clear keys to the cmap_keys man pg

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agostats: Add cmap key to clear the various stats.
Christine Caulfield [Tue, 31 Oct 2017 10:54:43 +0000 (10:54 +0000)]
stats: Add cmap key to clear the various stats.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>