git.proxmox.com Git - mirror_corosync.git/log
5 years agoman: Add instructions for adding/removing nodes v2.99.5
Christine Caulfield [Thu, 6 Dec 2018 09:47:04 +0000 (09:47 +0000)]
man: Add instructions for adding/removing nodes

This replaces the 'cmaptool' method previously documented
in cmap_keys.8

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
5 years agoconfig: Disallow corosync-cmapctl updates of nodelist
Christine Caulfield [Tue, 4 Dec 2018 15:31:24 +0000 (15:31 +0000)]
config: Disallow corosync-cmapctl updates of nodelist

It didn't work anyway (the config system requires whole links
to be configured at once) and caused crashes.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
5 years agoconfig: Report IP addr/nodename parse errors back
Christine Caulfield [Mon, 3 Dec 2018 15:25:05 +0000 (15:25 +0000)]
config: Report IP addr/nodename parse errors back

Corosync used to just ignore parse errors so that un-resolved names
could cause silent failures. We now always check the result from
totemip_parse() and at least print something in syslog.

There's also a little get-out here that allows you to correct
a bad node address without having to destroy and recreate the
whole link. I'm being nice to you.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
5 years agocoroparse: Remove unused cs_err initialization
Jan Friesse [Fri, 23 Nov 2018 15:00:00 +0000 (16:00 +0100)]
coroparse: Remove unused cs_err initialization

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agocpghum: Check cpg_local_get return code
Jan Friesse [Fri, 23 Nov 2018 14:47:31 +0000 (15:47 +0100)]
cpghum: Check cpg_local_get return code

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agotestcpg2: Check cpg_dispatch return code
Jan Friesse [Fri, 23 Nov 2018 14:44:15 +0000 (15:44 +0100)]
testcpg2: Check cpg_dispatch return code

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agonotifyd: Delete registered tracking keys v2.99.4
Jan Friesse [Thu, 15 Nov 2018 16:02:22 +0000 (17:02 +0100)]
notifyd: Delete registered tracking keys

Forward port of needle 70fd66767494872b93018949d685f19482cd5bec by Hideo
Yamauchi <renayama19661014@ybb.ne.jp>.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agostats: Fix delete of track
Jan Friesse [Thu, 15 Nov 2018 15:54:47 +0000 (16:54 +0100)]
stats: Fix delete of track

When cmap_track_delete was called to stats map (cmap created with
CMAP_MAP_STATS parameter) result was always ERR_BAD_HANDLE.

It turned out that corosync part of cmap is always calling icmap
function to get user data (where required hdb handle is stored)
instead of generalized map_fns.

After fixing this issue, valgrind showed an error about a jump depending on
uninitialized data in stats_map_track_delete. The solution seems to be to always
initialize tracker->events (not only when track_type is add or
delete).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agoinit: Fix init script to work with containers
Jan Friesse [Mon, 3 Sep 2018 15:04:23 +0000 (17:04 +0200)]
init: Fix init script to work with containers

Previously the init scripts did not use a pid file, so pidof was used. This
is usually not a problem, but when containers are used it may result in
killing the wrong instance when issued on the host.

The solution is to always use a pidfile.

Also try to use LSB-compliant status codes.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agomain: Remove COROSYNC_RUN_DIR
Jan Friesse [Tue, 13 Nov 2018 16:32:43 +0000 (17:32 +0100)]
main: Remove COROSYNC_RUN_DIR

Remove last used environment variable (reasons similar to removal of
COROSYNC_MAIN_CONFIG_FILE).

This environment variable was never documented, so document it properly.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agoman: Describe nodelist.node.name properly
Jan Friesse [Tue, 13 Nov 2018 16:14:17 +0000 (17:14 +0100)]
man: Describe nodelist.node.name properly

The old description is no longer true, because with the knet transport the
name got a new and very important role.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agomain: Remove COROSYNC_TOTEM_AUTHKEY_FILE
Jan Friesse [Tue, 13 Nov 2018 15:53:43 +0000 (16:53 +0100)]
main: Remove COROSYNC_TOTEM_AUTHKEY_FILE

Remove another environment variable (reasons similar to removal of
COROSYNC_MAIN_CONFIG_FILE).

Also properly document both totem.keyfile and totem.key.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agomain: Replace COROSYNC_MAIN_CONFIG_FILE
Jan Friesse [Mon, 12 Nov 2018 17:35:45 +0000 (18:35 +0100)]
main: Replace COROSYNC_MAIN_CONFIG_FILE

The COROSYNC_MAIN_CONFIG_FILE environment variable was quite well hidden
and it was never used by the init script. It also makes it quite hard to debug
possible problems.

Replace it with the -c option.

The patch also uses the configuration file path as the base for the uidgid.d
directory, so it's no longer necessary to keep uidgid.d in sysconfdir.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agomain: Move sched parameters to config file
Jan Friesse [Mon, 12 Nov 2018 14:46:14 +0000 (15:46 +0100)]
main: Move sched parameters to config file

The reason for this change is that the number of corosync CLI options
kind of exploded, and the scheduler-based ones are really better kept in
the config file.

Nice side-effect of this move is better "integration" with systemd,
because currently used EnvironmentFile should be really used for
environment and not that much for passing extra options to CLI.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agoconfigure: move to AC_COMPILE_IFELSE
Jan Friesse [Wed, 7 Nov 2018 14:12:10 +0000 (09:12 -0500)]
configure: move to AC_COMPILE_IFELSE

from AC_PREPROC_IFELSE which is strongly discouraged.

Our detection system was very weak and recent versions of clang did
show that PREPROC_IFELSE (cpp) would enable warning options that
the compiler does not support (clang).

Use a full compilation test to detect what works and what doesn't.

Based on knet patch 88491f27375a9e8aceb946853a1abf4d23ebb8f3.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
5 years agologsys: Make hires timestamp default
Jan Friesse [Fri, 26 Oct 2018 13:54:03 +0000 (15:54 +0200)]
logsys: Make hires timestamp default

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agologsys: Support hires timestamp
Jan Friesse [Fri, 26 Oct 2018 13:39:10 +0000 (15:39 +0200)]
logsys: Support hires timestamp

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agototemconfig: Fix logging of freed string
Jan Friesse [Fri, 26 Oct 2018 13:00:36 +0000 (15:00 +0200)]
totemconfig: Fix logging of freed string

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agoconfig: Allow generated nodeids for UDP & UDPU
Christine Caulfield [Thu, 25 Oct 2018 10:29:33 +0000 (11:29 +0100)]
config: Allow generated nodeids for UDP & UDPU

The conversion to the new srp_addr format broke the feature where
UDP/UDPU nodes could get their nodeids generated from the IP address.

A big part of this was the removal of mandatory ring0_addr - it was used
as a placeholder when reading down the nodelist. I replaced this with
nodeid thinking that nodeid was now mandatory, forgetting this use case.
So the compare on "ring0_addr" or "nodeid" is now replaced with a more
robust check that we're only reading keys from the same node_pos once,
this was needed in votequorum.c as well as totemconfig.c

Another tidying side-effect of this patch is that the nodeid generation
is now all in a single routine in totemconfig.c and not shared between
it and totemip.c.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
5 years agoconfig example: Migrate to newer syntax
Jan Friesse [Mon, 22 Oct 2018 10:19:13 +0000 (12:19 +0200)]
config example: Migrate to newer syntax

Default config is knet with nodelist so extra udpu example is no longer
needed.

XML variant of corosync config never got expected usage, so delete
example config too.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agolog: Implement support for reopening log files
Jan Friesse [Mon, 8 Oct 2018 13:00:11 +0000 (15:00 +0200)]
log: Implement support for reopening log files

Feature depends on existence of libqb function qb_log_file_reopen.

New function call is added into CFG service API. This function is
used by corosync-cfgtool which now accepts -L parameter.

Finally, logrotate "postrotate" script is calling
corosync-cfgtool -L to notify corosync, instead of using
copytruncate option.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agototemconfig: Replace strcpy by strncpy
Jan Friesse [Tue, 16 Oct 2018 08:28:56 +0000 (10:28 +0200)]
totemconfig: Replace strcpy by strncpy

Formally not needed, because totemip_print should not return a string
longer than INET6_ADDRSTRLEN, but static analysis tools are not capable
of reaching such a conclusion.
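
The general pattern behind such a change can be sketched as follows. This is
a minimal illustration, not the code from totemconfig.c: the helper name
copy_bounded and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not corosync code): copy src into dst, which holds
 * dst_size bytes, truncating if needed and always NUL-terminating.
 */
void copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /*
     * strncpy does not terminate dst when src has dst_size - 1 or more
     * characters, so the last byte is set explicitly.
     */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With an 8-byte destination, copying "::1" leaves "::1", while a longer
address string is cut to 7 characters plus the terminator.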

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agoconfig: Fix crash in reload if new interfaces are added
Christine Caulfield [Mon, 15 Oct 2018 12:52:03 +0000 (13:52 +0100)]
config: Fix crash in reload if new interfaces are added

This is a bug I seem to have introduced in
429209f4aa3c55504a49833e0004489f241e7819 where we compare links
for changes. If a new node was added on an existing link then it
was compared against a non-existent one in the previous configuration.
We now only compare nodes that are in both interfaces.

As I needed min() for this function, I moved it from individual
.c files into util.h so we only have one copy.

And the error message was fixed.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
5 years agoman: Fix default knet_pmtud_interval to match code
Jan Friesse [Tue, 2 Oct 2018 15:54:55 +0000 (17:54 +0200)]
man: Fix default knet_pmtud_interval to match code

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agobuild: Remove totempg shared library leftovers
Jan Friesse [Wed, 26 Sep 2018 12:01:45 +0000 (14:01 +0200)]
build: Remove totempg shared library leftovers

Because totempg is not distributed it doesn't make sense to distribute
totem header files. Also pkgconfig file should not be created any more.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agobuild: Do not compile totempg as a shared library
Jan Friesse [Mon, 24 Sep 2018 13:05:02 +0000 (15:05 +0200)]
build: Do not compile totempg as a shared library

Instead of compiling totempg as a shared library, compile all totem code
directly into corosync binary.

The main idea of having a totempg which could be
used in other projects was nice, but it was never really finished (and as far
as I know no project was ever really using it). So at the end of the
day, we've ended up with a huge number of problems (need to pass new arguments
through X layers, hard debugging, ...) without any real benefit.

For a future version, we may consider revisiting the idea of splitting totemsrp
into a well-tested library without unrelated bits like transports/ip/...

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
5 years agoman: Fix typo conains -> contains
Ferenc Wágner [Sun, 23 Sep 2018 19:36:08 +0000 (21:36 +0200)]
man: Fix typo conains -> contains

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
5 years agoman: Fix typo connnections -> connections
Ferenc Wágner [Sun, 23 Sep 2018 19:35:23 +0000 (21:35 +0200)]
man: Fix typo connnections -> connections

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agobuild: Remove NSS dependencies
Jan Friesse [Fri, 14 Sep 2018 11:15:08 +0000 (13:15 +0200)]
build: Remove NSS dependencies

Complete removal of NSS from corosync tree. Most of the changes are
in build system and cpgverify had to be rewritten to use crc32 instead
of sha1.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocts: Remove CTS
Jan Friesse [Thu, 13 Sep 2018 15:50:36 +0000 (17:50 +0200)]
cts: Remove CTS

There are several reasons for the removal of CTS:
1. It's not actively maintained
2. It's quite hard to set up
3. It has a hard-to-fix bug in its design (syslog messages are dropped by
   rsyslog (configurable), journald (configurable) or when rsyslog is
   used together with journald (non-configurable)), so a test
   can fail just because of a lost message.
4. It depends on pacemaker CTS, which is changed quite often
5. CTS itself is a great tool for Pacemaker
   (shutdown/startup of the node), but Corosync has slightly
   different needs
6. Bin Liu <bliu@suse.com> made a heroic effort to port it to Python 3
   (huge thanks), but it's still not fully complete

All in all, if somebody is interested in maintaining the CTS code, please
create a repository similar to corosync flatiron cts and let us know.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoman: Fix crypto_hash and crypto_cipher defaults
Jan Friesse [Wed, 12 Sep 2018 08:00:50 +0000 (10:00 +0200)]
man: Fix crypto_hash and crypto_cipher defaults

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocoroparse: Fix newly introduced warning
Jan Friesse [Fri, 7 Sep 2018 14:51:18 +0000 (10:51 -0400)]
coroparse: Fix newly introduced warning

Small fix for a problem introduced by "coroparse: Use key_name for error
message" patch.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
6 years agoAdd option to force cluster into GATHER state
Chris Walker [Sun, 19 Aug 2018 20:21:07 +0000 (16:21 -0400)]
Add option to force cluster into GATHER state

Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agocoroparse: Use key_name for error message
Jan Friesse [Wed, 5 Sep 2018 13:15:33 +0000 (15:15 +0200)]
coroparse: Use key_name for error message

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocoroparse: Add file name and line to error message
Jan Friesse [Thu, 23 Aug 2018 13:55:27 +0000 (15:55 +0200)]
coroparse: Add file name and line to error message

It's just much easier to find out what is happening when message like

parser error: /etc/corosync/corosync.conf:39: Unexpected closing brace

is logged instead of

parser error: Unexpected closing brace
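
A minimal sketch of building the longer message is shown below. The function
name format_parser_error and its signature are hypothetical and are not taken
from coroparse.c; only the message layout follows the example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not corosync code): prefix a parser error with the
 * config file name and line number. Returns 0 on success and -1 if the
 * message did not fit into buf or could not be formatted.
 */
int format_parser_error(char *buf, size_t size, const char *file,
                        unsigned int line, const char *reason)
{
    int res;

    assert(buf != NULL && size > 0 && file != NULL && reason != NULL);

    res = snprintf(buf, size, "parser error: %s:%u: %s", file, line, reason);

    return (res >= 0 && (size_t)res < size) ? 0 : -1;
}
```

Called with "/etc/corosync/corosync.conf", 39 and "Unexpected closing brace",
the buffer holds the first message quoted above.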

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocoroparse: Be more strict in what is parsed
Jan Friesse [Thu, 23 Aug 2018 13:06:19 +0000 (15:06 +0200)]
coroparse: Be more strict in what is parsed

The corosync parser is not very clever, but it is able to detect more errors
without too much code.

1. Check that the section name is not empty (just a '{' character)
2. Check that there are no extra characters after the opening bracket '{'
3. Check that there are no extra characters after or before the closing bracket
   '}'
4. Check that the line is an opening section, a closing section or a key/value

So the following example is reported as an error:

totem {
    version: 2
}}}}}}}}}}
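
The four checks above can be sketched as a small line classifier. This is an
illustration under stated assumptions, not the coroparse.c implementation:
the names line_kind and classify_line are hypothetical, blank and comment
lines are assumed to be filtered out by the caller, and any brace in a line is
treated as section syntax, which is a simplification.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of one config line (not corosync code). */
enum line_kind {
    LINE_SECTION_OPEN,
    LINE_SECTION_CLOSE,
    LINE_KEY_VALUE,
    LINE_ERROR
};

static int is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

enum line_kind classify_line(const char *line)
{
    const char *p;
    const char *brace;
    const char *colon;
    size_t len;

    assert(line != NULL);

    /* Ignore surrounding whitespace. */
    p = line;
    while (is_blank(*p))
        p++;
    len = strlen(p);
    while (len > 0 && is_blank(p[len - 1]))
        len--;

    if (len == 0)
        return LINE_ERROR;

    /* Check 3: a closing brace must be the only character on the line. */
    if (memchr(p, '}', len) != NULL)
        return (len == 1) ? LINE_SECTION_CLOSE : LINE_ERROR;

    /* Checks 1 and 2: '{' must be last and preceded by a section name. */
    brace = memchr(p, '{', len);
    if (brace != NULL)
        return (brace == p + len - 1 && brace > p) ? LINE_SECTION_OPEN
                                                   : LINE_ERROR;

    /* Check 4: anything else must be "key: value" with a non-empty key. */
    colon = memchr(p, ':', len);
    return (colon != NULL && colon > p) ? LINE_KEY_VALUE : LINE_ERROR;
}
```

In this sketch "totem {" and "version: 2" are accepted, while the
"}}}}}}}}}}" line from the example is classified as an error.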

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocoroparse: Fix remove_whitespace end condition
Jan Friesse [Thu, 23 Aug 2018 12:18:40 +0000 (14:18 +0200)]
coroparse: Fix remove_whitespace end condition

When the remove_whitespace function parameter is a single-character string
with trailing characters to strip (like a:), the colon is not removed. The
reason is the end condition end != start, which is valid for an empty string
but invalid in the case described above. The solution is to check if *end is '\0'.
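
The pitfall can be illustrated with a simplified trimming routine. This is a
sketch, not the actual remove_whitespace function: the name trim_token, the
strip_colon flag and the early return for empty input are assumptions made
for the example.

```c
#include <assert.h>
#include <string.h>

static int is_space_char(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/*
 * Hypothetical helper (not corosync code): trim leading and trailing
 * whitespace in place and, if strip_colon is set, a trailing ':' as well.
 * Returns a pointer into the same buffer.
 */
char *trim_token(char *s, int strip_colon)
{
    char *start;
    char *end;

    assert(s != NULL);

    start = s;
    while (is_space_char(*start))
        start++;

    /* Empty (or all-whitespace) input: nothing left to cut. */
    if (*start == '\0')
        return start;

    end = start + strlen(start) - 1;
    while (end > start &&
           (is_space_char(*end) || (strip_colon && *end == ':')))
        end--;

    /*
     * end points at the last character to keep. Terminating only when
     * end != start would skip this step for a one-character token such
     * as "a:", leaving the colon in place.
     */
    end[1] = '\0';

    return start;
}
```

With strip_colon set, both "a:" and "  key : " are reduced to the bare key.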

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocoroparse: Check icmap_set results
Jan Friesse [Thu, 23 Aug 2018 11:49:46 +0000 (13:49 +0200)]
coroparse: Check icmap_set results

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocoroparse: Return error if config line is too long
Jan Friesse [Thu, 23 Aug 2018 11:02:13 +0000 (13:02 +0200)]
coroparse: Return error if config line is too long

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agonotifyd: Propagate error to exit code
Jan Friesse [Mon, 27 Aug 2018 15:47:58 +0000 (17:47 +0200)]
notifyd: Propagate error to exit code

When it's impossible to dispatch cmap/quorum messages exit code of
corosync-notifyd shouldn't be success.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agogit-version-gen: Fail on UNKNOWN version
Jan Friesse [Fri, 31 Aug 2018 12:52:24 +0000 (14:52 +0200)]
git-version-gen: Fail on UNKNOWN version

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agobuild: Support for git archive stored tags
Jan Friesse [Fri, 24 Aug 2018 08:30:13 +0000 (10:30 +0200)]
build: Support for git archive stored tags

Attempt to solve the problem with git archive generated tarballs
(used for example by github when a release is downloaded), which are no
longer a git tree and (in contrast to officially released tarballs) also
don't contain a .tarball-version file, so the git-version-gen script simply
cannot obtain valid version info.

The solution is based on using gitattributes, which instructs git to
replace a string in the .gitarchivever file by known ref names.
git-version-gen is enhanced to support this file and tries to parse
any string which looks like "tag: v[0-9]+.[0-9]+.[0-9]". If such a string
is found it's used as the version. This file is used as a last attempt and
other methods (.tarball-version, git abbrev) have precedence.
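
The matching step can be sketched in C as below. git-version-gen itself is a
shell script, so this only illustrates the idea: the function name
parse_tag_version is hypothetical, the layout of the ref-names string is an
assumption, and sscanf is somewhat more permissive than the quoted pattern.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git-version-gen): find the first
 * "tag: vX.Y.Z" entry in a ref-names string and return its numbers.
 * Returns 0 on success, -1 if no such tag is present.
 */
int parse_tag_version(const char *refnames, unsigned int *major,
                      unsigned int *minor, unsigned int *patch)
{
    static const char marker[] = "tag: v";
    const char *p = refnames;
    unsigned int a, b, c;

    assert(refnames != NULL && major != NULL && minor != NULL &&
           patch != NULL);

    while ((p = strstr(p, marker)) != NULL) {
        p += sizeof(marker) - 1;
        /* Require a digit right after "tag: v" before parsing. */
        if (isdigit((unsigned char)*p) &&
            sscanf(p, "%u.%u.%u", &a, &b, &c) == 3) {
            *major = a;
            *minor = b;
            *patch = c;
            return 0;
        }
    }

    return -1;
}
```

Entries that do not look like a version tag are skipped, and the outputs are
written only when a full three-part version was found.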

Based on idea stated by Jan Pokorný <jpokorny@redhat.com>.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoman: fix cmap key name runtime.config.totem.token
Ferenc Wágner [Mon, 20 Aug 2018 10:00:00 +0000 (12:00 +0200)]
man: fix cmap key name runtime.config.totem.token

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoRemove libcgroup
Jan Friesse [Tue, 14 Aug 2018 11:56:46 +0000 (07:56 -0400)]
Remove libcgroup

Libcgroup is deprecated and not shipping with new distributions
(OpenSuSE is one example). Solution is to have a partial implementation
of required functionality of libcgroup in the corosync code.

Patch uses hardcoded cgroup mount point, because most of the systems are
now systemd and systemd is also using hardcoded mountpoint (see
https://github.com/systemd/systemd/blob/master/src/core/mount-setup.c)

Configuration option --enable-cgroup is gone, because it's not needed
any longer.

Big thanks to Christine Caulfield <ccaulfie@redhat.com> for example of
simplified implementation of cgroup management code primitives.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agosystemd: prevent redundancy in journal
Jan Pokorný [Mon, 13 Aug 2018 23:18:47 +0000 (01:18 +0200)]
systemd: prevent redundancy in journal

Originating from a dual sink (stderr and syslog).

Annotated example from "journalctl -b --no-hostname -u corosync":

Aug 14 00:27:45 corosync[5203]:  [MAIN  ] Corosync Cluster
Engine ('2.99.3'): started and ready to provide service.
  ^ from syslog source
Aug 14 00:27:45 corosync[5203]: notice  [MAIN  ] Corosync Cluster
Engine ('2.99.3'): started and ready to provide service.
  ^ from stderr source

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoAdd token_warning configuration option
Chris Walker [Fri, 27 Jul 2018 05:06:32 +0000 (01:06 -0400)]
Add token_warning configuration option

Token_warning is used to present information about
when the token was last received.

Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agocorosync-notifyd: Rename global local_nodeid
Jan Friesse [Fri, 10 Aug 2018 11:22:34 +0000 (07:22 -0400)]
corosync-notifyd: Rename global local_nodeid

To prevent warning in functions where local_nodeid is also passed as
local parameter.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agototemsrp: Add assert into memb_lowest_in_config
Jan Friesse [Fri, 10 Aug 2018 11:14:15 +0000 (07:14 -0400)]
totemsrp: Add assert into memb_lowest_in_config

Add assert when there are no members in token_memb structure so
non-existing member is not accessed (token should always have
at least one member).
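
The kind of guard described here can be sketched as follows. The function
lowest_nodeid and its array-based interface are hypothetical and do not
mirror the totemsrp data structures; the point is only that the first element
is read after the assert has ruled out an empty list.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not corosync code): return the lowest node id in a
 * member list. An empty list is a programming error, so it is caught by an
 * assert before nodeids[0] is accessed.
 */
unsigned int lowest_nodeid(const unsigned int *nodeids, size_t count)
{
    unsigned int lowest;
    size_t i;

    assert(nodeids != NULL);
    assert(count > 0);

    lowest = nodeids[0];
    for (i = 1; i < count; i++) {
        if (nodeids[i] < lowest)
            lowest = nodeids[i];
    }

    return lowest;
}
```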

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agototemconfig: Enlarge error_string_response
Jan Friesse [Fri, 10 Aug 2018 11:10:39 +0000 (07:10 -0400)]
totemconfig: Enlarge error_string_response

... so error_reason can be fully included into parse error message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoipc_glue: Fix strncpy in pid_to_name function
Jan Friesse [Fri, 10 Aug 2018 11:09:50 +0000 (07:09 -0400)]
ipc_glue: Fix strncpy in pid_to_name function

Trailing zero is always added so there is no need to have a warning
about unterminated destination string.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocmap: Fix strncpy warning in cmap_iter_next
Jan Friesse [Fri, 10 Aug 2018 10:49:43 +0000 (06:49 -0400)]
cmap: Fix strncpy warning in cmap_iter_next

cmap_iter_next, in contrast to its icmap counterpart, copies the key name
into user preallocated space. In the worst case, the key name may be
CMAP_KEYNAME_MAXLEN long, so cmap_iter_next then needs CMAP_KEYNAME_MAXLEN +
one additional byte to store the zero. strncpy was copying only
CMAP_KEYNAME_MAXLEN characters, so there was a possibility of an unterminated
string.

The patch solves this by using memcpy and always adding a trailing zero.
The documentation was improved to suggest a minimum keyname buffer size of
CMAP_KEYNAME_MAXLEN + 1.

sam and quorumtool were also using too short a buffer, so they are fixed too.
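
The memcpy-plus-terminator approach can be sketched like this. It is an
illustration only: SKETCH_KEYNAME_MAXLEN is a small stand-in for
CMAP_KEYNAME_MAXLEN, and copy_key_name is a hypothetical helper rather than
the library code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for CMAP_KEYNAME_MAXLEN, kept small for the example. */
#define SKETCH_KEYNAME_MAXLEN 8

/*
 * Hypothetical helper (not corosync code): copy at most
 * SKETCH_KEYNAME_MAXLEN characters of src into dst and always add the
 * trailing zero. dst must hold SKETCH_KEYNAME_MAXLEN + 1 bytes.
 */
void copy_key_name(char *dst, const char *src)
{
    size_t len = 0;

    assert(dst != NULL && src != NULL);

    /* Length of src, capped at the maximum key name length. */
    while (len < SKETCH_KEYNAME_MAXLEN && src[len] != '\0')
        len++;

    memcpy(dst, src, len);
    dst[len] = '\0';
}
```

A caller would declare the buffer as char key[SKETCH_KEYNAME_MAXLEN + 1], so
that a key name of maximum length still ends with a terminator.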

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoutil: Fix strncpy in setcs_name_t function
Jan Friesse [Fri, 10 Aug 2018 10:42:25 +0000 (06:42 -0400)]
util: Fix strncpy in setcs_name_t function

Trailing zero is always added so there is no need to have a warning
about unterminated destination string.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agototemknet: Free instance on failure exit
Jan Friesse [Fri, 10 Aug 2018 10:41:48 +0000 (06:41 -0400)]
totemknet: Free instance on failure exit

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agospec: Add explicit gcc build requirement
Jan Friesse [Thu, 9 Aug 2018 14:56:36 +0000 (16:56 +0200)]
spec: Add explicit gcc build requirement

Also remove the %clean macro, which has not been needed for ages.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
6 years agoAdd option for quiet operation to corosync-cmapctl
Chris Walker [Mon, 6 Aug 2018 22:51:02 +0000 (18:51 -0400)]
Add option for quiet operation to corosync-cmapctl

Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototemudpu: Pass correct param to totemip_nosigpipe v2.99.3
Jan Friesse [Thu, 12 Jul 2018 14:27:57 +0000 (16:27 +0200)]
totemudpu: Pass correct param to totemip_nosigpipe

Fixes compilation on (at least) FreeBSD.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
6 years agototemudpu: Add local loop support
Bin Liu [Thu, 12 Jul 2018 11:44:21 +0000 (13:44 +0200)]
totemudpu: Add local loop support

This patch intends to solve the long-standing ifdown corosync problem. The
idea is to use a local socket for sending both unicast and multicast messages
if the interface is down.

Together with testing the current bind state, it's possible to keep
pretending the old IP address exists instead of rebinding to localhost,
which breaks a lot of things badly.

Heavily based on Yu, Zou <zouyu@shiqichuban.com> work and it's
basically a port of the UDP patch created by
Jan Friesse <jfriesse@redhat.com>.

(ported from needle 96354fba72b7e7065610f37df0c0547b1e93ad51)

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoconfig: Fail config validation if not all nodes have all links
Christine Caulfield [Fri, 29 Jun 2018 12:37:01 +0000 (13:37 +0100)]
config: Fail config validation if not all nodes have all links

KNET requires that all links be full-mesh (this may change in the future
but almost certainly not before knet 2.0), so enforce this in the
config.

Also avoid a potential div-by-0 error if the local node is not fully
configured either.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoconfig: Enforce use of 'name' node attribute in multi-link clusters
Christine Caulfield [Tue, 12 Jun 2018 10:15:58 +0000 (11:15 +0100)]
config: Enforce use of 'name' node attribute in multi-link clusters

If the local host does not have a 'name' attribute and the cluster
has more than one link then fail the validation test.

I'm open to the idea of checking all of the nodes in the nodelist
if necessary. It seems overkill as each node will check its own
entry though.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototemconfig: Check for things that cannot be changed on the fly
Christine Caulfield [Tue, 15 May 2018 13:54:13 +0000 (14:54 +0100)]
totemconfig: Check for things that cannot be changed on the fly

There are a few things in the interface that cannot be changed on the
fly. Warn about them and tell the user that these things need to be done
in two steps and why.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoFix snprintf warnings
Jan Friesse [Wed, 2 May 2018 12:39:07 +0000 (14:39 +0200)]
Fix snprintf warnings

The compiler shows warnings about buffers that may not be large enough, so
check the snprintf return value properly.
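
A checked snprintf call usually looks like the sketch below. The helper
join_path is hypothetical and only demonstrates the two conditions worth
testing: a negative return value and a return value that is not smaller than
the buffer size, which means the output was truncated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not corosync code): build "dir/name" in buf.
 * Returns 0 on success and -1 on a formatting error or truncation.
 */
int join_path(char *buf, size_t size, const char *dir, const char *name)
{
    int res;

    assert(buf != NULL && size > 0 && dir != NULL && name != NULL);

    res = snprintf(buf, size, "%s/%s", dir, name);
    if (res < 0)
        return -1;              /* output error */
    if ((size_t)res >= size)
        return -1;              /* result did not fit, buf is truncated */

    return 0;
}
```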

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoinit: Use existing env variable from sysconf
Jan Friesse [Wed, 2 May 2018 09:09:44 +0000 (11:09 +0200)]
init: Use existing env variable from sysconf

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoupstart: Remove notifyd upstart unit
Jan Friesse [Wed, 2 May 2018 09:00:37 +0000 (11:00 +0200)]
upstart: Remove notifyd upstart unit

Hopefully this is last upstart bit.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoknet: Don't try to create loopback interface twice
Christine Caulfield [Mon, 4 Jun 2018 14:51:22 +0000 (15:51 +0100)]
knet: Don't try to create loopback interface twice

It wasn't harmful, but it generated an annoying message

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoknet: Fix knet log buffer size
Christine Caulfield [Fri, 4 May 2018 09:39:53 +0000 (10:39 +0100)]
knet: Fix knet log buffer size

knet sends log messages as struct knet_log_msg, not a string
of KNET_MAX_LOG_MSG_SIZE (which is only part of that structure).
So we were both losing and corrupting messages.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agocpg: Inform clients about left nodes during pause v2.99.2
Jan Friesse [Tue, 24 Apr 2018 15:44:48 +0000 (17:44 +0200)]
cpg: Inform clients about left nodes during pause

Patch tries to fix incorrect behaviour during the following test case:
- 3 nodes
- Node 1 is paused
- Nodes 2 and 3 detect node 1 as failed and inform CPG clients
- Node 1 is unpaused
- Node 1 clients are informed about the new membership, but not about the
  pause, which from Node 1's point of view is a failure of Nodes 2 and 3

The solution is to:
- Remove the downlist master choice and always choose the local node's
  downlist. For Node 1 in the example above, the downlist contains Nodes 2 and 3.
- Keep the code which informs clients about left nodes
- Use the joinlist as an authoritative source of nodes/clients which exist
  in the membership

This patch doesn't break backwards compatibility.

I've walked thru all the patches which changed behavior of cpg to ensure
patch does not break CPG behavior. Most important were:
058f50314cd20abe67f5e8fb3c029a63b0e10cdc - Base. Code was significantly
  changed to handle double free by split group_info into two structures
  cpg_pd (local node clients) and process_info (all clients). Joinlist
  was
97c28ea756cdf59316b2f609103122cc678329bd - This patch removed
  confchg_fn and made CPG sync correct
feff0e8542463773207a3b2c1f6004afba1f58d5 - I've tested described
  behavior without any issues
6bbbfcb6b4af72cf35ab9fdb4412fa6c6bdacc12 - Added idea of using
  heuristics to choose same downlist on all nodes. Sadly this idea
  was beginning of the problems described in
  040fda8872a4a20340d73fa1c240b86afb2489f8,
  ac1d79ea7c14997353427e962865781d0836d9fa,
  559d4083ed8355fe83f275e53b9c8f52a91694b2,
  02c5dffa5bb8579c223006fa1587de9ba7409a3d,
  64d0e5ace025cc929e42896c5d6beb3ef75b8244 and
  b55f32fe2e1538db33a1ec584b67744c724328c6
02c5dffa5bb8579c223006fa1587de9ba7409a3d - Made joinlist as
  authoritative source of nodes/clients but left downlist_master_choose
  as a source of information about left nodes

Long story made short. This patch basically reverts
idea of using heuristics to choose same downlist on all nodes.

(ported from needle 9c2a97f4f96b9639d07e2a9fe378c28ab1963191)

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoman: Make the manpages reproducible
Chris Lamb [Sat, 21 Apr 2018 06:13:33 +0000 (08:13 +0200)]
man: Make the manpages reproducible

Whilst working on the Reproducible Builds effort [0], we noticed
that corosync could not be built reproducibly.

This is because, whilst it uses SOURCE_DATE_EPOCH[1], the output
varies depending on the current timezone.

(The LC_ALL is not needed as we only use %Y-%m-%d)

This was originally filed in Debian as #896441 [2].

 [0] https://reproducible-builds.org/
 [1] https://reproducible-builds.org/specs/source-date-epoch/
 [2] https://bugs.debian.org/896441

Signed-off-by: Chris Lamb <lamby@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototemsrp: Fix leave message regression
Jan Friesse [Thu, 19 Apr 2018 10:52:39 +0000 (12:52 +0200)]
totemsrp: Fix leave message regression

A leave message in totem is just a join message where the leaving member is
excluded from the member list and included in the fail list. It also contains
a special nodeid in the header.nodeid and system_from.nodeid fields.

Before the "totem: Use nodeid ONLY in srp_addr" fix, most of the functions
were using system_from addresses and not the nodeid, which was used only in
one specific case, for the memb_consensus_set function.

After that patch, addresses are gone and only the nodeid is used. The result
is that the leaving node's nodeid is not added to the local fail list
(my_faillist), so the node is unable to reach consensus until the token
timeout, which starts a new gather process.

The solution is to send a valid leaving node nodeid in system_from.nodeid and
handle the specific case for memb_consensus_set in memb_join_process.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agototemsrp: Log proc/fail lists in memb_join_process
Jan Friesse [Thu, 19 Apr 2018 10:49:12 +0000 (12:49 +0200)]
totemsrp: Log proc/fail lists in memb_join_process

This information is useful, and at trace log level it should not be
too irritating.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agototemsrp: Fix srp_addr_compare
Jan Friesse [Wed, 18 Apr 2018 13:34:04 +0000 (15:34 +0200)]
totemsrp: Fix srp_addr_compare

There is a regression caused by the "totem: Use nodeid ONLY in srp_addr"
patch in the srp_addr_compare function. This function should be usable with
qsort, so it should return a value less than, equal to or greater than zero.
It was, however, returning only zero or the negation of zero. As a result,
nodes were unable to reach consensus in the following test case:
- 3 node cluster
- start nodes 1, 2, 3
- shutdown node 3
- start node 3
- shutdown node 2
- start node 2
- shutdown node 1

After these steps, nodes 2 and 3 were unable to reach consensus.
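
A minimal sketch of the qsort contract mentioned above, not the actual
corosync code: the names example_addr, example_addr_compare and
example_addr_sort are hypothetical stand-ins, and the only assumption carried
over from the message is that addresses are compared by nodeid alone. The
comparator returns a negative, zero or positive value so that qsort can
establish a total order:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for srp_addr; only the nodeid matters here. */
struct example_addr {
    unsigned int nodeid;
};

/* Three-way comparison usable with qsort: <0, 0 or >0. */
int example_addr_compare(const void *a, const void *b)
{
    const struct example_addr *x = a;
    const struct example_addr *y = b;

    assert(x != NULL && y != NULL);

    if (x->nodeid < y->nodeid) {
        return -1;
    }
    if (x->nodeid > y->nodeid) {
        return 1;
    }
    return 0;
}

/* Sort a list of addresses in ascending nodeid order. */
void example_addr_sort(struct example_addr *list, size_t count)
{
    qsort(list, count, sizeof(*list), example_addr_compare);
}
```

Writing the comparison as two explicit tests avoids subtracting unsigned
values, which could wrap around and report the wrong sign.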

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agotools: don't distribute what we can easily make
Ferenc Wágner [Fri, 20 Apr 2018 08:44:52 +0000 (10:44 +0200)]
tools: don't distribute what we can easily make

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoDrop all references to SECURITY file
Fabio M. Di Nitto [Mon, 23 Apr 2018 03:31:33 +0000 (05:31 +0200)]
Drop all references to SECURITY file

File was removed by 6bdf0962ad035ac659bcbf36a918fe39931ed75d.
Patch fixes master branch build again.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoSECURITY: Remove SECURITY file
Jan Friesse [Fri, 20 Apr 2018 13:28:13 +0000 (15:28 +0200)]
SECURITY: Remove SECURITY file

Basically no information from SECURITY file is valid.

Library interface and related uidgid are better described in manpages.

LibNSS is not directly used any longer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoNSS_NoDB_Init: the parameter is reserved, must be NULL
Ferenc Wágner [Wed, 18 Apr 2018 16:17:41 +0000 (18:17 +0200)]
NSS_NoDB_Init: the parameter is reserved, must be NULL

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoFix typo: defualt -> default
Ferenc Wágner [Wed, 18 Apr 2018 17:06:50 +0000 (19:06 +0200)]
Fix typo: defualt -> default

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoFix typo: sucesfully -> successfully
Ferenc Wágner [Tue, 10 Apr 2018 10:47:49 +0000 (12:47 +0200)]
Fix typo: sucesfully -> successfully

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototemsrp: Check join and leave msg length
Jan Friesse [Wed, 11 Apr 2018 14:15:01 +0000 (16:15 +0200)]
totemsrp: Check join and leave msg length

If the number of proc_list, failed_list or active members is too high, it
may be impossible to fit them into the message, which is allocated on the
stack, and this results in stack corruption.
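
The kind of check this implies can be sketched as follows. This is an
illustration under assumptions, not the code from the patch:
example_join_hdr, example_join_fits and the layout of one unsigned int per
list entry are all hypothetical. The idea is to verify that the header plus
both lists fit into the buffer before anything is written, using arithmetic
that cannot overflow:

```c
#include <stddef.h>

/* Hypothetical fixed-size header that precedes the two lists. */
struct example_join_hdr {
    unsigned int proc_entries;
    unsigned int fail_entries;
};

/*
 * Return 1 if a message with the given number of proc and fail entries
 * fits into a buffer of buf_size bytes, 0 otherwise.
 */
int example_join_fits(size_t proc_entries, size_t fail_entries, size_t buf_size)
{
    size_t avail;

    if (buf_size < sizeof(struct example_join_hdr)) {
        return 0;
    }

    /* Number of list entries that still fit after the header. */
    avail = (buf_size - sizeof(struct example_join_hdr)) / sizeof(unsigned int);

    if (proc_entries > avail) {
        return 0;
    }
    /* Subtract instead of adding so the check itself cannot wrap. */
    if (fail_entries > avail - proc_entries) {
        return 0;
    }
    return 1;
}
```

A caller would run this check first and refuse to build the message when it
returns 0, instead of writing past the end of the stack buffer.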

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agototemsrp: Implement sanity checks of received msgs
Jan Friesse [Wed, 11 Apr 2018 14:12:43 +0000 (16:12 +0200)]
totemsrp: Implement sanity checks of received msgs

Sanity checkers are used to prevent crashing because of
accessing unallocated memory.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocpg: Handle fragmented message sending interrupt
Rytis Karpuška [Tue, 27 Mar 2018 12:01:36 +0000 (15:01 +0300)]
cpg: Handle fragmented message sending interrupt

It turns out that there are some legitimate cases where fragmented
messages might be interrupted during sending (e.g. CS_ERR_TRY_AGAIN or
as in my case: CS_ERR_INTERRUPT). This creates a situation where
LIBCPG_PARTIAL_FIRST is sent multiple times before receiving
LIBCPG_PARTIAL_LAST.

The solution is to drop the incomplete message and start assembly of a new
message, as libcpg should have reported an error during sending of that
incomplete message.
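
A hedged sketch of that receive-side behavior, not the actual cpg code: the
fragment types, example_assembly and example_assemble are hypothetical names,
and the fixed 256-byte buffer is chosen only to keep the example short. A
"first" fragment always restarts assembly, which silently discards whatever
incomplete message was collected before it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment markers, modelled on first/continued/last. */
enum example_fragment {
    EXAMPLE_PARTIAL_FIRST,
    EXAMPLE_PARTIAL_CONTINUED,
    EXAMPLE_PARTIAL_LAST
};

struct example_assembly {
    char data[256];
    size_t len;
    int active;     /* non-zero while a message is being assembled */
};

/* Returns 1 when a full message is in a->data, 0 if more fragments are
 * needed and -1 if the fragment had to be rejected. */
int example_assemble(struct example_assembly *a, enum example_fragment type,
    const void *chunk, size_t chunk_len)
{
    assert(a != NULL && chunk != NULL);

    if (type == EXAMPLE_PARTIAL_FIRST) {
        /* Drop any incomplete message and start a new one. */
        a->len = 0;
        a->active = 1;
    } else if (!a->active) {
        return -1;  /* continuation without a preceding first fragment */
    }

    if (chunk_len > sizeof(a->data) - a->len) {
        a->len = 0;
        a->active = 0;
        return -1;  /* would overflow the assembly buffer */
    }

    memcpy(a->data + a->len, chunk, chunk_len);
    a->len += chunk_len;

    if (type == EXAMPLE_PARTIAL_LAST) {
        a->active = 0;
        return 1;
    }
    return 0;
}
```

Feeding "first, first, last" into this function yields only the data of the
second and third fragments, which matches the idea of dropping the
interrupted message.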

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototem: Display IP of sender v2.99.1
Jan Friesse [Wed, 14 Mar 2018 16:23:29 +0000 (17:23 +0100)]
totem: Display IP of sender

To make finding the source of incompatible messages easier, the IP of the
sender is logged. Propagating the IP through the layers makes the patch
slightly larger.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agototemsrp: Add magic and version into header
Jan Friesse [Wed, 14 Mar 2018 15:25:11 +0000 (16:25 +0100)]
totemsrp: Add magic and version into header

A magic number (0xC070) together with a version in every packet
is used for detecting that the other node is really
Corosync 3.x.

The endian_detector field is removed and the magic number is now
used instead.

If the magic number of a received packet differs, guessing is used to show
more about the source (Corosync 2.3+, 2.2 are quite reliable, Knet and
unencrypted Corosync 2.1/2.0/1.x/OpenAIS are semi-reliable and encrypted
Corosync 2.1/2.0/1.x/OpenAIS are quite unreliable).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoknet: Fix display of links with unconfigured link0
Christine Caulfield [Fri, 16 Mar 2018 10:09:39 +0000 (10:09 +0000)]
knet: Fix display of links with unconfigured link0

Because totemknet always configures link0 as loopback even
if it's not known to corosync, we need to filter it
out when returning the link status, as otherwise things get misaligned
in cfg.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agomain: Set errno before calling of strtol
Jan Friesse [Fri, 2 Mar 2018 14:50:02 +0000 (15:50 +0100)]
main: Set errno before calling of strtol
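
The background, sketched in hedged form: strtol reports overflow only
through errno and never clears errno on success, so a stale value left by an
earlier call can look like a failure unless errno is reset first.
example_parse_long below is a hypothetical helper, not a function from
corosync:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal long. Returns 0 and stores the value in *out on success,
 * or -1 on overflow, empty input or trailing garbage. */
int example_parse_long(const char *str, long *out)
{
    char *end;
    long val;

    assert(str != NULL && out != NULL);

    errno = 0;  /* clear any stale error before the call */
    val = strtol(str, &end, 10);

    if (errno == ERANGE) {
        return -1;  /* value out of range for long */
    }
    if (end == str || *end != '\0') {
        return -1;  /* no digits at all, or trailing characters */
    }

    *out = val;
    return 0;
}
```

Without the errno = 0 line, the ERANGE test could fire for a perfectly valid
number whenever some earlier library call had already set errno.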

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoquorumtool: Don't set our_flags without v_handle
Jan Friesse [Wed, 3 Jan 2018 14:25:20 +0000 (15:25 +0100)]
quorumtool: Don't set our_flags without v_handle

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agosam_test_agent: Remove unused assignment
Jan Friesse [Tue, 2 Jan 2018 17:17:03 +0000 (18:17 +0100)]
sam_test_agent: Remove unused assignment

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoblackbox: Quote subshell result properly
Jan Friesse [Tue, 2 Jan 2018 17:02:00 +0000 (18:02 +0100)]
blackbox: Quote subshell result properly

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agoinit: Quote subshell result properly
Jan Friesse [Fri, 2 Mar 2018 14:59:58 +0000 (15:59 +0100)]
init: Quote subshell result properly

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
6 years agocfgtool: Don't assume link ID is a single char
Christine Caulfield [Thu, 1 Mar 2018 13:48:39 +0000 (13:48 +0000)]
cfgtool: Don't assume link ID is a single char

For the moment link-ids are a single digit, but that could change and
the tools shouldn't be quite so fragile. So parse the interface_name
properly by looking for the space between the linkID and the IP.
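
A minimal sketch of that parsing approach, assuming the interface_name has
the form "<linkID> <IP>" as described above. The function name
example_split_interface_name is hypothetical and this is not the cfgtool
code itself:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Split "<linkID> <IP>" at the first space. On success returns 0, stores
 * the numeric link ID in *link_id and points *addr at the IP part. */
int example_split_interface_name(const char *name, long *link_id,
    const char **addr)
{
    const char *space;
    char *end;
    long id;

    assert(name != NULL && link_id != NULL && addr != NULL);

    space = strchr(name, ' ');
    if (space == NULL || space == name) {
        return -1;  /* no separator, or nothing before it */
    }

    errno = 0;
    id = strtol(name, &end, 10);
    /* The number must end exactly at the space and be non-negative. */
    if (errno == ERANGE || end != space || id < 0) {
        return -1;
    }

    *link_id = id;
    *addr = space + 1;
    return 0;
}
```

Since the link ID is read up to the separator rather than from a fixed
offset, "12 192.168.1.2" is handled the same way as "0 10.0.0.1".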

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoknet: Always use link0 for loopback
Christine Caulfield [Mon, 26 Feb 2018 16:00:24 +0000 (16:00 +0000)]
knet: Always use link0 for loopback

Even if it's not used for anything else.

Also, make cfgtool show the correct link ID when links are not
contiguous

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototem: Fix debug warnings printed by knet
Christine Caulfield [Mon, 26 Feb 2018 14:05:40 +0000 (14:05 +0000)]
totem: Fix debug warnings printed by knet

Fix crash introduced a couple of commits ago in iface_get

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoconfig: Allow use of ring0_addr
Christine Caulfield [Thu, 22 Feb 2018 13:57:50 +0000 (13:57 +0000)]
config: Allow use of ring0_addr

Allow ring0_addr to be used in place of 'name' for
backwards compatibility

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agoconfig: Update message when local host isn't found
Christine Caulfield [Thu, 22 Feb 2018 09:56:48 +0000 (09:56 +0000)]
config: Update message when local host isn't found

Make the message more representative of what's going on.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agocfg: Fix cfg_get_node_addrs so that DLM works
Christine Caulfield [Tue, 20 Feb 2018 14:27:51 +0000 (14:27 +0000)]
cfg: Fix cfg_get_node_addrs so that DLM works

Also update copyright dates

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototem: Return interface count correctly
Christine Caulfield [Fri, 16 Feb 2018 09:43:31 +0000 (09:43 +0000)]
totem: Return interface count correctly

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototem: Use nodeid ONLY in srp_addr
Christine Caulfield [Thu, 15 Feb 2018 15:58:41 +0000 (15:58 +0000)]
totem: Use nodeid ONLY in srp_addr

This shrinks the srp_addr (and consequently every packet sent by
corosync) so that instead of containing loads of IP addresses to
identify a node, it just sends the nodeid.

This then allows us to make ring0 optional and replaceable when running
knet.

It also means that we need some other way of identifying the local
node in corosync.conf, so the nodelist.node.name entry is now mandatory
and is mapped to the local host using the same algorithm as used in
cman.

This code needs LOTS of testing as it touches a huge amount of totemsrp
and totemconfig.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago[rpm] use rpm macros to identify build distro
Fabio M. Di Nitto [Wed, 14 Feb 2018 08:39:23 +0000 (09:39 +0100)]
[rpm] use rpm macros to identify build distro

thanks Honza for spotting it

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years ago[rpm] fixup corosync.spec.in to build on opensuse
Fabio M. Di Nitto [Wed, 14 Feb 2018 06:05:26 +0000 (07:05 +0100)]
[rpm] fixup corosync.spec.in to build on opensuse

- move dbus-devel and nss-devel BuildRequires to file based dependency.
  Those 2 BR have different names in OpenSUSE vs Fedora/RHEL/CentOS.
  This is kind of controversial as most distributions prefer a package
  based build dependency, but the rpm version that supports
  BuildRequires: foo || bar
  is only available in rawhide and tumbleweed (aka no stable releases
  are shipping it yet).
  In order to build rpms in CI and have some level of flexibility
  with upstream spec file, we need to compromise a bit.

- add explicit --docdir
  OpenSUSE does not ship docs in the normal dir and their rpm macro
  does not appear to set it for us.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototempg: Fix corrupted messages
Rytis Karpuška [Fri, 9 Feb 2018 14:00:19 +0000 (16:00 +0200)]
totempg: Fix corrupted messages

Commit 899cb299831fea479ca8bc64d99fb1fce215d795 changed copy_len
to iovec[i].iov_len, assuming that
copy_len is always the same as iovec[i].iov_len under those
circumstances, but it missed the possibility of a small message being
partly put at the end of a packet, which cuts the message in two parts
and therefore makes copy_len not equal to iovec[i].iov_len.

This is a revert of 899cb299831fea479ca8bc64d99fb1fce215d795

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototempg: use iovec[i].iov_len instead of copy_len
Rytis Karpuška [Wed, 7 Feb 2018 15:16:39 +0000 (17:16 +0200)]
totempg: use iovec[i].iov_len instead of copy_len

To be more explicit that we are copying whole message.

Related to 0ebae6b47d39940c62dcbd9185b9af2f265a47ff.

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
6 years agototempg: Fix fragmentation segfault
Rytis Karpuška [Wed, 7 Feb 2018 12:44:30 +0000 (14:44 +0200)]
totempg: Fix fragmentation segfault

The problem was that two or more messages were concatenated
together during fragmentation in the mcast_msg() function. In one specific
case, a message of just short of 1MB was provided to mcast_msg() and it
so happened that the remainder (212 bytes to be exact) left some free
space in the packet, therefore the branch

  if ((copy_len + fragment_size) <
    (max_packet_size - sizeof (unsigned short))) {
...

was selected and this was the last message in the provided iovec.
Then, on the second call, came another big message (about 300KB) and
during fragmentation mcast.fragmented was set to 1.

On the other end, while receiving messages, due to the missing
mcast.fragmented==0 those two messages were concatenated and
therefore the assembly->data array overflowed, overwriting the linked list
pointers and offset (which happened to be set to 0, so that 300KB
message was being copied from the beginning again).
After the whole 300KB message had been sent, mcast.fragmented==0 arrived
and totempg_deliver_fn() tried to move the assembly structure to the
assembly_list_free list, but as the linked list pointers had been
overwritten, a segfault occurred.

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>