git.proxmox.com Git - mirror

Set release date for 2.12.3.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

cirrus: Use FreeBSD 12.2.

FreeBSD 12.1 reached EOL and our builds are failing on Cirrus CI.
Updating to 12.2 - current production release.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

flow: Support extra padding length.

Although not required, padding can be optionally added until
the packet length is MTU bytes. A packet with extra padding
currently fails sanity checks.

Vulnerability: CVE-2020-35498
Fixes: fa8d9001a624 ("miniflow_extract: Properly handle small IP packets.")
Reported-by: Joakim Hindersson <joakim.hindersson@elastx.se>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dist-docs: Include manpages generated from rST.

Some manpages are generated from rST, but these are not included
in 'dist-docs' make target.

Fixes: fd0837a76f4c ("doc: Convert ovs-vlan-test to rST")
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tc: Fix mpls bottom of stack bit mask reporting.

Fix the reported back value of the bos mask used by the revalidator
threads.

Fixes: 34b1695506f8 ("lib/tc: add single mpls match offload support")
Reported-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

python: Add 'six' to list of install requirements.

Fixes: 73eb682edb67 ("python: Fix xmlrpclib imports.")
Signed-off-by: Thomas Neuman <thomas.neuman@nutanix.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

github: Don't fail the job if 'apt update' failed.

Some repositories that are enabled in GHA are not stable and lead
to 'apt update' failures:

  E: The repository
     'https://apt.postgresql.org/pub/repos/apt bionic-pgdg Release'
     no longer has a Release file.

This causes the job failure.
In most cases we don't really need any packages from these failed
repositories, so we could try to continue the job.

Previously this kind of failures happened on older branches with
ubuntu 16.04 base image, so we have this workaround already there.
Now it started to fail on bionic images, so fixing there too.

Fixes: 02f76fb42ae9 ("github: Fix Ubuntu package installation.")
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

vswitchd.xml: Fix supported IPsec tunnels.

'ovs-monitor-ipsec' does not support 'ip6gre' tunnels.

Fixes: 22c5eafb6efa ("ipsec: reintroduce IPsec support for tunneling")
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

conntrack: Fix icmp conntrack state.

ICMP conntrack state should be ICMPS_REPLY after seeing both
side of ICMP traffic.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>

odp-util: Fix abort while formatting nsh actions.

OVS should not exit if it cannot format NSH actions for the user.
It should just report the error like the other formatting functions do.

Credit to OSS-Fuzz.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21509
Fixes: 1fc11c5948cf ("Generic encap and decap support for NSH")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark Gray <mark.d.gray@redhat.com>

Prepare for 2.12.3.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Set release date for 2.12.2.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

lldp: do not leak memory on multiple instances of TLVs

Upstream commit:
    commit a8d3c90feca548fc0656d95b5d278713db86ff61
    Date: Tue, 17 Nov 2020 09:28:17 -0500

    lldp: avoid memory leak from bad packets

    A packet that contains multiple instances of certain TLVs will cause
    lldpd to continually allocate memory and leak the old memory.  As an
    example, multiple instances of system name TLV will cause old values
    to be dropped by the decoding routine.

    Reported-at: https://github.com/openvswitch/ovs/pull/337
Reported-by: Jonas Rudloff <jonas.t.rudloff@gmail.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Vulnerability: CVE-2020-27827
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ipf: Avoid accessing to a freed rp.

if there are multiple pkts in the batch, the loop will access a
freed rp, which cause ovs crash.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Acked-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovs-monitor-ipsec: Fix active connection regex.

Connections are added to IPsec using a connection name
that is determined from the OVS port name and the tunnel
type.

GRE connections take the form:
  <iface>-<ver>
Other connections take the form:
  <iface>-in-<ver>
  <iface>-out-<ver>

The regex '|' operator parses strings left to right looking
for the first match that it can find. '.*' is also greedy. This
causes incorrect interface names to be parsed from active
connections as other tunnel types are parsed as type
GRE. This gives unexpected "is outdated" warnings and the
connection is torn down.

For example,

'ovn-424242-in-1' will produce an incorrect interface name of
'ovn-424242-in' instead of 'ovn-424242'.

There are a number of ways this could be resolved including
a cleverer regular expression, or re.findall(). However, this
approach was taken as it simplifies the code easing maintainability.

Fixes: 22c5eafb6efa ("ipsec: reintroduce IPsec support for tunneling")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1908789
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

odp-util: Fix netlink message overflow with userdata.

Too big userdata could overflow netlink message leading to out-of-bound
memory accesses or assertion while formatting nested actions.

Fix that by checking the size and returning correct error code.

Credit to OSS-Fuzz.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=27640
Fixes: e995e3df57ea ("Allow OVS_USERSPACE_ATTR_USERDATA to be variable length.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>

ovsdb-tool: Fix datum leak in the show-log command.

Fixes: 4e92542cefb7 ("ovsdb-tool: Make "show-log" convert raw JSON to easier-to-read syntax.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>

ofproto-dpif-xlate: Stop forwarding MLD reports to group ports.

According with rfc4541 section 2.1.1, a snooping switch
should forward membership reports only to ports with
routers attached.The current code violates the RFC
forwarding membership reports to group ports as well.
The same issue doesn't exist with IPv4.

Fixes: 06994f879c ("mcast-snooping: Add Multicast Listener Discovery support")
Signed-off-by: XiaoXiong Ding <dingxiaoxiong@huawei.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

github: Fix Ubuntu package installation.

Before trying to install a package, APT cache must be updated to avoid
asking for an unavailable version of a package.

Fixes: 6cb2f5a630e3 ("github: Add GitHub Actions workflow.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl: Fix expected condition seqno when changes are pending.

Commit 17f22fe46142 tried to address this but only covered some of the
cases.

The correct way to report the expected seqno is to take into account if
there already is a condition change that was requested to the server but
not acked yet. In that case, the new condition change request will be
sent only after the already requested one is acked. That is, expected
condition seqno when conditions are up to date is db->cond_seqno + 2 in
this case.

Fixes: 17f22fe46142 ("ovsdb-idl: Return correct seqno from ovsdb_idl_db_set_condition().")
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-cluster.at: Fix infinite loop in torture tests.

For some reason, while running cluster torture tests in GitHub Actions
workflow, failure of 'echo' command doesn't fail the loop and subshell
never exits, but keeps infinitely printing errors after breaking from
the loop on the right side of the pipeline:

testsuite: line 8591: echo: write error: Broken pipe

Presumably, that is caused by some shell configuration option, but
I have no idea which one and I'm not able to reproduce locally with
shell configuration options provided in GitHub documentation.
Let's just add an explicit 'exit' on 'echo' failure. This will
guarantee exit from the loop and the subshell regardless of
configuration.

Fixes: 0f03ae3754ec ("ovsdb: Improve timing in cluster torture test.")
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl: Fix use-after-free when deleting orphaned rows.

It's possible that the IDL client processes multiple jsonrpc updates
in a single ovsdb_idl_run().

Considering the following updates processed in a single IDL run:
1. Update row R1 from table A while R1 is also referenced by row R2 from
   table B:
   - this adds R1 to table A's track_list.
2. Delete row R1 from table A while R1 is also referenced by row R2 from
   table B:
   - because row R2 still refers to row R1, this will create an orphan
     R1.
   - at this point R1 is still in table A's hmap.

When the IDL client calls ovsdb_idl_track_clear() after it has finished
processing the tracked changes, row R1 gets freed leaving a dangling
pointer in table A's hmap.

To fix this we don't free rows in ovsdb_idl_track_clear() if they are
orphan and still referenced by other rows, i.e., the row's 'dst_arcs'
list is not empty.  Later, when all arc sources (e.g., R2) are
deleted, the orphan R1 will be cleaned up as well.

The only exception is when the whole contents of the IDL are flushed,
in ovsdb_idl_db_clear(), in which case it's safe to free all rows.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 932104f483ef ("ovsdb-idl: Add support for change tracking.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl: Fix memleak when deleting orphan rows.

Pure IDL orphan rows, i.e., for which no "insert" operation was seen,
which are part of tables with change tracking enabled should also be
freed when the table track_list is flushed.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted rows.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl: Fix memleak when reinserting tracked orphan rows.

Considering the following updates processed by an IDL client:
1. Delete row R1 from table A while R1 is also referenced by row R2 from
   table B:
   - because row R2 still refers to row R1, this will create an orphan
     R1 but also sets row->tracked_old_datum to report to the IDL client
     that the row has been deleted.
2. Insert row R1 to table A.
   - because orphan R1 already existed in the IDL, it will be reused.
   - R1 still has row->tracked_old_datum set (and may also be on the
     table->track_list).
3. Delete row R2 from table B and row R1 from table A.
   - row->tracked_old_datum is set again but the previous
     tracked_old_datum was never freed.

IDL clients use the deleted old_datum values so when multiple delete
operations are received for a row, always track the first one as that
will match the contents of the row the IDL client knew about.

Running the newly added test case with valgrind, without the fix,
produces the following report:

==23113== 327 (240 direct, 87 indirect) bytes in 1 blocks are definitely lost in loss record 43 of 43
==23113==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==23113==    by 0x476761: xmalloc (util.c:138)
==23113==    by 0x45D8B3: ovsdb_idl_insert_row (ovsdb-idl.c:3431)
==23113==    by 0x45B7F9: ovsdb_idl_process_update2 (ovsdb-idl.c:2670)
==23113==    by 0x45AFCF: ovsdb_idl_db_parse_update__ (ovsdb-idl.c:2479)
==23113==    by 0x45B262: ovsdb_idl_db_parse_update (ovsdb-idl.c:2542)
==23113==    by 0x45ABBE: ovsdb_idl_db_parse_update_rpc (ovsdb-idl.c:2358)
==23113==    by 0x4576DD: ovsdb_idl_process_msg (ovsdb-idl.c:865)
==23113==    by 0x457973: ovsdb_idl_run (ovsdb-idl.c:944)
==23113==    by 0x40B7B9: do_idl (test-ovsdb.c:2523)
==23113==    by 0x44425D: ovs_cmdl_run_command__ (command-line.c:247)
==23113==    by 0x44430E: ovs_cmdl_run_command (command-line.c:278)
==23113==    by 0x404BA6: main (test-ovsdb.c:76)

Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted rows.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

datapath: ovs_ct_exit to be done under ovs_lock

Upstream commit:
    commit 27de77cec985233bdf6546437b9761853265c505
    Author: Tonghao Zhang <xiangxia.m.yue@gmail.com>
    Date:   Fri Apr 17 02:57:31 2020 +0800

    net: openvswitch: ovs_ct_exit to be done under ovs_lock

    syzbot wrote:
    | =============================
    | WARNING: suspicious RCU usage
    | 5.7.0-rc1+ #45 Not tainted
    | -----------------------------
    | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
    |
    | other info that might help us debug this:
    | rcu_scheduler_active = 2, debug_locks = 1
    | ...
    |
    | stack backtrace:
    | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
    | Workqueue: netns cleanup_net
    | Call Trace:
    | ...
    | ovs_ct_exit
    | ovs_exit_net
    | ops_exit_list.isra.7
    | cleanup_net
    | process_one_work
    | worker_thread

    To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
    lockdep_ovsl_is_held as optional lockdep expression.

Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Fixes: cb2a5486a3a3 ("datapath: conntrack: Support conntrack zone limit")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

compat: rcu: Add support for consolidated-RCU reader checking

Upstream commit:
    commit 28875945ba98d1b47a8a706812b6494d165bb0a0
    Author: Joel Fernandes (Google) <joel@joelfernandes.org>
    Date:   Tue Jul 16 18:12:22 2019 -0400

    rcu: Add support for consolidated-RCU reader checking

    This commit adds RCU-reader checks to list_for_each_entry_rcu() and
    hlist_for_each_entry_rcu().  These checks are optional, and are indicated
    by a lockdep expression passed to a new optional argument to these two
    macros.  If this optional lockdep expression is omitted, these two macros
    act as before, checking for an RCU read-side critical section.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    [ paulmck: Update to eliminate return within macro and update comment. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Backport portion of upstream commit for hlist_for_each_entry_rcu() macro
so that it can be used in following bug fix.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl: Fix iteration over tracked rows with no actual data.

When idl removes orphan rows, those rows are inserted into the
'track_list'.  This allows iterators such as *_FOR_EACH_TRACKED () to
return orphan rows that never had any data to the IDL user.  In this
case, it is difficult for the user to understand whether it is a row
with no data (there was no "insert" / "modify" for this row) or it is
a row with zero data (columns were cleared by DB transaction).

The main problem with this condition is that rows without data will
have NULL pointers instead of references that should be there according
to the database schema.  For example, ovn-controller might crash:

ERROR: AddressSanitizer: SEGV on unknown address 0x000000000100
       (pc 0x00000055e9b2 bp 0x7ffef6180880 sp 0x7ffef6180860 T0)
The signal is caused by a READ memory access.
Hint: address points to the zero page.
    #0 0x55e9b1 in handle_deleted_lport /controller/binding.c
    #1 0x55e903 in handle_deleted_vif_lport /controller/binding.c:2072:5
    #2 0x55e059 in binding_handle_port_binding_changes /controller/binding.c:2155:23
    #3 0x5a6395 in runtime_data_sb_port_binding_handler /controller/ovn-controller.c:1454:10
    #4 0x5e15b3 in engine_compute /lib/inc-proc-eng.c:306:18
    #5 0x5e0faf in engine_run_node /lib/inc-proc-eng.c:352:14
    #6 0x5e0e04 in engine_run /lib/inc-proc-eng.c:377:9
    #7 0x5a03de in main /controller/ovn-controller.c
    #8 0x7f4fd9c991a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
    #9 0x483f0d in _start (/controller/ovn-controller+0x483f0d)

It doesn't make much sense to return non-real rows to the user, so it's
best to exclude them from iteration.

Test included.  Without the fix, provided test will print empty orphan
rows that was never received by idl as tracked changes.

Fixes: 932104f483ef ("ovsdb-idl: Add support for change tracking.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>

Handle refTable values with setkey()

For columns like QoS.queues where we have a map containing refTable
values, assigning w/ __setattr__ e.g. qos.queues={1: $queue_row}
works, but using using qos.setkey('queues', 1, $queue_row) results
in an Exception. The opdat argument can essentially just be the
JSON representation of the map column instead of trying to build
it.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tests: Add overflow test for the sha1 library.

This is a unit test for the overflow detection issue fixed by commit
a1d2c5f5d9ed ("sha1: Fix algorithm for data bigger than 512 megabytes.")

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Tested-by: Paolo Valerio <pvalerio@redhat.com>

travis: Remove support for Travis CI.

All CI jobs are covered by GitHub Actions now.

What happened to Travis CI:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-November/377773.html

Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

github: Add GitHub Actions workflow.

This is an initial version of GitHub Actions support. It mostly
mimics our current Travis CI build matrix with slight differences.

Minor difference that we can not install 32-bit version of libunbound
since it is not avaialble in repository.

.travis folder renamed to .ci to highlight that it used not only for
Travis CI. Travis CI support will be completely removed soon.

What happened to Travis CI:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-November/377773.html

Build with ernel 3.19 dropped as it's not supported and can not be built
with gcc-7+.

Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl: Return correct seqno from ovsdb_idl_db_set_condition().

If an IDL client sets the same monitor condition twice, the expected
seqno when the IDL contents are updated should be the same for both
calls.

In the following scenario:
1. Client calls ovsdb_idl_db_set_condition(db, table, cond1)
2. ovsdb_idl sends monitor_cond_change(cond1) but the server doesn't yet
reply.
3. Client calls ovsdb_idl_db_set_condition(db, table, cond1)

At step 3 the returned expected seqno should be the same as at step 1.
Similarly, if step 2 is skipped, i.e., the client calls sets
the condition twice in the same iteration, then both
ovsdb_idl_db_set_condition() calls should return the same value.

Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl: Fix *_is_new() IDL functions.

Currently all functions of the type *_is_new() always return
'false'. This patch resolves this issue by using the
'OVSDB_IDL_CHANGE_INSERT' 'change_seqno' instead of the
'OVSDB_IDL_CHANGE_MODIFY' 'change_seqno' to determine if a row
is new and by resetting the 'OVSDB_IDL_CHANGE_INSERT'
'change_seqno' on clear.

Further to this, the code is also updated to match the following
behaviour:

When a row is inserted, the 'OVSDB_IDL_CHANGE_INSERT'
'change_seqno' is updated to match the new database
change_seqno. The 'OVSDB_IDL_CHANGE_MODIFY' 'change_seqno'
is not set for inserted rows (only for updated rows).

At the end of a run, ovsdb_idl_db_track_clear() should be
called to clear all tracking information, this includes
resetting all row 'change_seqno' to zero. This will ensure
that subsequent runs will not see a previously 'new' row.

add_tracked_change_for_references() is updated to only
track rows that reference the current row.

Also, update unit tests in order to test the *_is_new(),
*_is_delete() functions.

Suggested-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/1883562
Fixes: ca545a787ac0 ("ovsdb-idl.c: Increase seqno for change-tracking of table references.")
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

compat: Fix compile warning.

In ../compat/nf_conntrack_reasm.c nf_frags_cache_name is declared
if OVS_NF_DEFRAG6_BACKPORT is defined. However, later in the patch
it is only used if HAVE_INET_FRAGS_WITH_FRAGS_WORK is defined and
HAVE_INET_FRAGS_RND is not defined. This will cause a compile warning
about unused variables.

Fix it up by using the same defines that enable its use to decide
if it should be declared and avoid the compiler warning.

Fixes: 4a90b277baca ("compat: Fixup ipv6 fragmentation on 4.9.135+ kernels")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

compat: Remove stale code.

Remove stale and unused code left over after support for kernels
older than 3.10 was removed.

Fixes: 8063e0958780 ("datapath: Drop support for kernel older than 3.10")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

tests: Add parse-flow tests for MPLS fields.

Currently "ovs-ofctl parse-flows (NXM)" test doesn't test MPLS fields at all.

This commit adds a test for the the 4 MPLS fields (mpls_label, mpls_tc,
mpls_bos and mpls_ttl) to "ovs-ofctl parse-flows (NXM)" test.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ofp-actions: Fix userspace support for mpls_ttl.

Currently mpls_ttl is ignored when a flow is added because MFF_MPLS_TTL is
not handled in nx_put_raw().

This commit adds the correct handling of MFF_MPLS_TTL in nx_put_raw().

Fixes: bef3f465bcd5 ("openflow: Support matching and modifying MPLS TTL field.")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

python: Don't raise an Exception on failure to connect via SSL.

With other socket types, trying to connect and failing will return
an error code, but if an SSL Stream is used, then when
check_connection_completion(sock) is called, SSL will raise an
exception that doesn't derive from socket.error which is handled.

This adds handling for SSL.SysCallError which has the same
arguments as socket.error (errno, string). A future enhancement
could be to go through SSLStream class and implement error
checking for all of the possible exceptions similar to how
lib/stream-ssl.c's interpret_ssl_error() works across the various
methods that are implemented.

Fixes: d90ed7d65ba8 ("python: Add SSL support to the python ovs client library")
Signed-off-by: Terry Wilson <twilson@redhat.com>
Acked-by: Thomas Neuman <thomas.neuman@nutanix.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

lldp: correctly increase discarded count

Upstream commit:
    commit 32f0deeebc9172c3f5f4a4d02aab32e6904947f6
    Date: Sat, 18 Feb 2017 20:11:47 +0100

    lldpd: correctly increase discarded count

    When a frame cannot be decoded but has been guessed, increase the
    discarded count.

    Fix https://github.com/vincentbernat/lldpd/issues/223

Fixes: be53a5c447c3 ("auto-attach: Initial support for Auto-Attach standard")
Co-authored-by: Fabrizio D'Angelo <fdangelo@redhat.com>
Signed-off-by: Fabrizio D'Angelo <fdangelo@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

lldp: increase statsTLVsUnrecognizedTotal on unknown TLV

Upstream commit:
    commit 109bcd423cd560545ec7940d73a50c5584aebb0c
    Author: Vincent Bernat <vincent@bernat.ch>
    Date: Sat, 6 Apr 2019 21:17:25 +0200

    This was done for organization TLVs, but not for other TLVs.

    Fix https://github.com/vincentbernat/lldpd/issues/323

Fixes: be53a5c447c3 ("auto-attach: Initial support for Auto-Attach standard")
Signed-off-by: Fabrizio D'Angelo <fdangelo@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

lldp: fix a buffer overflow when handling management address TLV

Upstream commit:
    commit a8d8006c06d9ac16ebcf33295cbd625c0847ca9b
    Author: Vincent Bernat <vincent@bernat.im>
    Date: Sun, 4 Oct 2015 01:50:38 +0200

    lldp: fix a buffer overflow when handling management address TLV

    When a remote device was advertising a too large management address
    while still respecting TLV boundaries, lldpd would crash due to a buffer
    overflow. However, the buffer being a static one, this buffer overflow
    is not exploitable if hardening was not disabled. This bug exists since
    version 0.5.6.

Fixes: be53a5c447c3 ("auto-attach: Initial support for Auto-Attach standard")
Reported-by: Jonas Rudloff <jonas.t.rudloff@gmail.com>
Reported-at: https://github.com/openvswitch/ovs/pull/335
Co-authored-by: Fabrizio D'Angelo <fdangelo@redhat.com>
Signed-off-by: Fabrizio D'Angelo <fdangelo@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

lldp: Fix size of PEEK_DISCARD_UINT32()

Upstream commit:
    commit a8d8006c06d9ac16ebcf33295cbd625c0847ca9b
    Author: Jonas Johansson <jonasj76@gmail.com>
    Date:   Thu, 21 Apr 2016 11:50:06 +0200

    Fix size of PEEK_DISCARD_UINT32()

Signed-off-by: Jonas Johansson <jonasj76@gmail.com>
Fixes: be53a5c447c3 ("auto-attach: Initial support for Auto-Attach standard")
Reported-by: Jonas Rudloff <jonas.t.rudloff@gmail.com>
Reported-at: https://github.com/openvswitch/ovs/pull/336
Signed-off-by: Fabrizio D'Angelo <fdangelo@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

lldp: validate a bit more received LLDP frames

Upstream commit:
    commit 3aeae72b97716fddac290634fad02b952d981f17
    Author: Vincent Bernat <vincent@bernat.ch>
    Date:   Tue, 1 Oct 2019 21:42:42 +0200

    lldp: validate a bit more received LLDP frames

    Notably, we ensure the order and unicity of Chassis ID, Port ID and
    TTL TLV. For Chassis ID and Port ID, we also ensure the maximum size
    does not exceed 256.

    Fix https://github.com/vincentbernat/lldpd/issues/351

Fixes: be53a5c447c3 ("auto-attach: Initial support for Auto-Attach standard")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

sha1: Fix algorithm for data bigger than 512 megabytes.

In modern systems, size_t is 64 bits. There is a 32 bit overflow check
in sha1_update(), which will not work correctly, because compiler will
do an automatic cast to 64 bits, since size_t type variable is in the
expression. We do want however to lose data, since this is the whole
idea of this overflow check.

Because of this, computation of SHA-1 checksum will always be incorrect
for any data, that is bigger than 512 megabytes, which in bits is the
boundary of 32 bits integer.

In practice it means that any OVSDB transaction, bigger or equal to 512
megabytes, is considered corrupt and ovsdb-server will refuse to work
with the database file. This is especially critical for OVN southbound
database, since it tends to grow rapidly.

Fixes: 5eccf359391f ("Replace SHA-1 library with one that is clearly licensed.")
Signed-off-by: Renat Nurgaliyev <impleman@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

odp-util: Fix overflow of nested netlink attributes.

Length of nested attributes must be checked before storing to the
header.  If current length exceeds the maximum value parsing should
fail, otherwise the length value will be truncated leading to
corrupted netlink message and out-of-bound memory accesses:

  ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6310002cc838
         at pc 0x000000575470 bp 0x7ffc6c322d60 sp 0x7ffc6c322d58
  READ of size 1 at 0x6310002cc838 thread T0
  SCARINESS: 12 (1-byte-read-heap-buffer-overflow)
    #0 0x57546f in format_generic_odp_key lib/odp-util.c:2738:39
    #1 0x559e70 in check_attr_len lib/odp-util.c:3572:13
    #2 0x56581a in format_odp_key_attr lib/odp-util.c:4392:9
    #3 0x5563b9 in format_odp_action lib/odp-util.c:1192:9
    #4 0x555d75 in format_odp_actions lib/odp-util.c:1279:13
    ...

Fix that by checking the length of nested netlink attributes before
updating 'nla_len' inside the header.  Additionally introduced
assertion inside nl_msg_end_nested() to catch this kind of issues
before actual overflow happened.

Credit to OSS-Fuzz.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=20003
Fixes: 65da723b40a5 ("odp-util: Format tunnel attributes directly from netlink.")
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

raft: Avoid annoying debug logs if raft is connected.

If debug logs enabled, "raft_is_connected: true" printed on every
call to raft_is_connected() which is way too frequently.
These messages are not very informative and only litters the log.

Let's log only disconnected state in a rate-limited way and only
log positive case once at the moment cluster becomes connected.

Fixes: 923f01cad678 ("raft.c: Set candidate_retrying if no leader elected since last election.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

raft: Fix error leak on failure while saving snapshot.

Error should be destroyed before return.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

travis: Fix kernel download retry.

wget stops retrying to download a file when hitting fatal http errors
like 503.
But if a previous try had resulted in a partially downloaded ${file}, the
next wget call tries to download to ${file}.1.

Example:
+wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.18.tar.xz
--2020-03-18 20:51:42--  https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.18.tar.xz
Resolving cdn.kernel.org (cdn.kernel.org)... 151.101.1.176, 151.101.65.176, 151.101.129.176, ...
Connecting to cdn.kernel.org (cdn.kernel.org)|151.101.1.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 103076276 (98M) [application/x-xz]
Saving to: ‘linux-4.16.18.tar.xz’

linux-4.16.18.tar.x   0%[                    ]  13.07K  --.-KB/s    in 0s

2020-03-18 20:54:44 (133 MB/s) - Read error at byte 13383/103076276 (Connection reset by peer). Retrying.

--2020-03-18 20:54:45--  (try: 2)  https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.18.tar.xz
Connecting to cdn.kernel.org (cdn.kernel.org)|151.101.1.176|:443... connected.
HTTP request sent, awaiting response... 503 first byte timeout
2020-03-18 20:55:46 ERROR 503: first byte timeout.

+wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.18.tar.xz
--2020-03-18 20:55:46--  https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.18.tar.xz
Resolving cdn.kernel.org (cdn.kernel.org)... 151.101.1.176, 151.101.65.176, 151.101.129.176, ...
Connecting to cdn.kernel.org (cdn.kernel.org)|151.101.1.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 103076276 (98M) [application/x-xz]
Saving to: ‘linux-4.16.18.tar.xz.1’

linux-4.16.18.tar.x 100%[===================>]  98.30M   186MB/s    in 0.5s

2020-03-18 20:55:56 (186 MB/s) - ‘linux-4.16.18.tar.xz.1’ saved [103076276/103076276]

Fixes: 048674b45f4b ("travis: Retry kernel download on 503 first byte timeout.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ofp-ed-props: Fix using uninitialized padding for NSH encap actions.

OVS uses memcmp to compare actions of existing and new flows, but
'struct ofp_ed_prop_nsh_md_type' and corresponding ofpact structure has
3 bytes of padding that never initialized and passed around within OF
data structures and messages.

  Uninitialized bytes in MemcmpInterceptorCommon
    at offset 21 inside [0x7090000003f8, 136)
  WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4a184e in bcmp (vswitchd/ovs-vswitchd+0x4a184e)
    #1 0x896c8a in ofpacts_equal lib/ofp-actions.c:9121:31
    #2 0x564403 in replace_rule_finish ofproto/ofproto.c:5650:37
    #3 0x563462 in add_flow_finish ofproto/ofproto.c:5218:13
    #4 0x54a1ff in ofproto_flow_mod_finish ofproto/ofproto.c:8091:17
    #5 0x5433b2 in handle_flow_mod__ ofproto/ofproto.c:6216:17
    #6 0x56a2fc in handle_flow_mod ofproto/ofproto.c:6190:17
    #7 0x565bda in handle_single_part_openflow ofproto/ofproto.c:8504:16
    #8 0x540b25 in handle_openflow ofproto/ofproto.c:8685:21
    #9 0x6697fd in ofconn_run ofproto/connmgr.c:1329:13
    #10 0x668e6e in connmgr_run ofproto/connmgr.c:356:9
    #11 0x53f1bc in ofproto_run ofproto/ofproto.c:1890:5
    #12 0x4ead0c in bridge_run__ vswitchd/bridge.c:3250:9
    #13 0x4e9bc8 in bridge_run vswitchd/bridge.c:3309:5
    #14 0x51c072 in main vswitchd/ovs-vswitchd.c:127:9
    #15 0x7f23a99011a2 in __libc_start_main (/lib64/libc.so.6)
    #16 0x46b92d in _start (vswitchd/ovs-vswitchd+0x46b92d)

  Uninitialized value was stored to memory at
    #0 0x4745aa in __msan_memcpy.part.0 (vswitchd/ovs-vswitchd)
    #1 0x54529f in rule_actions_create ofproto/ofproto.c:3134:5
    #2 0x54915e in ofproto_rule_create ofproto/ofproto.c:5284:11
    #3 0x55d419 in add_flow_init ofproto/ofproto.c:5123:17
    #4 0x54841f in ofproto_flow_mod_init ofproto/ofproto.c:7987:17
    #5 0x543250 in handle_flow_mod__ ofproto/ofproto.c:6206:13
    #6 0x56a2fc in handle_flow_mod ofproto/ofproto.c:6190:17
    #7 0x565bda in handle_single_part_openflow ofproto/ofproto.c:8504:16
    #8 0x540b25 in handle_openflow ofproto/ofproto.c:8685:21
    #9 0x6697fd in ofconn_run ofproto/connmgr.c:1329:13
    #10 0x668e6e in connmgr_run ofproto/connmgr.c:356:9
    #11 0x53f1bc in ofproto_run ofproto/ofproto.c:1890:5
    #12 0x4ead0c in bridge_run__ vswitchd/bridge.c:3250:9
    #13 0x4e9bc8 in bridge_run vswitchd/bridge.c:3309:5
    #14 0x51c072 in main vswitchd/ovs-vswitchd.c:127:9
    #15 0x7f23a99011a2 in __libc_start_main (/lib64/libc.so.6)

  Uninitialized value was created by an allocation of 'ofpacts_stub'
  in the stack frame of function 'handle_flow_mod'
    #0 0x569e80 in handle_flow_mod ofproto/ofproto.c:6170

This could cause issues with flow modifications or other operations.

To reproduce, some NSH tests could be run under valgrind or clang
MemorySantizer. Ex. "nsh - md1 encap over a veth link" test.

Fix that by clearing padding bytes while encoding and decoding.
OVS will still accept OF messages with non-zero padding from
controllers.

New tests added to tests/ofp-actions.at.

Fixes: 1fc11c5948cf ("Generic encap and decap support for NSH")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>

system-userspace-packet-type-aware.at: Wait for ip address updates.

ovs-router module checks for the source ip address of the interface
while adding a new route.  netdev module doesn't request ip addresses
from the system every time, but instead it caches currently assigned
ip addresses and updates the cache on netlink notifications if needed.

So, there is a slight delay between setting ip address on interface
in a system and a moment OVS updates list of ip addresses of this
interface.  If route addition happens within this time frame, it
fails with the following error:

    # ovs-appctl ovs/route/add 10.0.0.0/24 br-p1
    Error while inserting route.
    ovs-appctl: ovs-vswitchd: server returned an error

This makes system tests to fail frequently.

Let's wait until local route successfully added.  This will mean
that OVS finished processing of a netlink event and will use up to
date list of ip addresses on desired interface.

Fixes: 526cf4e1d6a8 ("tests: Added unit tests in packet-type-aware.at")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

netdev-dpdk: Don't set rx mq mode for net_virtio.

Since DPDK 18.11.6, it is not allowed to set any RX mq mode for virtio
driver.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>

docs: Add flow control on i40e issue

There is an issue with flow control configuration on i40e devices
and it has a work around. We add this to documentation as known issue
until a permanent solution is developed.

Signed-off-by: Tomasz Konieczny <tomaszx.konieczny@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>

cirrus: Use FreeBSD 11.4.

Support cycle of 11.3 ends in the end of September 2020,
so we need to upgrade.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>

rhel: Fix reload of OVS_USER_ID on startup.

OVS_USER_ID was being picked up from a previously existing
openvswitch.useropts rendering innefective any configuration change
through sysconfig.

There is no ordering between Exec* and Environment* stanzas of systemd,
full Enviroment* is always loaded before each Exec*. We make
sure that openvswitch.useropts is removed in a first Exec so that a
fresh OVS_USER_ID can be picked up from config in successive Exec*.

Fixes: 94e1e8b ("rhel: run ovn with the same user as ovs")
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

classifier: Fix use of uninitialized value.

Coverity reports use of uninitialized value of cursor.
This happens in cls_cursor_start(), when rule is false,
cursor.subtable is uninitialized. CID 279324.

Signed-off-by: William Tu <u9012063@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

rhel: Fix logrotate group when dpdk is enabled.

Otherwise logrotate will fail to generate the rotated log files.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-idl.at: Wait all servers to join the cluster.

The test 'Check Python IDL reconnects to leader - Python3
(leader only)' fails sometimes when the first ovsdb-server
gets killed before the others had joined the cluster.

Fix the function ovsdb_cluster_start_idltest to wait them
to join the cluster.

Fixes: c39751e44539 ("python: Monitor Database table to manage lifecycle of IDL client.")
Co-authored-by:: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovs-dpctl-top: Skip "eth()" element.

With commit efde188622ae ("odp-util: Print eth() for Ethernet flows if
packet_type is absent.") "eth()" is printed for Ethernet flows if packet_type
is absent, but this broke "ovs-dpctl-top" since it expects that every
element has a value.

This commit skips the parsing of the empty "eth()" element.

Fixes: efde188622ae ("odp-util: Print eth() for Ethernet flows if packet_type is absent.")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

meta-flow: fix a typo in "MPLS Bottom of Stack Field" paragraph.

In the ovs-fields.7 manual page, the "MPLS Bottom of Stack Field" paragraph
says:
* When mpls_bos is 1, there is another MPLS label following this one,
so the Ethertype passed to pop_mpls should be an MPLS Ethertype. [...]

* When mpls_bos is 0, this MPLS label is the last one, so the Ethertype
passed to pop_mpls should be a non-MPLS Ethertype such as IPv4. [...]

The values 0 and 1 have been swapped: when BOS is 1,
then no more label stack entries follows.

Fixes: 96fee5e0a2a0 ("ovs-fields: New manpage to document Open vSwitch and OpenFlow fields.")
Reported-at: https://bugzilla.redhat.com/1842032
Reported-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

connmgr: Support changing openflow versions without restarting.

When commit a0baa7dfa4fe ("connmgr: Make treatment of active and passive
connections more uniform") was applied, it did not take into account
that a reconfiguration of the allowed_versions setting would require a
reload of the ofservice object (only accomplished via a restart of OvS).

For now, during the reconfigure cycle, we delete the ofservice object and
then recreate it immediately. A new test is added to ensure we do not
break this behavior again.

Fixes: a0baa7dfa4fe ("connmgr: Make treatment of active and passive connections more uniform")
Suggested-by: Ben Pfaff <blp@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1782834
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Numan Siddique <numans@ovn.org>
Tested-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev-offload-dpdk: Fix for broken ethernet matching HWOL for XL710NIC.

This patch introduces a temporary work around to fix
partial hardware offload for XL710 devices. Currently the incorrect
ethernet pattern is being set. This patch will be removed once
this issue is fixed within the i40e PMD.

Signed-off-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Co-authored-by: Eli Britstein <elibr@nvidia.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

acinclude: Fix build with kernels with prandom* moved to prandom.h.

Recent commit c0842fbc1b18 ("random32: move the pseudo-random 32-bit
definitions to prandom.h") in upstream kernel moved the definition
of prandom_* functions from random.h to prandom.h.  This change was
also backported to stable kernels.

Fixing our configure script to look for these functions in a new
location and avoid build failures:

  datapath/linux/compat/include/linux/random.h:11:19:
    error: redefinition of 'prandom_u32_max'

Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-server: Replace in-memory DB contents at raft install_snapshot.

Every time a follower has to install a snapshot received from the
leader, it should also replace the data in memory. Right now this only
happens when snapshots are installed that also change the schema.

This can lead to inconsistent DB data on follower nodes and the snapshot
may fail to get applied.

Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Prepare for 2.12.2.

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Set release date for 2.12.1.

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

odp-util: Clear padding in the nd_extension.

Silimar to the patch 67eb8110171f ("odp-util: Fix passing
uninitialized bytes in OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV*.")
when change from flow into the netlink format, the tail
padding of nd_extension should be cleared.

this fixes the following warning logs:

|ofproto_dpif_upcall(pmd-...)|WARN|Conflicting ukey for flows:
   ufid:763c7d3b-4d0c-4bff-aafc-fdfb6089c2ba
   <...>,eth(...),eth_type(0x86dd),ipv6(...),icmpv6(type=135,code=0),\
   nd(target=fdbd:dc02:ff:1:1::1,sll=fa:16:3e:75:b3:a9,tll=00:00:00:00:00:00),\
   nd_ext(nd_reserved=0x0,nd_options_type=1)

   ufid:763c7d3b-4d0c-4bff-aafc-fdfb6089c2ba
   <...>,eth(...),eth_type(0x86dd),ipv6(...),icmpv6(type=135,code=0),\
   nd(target=fdbd:dc02:ff:1:1::1,sll=fa:16:3e:75:b3:a9,tll=00:00:00:00:00:00),\
   nd_ext(nd_reserved=0x0,nd_options_type=1)
|ofproto_dpif_upcall(pmd-...)|WARN|upcall_cb failure: ukey installation fails

Fixes: 9b2b84973db7 ("Support for match & set ICMPv6 reserved and options type fields")
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

datapath-windows: Update flow key in SET action

The flow key is not updated when process OVS_ACTION_ATTR_SET action.
It will impact follow-up actions, such as, conntrack module cannot
find created conntrack entry if passing old flow key to it.

Reported-by: Rui Cao <rcao@vmware.com>
Signed-off-by: Jinjun Gao <jinjung@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>

odp-util: Fix clearing match mask if set action is partially unnecessary.

While committing set() actions, commit() could wildcard all the fields
that are same in match key and in the set action.  This leads to
situation where mask after commit could actually contain less bits
than it was before.  And if set action was partially committed, all
the fields that were the same will be cleared out from the matching key
resulting in the incorrect (too wide) flow.

For example, for the flow that matches on both src and dst mac
addresses, if the dst mac is the same and only src should be changed
by the set() action, destination address will be wildcarded in the
match key and will never be matched, i.e. flows with any destination
mac will match, which is not correct.

Setting OF rule:

in_port=1,dl_src=50:54:00:00:00:09 actions=mod_dl_dst(50:54:00:00:00:0a),output(2)

Sending following packets on port 1:

  1. eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800)
  2. eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0c),eth_type(0x0800)
  3. eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800)

Resulted datapath flows:
  eth(dst=50:54:00:00:00:0c),<...>, actions:set(eth(dst=50:54:00:00:00:0a)),2
  eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),<...>, actions:2

The first flow  doesn't have any match on source MAC address and the
third packet successfully matched on it while it must be dropped.

Fix that by updating the match mask with only the new bits set by
commit(), but keeping those that were cleared (OR operation).

With fix applied, resulted correct flows are:
  eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),<...>, actions:2
  eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0c),<...>,
                                    actions:set(eth(dst=50:54:00:00:00:0a)),2
  eth(src=50:54:00:00:00:0b),<...>, actions:drop

The code before commit dbf4a92800d0 was not able to reduce the mask,
it was only possible to expand it to exact match, so it was OK to
update original matching mask with the new value in all cases.

Fixes: dbf4a92800d0 ("odp-util: Do not rewrite fields with the same values as matched")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1854376
Acked-by: Eli Britstein <elibr@mellanox.com>
Tested-by: Adrián Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpctl: Fix memory leak in dpctl_dump_flows()

Goto label accurately to avoid memleak.

Fixes: a692410af0f7 ("dpctl: Expand the flow dump type filter")
Cc: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>

ovs-router: Fix flushing of local routes.

Since commit 8e4e45887ec3, priority of 'local' route entries no
longer matches with 'plen'.  This should be taken into account
while flushing cached routes, otherwise they will remain in OVS
even after removing them from the system:

  # ifconfig eth0 11.0.0.1
  # ovs-appctl ovs/route/show
    --- A new route synchronized from kernel route table ---
    Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local
  # ifconfig eth0 0
  # ovs-appctl ovs/route/show
    -- the new route entry is still in ovs route table ---
    Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local

CC: wenxu <wenxu@ucloud.cn>
Fixes: 8e4e45887ec3 ("ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses")
Reported-by: Zheng Jingzhou <glovejmm@163.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-July/373093.html
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>

dpdk: Use DPDK 18.11.9 release.

Modify travis linux build script to use the latest DPDK stable release.
Update docs for latest DPDK stable releases. Update release faq to
reference latest validated DPDK release for each branch.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>

dpif-netdev: Return error code when no mark available.

The max number of mark is (UINT32_MAX - 1), that is
enough to be used. But theoretically, if there are no
mark available, the later different flows will shared
the mark INVALID_FLOW_MARK, that may break the function.
If there are no available mark to be used, return error
code.

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif-netdev: Add check mark to avoid ovs-vswitchd crash.

When changing the pmd interfaces attribute, ovs-vswitchd will
reload pmd and flush offload flows. reload_affected_pmds may
be invoked twice or more. In that case, the flows may been
queued to "dp_netdev_flow_offload" thread again.

For example:
$ ovs-vsctl -- set interface <Interface> options:dpdk-lsc-interrupt=true

ovs-vswitchd main       flow-offload thread
append F to queue       ...
...
append F to queue
...                     del F
...                     del F (crash [1])

[1]:
ovs_assert_failure          lib/cmap.c:922
cmap_replace                lib/cmap.c:921
cmap_remove                 lib/cmap.h:295
mark_to_flow_disassociate   lib/dpif-netdev.c:2269
dp_netdev_flow_offload_del  lib/dpif-netdev.c:2369
dp_netdev_flow_offload_main lib/dpif-netdev.c:2492

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

netdev-offload-dpdk: Fix Ethernet matching for type only.

For OVS rule of the form "eth type is 0x1234 / end", rule is offloaded
in the form of "eth / end", which is incorrect. Fix it.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif-netdev: Don't use zero flow mark.

Zero flow mark is used to indicate the HW to remove the mark. A packet
marked with zero mark is received in SW without a mark at all, so it
cannot be used as a valid mark. Change the pool range to fix it.

Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

odp-execute: Fix length checking while executing check_pkt_len action.

If dp-packet contains l2 padding or cutlen was applied to it, size will
be larger than the actual size of a payload and action will work
incorrectly.

Ex. Padding could be added during miniflow_extract() if detected.

Fixes: 5b34f8fc3b38 ("Add a new OVS action check_pkt_larger")
Reported-by: Miroslav Kubiczek <miroslav.kubiczek@adaptivemobile.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050157.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

odp-util.c: Fix dp_hash execution with slowpath actions.

When dp_hash is executed with slowpath actions, it results in endless
recirc loop in kernel datapath, and finally drops the packet, with
kernel logs:

openvswitch: ovs-system: deferred action limit reached, drop recirc action

The root cause is that the dp_hash value calculated by slowpath is not
passed to datapath when executing the recirc action, thus when the recirced
packet miss upcall comes to userspace again, it generates the dp_hash
and recirc action again, with same recirc_id, which in turn generates
a megaflow with recirc action with the recird_id same as the recirc_id in
its match condition, which causes a loop in datapath.

For example, this can be reproduced with below setup of OVN environment:

                         LS1            LS2
                          |              |
                          |------R1------|
        VIF--LS0---R0-----|              |------R3
                          |------R2------|

Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are two
routes (ECMP) from R3 to the VIF:
R3 -> R1 -> R0
R3 -> R2 -> R0

Now if we ping from the VIF to R3, the OVS flow execution on the HV of the VIF
will hit the R3's datapath which has flows that responds to the ICMP packet
by setting ICMP fields, which requires slowpath actions, and in later flow
tables it will hit the "group" action that selects between the ECMP routes.

By default OVN uses "dp_hash" method for the "group" action.

For the first miss upcall packet, dp_hash value is empty, so the group action
will be translated to "dp_hash" and "recirc".

During action execution, because of the previous actions that sets ICMP fields,
the whole execution requires slowpath, so it tries to execute all actions in
userspace in odp_execute_actions(), including dp_hash action, except the
recirc action, which can only be executed in datapath. So the dp_hash value
is calculated in userspace, and then the packet is injected to datapath for
recirc action execution.

However, the dp_hash calculated by the userspace is not passed to datapath.

Because of this, the packet recirc in datapath doesn't have dp_hash value,
and the miss upcall for the recirced packet hits the same flow tables and
triggers same "dp_hash" and "recirc" action again, with exactly same recirc_id!

This time, the new upcall doesn't require any slowpath execution, so both
the dp_hash and recirc actions are executed in datapath, after creating a
datapath megaflow like:

recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)

with match recirc_id equals the recirc id in the action, thus creating a loop.

This patch fixes the problem by passing the calculated dp_hash value to
datapath in odp_key_from_dp_packet().

Fixes: 572f732ab078 ("dpif-netdev: user space datapath recirculation")
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ovsdb-idl: Avoid inconsistent IDL state with OVSDB_MONITOR_V3.

Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3
(i.e., "monitor_cond_since" method) with the initial monitor condition
MC1.

Assuming the following two transactions are executed on the
ovsdb-server:
TXN1: "insert record R1 in table T1"
TXN2: "insert record R2 in table T2"

If the client's monitor condition MC1 for table T2 matches R2 then the
client will receive the following update3 message:
method="update3", "insert record R2 in table T2", last-txn-id=TXN2

At this point, if the presence of the new record R2 in the IDL triggers
the client to update its monitor condition to MC2 and add a clause for
table T1 which matches R1, a monitor_cond_change message is sent to the
server:
method="monitor_cond_change", "clauses from MC2"

In normal operation the ovsdb-server will reply with a new update3
message of the form:
method="update3", "insert record R1 in table T1", last-txn-id=TXN2

However, if the connection drops in the meantime, this last update might
get lost.

It might happen that during the reconnect a new transaction happens
that modifies the original record R1:
TXN3: "modify record R1 in table T1"

When the client reconnects, it will try to perform a fast resync by
sending:
method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2

Because TXN2 is still in the ovsdb-server transaction history, the
server replies with the changes from the most recent transactions only,
i.e., TXN3:
result="true", last-txbb-id=TXN3, "modify record R1 in table T1"

This causes the IDL on the client in to end up in an inconsistent
state because it has never seen the update that created R1.

Such a scenario is described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22

To avoid this issue, the IDL will now maintain (up to) 3 different
types of conditions for each DB table:
- new_cond: condition that has been set by the IDL client but has
  not yet been sent to the server through monitor_cond_change.
- req_cond: condition that has been sent to the server but the reply
  acknowledging the change hasn't been received yet.
- ack_cond: condition that has been acknowledged by the server.

Whenever the IDL FSM is restarted (e.g., voluntary or involuntary
disconnect):
- if there is a known last_id txn-id the code ensures that new_cond
  will contain the most recent condition set by the IDL client
  (either req_cond if there was a request in flight, or new_cond
  if the IDL client set a condition while the IDL was disconnected)
- if there is no known last_id txn-id the code ensures that ack_cond will
  contain the most recent conditions set by the IDL client regardless
  whether they were acked by the server or not.

When monitor_cond_since/monitor_cond requests are sent they will
always include ack_cond and if new_cond is not NULL a follow up
monitor_cond_change will be generated afterwards.

On the other hand ovsdb_idl_db_set_condition() will always modify new_cond.

This ensures that updates of type "insert" that happened before the last
transaction known by the IDL but didn't match old monitor conditions are
sent upon reconnect if the monitor condition has changed to include them
in the meantime.

Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovs-rcu: Avoid flushing callbacks during postponing.

ovsrcu_flush_cbset() call during ovsrcu_postpone() could cause
use after free in case the caller sets new pointer only after
postponing free for the old one:

------------------  ------------------  -------------------
Thread 1            Thread 2            RCU Thread
------------------  ------------------  -------------------
pointer = A

ovsrcu_quiesce():
  thread->seqno = 30
  global_seqno = 31
  quiesced

read pointer A
postpone(free(A)):
   flush cbset
                                         pop flushed_cbsets
                                         ovsrcu_synchronize:
                                           target_seqno = 31
                     ovsrcu_quiesce():
                      thread->seqno = 31
                      global_seqno = 32
                      quiesced

                     read pointer A
                     use pointer A

                     ovsrcu_quiesce():
                      thread->seqno = 32
                      global_seqno = 33
                      quiesced

                     read pointer A
pointer = B

ovsrcu_quiesce():
  thread->seqno = 33
  global_seqno = 34
  quiesced

                                         target_seqno exceeded
                                         by all threads
                                         call cbs to free A
                     use pointer A
                     (use after free)
-----------------------------------------------------------

Fix that by using dynamically re-allocated array without flushing
to the global flushed_cbsets until writer enters quiescent state.

Fixes: 0f2ea84841e1 ("ovs-rcu: New library.")
Reported-by: Linhaifeng <haifeng.lin@huawei.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371265.html
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb: Fix timeout type for wait operation.

According to RFC 7047, 'timeout' is an integer field:

5.2.6.  Wait
   The "wait" object contains the following members:
      "op": "wait"                        required
      "timeout": <integer>                optional
      ...

For some reason initial implementation treated it as a real number.

This causes a build issue with clang that complains that LLONG_MAX
could not be represented as double:

ovsdb/execution.c:733:32: error: implicit conversion from 'long long'
                           to 'double' changes value from
                           9223372036854775807 to 9223372036854775808
            timeout_msec = MIN(LLONG_MAX, json_real(timeout));
                           ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/sys/limits.h:69:19: note: expanded from macro 'LLONG_MAX'
#define LLONG_MAX       __LLONG_MAX     /* max for a long long */
                        ^~~~~~~~~~~
/usr/include/x86/_limits.h:74:21: note: expanded from macro '__LLONG_MAX'
#define __LLONG_MAX     0x7fffffffffffffffLL    /* max value for a long long */
                        ^~~~~~~~~~~~~~~~~~~~
./lib/util.h:90:21: note: expanded from macro 'MIN'
#define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
                     ^  ~

Fix that by changing parser to treat 'timeout' as integer.
Fixes clang build on FreeBSD 12.1 in CirrusCI.

Fixes: f85f8ebbfac9 ("Initial implementation of OVSDB.")
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

cirrus: Don't install py27 packages.

py27 packages are no longer available on FreeBSD.
Removing these packages along with python2 itself to avoid CI failures:

  pkg: No packages available to install matching 'py27-sphinx' have
       been found in the repositories
  Exit status: 70

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

classifier: Prevent tries vs n_tries race leading to NULL dereference.

Currently classifier tries and n_tries can be updated not atomically,
there is a race condition which can lead to NULL dereference.
The race can happen when main thread updates a classifier tries and
n_tries in classifier_set_prefix_fields() and at the same time revalidator
or handler thread try to lookup them in classifier_lookup__(). Such race
can be triggered when user changes prefixes of flow_table.

Race(user changes flow_table prefixes: ip_dst,ip_src => none):

  [main thread]             [revalidator/handler thread]
  ===========================================================
                            /* cls->n_tries == 2 */
                            for (int i = 0; i < cls->n_tries; i++) {
  trie_init(cls, i, NULL);
  /* n_tries == 0 */
  cls->n_tries = n_tries;
                            /* cls->tries[i]->feild is NULL */
                            trie_ctx_init(&trie_ctx[i],&cls->tries[i]);
                            /* trie->field is NULL */
                            ctx->be32ofs = trie->field->flow_be32ofs;

To prevent the race, instead of re-introducing internal mutex
implemented in the commit fccd7c092e09 ("classifier: Remove internal
mutex."), this patch makes trie field RCU protected and checks it after
read.

Fixes: fccd7c092e09 ("classifier: Remove internal mutex.")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

ovsdb-server: Fix schema leak while reading db.

parse_txn() function doesn't always take ownership of the 'schema'
passed.  So, if the schema of the clustered db has same version as the
one that already in use, parse_txn() will not use it, resulting with a
memory leak:

7,827 (56 direct, 7,771 indirect) bytes in 1 blocks are definitely lost
    at 0x483BB1A: calloc (vg_replace_malloc.c:762)
    by 0x44AD02: xcalloc (util.c:121)
    by 0x40E70E: ovsdb_schema_create (ovsdb.c:41)
    by 0x40EA6D: ovsdb_schema_from_json (ovsdb.c:217)
    by 0x415EDD: ovsdb_storage_read (storage.c:280)
    by 0x408968: read_db (ovsdb-server.c:607)
    by 0x40733D: main_loop (ovsdb-server.c:227)
    by 0x40733D: main (ovsdb-server.c:469)

While we could put ovsdb_schema_destroy() in a few places inside
'parse_txn()', from the users' point of view it seems better to have a
constant argument and just clone the 'schema' if needed.  The caller
will be responsible for destroying the 'schema' it owns.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

compat: Backport ipv6_stub change

A patch backported to the Linux stable 4.14 tree and present in the
latest stable 4.14.181 kernel breaks ipv6_stub usage.

The commit is
8ab8786f78c3 ("net ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup").

Create the compat layer define to check for it and fixup usage in vxlan
and geneve modules.

Passes Travis here:
https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/689798733

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>

netdev-linux: Update LAG in all cases.

In some cases, when processing a netlink change event, it's possible for
an alternate part of OvS (like the IPv6 endpoint processing) to hold an
active netdev interface. This creates a race-condition, where sometimes
the OvS change processing will take the normal path. This doesn't work
because the netdev device object won't actually be enslaved to the
ovs-system (for instance, a linux bond) and ingress qdisc entries will
be missing.

To address this, we update the LAG information in ALL cases where
LAG information could come in.

Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
Cc: Marcelo Leitner <mleitner@redhat.com>
Cc: John Hurley <john.hurley@netronome.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

metaflow: Fix maskable conntrack orig tuple fields

From man ovs-fields(7), the conntrack origin tuple fields
ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed
to be bitwise maskable, but they are not. This patch enables
those fields to be maskable, and adds a regression test.

Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
Reported-by: Wenying Dong <wenyingd@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>

raft: Disable RAFT jsonrpc inactivity probe.

With the scale test of 640 nodes k8s cluster, raft DB nodes' jsonrpc
session got closed due to the timeout of default 5 seconds probe.
It will cause disturbance of the raft cluster. Since we already have
the heartbeat for RAFT, just disable the probe between the servers
to avoid the unnecessary jsonrpc inactivity probe.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Zhen Wang <zhewang@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

raft: Fix leak of the incomplete command.

Function raft_command_initiate() returns correctly referenced command
instance.  'n_ref' equals 1 for complete commands and 2 for incomplete
commands because one more reference is in raft->commands list.
raft_handle_execute_command_request__() leaks the reference by not
returning pointer anywhere and not unreferencing incomplete commands.

792 bytes in 11 blocks are definitely lost in loss record 258 of 262
    at 0x483BB1A: calloc (vg_replace_malloc.c:762)
    by 0x44BA32: xcalloc (util.c:121)
    by 0x422E5F: raft_command_create_incomplete (raft.c:2038)
    by 0x422E5F: raft_command_initiate (raft.c:2061)
    by 0x428651: raft_handle_execute_command_request__ (raft.c:4161)
    by 0x428651: raft_handle_execute_command_request (raft.c:4177)
    by 0x428651: raft_handle_rpc (raft.c:4230)
    by 0x428651: raft_conn_run (raft.c:1445)
    by 0x428DEA: raft_run (raft.c:1803)
    by 0x407392: main_loop (ovsdb-server.c:226)
    by 0x407392: main (ovsdb-server.c:469)

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>

compat: Fix ipv6_dst_lookup build error

The geneve/vxlan compat code base invokes ipv6_dst_lookup() which is
recently replaced by ipv6_dst_lookup_flow() in the stable kernel tree.

This causes travis build failure:
    * https://travis-ci.org/github/openvswitch/ovs/builds/681084038

This patch updates the backport logic to invoke the right function.

Related patch in
    git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

b9f3e457098e ("net: ipv6_stub: use ip6_dst_lookup_flow instead of
               ip6_dst_lookup")

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>

cirrus: Force pkg update on FreeBSD.

Seems like FreeBSD ports/images are not well maintained and frequently
causes package installation failures like this:

[1/40] Fetching automake-1.16.1_2.txz: .......... done
pkg: cached package automake-1.16.1_2: size mismatch, fetching from remote
[2/40] Fetching automake-1.16.1_2.txz: .......... done
pkg: cached package automake-1.16.1_2: size mismatch, cannot continue
Consider running 'pkg update -f'

Forced update doesn't increase build time significantly, but helps
to solve at least this one kind of issues.

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Revert "ovsdb-idl: Avoid sending redundant conditional monitoring updates"

This reverts commit 5351980b047f4dd40be7a59a1e4b910df21eca0a.

If the ovsdb-server reply to "monitor_cond_since" requests has
"found" == false then ovsdb_idl_db_parse_monitor_reply() calls
ovsdb_idl_db_clear() which iterates through all tables and
unconditionally sets table->cond_changed to false.

However, if the client had already set a new condition for some of the
tables, this new condition request will never be sent to ovsdb-server
until the condition is reset to a different value. This is due to the
check in ovsdb_idl_db_set_condition().

One way to replicate the issue is described in the bugzilla reporting
the bug, when ovn-controller is configured to use "ovn-monitor-all":
https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6

Commit 5351980b047f tried to optimize sending redundant conditional
monitoring updates but the chances that this scenario happens with the
latest code is quite low since commit 403a6a0cb003 ("ovsdb-idl: Fast
resync from server when connection reset.") changed the behavior of
ovsdb_idl_db_parse_monitor_reply() to avoid calling ovsdb_idl_db_clear()
in most cases.

Reported-by: Dan Williams <dcbw@redhat.com>
Reported-at: https://bugzilla.redhat.com/1808125
CC: Andy Zhou <azhou@ovn.org>
Fixes: 5351980b047f ("ovsdb-idl: Avoid sending redundant conditional monitoring updates")
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

dpif-netdev: Force port reconfiguration to change dynamic_txqs.

In case number of polling threads goes from exact number of Tx queues
in port to higher value while set_tx_multiq() not implemented or not
requesting reconfiguration, port will not be reconfigured and datapath
will continue using static Tx queue ids leading to crash.

Ex.:
Assuming that port p0 supports up to 4 Tx queues and doesn't support
set_tx_multiq() method.  For example, netdev-afxdp could be the case,
because it could have multiple Tx queues, but doesn't have
set_tx_multiq() implementation because number of Tx queues always
equals to number of Rx queues.

1. Configuring pmd-cpu-mask to have 3 pmd threads.

2. Adding port p0 to OVS.
    At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
    Port reconfigured to have 4 Tx queues successfully.
    dynamic_txqs = (4 < 4) = false;

3. Configuring pmd-cpu-mask to have 10 pmd threads.
    At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
    Since set_tx_multiq() is not implemented, netdev doesn't request
    reconfiguration and 'dynamic_txqs' remains in 'false' state.

4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
    ids that are in range [0, 10] while device only supports 4 leading
    to unwanted behavior and crashes.

Fix that by marking for reconfiguration all the ports that will likely
change their 'dynamic_txqs' value.

It looks like the issue could be reproduced only with afxdp ports,
because all other non-dpdk ports ignores Tx queue ids and dpdk ports
requests for reconfiguration on set_tx_multiq().

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html
Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>

dpif-netlink: avoid netlink modify flow put op failed after tc modify flow put op failed.

The tc modify flow put always delete the original flow first and
then add the new flow. If the modfiy flow put operation failed,
the flow put operation will change from modify to create if success
to delete the original flow in tc (which will be always failed with
ENOENT, the flow is already be deleted before add the new flow in tc).
Finally, the modify flow put will failed to add in kernel datapath.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Simon Horman <simon.horman@netronome.com>

conntrack: Reset ct_state when entering a new zone.

When a new conntrack zone is entered, the ct_state field is zeroed in
order to avoid using state information from different zones.

One such scenario is when a packet is double NATed. Assuming two zones
and 3 flows performing the following actions in order on the packet:
1. ct(zone=5,nat), recirc
2. ct(zone=1), recirc
3. ct(zone=1,nat)

If at step #1 the packet matches an existing NAT entry, it will get
translated and pkt->md.ct_state is set to CS_DST_NAT or CS_SRC_NAT.
At step #2 the new tuple might match an existing connection and
pkt->md.ct_zone is set to 1.
If at step #3 the packet matches an existing NAT entry in zone 1,
handle_nat() will be called to perform the translation but it will
return early because the packet's zone matches the conntrack zone and
the ct_state field still contains CS_DST_NAT or CS_SRC_NAT from the
translations in zone 5.

In order to reliably detect when a packet enters a new conntrack zone
we also need to make sure that the pkt->md.ct_zone is properly
initialized if pkt->md.ct_state is non-zero. This already happens for
most cases. The only exception is when matched conntrack connection is
of type CT_CONN_TYPE_UN_NAT and the master connection is missing. To
cover this path we now call write_ct_md() in that case too. Remove
setting the CS_TRACKED flag as in this case as it will be done by the
new call to write_ct_md().

CC: Darrell Ball <dlu998@gmail.com>
Fixes: 286de2729955 ("dpdk: Userspace Datapath: Introduce NAT Support.")
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

conntrack: Fix NULL pointer dereference.

Coverity CID 279957 reports NULL pointer derefence when
'conn' is NULL and calling ct_print_conn_info.

Cc: Usman Ansari <uansari@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>

ovsdb-tool: fix memory leak while converting cluster into standalone database

memory leak is reported by valgrind while executing functional test
"ovsdb-tool convert-to-standalone"

==13842== 2,850 (280 direct, 2,570 indirect) bytes in 7 blocks are definitely lost in loss record 20 of 20
==13842==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13842==    by 0x45EE2E: xmalloc (util.c:138)
==13842==    by 0x43E386: json_create (json.c:1451)
==13842==    by 0x43BDD2: json_object_create (json.c:254)
==13842==    by 0x43DEE3: json_parser_push_object (json.c:1273)
==13842==    by 0x43E167: json_parser_input (json.c:1371)
==13842==    by 0x43D6EA: json_lex_input (json.c:991)
==13842==    by 0x43DAC1: json_parser_feed (json.c:1149)
==13842==    by 0x40D108: parse_body (log.c:411)
==13842==    by 0x40D386: ovsdb_log_read (log.c:476)
==13842==    by 0x406A0B: do_convert_to_standalone (ovsdb-tool.c:1571)
==13842==    by 0x406A0B: do_cluster_standalone (ovsdb-tool.c:1606)
==13842==    by 0x438670: ovs_cmdl_run_command__ (command-line.c:223)
==13842==    by 0x438720: ovs_cmdl_run_command (command-line.c:254)
==13842==    by 0x405A4C: main (ovsdb-tool.c:79)

The problem was in do_convert_to_standalone() function which while reading log file
allocate json object which was not deallocated at the end.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ovsdb-tool: Convert clustered db to standalone db.

Add support in ovsdb-tool for migrating clustered dbs to standalone dbs.
E.g. usage to migrate nb/sb db to standalone db from raft:
ovsdb-tool cluster-to-standalone ovnnb_db.db ovnnb_db_cluster.db

Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

ovsdb-tool: fix memory leak while running "db-is-standalone" command

problem is reported by valgrind while running functional tests:

==21043== 160 (88 direct, 72 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 8
==21043==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21043==    by 0x45EE2E: xmalloc (util.c:138)
==21043==    by 0x40CB81: ovsdb_log_open (log.c:270)
==21043==    by 0x406B4F: do_db_has_magic.isra.9 (ovsdb-tool.c:563)
==21043==    by 0x438670: ovs_cmdl_run_command__ (command-line.c:223)
==21043==    by 0x438720: ovs_cmdl_run_command (command-line.c:254)
==21043==    by 0x405A4C: main (ovsdb-tool.c:79)

problem was in do_db_has_magic() which opens log file which is never closed.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>

dpif-netdev: Enter quiescent state after each offloading operation.

If the offloading queue is big and filled continuously, offloading
thread may have no chance to quiesce blocking rcu callbacks and
other threads waiting for synchronization.

Fix that by entering momentary quiescent state after each operation
since we're not holding any rcu-protected memory here.

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Reported-by: Eli Britstein <elibr@mellanox.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
Acked-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

pvector: Use acquire-release semantics for size.

Read/write concurrency of pvector library is implemented by a temp vector
and RCU protection. Considering performance reason, insertion does not
follow this scheme.
In insertion function, a thread fence ensures size increment is done
after new entry is stored. But there is no barrier in the iteration
fuction(pvector_cursor_init). Entry point access may be reordered before
loading vector size, so the invalid entry point may be loaded when vector
iteration.
This patch fixes it by acquire-release pair. It can guarantee new size is
observed by reader after new entry stored by writer. And this is
implemented by one-way barrier instead of two-way memory fence.

Fixes: fe7cfa5c3f19 ("lib/pvector: Non-intrusive RCU priority vector.")
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>