git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/commitdiff
dm kcopyd: always complete failed jobs
author Dmitry Fomichev <dmitry.fomichev@wdc.com>
Mon, 5 Aug 2019 23:56:03 +0000 (16:56 -0700)
committer Khalid Elmously <khalid.elmously@canonical.com>
Wed, 4 Sep 2019 20:23:27 +0000 (16:23 -0400)
BugLink: https://bugs.launchpad.net/bugs/1842114
commit d1fef41465f0e8cae0693fb184caa6bfafb6cd16 upstream.

This patch fixes a problem in dm-kcopyd that may leave jobs in the
complete_jobs queue indefinitely if the backing storage fails.

This behavior has been observed while running a 100% write fio file
workload against an XFS volume created on top of a dm-zoned target
device. If the underlying storage of dm-zoned goes offline under I/O,
kcopyd sometimes never issues the end-of-copy callback, and the
dm-zoned reclaim work hangs indefinitely waiting for that completion.

This behavior was traced to the error handling code in process_jobs(),
which places the failed job on the complete_jobs queue but doesn't
wake up the job handler. If the backing device fails, all outstanding
jobs may end up on the complete_jobs queue via this code path and then
stay there forever, because there are no more successful I/O jobs to
wake up the job handler.

This patch adds a wake() call so that the kcopyd job wait queue is
always woken for any I/O job that fails before dm_io() gets called for
that job.
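
The stall described above can be reproduced in a small userspace model.
This is only a sketch under stated assumptions, not the kernel code:
every model_* name is hypothetical, the worker is a plain loop instead
of a workqueue, and every job is assumed to fail before any I/O is
issued. model_simulate() returns how many completion callbacks ran.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_JOBS 16

/* Hypothetical, heavily simplified stand-ins for kcopyd's structures. */
struct model_job {
	unsigned long write_err;
	struct model_job *next;
};

struct model_client {
	struct model_job *io_jobs;        /* jobs waiting to issue I/O */
	struct model_job *complete_jobs;  /* jobs waiting for their callback */
	bool work_pending;                /* models a queued work item */
	int callbacks_run;                /* completion callbacks delivered */
};

static void model_push(struct model_job **list, struct model_job *job)
{
	job->next = *list;
	*list = job;
}

static struct model_job *model_pop(struct model_job **list)
{
	struct model_job *job = *list;

	if (job)
		*list = job->next;
	return job;
}

static void model_wake(struct model_client *kc)
{
	kc->work_pending = true;
}

/* One worker pass: deliver completions first, then try one I/O job. */
static void model_do_work(struct model_client *kc, bool wake_on_failure)
{
	struct model_job *job;

	kc->work_pending = false;

	while ((job = model_pop(&kc->complete_jobs)) != NULL)
		kc->callbacks_run++;

	/* Assumption: the backing device is gone, so this job fails. */
	job = model_pop(&kc->io_jobs);
	if (job) {
		job->write_err = (unsigned long)-1L;
		model_push(&kc->complete_jobs, job);
		if (wake_on_failure)
			model_wake(kc);  /* the call this patch adds */
	}
}

/* Submit njobs failing jobs and run the worker until nothing is pending. */
static int model_simulate(int njobs, bool wake_on_failure)
{
	static struct model_job jobs[MODEL_MAX_JOBS];
	struct model_client kc = { 0 };
	int i;

	if (njobs > MODEL_MAX_JOBS)
		njobs = MODEL_MAX_JOBS;
	for (i = 0; i < njobs; i++) {
		jobs[i].write_err = 0;
		model_push(&kc.io_jobs, &jobs[i]);
	}
	model_wake(&kc);  /* submission wakes the worker once */

	while (kc.work_pending)
		model_do_work(&kc, wake_on_failure);

	return kc.callbacks_run;
}
```

In this model, model_simulate(3, false) leaves the first failed job
parked on complete_jobs with no callback delivered, while
model_simulate(3, true) drains all three jobs.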

The patch also sets the write error status in all sub jobs that fail
because their master job has failed.
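
The error propagation can be sketched in isolation as follows. This is
a minimal illustration, not the kernel function: struct seq_job,
SEQ_MODEL_WRITE_SEQ, and both seq_* functions are hypothetical names,
and a plain bit mask stands in for the test_bit() check on
DM_KCOPYD_WRITE_SEQ.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag standing in for DM_KCOPYD_WRITE_SEQ. */
#define SEQ_MODEL_WRITE_SEQ 0x1u

/* Hypothetical, reduced view of a kcopyd job. */
struct seq_job {
	unsigned int flags;
	unsigned long write_err;
	struct seq_job *master_job;
};

/*
 * Returns 0 if I/O would be issued, or -EIO if the sub job is skipped
 * because its master job already recorded a write error. In the skip
 * case the sub job inherits the master's write_err, so its completion
 * reports a failure instead of a clean status.
 */
static int seq_run_io_job(struct seq_job *job)
{
	if ((job->flags & SEQ_MODEL_WRITE_SEQ) &&
	    job->master_job->write_err) {
		job->write_err = job->master_job->write_err;
		return -EIO;
	}
	return 0;
}

/* Helper: the write_err a sequential sub job ends up with. */
static unsigned long seq_sub_error_after_master(unsigned long master_err)
{
	struct seq_job master = { SEQ_MODEL_WRITE_SEQ, master_err, NULL };
	struct seq_job sub = { SEQ_MODEL_WRITE_SEQ, 0, &master };

	master.master_job = &master;
	(void)seq_run_io_job(&sub);
	return sub.write_err;
}
```

Without the added assignment, a skipped sub job would return -EIO from
this function while still carrying write_err == 0.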

Fixes: b73c67c2cbb00 ("dm kcopyd: add sequential write feature")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
drivers/md/dm-kcopyd.c

index 758bacd449c459e2c382b1907cd8e5989cacf2cc..2f375edb8518bc9f27984042ad72a565df30d806 100644
@@ -545,8 +545,10 @@ static int run_io_job(struct kcopyd_job *job)
         * no point in continuing.
         */
        if (test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags) &&
-           job->master_job->write_err)
+           job->master_job->write_err) {
+               job->write_err = job->master_job->write_err;
                return -EIO;
+       }
 
        io_job_start(job->kc->throttle);
 
@@ -598,6 +600,7 @@ static int process_jobs(struct list_head *jobs, struct dm_kcopyd_client *kc,
                        else
                                job->read_err = 1;
                        push(&kc->complete_jobs, job);
+                       wake(kc);
                        break;
                }