test/e2e: on test failures dump server stack trace #23631

Draft
wants to merge 1 commit into
base: main

Luap99
Member

@Luap99 Luap99 commented Aug 15, 2024

To debug #22246. I am not sure this will work: we likely kill the client before the server, which means the connection is probably already closed and we will not see the hang or whatever happens.

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none labels Aug 15, 2024
Contributor

openshift-ci bot commented Aug 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 15, 2024

We were not able to find or create Copr project packit/containers-podman-23631 specified in the config with the following error:

Packit received HTTP 500 Internal Server Error from Copr Service. Check the Copr status page: https://copr.fedorainfracloud.org/status/stats/, or ask for help in Fedora Build System matrix channel https://matrix.to/#/#buildsys:fedoraproject.org.

Unless the HTTP status code above is >= 500, please check your configuration for:

  1. typos in owner and project name (groups need to be prefixed with @)
  2. whether the project name doesn't contain not allowed characters (only letters, digits, underscores, dashes and dots must be used)
  3. whether the project itself exists (Packit creates projects only in its own namespace)
  4. whether Packit is allowed to build in your Copr project
  5. whether your Copr project/group is not private

@Luap99
Member Author

Luap99 commented Aug 15, 2024

cc @edsantiago @mheon: just an idea I had, but I don't think it works because the client is killed before the server. I am still trying to figure out if I can switch the order somehow...

@edsantiago
Member

I will cherrypick onto #17831 as soon as the current run finishes

@Luap99
Member Author

Luap99 commented Aug 15, 2024

> I will cherrypick onto #17831 as soon as the current run finishes

No point actually, this commit will do nothing. If we time out in cleanup, ginkgo panics and we do not get any further in Cleanup(), so we never tell the server to stop and we leak it. I need to rework the code quite a bit to make this work...
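
For illustration only, here is a minimal Go sketch of that failure mode. It is not the podman test code: `fakeServer`, `cleanupLeaky`, and `cleanupSafe` are hypothetical names, and the `recover` is only there so the demo keeps running. It shows that a panic in an early cleanup step skips a later plain call to stop the server, while a stop registered with `defer` still runs.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeServer is a hypothetical stand-in for the server a test starts.
type fakeServer struct{ stopped bool }

func (s *fakeServer) Stop() { s.stopped = true }

// cleanupLeaky mirrors the problem: if step panics, Stop is never reached.
// The recover only exists so this demo can report the result.
func cleanupLeaky(s *fakeServer, step func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("cleanup aborted: %v", r)
		}
	}()
	step()   // may panic, for example on a timeout
	s.Stop() // skipped when step panics
	return nil
}

// cleanupSafe registers Stop with defer before the risky step,
// so the server is stopped even when step panics.
func cleanupSafe(s *fakeServer, step func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("cleanup aborted: %v", r)
		}
	}()
	defer s.Stop()
	step()
	return nil
}

func main() {
	timeout := func() { panic(errors.New("command timed out")) }
	leaky, safe := &fakeServer{}, &fakeServer{}
	fmt.Println(cleanupLeaky(leaky, timeout), "stopped:", leaky.stopped)
	fmt.Println(cleanupSafe(safe, timeout), "stopped:", safe.stopped)
}
```

Whether a `defer` or a separately registered cleanup hook fits the real suite better depends on how the test framework is set up; the sketch only shows the ordering problem.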

To debug containers#22246

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99
Member Author

Luap99 commented Aug 15, 2024

Ok this version might be good enough, but looking at this I really need to fix the cleanup logic there. One command times out and we leak stuff...

@edsantiago
Member

HTH

@Luap99
Member Author

Luap99 commented Aug 16, 2024

> HTH

github.com/containers/podman/v5/libpod.(*ConmonOCIRuntime).UpdateContainerStatus(0xc000210000, 0xc000296000)
           	/var/tmp/go/src/github.com[/containers/podman/libpod/oci_conmon_common.go:231](https://github.com/containers/podman/blob/c1ffa97d0cc2a568a61856f645f6f3c032b77e02/libpod/oci_conmon_common.go#L231) +0x40d fp=0xc0003e9920 sp=0xc0003e9730 pc=0x14a4a2d
           github.com/containers/podman/v5/libpod.(*ConmonOCIRuntime).killContainer(0xc000210000, 0xc000296000, 0xf, 0x0, 0x1)
           	/var/tmp/go/src/github.com[/containers/podman/libpod/oci_conmon_common.go:386](https://github.com/containers/podman/blob/c1ffa97d0cc2a568a61856f645f6f3c032b77e02/libpod/oci_conmon_common.go#L386) +0x596 fp=0xc0003e9ab0 sp=0xc0003e9920 pc=0x14a6696
           github.com/containers/podman/v5/libpod.(*ConmonOCIRuntime).StopContainer.func1(0x35838?)
           	/var/tmp/go/src/github.com[/containers/podman/libpod/oci_conmon_common.go:415](https://github.com/containers/podman/blob/c1ffa97d0cc2a568a61856f645f6f3c032b77e02/libpod/oci_conmon_common.go#L415) +0x38 fp=0xc0003e9b40 sp=0xc0003e9ab0 pc=0x14a6ef8
           github.com/containers/podman/v5/libpod.(*ConmonOCIRuntime).StopContainer(0xc000210000, 0xc000296000, 0xa, 0x0)
           	/var/tmp/go/src/github.com[/containers/podman/libpod/oci_conmon_common.go:460](https://github.com/containers/podman/blob/c1ffa97d0cc2a568a61856f645f6f3c032b77e02/libpod/oci_conmon_common.go#L460) +0x196 fp=0xc0003e9ca0 sp=0xc0003e9b40 pc=0x14a6ad6
           github.com/containers/podman/v5/libpod.(*Container).stop(0xc000296000, 0xa)
           	/var/tmp/go/src/github.com[/containers/podman/libpod/container_internal.go:1415](https://github.com/containers/podman/blob/c1ffa97d0cc2a568a61856f645f6f3c032b77e02/libpod/container_internal.go#L1415) +0x476 fp=0xc0003e9e38 sp=0xc0003e9ca0 pc=0x1450116
           github.com/containers/podman/v5/libpod.(*Container).StopWithTimeout(0xc000296000?, 0xa?)
           	/var/tmp/go/src/github.com[/containers/podman/libpod/container_api.go:282](https://github.com/containers/podman/blob/c1ffa97d0cc2a568a61856f645f6f3c032b77e02/libpod/container_api.go#L282) +0xd7 fp=0xc0003e9eb0 sp=0xc0003e9e38 pc=0x142bc97
           github.com/containers/podman/v5/libpod.(*Container).Stop(...)
           	/var/tmp/go/src/github.com[/containers/podman/libpod/container_api.go:252](https://github.com/containers/podman/blob/c1ffa97d0cc2a568a61856f645f6f3c032b77e02/libpod/container_api.go#L252)
           github.com/containers/podman/v5/libpod.(*Pod).stopWithTimeout.func1()

This seems to be the main cause of the hang: the pod listing hangs because the pod lock is currently held by this goroutine, which is stopping the pod's containers.
The code in UpdateContainerStatus() doesn't look sane to me; there isn't really proper error handling there.


A friendly reminder that this PR had no activity for 30 days.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none stale-pr