race: completion: cannot get cgroup path unless container is running #23334

Closed
edsantiago opened this issue Jul 18, 2024 · 4 comments · Fixed by #23341 or #23357
@edsantiago
Member

Another parallel-system-test flake, seen in f39 root:

<+036ms> # # podman pod stats --no-stream --format {{"\n"}}
<+198ms> # Error: cannot get cgroup path unless container e3c650c8438160f0c2728fc4ed589bc6c06546e6abfba38db49d5d15b9325310 is running: container is stopped
<+016ms> # [ rc=125 ]
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #|     FAIL: podman pod stats --format '{{"\n"}}'
         # #| expected: = ''
         # #|   actual:   Error: cannot get cgroup path unless container e3c650c8438160f0c2728fc4ed589bc6c06546e6abfba38db49d5d15b9325310 is running: container is stopped
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Jul 18, 2024
@Luap99 Luap99 self-assigned this Jul 19, 2024
@Luap99
Member

Luap99 commented Jul 19, 2024

Reproducer
terminal 1

while :; do bin/podman run --rm -d quay.io/libpod/testimage:20240123 sleep 1; sleep 0.1; done

terminal 2

$ bin/podman stats --interval 1
<--shows stats for a bit and then fails with
Error: cannot get cgroup path unless container xxx is running: container is stopped

This is likely another case where we must ignore the "container is stopped" error when looping over all containers.
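
A minimal sketch of that idea in Go, assuming a sentinel error that callers can match with errors.Is. Everything here (ErrCtrStopped as a local stand-in, ctrStats, collectStats, fakeGet) is hypothetical and self-contained; it is not the podman code, only an illustration of skipping a container that stopped between listing it and reading its stats.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCtrStopped is a local stand-in sentinel for this sketch (assumption:
// the real error can be matched with errors.Is in the same way).
var ErrCtrStopped = errors.New("container is stopped")

// ctrStats is a hypothetical, trimmed-down stats record.
type ctrStats struct {
	ID string
}

// collectStats loops over all container IDs and skips any container that
// reports ErrCtrStopped, since it may have exited after it was listed.
// Any other error still aborts the whole call.
func collectStats(ids []string, get func(string) (ctrStats, error)) ([]ctrStats, error) {
	out := make([]ctrStats, 0, len(ids))
	for _, id := range ids {
		s, err := get(id)
		if err != nil {
			if errors.Is(err, ErrCtrStopped) {
				continue // raced with container exit, not a real failure
			}
			return nil, err
		}
		out = append(out, s)
	}
	return out, nil
}

// fakeGet is a hypothetical stats source: "b" has already exited and
// "x" fails for an unrelated reason.
func fakeGet(id string) (ctrStats, error) {
	switch id {
	case "b":
		return ctrStats{}, fmt.Errorf("cannot get cgroup path unless container %s is running: %w", id, ErrCtrStopped)
	case "x":
		return ctrStats{}, errors.New("unexpected failure")
	}
	return ctrStats{ID: id}, nil
}

func main() {
	stats, err := collectStats([]string{"a", "b", "c"}, fakeGet)
	fmt.Println(len(stats), err) // prints "2 <nil>"
}
```

Matching with errors.Is rather than comparing strings keeps the check working when the error is wrapped with extra context, as in the message from the flake.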

edsantiago pushed a commit to edsantiago/libpod that referenced this issue Jul 19, 2024
stats read from the cgroup, and in order to know the cgroup we check the
pid for the cgroup. However, there is a window where the pid has exited and
podman has not yet updated its internal state. In this case the code
returns ErrCtrStopped, so we should ignore this error as well.

Fixes containers#23334

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99
Member

Luap99 commented Jul 19, 2024

The reproducer failed quickly for me; with my PR it looks stable now (running for 15 mins).

edsantiago pushed a commit to edsantiago/libpod that referenced this issue Jul 20, 2024
stats read from the cgroup, and in order to know the cgroup we check the
pid for the cgroup. However, there is a window where the pid has exited and
podman has not yet updated its internal state. In this case the code
returns ErrCtrStopped, so we should ignore this error as well.

Fixes containers#23334

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@edsantiago
Member Author

Sorry, still happening

@edsantiago edsantiago reopened this Jul 20, 2024
@Luap99
Member

Luap99 commented Jul 22, 2024

Oh sorry, I was looking at podman stats all this time. podman pod stats works slightly differently and doesn't make use of the "all" option, which means the error is not ignored there.

I still fixed a valid issue with podman stats at least, but pod stats uses a different code path...
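
To illustrate why the first fix did not cover pod stats, here is a small hedged sketch in Go. It assumes the skip logic is gated on an "all" flag, as described above; the names errCtrStopped and ignorable are hypothetical and do not come from the podman source.

```go
package main

import (
	"errors"
	"fmt"
)

// errCtrStopped is a local stand-in sentinel for this sketch.
var errCtrStopped = errors.New("container is stopped")

// ignorable reports whether a per-container stats error may be skipped.
// Assumption: the skip only applies when the caller asked for all containers.
func ignorable(err error, all bool) bool {
	return all && errors.Is(err, errCtrStopped)
}

func main() {
	err := fmt.Errorf("cannot get cgroup path: %w", errCtrStopped)
	fmt.Println(ignorable(err, true))  // caller with "all" set: true
	fmt.Println(ignorable(err, false)) // caller without "all": false
}
```

A caller that never sets the flag, which is how the pod stats path is described here, gets the error back unfiltered, so that path needs its own handling.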

Luap99 added a commit to Luap99/libpod that referenced this issue Jul 22, 2024
Like commit 55749af but for podman *pod* stats not the normal podman
stats. We must ignore ErrCtrStopped here as well as this will happen
when the container process exited.

Fixes containers#23334

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Luap99 added a commit to Luap99/libpod that referenced this issue Jul 22, 2024
Like commit 55749af but for podman *pod* stats not the normal podman
stats. We must ignore ErrCtrStopped here as well as this will happen
when the container process exited.

While at it remove a useless argument from the function as it was always
nil and restructure the logic flow to make it easier to read.

Fixes containers#23334

Signed-off-by: Paul Holzinger <pholzing@redhat.com>