Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 'failed to get network during CreateEndpoint' #2554

Merged
merged 1 commit into from
Jun 2, 2020

Conversation

xinfengliu
Copy link
Contributor

Fixes docker/for-linux#888

Fix 'failed to get network during CreateEndpoint' during container starting.
Change the error type to libnetwork.ErrNoSuchNetwork, so Start() in docker engine daemon/cluster/executor/container/controller.go will recreate the network.

Signed-off-by: Xinfeng Liu xinfeng.liu@gmail.com

@arkodg
Copy link
Contributor

arkodg commented May 22, 2020

@xinfengliu thanks for solving this hairy issue ! would appreciate it if you could add a test case here or in https://github.com/moby/moby/blob/master/integration/network/network_test.go to recreate it

@arkodg arkodg requested a review from cpuguy83 May 22, 2020 07:45
@xinfengliu
Copy link
Contributor Author

xinfengliu commented May 22, 2020

@arkodg
Thanks for the quick feedback. The problem is that I could not recreate this issue in a predicatable way, so it is difficult to add the test case in CI. For details, you can see my notes in moby/moby#41011 (I did manually verify the fix works as expected) .

@arkodg
Copy link
Contributor

arkodg commented May 22, 2020

you could mimic docker/for-linux#888 (comment) via an integration test
sidenote - had no idea we had a retry logic in the executor, TIL :)

@xinfengliu
Copy link
Contributor Author

@arkodg

Thanks for the test case (docker/for-linux#888 (comment)). Sorry I didn't notice that.

Your use case (restarting a container which is on an attachable overlay network and the container was started via docker run) is different from mine but results to the same error. For your use case, my current fix does not cover it because your use case will go to a different code path.

'docker restart' on an attachable overlay network will cause swarmkit reconcileTaskState to run and remove the network. It causes race condition with starting container.

I'll look into this to see if there's any solutions.

@xinfengliu
Copy link
Contributor Author

@arkodg

I have fixed the error caused by your use case. Please see my updated notes in moby/moby#41011 about scenario 2.

I will try adding tests in moby integration later time.

network.go Outdated
@@ -1181,7 +1181,8 @@ func (n *network) createEndpoint(name string, options ...EndpointOption) (Endpoi
ep.locator = n.getController().clusterHostID()
ep.network, err = ep.getNetworkFromStore()
if err != nil {
return nil, fmt.Errorf("failed to get network during CreateEndpoint: %v", err)
logrus.Errorf("failed to get network during CreateEndpoint: %v", err)
return nil, ErrNoSuchNetwork(n.Name())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(left as a comment on the Moby PR as well)

I'd prefer ep.getNetworkFromStore() to return this error-type.

While it may be true currently that this is always due to the network not being found, it may not be in future. I don't think we should make the assumption here that this is a ErrNoSuchNetwork.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking for possible other errors it could return, I noticed there some unused error-handling; opened #2556

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed return err in store.go instead to make it a typed error.

@@ -1181,7 +1181,8 @@ func (n *network) createEndpoint(name string, options ...EndpointOption) (Endpoi
ep.locator = n.getController().clusterHostID()
ep.network, err = ep.getNetworkFromStore()
if err != nil {
return nil, fmt.Errorf("failed to get network during CreateEndpoint: %v", err)
logrus.Errorf("failed to get network during CreateEndpoint: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also left as a comment on the moby PR);

I'm a bit on the fence if we should log this here, given that we're already "handling" the error by returning it. If We want the additional information, perhaps errors.Wrap(err, "failed to get network during createEndpoint") (or something along those lines?).

Or perhaps more consistent with the other errors, and actually put this in the exported (CreateEndpoint)

Open to suggestions though

Fix 'failed to get network during CreateEndpoint' during container starting.
Change the error type to `libnetwork.ErrNoSuchNetwork`, so `Start()` in `daemon/cluster/executor/container/controller.go` will recreate the network.

Signed-off-by: Xinfeng Liu <xinfeng.liu@gmail.com>
@xinfengliu
Copy link
Contributor Author

@arkodg I have added test case for docker/for-linux#888 (comment) in moby/moby#41011 integration test.

Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks !

@arkodg
Copy link
Contributor

arkodg commented Jun 1, 2020

I have reservations about #2554 (comment)
but we can tackle those later :)

@xinfengliu xinfengliu requested a review from thaJeztah June 2, 2020 03:04
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@selansen @arkodg ready to go?

@arkodg arkodg merged commit 868f23b into moby:master Jun 2, 2020
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Jul 8, 2020
full diff: moby/libnetwork@2e24aed...9e99af2

- moby/libnetwork#2548 Add docker interfaces to firewalld docker zone
    - fixes docker/for-linux#957 DNS Not Resolving under Network [CentOS8]
    - fixes moby/libnetwork#2496 Port Forwarding does not work on RHEL 8 with Firewalld running with FirewallBackend=nftables
- store.getNetworksFromStore() remove unused error return
- moby/libnetwork#2554 Fix 'failed to get network during CreateEndpoint'
    - fixes/addresses docker/for-linux#888 failed to get network during CreateEndpoint
- moby/libnetwork#2558 [master] bridge: disable IPv6 router advertisements
- moby/libnetwork#2563 log error instead if disabling IPv6 router advertisement failed
    - fixes docker/for-linux#1033 Shouldn't be fatal: Unable to disable IPv6 router advertisement: open /proc/sys/net/ipv6/conf/docker0/accept_ra: read-only file system

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Jul 13, 2020
full diff: moby/libnetwork@2e24aed...9e99af2

- moby/libnetwork#2548 Add docker interfaces to firewalld docker zone
    - fixes docker/for-linux#957 DNS Not Resolving under Network [CentOS8]
    - fixes moby/libnetwork#2496 Port Forwarding does not work on RHEL 8 with Firewalld running with FirewallBackend=nftables
- store.getNetworksFromStore() remove unused error return
- moby/libnetwork#2554 Fix 'failed to get network during CreateEndpoint'
    - fixes/addresses docker/for-linux#888 failed to get network during CreateEndpoint
- moby/libnetwork#2558 [master] bridge: disable IPv6 router advertisements
- moby/libnetwork#2563 log error instead if disabling IPv6 router advertisement failed
    - fixes docker/for-linux#1033 Shouldn't be fatal: Unable to disable IPv6 router advertisement: open /proc/sys/net/ipv6/conf/docker0/accept_ra: read-only file system

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 219e7e7ddcf5f0314578d2a517fc0832f03622c1
Component: engine
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

failed to get network during CreateEndpoint
3 participants