Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-490: Replace gzip with zstd to improve CPU constraints #649

Open
wants to merge 7 commits into
base: master
from

Conversation

@johnclaus
Copy link
Contributor

@johnclaus johnclaus commented Oct 5, 2020

This is a cursory pass at fixing the CPU constraints described in ISSUE-490 by swapping out calls to gzip with zstd.

@johnclaus johnclaus force-pushed the johnclaus:issue-490-replace-gzip-with-zstd branch from 7800fc0 to 6d3cec4 Oct 5, 2020
@johnclaus johnclaus changed the title Issue 490 replace gzip with zstd ISSUE-490: Replace gzip with zstd to improve CPU constraints Oct 5, 2020
@johnclaus
Copy link
Contributor Author

@johnclaus johnclaus commented Oct 13, 2020

@tashaemery I noticed that you approved this PR, but these tests are still failing. Are these false negatives?

Copy link
Member

@djdefi djdefi left a comment

I believe this would also need to be added into the Dockerfiles ( https://github.com/github/backup-utils/blob/master/Dockerfile.alpine / https://github.com/github/backup-utils/blob/master/Dockerfile).

I'm not super familiar with zstd, so it would be good to make sure that it widely available in common distros (Debian, CentOs etc), and we'll definitely want to call it out as a backup host requirement in the README.

cc: @github/ghes-infrastructure @github/backup-utils any thoughts as to needing to retain gzip as a fallback as well.

@johnclaus johnclaus force-pushed the johnclaus:issue-490-replace-gzip-with-zstd branch from 1b6bca0 to e73e554 Oct 16, 2020
johnclaus added 3 commits Oct 16, 2020
Copy link
Member

@snh snh left a comment

Thank you for the PR @johnclaus!

Unfortunately we do not currently distribute zstd with GitHub Enterprise Server, and as most of these zstd calls are actually occurring on the GHES appliance, rather than the host running backup-utils, these will fail.

Additionally, backup-utils needs to be able to continue to restore older backup snapshots created using gzip, which I don't believe would be possible with this PR as it stands.

@johnclaus
Copy link
Contributor Author

@johnclaus johnclaus commented Oct 28, 2020

@snh Thank you so much for your feedback and insight! I'm assuming adding zstd to the GitHub Enterprise server requires an internal PR? Let's say I change this PR to do the following:

  1. Check if zstd is installed on the appliance. If so, use zstd for all manners of compression. Otherwise, default to gzip
  2. Fix my logic around restoring old backup snapshots using gzip

Do you foresee any other reason for this PR not to land?

@snh
Copy link
Member

@snh snh commented Nov 10, 2020

Thank you so much for your feedback and insight! I'm assuming adding zstd to the GitHub Enterprise server requires an internal PR? Let's say I change this PR to do the following:

@johnclaus That's correct, I've passed this PR on to the Engineering team responsible, who will be able to offer more insight as to our roadmap here.

Regarding backwards compatibility, an approach we have used in the past is to use the remote GitHub Enterprise Server version to determine when to use a new method/approach.

If you call ghe_remote_version_required in the relevant backup-utils scripts, then $GHE_REMOTE_VERSION, along with $GHE_VERSION_MAJOR, $GHE_VERSION_MINOR, and $GHE_VERSION_PATCH will be automatically populated.

For example, say zstd landed in GitHub Enterprise Server 2.25.0:

ghe_remote_version_required "$GHE_HOSTNAME"

if [ "$GHE_VERSION_MAJOR" -eq 2 ] && [ "$GHE_VERSION_MINOR" -ge 25 ] || [ "$GHE_VERSION_MAJOR" -ge 3 ]; then
  compress_command="zstd"
else
  compress_command="gzip"
fi

And then use $compress_command in place of gzip/zstd.

@snh
Copy link
Member

@snh snh commented Nov 10, 2020

Something else that came to mind along the line of what @djdefi raised is that introducing zstd as a dependency on the backup-utils host may not be something we want to do in the short term due to the wide variety of systems out there running backup-utils. Some of these systems are in highly controlled environments where installing an additional package can be a challenge.

With this in mind, it might be worth limiting the initial implementation of zstd to those cases where it is ran on the GHES appliance.

This'd mean moving the routes list uses back to gzip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.