
RFC: server integration test suite improvements #6951

Closed
sassela opened this issue May 19, 2021 · 5 comments
Labels
k/rfc

Comments

@sassela (Member) commented May 19, 2021

Proposed problems and solutions

🙏🏽 But first, a grateful acknowledgement of past & ongoing contributions to an extensive integration test suite and significant CI build & test run time improvements.

🚀 test run time: #7136

It currently takes 15+ minutes on a beefy desktop to run the Postgres integration test suite, and we can expect a significant slowdown as we add backends. Coupled with hour-long CI build times (at P95), this is prohibitive to testing and integrating changes quickly.

suggested goal:
reduce local integration test run time from ~15 minutes to [insert suggestions here. A quick Google indicates that a reasonable run time is 3-5 minutes].

suggested improvements

  • confirm that running the entire suite locally is a common workflow. I only have a few data points, including my own experience, to suggest this is even an issue.
  • document running tests in parallel as part of the local development process.
  • optimise metadata operations, as suggested in recent work improving integration tests.
  • figure out, or document, how to easily identify relevant tests for a particular feature, such that devs don't have to run the entire suite.
    • combine a haskell code coverage report with the local git diff
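The coverage-plus-diff idea could work roughly as sketched below. Everything here is hypothetical: the coverage map would have to be produced ahead of time by running each test class against a coverage-instrumented (e.g. HPC-enabled) graphql-engine build, and the changed-file list would come from the local git diff; the module names are made up.

```python
# Rough sketch of the "coverage report + local git diff" idea above.
# The coverage_map shape and module names are hypothetical.

def select_relevant_tests(coverage_map, changed_files):
    """Return test classes whose recorded coverage overlaps the diff.

    coverage_map: dict mapping a test class name to the set of source
                  files it exercised on a previous full, instrumented run.
    changed_files: iterable of paths, e.g. from `git diff --name-only`.
    """
    changed = set(changed_files)
    return sorted(
        test_class
        for test_class, covered_files in coverage_map.items()
        if covered_files & changed
    )

# Illustrative, not the real layout:
coverage_map = {
    "TestGraphQLQueryBasic": {"src/Hasura/GraphQL/Execute.hs"},
    "TestMetadata": {"src/Hasura/RQL/DDL/Metadata.hs"},
}
print(select_relevant_tests(coverage_map, ["src/Hasura/RQL/DDL/Metadata.hs"]))
# → ['TestMetadata']
```

Devs would then run only the returned classes instead of the whole suite; stale coverage data would only ever make the selection too broad or too narrow, never break a test.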

🎉 testing new backends: #7137

adding test support for a new backend (e.g. SQL Server, BigQuery) is currently a four- or five-step process involving a lot of code and YAML-file duplication, with many minor, manual and error-prone tweaks (see linked examples). This is likely to discourage extensive testing and community contribution.

suggested goal: eliminate/minimise duplication of code and test files for backend-agnostic feature tests and optimise workflow for backend-specific feature testing.

suggested improvements

  • restructure tests into backend-specific and -agnostic tests so the former is clearer to identify and contribute to.
  • introduce some generator or intermediate representation for test setup/teardown.
    • explore synth for alt-backend test data generation given a Postgres data set

❤️ debugging/fixing tests: #7138

our pytest output has a pretty low signal-to-noise ratio, and it's often difficult to understand the cause of a test failure, if there is one. Example: an 8,600-line log for what turned out to be a flaky test that passed on a second run.

suggested improvements

  • there's almost certainly some pytest configuration that would help with this (e.g. a shorter traceback style and a failure summary). Adjust the CircleCI config and local testing scripts to reduce verbosity.
  • better isolate (or reduce) stateful tests.

👀 reasoning about test state and coverage: #7139

identifying which tests are actually run, and which setup/teardown files support them, currently requires manually matching test files to string filepaths. Example. This encourages cargo-culting redundant files and makes refactoring and unused-file cleanup difficult.

goal: unsure, to be honest. Is it enough to find and remove dead code regularly, so that we have more confidence that the existing files are relevant? In which case...

suggested improvements

  • explore test coverage / dead code elimination tools such as vulture and pytest-cov
  • introduce a set of setup/teardown files per test class, following an obvious naming convention
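As an illustration of the naming-convention idea (the convention and directory layout here are invented), a tiny helper could derive each class's setup/teardown paths mechanically, so there is nothing to look up and nothing to get out of sync:

```python
# Illustration only: a hypothetical naming convention in which each test
# class owns a directory of setup/teardown files, one pair per backend.
import re
from pathlib import Path

QUERIES_ROOT = Path("server/tests-py/queries")  # assumed location

def files_for(test_class: str, backend: str) -> dict:
    """Derive the setup/teardown file paths for one test class."""
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", test_class).lower()
    class_dir = QUERIES_ROOT / snake
    return {
        "setup": class_dir / f"setup_{backend}.yaml",
        "teardown": class_dir / f"teardown_{backend}.yaml",
    }

print(files_for("TestMetadata", "postgres")["setup"].as_posix())
# → server/tests-py/queries/test_metadata/setup_postgres.yaml
```

With a rule like this, any file not reachable from a test class name is dead by definition, which also makes unused-file cleanup mechanical.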

🏗️ consistent setup between CI and local: #7140

Request for comments

  • Hasura team: I think you should be able to edit this description directly, please do so with additions or corrections.
  • Otherwise, please comment below with anything else you'd like to see improved with the server integration test workflow.
  • I've just added emojis to each title as an experiment. If you've experienced some pain in that part of the workflow, but don't want to write anything, just react with that emoji. We may or may not use this to prioritise what to work on 😅

What happens next

I'll leave this open for at least a few weeks to gather feedback and suggestions. However, the team's quite keen to address these kinds of ongoing improvements, so will likely prioritise some of them in our upcoming sprints and update this issue accordingly.

@sassela sassela added the k/rfc label May 19, 2021
@sassela sassela changed the title RFC: server integration test improvements RFC: server integration test suite improvements May 19, 2021
@nicuveo (Member) commented May 19, 2021

Suggestion for a new problem + solution:

🏗️ consistent setup between CI and local

Different parts of the test suite rely on different assumptions: that there's a version of graphql-engine running, that it was started with some specific environment variables, that the proper environment variables are set, and so on. And there isn't one consistent way of running the tests: CI uses a verbose run-tests.sh, which runs all or part of the test suite with different configurations, while locally the preferred way for devs is dev.sh test --integration. This leads to a lot of complexity and errors:

  • some tests are always skipped locally
  • new tests can work on CI but break local workflows
  • the CI is probably testing more than it needs to by running the suite in each configuration
  • the metadata database running separately from the tests means that it is persisted from one group of tests to the next, and that previous test failures can impact other unrelated tests down the line

goal: running the tests should be done the same way on CI and locally, and the test suite itself should be responsible for its own setup

suggested improvements:

  • invert the dependency: it's the test suite that, for each feature, decides how the engine should be started, with which environment flags
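A minimal sketch of that inversion, assuming a per-feature spec of flags and environment variables. The spec shape and fixture wiring are hypothetical; only `graphql-engine serve` is a real invocation, and the flag/env-var values shown are illustrative:

```python
# Sketch: the test suite, not the CI scripts, decides how the engine is
# started for each feature.  The feature_spec shape is hypothetical.
import os

def engine_invocation(feature_spec):
    """Build the (command, environment) pair for one feature's tests."""
    cmd = ["graphql-engine", "serve", *feature_spec.get("flags", [])]
    env = {**os.environ, **feature_spec.get("env", {})}
    return cmd, env

# A per-class pytest fixture could then own the process lifecycle, e.g.:
#   cmd, env = engine_invocation(spec)
#   proc = subprocess.Popen(cmd, env=env)
#   yield <client for proc>
#   proc.terminate()

spec = {
    "flags": ["--enable-console"],
    "env": {"HASURA_GRAPHQL_EXPERIMENTAL_FEATURES": "naming_convention"},
}
cmd, env = engine_invocation(spec)
print(cmd)
# → ['graphql-engine', 'serve', '--enable-console']
```

Because the same fixture runs on CI and locally, "works on CI but not on my machine" configurations disappear, and each feature's engine state dies with its tests.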

@jberryman (Collaborator) commented May 19, 2021

Thanks for putting this together Abby!

  • figure out, or document, how to easily identify relevant tests for a particular feature, such that devs don't have to run the entire suite.

We could, in theory, hack something together with Haskell code coverage: for each integration test class/module, store a code coverage report, then (waves hands) combine that with the local git diff...

adding test support for a new backend (e.g. SQL Server, BigQuery) is currently a 4 or 5-step process involving a lot of code and yaml-file duplication, with many minor, manual and error-prone tweaks (see linked examples). This is likely to discourage extensive testing and community contribution.

The process linked for adding a new backend doesn't sound too bad to me, but I probably have low expectations... In any case, I definitely agree with the goal of encouraging thorough testing against new backends.

our pytest output has a pretty low signal:noise ratio, and it's often difficult to understand the cause of a test failure, if there is one. Example: a 8600-line log for what turned out to be a flaky test that passed on a second run.

One thing I've noticed is that some sets of tests within a class are stateful, so if an earlier test fails, the later ones will too. The appropriate fix, I think, is to just make them one big test (commenting the different phases).

identifying which tests are actually run and which setup/teardown files support them seems to be a manual lookup of test files to string filepaths. Example. This contributes to cargo-culting redundant files and makes refactoring/unused file cleanup difficult.

Yeah, it's annoying to match setup/teardown specs to tests... I think an improvement would be for every test class to have its own set of setup/teardown files, following some obvious naming convention, even if that meant duplicating some of these files.

@sassela (Member, Author) commented May 24, 2021

Thanks so much for these suggestions @jberryman, I've included them in the description above.

The process linked for adding a new backend doesn't sound too bad to me

I hope Evie doesn't mind me linking to her PR as an example to illustrate that the process in practice turns out to be much more clunky.

The request format for v1/query (existing Postgres tests) and v2/query (new backend tests) differs ever so slightly, and new backend tests require split setup files for v1/metadata and v2/query requests, which introduces a lot of almost-entirely-duplicated files with minor manual alterations.

Combined with unclear test logs, testing new backends in bulk can be difficult due to errors or test failures that might be caused by SQL syntax, .yaml file format, missing setup, stateful tests or any combination thereof. In the linked example, this was prohibitive enough to delay testing of an entire feature set (functions & mutations).

@eviefp (Contributor) commented May 24, 2021

For "reasoning about test state and coverage":

goal: unsure, to be honest. Is it enough to find and remove dead code regularly, so that we have more confidence that the existing files are relevant? In which case...

How about unifying the way we do setup/teardown so that there is a single way to do it? The syntax might become slightly more complex, but the advantage is that it will always be clear how to do it, and there are no more redundancies.

Additionally, I think adding new tests should only require creating new files. It's really easy to forget to update the Python classes to add the appropriate method and still get PRs accepted.

@sassela (Member, Author) commented Jun 28, 2021

I've created separate issues for each of the proposed goals listed above; we'll aim to prioritise these in the server team sprints in the next few weeks, but please consider these individual issues open to feedback, discussion and contribution.

Otherwise closing this RFC with the plan to keep the individual issues updated.

@sassela sassela closed this Jun 28, 2021