
RFC: server integration test suite improvements #6951

Closed
sassela opened this issue May 19, 2021 · 5 comments
Labels
k/rfc

Comments

@sassela (Member) commented May 19, 2021

Proposed problems and solutions

🙏🏽 But first, a grateful acknowledgement of past & ongoing contributions to an extensive integration test suite and significant CI build & test run time improvements.

🚀 test run time: #7136

It currently takes 15+ minutes on a beefy desktop to run the Postgres integration test suite, and we can expect a significant slowdown as we add backends. Coupled with hour-long CI build times (at P95), this is prohibitive to testing and integrating changes quickly.

suggested goal:
reduce local integration test run time from ~15 minutes to [insert suggestions here. A quick Google indicates that a reasonable run time is 3-5 minutes].

suggested improvements

  • confirm that running the entire suite locally is a common workflow. I only have a few data points, including my own experience, to suggest this is even an issue.
  • document running tests in parallel as part of the local development process.
  • optimise metadata operations, as suggested in recent work improving integration tests.
  • figure out, or document, how to easily identify relevant tests for a particular feature, such that devs don't have to run the entire suite.
    • combine a haskell code coverage report with the local git diff
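The coverage-plus-diff idea could work roughly as sketched below. Everything here is hypothetical: the coverage map would have to be produced ahead of time by running each test class against a coverage-instrumented (e.g. HPC-enabled) graphql-engine build, and the changed-file list would come from the local git diff; the module names are made up.

```python
# Rough sketch of the "coverage report + local git diff" idea above.
# The coverage_map shape and module names are hypothetical.

def select_relevant_tests(coverage_map, changed_files):
    """Return test classes whose recorded coverage overlaps the diff.

    coverage_map: dict mapping a test class name to the set of source
                  files it exercised on a previous full, instrumented run.
    changed_files: iterable of paths, e.g. from `git diff --name-only`.
    """
    changed = set(changed_files)
    return sorted(
        test_class
        for test_class, covered_files in coverage_map.items()
        if covered_files & changed
    )

# Illustrative, not the real layout:
coverage_map = {
    "TestGraphQLQueryBasic": {"src/Hasura/GraphQL/Execute.hs"},
    "TestMetadata": {"src/Hasura/RQL/DDL/Metadata.hs"},
}
print(select_relevant_tests(coverage_map, ["src/Hasura/RQL/DDL/Metadata.hs"]))
# → ['TestMetadata']
```

Devs would then run only the returned classes instead of the whole suite; stale coverage data would only ever make the selection too broad or too narrow, never break a test.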

🎉 testing new backends: #7137

adding test support for a new backend (e.g. SQL Server, BigQuery) is currently a four- or five-step process involving a lot of code and YAML-file duplication, with many minor, manual and error-prone tweaks (see linked examples). This is likely to discourage extensive testing and community contribution.

suggested goal: eliminate/minimise duplication of code and test files for backend-agnostic feature tests and optimise workflow for backend-specific feature testing.

suggested improvements

  • restructure tests into backend-specific and -agnostic tests so the former is clearer to identify and contribute to.
  • introduce some generator or intermediate representation for test setup/teardown.
    • explore synth for alt-backend test data generation given a Postgres data set

❤️ debugging/fixing tests: #7138

our pytest output has a pretty low signal-to-noise ratio, and it's often difficult to understand the cause of a test failure, if there is one. Example: an 8,600-line log for what turned out to be a flaky test that passed on a second run.

suggested improvements

  • there's almost certainly some pytest configuration that would help with this (e.g. a shorter traceback style and a failure summary). Adjust the CircleCI config and local testing scripts to reduce verbosity.
  • better isolate (or reduce) stateful tests.

👀 reasoning about test state and coverage: #7139

identifying which tests are actually run, and which setup/teardown files support them, currently requires manually matching test files to string filepaths. Example. This encourages cargo-culting redundant files and makes refactoring and unused-file cleanup difficult.

goal: unsure, to be honest. Is it enough to find and remove dead code regularly, so that we have more confidence that the existing files are relevant? In which case...

suggested improvements

  • explore test coverage / dead code elimination tools such as vulture and pytest-cov
  • introduce a set of setup/teardown files per test class, following an obvious naming convention
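As an illustration of the naming-convention idea (the convention and directory layout here are invented), a tiny helper could derive each class's setup/teardown paths mechanically, so there is nothing to look up and nothing to get out of sync:

```python
# Illustration only: a hypothetical naming convention in which each test
# class owns a directory of setup/teardown files, one pair per backend.
import re
from pathlib import Path

QUERIES_ROOT = Path("server/tests-py/queries")  # assumed location

def files_for(test_class: str, backend: str) -> dict:
    """Derive the setup/teardown file paths for one test class."""
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", test_class).lower()
    class_dir = QUERIES_ROOT / snake
    return {
        "setup": class_dir / f"setup_{backend}.yaml",
        "teardown": class_dir / f"teardown_{backend}.yaml",
    }

print(files_for("TestMetadata", "postgres")["setup"].as_posix())
# → server/tests-py/queries/test_metadata/setup_postgres.yaml
```

With a rule like this, any file not reachable from a test class name is dead by definition, which also makes unused-file cleanup mechanical.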

🏗️ consistent setup between CI and local: #7140

Request for comments

  • Hasura team: I think you should be able to edit this description directly, please do so with additions or corrections.
  • Otherwise, please comment below with anything else you'd like to see improved with the server integration test workflow.
  • I've just added emojis to each title as an experiment. If you've experienced some pain in that part of the workflow, but don't want to write anything, just react with that emoji. We may or may not use this to prioritise what to work on 😅

What happens next

I'll leave this open for at least a few weeks to gather feedback and suggestions. However, the team's quite keen to address these kinds of ongoing improvements, so will likely prioritise some of them in our upcoming sprints and update this issue accordingly.

@sassela sassela added the k/rfc label May 19, 2021
@sassela sassela changed the title RFC: server integration test improvements RFC: server integration test suite improvements May 19, 2021
@nicuveo (Member) commented May 19, 2021

Suggestion for a new problem + solution:

🏗️ consistent setup between CI and local

Different parts of the test suite rely on different assumptions: that there's a version of graphql-engine running, that it was started with some specific environment variables, that the proper environment variables are set, and so on. And there isn't one consistent way of running the tests: CI uses a verbose run-tests.sh, which runs all or part of the test suite with different configurations, while locally the preferred way for devs is dev.sh test --integration. This leads to a lot of complexity and errors:

  • some tests are always skipped locally
  • new tests can work on CI but break local workflows
  • the CI is probably testing more than it needs to by running the suite in each configuration
  • the metadata database running separately from the tests means that it is persisted from one group of tests to the next, and that previous test failures can impact other unrelated tests down the line

goal: running the tests should be done the same way on CI and locally, and the test suite itself should be responsible for its own setup

suggested improvements:

  • invert the dependency: it's the test suite that, for each feature, decides how the engine should be started, with which environment flags
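A minimal sketch of that inversion, assuming a per-feature spec of flags and environment variables. The spec shape and fixture wiring are hypothetical; only `graphql-engine serve` is a real invocation, and the flag/env-var values shown are illustrative:

```python
# Sketch: the test suite, not the CI scripts, decides how the engine is
# started for each feature.  The feature_spec shape is hypothetical.
import os

def engine_invocation(feature_spec):
    """Build the (command, environment) pair for one feature's tests."""
    cmd = ["graphql-engine", "serve", *feature_spec.get("flags", [])]
    env = {**os.environ, **feature_spec.get("env", {})}
    return cmd, env

# A per-class pytest fixture could then own the process lifecycle, e.g.:
#   cmd, env = engine_invocation(spec)
#   proc = subprocess.Popen(cmd, env=env)
#   yield <client for proc>
#   proc.terminate()

spec = {
    "flags": ["--enable-console"],
    "env": {"HASURA_GRAPHQL_EXPERIMENTAL_FEATURES": "naming_convention"},
}
cmd, env = engine_invocation(spec)
print(cmd)
# → ['graphql-engine', 'serve', '--enable-console']
```

Because the same fixture runs on CI and locally, "works on CI but not on my machine" configurations disappear, and each feature's engine state dies with its tests.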

@jberryman (Collaborator) commented May 19, 2021

Thanks for putting this together Abby!

  • figure out, or document, how to easily identify relevant tests for a particular feature, such that devs don't have to run the entire suite.

We could, in theory, hack something together with Haskell code coverage: for each integration test class/module, store a code coverage report, then (waves hands) combine that with the local git diff...

adding test support for a new backend (e.g. SQL Server, BigQuery) is currently a 4 or 5-step process involving a lot of code and yaml-file duplication, with many minor, manual and error-prone tweaks (see linked examples). This is likely to discourage extensive testing and community contribution.

The process linked for adding a new backend doesn't sound too bad to me, but I probably have low expectations... In any case, I definitely agree with the goal of encouraging thorough testing against new backends.

our pytest output has a pretty low signal:noise ratio, and it's often difficult to understand the cause of a test failure, if there is one. Example: a 8600-line log for what turned out to be a flaky test that passed on a second run.

One thing I've noticed is that some sets of tests within a class are stateful, so if an earlier test fails, the later ones will too. The appropriate fix, I think, is to just make them one big test (commenting the different phases).

identifying which tests are actually run and which setup/teardown files support them seems to be a manual lookup of test files to string filepaths. Example. This contributes to cargo-culting redundant files and makes refactoring/unused file cleanup difficult.

Yeah, it's annoying to match setup/teardown specs to tests... I think an improvement would be for every test class to have its own set of setup/teardown files, following some obvious naming convention, even if that meant duplicating some of these files.

@sassela (Member, Author) commented May 24, 2021

Thanks so much for these suggestions @jberryman, I've included them in the description above.

The process linked for adding a new backend doesn't sound too bad to me

I hope Evie doesn't mind me linking to her PR as an example to illustrate that the process in practice turns out to be much more clunky.

The request format for v1/query (existing Postgres tests) and v2/query (new backend tests) differs ever so slightly, and new backend tests require split setup files for v1/metadata and v2/query requests, which introduces a lot of almost-entirely-duplicated files with minor manual alterations.

Combined with unclear test logs, testing new backends in bulk can be difficult due to errors or test failures that might be caused by SQL syntax, .yaml file format, missing setup, stateful tests or any combination thereof. In the linked example, this was prohibitive enough to delay testing of an entire feature set (functions & mutations).

@eviefp (Contributor) commented May 24, 2021

For "reasoning about test state and coverage":

goal: unsure, to be honest. Is it enough to find and remove dead code regularly, so that we have more confidence that the existing files are relevant? In which case...

How about unifying the way we do setup/teardown so that there is a single way to do it? The syntax might become slightly more complex, but the advantage is that it will always be clear how to do it, and there are no more redundancies.

Additionally, I think adding new tests should only require creating new files. It's really easy to forget to update the Python classes to add the appropriate method and still get PRs accepted.

@sassela (Member, Author) commented Jun 28, 2021

I've created separate issues for each of the proposed goals listed above; we'll aim to prioritise these in the server team sprints in the next few weeks, but please consider these individual issues open to feedback, discussion and contribution.

Otherwise closing this RFC with the plan to keep the individual issues updated.

@sassela sassela closed this Jun 28, 2021