Handle timed out tests in CI / display reasonable data about failed tests in CI #3963

Closed
Kouprin opened this issue Feb 16, 2021 · 3 comments

Comments

@Kouprin
Member

Kouprin commented Feb 16, 2021

If any test in CI gets stuck, Buildkite terminates the CI run after 1 hour. After that, the only way to find out what happened is to:

  1. Download stderr.
  2. Find the last line of the form "test <test_name> ... test <test_name> has been running for over 60 seconds".
  3. <test_name> is the test that timed out.
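
The manual steps above can be approximated with a small helper. This is only a sketch: the function name `last_slow_test` is hypothetical, and it assumes the downloaded stderr contains lines in exactly the shape quoted in step 2.

```rust
/// Suffix that marks a slow test in the shape quoted in the steps above.
const MARKER: &str = " has been running for over 60 seconds";

/// Returns the name of the test on the last "has been running" line, if any.
/// Assumes test names contain no spaces.
pub fn last_slow_test(log: &str) -> Option<&str> {
    // Walk the log backwards so the first hit is the latest report.
    log.lines().rev().find_map(|line| {
        let head = line.trim_end().strip_suffix(MARKER)?;
        // The name follows the last "test " that precedes the marker.
        let start = head.rfind("test ")? + "test ".len();
        Some(head[start..].trim())
    })
}
```

Calling `last_slow_test` on the contents of the downloaded stderr file returns `None` when no test was reported as slow, so a stuck run and an ordinary failure can be told apart.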

This is a very inconvenient way to get basic information about failed tests. I suggest making tests handle their own timeouts, as the catching_up.rs tests do, OR (preferably) terminating them automatically after 60 seconds. I also propose a recommendation: "if the test may work >60 seconds, put it to the Nightly, not CI". Since we can already tell that a test has been running for more than 60 seconds, terminating it should not be a big deal. :)

Ideally, I'd love to see a friendly interface that shows which tests failed, without grepping logs for specific substrings. Finding failed tests is a routine operation that is done many times per day, on each unsuccessful CI run.

Kouprin changed the title from "Infinite tests execution in CI" to "Handle timed out tests in CI / display reasonable data about failed tests in CI" on Feb 16, 2021
@bowenwang1996
Collaborator

@matklad do you have any good suggestions for this issue?

@matklad
Contributor

matklad commented Feb 17, 2021

I have a negative result here: in Rust, only cooperative cancellation is possible. That is, each test should enforce its own timeout (but of course we can abstract this in some kind of library function).

So I'd treat a hanging test as a bug in the test itself, and would fix the test to fail with a timeout error, before fixing the actual bug being exposed.
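
A minimal sketch of what such a library function might look like, using only the standard library. The name `with_timeout` and its signature are assumptions for illustration, not an existing helper in this repository. The body runs on a separate thread and the caller waits for the result with a deadline. Because cancellation is cooperative, a body that hangs keeps running in the background; the helper only makes the test fail quickly with a readable error.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Runs `body` on a helper thread and waits at most `limit` for its result.
/// Hypothetical helper: the hung thread is NOT killed, it is only abandoned.
pub fn with_timeout<T, F>(limit: Duration, body: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the deadline passed.
        let _ = tx.send(body());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("test body did not finish within {:?}", limit))
        }
        // The sender was dropped without sending, so the body panicked.
        Err(RecvTimeoutError::Disconnected) => Err("test body panicked".to_string()),
    }
}
```

Inside a `#[test]` function, a call such as `with_timeout(Duration::from_secs(60), || run_scenario()).unwrap()` (where `run_scenario` stands in for the real test body) would turn a hang into an ordinary failure that names the test.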

We can, of course, have some kind of a wrapper process which kills the whole cargo test if it doesn't observe passed/failed tests within a certain timeout, but my gut feeling is that enforcing timeouts from the inside, while being more work, will help with ironing out subtle concurrency bugs.
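
For comparison, the wrapper-process idea could be sketched roughly as follows. Everything here is an assumption for illustration: the names `run_with_watchdog` and `Outcome` are hypothetical, and progress is detected by looking for stdout lines ending in `... ok` or `... FAILED`. The sketch kills only the direct child, so a real wrapper around `cargo test` would probably also need to terminate the test binaries that cargo spawns.

```rust
use std::io::{self, BufRead, BufReader};
use std::process::{Command, ExitStatus, Stdio};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

#[derive(Debug)]
pub enum Outcome {
    /// The command closed stdout and exited on its own.
    Finished(ExitStatus),
    /// No test result was seen for `idle`, so the child was killed.
    KilledAfterIdle,
}

/// Runs `cmd` and kills it if no passed/failed line appears within `idle`.
pub fn run_with_watchdog(mut cmd: Command, idle: Duration) -> io::Result<Outcome> {
    let mut child = cmd.stdout(Stdio::piped()).spawn()?;
    let stdout = child.stdout.take().expect("stdout was piped");
    let (tx, rx) = mpsc::channel::<()>();

    // Reader thread: echo every line and signal progress on each test result.
    thread::spawn(move || {
        for line in BufReader::new(stdout).lines() {
            let line = match line {
                Ok(line) => line,
                Err(_) => break,
            };
            println!("{}", line);
            let is_result = line.ends_with("... ok") || line.ends_with("... FAILED");
            if is_result && tx.send(()).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which the watchdog sees as "stdout closed".
    });

    loop {
        match rx.recv_timeout(idle) {
            Ok(()) => continue, // progress observed, restart the idle timer
            Err(RecvTimeoutError::Disconnected) => {
                return Ok(Outcome::Finished(child.wait()?));
            }
            Err(RecvTimeoutError::Timeout) => {
                let _ = child.kill(); // the child may have exited already
                child.wait()?;
                return Ok(Outcome::KilledAfterIdle);
            }
        }
    }
}
```

The last line echoed before `KilledAfterIdle` indicates roughly where the run stalled, but unlike an in-test timeout it does not say why.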

Separately, +1 for "if the test may work >60 seconds, put it to the Nightly, not CI". There are rust-lang/rust#75752 and rust-lang/rust#64663 (comment) which can help with finding out existing slow tests.

bowenwang1996 removed their assignment on Jun 28, 2021
@bowenwang1996
Collaborator

Looks like there is no action item here. Closing.
