Handle timed out tests in CI / display reasonable data about failed tests in CI #3963

Kouprin · 2021-02-16T11:53:54Z

If any test in CI gets stuck, Buildkite terminates the CI run after 1 hour. After that, the only way to find out what happened is to:

1. Download stderr.
2. Find the last line of the form "test <test_name> ... test <test_name> has been running for over 60 seconds".
3. <test_name> is the test that timed out.
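
For illustration, the manual lookup described above could be scripted roughly as follows. This is only a sketch: it assumes the log line has exactly the shape quoted above, and the function name `last_slow_test` is hypothetical, not something that exists in the repository.

```rust
/// Marker text taken from the log line quoted in this issue.
const SLOW_MARKER: &str = " has been running for over 60 seconds";

/// Hypothetical helper: returns the name of the last test reported as slow
/// in `log`, or `None` if no such line exists.
fn last_slow_test(log: &str) -> Option<&str> {
    // Walk the log backwards so the most recent report wins.
    log.lines().rev().find_map(|line| {
        // Keep only the text before the marker,
        // e.g. "test a::b ... test a::b".
        let (head, _) = line.split_once(SLOW_MARKER)?;
        // The test name follows the last "test " prefix on the line.
        let (_, name) = head.rsplit_once("test ")?;
        Some(name.trim())
    })
}

fn main() {
    // Made-up log contents, for demonstration only.
    let log = "test a::fast ... ok\n\
               test a::stuck ... test a::stuck has been running for over 60 seconds\n";
    println!("{:?}", last_slow_test(log));
}
```

This only automates the grep; it does not make the stuck test fail any sooner.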

This is a very inconvenient way to get basic information about failed tests. I suggest making tests handle their own timeouts, as the catching_up.rs tests do, OR (preferably) terminating them automatically after 60 seconds. I also propose a recommendation "if the test may work >60 seconds, put it to the Nightly, not CI". Since we can already detect that a test has been running for >60 seconds, terminating it should not be a big deal. :)

Ideally, I'd love to see a friendly interface that shows which tests failed, without grepping logs for specific substrings. Finding failed tests is a routine operation that has to be done many times per day, on each unsuccessful CI run.

bowenwang1996 · 2021-02-17T01:14:27Z

@matklad do you have any good suggestions for this issue?

matklad · 2021-02-17T09:55:27Z

I have a negative result here: in Rust, only cooperative cancellation is possible. That is, each test should enforce its own timeout (but of course we can abstract this in some kind of library function).

So I'd treat a hanging test as a bug in the test itself, and would fix the test to fail with a timeout error, before fixing the actual bug being exposed.
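
A minimal sketch of what such a library function might look like, using only the standard library. The name `run_with_timeout` is hypothetical. Note the limitation that follows from cooperative cancellation: on timeout the calling test fails, but the worker thread is not stopped and keeps running until the process exits.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `body` on a separate thread and panics if it
/// does not finish within `limit`. The worker thread is NOT killed on
/// timeout; it is simply abandoned.
fn run_with_timeout<T, F>(limit: Duration, body: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        // The receiver may be gone after a timeout, so ignore send errors.
        let _ = tx.send(body());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => value,
        Err(mpsc::RecvTimeoutError::Timeout) => {
            panic!("test timed out after {:?}", limit)
        }
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            // The sender was dropped without sending, which means `body`
            // panicked. Re-raise that panic in the calling thread.
            match worker.join() {
                Err(payload) => std::panic::resume_unwind(payload),
                Ok(()) => unreachable!("worker exited without sending a result"),
            }
        }
    }
}

fn main() {
    let sum = run_with_timeout(Duration::from_secs(1), || 2 + 2);
    println!("finished in time: {}", sum);
}
```

Inside a `#[test]` function the body would be wrapped the same way, e.g. `run_with_timeout(Duration::from_secs(60), || { /* test body */ })`, so a hang is reported as an ordinary test failure with the test's name attached.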

We can, of course, have some kind of a wrapper process which kills the whole cargo test if it doesn't observe passed/failed tests within a certain timeout, but my gut feeling is that enforcing timeouts from the inside, while being more work, will help with ironing out subtle concurrency bugs.
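
For comparison, the wrapper-process idea could be sketched like this. Everything here is an assumption made for illustration: the `watch` function and `Outcome` type are hypothetical, only stdout is monitored, and the demo spawns a Unix `sh` command as a stand-in for `cargo test`. Killing the direct child also does not necessarily kill processes it spawned itself, so a real wrapper would need more care.

```rust
use std::io::{self, BufRead, BufReader};
use std::process::{Child, Command, ExitStatus, Stdio};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Result of a watched run (hypothetical type).
#[derive(Debug)]
enum Outcome {
    /// The child closed its stdout and exited on its own.
    Finished(ExitStatus),
    /// No output arrived within the idle limit, so the child was killed.
    /// `last_line` is the last line seen before the stall.
    Stalled { last_line: Option<String> },
}

/// Kills `child` if it prints nothing on stdout for longer than `idle`.
fn watch(mut child: Child, idle: Duration) -> io::Result<Outcome> {
    let stdout = child.stdout.take().expect("child stdout must be piped");
    let (tx, rx) = mpsc::channel();
    // Reader thread: forwards each stdout line to the watchdog loop.
    thread::spawn(move || {
        for line in BufReader::new(stdout).lines() {
            match line {
                Ok(line) => {
                    if tx.send(line).is_err() {
                        break;
                    }
                }
                Err(_) => break,
            }
        }
    });
    let mut last_line = None;
    loop {
        match rx.recv_timeout(idle) {
            Ok(line) => {
                println!("{}", line);
                last_line = Some(line);
            }
            Err(mpsc::RecvTimeoutError::Timeout) => {
                child.kill()?;
                child.wait()?;
                return Ok(Outcome::Stalled { last_line });
            }
            // stdout was closed: the child is done (or about to be).
            Err(mpsc::RecvTimeoutError::Disconnected) => {
                return Ok(Outcome::Finished(child.wait()?));
            }
        }
    }
}

fn main() -> io::Result<()> {
    // Stand-in for `cargo test`: prints one result line, then hangs.
    let child = Command::new("sh")
        .args(["-c", "echo 'test a::fast ... ok'; sleep 30"])
        .stdout(Stdio::piped())
        .spawn()?;
    println!("{:?}", watch(child, Duration::from_secs(1))?);
    Ok(())
}
```

The last line seen before the stall gives a hint about where the run got stuck, but unlike an in-test timeout it cannot say which assertion or await point was hanging.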

Separately, +1 for "if the test may work >60 seconds, put it to the Nightly, not CI". rust-lang/rust#75752 and rust-lang/rust#64663 (comment) can help with finding existing slow tests.

bowenwang1996 · 2021-06-28T23:45:10Z

Looks like there is no action item here. Closing

Kouprin assigned bowenwang1996 Feb 16, 2021

Kouprin changed the title ~~Infinite tests execution in CI~~ Handle timed out tests in CI / display reasonable data about failed tests in CI Feb 16, 2021

bowenwang1996 removed their assignment Jun 28, 2021

bowenwang1996 closed this as completed Jun 28, 2021
