Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stress test failure "Corruption: checksum mismatch" or "Iterator Diverged" with async_io enabled #10032

Closed

Conversation

akankshamahajan15
Copy link
Contributor

@akankshamahajan15 akankshamahajan15 commented May 21, 2022

Summary: In case of non sequential reads with async_io, FilePRefetchBuffer::TryReadFromCacheAsync can be called for previous blocks with offset < bufs_[curr_].offset_ which wasn't handled correctly resulting wrong data being returned from buffer.

Since FilePRefetchBuffer::PrefetchAsync can be called for any data block, it sets prev_len_ to 0 indicating FilePRefetchBuffer::TryReadFromCacheAsync to go for the prefetching even though offset < bufs_[curr_].offset_ This is because async prefetching is always done in second buffer (to avoid mutex) even though curr_ is empty leading to offset < bufs_[curr_].offset_ in some cases.
If prev_len_ is non zero then TryReadFromCacheAsync returns false if offset < bufs_[curr_].offset_ && prev_len != 0 indicating reads are not sequential and previous call wasn't PrefetchAsync.

  • This PR also simplifies FilePRefetchBuffer::TryReadFromCacheAsync as it was getting complicated covering different scenarios based on async_io enabled/disabled. If for_compaction is set true, it now calls FilePRefetchBufferTryReadFromCache following synchronous flow as before. Its decided in BlockFetcher.cc

Test Plan:

  1. export CRASH_TEST_EXT_ARGS=" --async_io=1"
    make crash_test -j completed successfully locally
  2. make crash_test -j completed successfully locally
  3. Reran CircleCi mini crashtest job 4 - 5 times.
  4. Updated prefetch_test for more coverage.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

// are not sequential and Non sequential reads will be handled there.
if (!enable_ || (offset < bufs_[curr_].offset_ && async_io_ == false)) {
// requested bytes.
if (bufs_[curr_].buffer_.CurrentSize() > 0 && offset < bufs_[curr_].offset_ &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check basically fixes failure.

@akankshamahajan15 akankshamahajan15 changed the title [WIP] Test CircleCI crash test passing Fix stress test failure Corruption: checksum mismatch with async_io enabled May 22, 2022
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15 akankshamahajan15 changed the title Fix stress test failure Corruption: checksum mismatch with async_io enabled Fix stress test failure "Corruption: checksum mismatch" or "Iterator Diverged" with async_io enabled May 23, 2022
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants