Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CBRD-23850] coredump occurs when heap_get_visible_version_from_log() function is called after executing backupdb with -r option. #2575

Merged
merged 2 commits into from
Feb 24, 2021

Conversation

hornetmj
Copy link
Contributor

http://jira.cubrid.org/browse/CBRD-23850

When executing backupdb with the -r option, the archive log volume referenced by the transaction(mvcc) is not considered. So, coredump occurs when the transaction reads the old image of mvcc after performing the backupdb. There is a test scenario for this in the comments of the related issue. Looking at that scenario will help you understand. (I defined this problem as PROBLEM-1 in the jira issue.)

… function is called after executing backupdb with -r option.
@hornetmj hornetmj added the bug label Feb 18, 2021
@hornetmj hornetmj added this to the elderberry milestone Feb 18, 2021
@hornetmj hornetmj self-assigned this Feb 18, 2021
@@ -6177,6 +6179,14 @@ logpb_remove_archive_logs (THREAD_ENTRY * thread_p, const char *info_reason)

last_deleted_arv_num--;

vacuum_first_pageid = vacuum_min_log_pageid_to_keep (thread_p);
Copy link
Contributor

@wrlawodms wrlawodms Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay not to check vacuum_is_safe_to_remove_archives() here? This is used in logpb_remove_archive_logs_exceed_limit() .

Copy link
Contributor Author

@hornetmj hornetmj Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After vacuum_boot() is called, I think there is no problem with calling vacuum_min_log_pageid_to_keep(). Do you have any other opinions on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. logpb_remove_archive_logs_exceed_limit() seems to be also called after vacuume_boot().
  2. In my opinion, if it is missed in logpb_remove_archive_logs(). This function has to be always called when it is guaranteed that vacuum_Data.is_archive_removal_safe is set to true. It seems more stable to check it. Or you can assert() instead of my suggestion if the condition is pre-condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The logpb_remove_archive_logs_exceed_limit() function is called from logpb_archive_active_log().
  • So it can be called even before vacuum_boot().
  • Therefore, I think vacuum_is_safe_to_remove_archives() must be called in order to operate after vacuum_boot().
  • However, the logpb_remove_archive_logs() function does not seem to be a problem because it is a function that is called at runtime after server boot. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about replacing all codes (line 6172 ~ 6193) that remove archive log to logpb_remove_archive_logs_exceed_limit() function ?
I think the function is safe, because it have routine such as checking "force remove archive" option, "vacuum min log" and "system crash min log" before removing archive logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhoh3963 Maybe it will work fine I think. However, running 'backupdb -r' will depend on the "log_max_archives" and "force_remove_log_archives" system properties. So, I think this is not a good choice. Nevertheless, if I need to replace it with the logpb_remove_archive_logs_exceed_limit() function, I think we need to reconsider the definition of the -r option. What do other reviewers(@joohok @wrlawodms) think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think vacuum_is_safe_to_remove_archives() is better than prm_get_bool_value (PRM_ID_DISABLE_VACUUM) for code consistency and readability. So I used it. Please refer to the review. In addition, the vacuum_disable value is being checked by logpb_is_page_in_archive(), so I don't think any issues we discussed will occur. However, I want to call the function vacuum_is_safe_to_remove_archives() for explicit processing.

src/transaction/log_page_buffer.c Outdated Show resolved Hide resolved
@wrlawodms
Copy link
Contributor

wrlawodms commented Feb 19, 2021

  1. The comment for this function should be modified along.
 * NOTE: Archive that are not needed for system crashes are removed.
 *       That these archives may be needed for media crash recovery.
 *       Therefore, it is important that the user copy these archives
 *       to tape. Check the log information file.
 *
 * TODO: Make sure the removed logs have been processed by vacuum.
  1. Could you let me know the reason there's no need to consider the minimum copied page id by copylogdb here?

@hornetmj
Copy link
Contributor Author

hornetmj commented Feb 22, 2021

  1. The comment for this function should be modified along.
 * NOTE: Archive that are not needed for system crashes are removed.
 *       That these archives may be needed for media crash recovery.
 *       Therefore, it is important that the user copy these archives
 *       to tape. Check the log information file.
 *
 * TODO: Make sure the removed logs have been processed by vacuum.
  1. Could you let me know the reason there's no need to consider the minimum copied page id by copylogdb here?
  1. I'll fix the comment.
  2. Your comments are not included in the purpose of this PR. If necessary, I will process it with another PR.

@hornetmj hornetmj merged commit 61549b9 into CUBRID:develop Feb 24, 2021
hornetmj added a commit to hornetmj/cubrid that referenced this pull request Mar 11, 2021
… function is called after executing backupdb with -r option. (CUBRID#2575)
hornetmj added a commit to hornetmj/cubrid that referenced this pull request Mar 11, 2021
… function is called after executing backupdb with -r option. (CUBRID#2575)
hornetmj added a commit to hornetmj/cubrid that referenced this pull request Mar 11, 2021
… function is called after executing backupdb with -r option. (CUBRID#2575)
hornetmj added a commit that referenced this pull request Mar 12, 2021
hornetmj added a commit that referenced this pull request Mar 12, 2021
hornetmj added a commit that referenced this pull request Mar 12, 2021
@hornetmj hornetmj deleted the CBRD-23850 branch March 30, 2021 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants