Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chainHead: Report unique hashes for pruned blocks #3667

Merged
merged 19 commits into from
Apr 17, 2024

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Mar 12, 2024

This PR ensures that the reported pruned blocks are unique.

While at it, ensure that the best block event is properly generated when the last best block is a fork that will be pruned in the future.

To achieve this, the chainHead keeps a LRU set of reported pruned blocks to ensure the following are not reported twice:

	 finalized -> block 1 -> block 2 -> block 3
	
	                      -> block 2 -> block 4 -> block 5
	
	           -> block 1 -> block 2_f -> block 6 -> block 7 -> block 8

When block 7 is finalized the branch [block 2; block 3] is reported as pruned.
When block 8 is finalized the branch [block 2; block 4; block 5] should be reported as pruned, however block 2 was already reported as pruned at the previous step.

This is a side-effect of the pruned blocks being reported at level N - 1. For example, if all pruned forks would be reported with the first encounter (when block 6 is finalized we know that block 3 and block 5 are stale), we would not need the LRU cache.

cc @paritytech/subxt-team

Closes #3658

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv added A1-insubstantial Pull request requires no code review (e.g., a sub-repository hash update). I2-bug The node fails to follow expected behavior. D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. labels Mar 12, 2024
@lexnv lexnv self-assigned this Mar 12, 2024
@lexnv lexnv added the R0-silent Changes should not be mentioned in any release notes label Mar 12, 2024
@lexnv lexnv requested review from skunert and davxy March 13, 2024 11:29
@jsdw
Copy link
Contributor

jsdw commented Mar 13, 2024

I had assumed that ensuring pruned hashes were unique was as simple as using a HashSet or something when building the pruned list. That would ensure the blocks in a single pruned list were unique.

Am I right in my understanding that the extra bits in this PR are to also ensure that we don't see the same block hash mentioned again across different pruned lists, too?

@lexnv
Copy link
Contributor Author

lexnv commented Mar 13, 2024

Yep, I initially thought we could get away with a simple HashSet.

However, we might prune a fork that contains another fork:

fork -> A
   |
     -> B -> C

Chain:
        X ... Y ... Z

In this case, when we finalize Y Substrate reports A as pruned. We'll go the path [fork .. A] and report those blocks.
When we finalize Z Substrate reports C as pruned. We'll go again on the path [fork .. B C]. We'd need to ignore the common blocks that we've just reported with [fork .. A].

Copy link
Contributor

@skunert skunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks correct, only some nits in the comments 👍

Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Comment on lines 515 to 519
// Best block already generated.
if block_cache == last_finalized {
events.push(finalized_event);
return Ok(events);
}
Copy link
Contributor

@jsdw jsdw Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check my understanding, this is all abotu addressing this bit of the spec:

The current best block, in other words the last block reported through a bestBlockChanged event, is guaranteed to either be the last item in finalizedBlockHashes, or to not be present in either finalizedBlockHashes or prunedBlockHashes.

Here it looks like when a new last finalized block is seen, we:

  • Return happily if the best block was finalized.
  • Else, return a new BestBlockChanged event if the current best block has been pruned.
  • Else, return a new BestBlockChanged event if the current best block will be pruned (ie is not a descendant of the new finalized chain)

To satisfy the spec, could the whole thing be simplified to pseudocode like this?:

if let Some(current_best_block) = self.best_block_cache {
    // current best block is in pruned list, so need to emit new BestBlock
    let is_in_pruned_list = pruned_block_hashes.iter().any(|hash| *hash == current_best_block);
    // current best block is not the last one we finalized
    let is_not_last_finalized = current_best_block != last_finalized;

    if is_in_pruned_list || is_not_last_finalized {
        let new_best_block = self.client.info().best_hash;
        events.push(BestBlockChanged { best_block_hash })
    }
}

events.push(finalized_event);
Ok(events)

In any case, I wonder whether we could avoid some of the early returns and such, and separate it so that we first write the logic to decide whether we need to push a new BestBlockChanged event and then do the push, with a single return at the end to hand back the events?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes sense!

I think I was more worried about cases where we've already generated a BestBlockChanged event for a descendant for the finalized block.

However, I think this is safe to regenerate a BestBlockChanged with an older block number, as long as it is the last reported finalized, from this spec statement:

The current best block, in other words the last block reported through a bestBlockChanged event, is guaranteed to either be the last item in finalizedBlockHashes

The suggestion simplifies the code quite a bit, thanks! 🙏

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…que-pruned-blocks

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…que-pruned-blocks

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5843121

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv requested review from skunert and jsdw April 9, 2024 14:43
self.submit_events(&startup_point, stream.boxed(), pruned_forks, sink, sub_data.rx_stop)
.await
// These are the pruned blocks that we should not report again.
for pruned in pruned_forks {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure I understand this correctly. When we generate the initial events we consider all blocks that are part of a fork that started below the finalized one as pruned, even if they are technically not pruned yet. So these will never be reported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I can't remember exactly, this was introduced quite a while ago to handle the case where the Finalized event would contain a pruned block not reported by a new block event.

This abuses the purpose of self.pruned_blocks, which now holds:

  • (previously stored in to_ignore) forks that we did not report previously by the Initialized event
  • (the intended purpose of this PR) pruned blocks already reported once

hashes

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@@ -3644,3 +3644,364 @@ async fn chain_head_limit_reached() {
// Initialized must always be reported first.
let _event: FollowEvent<String> = get_next_event(&mut sub).await;
}

#[tokio::test]
async fn follow_unique_pruned_blocks() {
Copy link
Contributor

@jsdw jsdw Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This test is quite long, and looking at it, I wonder whether we can extract a couple of helper fns or something so that it's easier to see what's actually beign tested vs all of the staet we're creating.

Like maybe a couple of helpers like this would be handy, but not sure if there are differences tha would make it not-very-useful:

async fn import_block(client, parent_hash, parent_num) -> block {
        let block = BlockBuilderBuilder::new(&*client)
		.on_parent_block(client.chain_info().genesis_hash)
		.with_parent_block_number(0)
		.build()
		.unwrap()
		.build()
		.unwrap()
		.block;
	let block_1_hash = block_1.hash();
	client.import(BlockOrigin::Own, block_1.clone()).await.unwrap();
        block
}

Copy link
Contributor

@jsdw jsdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me; good job Alex!

@lexnv lexnv added this pull request to the merge queue Apr 17, 2024
Merged via the queue into master with commit bfbf7f5 Apr 17, 2024
133 of 137 checks passed
@lexnv lexnv deleted the lexnv/chainhead-unique-pruned-blocks branch April 17, 2024 15:58
sandreim pushed a commit that referenced this pull request Apr 18, 2024
This PR ensures that the reported pruned blocks are unique.

While at it, ensure that the best block event is properly generated when
the last best block is a fork that will be pruned in the future.

To achieve this, the chainHead keeps a LRU set of reported pruned blocks
to ensure the following are not reported twice:

```bash
	 finalized -> block 1 -> block 2 -> block 3
	
	                      -> block 2 -> block 4 -> block 5
	
	           -> block 1 -> block 2_f -> block 6 -> block 7 -> block 8
```

When block 7 is finalized the branch [block 2; block 3] is reported as
pruned.
When block 8 is finalized the branch [block 2; block 4; block 5] should
be reported as pruned, however block 2 was already reported as pruned at
the previous step.

This is a side-effect of the pruned blocks being reported at level N -
1. For example, if all pruned forks would be reported with the first
encounter (when block 6 is finalized we know that block 3 and block 5
are stale), we would not need the LRU cache.

cc @paritytech/subxt-team  

Closes #3658

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
github-merge-queue bot pushed a commit that referenced this pull request Apr 18, 2024
This PR stabilizes the chainHead API to version 1.

Needs:
- #3667

cc @paritytech/subxt-team

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A1-insubstantial Pull request requires no code review (e.g., a sub-repository hash update). D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I2-bug The node fails to follow expected behavior. R0-silent Changes should not be mentioned in any release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RPC-Spec-V2] chainHead: Ensure unique reproted pruned blocks
4 participants