Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign receipt/tx trie generation on block imports #565

Merged

Conversation

gsalgado
Copy link
Collaborator

We used to generate the tries for all block's transactions/receipts
after applying every transaction, but that is way too expensive. Now the
trie data is only generated once, after we've applied all transactions

@gsalgado gsalgado force-pushed the block-import-trie-generation-redesign branch from a43ff3e to 186d898 Compare April 17, 2018 16:04
@@ -132,6 +133,7 @@ def execute_bytecode(self,
# Mining
#
def import_block(self, block):
self.receipts = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stood out to me. What if instead of tracking this state we instead returned the receipt from apply_transaction and accumulated the receipts in the loop here

It looks like this change goes well with the changes within the VMState class. It looks like we could remove the need for that class to be passed the receipts since I think this changes things so that the only thing it needs to know is the gas_used value from the latest receipt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but should we then pass gas_used as an argument to __init__() and keep updating it in add_transaction()? Or did you have something else in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly not sure. Nothing specific in mind for exact implementation but starting with gas_used being passed into __init__ seems like a good start. It'd be nice if it didn't need to be mutated but i'm more ok with mutating that value than the receipt storage on the VM instance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we need gas_used in BaseState so I'm passing it via __init__() and updating it on BaseState.add_transaction(), but in BaseVM we also need that value in the .state @Property, which is currently looking that up from BaseVM.receipts but we could instead have a BaseVM.gas_used and update that on BaseVM.apply_transaction(). Would you prefer that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds viable. We can continue to iterate on this to I'd recommend just picking something that seems good enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually need to keep the receipts somewhere in order to generate the receipt trie

@@ -266,6 +275,8 @@ def validate_block(self, block):
)
)

# FIXME: make_trie_root_and_nodes() is rather expensive, and we already run that once in
# import_block(), so should refactor some of this code to avoid re-generating it here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like we could fix this with some caching on the make_trie_root_and_nodes function (or maybe a wrapper around that function which implements a cache).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess #573 might still be useful!

evm/vm/base.py Outdated
@@ -459,7 +459,7 @@ def get_state_class(cls):
return cls._state_class

@classmethod
def get_state(cls, chaindb, block, prev_hashes, receipts=None):
def get_state(cls, chaindb, block, prev_hashes, gas_used=0):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like gas_used shouldn't have a default value.

@@ -77,14 +80,12 @@ def apply_transaction(self, transaction):
"""
Apply the transaction to the vm in the current block.
"""
computation, block, trie_data_dict = self.state.apply_transaction(
computation, block, receipt = self.state.apply_transaction(
Copy link
Contributor

@hwwhww hwwhww Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder: the reason why apply_transaction returned trie_data_dict is for the stateless client to generate witness, which was discussed here: #247 (comment)

But since the sharding roadmap was (largely) changed, now the stateless client implementation won't happen in couple months, so I don't have a strong opinion about the change. Hope that we will find a solution to solve in both stateless/non-stateless clients when we reimplement stateless client. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have the VM incrementally generate the trie data, I suppose? The problem was that it was re-generating it from scratch for every transaction applied

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except it wouldn't make the VM class stateless, so I guess that wouldn't really buy us anything

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hwwhww we'll continue to move towards stateless clients but for now going to prioritze full sync and then we'll figure out how to make it work with stateless clients also.

@gsalgado gsalgado force-pushed the block-import-trie-generation-redesign branch 3 times, most recently from 6d1a925 to 8da75f9 Compare April 18, 2018 08:59
@hwwhww
Copy link
Contributor

hwwhww commented Apr 18, 2018

The second thoughts of this PR:

[Previously]

  • The transaction and receipt roots are calculated in State.add_transaction().
  • Each time VM.apply_transaction() is called, the vm.block.header is updated with the latest transaction_root and receipt_root.

[Now]

  • State.add_transaction() returns receipt.
  • The transaction and receipt roots are calculated in VM.import_block(self, block).
  • Each time VM.apply_transaction() is called, the vm.block.header isn't updated with the latest transaction_root and receipt_root.

I took a look at vm.mine_block() which hasn't been changed in this PR:

https://github.com/gsalgado/py-evm/blob/bceb324d7f33b4d41aa1cead97229414021906b7/evm/vm/base.py#L164-L176

I think now the vm.mine_block and chain.mine_block() don't work as what it used to be: return a mined block, because the current block (vm.block) hasn't been updated with correct transaction_root and receipt_root.

The fastest way to solve it may be calculating the roots in vm.mine_block() instead of in vm.import_block(), and we still lost the property of "having vm.block always has the latest roots info".

p.s. I think that's the reason why tests/core/vm/test_vm.py::test_get_cumulative_gas_used went wrong (#564).

@gsalgado
Copy link
Collaborator Author

#573 has an optimization to make_trie_root_and_nodes() that we could do instead of what's suggested here. I'd need to profile it more thoroughly to make sure it's good enough but I'd be happy to go with that if you guys think that's preferable

@hwwhww
Copy link
Contributor

hwwhww commented Apr 18, 2018

Oooooh, correct my own words: for the witness, we don't need trie_data_dict, only need the state trie branches, the main difference between is do we expect state.apply_transaction() returns the block with latest header or not.

If #573 doesn't work well, maybe we could redesign the APIs to be "for importing a block (only generating the root once per block)" and "for creating a block (generating the root per transaction?)". Maybe divide state.apply_transaction into multiple functions.

@gsalgado gsalgado changed the title wip: redesign receipt/tx trie generation on block imports Redesign receipt/tx trie generation on block imports Apr 18, 2018
@gsalgado gsalgado force-pushed the block-import-trie-generation-redesign branch from bceb324 to 283c768 Compare April 18, 2018 17:00
evm/vm/base.py Outdated
@@ -156,6 +159,13 @@ def mine_block(self, *args, **kwargs):
Mine the current block. Proxies to self.pack_block method.
"""
block = self.block
tx_root_hash, tx_kv_nodes = make_trie_root_and_nodes(block.transactions)
receipt_root_hash, receipt_kv_nodes = make_trie_root_and_nodes(self.receipts)
trie_data = merge(tx_kv_nodes, receipt_kv_nodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably unimportant: for performance optimization, this line could be removed, and just add them separately:

self.chaindb.persist_trie_data_dict(tx_kv_nodes)
self.chaindb.persist_trie_data_dict(receipt_kv_nodes)

We used to generate the tries for all block's transactions/receipts
after applying every transaction, but that is way too expensive. Now the
trie data is only generated once, after we've applied all transactions
It was no longer used anywhere and it doesn't work because our VM is no
longer stateless
@gsalgado gsalgado force-pushed the block-import-trie-generation-redesign branch from 283c768 to acfd862 Compare April 18, 2018 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants