Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored VM, VMState, and Block + new apply_transaction #247

Merged
merged 22 commits into from
Jan 13, 2018

Conversation

hwwhww
Copy link
Contributor

@hwwhww hwwhww commented Jan 3, 2018

What was wrong?

For #195: pure apply_transaction + idea of #236

How was it fixed?

The original goal is to implement:

state_obj', reads, writes = apply_transaction(stateobj, transaction, blockdata, db)

Now we have:

            vm_state = cls.get_state_class()(
                chaindb=witness_db,
                block_header=block.header,
                prev_headers=prev_headers,
                receipts=receipts,
            )
            computation, result_block, _ = vm_state.apply_transaction(
                transaction=transaction,
                block=block,
                is_stateless=True,
            )
  • computation.access_logs.reads and computation.access_logs.reads are the reads and writes sets.
  • witness_db is db.
  • stateobj information would be acquired from vm_state and block
  • blockdata information would be be acquired from block

1. Made Block RLP object dumb

2. Cherry-picked TrackedDB from sharding branch

3. Refactored VM, VMState, and Block

  1. Added VMState forks
  2. Set computation_class in VMState and moved VM.get_computation() to VMState.get_computation()
  3. From Block to VM
    • Moved Block.mine(**kwargs) into VM.pack_block(block, *args, **kwargs)
  4. From Block to VMState
    • Moved VM.execute_transaction(transaction) function into VMState.execute_transaction(transaction)
  5. From Block to VMState
    • Moved Block.make_receipt(transaction, computation) into VMState.make_receipt(transaction, computation)
  6. Refactored and redesigned apply_transaction and add_transaction
    1. Moved VM.apply_transaction(transaction) function into VMState.apply_transactionn(transaction)
    2. Moved Block.add_transaction(transaction, computation) into VMState. add_transaction(vm_state, transaction, computation)
    • I keep old logic for verifying the “stateless” state transition works. Basically, the stateless mode is set on VM._is_stateless flag; if VM.is_stateless is True, use witness data as chaindb; otherwise, use Chain.chaindb as before.
    • Assigned Block.db = BaseChainDB(MemoryDB()).db, only for generating transaction_root and receipt_root in VMState level.
    • VMState.apply_transaction returns the updated block and trie_data to VM.apply_transaction, where trie_data is the key-value list of transactions and receipts data that should be store in VM.chaindb for non-stateless clients.
    • If apply_transaction is tiggered by Chain, the path would be:
Chain.apply_transaction(transaction) -> return computation, self.block
    └─VM.apply_transaction(transaction) 
        └─[is_stateless]
            └─VMState.apply_transaction(vm_state, transaction, block, is_stateless, witness_db) -> return computation, block, trie_data
                └─VMState.execute_transaction(transaction) -> return computation, block_header
                    └─Computation.apply_message(message) or Computation.apply_create_message(message) -> return computation
                └─VMState.add_transaction(vm_state, transaction, computation, block) -> return block, trie_data
        └─[not is_stateless]
            └─VMState.apply_transaction(vm_state, transaction, block, is_stateless, witness_db)
                └─VMState.execute_transaction(transaction) -> return computation, block_header
                    └─Computation.apply_message(message) or Computation.apply_create_message(message) -> return computation
            └─VM.add_transaction(transaction, computation) -> return computation, None, None
                └─VMState.make_receipt(transaction, computation) -> return receipt

4. Add AccessLogs class to hold reads and writes logs from TrackedDB.

5. Create block (unfinished)

  • To verify and demonstrate stateless state transition, there’s a new function @classmethod create_block(transactions, transaction_witnesses, prev_state_root, parent_header, coinbase) to create a block with the given transaction_witness.
  • Please see tests/core/vm/test_vm.py::test_create_block.

6. Added VMState.prev_headers class instance, now all the parent header queries in VMState level are using prev_header data.

7. Storing receipts in VMState.receipts and rebuilding the trie root in every apply_transaction

8. Refactored reward methods: moved from base.py to Frontier

9. Moved VMState.get_parent_header(block_header, db) and VMState.get_prev_headers(last_block_hash, db) into standalone functions

p.s. Sorry for that I reset and squash some commits and then my git mv turns into mv, so it's unclear to diff evm/vm/vm_state.py and evm/vm_state.py.

Cute Animal Picture

cute


TODOs for other PRs

  1. Decouple VMState.block_header the VMState properties (blockhash, timestamp, block_number, difficulty, and gas_limit), because shard chains use period_start_prevhash block properties of main chain in contracts, but VMState.state_db() should use VMState.collation_header's state_root.
  2. Remove is_stateless flag, make it only one route.

I will open new issues later.

@hwwhww hwwhww changed the title Block refactoring and new apply_transaction Refactored VM, VMState, and Block + new apply_transaction Jan 3, 2018
@property
def precompiles(self):
if self._precompiles is None:
return set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a dict instead of a set

evm/db/state.py Outdated
@@ -49,6 +51,7 @@ def __init__(self, db, root_hash=BLANK_ROOT_HASH, read_only=False):
self.db = ImmutableDB(db)
else:
self.db = db
self.db = TrackedDB(self.db)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of overwriting self.db here maybe move this up into the previous if/else


class BaseBlock(rlp.Serializable):
db = None

def __init__(self, header, transactions=None, uncles=None):
self.db = BaseChainDB(MemoryDB()).db # for generating transaction_root and receipt_root
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the need for this. Where is it used in the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In non-stateless mode, the transaction trie is rebuilt via BaseChainDB.add_transaction( block_header, index_key, transaction):

    def add_transaction(self, block_header, index_key, transaction):
        transaction_db = Trie(self.db, root_hash=block_header.transaction_root)
        transaction_db[index_key] = rlp.encode(transaction)
        return transaction_db.root_hash

And it is triggered in VM.add_transaction(transaction, computation):
https://github.com/hwwhww/py-evm/blob/ec9ff9c63ff96e484150bc62ab862aed5505a03d/evm/vm/base.py#L79

For stateless mode, BaseBlock.db is only used for storing transaction trie and receipt trie in VMState level
during create_block because I'd want to remove chaindb from VMState eventually.

In VMState.add_transaction:
https://github.com/hwwhww/py-evm/blob/ec9ff9c63ff96e484150bc62ab862aed5505a03d/evm/vm_state.py#L223

        # Get trie roots and changed key-values.
        tx_root_hash, tx_db = cls.add_trie_node_to_db(
            block.header.transaction_root,
            index_key,
            transaction,
            block.db,
        )
        receipt_root_hash, receipt_db = cls.add_trie_node_to_db(
            block.header.receipt_root,
            index_key,
            receipt,
            block.db,
        )

And then insert the transaction to block.db:
https://github.com/hwwhww/py-evm/blob/ec9ff9c63ff96e484150bc62ab862aed5505a03d/evm/vm_state.py#L247-L254

    @staticmethod
    def add_trie_node_to_db(root_hash, index_key, node, db):
        """
        Add transaction or receipt to the given db.
        """
        trie_db = Trie(db, root_hash=root_hash)
        trie_db[index_key] = rlp.encode(node)
        return trie_db.root_hash, trie_db.db

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can take a different approach to decouple some of this.

The mk_receipt_root function in the pyethereum codebase is probably the utility we need to bring over.

Then, rather than keeping track of the trie for receipts and transactions we could very easily just rebuild it on demand each time we add a new transaction/receipt. Thoughts?

Copy link
Contributor Author

@hwwhww hwwhww Jan 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did rewrite mk_receipt_root for debugging during implementing, it would be much cleaner if we use mk_receipt_root to get root. The trade-offs are:

  1. Since we want to get the "current block" after each time we call VMState.apply_transaction, if import_block is called, the trie-rebuilding will insert all the previous receipts of this block to the trie during each VMState.apply_transaction calling. Would it be more inefficient?

  2. I find that if we keep the current code, maybe we don't need to store VMState.receipts in VMState level. I'm still checking if it's true, right now it seems that we only require maintaining VMState.receipts for rebuilding receipt trie.

If 2 is true, it won't be a big deal to maintain VMState.receipts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm comfortable with the overhead with rebuilding the trie each time. If the overhead turns out to be significant we can address it at that time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way, you do what you think is best. This idea isn't coming from me having a fully formed solution, but rather trying to further simplify the Block object and remove some coupling and database access. It may very well be that it doesn't provide a worthwhile improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree your point. Either is fine with me for now, let me figure out it after more refactorings of create_block.

evm/vm/base.py Outdated
if unknown_fields:
raise AttributeError(
"Unable to set the field(s) {0} on the `BlockHeader` class. "
"Received the following unexpected fields: {0}.".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The {0} here should be {1} to correctly format the string.

evm/vm/base.py Outdated
def create_block(
cls,
transactions,
transaction_witnesses,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this is an improvement but...

If I understand correctly, we require that len(transaction) == len(transaction_witnesses). Rather than passing these in as separate lists, it might be good to use a data structure like the following:

transactions_and_witnesses = (
    (transaction_0, transaction_witness_0),
    (transaction_1, transaction_witness_1),
    (transaction_2, transaction_witness_2),
    ...
)

This approach builds in some minimal validation since you can then loop over them like this:

for index, (transaction, transaction_witness) in enumerate(transactions_and_witnesses):
    ...

Not sure this is a huge improvement, but since the two arguments are inherently linked together, it might be nice to accept them already linked.

Copy link
Contributor Author

@hwwhww hwwhww Jan 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do require that len(transaction) == len(transaction_witnesses).
From sharding doc:
Transaction package format

    [
        [nonce, acct, data....],    # transaction body (see below for specification)
        [node1, node2, node3....]   # witness
    ]

I will modify it to the form you suggested, thank you. :)

evm/vm_state.py Outdated
"""
return self.get_block_header_by_hash(block_header.parent_hash)

def is_key_exsits(self, key):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling/typo

evm/vm_state.py Outdated
def get_computation(self, message):
"""Return state object
"""
if self.computation_class is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly more readable version of this.

if self.computation_class is None:
    raise AttributeError("No `computation_class` has been set for this VMState")
else:
    return self.computation_class(self, message)

evm/vm_state.py Outdated
vm_state,
transaction,
block,
is_stateless=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a route for us to remove the need for this flag. We can have VM.apply_transaction always be pure/stateless, and then modify Chain.apply_transaction to update the local vm state before returning the computation object.

Assuming there is nothing wrong with that approach I like it a lot more than using a flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree! Now the non-stateless mode is only for verifying the stateless mode result in case I screw up. Could we remove the non-stateless mode later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, doing it separately is fine. Mind opening an issue to track it and linking it here.

evm/vm_state.py Outdated
receipt,
block.db,
)
trie_data = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace these three lines (and remove the mutation) with:

trie_data = cytoolz.merge(tx_db.wrapped_db.kv_store, receipt_db.wrapped_db.kv_store)

setup.py Outdated
@@ -25,7 +25,7 @@
"py-ecc==1.4.2",
"rlp==0.4.7",
"eth-keys==0.1.0b3",
"trie>=0.3.2",
"trie==0.3.2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the breakage with 1.0.0? I thought it was backwards compatible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't dug into the root cause. The error message is like:
https://travis-ci.org/hwwhww/py-evm/jobs/324634985

tests/json-fixtures/test_state.py:247: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
evm/vm/base.py:39: in __init__
    self.block = block_class.from_header(header=header, chaindb=self.chaindb)
evm/vm/forks/frontier/blocks.py:94: in from_header
    transactions = chaindb.get_block_transactions(header, cls.get_transaction_class())
.tox/py35-native-state-byzantium/lib/python3.5/site-packages/eth_utils/functional.py:22: in inner
    return callback(fn(*args, **kwargs))
evm/db/chain.py:163: in get_block_transactions
    if transaction_key in transaction_db:
.tox/py35-native-state-byzantium/lib/python3.5/site-packages/trie/hexary.py:397: in __contains__
    return self.exists(key)
.tox/py35-native-state-byzantium/lib/python3.5/site-packages/trie/hexary.py:106: in exists
    return self.get(key) != BLANK_NODE
.tox/py35-native-state-byzantium/lib/python3.5/site-packages/trie/hexary.py:62: in get
    root_node = self._get_node(self.root_hash)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <trie.Trie object at 0x7f6c2cc46908>, node_hash = None
    def _get_node(self, node_hash):
        if node_hash == BLANK_NODE:
            return BLANK_NODE
        elif node_hash == BLANK_NODE_HASH:
            return BLANK_NODE
    
>       if len(node_hash) < 32:
E       TypeError: object of type 'NoneType' has no len()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because Trie.__init__() doesn't upcall -- it just issues a warning

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #249

@hwwhww
Copy link
Contributor Author

hwwhww commented Jan 9, 2018

Changelog

  1. Added VMState.prev_headers class instance, now all the parent header queries in VMState level are using prev_header data.
  2. Fixed VM.create_block()
  3. Added more tests of test_vm.py and test_vm_state.py
  4. Use VMState.receipts to rebuild the trie root in every apply_transaction - cleaner!
  5. Refactored reward methods: moved from base.py to Frontier
  6. Added sample ReStructuredText comment: a1e13a4

Seeking for @pipermerriam 's advice and wisdom about mutating VMState & chaindb:

In this commit e1c854c, I’m thinking about making VMState & chaindb mutable would let the code simpler.

Because:

  1. I hope to remove copy.deepcopy(vm_state) and copy.deepcopy(witness_db) from VMState.apply_transaction.
  2. I find that we initialize a new vm_state every time before we call VMState.apply_transaction().
  3. If computation.is_error, the chaindb would revert anyway.
    So I think it seems okay to mutate these vm_state and vm.chaindb (== vm.state.chaindb) in apply_transaction in our current use cases?

But, on the other hand:

  1. If the clients want to generate transaction witness and then broadcast the transaction. package with witness, they have to revert themselves or give a copy.deepcopy(vm.chaindb) before calling vm_state.apply_transaction()?
  2. Alternatively, VMState can accept kv_store dict as parameter, and then initialize the VMState.chaindb in VMState Level. kv_store dict equals transaction_witness in VM.create_block function, the dict won’t be too much in VM.create_block since it’s only transaction witness, but will be huge in VM.apply_transaction(). https://github.com/hwwhww/py-evm/blob/81b4b1ae1c854c97540b63ebd41f196c83a35dad/evm/vm/base.py#L255
  3. It’s not on the table now, but my rough though is that separating vm.chaindbfrom VMState entirely would help for parallelizability.

As I saw in create_block, this change won’t affect on stateless clients too much because they use transaction witnesses. But what do you think of the memory issue or other possible issues for non-stateless client? Thank you.

@pipermerriam
Copy link
Member

Both options seem to have comprable trade-offs so I'd say go with whichever seems best.

Allowing the mutation to remain for the time being is fince with the understanding that we'll be looking for a clean way to remove it in the future. I'd rather give us time to figure out a solution we like rather than push forward with a solution we're not sure about.

@hwwhww
Copy link
Contributor Author

hwwhww commented Jan 9, 2018

@pipermerriam
Thank you for sharing your thought, let's keep it mutable now. 👍

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. There's some basic cleanup that can be done, but nothing that changes the overall structure significantly. Given the size of this PR, I'd like to have @gsalgado give it a brief review to make sure I didn't miss something. I'll try to give the tests one more look in the morning when I'm fresh because I know I rushed through them.

evm/vm/base.py Outdated
return cls.get_block_class().from_header(block_header, db)

@staticmethod
def get_parent_header(block_header, db):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this method is purely a passthrough what is the reasoning for including it on the VM class?

evm/vm/base.py Outdated
return db.get_block_header_by_hash(block_header.parent_hash)

@staticmethod
def get_block_header_by_hash(block_hash, db):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this method is purely a passthrough what is the reasoning for including it on the VM class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, I just left them with @classmethod def get_block_by_header(cls, block_header, db). Should I move @staticmethod def get_parent_header(block_header, db) and @classmethod def get_prev_headers(cls, last_block_hash, db) to evm.utils.db?

parent_header,
gas_limit_floor=GENESIS_GAS_LIMIT,
),
timestamp=max(timestamp, parent_header.timestamp + 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that rather than using max here this should throw an exception if timestamp <= parent_header.timestamp. It feels wrong to override the timestamp argument in this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right, it was supposed to be checking current time (time.time()) and parent_header.timestamp + 1 if timestamp is None. I'll update it. Thank you.

compute_difficulty,
parent_header,
timestamp,
coinbase=b'\x35' * 20,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does b'\x35 * 20` come from and can we get it moved into a constant variable somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it.



def generate_header_from_parent_header(
compute_difficulty,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nitpick. Might be a bit clearer that this is a callable if it was named compute_difficulty_fn or something else with a clear indicator that it's a callable.

)
)

# XXX: Should these and some other checks be moved into
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe open an issue to revisit this separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XXX: Should these and some other checks be moved into VM.validate_block(), as they apply to all block flavours?

Ah, these two lines and this function were copied from Block.validate(). VMState.validate_block() became kind of applying to all block flavors now.
Another future issue is that this function would need to be refactored for sharding #256 since there's no Collation.uncle.

evm/vm_state.py Outdated
# def chaindb(self):
# return self._chaindb

def set_chaindb(self, db):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to be used anywhere.

# further modifications can occur using the `State` object after
# leaving the context.
state.db = None
state._trie = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self:

There was a decomission() method added to the state database object to remove the need to access this private variable. We should backport that improvement from the sharding branch.

evm/vm_state.py Outdated
with self.state_db() as state_db:
# first revert the database state root.
state_db.root_hash = state_root
# now roll the underlying database back
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment placement seems wrong, assuming I'm correct that it applies to the next statement self._chaindb.revert(checkpoint_id).

"""
Returns the block header by hash.
"""
for value in self.prev_headers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we could store self.prev_headers as a mapping from hash => header so that we can do O(1) lookups rather than repeatedly enumerating this list to find the necessary header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that I missed that if we store self.prev_headers as a list is good for access block.blockhash(uint blockNumber) in contract code, we could make VMState.get_ancestor_hash(block_number) get block_hash via block_number directly since the prev_headers should be chained already:

    def get_ancestor_hash(self, block_number):
        """
        Return the hash of the ancestor with the given block number.
        """
        ancestor_depth = self.block_header.block_number - block_number - 1
        if (ancestor_depth >= MAX_PREV_HEADER_DEPTH or
                ancestor_depth < 0 or
                ancestor_depth >= len(self.prev_headers)):            
            return b''
        header = self.prev_headers[ancestor_depth]
        return header.hash

Then we get O(1) for BLOCKHASH opcode.

And since VMState has its block properties, we can make VMState.get_parent_header(block_header) into @property parent_header() for
VMState.validate_block(block):

    @property
    def parent_header(self):
        return self.prev_headers[0]

But, unfortunately, VMState.validate_uncle(uncle) still requires calling VMState.get_block_header_by_hash(uncle.parent_hash):
https://github.com/hwwhww/py-evm/blob/81b4b1ae1c854c97540b63ebd41f196c83a35dad/evm/vm/forks/frontier/vm_state.py#L305

Look on the bright side, MAX_UNCLE_DEPTH is 6.

With these changes, I'd vote for accessing via block_number directly for sharding clients. But for non-sharding clients, they may call validate_uncle more than call BLOCKHASH(many-ancestors-ago).

p.s. EIP 96 would make BLOCKHASH very different: ethereum/EIPs#210

@hwwhww hwwhww requested a review from gsalgado January 11, 2018 04:14
Copy link
Collaborator

@gsalgado gsalgado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I think it might make sense to move some BaseVMState classmethods into standalone functions

evm/vm/base.py Outdated
"""Return state object
"""
if chaindb is None:
chaindb = self.chaindb
if block_header is None: # TODO: remove
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe explain (in the comment) why we want to remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I was thinking about if I can store block_header in prev_headers in some cases, but it turns out because block_header is more like a container object in VMState, it's more reasonable to leave block_header alone now. I'll remove this TODO tag.

Furthermore, there's another TODO issue I mention in this PR and #256, for sharding clients, we need to decouple VMState.block_header the VMState properties (blockhash, timestamp, block_number, difficulty, and gas_limit), in that case, the parameters of VMState.__init__ may become something like:

def __init__(self, chaindb, state_root, block_header, prev_headers, receipts=[])

Thanks!

evm/vm/base.py Outdated
@@ -234,40 +471,31 @@ def get_state_class(cls):

return cls._state_class

def get_state(self):
def get_state(self, chaindb=None, block_header=None, prev_headers=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No callsites seem to pass a prev_headers argument to this method. Would it make sense to remove it?

old_receipt = _make_frontier_receipt(vm_state, transaction, computation)

receipt = Receipt(
state_root=b'' if computation.is_error else b'\x01',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does b'\x01' mean here? Should it be a constant?

Copy link
Contributor Author

@hwwhww hwwhww Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was copied from evm.vm.forks.byzantium.blocks:
https://github.com/ethereum/py-evm/blob/master/evm/vm/forks/byzantium/blocks.py#L30

This protocol change is from EIP 98/658 Embedding transaction status code in receipts, 1 for success, 0 for failure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I support moving these two values into named constants to improve readability.

evm/vm_state.py Outdated
@classmethod
def apply_transaction(
cls,
vm_state,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels weird to have a BaseVMState method that takes a vm_state argument, but maybe I'm missing something?

evm/vm_state.py Outdated
:param vm_state: the VMState object
:param transaction: the transaction need to be applied
:param block: the block which the transaction applies on
:param is_stateless: if is_stateless, call VMState.add_transactionto set block
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually calls cls.add_transaction()

parent_header = copy.deepcopy(prev_headers[0])

computation, block, _ = FrontierVMState.apply_transaction(
vm_state1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I meant; it doesn't feel right to call a classmethod passing an instance of that same class as its first argument. I guess that's because we wanted to make apply_transaction() a pure function, but then maybe we should make it into a standalone function as well, to avoid this weirdness when calling it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True!
Actually, I think after the discussion in #247 (comment) and keeping VMState mutable, another way we can do is to change them back to regular instance method def apply_transaction(self, transaction, block, is_stateless=True). And so do VMState.execute_transaction(...) and VMState.make_receipt(...) methods.

@pipermerriam what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, going back to regular instance methods seems to make the most sense.

1. Moved Block.add_transaction to VM.add_transaction
2. Removed Block.chaindb and moved all related member functions to VM. (Make Block dumb)
1. Added VMState forks
2. Moved VM.get_computation() to VMState.get_computation()
3. Moved `execute_transaction`, `make_receipt`, `validate_block` and `validate_uncle` from `VM` to `VMState`
4. Configured `opcodes` and `precompiles` in `VMState.__init__(...)`
5. Make sure that VMState only do read-only operations on the chaindb
1. Added VMState level apply_transaction and add_transaction.
2. Assigned  `Block.db = BaseChainDB(MemoryDB()).db`, only for generating transaction_root and receipt_root in `VMState` level.
3. Set `computation_class` in `VMState`.
4. Applied witness data as database.
1. Fixed VM.create_block()
2. Added more tests of test_vm.py and test_vm_state.py
3. Using copy.deepcopy to prevent VMState.apply_transaction from
mutating the given witness_db
1. Store receipts in VMState and removed Block.db
2. Rebuild the transaction trie and receipt trie in every apply_transaction
…v_headers(last_block_hash, db) into standalone functions
…et_parent_header(block_header) into "the" VMState.parent_header
@hwwhww
Copy link
Contributor Author

hwwhww commented Jan 12, 2018

Changelog

  1. Squashed some old commits
  2. Passing trie_class to make_trie_root_and_nodes. trie_class could be HexaryTrie or BinaryTrie.
  3. Moved VMState.get_parent_header(block_header, db) and VMState.get_prev_headers(last_block_hash, db) into standalone functions
  4. Refactored VMState.get_ancestor_hash(block_number) and made VMState.get_parent_header(block_header) into "the" VMState.parent_header
  5. Changed some class methods of VMState back to regular instance methods
  6. Clean up

@pipermerriam @gsalgado
Please take a look, thank you!

Copy link
Collaborator

@gsalgado gsalgado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@pipermerriam
Copy link
Member

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants