[Core] Support Arrays #1417

Closed
ponderingdemocritus opened this issue Jan 11, 2024 · 7 comments
Labels
dojo-core cairo core tasks

Comments

@ponderingdemocritus
Contributor

Currently Dojo does not natively support storing arrays in models.

This would be an extremely helpful feature.

I am unsure of the best implementation.

@ponderingdemocritus ponderingdemocritus added the dojo-core cairo core tasks label Jan 11, 2024
@Akashneelesh

Akashneelesh commented Jan 12, 2024

Hello @ponderingdemocritus, I would like to take this up. Could you please assign it to me? Thanks

@tarrencev
Contributor

tarrencev commented Jan 12, 2024

Great! Could you please share the approach you have in mind before starting on the work?

@Akashneelesh

Hello @tarrencev, I was thinking of two approaches:

  1. Implement the StorageAccess trait for Array.
  2. Use a LegacyMap for the values, plus a separate variable acting as a counter (the length).
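For comparison, approach 2 can be modeled off-chain in a few lines. This is a hypothetical Rust sketch (`MapBackedArray` is an illustrative name, and `HashMap` stands in for a Cairo `LegacyMap`): each element lives under its own index key, and a separate counter tracks the length.

```rust
use std::collections::HashMap;

/// Hypothetical model of approach 2: each element is stored under its index in
/// a map (standing in for a Cairo LegacyMap), and a separate counter holds the length.
struct MapBackedArray<T> {
    values: HashMap<u32, T>,
    len: u32,
}

impl<T: Clone> MapBackedArray<T> {
    fn new() -> Self {
        Self { values: HashMap::new(), len: 0 }
    }

    /// Appends at index `len`, then bumps the counter.
    fn push(&mut self, value: T) {
        self.values.insert(self.len, value);
        self.len += 1;
    }

    /// Indices at or beyond the counter are treated as absent.
    fn get(&self, index: u32) -> Option<T> {
        if index < self.len {
            self.values.get(&index).cloned()
        } else {
            None
        }
    }

    /// Reads every element back in order, one lookup per index.
    fn to_vec(&self) -> Vec<T> {
        (0..self.len).filter_map(|i| self.get(i)).collect()
    }
}
```

The trade-off is that in a `LegacyMap` every index is a separately hashed storage address, so reading or writing n elements costs n hash computations.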

@tarrencev
Contributor

@Akashneelesh I think option 1 is probably a good solution for now. I wonder if we should do something similar to https://github.com/starkware-libs/cairo/blob/4821865770ac9e57442aef6f0ce82edc7020a4d6/corelib/src/starknet/storage_access.cairo#L665

We will have to serialize array members to a Span first and then write them to storage in order to support structs. Perhaps we can start by supporting primitives first, though.
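To illustrate the serialization step, here is a minimal Rust sketch of flattening an array into a span of felts. It assumes a Serde-style encoding like Cairo's `Serde` for `Array<T>` (the length followed by each element), uses `u128` as a stand-in for `felt252`, and the `SerializeFelts` trait and `Position` struct are hypothetical.

```rust
/// Stand-in for felt252 (assumption): u128 keeps the sketch dependency-free.
type Felt = u128;

/// Hypothetical trait: append a value's serialized felts to `out`.
trait SerializeFelts {
    fn serialize(&self, out: &mut Vec<Felt>);
}

impl SerializeFelts for u8 {
    fn serialize(&self, out: &mut Vec<Felt>) {
        out.push(Felt::from(*self));
    }
}

impl SerializeFelts for u32 {
    fn serialize(&self, out: &mut Vec<Felt>) {
        out.push(Felt::from(*self));
    }
}

/// Hypothetical struct member: fields are serialized in declaration order.
struct Position {
    x: u32,
    y: u32,
}

impl SerializeFelts for Position {
    fn serialize(&self, out: &mut Vec<Felt>) {
        self.x.serialize(out);
        self.y.serialize(out);
    }
}

/// Arrays are length-prefixed, then each element is flattened in turn.
impl<T: SerializeFelts> SerializeFelts for Vec<T> {
    fn serialize(&self, out: &mut Vec<Felt>) {
        out.push(self.len() as Felt);
        for item in self {
            item.serialize(out);
        }
    }
}
```

An array of structs goes through the same length-prefixed loop as an array of primitives; only the per-element `serialize` differs.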

@glihm
Collaborator

glihm commented Jan 13, 2024

Hey @Akashneelesh, happy to see you around!

To complement Tarrence's point about serializing array members: the Dojo database system does not actually use the Store trait of types. Instead, it relies on types being serializable.

Therefore, the approach here should be a bit different from regular storage. The Dojo database uses the storage syscalls directly.

Let me break down the process; let me know if any part of it needs clarification:

  1. Starknet storage can be seen as a ledger of 2^251 storage slots, each of which can store a felt252. To find the address of a storage slot, Starknet usually computes a hash of some keys. The storage also contains slot segments: contiguous regions of 256 storage slots that can be accessed by simply adding one to the offset (a u8).
    Note that computing a hash of keys is expensive compared to incrementing an offset, so taking advantage of slot segments is important.
  2. The Dojo database relies on storage functions that take a list of keys to identify the first storage slot (offset 0 in a segment). The offset is then used to actually store the model. Only one slot segment is currently used, which limits Dojo to a serialized (and packed) model of at most 256 felts!
  3. To make storage more efficient, the Dojo database packs the values using a layout. A layout is simply the number of bits expected to represent each type; this awareness of length is called introspection. Packing ensures that we use most of the bits of each felt252.
  4. Finally, since Dojo uses a custom compiler plugin, every field of a generated model is serializable and its layout can be retrieved. This is what arrives on-chain when set_entity is called on the world contract.
    So you can see the Dojo database as accepting model data that is completely flat (a Span<felt252> corresponding to the serialized data), together with a Span<u8> containing the layout of each element in the data. The keys are never stored; they are only used to compute the storage slot where storage must start.
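Points 1 to 4 can be sketched together in Rust: a greedy packer that combines values into words according to a layout (one bit width per value), followed by a write into a single slot segment addressed by a base plus a `u8` offset. This is an assumption-laden model, not Dojo's actual packing code: `u128` (128 bits) stands in for a felt252 (251 usable bits), a `HashMap` keyed by `(base, offset)` stands in for Starknet storage, and `pack` and `write_segment` are hypothetical names.

```rust
use std::collections::HashMap;

/// Stand-in for felt252 (assumption): a real felt has 251 usable bits, but
/// u128 keeps this sketch dependency-free, so words here hold 128 bits.
type Felt = u128;
const WORD_BITS: u32 = 128;
/// A slot segment spans 256 slots, reachable from its base via a u8 offset.
const SEGMENT_SLOTS: usize = 256;

/// Greedily packs `values` into words; `layout[i]` is the bit width of `values[i]`.
/// A value that does not fit in the remaining bits of the current word starts a new word.
fn pack(values: &[Felt], layout: &[u8]) -> Vec<Felt> {
    assert_eq!(values.len(), layout.len(), "one layout entry per value");
    let mut packed = Vec::new();
    let mut current: Felt = 0;
    let mut used: u32 = 0;
    for (&value, &bits) in values.iter().zip(layout) {
        let bits = u32::from(bits);
        assert!(bits >= 1 && bits <= WORD_BITS, "width must fit in one word");
        assert!(bits == WORD_BITS || value >> bits == 0, "value wider than its layout");
        if used + bits > WORD_BITS {
            packed.push(current);
            current = 0;
            used = 0;
        }
        current |= value << used;
        used += bits;
    }
    if used > 0 {
        packed.push(current);
    }
    packed
}

/// Writes packed words into one segment: slot `offset` of segment `base`.
/// Fails if the model needs more than one segment (the current 256-felt limit).
fn write_segment(
    storage: &mut HashMap<(u64, u8), Felt>,
    base: u64,
    packed: &[Felt],
) -> Result<(), String> {
    if packed.len() > SEGMENT_SLOTS {
        return Err(format!("{} words exceed one {}-slot segment", packed.len(), SEGMENT_SLOTS));
    }
    for (offset, &word) in packed.iter().enumerate() {
        storage.insert((base, offset as u8), word);
    }
    Ok(())
}
```

With three 8-bit values, `pack(&[1, 2, 3], &[8, 8, 8])` yields a single word, `1 | 2 << 8 | 3 << 16`; a model whose packed form exceeds 256 words is rejected, mirroring the one-segment limit in point 2.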

To support arrays, the objective would be to add the same logic as ByteArray storage, but at the Dojo storage level. Doing so would let us store models of any size.

As Tarrence mentioned, you can make a first version that only supports arrays of primitives; there is an example in the introspect module where primitives are handled.
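For arrays of primitives, introspection could work roughly as follows. In this hypothetical Rust sketch (the `Introspect` trait and `primitive_layout!` macro are illustrative, not Dojo's real API), each primitive reports a fixed bit width, and an array reports a width for its length prefix (32 bits, assuming a Cairo `usize` length) followed by the layout of each element.

```rust
/// Hypothetical introspection trait: append the bit width of each serialized element.
trait Introspect {
    fn layout(&self, out: &mut Vec<u8>);
}

/// Primitives have a fixed width, independent of the value.
macro_rules! primitive_layout {
    ($($t:ty => $bits:expr),* $(,)?) => {
        $(
            impl Introspect for $t {
                fn layout(&self, out: &mut Vec<u8>) {
                    out.push($bits);
                }
            }
        )*
    };
}

primitive_layout!(bool => 1, u8 => 8, u16 => 16, u32 => 32, u64 => 64, u128 => 128);

/// An array contributes one entry for its length prefix (32 bits, assuming a
/// Cairo usize length), then the layout of every element in order.
impl<T: Introspect> Introspect for Vec<T> {
    fn layout(&self, out: &mut Vec<u8>) {
        out.push(32);
        for item in self {
            item.layout(out);
        }
    }
}
```

Unlike a struct of primitives, an array's layout depends on its runtime length, so it has to be computed from the value rather than known statically.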

I would break down the path to achieve this as follows:

  • Add array support to introspection (at both the Cairo and the Rust compiler plugin levels).
    This ensures that arrays (or spans) can be serialized and their layout retrieved, following the current Dojo model.
    You may support only primitives in a first version.

  • Add support for slot segments as the ByteArray implementation does, but for Dojo storage. Packing should not be affected, though.
    When you compute the storage slot address by hashing the keys, you get a slot that is the first slot of a segment (offset 0). Let's call it the original address.
    Then, once the original address's segment is fully used, you hash a new key (poseidon([original_address, chunk_index, 'DojoChunk'])). This gives you the next storage segment to use, and you continue the process until all the data is stored.
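The chunking step can be sketched as below. This is a hypothetical Rust model: `DefaultHasher` is used only as a stand-in for Poseidon (it is not cryptographic, and real Starknet addresses are field elements), the `HashMap` plays the role of storage, and it assumes `chunk_index` starts at 1 for the first segment after the original one. The reader in this sketch is assumed to know the data length (for example from the serialized length prefix).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for felt252 (assumption).
type Felt = u128;
/// Slots per segment, addressed by a u8 offset.
const SEGMENT_SLOTS: usize = 256;

/// Chunk 0 is the original address; later chunks hash
/// (original_address, chunk_index, "DojoChunk"), with DefaultHasher standing in for Poseidon.
fn chunk_address(original: u64, chunk_index: u64) -> u64 {
    if chunk_index == 0 {
        return original;
    }
    let mut h = DefaultHasher::new();
    (original, chunk_index, "DojoChunk").hash(&mut h);
    h.finish()
}

/// Fills one 256-slot segment at a time, moving to the next hashed segment when full.
fn write_chunked(storage: &mut HashMap<(u64, u8), Felt>, original: u64, data: &[Felt]) {
    for (chunk_index, chunk) in data.chunks(SEGMENT_SLOTS).enumerate() {
        let base = chunk_address(original, chunk_index as u64);
        for (offset, &word) in chunk.iter().enumerate() {
            storage.insert((base, offset as u8), word);
        }
    }
}

/// Recomputes the same segment bases to read `len` words back in order.
fn read_chunked(storage: &HashMap<(u64, u8), Felt>, original: u64, len: usize) -> Vec<Felt> {
    (0..len)
        .map(|i| {
            let base = chunk_address(original, (i / SEGMENT_SLOTS) as u64);
            storage.get(&(base, (i % SEGMENT_SLOTS) as u8)).copied().unwrap_or(0)
        })
        .collect()
}
```

Only one hash per 256 words is needed, so most slots are still reached with the cheap offset increment.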

I hope this gives you more context. Don't hesitate to ask if you have any questions or an alternative approach.

Side note: a ByteArray doesn't need packing, as its internal logic already packs the bytes. In a ByteArray, every felt252 inside the data field is guaranteed to hold 31 bytes of data. So at the Dojo level, a serialized ByteArray can be seen as having a layout of 251 bits for each element, since it can't be packed any further. For simplicity, we may ignore the variable length of the pending_word and consider it a 251-bit layout too.
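The ByteArray side note can be made concrete with a small Rust sketch that splits bytes the way Cairo's `ByteArray` does (full 31-byte words into `data`, the remainder into the pending word) and derives an all-251-bit layout. It assumes the serialized order is data length, data words, `pending_word`, `pending_word_len`; the function names are hypothetical.

```rust
/// Splits bytes like Cairo's ByteArray: full 31-byte words go to `data`,
/// and the remaining (fewer than 31) bytes form the pending word.
fn split_like_byte_array(bytes: &[u8]) -> (Vec<Vec<u8>>, Vec<u8>) {
    let full = bytes.len() / 31 * 31;
    let data: Vec<Vec<u8>> = bytes[..full].chunks(31).map(|c| c.to_vec()).collect();
    let pending = bytes[full..].to_vec();
    (data, pending)
}

/// Felts in the serialized form, assuming the order:
/// data length, data words, pending_word, pending_word_len.
fn serialized_len(data_words: usize) -> usize {
    1 + data_words + 1 + 1
}

/// Treats every serialized element as an unpackable 251-bit felt.
fn byte_array_layout(bytes: &[u8]) -> Vec<u8> {
    let (data, _pending) = split_like_byte_array(bytes);
    vec![251; serialized_len(data.len())]
}
```

For 70 bytes this gives two full words plus an 8-byte pending word, i.e. five serialized felts, each with a 251-bit layout.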

@Akashneelesh

Akashneelesh commented Jan 15, 2024

Thanks a lot @glihm for walking me through the whole process. Based on what you described, I'll take this approach and try implementing it. If I find a better approach or get stuck, I'll be sure to ask here.
Thanks.

@glihm glihm self-assigned this Jan 25, 2024
@glihm
Collaborator

glihm commented Feb 23, 2024

I will close this one, as the first phase was done in #1533. We still need to check how this will be indexed on the Torii side.

@glihm glihm closed this as completed Feb 23, 2024
Projects
Status: Done
Development

No branches or pull requests

4 participants