[Core] Support Arrays #1417

ponderingdemocritus · 2024-01-11T06:40:14Z

Currently Dojo does not natively support storing arrays in models.

This would be an extremely helpful feature.

I am unsure of the best implementation.

Akashneelesh · 2024-01-12T15:03:58Z

Hello @ponderingdemocritus, I would like to take this up. Could you please assign it to me? Thanks.

tarrencev · 2024-01-12T15:26:35Z

Great! Could you please share the approach you are thinking of before starting on the work?

Akashneelesh · 2024-01-12T19:09:19Z

Hello @tarrencev, I was thinking of two approaches:

1. Implement the StorageAccess trait for Array.
2. Use a LegacyMap for the values, plus a separate counter variable.

tarrencev · 2024-01-12T19:47:27Z

@Akashneelesh I think 1. is probably a good solution for now. I wonder if we should do something similar to https://github.com/starkware-libs/cairo/blob/4821865770ac9e57442aef6f0ce82edc7020a4d6/corelib/src/starknet/storage_access.cairo#L665

We will have to serialize array members to a Span first, then write them to storage, in order to support structs. Perhaps we can start by supporting primitives, though.
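The idea of flattening array members before writing them can be sketched as follows. This is a minimal illustration in Rust rather than Cairo, and everything in it is an assumption for the sake of the example: the `Flatten` trait and the `Position` struct are hypothetical, `u64` stands in for `felt252`, and the length-prefix convention mirrors how Cairo's `Serde` handles arrays.

```rust
/// Hypothetical trait for this sketch: append the flat representation
/// of a value to `out`. `u64` stands in for `felt252`.
trait Flatten {
    fn flatten(&self, out: &mut Vec<u64>);
}

impl Flatten for u32 {
    fn flatten(&self, out: &mut Vec<u64>) {
        out.push(*self as u64);
    }
}

/// Hypothetical struct used as an array member.
struct Position {
    x: u32,
    y: u32,
}

impl Flatten for Position {
    fn flatten(&self, out: &mut Vec<u64>) {
        // A struct flattens to its fields, in declaration order.
        self.x.flatten(out);
        self.y.flatten(out);
    }
}

impl<T: Flatten> Flatten for Vec<T> {
    fn flatten(&self, out: &mut Vec<u64>) {
        // Length prefix first, then every member in order.
        out.push(self.len() as u64);
        for item in self {
            item.flatten(out);
        }
    }
}
```

With this shape, an array of primitives and an array of structs go through the same path: both end up as one flat sequence that can be written slot by slot.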

glihm · 2024-01-13T17:01:13Z

Hey @Akashneelesh happy to see you around!

To complement Tarrence's statement about serializing the array members: the Dojo database system does not actually use the Store trait of types.
Instead, it relies on types being serializable.

Therefore, the approach here should be a bit different from regular storage. The Dojo database uses the storage syscalls directly.

Let me break down the process; let me know if you want me to clarify any part of it:

1. Starknet storage can be seen as a ledger of 2^251 storage slots, each of which can store a felt252. To find the address of a storage slot, Starknet usually computes a hash over some keys. The storage also has slot segments: contiguous regions of 256 storage slots that can be accessed by simply adding one to an offset (u8).
   Note that computing a hash over keys is considered expensive compared to adding 1 to an offset, so taking advantage of slot segments is important.
2. The Dojo database relies on storage functions that take a list of keys to identify the first storage slot (at offset 0 in a segment). The offset is then used to actually store the model. Currently only one slot segment is used, which limits Dojo to a serialized (and packed) model of at most 256 felts!
3. For more efficient storage, the Dojo database packs the values using a layout. A layout is nothing more than the number of bits expected to represent each type; this awareness of length is called introspection. Packing ensures that we use most of the bits of a felt252.
4. Finally, since Dojo uses a custom compiler plugin, when a model is generated every field of the model is serializable and its layout can be retrieved. This is what arrives on-chain when set_entity is called on the world contract.
   So you can see the Dojo database as accepting as input model data that is completely flat (Span<felt252>), corresponding to the serialized data, along with a Span<u8> that contains the layout for each element in the data. The keys are never stored; they are only used to compute the storage slot where storage must start.
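To make the role of the layout concrete, here is a minimal packing sketch in Rust. It is not Dojo's implementation, and several things are assumptions made for illustration: a 64-bit word stands in for a felt252 (which offers 251 usable bits), the function names `pack` and `unpack` are hypothetical, and a value that does not fit in the current word simply starts a new one, which may differ from what Dojo actually does.

```rust
/// Stand-in for a felt252: a 64-bit word (assumption for this sketch).
const WORD_BITS: u32 = 64;

/// Pack `values` into words, using `layout[i]` bits for `values[i]`.
/// A value that does not fit in the remaining bits starts a new word.
fn pack(values: &[u64], layout: &[u8]) -> Vec<u64> {
    assert_eq!(values.len(), layout.len());
    let mut words = Vec::new();
    let mut current: u64 = 0;
    let mut used: u32 = 0;
    for (&value, &bits) in values.iter().zip(layout) {
        let bits = bits as u32;
        assert!(bits >= 1 && bits <= WORD_BITS);
        assert!(bits == 64 || value < (1u64 << bits), "value exceeds its layout");
        if used + bits > WORD_BITS {
            // Current word is full: flush it and start a fresh one.
            words.push(current);
            current = 0;
            used = 0;
        }
        current |= value << used;
        used += bits;
    }
    if used > 0 {
        words.push(current);
    }
    words
}

/// Inverse of `pack`: the same layout drives where each value is read from.
fn unpack(words: &[u64], layout: &[u8]) -> Vec<u64> {
    let mut values = Vec::with_capacity(layout.len());
    let mut index = 0;
    let mut used: u32 = 0;
    for &bits in layout {
        let bits = bits as u32;
        if used + bits > WORD_BITS {
            index += 1;
            used = 0;
        }
        let mask = if bits == 64 { u64::MAX } else { (1u64 << bits) - 1 };
        values.push((words[index] >> used) & mask);
        used += bits;
    }
    values
}
```

For instance, four values with a layout of 8, 16, 1 and 32 bits occupy 57 bits and fit in a single word here, whereas storing them unpacked would take four slots. The layout is all the reader needs to recover the original values.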

The objective, to support arrays, would be to add the same logic as ByteArray storage, but at the storage level of Dojo. Doing so, we will be able to store models of any size.

As Tarrence mentioned, you can make a first version that only supports arrays of primitives. There is an example in the introspect module where primitives are handled.

I would break down the path to achieve this as follows:

1. Add array support to introspection (at the Cairo and Rust compiler plugin level).
   This will ensure that arrays (or spans) can be serialized and that their layout can be retrieved, in line with the current Dojo model.
   You may support only primitives in a first version.
2. Add support for slot segments as the ByteArray impl does, but for the Dojo storage. The packing should not be affected, though.
   The thing here is that when you compute the storage slot address by hashing the keys, you get the first slot of a segment (offset 0). Let's call it the original address.
   Then, once you have fully used the original address's segment, you hash a new key (poseidon([original_address, chunk_index, 'DojoChunk'])). This gives you the next storage segment to use, and the process continues until all the data is stored.
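The segment-chaining step can be sketched as below. This is an illustration in Rust under loud assumptions, not the Dojo implementation: `u64` stands in for felt252, a `HashMap` stands in for contract storage, the function names are hypothetical, and the standard library's `DefaultHasher` is a placeholder for the Poseidon hash named above. How the data length is recorded is not specified in the thread, so the caller passes it explicitly when reading.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Slots reachable from one base address by bumping the u8 offset.
const SEGMENT_SLOTS: usize = 256;

/// Toy storage: (segment base address, offset) -> stored word.
type Storage = HashMap<(u64, u8), u64>;

/// Base address of the segment holding chunk `chunk_index`.
/// Chunk 0 lives at the original address; later chunks hash a new key.
/// PLACEHOLDER: DefaultHasher replaces poseidon([...]) in this sketch.
fn chunk_base(original_address: u64, chunk_index: u64) -> u64 {
    if chunk_index == 0 {
        return original_address;
    }
    let mut hasher = DefaultHasher::new();
    (original_address, chunk_index, "DojoChunk").hash(&mut hasher);
    hasher.finish()
}

/// Write `data` across as many 256-slot segments as needed.
fn write_chunked(storage: &mut Storage, original_address: u64, data: &[u64]) {
    for (chunk_index, chunk) in data.chunks(SEGMENT_SLOTS).enumerate() {
        // One hash per segment; within a segment only the offset moves.
        let base = chunk_base(original_address, chunk_index as u64);
        for (offset, &word) in chunk.iter().enumerate() {
            storage.insert((base, offset as u8), word);
        }
    }
}

/// Read back `len` words (length supplied by the caller in this sketch).
fn read_chunked(storage: &Storage, original_address: u64, len: usize) -> Vec<u64> {
    (0..len)
        .map(|i| {
            let base = chunk_base(original_address, (i / SEGMENT_SLOTS) as u64);
            let offset = (i % SEGMENT_SLOTS) as u8;
            storage[&(base, offset)]
        })
        .collect()
}
```

The point of the design is cost: writing 600 words this way needs three segment bases (one of them the original address) instead of one hash per word, while lifting the 256-felt limit.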

I hope this gives you more context. Don't hesitate to ask if you have any questions or an alternative approach.

Side note: a ByteArray doesn't need packing, as its internal logic already packs the bytes. In a ByteArray, any felt252 inside the data field is guaranteed to hold 31 bytes of data. So at the Dojo level, a serialized ByteArray can be seen as having a layout of 251 bits for each element, as it can't be packed any further. We may ignore the variable length of the pending_word and consider it a 251-bit layout too, for simplicity.

Akashneelesh · 2024-01-15T18:05:19Z

Thanks a lot @glihm for enlightening me on the whole process. Based on what you have mentioned, I'll take this approach and try implementing it. If I come across a better approach or get stuck, I'll be sure to ask here.
Thanks.

glihm · 2024-02-23T03:35:20Z

I will close this one as the first phase was done in #1533. Need to check on the Torii side how this will be indexed.

ponderingdemocritus added the dojo-core cairo core tasks label Jan 11, 2024

tarrencev assigned Akashneelesh Jan 12, 2024

glihm self-assigned this Jan 25, 2024

glihm closed this as completed Feb 23, 2024
