-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strip metadata fix #1774
base: master
Are you sure you want to change the base?
Strip metadata fix #1774
Conversation
} | ||
} | ||
|
||
fn collect_types(&mut self, metadata: &Metadata, t: &PortableType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this logic is basically recursing through all of the types linked from some type and adding them to the set. But this is the same thing that we do in PortableRegistry::retain
here https://github.com/paritytech/scale-info/blob/master/src/portable.rs#L104; is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, retain is called dead last before updating types, but we need to collect types ourselves to determine which outer type variants are safe to drop/which pallets can be stripped or dropped completetely.
|| !matches!( | ||
pallet, | ||
PalletMetadataInner { | ||
storage: None, | ||
call_ty: None, | ||
event_ty: None, | ||
error_ty: None, | ||
constants: map, | ||
.. | ||
} if map.is_empty() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make any difference? I can't see why a pallet would exist that had nothing meaningful in it, but even if it did, is there any harm in leaving it there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I am starting to see why this is kept but not understanding it yet; you are turning things to None
above in some cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codegen size
@@ -588,7 +584,7 @@ impl<'a> MetadataHasher<'a> { | |||
|
|||
// Get the hashes of outer enums, considering only `specific_pallets` (if any are set). | |||
// If any of the typed that represent outer enums are encountered later, hashes from `top_level_enum_hashes` can be substituted. | |||
let outer_enum_hashes = OuterEnumHashes::new(metadata, self.specific_pallets.as_deref()); | |||
let outer_enum_hashes = OuterEnumHashes::new(metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here we no logner care about the possibility of the enums like RuntimeCall
etc being changed, but how can we avoid caring about this?
(either the variants and all of their types still exist, so the hashes are the same, or we have stripped some types and/or variants and then the hashes owuld be different unless we ignore them, wouldn't they?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list of pallets/variants types is completely dependant on the given set of pallets for a particular pallet or unpredictable
metadata/src/utils/retain.rs
Outdated
fn collect_extrinsic_types(&mut self, extrinsic: &ExtrinsicMetadata) { | ||
let mut ids = Vec::from([ | ||
extrinsic.address_ty, | ||
extrinsic.call_ty, | ||
extrinsic.signature_ty, | ||
extrinsic.extra_ty, | ||
]); | ||
|
||
for signed in &extrinsic.signed_extensions { | ||
ids.push(signed.extra_ty); | ||
ids.push(signed.additional_ty); | ||
} | ||
for id in ids { | ||
self.seen_ids.insert(id); | ||
} | ||
} | ||
|
||
if let Some(ty) = pallet.event_ty { | ||
type_ids.insert(ty); | ||
/// Collect all type IDs needed to represent the runtime APIs. | ||
fn collect_runtime_api_types(&mut self, api: &RuntimeApiMetadataInner) { | ||
for method in api.methods.values() { | ||
for input in &method.inputs { | ||
self.seen_ids.insert(input.ty); | ||
} | ||
self.seen_ids.insert(method.output_ty); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two functions don't recurse through any sub types. While I don't think anything should, if we did decide we wanted to recurse here then I guess these should, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed a change that does traverse, a few more types added to retained set on average.
let mut retained_set = TypeSet { | ||
seen_ids: type_set | ||
.seen_ids | ||
.difference(&retained_set.seen_ids) | ||
.copied() | ||
.collect(), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like you are:
- building a set of types to keep based on the pallets we want to keep (
retained_set
) - cloning that set and then adding all of the other types we want to keep (runtime APIs etc) too (
type_set
) - taking the difference between those (ie
type_set - retained_set
? So just the other types without the pallet ones, which we could get by just not cloning above?). This difference is nowretained_set
. - for pallets we want to filter, run
update_filtered_pallet
, which roughly seems to keep things in the pallet if they reference types we've collected outside of the pallet stuff, and remove things if not. We then only discard pallets if no types that they directly reference are used elsewhere outside of pallets.
I might be wrong, but I don't really understand the reason for doing this? since you collected the outer enums, I can see that the first pallet we see that references those would actually be kept (which I guess has the effect of keeping the variant too?), but it feels like quite a long winded way to go about avoiding getting rid of variants if the outer enums are used somewhere, and I don't know if there is any other benefit :D
I also see that update_filtered_pallet
calls retained_set.remove()
each time it checks, so only the first pallet referencing eg RuntimeCall would end up having bits kept I think, and other pallets that reference RuntimeCall
would be removed. Is that intentional?
Overall, I feel like we can remove a lot of this logic and rely on scale_info::PortableRegistry::retain
for the recursive retaining of IDs we care about (because either that works ok already, or doesn't work and should be fixed). Then the logic can be something like:
- Add all of the type IDs from the pallets etc that we want to keep to a set
- For any return types or storage types in these things we are keeping, do a deep scan to see if we find the
RuntimeCall
,RutimeEvent
orRuntimeError
outer enums. If we find one, we can't strip any variants out of it - Now, call
PortableRegistry::retain
to keep all of the type Ids we need (which will vary depending on whether those enums had variants stripped out of them or not and thus fewer IDs to recurse through).
All of that said though, if there is some value I'm not understanding in the way that we are keeping pallets etc here, then probably I'd need to be walked through it and we'd want to add a comment or two somewhere here to explain the approach and reasoning for it for our future selves :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taking the difference between those (ie type_set - retained_set? So just the other types without the pallet ones, which we could get by just not cloning above?). This difference is now retained_set.
The difference is the types present in the type_set
and absent in retained_set
. retained_set
is not interesting in this case as types were already present in retained pallet. But i feel like we are talking about the same thing
but it feels like quite a long winded way to go about avoiding getting rid of variants if the outer enums are used somewhere, and I don't know if there is any other benefit :D
Its more about getting rid of pallets and outer enums, but only if they are not referenced anywhere. Its about stripping types that are not interesting to user without accidentally removing something that is required.
I also see that update_filtered_pallet calls retained_set.remove() each time it checks, so only the first pallet referencing eg RuntimeCall would end up having bits kept I think, and other pallets that reference RuntimeCall would be removed. Is that intentional?
I guess i'm misunderstanding something, shouldn't only one pallet with the type be enough or we can safely ignore those refs to runtime call
/runtime error
/runtime event
and remove pallet based purely on storage/constant types? My intuition was to keep them as they might be referencing inner types inside outerenum variant, ie pallet.event_ty
= 365 and metadata.outer_enum.event_ty = 22
All of that said though, if there is some value I'm not understanding in the way that we are keeping pallets etc here, then probably I'd need to be walked through it and we'd want to add a comment or two somewhere here to explain the approach and reasoning for it for our future selves :)
as above: My intuition was to keep them as they might be referencing inner types inside outerenum variant, ie pallet.event_ty
= 365 and metadata.outer_enum.event_ty = 22
I had a look over and think I'm starting to understand the code more, though I still think that it's more complex than we need (but let's chat tomorrow and look at the code more to confirm). Essentially, So, the approach I'd lean towards with this is more like:
For working out whether some type contains an outer enum, you might end up with a recursive check which is a simplified version of yours, and doesn't need to allocate anything like: fn find_tys_in_id(id: u32, types: &PortableRegistry, unseen: &mut BTreeSet<u32>) {
unseen.remove(id)
let ty = types.resolve(id);
match ty.type_def {
TypeDef::Composite(TypeDefComposite { fields }) => {
for field in fields {
find_tys_in_id(field.ty.id, types, unseen)
}
}
TypeDef::Variant(TypeDefVariant { variants }) => {
for variant in variants {
for field in &variant.fields {
find_tys_in_id(field.ty.id, types, unseen)
}
}
}
TypeDef::Array(TypeDefArray { len: _, type_param })
| TypeDef::Sequence(TypeDefSequence { type_param })
| TypeDef::Compact(TypeDefCompact { type_param }) => {
find_tys_in_id(type_param.id, types, unseen)
}
TypeDef::Tuple(TypeDefTuple { fields }) => {
for field in fields {
find_tys_in_id(field.id, types, unseen)
}
}
TypeDef::Primitive(_) | TypeDef::BitSequence(_) => {
// End of recursion.
}
}
} I def think that the existing code can be made clearer w.r.t what steps it takes, and hopefully this extra check to see if we can strip outer enum types handles the edge case we've faced where we strip variants even though we need the fully types to decode something in Subxt. Note: If the outer enum types are inputs to some runtime API etc, we can still strip them I think. All this means is that users won't be able to express things about pallets that have been stripped when constructing such inputs. Everything should still encode ok. |
Description:
See #1659
This fixes the error described inside the issue, however it will strip types in a bit different way which will result in slightly less savings on average.
Biggest difference between the new and old approaches is that:
outerenum
enums, unless there are some variants that do not reference any types we intend to retain.Size comparison with the previous stripping approach:
size reduction Comparison:
Comparison for old and new stripping methods. be warned, it's 1K lines