-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: enabling grammar typing for sequences via deserialize #578
Comments
I'm always open in increasing consistency of serializer and deserializer. I suspect, however, that theoretical constructions are very hard to understand, so the best way is to check your theories by trying to implement your suggestions. I must warn that it will not always be possible. quick-xml already has a large set of cases (which I was able to imagine) in struct X;
struct Wrap {
element: X,
} <wrap>
<!-- serialized X (with name "element") inserted into <wrap> element -->
<element/>
</wrap>
<!-- serialized Wrap (with name "wrap") -->
<wrap>
<element/>
</wrap> |
The parts of the deserializer I'm looking to touch are specific to tags on the same level in a sequence. Composability should still be done in the same way via the non-sequence deserializer associated functions, but for untagged collections I'm of the opinion that they should be deserialized from the same level, as is required by the implementation, and done by the serializer. The plan is to extend this behaviour to be a recursive flattening for untagged sequences.
Absolutely agreed. It's far easier to talk to an example implementation rather than in theory what it could do. Mainly what I'm looking for from this issue are cases of undesirable behaviour in this area as I'd noticed places like: Lines 783 to 785 in 642de0a
I don't expect to be able to exhaustively cover much until a PR is submitted, but any general criteria for expected behaviour / behaviour that would yield a rejection would be very useful. An example reusing the preamble in the original issue description #[derive(Serialize, Deserialize, PartialEq, Eq, Debug)]
struct Root2 {
#[serde(rename = "$value")]
content: (Vec<ABChoice>, Vec<ABChoice>)
}
let root2 = Root2 {
content: (
vec![ABChoice::A],
vec![ABChoice::B, ABChoice::A],
),
};
let serialized2 = to_string(&root).unwrap();
let deserialized2 = from_str(&serialized).unwrap(); Could deserialize to Root2 {
content: (
vec![ABChoice::A, ABChoice::B],
vec![ABChoice::A],
),
} If there's been a previous design decision on ambiguities such as this, it would be helpful to have some context for it. |
The reason is very simple: in XML you cannot nest one sequence into another directly except the case when the second sequence is an
I would say, that if that worked that way, the only possible output would Root2 {
content: (
vec![ABChoice::A, ABChoice::B, ABChoice::A],
vec![], // always empty
),
} As for your initial example (side note: the playground with quick-xml), the deserialization error is due to #567. If worked, I think, is should be deserialized to (when Root {
content: (
vec![ABChoice::A, ABChoice::B, ABChoice::A],
vec![CDChoice::C],
None, // not sure, maybe will be Some(vec![])
),
}; This is fundamental restriction of XML -- not all roundtrips are possible. So probably your desired behavior is already implemented, but with bugs 😃 I did some preliminary investigation of that bug, I think that the problem in passing the root deserializer in the line 2723 here: Lines 2715 to 2725 in 642de0a
|
I think this is effectively the crux of what I wanted to discuss by opening this issue. The feature request is a proposal to standardise how this could be represented, drawing from how it is represented by the serializer as both reference and inspiration. XML, without a tag for the collection, can only represent such a collection as a single sequence, as it has no reserved marker for marking collection start and end points. As such, serializing such a collection with arbitrary, nested, untagged collections will likely be inherently lossy. However, there is a way to be potentially more robust about this in the form of using untagged collections as a form of grammar specification for the sequence, where tuples indicate required order, options indicate an optional match for the grammar, and vectors represent non-zero length, unbounded, variable length sequences of the nested type. Potentially for the above vectors can be read as anything that implements FromIterator / IntoIterator, but this may break determinism for the serialized xml string. A further addition to the robustness of specifying these sequences by grammar, could be to add a proc_macro derive as an opt-in feature to check the robustness of the grammar such that it would create a non-public const that runs a compile-time check for grammar ambiguities, but I believe that to be well beyond the scope of this issue. I would envision that this would effectively solve the problem of the deserializer not always being able to mirror the serializer, if this check passes. |
This issue is to gauge interest in being able to validate the variadic layout of a sequence via tuple elements that are
Option
I have a use case to generate schema definitions with sequences defined by a grammar. Given that it's already possible to enforce order, and serializing has builtin flattening of options and sequences, there are two key features for deserializing missing that would fully enable building types to validate a sequence grammar.
Deserializing end of sequence for a tuple should try to deserialize as visiting
None
for each remaining element of the tuple sequence.Deserializing a sequence of sequences (of sequences...*) should recursively flatten.
Both these features are already present implicitly in the serializer, this would just be making the deserializer match.
This would allow for examples like below to work
Currently serializing this produces the string
Mirroring this in the deserializer is desirable.
I've managed to get a hacky implementation of this to work sending all cases that would stop iteration to first try and deserialize as the
DeserializeSeed
with a deserializer that can only call visit_none, however this would lead to the case where a vector of Options produces an infinite loop so there's still some design work to be done there.With regards to recursive sequence flattening, I suspect the offender is
<SeqItemDeserializer as de::Deserializer>::deserialize_seq
, however just making this recursive seems to have other potential ramifications associated. As such, design input would be appreciated there as well.I'm very happy to do the work myself. However, I would like to consult, firstly, if this would be something that can be upstreamed, and secondly if there are any pitfalls in this area I should be aware of from a design standpoint, particularly that would prevent it getting upstreamed.
The text was updated successfully, but these errors were encountered: