Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IntoPyObject conversion trait #4060

Merged
merged 18 commits into from
Aug 3, 2024
Merged

Add IntoPyObject conversion trait #4060

merged 18 commits into from
Aug 3, 2024

Conversation

Icxolu
Copy link
Contributor

@Icxolu Icxolu commented Apr 8, 2024

This implements a proof of concept of a new fallible conversion trait.


Design

IntoPyObject

This introduces a new trait IntoPyObject to perform the conversion of implemented type into Python. Currently the trait is generic so that the same type can have multiple conversions to different Python types. In this version I used it to provide an additional typed conversion for the chrono types when the unlimited API is enabled. There might be more situations where this could be useful. I changed the generic to an associated type as outlined in #4060 (comment)

The Error is specified as an associated type, since every conversion should have a single applicable error (or Infallible if it can't fail).

The return type is specified to Bound<T> to guarantee that we convert to a Python type.

pub trait IntoPyObject<'py>: Sized {
    type Target;
    type Error;

    fn into_pyobject(self, py: Python<'py>) -> Result<Bound<'py, Self::Target>, Self::Error>;
}

IntoPyObjectExt

Additionally to IntoPyObject this also adds IntoPyObjectExt. The sole purpose of this trait is to allow specifying the desired output type using the turbo fish syntax:

obj.into_pyobject::<PyDateTime, _>(py)

This makes it much nicer to work with types that implement multiple conversions and need type hinting. This would probably be the API most users would interact with. This is only beneficial in the case of a generic trait.

Macros

To use the new conversion on return values of #[pyfunction] and friends without breaking current code, a variant of autoref deref specialization is implemented. It prefers the implementation of IntoPyObject over IntoPy<PyObject>. (Similar to the current approach the conversion to PyAny needs to be available.) This means there should be no changes required by users. The macros will pick up the fallible IntoPyObject implementation automatically, if available. For the migration we could then just deprecate IntoPy (and ToPyObject) and advise to implement IntoPyObject instead, removing the old conversions in a future release. Additionally there is another layer of specialization for the () return type in #[pymethods] and #[pyfunctions]. This allows converting them into PyNone as part of the macro code, because that's the Python equivalent of a function returning "nothing", while still using PyTuple as the IntoPyObject::Target.

Open Questions

  • Do we need IntoPyObject be generic, or would an associated type also work? Maybe implementing it on some more types will give us more insight whether the generic is needed.
    From my initial implementation I have the feeling that the generic version introduces a lot of complexity while not really providing a big benefit.
  • What to do with APIs that are generic over IntoPy and ToPyObject? I assume we need to introduce new variants that are generic over IntoPyObject, similar to what we did with the _bound constructors.

There are probably some edge cases that don't work correctly, but I thought I'll gather some feedback first whether this goes into the right direction before I chase those down.

TODOs

  • rename and seal AnyBound, possible candidates:
    • BoundObject
    • BoundType
    • IntoPyObjectOutput
    • BoundMethods
  • lift type restrictions of Either impl

Followups

  • add implementation to references of container types like &HashMap, &BTreeSet or &Vec
  • optimize implementation for PyRef/PyRefMut, possibly by storing a Borrowed in them
  • work out the migration of IntoPy<PyObject> and ToPyObject
    • rework implementations that currently rely on APIs with the old bounds
    • we can only provide a blanket impl for one of them...
  • work out PyBytes specialization ([RFC] standardize PyBytes <-> Vec<u8> or &[u8] or Cow<[u8]> #4182)
  • add BoundObject to the prelude?

Xref #4041

Copy link

codspeed-hq bot commented Apr 8, 2024

CodSpeed Performance Report

Merging #4060 will not alter performance

Comparing Icxolu:into-pyobject (3be2433) with main (3bf2f1f)

🎉 Hooray! codspeed-rust just leveled up to 2.6.0!

A heads-up, this is a breaking change and it might affect your current performance baseline a bit. But here's the exciting part - it's packed with new, cool features and promises improved result stability 🥳!
Curious about what's new? Visit our releases page to delve into all the awesome details about this new version.

Summary

✅ 68 untouched benchmarks

@Icxolu
Copy link
Contributor Author

Icxolu commented Apr 15, 2024

I've played around with this some more, and added more conversions for experimentation. (I will remove most of the impls again from here, once we reached some consensus, since many of them warrant their own PRs)

I have the feeling that an associated type is the more appropriate choice:

  • In pretty much all cases there is one clear Python type to turn a Rust type into.
  • If there would be a case for multiple different conversions, I think having wrapper types for each conversion would a much clearer option.
  • The API is more ergonomic
    • We don't need a separate implementation for PyAny, which is nearly all cases just forwards to a different typed impl.
    • Stricter type safety - because there is only one target type, choosing most appropriate one (instead of falling back to PyAny for the macros) involves less boilerplate.
    • There is no need for an additional extension trait.

For these points I switched the generic for an associated type.

The only type that's a bit special is (), which should have Target = PyTuple, but still turn into None as a return type of #[pyfunction] and #[pymethods]. Since that is all inside macro code, I added another specialization layer to handle that.

@davidhewitt
Copy link
Member

I just wanted to comment that although I haven't managed to actually reply or give thought in detail here, I'm really excited for this and want to circle back to it ASAP. I think at this point it might be that this is too big of a change for landing in 0.22, but I'd be keen to see it land in 0.23 if we can make it happen!

@Icxolu
Copy link
Contributor Author

Icxolu commented May 17, 2024

I'm really excited for this and want to circle back to it ASAP.

I'm excited too! I hope this can both enable new use cases, that we previously just didn't support, while also reducing the complexity to implement Python conversions. Having a single trait in each direction is hopefully making it much clearer how everything is wired together.

I think at this point it might be that this is too big of a change for landing in 0.22, but I'd be keen to see it land in 0.23 if we can make it happen!

Yes, I agree. Given how close we are to releasing 0.22 I think it makes sense to target 0.23 here 🤞I think there is also some more work following this, I included a lot of new impls in here for demontration purposes, but in general I think it would make sense to land them seperately, so we can discuss them individually and make reviewing easier. So no need to rush 😄

@Icxolu
Copy link
Contributor Author

Icxolu commented Jul 6, 2024

I just rebased this again, now that 0.22 is shipped. It seems like that the deref specialization used here prevents the nicer error message that we introduced in #4220 for the missing conversion trait case (or I haven't found the correct place yet).

We should also think again about whether and how we want to integrate the ideas from #4182 here.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just rebased this again, now that 0.22 is shipped. It seems like that the deref specialization used here prevents the nicer error message that we introduced in #4220 for the missing conversion trait case (or I haven't found the correct place yet).

We should also think again about whether and how we want to integrate the ideas from here.

Awesome! I am heavily in favour of moving forward with this trait for 0.23, and I think this PR is probably the correct thing to land as a first step more or less in this form. We can then iterate towards the 0.23 release.

For this PR:

Follow-ups:

  • I think we will need to solve the migration story for how to re-implement ToPyObject and IntoPy<PyObject> (plus other forms) in terms of this new API, and figure out how we expect users to need to adjust code.
  • Ideas from [RFC] standardize PyBytes <-> Vec<u8> or &[u8] or Cow<[u8]> #4182 - yes I think we should do these. It might be worth first solving the above bullet so that ToPyObject and IntoPy behave consistently with whatever we add to this trait.
  • I wonder if we should also adjust FromPyObject to have the type Error associated type, for consistency (and possible optimisations).

src/conversion.rs Outdated Show resolved Hide resolved
@Icxolu
Copy link
Contributor Author

Icxolu commented Jul 7, 2024

I will give this another try later today or tomorrow, but since there is now more that one trait involved, and we are intentionally vague about which one the compiler should use, I feel like there is no clear choice for the compiler to say "This one is not implemented" instead we now have a set of traits from which any implementation would be fine. Therefore the error message is more generic in that case (or at least that would be my explanation). I can also prepare a PR removing the gil-ref deprecations from the macros to see if that helps.

  • I'll seek to give a full review at earliest availability.

Awesome, much appreciated! Are you fine with me leaving all the trait implementations in for the review, or should I remove them here we land them separately?

  • I think we will need to solve the migration story for how to re-implement ToPyObject and IntoPy<PyObject> (plus other forms) in terms of this new API, and figure out how we expect users to need to adjust code.

I will probably look at that next 👍

@davidhewitt
Copy link
Member

Are you fine with me leaving all the trait implementations in for the review, or should I remove them here we land them separately?

I think all implementations are fine to be in this PR 👍

@Icxolu Icxolu force-pushed the into-pyobject branch 3 times, most recently from 45968b9 to 4b52b58 Compare July 9, 2024 22:04
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks like it's shaping up nicely and I'm optimistic that we've got a good and flexible enough formulation here. The trait definition is now a fair bit more complex than ToPyObject or IntoPy, but I think that can be handled by good documentation and examples.

Reading through more and more of the implementations I find myself tempted to scrutinise each one, so I'm beginning to wonder how we can land this in chunks without making this PR get stuck in review hell. I think answering that question will probably give us an incremental pathway that also works for users, so I'll keep musing on that point.

I haven't yet looked too hard at the specialization in the macros, I will try to study that soon.

src/conversion.rs Outdated Show resolved Hide resolved
src/conversion.rs Outdated Show resolved Hide resolved
src/conversions/either.rs Outdated Show resolved Hide resolved
Comment on lines +71 to +74
dict.set_item(
k.into_pyobject(py)?.into_bound(),
v.into_pyobject(py)?.into_bound(),
)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here it would be nice to just have dict.set_item(k, v) and ideally the fact that K and V implement IntoPyObject would be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess this ties into how do the migration from IntoPy/ToPyObject. Maybe this is something to solve in a followup once we have put some thoughts into how that can and should work? This implementation is kind of workaround for that missing piece.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think a followup makes sense.

@@ -51,6 +54,29 @@ where
}
}

impl<'py, K, V, H> IntoPyObject<'py> for hashbrown::HashMap<K, V, H>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's a question here, should we implement this for &HashMap too? (Subject to &K and &V implementing IntoPyObject.)

Similar for a lot of other containers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it would be desirable to provide these, since there is no real reason why one should not be able to continue using such a container after the conversion to Python.

I vaguely remember that I tried to add them at some point but ran into lots of HRTBs, I just tried again and it worked like I would have expected so either the implementation evolved and made it simpler, or I just did it wrong last time 🤷.

Since this is already quite a lot I suggest to land them separately.

where
T: IntoPyObject<'py>,
{
type Target = PyList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's a relevant discussion point here if [u8; N] should be treated as a byte slice and become PyBytes (in which case this here might be PyAny, unless we can find a way to do a type-level specialization on this associated type).

Comment on lines +832 to +478
impl<'py, T: PyClass> IntoPyObject<'py> for PyRef<'py, T> {
type Target = T;
type Output = Bound<'py, T>;
type Error = Infallible;

fn into_pyobject(self, _py: Python<'py>) -> Result<Self::Output, Self::Error> {
Ok(self.inner.clone())
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if somehow we can manually decrement the borrow count and then just return self.inner to avoid the clone here (same for the PyRefMut case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these are suboptimal... I haven't figured out a way to move the inner because of the Drop impl. But with the complete remove of the gil-refs now, maybe we can also now store a Borrowed in PyRef/PyRefMut, which should solve that issue I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reflecting on Borrowed in PyRef / PyRefMut, it creates the downside we already saw with types like Vec<&'a str> where they can't easily be extracted. (Because a PyRef<'a, 'py, T> containing a borrowed won't be able to be stored by iterating a list, for example.)

But I think on balance that might actually be a good thing, because it helps prevent locking objects to read/write for longer than desired. (The migration solution would be encouraging users to extract Vec<Bound<'py, T>> and then .borrow() for just the needed span of code.)

@davidhewitt
Copy link
Member

I've been thinking hard about the migration options, in particular what we want users to have to do when upgrading.

I think if we finish the change to FromPyObject to be replaced by FromPyObjectBound then already in 0.23 users will be making changes to trait implementations. So we could present 0.23 as an adjustment to traits for performance and flexibility and document the migration work needed in both the from-python and to-python directions.

I have been thinking hard about a blanket implementation of IntoPyObject from IntoPy<Py<PyAny>> or the reverse. Both seem to have downsides:

  • impl<T: IntoPy<Py<PyAny>>>> IntoPyObject for T - this would mean we could change all PyO3 APIs to use IntoPyObject and migrate trait implementations incrementally. Users migrate from IntoPy<Py<PyAny>> to IntoPyObject implementations. I think the biggest pain users will face is that .into_py() calls will immediately break when replacing IntoPy<Py<PyAny>> with IntoPyObject implementations (and we wouldn't be allowed both because of the blanked).
  • impl<T: IntoPyObject> IntoPy<Py<PyAny>> for T - this would be potentially breaking in many ways, if we have small behavioural differences between IntoPy<Py<PyAny>> and IntoPyObject for e.g. the bytes cases. I also think this doesn't actually create a seamless migration unless we continue to use IntoPy<Py<PyAny>> as the generic bound on PyO3 APIs for now, which defeats the point of doing the hard work to use the new trait.

The third option is to adopt no blanket implementation and just switch all PyO3 APIs to use IntoPyObject. The proc-macros could potentially prefer IntoPyObject but fall back to IntoPy where necessary. (I also wondered whether we could have some sort of temporary trait to make the same thing work in generic code, but I can't see a way to actually make that work without some kind of negative reasoning.)

I think on balance my currently opinion is that we should take this third option and add neither blanket. This means that code again gets duplicated, similar to the Bound migration. At least then new infrastructure can be built cleanly without compatibility hacks, and no existing code will be broken (aside from places where types need to implement the new traits.)

We can at least reduce the amount of duplication by having a (public?) helper macro to create an IntoPy<Py<PyAny>> implementation from an IntoPyObject implementation.

@Icxolu
Copy link
Contributor Author

Icxolu commented Jul 12, 2024

I have been thinking hard about a blanket implementation of IntoPyObject from IntoPy<Py<PyAny>> or the reverse. Both seem to have downsides:

I agree, we want to change the APIs to use the new IntoPyObject otherwise we kind of turn in circles. So I think only this blanked impl<T: IntoPy<Py<PyAny>>>> IntoPyObject for T would make sense. But there are also a fair number of APIs the use ToPyObject as their bound and since we can't provide both blankets it's probably better to not provide a blanket.

At least then new infrastructure can be built cleanly without compatibility hacks, and no existing code will be broken (aside from places where types need to implement the new traits.)

Yeah I think this is important so that we have a solid foundation for the future.

The proc-macros could potentially prefer IntoPyObject but fall back to IntoPy where necessary. (I also wondered whether we could have some sort of temporary trait to make the same thing work in generic code, but I can't see a way to actually make that work without some kind of negative reasoning.)

For the proc-macros this is what I implemented here already (or did you mean something different?) As for generic code, I will give this a thought, but my initial feeling says the same as you.

We can at least reduce the amount of duplication by having a (public?) helper macro to create an IntoPy<Py<PyAny>> implementation from an IntoPyObject implementation.

Maybe yes, will explore that.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementations all look good. I think we should proceed to merge this so that we can then iterate in smaller PRs.

I will briefly have a go now at the missing_intopy error regression...

A::Item: IntoPyObject<'py>,
PyErr: From<<A::Item as IntoPyObject<'py>>::Error>,
{
type Target = PyList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible question if this will become PyAny for SmallVec<u8> creating bytes.

@davidhewitt
Copy link
Member

@Icxolu I pushed a commit which adjusted the autoderef specialization mechanism to make it a bit easier for me to reason about and managed to add fallback cases which means that the diagnostic actually emits hints that IntoPyObject are not satisfied (rather than IntoPy<PyObject>) 🎉

It's a little janky and this branch might need fixups / rebase, but otherwise please feel free to merge this (I'm out of time for today).

@Icxolu Icxolu marked this pull request as ready for review August 3, 2024 13:49
@Icxolu
Copy link
Contributor Author

Icxolu commented Aug 3, 2024

Thanks for the review! Your autoderef specialization mechanism is much easier to read than what I originally implemented, I really like that 🚀 . Also keeping the more descriptive error message is really nice 👍 . I rebased this again, fixed the clippy warnings and added a missing implementation for #[classattr]. I'm excited to see this land 🎉 and will continue to work on the points that we left for followups here.

@Icxolu Icxolu enabled auto-merge August 3, 2024 13:55
@Icxolu Icxolu added this pull request to the merge queue Aug 3, 2024
Merged via the queue into PyO3:main with commit 0a185cd Aug 3, 2024
39 of 42 checks passed
@Icxolu Icxolu deleted the into-pyobject branch August 3, 2024 14:45
@davidhewitt davidhewitt changed the title Experiment: Fallible conversion trait Add IntoPyObject conversion trait Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants