Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: language support for async sequences #261

Closed
gafter opened this issue Feb 5, 2015 · 279 comments
Closed

Proposal: language support for async sequences #261

gafter opened this issue Feb 5, 2015 · 279 comments

Comments

@gafter
Copy link
Member

gafter commented Feb 5, 2015

Both C# and VB have support for iterator methods and for async methods, but no support for a method that is both an iterator and async. Its return type would presumably be something like IObservable<T> or IAsyncEnumerable<T> (which would be like like IEnumerable but with a MoveNext returning Task<bool>).

This issue is a placeholder for a feature that needs design work.

@HaloFour
Copy link

HaloFour commented Feb 5, 2015

I'd personally would love to see support for both IObservable<T> and IAsyncEnumerable<T>. The former interface at least already exists in the BCL and there is fantastic support for it in Rx. The latter interface has already been defined in the sister-project Ix (and under System.Collections.Generic no less) so would this language feature involve taking a dependency on that project or duplicating that interface in the BCL?

Ultimately switching between the two would be pretty easy (and already supported in Rx/Ix), but from a sequence producer point of view they would behave a little differently in that yield return for IObservable<T> would likely continue executing immediately whereas for IAsyncEnumerable<T> it would wait until the next invocation of MoveNext().

Also, if considering support for IObservable<T> you might want to consider requiring that the generator method accept a CancellationToken which would indicate when the subscriber has unsubscribed.

From the consumer point of view they should probably behave the same way. Observable.ForEach allows the action to execute concurrently and I think that it would probably be pretty unintuitive to allow the foreach body to have multiple concurrent threads (assuming that they're not being dispatched through a SynchronizationContext). If the implementation is similar to how await works whatever intermediary (SequenceAwaiter, etc.) could handle the details of buffering the results from an IObservable<T> or an extension method could just turn it into an IAsyncEnumerable<T>.

@scalablecory
Copy link

@HaloFour Observable.Create already provides an optimal implementation of this that language extensions wouldn't add any value to.

IAsyncEnumerable, however, has no optimal way to generate a sequence other than implementing the interface manually. It's fairly easy to make something that emulates yield return but it is super inefficient so this is badly needed.

@HaloFour
Copy link

HaloFour commented Feb 5, 2015

I don't disagree. Rx is awesome like that. I advocate for it mostly to bring Rx closer to the BCL so that people are more aware that it exists, and also because those core interfaces are at least a part of the BCL whereas IAsyncEnumerable<T> is brand new to the BCL (and duplicates Ix).

@MgSam
Copy link

MgSam commented Feb 5, 2015

I'm not familiar with Ix, so I can't comment on any existing IAsyncEnumerable, but I would rather the team start fresh when thinking about async enumerables rather than try to build off IObservable. Rx was an interesting project, but it was designed mostly before async existed and then later tried to bolt the two concepts together with varying success. Present-day Rx also has a very cluttered API surface area with poor documentation all around.

async/await enables code that looks almost identical to synchronous code- I'd like to be able to work with asynchronous sequences as effortlessly as you can work with IEnumerable today. I've definitely wanted to mix yield return and async/await before so this is a feature that would be very helpful.

@HaloFour
Copy link

HaloFour commented Feb 5, 2015

Indeed, there is a lot of duplication between the two because they were developed independently and Rx never had the resources that BCL/PFX had. I also don't think that Rx/Ix could be merged into the BCL as is.

The Ix IAsyncEnumerable<T> interface is exactly as described here, basically identical to IEnumerable<T> except that MoveNext() returns Task<bool>. As mentioned the big difference between something like IObservable<T> and IAsyncEnumerable<T> is that the latter is still a pull-model as the generator really couldn't continue until the consumer called MoveNext() again. In my opinion this would make it less suitable for certain concurrent processing scenarios since the producer code isn't running between each iteration. An IObservable<T> async iterator could continue executing immediately after yielding a value.

In my opinion supporting both would be worthwhile. The compiler could generate different state machines depending on the return type of the async iterator.

@thomaslevesque
Copy link
Member

I've been wishing for this feature ever since C# 5 came out. Being able to write something like yield return await FooAsync() would be very useful; currently when I have an async method that returns a collection, I just return a Task<IReadOnlyCollection<T>>, because implementing lazyness has too much overhead.

I noticed that Roslyn already has an IAsyncEnumerable<T> interface here. That's pretty much the design I had in mind, although I had forgotten about cancellation. To make it really useful, we would also need an async version of foreach (including a way to pass a CancellationToken to MoveNextAsync).

@paulomorgado
Copy link

@thomaslevesque, the Roslyn link is 404.

@thomaslevesque
Copy link
Member

@thomaslevesque, the Roslyn link is 404.

Uh... looks like it's no longer there. A search for IAsyncEnumerable returns nothing (the name only appears in a comment). Perhaps it was moved and renamed to something else, or it was just removed.

@anpete
Copy link

anpete commented Apr 21, 2015

Entity Framework uses the IAsyncEnumerable pattern to enable async database queries. In EF6 we had our own version of the interface, but in EF7 we have taken a dependency on IX-Async.

@HaloFour
Copy link

@anpete Seems to me that if async streams depends specifically on a new BCL IAsyncEnumerable<T> interface that not only will it not be a very usable feature until more projects more to the newer frameworks but there will also be a lot of confusion between the different-yet-identical interfaces that already exist.

Perhaps the compiler could support the different interfaces by convention, or have an easy way to unify them through a common utility extension method. But if, for whatever reason, they need to be converted back to their proper specific interface that would still pose problems.

I believe quite strongly that not at least trying to integrate the BCL and the Roslyn languages with Rx/Ix is a massive wasted opportunity.

@tpetricek
Copy link

Just to provide some additional background, this can already be done in F# (because F# "computation expressions", which is a mechanism behind both iterators and asyncs is flexible enough). So, the C# design might learn something useful from the F# approach to this. See:

Probably the most interesting consideration here is what is the programming model:

  • F# async sequences are pull-based. You ask for a next item and then a callback is called (async) when the value is available. Then you ask for a next item, until the end.
  • Rx is push-based. You ask it to start, and then it keeps throwing values at you.

You can convert between the two, but going from Rx to AsyncSeq is tricky (you can either drop values when the caller is not accepting them, or cache values and produce them later).

The thing that makes AsyncSeq nicer from sequential programming perspective (i.e. when you write statement-based method) is that it works well with things like for loops. Consider:

asyncSeq { 
  for x in someAsyncSeqSource do
    do! Async.Sleep(1000)
    processValue x }

Here, we wait 1 second before consuming the next value from someAsyncSeqSource. This works nicely with the pull-mode (we just ask for the next value after 1 second waiting), but it would be really odd to do this based on Rx (are you going to start the loop body multiple times in parallel? or cache? or drop values?)

So, I think that if C# gets something like asynchronous sequences (mixing iterators and await), the pull-based design that is used by F# asyncSeq is a lot more sensible. Rx works much better when you use it through LINQ-style queries.

EDIT: (partly in reply to @HaloFour's comment below) - I think that it makes sense to support the async iterator syntax for IAsyncEnumerable<T> (*), but not for IObservable<T>, because you would end up with very odd behavior of foreach containing await on Task<T>.


(*) As a side-note, I find IAsyncEnumerable<T> quite odd because it lets you call MoveNext repeatedly without waiting for the completion of the first - this is probably never desirable (and AsyncSeq<T> in F# does not make that possible).

@HaloFour
Copy link

@tpetricek The difference in behavior between IAsyncEnumerable<T> and IObservable<T> is exactly why I think async iterators should support both, it gives the programmer the capacity to decide whether it's a push- or pull-model and abstracts the difference to the consumer. I think a lot of scenarios benefit from a push-model, such as launching a bunch of operations simultaneously and wanting to process the results as they are available.

Beyond that hopefully both interfaces will enjoy support of all of the common LINQ operators plus those operators that apply to asynchronous streams.

@dsyme
Copy link

dsyme commented Apr 22, 2015

@tpetricek - The FSharp.Control.AsyncSeq documentation has been clarified to use the terminology "asynchronous pull", rather than just "pull", i.e. a pull operation that returns asynchronously, Async<T>. I'll leave it to others to debate what exactly the difference is between an "asynchronous pull" and a "synchronous push" :)

@radekm
Copy link

radekm commented Apr 25, 2015

It would be nice if the reading from async sequence had constant stack usage and simple associative operations like concatenation had decent performance no matter whether left- or right-associated. Eg. reading from IEnumerables constructed by following functions

        static IEnumerable<int> LeftAssocEnum(int i)
        {
            var acc = Enumerable.Empty<int>();
            while (i > 0)
            {
                acc = Enumerable.Concat(acc, new int[] { i });
                i--;
            }
            return acc;
        }

        static IEnumerable<int> RightAssocEnum(int i)
        {
            var acc = Enumerable.Empty<int>();
            while (i > 0)
            {
                acc = Enumerable.Concat(new int[] { i }, acc);
                i--;
            }
            return acc;
        }

causes StackOverflowException for sufficiently large i and both IEnumerables have quadratic complexity.

@vladd
Copy link

vladd commented Apr 25, 2015

@radekm For your kind of usage (sequence is materialized, size is known in advance) you can already use List<int>.

    static IEnumerable<int> LeftAssocEnum(int i)
    {
        var acc = new List<int>(i);
        while (i > 0)
        {
            acc.Add(i);
            i--;
        }
        return acc;
    }

Does your request mean that all possible implementations of IEnumerable<T> (including immutable and lazy ones) should behave like List<T>?

@radekm
Copy link

radekm commented Apr 25, 2015

@vladd It was only a simple example, you can take for instance Fib()

        static IEnumerable<BigInteger> Fib()
        {
            return Fib(BigInteger.Zero, BigInteger.One);
        }

        static IEnumerable<BigInteger> Fib(BigInteger a, BigInteger b)
        {
            yield return a;
            foreach (var x in Fib(b, a + b))
            {
                yield return x;
            }
        }

which has the same problems. What I want is to compose complex asynchronous sequences from very simple and reusable parts. To do this the operators like concatenation must be efficient. Since I don't know how to do this in C# I'll give a few examples in Scala with scalaz-stream.

  1. Recursion can be used to define streams:
def fib(a: BigInt = 0, b: BigInt = 1): Process[Nothing, BigInt] =
    emit(a) ++ fib(b, a + b)

There is no risk of stack overflow and reading the first n items takes O(n) not O(n^2) (assuming that a + b is computed in constant time which is not true).
Note: fib(b, a + b) is passed by name so the above code terminates.

  1. Even transformations of streams are easily composable:
process1.take[Int](5).filter(_ > 0) ++ process1.id

This applies the filter only to the first 5 integers of the stream. You can use it with operator |>

Process(1, 2, -3, -4, -5, -6, -7) |> (process1.take[Int](5).filter(_ > 0) ++ process1.id)

and it gives you 1, 2, -6, -7.

@LeeCampbell
Copy link

I think that it would be wise to have language parity with F# for supporting async pull sequences (e.g. IAsyncEnumerble<T>, AsyncSeq<'T>).
@tpetricek and @dsyme make very valid points here and the links are excellent and well worth reading as there appears to be confusion between when it is appropriate to use async pull vs IObservable<T>.

That leads me on to making some comments about Rx and why I dont think it needs any language support (right now).

  1. IObservable<T> is in the BCL. Fine, so people know about it.
  2. Being a library, it can have a faster release cadence than the language. This has been particularly positive for the adoption and improvement of the library. As we speak Rx 3.0 is in development, and it may have breaking changes. Let's not mix libraries with languages. You can also see this now happening at the framework level.
  3. Yup we need better doc's and education. I tried my best to do my part IntroToRx.com

@thomaslevesque says "I've been wishing for this feature ever since C# 5 came out.".
It seems to me that his example is a great candidate for Rx (async, lazy and support for cancellation).

@HaloFour "Observable.ForEach" shudder.
Please don't use this method.
It needs to be removed.
It has no cancellation support, nor does it have any error handling/OnError

@HaloFour
Copy link

HaloFour commented May 5, 2015

@LeeCampbell I'd largely be happy if the C# team did the same thing they did with await and provided a pattern that could be used to describe anything as an asynchronous stream. Then Rx could easily support that pattern, probably through an extension method that would describe the correct back-pressure behavior.

I think that there is a massive amount of information for Rx out there, but if nobody knows to look it might as well not exist. I think that it needs the same kind of campaign from MS that LINQ and asynchrony received. Some kind of inclusion in the languages pushes that point. I've been doing a lot of Java dev lately and it annoys me how much excitement there seems to be building around Rx that I don't see on the .NET side.

@LeeCampbell
Copy link

I am interested to see how you would see this work. I think the way you work with and AsyncEnum and the way you work with an IObservable sequence are quite different. The former you poll and pull from until complete and then you move on to the next statement.

IAsyncEnumerable<int> sequence = CreateAsynEnumSeq();
Output.WriteLine("Awaiting");
await sequence.ForEachAsync(Output.WriteLine);
Output.WriteLine("Done");

The later you set up a subscription providing call backs and then move on immediately. The callbacks for an Observable sequence are called at some future point in time.

IObservable<int> sequence = CreateObservableSeq();
Output.WriteLine("Subscribing");
sequence.Subscribe(Output.WriteLine, ()=>Output.WriteLine("Done"));
Output.WriteLine("Only Subscribed to, but not necessarily done.");

With this in mind, they (to me at least) are totally different things, so I am not sure why or how language support would help here. Would like to see a sample of your ideas. I can see some usefulness for language support of AsynEnum sequences, again, at least to get language parity with F#

@HaloFour
Copy link

HaloFour commented May 7, 2015

@LeeCampbell

To give you an idea, I already currently have an extension method for IObservable<T> called GetAsyncEnumerator which returns my own IAsyncEnumerator<T> implementation:

public IObservable<int> Range(int start, int count, int delay) {
    return Observable.Create(async observer => {
        for (int i = 0; i < count; i++) {
            await Task.Delay(delay);
            observer.OnNext(i + start);
        }
    });
}

public async Task TestRx() {
    Random random = new Random();
    IObservable<int> observable = Range(0, 20, 1000);
    using (IAsyncEnumerator<int> enumerator = observable.GetAsyncEnumerator()) {
        while (await enumerator.MoveNextAsync()) {
            Console.WriteLine(enumerator.Current);
            await Task.Delay(random.Next(0, 2000));
        }
    }
}

There are several overloads to GetAsyncEnumerator depending on if/how you want to buffer the observed values. By default it creates an unbounded ConcurrentQueue into which the observed values are collected and MoveNextAsync polls for the next value available in that queue. The other three options are to use a bounded queue, to have IObservable<T>.OnNext block until there is a corresponding call to MoveNextAsync or to have IObservable<T>.OnNext not block but have MoveNextAsync return the latest available value, if there is one. There are also overloads that accept a CancellationToken, of course, and IAsyncEnumerator<T>.Dispose unsubscribes the observer.

I hope that kind of answers your question. It's early and I didn't get much sleep last night. Basically, I am treating the IObservable<T> as an IAsyncEnumerable<T> and bridging between the two isn't all that difficult. The big difference is that the observable can continue to emit values and not have to wait for someone to poll.

@scalablecory
Copy link

Guys who are interested in IObservable support -- can you describe the benefit integrating this into the language would bring?

@HaloFour
Copy link

HaloFour commented May 7, 2015

@scalablecory

  1. The interface is already in the BCL and has been since .NET 3.5.
  2. Rx already provides an amazing degree of support for working with IObservable<T> and marries the existing language concepts of asynchrony with LINQ beautifully.
  3. I think that "push" model asynchronous enumeration is very useful for a lot of scenarios, specifically when you need to dispatch a series of asynchronous requests at once and then process them as the results become available. This is currently pretty obnoxious to handle with async methods alone.
  4. Volta needs MS love.

To Devil's Advocate my own arguments:

  1. Probably moot as we'd likely need a streaming analog to GetAwaiter so we're stuck waiting on a BCL change anyway.
  2. Someone's bound to write all of the same LINQ methods to work against IAsyncEnumerable<T>, despite being a pretty massive duplication of effort. Rx already has, it would be silly to do it again.
  3. I'm sure that IAsyncEnumerable<T> can wrap a "push" notification source. I'm already doing it.
  4. Java clearly loves Volta more. 😉

Now, given the probability of Devil's Advocate point 1, some streaming analog to GetAwaiter, support for IObservable<T> from the consuming side could be accomplished by extension methods within Rx, and I'd be perfectly happy with that.

Now, for my arguments from the generating side, I'd like to revisit my use case of dispatching a bunch of asynchronous operations. This is something that the current project I work on does incredibly frequently, basically n+1 operations against web services where the first response provides a bunch of IDs that then need to be resolved individually*. If async streams return IAsyncEnumerable<T> where the coroutine isn't continued until the consumer asks for the next value then you don't really have the facility to perform the operations in parallel.

public async IAsyncEnumerable<User> QueryUsers(int organizationId, CancellationToken ct) {
    Organization organization = await ws.GetOrganization(organizationId, ct);
    foreach (int userId in organization.UserIds) {
        User user = await ws.GetUser(userId);
        yield return user; // can't continue until consumer calls IAsyncEnumerator.MoveNext
    }
}

Granted, there could be BCL methods to make this a little easier, but it feels like something that can be supported out of the box:

public async IObservable<User> QueryUsers(int organizationId, CancellationToken ct) {
    Organization organization = await ws.GetOrganization(organizationId, ct);
    foreach (int userId in organization.UserIds) {
        User user = await ws.GetUser(userId);
        yield return user; // Effectively the same as calling IObserver.OnNext(user)
    }
}

@bbarry
Copy link

bbarry commented Sep 26, 2016

while (TryGetNext(out var b) is var x && b) { ... }

I kinda like that. x being readonly is an interesting bonus also. It is unfortunate that this call is backwards compared to every other TryGet* method out there though.

I don't think handwritten consumption shapes should weigh heavily on the design here.

@HaloFour
Copy link

@alrz

That has two problems. First it binds this proposal to the fate of another must-less-certain proposal, one that would likely require CLR and BCL changes in order to accomplish. Two, it still results in the loss of variance since structs can't be variant.

@ljw1004
Copy link
Contributor

ljw1004 commented Sep 26, 2016

Pattern: familiar vs efficient

(I'm summing up some currently-under-discussion design points)...

What should the async foreach pattern be like? First option is to be familiar like IEnumerable. Here's an example type that would satisfy the pattern:

interface IAsyncEnumerable<out T> {
   IAsyncEnumerator<T> GetAsyncEnumerator();
}

interface IAsyncEnumerator<out T> {
   T Current {get;}
   Task<bool> MoveNextAsync();
}

The other option is to be more efficient. We can be more efficient in a few ways: (1) by avoiding heap allocation and returning just structs which include state machine and method builder and enumerator, so the caller can allocate them all on the stack; (2) by avoiding the double-interface dispatch; (3) by having a tight non-async inner loop. There's been lots of discussion on fine-tuning the precise best way to achieve these efficiency goals, but here are some simplistic versions:

// (1) avoiding heap allocation entirely
// Declaration side:
async iterator MyAsyncEnumerable<int, struct StateMachine> GetStream() { ... }
// Consumption side:
foreach (var x in GetStream()) { ... }

// (2) avoiding the double-interface-dispatch
interface IAsyncEnumerator<out T> {
   Task<Tuple<T,bool>> TryGetNextAsync()
}

// (3) avoiding async when working through the batch...
while (await enumerator.GetNextChunkAsync())
{
   while (enumerator.TryMoveNext(out value)) { ... }
}

_As the discussion has progressed, I've seen the "efficient" versions become steadily less appealing..._

Heap allocations. Why do we care about avoiding heap allocations entirely? I see that eliminating heap allocation is desirable for in-memory data structures that you iterate over with IEnumerable. But when it comes to async streams, if the entire stream ever gets a cold await, then it will necessarily involve allocation. The only folks who will benefit from heap-free async streams are those who write middleware, e.g. something that can sit equally on top of either a MemoryStream or a NetworkStream, and still be as efficient as possible in the first case. (We did indeed see such a need when we introduced async).

In all, the heavy-weight language work needed to support heap-free async streams seems way disproportionate to the benefit. (That language work might be struct SM like I wrote in the above code, or the ability for a method to have var as its return type, or the ability to declare something that looks like a struct and also a method).

Heap-free streams as we've envisaged them will only apply to consumption by foreach: as soon as you use the LINQ extension methods, then you get boxing. @MadsTorgersen and @CyrusNajmabadi spent some time exploring LINQ alternatives that avoid boxing. The gist of it was that you could pass the type of the enumerable and enumerator. This is clever, but looks pretty heavyweight.

// This is familiar LINQ extension method
static void Select<T,U>(this IEnumerable<T> src, Func<T,U> lambda)

// We could plumb more information through to avoid boxing
static void Select<T,Table, Tator, ...>(this IEnumerable<T,Table,Tator> src, ...)

At this point we waved our hands and said "We've discussed and investigated escape analysis in the past -- something where some part of the infrastructure can see that the IEnumerable doesn't escape from the method even despite its use in LINQ queries. If we had such escape analysis then it would benefit everyone, including the folks who need it most, rather than being an efficiency oddity specific to async streams that requires everyone to rewrite their code."

Conclusion: should give up on heap-free async streams.

Avoid double interface dispatch. Sometimes we believe that interface dispatch onto a struct is one of the slowest parts of .NET. Other times we think it's cached and so pretty fast. We haven't benchmarked this yet. There's no point pursuing it for efficiency's sake unless it's been benchmarked.

The downside of "avoid double interface dispatch" is that it doesn't work nicely with covariance. And covariance is more important. The only way to retain covariance in just a single interface dispatch is something like this: T TryGetNext(out bool succeeded). That's hard to stomach.

One attractive feature of a TryGetNext method is that it makes the enumerator atomic. (When it's split into a MoveNext method call followed by a Current property fetch, that can't be atomic, and will lead to race conditions of two threads try to consume an enumerator at the same time). Atomicity is nice. It means, for instance, that async streams could be consumed by a "choice" language primitive (similar to the choice operator in CCS and pi calculus, similar to what Go uses).

Avoid async when working through the batch. Let's work through concretely how this would work when you iterate over an async iterator method. There are subtleties here that aren't at first obvious...

async iterator IAsyncEnumerable<int> GetStream()
{
   while (true)
   {
      await buf.GetNextByteAsync();
      yield return buf.NextByte;
   }
}

var enumerator = GetStream().GetEnumerator();
while (await enumerator.GetNextChunkAsync())
{
   while (enumerator.TryMoveNext(out value)) { ... }
}

The question is, what granularity do the chunks come in?

The easy answer is that GetNextChunkAsync() will progress as far as the next yield return, and then TryMoveNext will succeed exactly once. This is easy to implement but it defeats most of the value-prop of chunking in the pattern -- because each chunk will be exactly one item big.

A more complicated answer is that TryMoveNext will execute as much of the method as possible. It will have the ability even to kick off an await by calling GetAwaiter() and then awaiter.IsCompleted. Only when it comes to the end of the async iterator method or to a cold await will it finally return false. There's something a little hairy about this, about having the await be finished by a different method from the one that started it. Does it also have implications for IAsyncDisposable? Not sure.

Conclusion: we should of course benchmark the single-interface-dispatch and the buffers. But they would have to show dramatic wins to be worth the concommitant ugliness.

@ljw1004
Copy link
Contributor

ljw1004 commented Sep 26, 2016

Cancellation and ConfigureAwait

(I'm summing up some currently-under-discussion design points)...

We'd talked about cancellation being done in two ways:

// way 1
using (var ator = able.GetAsyncEnumerator(token))

// way 2
foreach (await var x in xs.WithCancellation(token))

To avoid the compiler+language having to know about cancellation, we could define the first method with a default parameter: IAsyncEnumerator<int> GetAsyncEnumerator(CancellationToken cancel = default(CancellationToken)).

We'd talked about ConfigureAwait being done only in the second way:

foreach (await var x in xs.ConfigureAwait(false))

We'd talked about a shorthand .Configure() method to let easily both provide a cancellation token and configure the await.

We'd talked about how when you obtain an obejct that implements IAsyncDisposable, then knowledge of how it should await/cancel has already been stored inside the object, and so a consumer need only await it.

_QUESTIONS_.

Q1. Why do we need both "way1" and "way2"? Can't we just do it with "way2"?

Q2. Normally you can do foreach (var x in xs) and common datatypes give you back a struct type for your enumerator, e.g. List<T>.Enumerator. Is this still possible with the .ConfigureCancellation() approach?

Q3. Does .ConfigureCancellation() get defined in IAsyncEnumerable? Or is it an extension method on IAsyncEnumerable? Or is it left to be defined by any concrete types that wants to offer it?

Q4. Does .ConfigureCancellation return an IAsyncEnumerable that can be composed further? Or does it return a ICancelledAsyncEnumerable which can't be used by the LINQ combinators but which does satisfy the foreach pattern?

Q5. For folks like ServiceFabric who wish to force you to provide a cancellation token, would we make it so IAsyncEnumerable itself doesn't satisfy the async foreach pattern? -- Answer: no. This can be done by an analyzer. This enforcement doesn't belong in IAsyncEnumerable.

@ljw1004
Copy link
Contributor

ljw1004 commented Sep 26, 2016

Home of IAsyncEnumerable

(I'm summing up some currently-under-discussion design points)...

Where is the home of IAsyncEnumerable? There's one defined in IX (which doesn't quite satisfy our needs because it has cancellation as a parameter of MoveNextAsync). There's one defined in Azure ServiceFabric (which again isn't quite right).

It feels like the interface type IAsyncEnumerable itself should be defined in System or mscorlib or somewhere central, similar to IObservable. But the LINQ library methods should be provided in IX.

Not sure about the extension method IAsyncEnumerable<T> AsAsyncEnumerable<T>(this IEnumerable<T> src).

What do folks @onovotny think about this?

@clairernovotny
Copy link
Member

clairernovotny commented Sep 26, 2016

I think that it makes sense to have the interface itself live somewhere central so that methods from mscorlib or System could potentially return it as a signature with its own internal implementation.

For the home of the LINQ implementation, I would recommend IX. We would of course welcome any and all contributions from Microsoft and other teams. IX would adapt and use whatever the final signature is for the IAsyncEnumerable, so I'm not too concerned about the current implementation having a MoveNext with a cancellation token. We would need to coordinate with partner teams, like EF, on how/when to make that breaking change, but a major version increment should cover that scenario with good justification.

I would think the extension methods should go along with IX as well -- basically, if you want to do IX Async, that's the real library to reference for the main logic.

@MgSam
Copy link

MgSam commented Sep 27, 2016

@ljw1004 I think new extension methods for IAsyncEnumerable should be part of the BCL, not IX/RX. The set of people using RX (or that even know about RX) is a tiny fraction of the total user base of .NET.

Personally, I also find RX really hard to use because the documentation is spotty / dated. We shouldn't tie a brand-new language feature to an old library designed for a world in which the language feature didn't yet exist.

@thomaslevesque
Copy link
Member

The set of people using RX (or that even know about RX) is a tiny fraction of the total user base of .NET.

And that's even more true of IX

@HaloFour
Copy link

HaloFour commented Sep 27, 2016

@MgSam @thomaslevesque

I completely agree. However I think the reason for that failing is largely due to the lack of attention Rx/Ix received from Microsoft. My opinion has always been that since Rx is so far along with its implementation of LINQ on both push and pull streams that it makes a great deal of sense for Microsoft to take advantage of their own work and build async streams on top of that. If that entailed bringing Rx under the BCL proper, which I imagine would require a bit of a refactoring, I would not be opposed to that. It likely requires it just from a consistency point of view since Rx and TPL diverged quite a bit in common areas.

It makes me sad that Rx seems to be getting more love on the JVM these days than on .NET where it was born. And here we are arguing over the details of reimplementing much of it.

@MgSam
Copy link

MgSam commented Sep 28, 2016

@HaloFour By the time C# 7+1 ships Rx/Ix will likely be over 10 years old. It is filled with outdated paradigms and the web filled with outdated documentation.

I think a much better tact would be adding a new library for IAsyncEnumerable to the BCL, and taking the parts of Rx/Ix that fit with the new paradigm and are worth keeping and adding them in.

  • Rx/Ix never went through the same kind of rigorous API review that BCL methods do. Adding the good parts to the BCL will ensure that they do.
  • Importantly, the new library should also not be called Rx/Ix. Rx/Ix have had too many changes over time, and too much outdated documentation exists. This makes Google searches for the library a minefield of bad info. In addition, many devs that I've spoken to have given up when trying to use Rx, so it likely has a negative connotation for many.

@clairernovotny
Copy link
Member

This thread has raised a several good points about Ix and Rx.

Few thoughts here:

Some of the comments have addressed Rx as opposed to Ix. Ix, despite living in the Rx repository, can version and release independently of Rx. For the purposes of this thread and disccussion, I think it would be helpful to focus on Ix rather than Rx. Of course, we'd be more than happy to have discussions about the current state and future of Rx over on the Rx repo.

There was some disccussion here around the lack of discoverability around Ix. What do you think could help fix that? Ix today already has the System.Interactive.Async NuGet package name to better reflect its core framework nature. Is it something that can be addressed by having more direct references to it from the BCL teams?

When it comes to documentation, we wholeheartedly agree that it could be made better. That's also an area that can and will be improved. We would very interested in what kinds of changes would make the documentation better.

About suggestions that Ix hasn't had the same sort of API review and was designed for a time pre-async and has outdated techniques, I would respectfully disagree. The API of Ix has been carefully thought out by the team and treated with the same level of review that an "in-box" library would have. Furthermore, Ix Async itself was rewritten this summer to use a modern approach based on what System.Linq does today. The resulting code has fewer allocations and is much easier to reason about and debug. Code coverage for its tests is over 95%.

I would suggest that Ix overall is hardened and time-tested with very thorough test coverage and review practices. The location of the code should not matter, but lest that be a blocker, the code could split into its own repo should there be a need.

Overall, the path to implementing whatever shape of the interface resuls from this discussion is far shorter and mostly done aleady. That's a huge boost.

I would also like to state that community contributions are very welcome. In fact, it was a community member that did the initial refactor to introduce async and await to Ix. We're very open to external contributions.

@svick
Copy link
Contributor

svick commented Oct 1, 2016

@onovotny

We would very interested in what kinds of changes would make the documentation better.

My main issue with the documentation of Rx, is that the only API reference on MSDN is horribly out of date.

The situation seems to be even worse for Ix: there's nothing on MSDN or anywhere else, and sometimes methods don't even have XML documentation comments.

For example, when I search for "AsyncEnumerable.Buffer" (or "IAsyncEnumerable Buffer"), I find:

@jnm2
Copy link
Contributor

jnm2 commented Oct 12, 2016

Interesting to compare notes with the concurrent development in JavaScript. http://www.2ality.com/2016/10/asynchronous-iteration.html

@divega
Copy link

divega commented Nov 30, 2016

@onovotny I agree with your comments about Ix being carefully designed and implemented. As you know, we were able to switch vNext of EF Core (currently on our dev branch) to the new version that contains the reimplementation of the operators, and we appreciate the advantages of the new implementations.

That said there are still a few issues we should discuss with regards to the API aligment with the idiomatic patterns used across .NET for async. In particular we believe that query operators that return a single T should use the Async suffix. The fact that they don't is one the reasons we have so far kept the current IAsyncEnumerable<T> type mostly hidden from application developers using EF Core.

In summary, I actually don't have a strong opinion on where the LINQ operators for IAsyncEnumerable<T> should live, but if a future major version of Ix adopted the new definition of the type in the BCL and switched to names that align better with the async pattern that would help a lot.

cc @anpete

@clairernovotny
Copy link
Member

@divega AFAIK, the reason the Async suffix isn't used is so that the operators work with the LINQ syntax conventions. If updating the LINQ syntax to resolve *Async versions is in scope, then that would answer that, but I'm not sure that's under consideration.

@divega
Copy link

divega commented Dec 1, 2016

@onovotny are you referring to VB query comprehension syntax? Otherwise do you know which operators map to C# comprehension syntax for which this would be a problem? As far as I remember VB provides sugar for many more operators but is strict about their return types. If I am remembering correctly, this means the current naming probably doesn't help.

To clarify, I said previously that it was about IAsyncEnumerable<T> operators that return T, but it is actually applicable to all awaitable operators, including things like ToListAsync().

BTW, this could be an interesting criteria to decide where to place operators. Let's say there are two groups:

  • Those that facilitate the consumption of async collections
  • Those that facilitate query composition

I believe the second group is for more advanced/less common scenarios and having to include an extra dependency to use them wouldn't hurt too much.

@clairernovotny
Copy link
Member

clairernovotny commented Dec 1, 2016

I'm specifically referring to the LINQ syntax from, select, groupby, that uses a duck pattern match to find matching methods/overloads. It doesn't care about the return types and works today with Ix Async :)

I agree there seems to be two groups as you describe, and there's no keyword equiv of ToListAsync, FirstAsync or SingleOrDefaultAsync, etc. Those could easily be renamed as part of the version that implements the new interface design.

@divega
Copy link

divega commented Dec 1, 2016

@onovotny ok, I was referring specifically to the ones that can be awaited. For from, select, where, groupby, join, etc. I agree the names should remain the same.

@gafter
Copy link
Member Author

gafter commented Mar 28, 2017

This is now tracked at dotnet/csharplang#43

@gafter gafter closed this as completed Mar 28, 2017
@Andrew-Hanlon
Copy link
Contributor

Here's my take on async enumerators/sequences using C# 7's Task-Like types. The approach could potentially act as a playground for the language feature.

@jnm2
Copy link
Contributor

jnm2 commented Jul 9, 2017

@Andrew-Hanlon AsyncEnumerator<> looks like the approach I've taken so far.
The thing I'm afraid of is that finally blocks (using statements) won't run if the enumeration is incomplete, even though the enumerator is disposed.

@Andrew-Hanlon
Copy link
Contributor

Andrew-Hanlon commented Jul 9, 2017

@jnm2 That's a good point. Making the AsyncEnumerator<> itself disposable could work, and simply set and bury a specific exception on the inner tasks, thereby triggering any inner disposables. I'll add this to the repo. Thanks.

@paulomorgado
Copy link

Shouldn't it be IDIsposableAsync? 😄

@gzak
Copy link

gzak commented Oct 23, 2017

Just wanted to add my vote for enabling await within linq queries. In the proposal, there's a statement: "and this isn't a particularly high-value thing to invest in right now", but I disagree. It would make real-life interaction with APIs sooo much more elegant. There are many situations where you need to make several async API calls, each referring to results from previous API calls. Stringing them all together with nested select or let clauses in a single linq query would make that sort of code much more pleasant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests