Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List.groupBy performance #1345

Closed
eirikm opened this issue May 20, 2016 · 6 comments
Closed

List.groupBy performance #1345

eirikm opened this issue May 20, 2016 · 6 comments

Comments

@eirikm
Copy link

eirikm commented May 20, 2016

List.groupBy is performing really bad compared to the java.util-counterpart.

I've made a testcase to show the magnitude in difference of performance:

package integration.eventstore;

import javaslang.Function1;
import javaslang.collection.HashMap;
import javaslang.collection.List;
import javaslang.collection.Map;
import javaslang.collection.Stream;
import org.junit.Assert;
import org.junit.Test;

import java.util.stream.Collectors;

public class Spike {

    @Test
    public void compare() {
        List<Integer> ns = List.of(100, 1_000, 10_000, 100_000);

        ns.forEach(n -> {
            Stream<Integer> range = Stream.range(1, 10);
            List<Integer> javaslangInts = Stream.repeat(range)
                    .flatMap(is -> is)
                    .take(n)
                    .toList();
            java.util.List<Integer> javaInts = javaslangInts.toJavaList();

            System.out.println("Running group by on " + n + " elements");
            long startJava = System.currentTimeMillis();
            java.util.Map<Integer, java.util.List<Integer>> groupByJava = javaInts.stream().collect(Collectors.groupingBy(i -> i));
            long endJava = System.currentTimeMillis();
            long totalJava = endJava - startJava;
            System.out.println("Total time: ");
            System.out.println("  Java stream: " + totalJava + " ms");

            long startWrapper = System.currentTimeMillis();
            Map<Integer, List<Integer>> groupByWrapper = groupByWrapper(javaslangInts, i -> i);
            long endWrapper = System.currentTimeMillis();
            long totalWrapper = endWrapper - startWrapper;
            System.out.println("  Wrapper: " + totalWrapper + " ms");

            long startJavaslang = System.currentTimeMillis();
            Map<Integer, List<Integer>> groupByJavaslang = javaslangInts.groupBy(i -> i);
            long endJavaslang = System.currentTimeMillis();
            long totalJavaslang = endJavaslang - startJavaslang;
            System.out.println("  Javaslang: " + totalJavaslang + " ms");

            Assert.assertEquals(translateGroupByMap(groupByJava), groupByJavaslang);
            Assert.assertEquals(groupByWrapper, groupByJavaslang);
        });
    }

    public <K, T> Map<K, javaslang.collection.List<T>> groupByWrapper(List<T> list, Function1<T, K> groupByFunc) {
        return translateGroupByMap(list.toJavaStream().collect(Collectors.groupingBy(groupByFunc)));
    }

    public <K, T> Map<K, List<T>> translateGroupByMap(java.util.Map<K, java.util.List<T>> input) {
        return HashMap.ofAll(input).mapValues(l -> List.ofAll(l));
    }
}

A typical result of running this would be:

Running group by on 100 elements
Total time:
Java stream: 6 ms
Wrapper: 21 ms
Javaslang: 7 ms
Running group by on 1000 elements
Total time:
Java stream: 1 ms
Wrapper: 2 ms
Javaslang: 18 ms
Running group by on 10000 elements
Total time:
Java stream: 1 ms
Wrapper: 4 ms
Javaslang: 415 ms
Running group by on 100000 elements
Total time:
Java stream: 7 ms
Wrapper: 21 ms
Javaslang: 28752 ms

kind regards
Eirik Meland

@danieldietrich
Copy link
Contributor

Hi Eirik,

thank you for your investigation. Indeed, Javaslang takes at least quadratic time to group the elements.
We need to understand how Java's Stream is implemented - it seems to me that it is lazy, even the Map that is returned. I will have a look.

Using a wrapper seems to be a viable option!

Greets,
Daniel

@l0rinc
Copy link
Contributor

l0rinc commented May 23, 2016

The problem is mostly that append was used for Stream, instead of prepend and reverse.
Will provide a PR shortly with something like

@Override
default <C> Map<C, Iterator<T>> groupBy(Function<? super T, ? extends C> classifier) {
    Objects.requireNonNull(classifier, "classifier is null");
    if (isEmpty()) {
        return HashMap.empty();
    } else {
        final java.util.Map<C, Collection<T>> results = new java.util.HashMap<>();
        for (T value : this) {
            final C key = classifier.apply(value);
            Collection<T> values = results.get(key);
            if (values == null) {
                values = new ArrayList<T>();
                results.put(key, values);
            }
            values.add(value);
        }
        return HashMap.ofAll(results).map((c, ts) -> Tuple.of(c, List.ofAll(ts).iterator()));
    }
}

which is only <2x slower than the Java impl

@danieldietrich
Copy link
Contributor

Very nice!

@l0rinc
Copy link
Contributor

l0rinc commented May 23, 2016

@eirikm, @danieldietrich, please review :)

@povder
Copy link

povder commented Jun 27, 2016

Hi @danieldietrich! Any reasons why fix for this is not scheduled to be included in a bugfix release?

@danieldietrich
Copy link
Contributor

@povder no - just overseen it. Thanks for the hint!

@danieldietrich danieldietrich modified the milestones: 2.0.3, 2.1.0 Jun 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants