Reimplement DB collections for mirrors, repos and snapshots #766

smira · 2018-08-06T22:11:12Z

Collections were relying on keeping an in-memory list of all the objects
for any kind of operation, which doesn't scale well with the number of
objects in the database.

With this rewrite, objects are loaded only on demand, which might
be a pessimization in some edge cases but should improve performance
and memory footprint significantly.

This doesn't touch PublishedRepoCollection, as it relies on the list of
all the objects in many places to implement uniqueness checks and proper
cleanup.

Checklist

unit-test added (if change is algorithm)
functional test added/updated (if change is functional)
man page updated (if applicable)
bash completion updated (if applicable)
documentation updated
author name in AUTHORS

codecov · 2018-08-06T22:27:49Z

Codecov Report

Merging #766 into master will increase coverage by 0.34%.
The diff coverage is 89.24%.

@@            Coverage Diff             @@
##           master     #766      +/-   ##
==========================================
+ Coverage   63.71%   64.05%   +0.34%     
==========================================
  Files          50       50              
  Lines        6308     6326      +18     
==========================================
+ Hits         4019     4052      +33     
+ Misses       1797     1778      -19     
- Partials      492      496       +4

Impacted Files	Coverage Δ
deb/snapshot.go	`76.16% <87.14%> (+9.34%)`	⬆️
deb/local.go	`84.48% <90.9%> (-0.84%)`	⬇️
deb/remote.go	`64.11% <90.9%> (+0.21%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb5985b...699323e. Read the comment docs.

sliverc

LGTM in general, with just one comment concerning tests. Maybe it would also be good to do some performance and memory measurements with a large repo.

Just to see what the memory usage gain and the performance hit are. The code does get more complicated with this change, so I guess it would be good to prove that it actually decreases memory usage with an acceptable performance hit.

sliverc · 2018-08-07T08:06:40Z

deb/local.go

+		return nil, err
+	}
+
+	r := &LocalRepo{}


this doesn't seem to be covered by a test (https://codecov.io/gh/aptly-dev/aptly/pull/766/diff#D2-210)

I think as this is a success path it would be good to have a unit test for it. There are some similar code bits uncovered in the other collections too.

smira · 2018-08-09

Yep, thanks, I will check it!

smira · 2018-08-20

I've improved test coverage and added some benchmarks for ByUUID() and ByName(). They exercise the "worst case": a collection is created and a single lookup is performed. On the master branch this always leads to loading all the objects.

For ByUUID() in the new approach loading is fast, as the object is looked up directly (only a single element is loaded). ByName() still requires scanning the collection: in the worst case it is as bad as the old approach, but on average it stops after scanning about half of the objects, so it is roughly 50% better.

ForEach() doesn't cache objects in memory, so this should help #761, as objects can be GCed as soon as they have been scanned. ByUUID() is used a lot to look up the source of published repositories.

On master:

BenchmarkSnapshotCollectionByUUID-8    500    2953932 ns/op    1352168 B/op    30743 allocs/op
BenchmarkSnapshotCollectionByName-8    500    2922043 ns/op    1352504 B/op    30747 allocs/op

This branch:

BenchmarkSnapshotCollectionByUUID-8    300000       4492 ns/op      1792 B/op       39 allocs/op
BenchmarkSnapshotCollectionByName-8      1000    1433058 ns/op    533994 B/op    14870 allocs/op
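
To illustrate why the two lookups behave so differently, here is a minimal sketch of an on-demand collection. It is not the code from this PR: the `Collection` type, the `"S" + UUID` key layout, the map standing in for the database and the `loads` counter are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Snapshot is a simplified stand-in for a stored object.
type Snapshot struct {
	UUID string
	Name string
}

var (
	errNotFound = errors.New("snapshot not found")
	errStop     = errors.New("stop iteration")
)

// Collection keeps no list of objects; every call reads from db.
// db is a hypothetical key-value store keyed by "S" + UUID.
type Collection struct {
	db    map[string]Snapshot
	loads int // number of objects read, for illustration only
}

// newCollection fills the store with n snapshots named snap-000, snap-001, ...
func newCollection(n int) *Collection {
	c := &Collection{db: make(map[string]Snapshot, n)}
	for i := 0; i < n; i++ {
		uuid := fmt.Sprintf("uuid-%03d", i)
		c.db["S"+uuid] = Snapshot{UUID: uuid, Name: fmt.Sprintf("snap-%03d", i)}
	}
	return c
}

// ByUUID reads exactly one object, because the key is derived from the UUID.
func (c *Collection) ByUUID(uuid string) (Snapshot, error) {
	s, ok := c.db["S"+uuid]
	if !ok {
		return Snapshot{}, errNotFound
	}
	c.loads++
	return s, nil
}

// ForEach visits objects one at a time in key order and retains none of them.
// (Only the keys are collected here; a real store would iterate by prefix.)
func (c *Collection) ForEach(fn func(Snapshot) error) error {
	keys := make([]string, 0, len(c.db))
	for k := range c.db {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		c.loads++
		if err := fn(c.db[k]); err != nil {
			return err
		}
	}
	return nil
}

// ByName has no key to go by, so it scans until the first match.
func (c *Collection) ByName(name string) (Snapshot, error) {
	var found Snapshot
	err := c.ForEach(func(s Snapshot) error {
		if s.Name == name {
			found = s
			return errStop
		}
		return nil
	})
	if err == errStop {
		return found, nil
	}
	if err != nil {
		return Snapshot{}, err
	}
	return Snapshot{}, errNotFound
}

func main() {
	c := newCollection(100)
	_, _ = c.ByUUID("uuid-099")
	fmt.Println("objects read by ByUUID:", c.loads)

	c.loads = 0
	_, _ = c.ByName("snap-049")
	fmt.Println("objects read by ByName:", c.loads)
}
```

In this sketch ByUUID() touches a single object regardless of collection size, while ByName() reads objects until it hits the match, which is the whole collection when the name is last or missing.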

sliverc · 2018-08-21

Looks good. I guess we can merge this and ask the users who hit the specific problem in #761 whether this fix helps.

smira · 2018-08-21

I also had an idea to implement a very simple "index" to look things up by name (which is a really frequent lookup): just a key in the DB that maps name -> UUID. ByName() could use this index optimistically: if the entry is missing or doesn't point to the right object, ByName() falls back to a full scan.
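
A rough sketch of how such an optimistic index could behave, under stated assumptions: the `DB` type, its two maps and the `scans` counter are hypothetical stand-ins for the real database, and repairing the index entry after a fallback scan is an extra step added for the example, not part of the idea as described.

```go
package main

import (
	"errors"
	"fmt"
)

// Repo is a simplified stand-in for a stored object.
type Repo struct {
	UUID string
	Name string
}

var errNotFound = errors.New("repo not found")

// DB is a hypothetical in-memory stand-in for the database:
// objects are keyed by UUID, nameIdx holds name -> UUID entries.
type DB struct {
	objects map[string]Repo
	nameIdx map[string]string
	scans   int // number of full scans performed, for illustration only
}

func newDB() *DB {
	return &DB{objects: map[string]Repo{}, nameIdx: map[string]string{}}
}

// put stores the object and its index entry together.
func (db *DB) put(r Repo) {
	db.objects[r.UUID] = r
	db.nameIdx[r.Name] = r.UUID
}

// ByName trusts the index only if the entry exists and points at an
// object that really has the requested name; otherwise it scans.
func (db *DB) ByName(name string) (Repo, error) {
	if uuid, ok := db.nameIdx[name]; ok {
		if r, ok := db.objects[uuid]; ok && r.Name == name {
			return r, nil
		}
	}

	// Fallback: the index entry is missing or stale, so scan everything.
	db.scans++
	for _, r := range db.objects {
		if r.Name == name {
			db.nameIdx[name] = r.UUID // assumption: repair the entry for next time
			return r, nil
		}
	}
	delete(db.nameIdx, name) // drop an entry that points nowhere
	return Repo{}, errNotFound
}

func main() {
	db := newDB()
	db.put(Repo{UUID: "u-1", Name: "main"})
	db.put(Repo{UUID: "u-2", Name: "test"})
	db.nameIdx["main"] = "u-404" // simulate a stale index entry

	r, err := db.ByName("test")
	fmt.Println(r.UUID, err, "full scans:", db.scans)

	r, err = db.ByName("main")
	fmt.Println(r.UUID, err, "full scans:", db.scans)
}
```

Because the index is only a hint that is verified against the object itself, a missing or stale entry costs one full scan but never returns a wrong result.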

sliverc · 2018-08-21

This could be a way. I think we should first see how these changes perform in an actual use case before we make the code even more complex.

See #765, #761. Collections were relying on keeping an in-memory list of all the objects for any kind of operation, which doesn't scale well with the number of objects in the database. With this rewrite, objects are loaded only on demand, which might be a pessimization in some edge cases but should improve performance and memory footprint significantly.

smira requested a review from a team August 6, 2018 22:22

smira added this to the 1.4.0 milestone Aug 6, 2018

sliverc suggested changes Aug 7, 2018

smira force-pushed the 761-more-lazy branch from 435a051 to 20a333f Compare August 20, 2018 22:00

smira force-pushed the 761-more-lazy branch from 20a333f to 699323e Compare August 20, 2018 22:08

sliverc approved these changes Aug 21, 2018

smira merged commit 72ff71f into master Aug 21, 2018

smira deleted the 761-more-lazy branch August 21, 2018 14:00
