Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes log deduplication when mutating Labels using LogQL #5289

Merged
merged 17 commits into from
Feb 4, 2022

Conversation

cyriltovena
Copy link
Contributor

@cyriltovena cyriltovena commented Feb 1, 2022

What this PR does / why we need it:

Labels for each sample and log were used to find and remove duplicates when iterating over them. However since those labels can be mutated by the LogQL engine for optimization or by the users ( label_format, sum()) the deduplication wasn't consistent anymore.

This introduce a new StreamHash function that contains the original stream label hash and can be used for deduplication purpose. This ensure the deduplication is consistent and independent of the queries made.

This PR is built on #5281 which separate the sorting logic to the dedupe logic allowing to have different behaviour, this was required to make sharding work.

Which issue(s) this PR fixes:
Fixes #4500

Example of metric queries with duplicates before and after:

image

The last hour is still buggy because I did not deploy the ingesters during my test.

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena cyriltovena requested a review from a team as a code owner February 1, 2022 12:33
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
pkg/iter/entry_iterator.go Outdated Show resolved Hide resolved
pkg/iter/entry_iterator.go Show resolved Hide resolved
pkg/chunkenc/memchunk.go Show resolved Hide resolved
Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice work!

pkg/iter/sample_iterator_test.go Outdated Show resolved Hide resolved
@cyriltovena cyriltovena merged commit e3b7415 into grafana:main Feb 4, 2022
KMiller-Grafana pushed a commit to KMiller-Grafana/loki that referenced this pull request Feb 4, 2022
* Changes the iterator interface to include labelshash.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Improve unused args and fixes build

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Implement the new hash in the memchunk.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Add the hash in logproto

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Insert the hash from ingester-querier

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Rename LabelsHash to StreamHash

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Working through fixing tests.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Refactor HeapIterator into merge and sort Iterator.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* lint

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Order alphabetically when ordering samples.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Do not JSON encode the hash for legacy model.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Add more regression tests.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Update CHANGELOG.md

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Review feedback
@cyriltovena cyriltovena mentioned this pull request Feb 24, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deduping with label_format or metrics aggregation can be wrong
5 participants