Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[promtail] Add support for journal matches (only conjuntions) #6656

Merged
merged 1 commit into from
Aug 12, 2022

Conversation

carlospeon
Copy link
Contributor

@carlospeon carlospeon commented Jul 11, 2022

What this PR does / why we need it:
Do not process all journal logs. Add an option to add matches to the journal reader.

Checklist

  • Documentation added
  • Tests updated
  • Is this an important fix or new feature? Add an entry in the CHANGELOG.md.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@CLAassistant
Copy link

CLAassistant commented Jul 11, 2022

CLA assistant check
All committers have signed the CLA.

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@DylanGuedes DylanGuedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey thanks for your contribution, this is fantastic. Added a nit and a question, lmk if they are too confusing.

CHANGELOG.md Outdated Show resolved Hide resolved
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@KMiller-Grafana KMiller-Grafana added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Jul 18, 2022
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution @carlospeon 👍
Added a few nits for clarity and robustness

docs/sources/clients/promtail/scraping.md Outdated Show resolved Hide resolved
@@ -159,6 +159,9 @@ type JournalTargetConfig struct {
// Path to a directory to read journal entries from. Defaults to system path
// if empty.
Path string `yaml:"path"`

// Journal matches to filter. Disjunctions (+) not supported.
Matches string `yaml:"matches"`
Copy link
Contributor

@dannykopping dannykopping Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pluralised but it only seems to take a single entry.
Instead of using strings.Fields can we just allow a list to be specified in YAML?

Copy link
Contributor Author

@carlospeon carlospeon Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though about it before submitted this PR, but if "github.com/coreos/go-systemd/sdjournal" support logical OR (some work can be seen in their source code) it looked a bit weird to me to add a single + character as a list item.

So finally I decided to go with a string and the idea to set here the same arguments that you can use with journalctl, for example "_TRANSPORT=kernel + PRIORITY=3", but I don't have a strong opinion which two options is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detail 👍
I think we shouldn't let the implementation detail of a library we use bubble up to the end-user, so perhaps we can specify the matches as a list of conditions (I think this is the most ergonomic / idiomatic "YAML way"), and then provide them to the library in the way it requires.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is precisely the implementation detail of the library that promtail uses. It uses and array of matches and this decision limits the way you could set AND/OR matches. Maybe that's the cause that this library doesn't implement it at all taking into account that Journal supports AND/OR matches, and the library is just a binding of journald code.

Journal support filters with logical AND and/or logical OR, so I just though how to support that expression of filters in promtail without bringing a no ideal (IMO) decision of the library to promtail.

Anyway, the change proposed is quite simple and could be easily coded although I think it should not be done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, i've taken a look at the manpage to re-familiarise myself with the specifics of filtering.

Here's one example they cite:

   To show all fields emitted by a unit and about the unit, option
   -u/--unit= should be used.  journalctl -u name expands to a
   complex filter similar to

       _SYSTEMD_UNIT=name.service
         + UNIT=name.service _PID=1
         + OBJECT_SYSTEMD_UNIT=name.service _UID=0
         + COREDUMP_UNIT=name.service _UID=0 MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1

If a user were to try write a filter expression like the above in a promtail config, might it look like this?

  ...
  matches:
    - _SYSTEMD_UNIT=name.service
    - UNIT=name.service _PID=1
    - OBJECT_SYSTEMD_UNIT=name.service _UID=0
    - COREDUMP_UNIT=name.service _UID=0 MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1
  ...

In the documentation we could specify that each matches item is a disjunction (logical OR), and the logical AND is combining multiple filters in a single entry separated by space.

It seems the library supports adding disjunctions, but I'm not familiar enough with the code to know how much effort this will be.

I understand that you've put in a lot of effort into this already, so I'm happy to merge this as-is, but in a perfect world I think we should follow the Principle of Least Surprise here to make the filtering work as a user would expect. If journal filtering allows for disjunctions, then we should support it, too.

If you don't have the time to add disjunction support, that's ok - we can always add it as an enhancement for later; I don't want to block this useful feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the documentation we could specify that each matches item is a disjunction (logical OR), and the logical AND is combining multiple filters in a single entry separated by space.

Absolutely. Logical AND have preference and are evaluated first, so IMO that's the way to express the filters. Or going further in the "yaml way" a two levels nested list:

  ...
  matches:
    - _SYSTEMD_UNIT=name.service
    - - UNIT=name.service
      - _PID=1
    - - OBJECT_SYSTEMD_UNIT=name.service
      - _UID=0
    - - COREDUMP_UNIT=name.service
      - _UID=0 
      - MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1
  ...

In the short time, in order to user current go-systemd functionality there should be only one list item:

  ...
  matches:
    - COREDUMP_UNIT=name.service _UID=0 MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1
  ...

or

  ...
  matches:
    - - COREDUMP_UNIT=name.service
      - _UID=0 
      - MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1
  ...

which is a bit weird and we must explain it in the docs. So we have a trade off between being ready to support future functionality and a weird (IMO) configuration or have a more simple configuration and no-compatibility with future functionality.

It seems the library supports adding disjunctions, but I'm not familiar enough with the code to know how much effort this will be.

Yes, it is ready to use disjuntions, but the reader does not uses them at all. IMO the library should mimic journalctl add_matches.

My main concern here is that proposing a modification in coreos/go-systemd reader to support disjunctions will need a different JournalReaderConfig struct, maybe replacing the match list with a two levels nested list of matches, and this will break compatibility of anyone using it, like promtail.

I will reach you in your public slack so we can talk about next steps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlospeon heads up, I'm on PTO from tomorrow and back on the 22nd Aug.
I'm happy for this to be merged as-is once all the comments are addressed (except for this one)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I think given the complexity of this we can stick with what we have,
Let's leave this comment unresolved so we can reference it later.
Thank you for the great discussion!

@@ -159,6 +159,9 @@ type JournalTargetConfig struct {
// Path to a directory to read journal entries from. Defaults to system path
// if empty.
Path string `yaml:"path"`

// Journal matches to filter. Disjunctions (+) not supported.
Matches string `yaml:"matches"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detail 👍
I think we shouldn't let the implementation detail of a library we use bubble up to the end-user, so perhaps we can specify the matches as a list of conditions (I think this is the most ergonomic / idiomatic "YAML way"), and then provide them to the library in the way it requires.

docs/sources/clients/promtail/scraping.md Outdated Show resolved Hide resolved
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlospeon feel free to engage me in our public Slack if you want to discuss this in a higher-bandwidth way.

I'm happy to proceed with this as-is, but I've shared my thoughts below.

@@ -159,6 +159,9 @@ type JournalTargetConfig struct {
// Path to a directory to read journal entries from. Defaults to system path
// if empty.
Path string `yaml:"path"`

// Journal matches to filter. Disjunctions (+) not supported.
Matches string `yaml:"matches"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, i've taken a look at the manpage to re-familiarise myself with the specifics of filtering.

Here's one example they cite:

   To show all fields emitted by a unit and about the unit, option
   -u/--unit= should be used.  journalctl -u name expands to a
   complex filter similar to

       _SYSTEMD_UNIT=name.service
         + UNIT=name.service _PID=1
         + OBJECT_SYSTEMD_UNIT=name.service _UID=0
         + COREDUMP_UNIT=name.service _UID=0 MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1

If a user were to try write a filter expression like the above in a promtail config, might it look like this?

  ...
  matches:
    - _SYSTEMD_UNIT=name.service
    - UNIT=name.service _PID=1
    - OBJECT_SYSTEMD_UNIT=name.service _UID=0
    - COREDUMP_UNIT=name.service _UID=0 MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1
  ...

In the documentation we could specify that each matches item is a disjunction (logical OR), and the logical AND is combining multiple filters in a single entry separated by space.

It seems the library supports adding disjunctions, but I'm not familiar enough with the code to know how much effort this will be.

I understand that you've put in a lot of effort into this already, so I'm happy to merge this as-is, but in a perfect world I think we should follow the Principle of Least Surprise here to make the filtering work as a user would expect. If journal filtering allows for disjunctions, then we should support it, too.

If you don't have the time to add disjunction support, that's ok - we can always add it as an enhancement for later; I don't want to block this useful feature.

Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @carlospeon!
Please fix the conflicts and then we can merge this

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@kavirajk kavirajk merged commit 171a09b into grafana:main Aug 12, 2022
lxwzy pushed a commit to lxwzy/loki that referenced this pull request Nov 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/M type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants