---
title: Retention
---
# Loki Storage Retention

Retention in Loki is achieved either through the [Table Manager](#table-manager) or the [Compactor](#compactor).

Retention through the [Table Manager](../table-manager/) relies on the object store TTL feature and works with both the [boltdb-shipper](../boltdb-shipper) store and the chunk/index store. Retention through the [Compactor](../boltdb-shipper#compactor), however, is supported only with the [boltdb-shipper](../boltdb-shipper) store.

Compactor-based retention will become the default and will have long-term support. While it is still **experimental**, it supports more granular retention policies for per-tenant and per-stream use cases.
## Compactor

The [Compactor](../boltdb-shipper#compactor) can deduplicate index entries. It can also apply granular retention. When applying retention with the Compactor, the [Table Manager](../table-manager/) is unnecessary.

> Run the Compactor as a singleton (a single instance).

Compaction and retention are idempotent. If the Compactor restarts, it will continue from where it left off.

The Compactor runs compaction and retention in a loop at every `compaction_interval`, or as soon as possible if it is running behind.

The Compactor's algorithm for updating the index:

- For each table within each day:
  - Compact the table into a single index file.
  - Traverse the entire index. Use the tenant configuration to identify and mark chunks that need to be removed.
  - Remove marked chunks from the index and save their references in a file on disk.
  - Upload the new, modified index files.
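
The steps above can be modeled with a minimal, illustrative sketch. All names here (`ChunkRef`, `Index`, `process_table`) are hypothetical and not Loki's actual internals; the point is the flow: traverse the index, mark expired chunks, record them in a marker file, and keep a rewritten index to upload.

```python
# Illustrative model of the per-table retention pass; not Loki's real code.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class ChunkRef:
    tenant: str
    through: datetime  # end timestamp of the data in the chunk

@dataclass
class Index:
    chunks: list = field(default_factory=list)

def process_table(index: Index, tenant_retention: dict, marker_file: list) -> Index:
    """Mark expired chunks, record them in the marker file, return the new index."""
    now = datetime.now(timezone.utc)
    kept = []
    for ref in index.chunks:  # traverse the entire index
        period = tenant_retention.get(ref.tenant, timedelta(hours=744))
        if now - ref.through > period:
            marker_file.append(ref)  # saved on disk for later asynchronous deletion
        else:
            kept.append(ref)
    return Index(chunks=kept)  # the modified index that would be uploaded

now = datetime.now(timezone.utc)
idx = Index([ChunkRef("29", now - timedelta(hours=200)),
             ChunkRef("30", now - timedelta(hours=200))])
markers: list = []
new_idx = process_table(idx, {"29": timedelta(hours=168)}, markers)
# tenant 29's 200h-old chunk is marked for deletion; tenant 30's stays (744h default)
```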

The retention algorithm is applied to the index. Chunks are not deleted while the retention algorithm is applied; the Compactor deletes them asynchronously when they are swept.

Marked chunks are deleted only after the configured `retention_delete_delay` has expired, for two reasons:

1. boltdb-shipper indexes are refreshed from the shared store at a set interval by the components that use them (the querier and the ruler). Deleting chunks instantly could leave components holding references to old chunks, causing their queries to fail. The delay gives components time to refresh their store and gracefully drop their references to those chunks.

2. It provides a short window in which to cancel chunk deletion in case of a mistake.

Marker files (which list the chunks to delete) should be stored on a persistent disk, because they are the sole remaining reference to the marked chunks.
### Retention Configuration

This compactor configuration example activates retention:

```yaml
compactor:
  working_directory: /data/retention
  shared_store: gcs
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
schema_config:
  configs:
    - from: "2020-07-31"
      index:
        period: 24h
        prefix: loki_index_
      object_store: gcs
      schema: v11
      store: boltdb-shipper
storage_config:
  boltdb_shipper:
    active_index_directory: /data/index
    cache_location: /data/boltdb-cache
    shared_store: gcs
  gcs:
    bucket_name: loki
```

> Note that retention is only available if the index period is 24h.

Set `retention_enabled` to true. Without this, the Compactor will only compact tables.

Define `schema_config` and `storage_config` so the Compactor can access the storage.

The index period must be 24h.

`working_directory` is the directory where marked chunks and temporary tables are saved.

`compaction_interval` dictates how often compaction and/or retention is applied. If the Compactor falls behind, compaction and/or retention occur as soon as possible.

`retention_delete_delay` is the delay after which the Compactor deletes marked chunks.

`retention_delete_worker_count` specifies the maximum number of goroutine workers instantiated to delete chunks.

#### Configuring the retention period

The retention period is configured within the [`limits_config`](./../../../configuration/#limits_config) configuration section.

There are two ways of setting retention policies:

- `retention_period`, which is applied globally.
- `retention_stream`, which is applied only to chunks matching the selector.

> The minimum retention period is 24h.

This example configures global retention:

```yaml
...
limits_config:
  retention_period: 744h
  retention_stream:
    - selector: '{namespace="dev"}'
      priority: 1
      period: 24h
  per_tenant_override_config: /etc/overrides.yaml
...
```

Per-tenant retention can be defined in the `/etc/overrides.yaml` file. For example:

```yaml
overrides:
  "29":
    retention_period: 168h
    retention_stream:
      - selector: '{namespace="prod"}'
        priority: 2
        period: 336h
      - selector: '{container="loki"}'
        priority: 1
        period: 72h
  "30":
    retention_stream:
      - selector: '{container="nginx"}'
        priority: 1
        period: 24h
```

The rule to apply is selected by choosing the first match in this list:

1. If a per-tenant `retention_stream` matches the current stream, the highest-priority match is picked.
2. If a global `retention_stream` matches the current stream, the highest-priority match is picked.
3. If a per-tenant `retention_period` is specified, it is applied.
4. The global `retention_period` is selected if nothing else matched.
5. If no global `retention_period` is specified, the default value of `744h` (31 days) retention is used.
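
The selection order can be sketched in Python. This is a hypothetical illustration, not Loki's internals: selectors are modeled as plain predicates instead of real label matchers, and the tenant/global configurations are plain dictionaries.

```python
# Hypothetical sketch of the retention-period selection order described above.
GLOBAL_DEFAULT = "744h"

def select_retention(labels: dict, tenant_cfg: dict, global_cfg: dict) -> str:
    """Pick the retention period for a stream, following the precedence rules."""
    def best_rule(rules):
        # Highest-priority rule whose selector matches the stream wins.
        matching = [r for r in rules if r["match"](labels)]
        return max(matching, key=lambda r: r["priority"], default=None)

    rule = best_rule(tenant_cfg.get("retention_stream", []))    # rule 1
    if rule:
        return rule["period"]
    rule = best_rule(global_cfg.get("retention_stream", []))    # rule 2
    if rule:
        return rule["period"]
    if "retention_period" in tenant_cfg:                        # rule 3
        return tenant_cfg["retention_period"]
    return global_cfg.get("retention_period", GLOBAL_DEFAULT)   # rules 4 and 5

# Mirrors tenant "29" from the override example above.
tenant29 = {
    "retention_period": "168h",
    "retention_stream": [
        {"match": lambda l: l.get("namespace") == "prod", "priority": 2, "period": "336h"},
        {"match": lambda l: l.get("container") == "loki", "priority": 1, "period": "72h"},
    ],
}
print(select_retention({"namespace": "prod", "container": "loki"}, tenant29, {}))  # 336h
print(select_retention({"container": "loki"}, tenant29, {}))                       # 72h
print(select_retention({"namespace": "dev"}, tenant29, {}))                        # 168h
```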

Stream matching uses the same syntax as Prometheus label matching:

- `=`: Select labels that are exactly equal to the provided string.
- `!=`: Select labels that are not equal to the provided string.
- `=~`: Select labels that regex-match the provided string.
- `!~`: Select labels that do not regex-match the provided string.
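
As a hypothetical sketch (label values here are invented for illustration), the regex matchers can be used in `retention_stream` selectors as well:

```yaml
limits_config:
  retention_stream:
    # keep streams from any namespace starting with "dev-" for one day
    - selector: '{namespace=~"dev-.*"}'
      priority: 1
      period: 24h
    # keep streams not produced by the loki or promtail containers for one week
    - selector: '{container!~"loki|promtail"}'
      priority: 2
      period: 168h
```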

The example configurations above set these rules:

- All tenants except `29` and `30` that are in the `dev` namespace have a retention period of `24h`.
- All tenants except `29` and `30` that are not in the `dev` namespace have a retention period of `744h`.
- For tenant `29`:
  - All streams, except those in the `loki` container or the `prod` namespace, have a retention period of `168h` (1 week).
  - All streams in the `prod` namespace have a retention period of `336h` (2 weeks), even if the container label is `loki`, since the priority of the `prod` rule is higher.
  - Streams that have the container label `loki` but are not in the `prod` namespace have a retention period of `72h`.
- For tenant `30`:
  - All streams, except those with the container label `nginx`, have the global retention period of `744h`, since no override is specified.
  - Streams that have the container label `nginx` have a retention period of `24h`.

## Table Manager

To enable retention through the Table Manager, configure it with deletions enabled and a retention period. Please refer to the [`table_manager_config`](../../../configuration#table_manager_config) section of the Loki configuration reference.
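
As a minimal sketch (values illustrative; see the configuration reference for the authoritative option list), enabling Table Manager retention looks like:

```yaml
table_manager:
  retention_deletes_enabled: true
  # should be a multiple of the index table period, and at least 24h
  retention_period: 744h
```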