From b5af3acb0a7e956ce431e6a1ea7d8ab743db7eba Mon Sep 17 00:00:00 2001 From: Owen Diehl Date: Mon, 22 Feb 2021 17:47:56 -0500 Subject: [PATCH 1/6] distributor overview --- docs/sources/architecture/_index.md | 2 ++ docs/sources/architecture/distributor.md | 39 ++++++++++++++++++++++++ 2 files changed, 41 insertions(+) create mode 100644 docs/sources/architecture/distributor.md diff --git a/docs/sources/architecture/_index.md b/docs/sources/architecture/_index.md index ce8683679428..63089507c0a3 100644 --- a/docs/sources/architecture/_index.md +++ b/docs/sources/architecture/_index.md @@ -64,6 +64,8 @@ and to ensure that it is within the configured tenant (or global) limits. Valid chunks are then split into batches and sent to multiple [ingesters](#ingester) in parallel. +For more information, see the [Distributor](./distributor) page. + #### Hashing Distributors use consistent hashing in conjunction with a configurable diff --git a/docs/sources/architecture/distributor.md b/docs/sources/architecture/distributor.md new file mode 100644 index 000000000000..02d91e5055d2 --- /dev/null +++ b/docs/sources/architecture/distributor.md @@ -0,0 +1,39 @@ +--- +title: Distributor +weight: 1000 +--- +# Distributor Component + +This document builds upon the information in the [Loki Architecture](./) page. + +## Where does it live? + +The distributor is the first component on Loki's write path. It's responsible for validating, preprocessing, and applying a subset of rate limiting to incoming data before sending it to the ingester component. + +## What does it do? + +### Validation + +The first step the distributor takes is to ensure that all incoming data is according to specification. This includes things like checking that the labels are valid Prometheus labels as well as ensuring the timestamps aren't too old or too new or the log lines aren't too long. + +### Preprocessing + +Currently the only way the distributor mutates incoming data is by normalizing labels. 
What this means is making `{foo="bar", bazz="buzz"}` equivalent to `{bazz="buzz", foo="bar"}`, or in other words, ensuring that the order of labels doesn't matter. This allows Loki to cache and hash them deterministically. + +### Rate limiting + +The distributor can also rate limit incoming logs based on the maximum per-tenant bitrate. It does this by checking a per tenant limit and dividing it by the current number of distributors. This allows the rate limit to be specified per tenant at the cluster level and enables us to scale the distributors up or down and have the per-distributor limit adjust accordingly. For instance, say we have 10 distributors and tenant A has a 10MB rate limit. Each distributor will allow up to 1MB/second before limiting. Now, say another large tenant joins the cluster and we need to spin up 10 more distributors. The now 20 distributors will adjust their rate limits for tenant A to `(10MB / 20 distributors) = 500KB/s`! This is how global limits allow much simpler and safer operation of the Loki cluster. + +**Note: The distributor uses the `ring` component under the hood to register itself amongst it's peers and get the total number of active distributors** + +### Forwarding + +Once the distributor has performed all of it's validation duties, it forwards data to the ingester component which is ultimately responsible for acknowledging the write. + +#### Replication factor + +In order to mitigate the chance of _losing_ data on any single ingester, the distributor will forward writes to a _replication_factor_ of them. Generally, this is `3`. This helps ensure that even if an ingester or two fails, we won't lose data. Loosely, for each label set (called _series_) that is pushed to a distributor, it will hash the labels and use the resulting value to look up `replication_factor` ingesters in the `ring` (which is a subcomponent that exposes a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table)). 
It will then try to write the same data to all of them. This will error if less than a _quorum_ of writes succeed. A quorum is defined as `(replication_factor / 2) + 1`. So, for our `replication_factor` of `3`, we require that two writes succeed. If less than two writes succeed, the distributor returns an error and the write can be retried. + +## Why does it deserve it's own component? + +Notably, the distributor is a stateless component. This makes it easy to scale and offload as much work as possible from the ingesters, which are the most critical component on the write path. The ability to independently scale these validation operations mean that Loki can also protect itself against denial of service attacks (either malicious or not) that could otherwise overload the ingesters. They act like the bouncer at the front door, ensuring everyong is appropriately dressed and has an invitation. From 441c1871ef251b58ff97ff4accf29fcea3571ce3 Mon Sep 17 00:00:00 2001 From: Owen Diehl Date: Wed, 24 Feb 2021 09:00:26 -0500 Subject: [PATCH 2/6] edge case edits --- docs/sources/architecture/distributor.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/sources/architecture/distributor.md b/docs/sources/architecture/distributor.md index 02d91e5055d2..dad95ebb06fb 100644 --- a/docs/sources/architecture/distributor.md +++ b/docs/sources/architecture/distributor.md @@ -32,7 +32,9 @@ Once the distributor has performed all of it's validation duties, it forwards da #### Replication factor -In order to mitigate the chance of _losing_ data on any single ingester, the distributor will forward writes to a _replication_factor_ of them. Generally, this is `3`. This helps ensure that even if an ingester or two fails, we won't lose data. 
Loosely, for each label set (called _series_) that is pushed to a distributor, it will hash the labels and use the resulting value to look up `replication_factor` ingesters in the `ring` (which is a subcomponent that exposes a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table)). It will then try to write the same data to all of them. This will error if less than a _quorum_ of writes succeed. A quorum is defined as `(replication_factor / 2) + 1`. So, for our `replication_factor` of `3`, we require that two writes succeed. If less than two writes succeed, the distributor returns an error and the write can be retried. +In order to mitigate the chance of _losing_ data on any single ingester, the distributor will forward writes to a _replication_factor_ of them. Generally, this is `3`. This helps ensure that even if an ingester or two fails, we won't lose data. Loosely, for each label set (called _series_) that is pushed to a distributor, it will hash the labels and use the resulting value to look up `replication_factor` ingesters in the `ring` (which is a subcomponent that exposes a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table)). It will then try to write the same data to all of them. This will error if less than a _quorum_ of writes succeed. A quorum is defined as `floor(replication_factor / 2) + 1`. So, for our `replication_factor` of `3`, we require that two writes succeed. If less than two writes succeed, the distributor returns an error and the write can be retried. + +**Caveat: There's also an edge case where we acknowledge a write if 2 of the three ingesters do which means that in the case where 2 writes succeed, we can only lose one ingester before suffering data loss.** ## Why does it deserve it's own component? 
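Note on the quorum change in this patch: the move from `(replication_factor / 2) + 1` to `floor(replication_factor / 2) + 1` matters because, without the floor, a `replication_factor` of `3` gives `2.5` rather than `2`. The following is a minimal sketch of the rule as the text states it, not Loki's actual implementation; the function names `quorum` and `write_acknowledged` are hypothetical and chosen for illustration only.

```python
def quorum(replication_factor: int) -> int:
    """Return floor(replication_factor / 2) + 1, as defined in the text."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # Integer division performs the floor for positive integers.
    return replication_factor // 2 + 1


def write_acknowledged(successful_writes: int, replication_factor: int) -> bool:
    """Hypothetical helper: a write succeeds once a quorum of ingesters accept it."""
    return successful_writes >= quorum(replication_factor)


# With the usual replication_factor of 3, two successful writes are required.
required = quorum(3)
two_of_three = write_acknowledged(2, 3)
one_of_three = write_acknowledged(1, 3)
```

With a `replication_factor` of `3`, two successful writes are acknowledged and one is not, which is the edge case the caveat describes: an acknowledged write may exist on only two ingesters.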
From 811767cc38f8cfc531a5d6409875b170681365c2 Mon Sep 17 00:00:00 2001 From: Owen Diehl Date: Thu, 25 Feb 2021 09:27:24 -0500 Subject: [PATCH 3/6] Update docs/sources/architecture/distributor.md Co-authored-by: Ed Welch --- docs/sources/architecture/distributor.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sources/architecture/distributor.md b/docs/sources/architecture/distributor.md index dad95ebb06fb..c311f55bb789 100644 --- a/docs/sources/architecture/distributor.md +++ b/docs/sources/architecture/distributor.md @@ -32,7 +32,7 @@ Once the distributor has performed all of it's validation duties, it forwards da #### Replication factor -In order to mitigate the chance of _losing_ data on any single ingester, the distributor will forward writes to a _replication_factor_ of them. Generally, this is `3`. This helps ensure that even if an ingester or two fails, we won't lose data. Loosely, for each label set (called _series_) that is pushed to a distributor, it will hash the labels and use the resulting value to look up `replication_factor` ingesters in the `ring` (which is a subcomponent that exposes a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table)). It will then try to write the same data to all of them. This will error if less than a _quorum_ of writes succeed. A quorum is defined as `floor(replication_factor / 2) + 1`. So, for our `replication_factor` of `3`, we require that two writes succeed. If less than two writes succeed, the distributor returns an error and the write can be retried. +In order to mitigate the chance of _losing_ data on any single ingester, the distributor will forward writes to a _replication_factor_ of them. Generally, this is `3`. Replication allows for ingester restarts and rollouts without failing writes and adds additional protection from data loss for some scenarios. 
Loosely, for each label set (called a _stream_) that is pushed to a distributor, it will hash the labels and use the resulting value to look up `replication_factor` ingesters in the `ring` (which is a subcomponent that exposes a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table)). It will then try to write the same data to all of them. This will error if less than a _quorum_ of writes succeed. A quorum is defined as `floor(replication_factor / 2) + 1`. So, for our `replication_factor` of `3`, we require that two writes succeed. If less than two writes succeed, the distributor returns an error and the write can be retried. **Caveat: There's also an edge case where we acknowledge a write if 2 of the three ingesters do which means that in the case where 2 writes succeed, we can only lose one ingester before suffering data loss.** From 51b1ebfea46e618f2378bc39a98d620a8522274c Mon Sep 17 00:00:00 2001 From: Owen Diehl Date: Thu, 25 Feb 2021 09:28:34 -0500 Subject: [PATCH 4/6] suggestions --- docs/sources/architecture/distributor.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sources/architecture/distributor.md b/docs/sources/architecture/distributor.md index 02d91e5055d2..6c38b953db8e 100644 --- a/docs/sources/architecture/distributor.md +++ b/docs/sources/architecture/distributor.md @@ -18,13 +18,13 @@ The first step the distributor takes is to ensure that all incoming data is acco ### Preprocessing -Currently the only way the distributor mutates incoming data is by normalizing labels. What this means is making `{foo="bar", bazz="buzz"}` equivalent to `{bazz="buzz", foo="bar"}`, or in other words, ensuring that the order of labels doesn't matter. This allows Loki to cache and hash them deterministically. +Currently the only way the distributor mutates incoming data is by normalizing labels. 
What this means is making `{foo="bar", bazz="buzz"}` equivalent to `{bazz="buzz", foo="bar"}`, or in other words, sorting the labels. This allows Loki to cache and hash them deterministically. ### Rate limiting The distributor can also rate limit incoming logs based on the maximum per-tenant bitrate. It does this by checking a per tenant limit and dividing it by the current number of distributors. This allows the rate limit to be specified per tenant at the cluster level and enables us to scale the distributors up or down and have the per-distributor limit adjust accordingly. For instance, say we have 10 distributors and tenant A has a 10MB rate limit. Each distributor will allow up to 1MB/second before limiting. Now, say another large tenant joins the cluster and we need to spin up 10 more distributors. The now 20 distributors will adjust their rate limits for tenant A to `(10MB / 20 distributors) = 500KB/s`! This is how global limits allow much simpler and safer operation of the Loki cluster. -**Note: The distributor uses the `ring` component under the hood to register itself amongst it's peers and get the total number of active distributors** +**Note: The distributor uses the `ring` component under the hood to register itself amongst it's peers and get the total number of active distributors. 
This is a different "key" than the ingesters use in the ring and comes from the distributor's own [ring config](../configuration#distributor_config).** ### Forwarding From 2ec348b897dddd809539671a4b3e0cdf04d9f3bb Mon Sep 17 00:00:00 2001 From: Owen Diehl Date: Thu, 25 Feb 2021 10:40:54 -0500 Subject: [PATCH 5/6] further pr feedback --- docs/sources/architecture/distributor.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/sources/architecture/distributor.md b/docs/sources/architecture/distributor.md index ae94333da358..d37fcc01c1c8 100644 --- a/docs/sources/architecture/distributor.md +++ b/docs/sources/architecture/distributor.md @@ -8,7 +8,7 @@ This document builds upon the information in the [Loki Architecture](./) page. ## Where does it live? -The distributor is the first component on Loki's write path. It's responsible for validating, preprocessing, and applying a subset of rate limiting to incoming data before sending it to the ingester component. +The distributor is the first component on Loki's write path downstream from any gateways providing auth or load balancing. It's responsible for validating, preprocessing, and applying a subset of rate limiting to incoming data before sending it to the ingester component. It is important that a load balancer sits in front of the distributors in order to properly balance traffic across them. ## What does it do? @@ -36,6 +36,8 @@ In order to mitigate the chance of _losing_ data on any single ingester, the dis **Caveat: There's also an edge case where we acknowledge a write if 2 of the three ingesters do which means that in the case where 2 writes succeed, we can only lose one ingester before suffering data loss.** +Replication factor isn't the only thing that prevents data loss, though, and arguably these days its main purpose is to allow writes to continue uninterrupted during rollouts & restarts. 
The `ingester` component now includes a [write ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging) which persists incoming writes to disk to ensure they're not lost as long as the disk isn't corrupted. The complementary nature of replication factor and WAL ensures data isn't lost unless there are significant failures in both mechanisms (i.e. multiple ingesters die and lose/corrupt their disks). + ## Why does it deserve it's own component? -Notably, the distributor is a stateless component. This makes it easy to scale and offload as much work as possible from the ingesters, which are the most critical component on the write path. The ability to independently scale these validation operations mean that Loki can also protect itself against denial of service attacks (either malicious or not) that could otherwise overload the ingesters. They act like the bouncer at the front door, ensuring everyong is appropriately dressed and has an invitation. +Notably, the distributor is a stateless component. This makes it easy to scale and offload as much work as possible from the ingesters, which are the most critical component on the write path. The ability to independently scale these validation operations mean that Loki can also protect itself against denial of service attacks (either malicious or not) that could otherwise overload the ingesters. They act like the bouncer at the front door, ensuring everyong is appropriately dressed and has an invitation. It also allows us to fan-out writes according to our replication factor as described earlier. 
From 8ce95f08cfa3950e26a04db6ed5eebfd2711dbb3 Mon Sep 17 00:00:00 2001 From: Owen Diehl Date: Thu, 25 Feb 2021 17:37:51 -0500 Subject: [PATCH 6/6] typo --- docs/sources/architecture/distributor.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sources/architecture/distributor.md b/docs/sources/architecture/distributor.md index d37fcc01c1c8..45c0182ceb2f 100644 --- a/docs/sources/architecture/distributor.md +++ b/docs/sources/architecture/distributor.md @@ -40,4 +40,4 @@ Replication factor isn't the only thing that prevents data loss, though, and arg ## Why does it deserve it's own component? -Notably, the distributor is a stateless component. This makes it easy to scale and offload as much work as possible from the ingesters, which are the most critical component on the write path. The ability to independently scale these validation operations mean that Loki can also protect itself against denial of service attacks (either malicious or not) that could otherwise overload the ingesters. They act like the bouncer at the front door, ensuring everyong is appropriately dressed and has an invitation. It also allows us to fan-out writes according to our replication factor as described earlier. +Notably, the distributor is a stateless component. This makes it easy to scale and offload as much work as possible from the ingesters, which are the most critical component on the write path. The ability to independently scale these validation operations mean that Loki can also protect itself against denial of service attacks (either malicious or not) that could otherwise overload the ingesters. They act like the bouncer at the front door, ensuring everyone is appropriately dressed and has an invitation. It also allows us to fan-out writes according to our replication factor as described earlier.
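
Note on scaling: because distributors scale independently, each distributor's share of a tenant's rate limit changes with the number of active distributors. The sketch below reproduces the arithmetic from the Rate limiting section (a 10MB limit across 10, then 20, distributors). It is an illustration under stated assumptions, not Loki's code: the function name `per_distributor_limit` is hypothetical, and 1MB is taken as 1,000,000 bytes purely for the example.

```python
def per_distributor_limit(tenant_limit_bytes: float, active_distributors: int) -> float:
    """Split a cluster-wide per-tenant limit evenly across active distributors."""
    if active_distributors < 1:
        raise ValueError("at least one active distributor is required")
    return tenant_limit_bytes / active_distributors


MB = 1_000_000  # assumption: decimal megabytes, for illustration only

# Tenant A has a 10MB/s limit; the cluster grows from 10 to 20 distributors.
limit_with_10 = per_distributor_limit(10 * MB, 10)
limit_with_20 = per_distributor_limit(10 * MB, 20)
```

The tenant's cluster-wide limit is unchanged in both cases; only the per-distributor share moves, from 1MB/s to 500KB/s.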