scrape_config_files doesn't work #34786

Open
mike9421 opened this issue Aug 21, 2024 · 5 comments
Labels: bug (Something isn't working), receiver/prometheus (Prometheus receiver)

@mike9421

Component(s)

receiver/prometheus

What happened?

Description

I want to use `scrape_config_files` to add Prometheus jobs. However, the jobs defined this way never take effect, even though the OTel configuration has already been applied (for details on how the configuration is applied, see applyConfig).

Steps to Reproduce

1. Add `scrape_config_files` to the prometheus receiver configuration, with the file path set to scrape_files.yaml.
2. Add the desired `scrape_configs` entries to scrape_files.yaml.
3. Start OTel.

Expected Result

The Prometheus jobs defined in scrape_files.yaml are scraped correctly.

Actual Result

The Prometheus jobs from scrape_files.yaml do not run, and the OTel logs never show them being added.

Collector version

v0.95

Environment information

Environment

OS: darwin/arm64
Compiler: go1.21.9

The same issue occurs when deployed on Kubernetes (k8s).

OpenTelemetry Collector configuration

##### otel.yaml
exporters:
  otlphttp/metric:
    metrics_endpoint: http://localhost:8080
    retry_on_failure:
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
      multiplier: 2
      randomization_factor: 0.5
extensions:
  pprof:
  health_check:
    endpoint: 0.0.0.0:13133
  memory_ballast:
      size_mib: "256"
processors:
  batch/metrics:
    send_batch_size: 500
    send_batch_max_size: 500
    timeout: 5s
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
  cumulativetodelta:
receivers:
  prometheus:
    trim_metric_suffixes: false
    config:
      scrape_config_files:
        - /scrape_files.yaml
      scrape_configs:
        - job_name: 'otel-scrape-self-test'
          scrape_interval: 10s
          scrape_timeout: 10s
          metrics_path: '/metrics'
          static_configs:
            - targets: ['0.0.0.0:8888']
service:
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:8888
  extensions:
  - pprof
  - health_check
  - memory_ballast
  pipelines:
    metrics/prometheus:
      receivers:
      - prometheus
      processors:
      - memory_limiter
      - cumulativetodelta
      - batch/metrics
      exporters:
      - otlphttp/metric

##### scrape_files.yaml
scrape_configs:
- job_name: 'otel-scrape-k8s-apiserver'
  scrape_interval: 10s
  scrape_timeout: 10s
  body_size_limit: 50MB
  follow_redirects: true
  scheme: https
  metrics_path: /metrics
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - default
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: kubernetes
      replacement: $1
      action: keep
    - action: replace
      target_label: otel_pod
      replacement: otel_1
- job_name: 'otel-scrape-self'
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: '/metrics'
  static_configs:
    - targets: ['0.0.0.0:9999']

Log output

2024-08-21T18:49:59.184+0800    info    service@v0.95.0/service.go:143  Starting otelcontribcol...      {"Version": "0.95.0-dev", "NumCPU": 8}
2024-08-21T18:49:59.184+0800    info    extensions/extensions.go:34     Starting extensions...
2024-08-21T18:49:59.184+0800    info    extensions/extensions.go:37     Extension is starting...        {"kind": "extension", "name": "pprof"}
2024-08-21T18:49:59.185+0800    info    pprofextension/pprofextension.go:60     Starting net/http/pprof server  {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777","DialerConfig":{"Timeout":0}},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2024-08-21T18:49:59.185+0800    info    extensions/extensions.go:52     Extension started.      {"kind": "extension", "name": "pprof"}
2024-08-21T18:49:59.185+0800    info    extensions/extensions.go:37     Extension is starting...        {"kind": "extension", "name": "memory_ballast"}
2024-08-21T18:49:59.187+0800    info    ballastextension@v0.95.0/memory_ballast.go:41   Setting memory ballast  {"kind": "extension", "name": "memory_ballast", "MiBs": 256}
2024-08-21T18:49:59.188+0800    info    extensions/extensions.go:52     Extension started.      {"kind": "extension", "name": "memory_ballast"}
2024-08-21T18:49:59.188+0800    info    extensions/extensions.go:37     Extension is starting...        {"kind": "extension", "name": "health_check"}
2024-08-21T18:49:59.188+0800    info    healthcheckextension/healthcheckextension.go:35 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13134","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-08-21T18:49:59.189+0800    warn    internal@v0.95.0/warning.go:42  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks. Enable the feature gate to change the default and remove this warning.        {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks", "feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-08-21T18:49:59.189+0800    info    extensions/extensions.go:52     Extension started.      {"kind": "extension", "name": "health_check"}
2024-08-21T18:49:59.190+0800    info    prometheusreceiver/metrics_receiver.go:240      Starting discovery manager      {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-08-21T18:50:04.422+0800    info    prometheusreceiver/metrics_receiver.go:231      Scrape job added        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "otel-scrape-self-test"}
2024-08-21T18:50:04.422+0800    info    healthcheck/handler.go:132      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2024-08-21T18:50:04.422+0800    info    service@v0.95.0/service.go:169  Everything is ready. Begin running and processing data.
2024-08-21T18:50:04.422+0800    warn    localhostgate/featuregate.go:63 The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default.       {"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-08-21T18:50:04.422+0800    info    prometheusreceiver/metrics_receiver.go:282      Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-08-21T18:50:14.480+0800    info    exporterhelper/retry_sender.go:118      Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "metrics", "name": "otlphttp/metric", "error": "failed to make an HTTP request: Post \"https://otel-inner.yuanfudao.biz/metric/otel/v1\": dial tcp: lookup otel-inner.yuanfudao.biz: no such host", "interval": "2.623812628s"}

Additional context

No response

@mike9421 mike9421 added bug Something isn't working needs triage New item requiring triage labels Aug 21, 2024
@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Aug 21, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole
Contributor

My best guess is that we don't apply the config to the discovery manager here:

for _, scrapeConfig := range cfg.ScrapeConfigs {

We iterate over `cfg.ScrapeConfigs` rather than `cfg.GetScrapeConfigs()`, which incorporates configuration from `scrape_config_files`. We should update most usages of `cfg.ScrapeConfigs` to use the newer function.
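
For illustration, here is a minimal sketch of the suggested change, assuming `cfg` is the resolved Prometheus configuration (`*promconfig.Config` from github.com/prometheus/prometheus/config); `applyScrapeConfigs` is a hypothetical helper standing in for the receiver's apply path, not the actual receiver source:

import (
	"fmt"

	promconfig "github.com/prometheus/prometheus/config"
)

// applyScrapeConfigs is a hypothetical helper, not the receiver's real function.
// The key point is the loop source: GetScrapeConfigs() merges the inline
// scrape_configs with everything loaded via scrape_config_files, whereas the
// raw ScrapeConfigs field only contains the inline entries.
func applyScrapeConfigs(cfg *promconfig.Config) error {
	scrapeConfigs, err := cfg.GetScrapeConfigs()
	if err != nil {
		return fmt.Errorf("resolving scrape configs: %w", err)
	}
	for _, scrapeConfig := range scrapeConfigs {
		// register each job with the discovery and scrape managers,
		// exactly as the receiver already does for inline scrape_configs
		_ = scrapeConfig
	}
	return nil
}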

@dashpole dashpole added help wanted Extra attention is needed and removed needs triage New item requiring triage labels Aug 21, 2024
@mike9421 mike9421 changed the title from "scrape_config_files does work" to "scrape_config_files doesn't work" Aug 22, 2024
@mike9421
Author

> My best guess is that we don't apply the config to the discovery manager here:
>
> for _, scrapeConfig := range cfg.ScrapeConfigs {
>
> We iterate over `cfg.ScrapeConfigs` rather than `cfg.GetScrapeConfigs()`, which incorporates configuration from `scrape_config_files`. We should update most usages of `cfg.ScrapeConfigs` to use the newer function.

Thank you for your answer. It worked after adding `cfg.ScrapeConfigs, _ = (*config.Config)(cfg).GetScrapeConfigs()` before the code mentioned above.

The only drawback is that OTel does not substitute environment variables inside scrape_files.yaml.
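
For context, a rough sketch of where that one-line workaround sits, assuming `config` is the github.com/prometheus/prometheus/config package and `cfg` converts to a `*config.Config` as in the snippet above; this is a local patch sketch, not the upstream fix:

// Workaround sketch: overwrite the inline ScrapeConfigs with the merged set
// (inline scrape_configs plus entries from scrape_config_files). The error is
// discarded here only to mirror the one-liner quoted above.
cfg.ScrapeConfigs, _ = (*config.Config)(cfg).GetScrapeConfigs()

for _, scrapeConfig := range cfg.ScrapeConfigs {
	// existing per-job handling now also sees jobs from scrape_config_files
	_ = scrapeConfig
}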

@bacherfl
Contributor

If this issue is still available, I'd be happy to work on a fix for it.

@dashpole dashpole removed the help wanted Extra attention is needed label Aug 27, 2024
@dashpole
Contributor

It is all yours, @bacherfl. Please cc me on the PR and I'll review.

mx-psi pushed a commit that referenced this issue Sep 11, 2024
…fig_files` (#34897)

**Description:** This PR fixes a bug in the prometheus receiver where
the scrape configs provided via `scrape_config_files` were not applied.
Using `scrape_config_files` instead of providing the scrape configs
directly does come with some limitations regarding the use of env vars,
as also mentioned in
#34786 (comment).

**Link to tracking Issue:** #34786

**Testing:** Added unit tests

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Co-authored-by: Antoine Toulme <antoine@toulme.name>
Co-authored-by: David Ashpole <dashpole@google.com>