OpenTelemetry Collector · How it works

What is the OpenTelemetry Collector?

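The OpenTelemetry Collector is a vendor-agnostic proxy that receives, processes, and exports telemetry data (metrics, traces, logs). It acts as a central hub for your observability pipeline: applications send telemetry to the collector, and the collector decides where it goes, so adding or switching a backend is a configuration change rather than a code change.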

┌─────────────┐     ┌─────────────────────────────────────┐     ┌─────────────┐
│   App 1     │────▶│                                     │────▶│  Jaeger     │
│ (OTel SDK)  │     │                                     │     └─────────────┘
└─────────────┘     │                                     │
                    │      OpenTelemetry Collector        │     ┌─────────────┐
┌─────────────┐     │                                     │────▶│  Prometheus │
│   App 2     │────▶│  Receivers │ Processors │ Exporters │     └─────────────┘
│ (OTel SDK)  │     │                                     │
└─────────────┘     │                                     │     ┌─────────────┐
                    │                                     │────▶│  Grafana    │
┌─────────────┐     │                                     │     │  Cloud      │
│   App 3     │────▶│                                     │     └─────────────┘
│ (OTel SDK)  │     └─────────────────────────────────────┘
└─────────────┘

Collector Architecture

Pipeline Components

# otel-collector-config.yaml
receivers:    # How data gets IN
processors:   # What happens to the data
exporters:    # Where data goes OUT
service:      # Ties it all together

Key Components

Receivers: Accept data from external sources

  • otlp - OpenTelemetry Protocol (gRPC/HTTP)
  • jaeger - Jaeger format
  • zipkin - Zipkin format
  • prometheus - Scrape Prometheus metrics
  • hostmetrics - System metrics (CPU, memory, disk)

Processors: Transform data

  • batch - Batch data for efficiency
  • memory_limiter - Prevent OOM
  • attributes - Add/modify attributes
  • filter - Drop unwanted data
  • tail_sampling - Smart trace sampling

Exporters: Send data to backends

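  • otlp - Send to another collector or any OTLP backend over gRPC (including Jaeger, which accepts OTLP natively; the dedicated jaeger exporter has been removed from recent collector releases)
  • otlphttp - Send to an OTLP backend over HTTP
  • prometheus - Expose metrics on an endpoint for Prometheus to scrape
  • prometheusremotewrite - Push metrics via Prometheus Remote Write
  • debug - Console output for debugging (replaces the deprecated logging exporter)
  • file - Write to files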
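
As a rough sketch of how these component types fit together: component IDs take the form `type` or `type/name`, so the same exporter type can be configured more than once, and a component only runs when a pipeline under `service` references it. The backend hostnames `backend-a` and `backend-b` are placeholders, not real endpoints.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  # Two instances of the same exporter type, distinguished by the /name suffix
  otlp/a:
    endpoint: backend-a:4317   # placeholder hostname
  otlp/b:
    endpoint: backend-b:4317   # placeholder hostname

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # Every span is sent to both exporters
      exporters: [otlp/a, otlp/b]
```

Defining a component without listing it in a pipeline has no effect, which is a common source of "my exporter receives nothing" confusion.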

Basic Configuration

Minimal Setup

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  # Console output; the debug exporter replaces the deprecated logging exporter
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
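
One way to try the minimal setup locally is Docker Compose. The following is a sketch that assumes the config above is saved as `otel-collector-config.yaml` next to the compose file; the service name and published ports are illustrative choices.

```yaml
# docker-compose.yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      # Mount the local config read-only at the path passed to --config
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml:ro
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
```

With this running, an application on the host can export to `http://localhost:4317`, and received telemetry appears in the container's console output.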

Production Configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 4
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - "http://localhost:*"

  # Scrape Prometheus metrics (here the collector's own metrics on :8888; add app targets as needed)
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 15s
          static_configs:
            - targets: ['localhost:8888']

  # Collect host metrics
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:
      network:

processors:
  # Prevent out of memory
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200

  # Batch for efficiency
  batch:
    timeout: 5s
    send_batch_size: 10000

  # Add resource attributes
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert
      - key: service.namespace
        value: my-company
        action: upsert

  # Filter out noisy traces
  filter:
    traces:
      span:
        - 'attributes["http.target"] == "/health"'
        - 'attributes["http.target"] == "/metrics"'

  # Add attributes to all spans
  attributes:
    actions:
      - key: deployment.environment
        value: ${env:ENVIRONMENT}
        action: insert

exporters:
  # Send traces to Jaeger over OTLP (Jaeger accepts OTLP natively;
  # the dedicated jaeger exporter was removed from recent collector releases)
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  # Expose metrics for Prometheus to scrape
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: myapp

  # Send to Grafana Cloud
  otlphttp:
    endpoint: https://otlp-gateway.grafana.net/otlp
    headers:
      Authorization: "Basic ${env:GRAFANA_CLOUD_TOKEN}"

  # Console output (replaces the deprecated logging exporter)
  debug:
    verbosity: basic

service:
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888

  pipelines:
    # Processor order matters: memory_limiter first, batch last
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, filter, attributes, batch]
      exporters: [otlp/jaeger, otlphttp]

    metrics:
      receivers: [otlp, prometheus, hostmetrics]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheus, otlphttp]

    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp, debug]
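
For a remote backend, it is usually worth tuning how the exporter behaves when the backend is slow or unreachable. The sketch below adds retry and queue settings to the `otlphttp` exporter from the configuration above; the numeric values are illustrative assumptions, not recommendations.

```yaml
exporters:
  otlphttp:
    endpoint: https://otlp-gateway.grafana.net/otlp
    # Retry failed sends with exponential backoff
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s   # give up on a batch after 5 minutes
    # Buffer batches in memory while the backend is unavailable
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```

A larger queue rides out longer outages at the cost of memory, so it should be sized together with the `memory_limiter` settings.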

.NET Integration

Setup OpenTelemetry in ASP.NET Core

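// Install packages:
// OpenTelemetry.Exporter.OpenTelemetryProtocol
// OpenTelemetry.Extensions.Hosting
// OpenTelemetry.Instrumentation.AspNetCore
// OpenTelemetry.Instrumentation.Http
// OpenTelemetry.Instrumentation.SqlClient
// OpenTelemetry.Instrumentation.Runtime   (for AddRuntimeInstrumentation)
// OpenTelemetry.Instrumentation.Process   (for AddProcessInstrumentation)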

using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Define resource (service identity)
var resourceBuilder = ResourceBuilder.CreateDefault()
    .AddService(
        serviceName: "OrderService",
        serviceVersion: "1.0.0",
        serviceInstanceId: Environment.MachineName)
    .AddAttributes(new Dictionary<string, object>
    {
        ["deployment.environment"] = builder.Environment.EnvironmentName,
        ["team"] = "platform"
    });

// Configure OpenTelemetry
builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService("OrderService"))
    .WithTracing(tracing => tracing
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation(options =>
        {
            options.RecordException = true;
            options.Filter = httpContext =>
                !httpContext.Request.Path.StartsWithSegments("/health");
        })
        .AddHttpClientInstrumentation(options =>
        {
            options.RecordException = true;
            options.FilterHttpRequestMessage = request =>
                !request.RequestUri?.Host.Contains("health") ?? true;
        })
        .AddSqlClientInstrumentation(options =>
        {
            options.SetDbStatementForText = true;
            options.RecordException = true;
        })
        .AddSource("OrderService") // Custom ActivitySource
        .AddOtlpExporter(options =>
        {
            options.Endpoint = new Uri("http://otel-collector:4317");
            options.Protocol = OpenTelemetry.Exporter.OtlpExportProtocol.Grpc;
        }))
    .WithMetrics(metrics => metrics
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()
        .AddProcessInstrumentation()
        .AddMeter("OrderService") // Custom Meter
        .AddOtlpExporter(options =>
        {
            options.Endpoint = new Uri("http://otel-collector:4317");
        }));

// Configure logging to send to collector
builder.Logging.AddOpenTelemetry(options =>
{
    options.SetResourceBuilder(resourceBuilder);
    options.IncludeFormattedMessage = true;
    options.IncludeScopes = true;
    options.ParseStateValues = true;
    options.AddOtlpExporter(otlp =>
    {
        otlp.Endpoint = new Uri("http://otel-collector:4317");
    });
});

var app = builder.Build();
app.Run();
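
The endpoint does not have to be hard-coded. As a sketch, the standard OpenTelemetry environment variables can carry the same settings in a Kubernetes container spec. This assumes the application does not set `options.Endpoint` explicitly, since values assigned in code take precedence, and it assumes the collector is reachable through the `otel-collector` Service in the `observability` namespace.

```yaml
# Fragment of the OrderService container spec
env:
  - name: OTEL_SERVICE_NAME
    value: OrderService
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector.observability:4317
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: grpc
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: deployment.environment=production,team=platform
```

Keeping these in the deployment manifest lets the same image run against different collectors per environment.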

Custom Instrumentation

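using System.Diagnostics;
using System.Diagnostics.Metrics;
using Microsoft.Extensions.Logging;
using OpenTelemetry.Trace; // Provides the Activity.RecordException extension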

public class OrderService
{
    // Custom ActivitySource for traces
    private static readonly ActivitySource ActivitySource = new("OrderService");

    // Custom Meter for metrics
    private static readonly Meter Meter = new("OrderService", "1.0.0");
    private static readonly Counter<long> OrdersCreated = Meter.CreateCounter<long>(
        "orders_created_total",
        description: "Total number of orders created");
    private static readonly Histogram<double> OrderProcessingDuration = Meter.CreateHistogram<double>(
        "order_processing_duration_seconds",
        unit: "s",
        description: "Order processing duration");

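    private readonly ILogger<OrderService> _logger;

    // The logger is supplied by dependency injection
    public OrderService(ILogger<OrderService> logger) => _logger = logger;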

    public async Task<Order> CreateOrderAsync(CreateOrderRequest request, CancellationToken ct)
    {
        // Start a span
        using var activity = ActivitySource.StartActivity("CreateOrder", ActivityKind.Internal);
        activity?.SetTag("order.user_id", request.UserId);
        activity?.SetTag("order.total", request.Total);

        var stopwatch = Stopwatch.StartNew();

        try
        {
            _logger.LogInformation("Creating order for user {UserId}", request.UserId);

            // ProcessOrderAsync is the application's own business logic (not shown)
            var order = await ProcessOrderAsync(request, ct);

            // Record metrics
            OrdersCreated.Add(1,
                new KeyValuePair<string, object?>("status", "success"),
                new KeyValuePair<string, object?>("payment_method", request.PaymentMethod));

            activity?.SetTag("order.id", order.Id);
            activity?.SetStatus(ActivityStatusCode.Ok);

            return order;
        }
        catch (Exception ex)
        {
            OrdersCreated.Add(1,
                new KeyValuePair<string, object?>("status", "failure"),
                new KeyValuePair<string, object?>("error_type", ex.GetType().Name));

            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            activity?.RecordException(ex);

            _logger.LogError(ex, "Order creation failed for user {UserId}", request.UserId);
            throw;
        }
        finally
        {
            stopwatch.Stop();
            OrderProcessingDuration.Record(
                stopwatch.Elapsed.TotalSeconds,
                new KeyValuePair<string, object?>("order_type", request.OrderType));
        }
    }
}

Kubernetes Deployment

Collector as DaemonSet (Node-level)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest  # pin a specific version in production
          args:
            - --config=/etc/otel-collector-config.yaml
          ports:
            - containerPort: 4317  # OTLP gRPC
            - containerPort: 4318  # OTLP HTTP
            - containerPort: 8888  # Metrics
          resources:
            limits:
              memory: 500Mi
              cpu: 500m
            requests:
              memory: 200Mi
              cpu: 100m
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
    - name: otlp-http
      port: 4318
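
The DaemonSet mounts a ConfigMap named `otel-collector-config` that is not shown above. A minimal sketch of it follows; the key must match the `subPath` in the volume mount, and the embedded pipeline is a placeholder that only prints traces to the console via the `debug` exporter.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: observability
data:
  # Key name must match the subPath used in the DaemonSet's volumeMount
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
    exporters:
      debug:
        verbosity: basic
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
```

Because the file is mounted with `subPath`, changes to the ConfigMap are not picked up by running pods; the DaemonSet needs a rollout restart after each config change.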

Collector as Deployment (Centralized)

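apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gateway
  namespace: observability
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-collector-gateway
  template:
    metadata:
      labels:
        app: otel-collector-gateway
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args:
            - --config=/etc/otel-collector-config.yaml
          ports:
            - containerPort: 4317  # OTLP gRPC
            - containerPort: 4318  # OTLP HTTP
          resources:
            limits:
              memory: 2Gi
              cpu: 1000m
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml
      volumes:
        - name: config
          configMap:
            # Assumed name; the gateway needs its own ConfigMap
            name: otel-collector-gateway-config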
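
To combine both patterns, node-level collectors can forward to the centralized gateway. The sketch below assumes a Service named `otel-collector-gateway` in front of the Deployment (this Service is not defined above) and plaintext traffic inside the cluster.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-gateway   # assumed name
  namespace: observability
spec:
  selector:
    app: otel-collector-gateway
  ports:
    - name: otlp-grpc
      port: 4317
```

The node-level collector's configuration would then export to that Service:

```yaml
exporters:
  otlp/gateway:
    endpoint: otel-collector-gateway.observability.svc.cluster.local:4317
    tls:
      insecure: true   # assumes unencrypted in-cluster traffic
```

This keeps the node-level collectors lightweight while heavier processing and backend credentials live only in the gateway.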

Tail-Based Sampling

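Tail-based sampling decides whether to keep a trace after its spans have arrived, so the decision can depend on the outcome (errors, latency). Policies are evaluated independently, and a trace is kept if any policy matches. For the decision to be correct, all spans of a trace must reach the same collector instance.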

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      # Always keep errors
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Keep slow traces
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 1000

      # Keep 10% of all traces as a baseline (error and slow traces are already kept above)
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

      # Keep traces with specific attributes
      - name: important-users
        type: string_attribute
        string_attribute:
          key: user.tier
          values: [premium, enterprise]

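exporters:
  # Jaeger accepts OTLP natively
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger]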
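
When several collector replicas run tail sampling, spans belonging to one trace can land on different replicas, and each replica then decides on an incomplete trace. One common approach is a first tier of collectors that routes by trace ID using the `loadbalancing` exporter from the contrib distribution. The sketch below assumes a headless Service named `otel-collector-gateway-headless` that resolves to the individual sampling replicas; that Service is hypothetical and not defined in this article.

```yaml
exporters:
  loadbalancing:
    # Send all spans with the same trace ID to the same backend collector
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true   # assumes unencrypted in-cluster traffic
    resolver:
      dns:
        # Hypothetical headless Service exposing each sampling replica
        hostname: otel-collector-gateway-headless.observability.svc.cluster.local
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The sampling tier behind it then runs the `tail_sampling` configuration shown above.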
