Quick recall Q&A

1. Explain the difference between metrics, logs, and traces

Metrics:

  • Numeric, time-series data
  • Aggregated, low storage cost
  • Best for: Alerting, dashboards, trends

Logs:

  • Detailed event records
  • High cardinality, high storage cost
  • Best for: Debugging, audit trails

Traces:

  • Request flow across services
  • Causally connected spans
  • Best for: Performance analysis, debugging distributed systems
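As a rough illustration of how the three signals differ, the sketch below emits all of them for a single request using only the standard library. The function name `handle_request`, the field names, and the `/checkout` route are hypothetical and chosen for the example; a real service would typically use a metrics client, a logging framework, and a tracing SDK instead.

```python
import json
import time
import uuid
from collections import Counter

# Metric: an aggregated counter keyed by a small, fixed set of labels.
request_count = Counter()


def handle_request(route, status, duration_ms):
    """Emit one metric increment, one log record, and one span for a request.

    Hypothetical helper for illustration only.
    """
    trace_id = uuid.uuid4().hex

    # Metric: cheap to store because only a count per (route, status) is kept.
    request_count[(route, status)] += 1

    # Log: a detailed per-event record; the unique trace_id makes it high cardinality.
    log_line = json.dumps({
        "ts": time.time(),
        "level": "info",
        "route": route,
        "status": status,
        "duration_ms": duration_ms,
        "trace_id": trace_id,
    })

    # Trace span: one timed unit of work, linked to other spans by trace_id/parent_id.
    span = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": None,  # root span; child spans would point at this span_id
        "name": f"GET {route}",
        "duration_ms": duration_ms,
    }
    return log_line, span


log_line, span = handle_request("/checkout", 200, 42.0)
handle_request("/checkout", 200, 17.5)
```

After two requests the metric holds a single aggregated value for `("/checkout", 200)`, while each request produced its own log record and span, joined by a shared `trace_id`.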

2. How do you design alerts that don't cause alert fatigue?

  1. Alert on user-facing symptoms (errors, latency), not internal causes
  2. Set appropriate thresholds with hysteresis (separate trigger and clear levels)
  3. Use multi-window burn-rate alerts tied to the SLO
  4. Group related alerts
  5. Include actionable runbooks
  6. Review and tune regularly
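The multi-window burn-rate idea from the list above can be sketched as follows. This is a minimal example, not a production rule: the function names are hypothetical, the windows are passed in as pre-computed `(errors, total)` counts, and the default threshold of 14.4 is the commonly cited value for a 1-hour/5-minute window pair on a 30-day SLO, used here as an assumption.

```python
def burn_rate(errors, total, slo_target):
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    error_rate = errors / total
    budget = 1.0 - slo_target  # allowed error rate, e.g. 0.001 for a 99.9% SLO
    return error_rate / budget


def should_page(long_window, short_window, slo_target=0.999, threshold=14.4):
    """Page only if BOTH windows burn budget faster than the threshold.

    long_window and short_window are (errors, total) tuples. The long window
    shows the problem is significant; the short window shows it is still
    happening, so the alert clears soon after recovery.
    """
    long_burn = burn_rate(*long_window, slo_target)
    short_burn = burn_rate(*short_window, slo_target)
    return long_burn >= threshold and short_burn >= threshold


# 2% errors in both windows against a 0.1% budget: burn rate is about 20.
page_now = should_page(long_window=(200, 10_000), short_window=(20, 1_000))

# Errors in the past hour, but none in the last few minutes: already recovered.
recovered = should_page(long_window=(200, 10_000), short_window=(0, 1_000))
```

Requiring both windows to exceed the threshold is what reduces noise: a brief spike does not move the long window enough, and a resolved incident does not keep paging.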

3. What are SLOs and why are they important?

  • Service Level Objectives define reliability targets
  • Based on user-facing metrics (SLIs)
  • Error budget = 1 − SLO target: the amount of unreliability allowed before the SLO is breached
  • Balance reliability with feature velocity
  • Guide on-call and incident response priorities
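The error budget arithmetic is simple enough to work through in code. The sketch below assumes a 30-day window and a request-based SLI (good events over total events); the function names and the sample numbers are hypothetical.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Downtime allowed by an availability SLO over the window, in minutes."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target, good_events, total_events):
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # no traffic, nothing spent
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of full downtime.
minutes = error_budget_minutes(0.999)

# 500 failed requests out of 1,000,000 spends about half of a 99.9% budget.
remaining = budget_remaining(0.999, good_events=999_500, total_events=1_000_000)
```

A remaining fraction near zero or below it is the usual signal to slow feature work and prioritize reliability, which is how the budget balances the two.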

4. How do you troubleshoot a slow request?

  1. Check traces - find slow span
  2. Check metrics - CPU, memory, saturation
  3. Check logs - errors, warnings
  4. Check dependencies - database, external APIs
  5. Profile if needed - identify bottleneck
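Step 1 above, finding the slow span, can be sketched as a self-time calculation over a trace. The span dictionaries, their field names, and the sample trace are hypothetical, and the sketch assumes child spans run sequentially without overlapping; with parallel children, subtracting their summed durations would understate the parent's self time.

```python
from collections import defaultdict


def slowest_span_by_self_time(spans):
    """Return the span that spent the most time in its own work.

    Self time = span duration minus the durations of its direct children.
    Assumes children do not overlap in time.
    """
    child_time = defaultdict(float)
    for s in spans:
        if s["parent_id"] is not None:
            child_time[s["parent_id"]] += s["duration_ms"]

    def self_time(s):
        return s["duration_ms"] - child_time[s["span_id"]]

    return max(spans, key=self_time)


# Hypothetical trace: the root span is longest overall, but most of its
# time is spent waiting on the database query.
trace = [
    {"span_id": "a", "parent_id": None, "name": "GET /checkout", "duration_ms": 950.0},
    {"span_id": "b", "parent_id": "a", "name": "auth.verify", "duration_ms": 30.0},
    {"span_id": "c", "parent_id": "a", "name": "db.query orders", "duration_ms": 820.0},
    {"span_id": "d", "parent_id": "c", "name": "db.connect", "duration_ms": 15.0},
]

culprit = slowest_span_by_self_time(trace)
```

Ranking by self time rather than total duration avoids always blaming the root span, and points the later steps (metrics, logs, dependencies) at the database call in this example.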
