Core Concepts · Quick recall Q&A | Monitoring | DevOps

Quick recall Q&A

1. Explain the difference between metrics, logs, and traces

Metrics:

Numeric, time-series data
Aggregated, low storage cost
Best for: Alerting, dashboards, trends

Logs:

Detailed event records
High cardinality, high storage cost
Best for: Debugging, audit trails

Traces:

Request flow across services
Causally connected spans
Best for: Performance analysis, debugging distributed systems
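
To make the contrast concrete, here is a minimal sketch of one request producing all three signals. It uses only the Python standard library; the in-memory stores and the names `handle_request`, `request_count`, `log_lines`, and `spans` are hypothetical stand-ins for a real metrics backend, log pipeline, and trace collector.

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory stand-ins for real backends.
request_count = Counter()  # metric: aggregated count per (route, status)
log_lines = []             # logs: one detailed record per event
spans = []                 # traces: timed spans linked by trace_id/parent_id


def handle_request(route, user_id):
    """Handle one simulated request and emit a metric, a log, and two spans."""
    trace_id = uuid.uuid4().hex
    root_id = uuid.uuid4().hex[:16]
    start = time.perf_counter()

    # Child span: a simulated database call.
    db_start = time.perf_counter()
    time.sleep(0.001)
    db_ms = (time.perf_counter() - db_start) * 1000
    spans.append({
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": root_id,  # causal link to the root span
        "name": "db.query",
        "duration_ms": db_ms,
    })

    # Root span: covers the whole request.
    total_ms = (time.perf_counter() - start) * 1000
    spans.append({
        "trace_id": trace_id,
        "span_id": root_id,
        "parent_id": None,
        "name": "GET " + route,
        "duration_ms": total_ms,
    })

    # Metric: cheap and aggregated; no per-user label.
    request_count[(route, 200)] += 1

    # Log: detailed and high cardinality (user_id, trace_id).
    log_lines.append(json.dumps({
        "level": "info",
        "msg": "request handled",
        "route": route,
        "user_id": user_id,
        "trace_id": trace_id,
        "duration_ms": round(total_ms, 3),
    }))
    return trace_id
```

Note that the metric deliberately drops `user_id`, while the log keeps it and carries the `trace_id` so the log line can be joined to the trace.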

2. How do you design alerts that don't cause alert fatigue?

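Alert on symptoms (user-visible errors, latency), not causes (high CPU on one host)
Set thresholds with hysteresis so alerts do not flap around the boundary
Use multi-window burn-rate alerts tied to the SLO error budget
Group related alerts so one incident produces one page
Include an actionable runbook with every alert
Review and tune regularly; delete alerts nobody acts on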
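
The multi-window burn-rate idea can be sketched as follows. This is an illustrative example, not a production rule: the function names are hypothetical, and the 14.4 threshold is the commonly cited value for a 1-hour window paired with a 5-minute window (it corresponds to spending 2% of a 30-day budget in one hour). Burn rate is the observed error rate divided by the error rate the SLO allows.

```python
def burn_rate(errors, total, slo_target):
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    allowed_error_rate = 1.0 - slo_target
    return (errors / total) / allowed_error_rate


def should_page(long_window, short_window, slo_target=0.999, threshold=14.4):
    """Page only if BOTH windows burn faster than the threshold.

    long_window and short_window are (errors, total) tuples, e.g. for the
    last hour and the last five minutes. The long window shows the problem
    is significant; the short window shows it is still happening.
    """
    long_burn = burn_rate(long_window[0], long_window[1], slo_target)
    short_burn = burn_rate(short_window[0], short_window[1], slo_target)
    return long_burn >= threshold and short_burn >= threshold
```

For example, `should_page((200, 10_000), (30, 1_000))` pages because both windows burn at roughly 20x or more, while `should_page((200, 10_000), (0, 1_000))` does not, since the short window shows the problem has already stopped.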

3. What are SLOs and why are they important?

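Service Level Objectives (SLOs) define reliability targets over a time window (e.g. 99.9% over 30 days)
Measured with Service Level Indicators (SLIs): user-facing metrics such as availability or latency
Error budget = 1 - SLO target: the amount of failure allowed before the SLO is breached
Balance reliability with feature velocity: slow releases when the budget is spent
Guide on-call and incident response priorities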
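
The error-budget arithmetic is simple enough to show directly. The sketch below uses hypothetical helper names and assumes a request-based availability SLI; a 99.9% target over 30 days allows about 43.2 minutes of full downtime.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Minutes of total downtime the SLO tolerates over the window."""
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target, or no traffic, leaves no budget to spend.
        return 0.0
    return 1.0 - failed_requests / allowed_failures
```

With a 99.9% target and 1,000,000 requests, about 1,000 failures are allowed; 250 failures leaves roughly 75% of the budget, and 1,500 failures overspends it by about 50%.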

4. How do you troubleshoot a slow request?

Check traces - find the slowest span
Check metrics - CPU, memory, saturation on the service that owns that span
Check logs - errors, warnings around the same time window
Check dependencies - database, external APIs
Profile if needed - identify the bottleneck in code
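
The first step, finding the slow span, can be sketched as a self-time calculation. This assumes a simplified span format (dicts with hypothetical `span_id`, `parent_id`, `name`, and `duration_ms` keys) and assumes child spans run sequentially, so a span's self time is its duration minus the total duration of its direct children.

```python
from collections import defaultdict


def slowest_span_by_self_time(spans):
    """Return the span that spent the most time in its own work.

    Assumes children of a span do not overlap in time; with concurrent
    children this subtraction would undercount the parent's self time.
    """
    child_time = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_time[span["parent_id"]] += span["duration_ms"]

    def self_time(span):
        return span["duration_ms"] - child_time[span["span_id"]]

    return max(spans, key=self_time)


# Example trace: the root span is the longest overall, but most of its
# time is spent waiting on the database child span.
example_trace = [
    {"span_id": "a", "parent_id": None, "name": "GET /checkout", "duration_ms": 900.0},
    {"span_id": "b", "parent_id": "a", "name": "auth.verify", "duration_ms": 40.0},
    {"span_id": "c", "parent_id": "a", "name": "db.query", "duration_ms": 700.0},
    {"span_id": "d", "parent_id": "a", "name": "render", "duration_ms": 100.0},
]
```

Here `slowest_span_by_self_time(example_trace)` picks `db.query`: the root lasts 900 ms but only 60 ms of that is its own work, so the database is the next place to look.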
