Quick recall Q&A
1. Explain the difference between metrics, logs, and traces
Metrics:
- Numeric, time-series data
- Aggregated, low storage cost
- Best for: Alerting, dashboards, trends
Logs:
- Detailed event records
- High cardinality, high storage cost
- Best for: Debugging, audit trails
Traces:
- Request flow across services
- Causally connected spans
- Best for: Performance analysis, debugging distributed systems
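To make the contrast concrete, here is a minimal sketch of one request emitting all three signals, using only the Python standard library. The names `handle_request`, `request_count`, and the field layout are hypothetical and chosen for illustration; a real service would typically use a metrics client, a logging framework, and a tracing SDK instead.

```python
import json
import time
import uuid
from collections import Counter

# Metric: an aggregated counter keyed by a small, fixed set of labels.
request_count = Counter()


def handle_request(route: str, user_id: str) -> dict:
    """Handle one hypothetical request and emit a metric, a log, and a span."""
    trace_id = uuid.uuid4().hex
    start = time.monotonic()
    status = 200  # assumption: the (omitted) work succeeded
    duration_ms = (time.monotonic() - start) * 1000.0

    # Metric: low-cardinality labels only (route, status); no user_id here.
    request_count[(route, status)] += 1

    # Log: one detailed event record; high-cardinality fields are acceptable.
    log_line = json.dumps({
        "level": "info",
        "msg": "request handled",
        "route": route,
        "user_id": user_id,
        "status": status,
        "trace_id": trace_id,
    })

    # Trace: a span that shares trace_id with every other span of this request.
    span = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": None,  # root span of the request
        "name": route,
        "duration_ms": duration_ms,
    }
    return {"log": log_line, "span": span}
```

The metric stays cheap because it only counts per (route, status) pair, while the log and the span carry per-request detail and are linked through `trace_id`.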
2. How do you design alerts that don't cause alert fatigue?
- Alert on symptoms, not causes
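- Set thresholds with hysteresis so alerts do not flap around the boundary
- Use multi-window burn-rate alerts (page only when both a long and a short window are burning error budget too fast)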
- Group related alerts
- Include actionable runbooks
- Review and tune regularly
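The multi-window burn-rate idea can be sketched as follows. This is an illustrative sketch, not a production rule: the function names are hypothetical, and the default threshold of 14.4 is an assumed value (it corresponds to spending about 2% of a 30-day budget in one hour) that should be tuned per service.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    budget = 1.0 - slo_target  # allowed fraction of failed requests
    return error_ratio / budget


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float = 0.999,
                threshold: float = 14.4) -> bool:
    """Page only if both windows exceed the burn-rate threshold.

    The long window (for example 1 hour) shows the problem is significant;
    the short window (for example 5 minutes) shows it is still happening,
    so the alert clears soon after recovery.
    """
    long_burn = burn_rate(long_window_error_ratio, slo_target)
    short_burn = burn_rate(short_window_error_ratio, slo_target)
    return long_burn >= threshold and short_burn >= threshold
```

For example, `should_page(0.02, 0.02)` pages because both windows burn at roughly 20 times the sustainable rate, while `should_page(0.02, 0.0005)` does not, because the short window shows the error rate has already recovered.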
3. What are SLOs and why are they important?
- Service Level Objectives define reliability targets
- Based on user-facing metrics (SLIs)
- Error budget = the amount of failure allowed before the SLO is breached (100% minus the SLO target)
- Balance reliability with feature velocity
- Guide on-call and incident response priorities
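The error budget arithmetic is simple enough to sketch directly. The function names and the 30-day window below are assumptions made for illustration, not part of any standard.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full outage the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float,
                     total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the request-based error budget still unspent.

    A negative result means the budget is overspent and the SLO is breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

With a 99.9% target over 30 days, `error_budget_minutes(0.999)` comes to about 43.2 minutes. With 1,000,000 requests and 250 failures, `budget_remaining(0.999, 1_000_000, 250)` is about 0.75, meaning roughly three quarters of the budget is left to spend on risky releases.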
4. How do you troubleshoot a slow request?
- Check traces: find the slowest span
- Check metrics: CPU, memory, saturation
- Check logs: errors and warnings around the request
- Check dependencies: database, external APIs
- Profile if needed to identify the bottleneck
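The first step, finding the slow span, can be sketched as a self-time calculation over the spans of one trace. The `Span` type and the example trace are hypothetical, and the sketch assumes that a span's direct children do not overlap each other in time; tracing backends usually provide this view for you.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: float
    end_ms: float


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Map span_id to self time: own duration minus direct children's durations."""
    durations = {s.span_id: s.end_ms - s.start_ms for s in spans}
    child_total = {s.span_id: 0.0 for s in spans}
    for s in spans:
        if s.parent_id in child_total:
            child_total[s.parent_id] += durations[s.span_id]
    return {sid: durations[sid] - child_total[sid] for sid in durations}


def slowest_span(spans: List[Span]) -> Span:
    """Return the span with the largest self time."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


# Hypothetical trace: a 500 ms request dominated by one database call.
example_trace = [
    Span("a", None, "GET /orders", 0.0, 500.0),
    Span("b", "a", "db.query", 20.0, 420.0),
    Span("c", "a", "cache.get", 430.0, 450.0),
]
```

Here `slowest_span(example_trace).name` is `"db.query"`: the root span lasts 500 ms, but only 80 ms of that is its own work, so the database call is where to look next in metrics and logs.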