System Design
The Difference between SLI, SLO, and SLA

Last updated 5 years ago

SLA, SLO, and SLI are related but distinct concepts.

  • SLA, or Service Level Agreement, is a contract in which the service provider promises customers a certain level of service availability, performance, etc.

  • SLO, or Service Level Objective, is a goal that the service provider wants to reach.

  • SLI, or Service Level Indicator, is a measurement the service provider uses to track progress toward the goal.

Here is how they relate. The service provider collects metrics that serve as SLIs, defines thresholds on those metrics based on the SLOs, and monitors the thresholds so that the SLA is not broken. In practice, the SLIs are the metrics in the monitoring system, the SLOs are the alerting rules, and the SLAs are the committed values of those monitored metrics that the SLOs protect.
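The relationship above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `ServiceLevels`, `availability_sli`, and `evaluate`, and the thresholds 0.999 and 0.995, are hypothetical and do not come from any particular monitoring system.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevels:
    slo: float  # internal objective (hypothetical value, e.g. 0.999)
    sla: float  # external commitment (hypothetical value, e.g. 0.995)


def availability_sli(successful: int, total: int) -> float:
    """SLI: the fraction of requests that succeeded."""
    if total == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return successful / total


def evaluate(sli: float, levels: ServiceLevels) -> str:
    """Compare the measured SLI with the SLO and SLA thresholds."""
    if sli < levels.sla:
        return "sla_breached"  # the contract with customers is broken
    if sli < levels.slo:
        return "slo_violated"  # alert: react before the SLA is at risk
    return "ok"


levels = ServiceLevels(slo=0.999, sla=0.995)
# 9,985 of 10,000 requests succeeded: below the SLO, still above the SLA.
print(evaluate(availability_sli(9_985, 10_000), levels))
```

Because the SLO threshold is checked separately from the SLA threshold, the "slo_violated" state acts as an early warning while the external commitment still holds.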

The SLO and the SLA are usually similar, but the SLO is tighter than the SLA. SLOs are generally used internally, while SLAs are external. If service availability violates the SLO, operations need to react quickly to keep it from breaking the SLA; otherwise, the company might need to refund some money to customers.

SLA, SLO, and SLI all rest on the assumption that the service will not be available 100% of the time. Instead, we guarantee that the system will be available for at least a certain percentage of the time, for example, 99.5%.
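To make the 99.5% figure concrete, the following sketch converts an availability target into the downtime it permits. The function name `allowed_downtime_minutes` and the 30-day measurement window are assumptions chosen for illustration; real agreements define their own measurement period.

```python
def allowed_downtime_minutes(availability_target: float, period_days: int = 30) -> float:
    """Return the downtime (in minutes) that a target permits over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_target)


# A 99.5% target over 30 days leaves about 216 minutes (3.6 hours) of downtime.
print(round(allowed_downtime_minutes(0.995), 1))
```

A tighter target shrinks this budget quickly: under the same assumptions, 99.9% leaves about 43 minutes per 30 days.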

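The GCP blog post "SLOs, SLIs, SLAs, oh my - CRE life lessons" explains these concepts in detail. Two books also rely heavily on them: Site Reliability Engineering and The Site Reliability Workbook.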

Example

SLA: the P99 response time is less than 1 second. If I don't meet this, I'll refund you. (Used externally with the customers.)

SLO: the P99 response time is less than 0.5 seconds. (Used internally with the team.)

SLI: the measured percentiles of the response time metric.
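The example can be checked against a set of measurements with a short sketch like the one below. It assumes a nearest-rank percentile, which is one common definition among several, and the sample response times are invented for illustration; only the 0.5 second and 1 second thresholds come from the example.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


SLO_P99_SECONDS = 0.5  # internal objective from the example
SLA_P99_SECONDS = 1.0  # external commitment from the example

# Hypothetical response times in seconds: 98 fast requests and 2 slow ones.
response_times = [0.2] * 98 + [0.7, 1.4]

p99 = percentile(response_times, 99)  # the SLI
slo_met = p99 < SLO_P99_SECONDS
sla_met = p99 < SLA_P99_SECONDS
print(p99, slo_met, sla_met)
```

With these invented samples the P99 is 0.7 seconds, so the SLO is missed while the SLA still holds, which is the situation where the team should react before customers are owed a refund.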

Reference

SLOs, SLIs, SLAs, oh my - CRE life lessons
Site Reliability Engineering
The Site Reliability Workbook
https://enqueuezero.com/the-difference-between-sli-slo-and-sla.html