Availability Patterns
Let's talk about availability patterns in system design.
Availability patterns are established architectural approaches used to ensure a system remains operational and accessible to users, even in the face of failures or unexpected events. These patterns focus on minimizing downtime and maintaining a consistent level of service by incorporating redundancy, fault tolerance, and recovery mechanisms into the system's design. They provide a structured way to address potential points of failure and ensure business continuity.
Availability in Numbers
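Availability is usually quantified as uptime expressed as a percentage of total time, often described by its number of 9s. The table below shows the acceptable downtime at 99.9% availability ("three 9s"), assuming a 365.25-day year.
| Duration | Acceptable downtime |
|---|---|
| Downtime per year | 8h 45min 57.6s |
| Downtime per month | 43min 49.8s |
| Downtime per week | 10min 4.8s |
| Downtime per day | 1min 26.4s |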
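The downtime figures in a table like this follow from simple arithmetic: the allowed downtime is the length of the period multiplied by the unavailable fraction. The sketch below is a minimal illustration, not a definitive tool. The function names, the 99.9% target, and the 365.25-day year are assumptions chosen for the example.

```python
DAY = 24 * 60 * 60  # seconds in one day

# Assumption: an average year of 365.25 days, with a month as one twelfth of it.
PERIODS = {
    "year": 365.25 * DAY,
    "month": 365.25 * DAY / 12,
    "week": 7 * DAY,
    "day": DAY,
}


def allowed_downtime_seconds(availability_pct: float, period_seconds: float) -> float:
    """Return the downtime budget in seconds for one period at the given availability."""
    return period_seconds * (1 - availability_pct / 100)


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes, and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours)}h {int(minutes)}min {secs:.1f}s"


if __name__ == "__main__":
    for name, length in PERIODS.items():
        budget = allowed_downtime_seconds(99.9, length)
        print(f"Downtime per {name}: {format_duration(budget)}")
```

Changing the first argument to, say, 99.99 shows how quickly the budget shrinks with each additional 9.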
Availability in Parallel vs in Sequence
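Overall availability decreases when two components with availability < 100% are in sequence, because a request succeeds only if every component in the chain is up:
Availability (Total) = Availability (Foo) * Availability (Bar)
Example: If Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.9% * 99.9% ≈ 99.8%.
Overall availability increases when two components with availability < 100% are in parallel, because the system fails only if both redundant components are down at the same time:
Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
Example: If Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.
Both formulas assume that the components fail independently of each other.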
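The sequence rule, together with the complementary rule for redundant components in parallel (the system is down only when every copy is down), can be sketched in a few lines. This is a minimal illustration that assumes components fail independently. The function names are hypothetical.

```python
from math import prod


def availability_in_sequence(*components: float) -> float:
    """All components must be up, so the availabilities multiply."""
    return prod(components)


def availability_in_parallel(*components: float) -> float:
    """The system is down only if every component is down at once."""
    return 1 - prod(1 - a for a in components)


if __name__ == "__main__":
    foo = bar = 0.999  # 99.9% each, as in the Foo and Bar example
    print(f"sequence: {availability_in_sequence(foo, bar):.4%}")  # sequence: 99.8001%
    print(f"parallel: {availability_in_parallel(foo, bar):.4%}")  # parallel: 99.9999%
```

The same functions accept more than two components, which makes it easy to see how a long chain of dependencies erodes availability and how redundancy restores it.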
Availability Patterns
Replication
Replication is an availability pattern that involves having multiple copies of the same data stored in different locations. In the event of a failure, the data can be retrieved from a different location. There are two main types of replication: Master-Master replication and Master-Slave replication.
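Master-Master replication: Multiple servers are configured as "masters," each accepting read and write operations. Because every master can take writes, the system keeps accepting them when one master fails, but concurrent writes to different masters can conflict and require a conflict-resolution strategy.
Master-Slave replication: One server is the "master" and handles all writes, which are replicated to one or more "slaves" that handle reads. If the master fails, a slave is promoted to take its place. With a single writer there are no write conflicts to resolve, which makes this setup simpler to maintain, although writes are unavailable until the promotion completes.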
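The master-slave behaviour described above (writes go to the master, reads are served by slaves, and a slave is promoted on failure) can be illustrated with a toy in-memory model. This is a sketch under strong simplifying assumptions: `Node` and `MasterSlaveCluster` are hypothetical classes, replication is synchronous, and failure is simulated by flipping a flag rather than detected over a network.

```python
from typing import Optional


class Node:
    """Hypothetical in-memory store standing in for a database server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.alive = True


class MasterSlaveCluster:
    def __init__(self, master: Node, slaves: list[Node]) -> None:
        self.master = master
        self.slaves = slaves

    def failover(self) -> None:
        """Promote the first live slave to master."""
        for slave in self.slaves:
            if slave.alive:
                self.slaves.remove(slave)
                self.master = slave
                return
        raise RuntimeError("no live slave to promote")

    def write(self, key: str, value: str) -> None:
        """Send the write to the master, then copy it to every live slave."""
        if not self.master.alive:
            self.failover()
        self.master.data[key] = value
        # Assumption: synchronous replication, kept simple for illustration.
        for slave in self.slaves:
            if slave.alive:
                slave.data[key] = value

    def read(self, key: str) -> Optional[str]:
        """Serve the read from a live slave, falling back to the master."""
        for node in [*self.slaves, self.master]:
            if node.alive:
                return node.data.get(key)
        raise RuntimeError("no live node available")


cluster = MasterSlaveCluster(Node("a"), [Node("b"), Node("c")])
cluster.write("k", "v1")
cluster.master.alive = False  # simulate a master failure
cluster.write("k", "v2")      # triggers promotion of a slave
print(cluster.master.name, cluster.read("k"))
```

Real systems usually replicate asynchronously and need a failure detector and an election or coordination step before promotion, none of which this sketch attempts to model.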
SLI, SLO & SLA
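SLI, SLO, and SLA are fundamental concepts for measuring and managing service availability. A Service Level Indicator (SLI) is a measured metric of service behaviour, such as the fraction of requests that succeed. A Service Level Objective (SLO) is the target value or range for an SLI, such as 99.9% of requests succeeding over 30 days. A Service Level Agreement (SLA) is a contract with users that states the service level promised and the consequences of missing it.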
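One way to see how an indicator relates to an objective is to compute an availability SLI from request counts and compare it with a target. The sketch below is illustrative only: the function names, the request counts, and the 99.9% objective are assumptions, and real services typically measure SLIs over a rolling time window.

```python
def availability_sli(successful_requests: int, total_requests: int) -> float:
    """SLI: the fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return successful_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is missed."""
    budget = 1 - slo  # failures the objective tolerates
    spent = 1 - sli   # failures actually observed
    return (budget - spent) / budget


if __name__ == "__main__":
    slo = 0.999  # hypothetical objective: 99.9% of requests succeed
    sli = availability_sli(successful_requests=999_500, total_requests=1_000_000)
    print(f"SLI = {sli:.4%}, SLO met: {sli >= slo}")
    print(f"Error budget remaining: {error_budget_remaining(sli, slo):.0%}")
```

An SLA would sit on top of this as a contractual threshold, usually looser than the internal SLO so that the team has room to react before the contract is breached.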