Consistency
Availability
Availability Percentages versus Service Downtime | |||
---|---|---|---|
Availability % | Downtime per Year | Downtime per Month | Downtime per Week |
90% (1 nine) | 36.5 days | 72 hours | 16.8 hours |
99% (2 nines) | 3.65 days | 7.20 hours | 1.68 hours |
99.5% (2 nines) | 1.83 days | 3.60 hours | 50.4 minutes |
99.9% (3 nines) | 8.76 hours | 43.8 minutes | 10.1 minutes |
99.99% (4 nines) | 52.56 minutes | 4.32 minutes | 1.01 minutes |
99.999% (5 nines) | 5.26 minutes | 25.9 seconds | 6.05 seconds |
99.9999% (6 nines) | 31.5 seconds | 2.59 seconds | 0.605 seconds |
99.99999% (7 nines) | 3.15 seconds | 0.259 seconds | 0.0605 seconds |
Reliability
Reliability measures how well a system performs its intended operations (functional requirements). We use averages for that (Mean Time to Failure, Mean Time to Repair, etc.)
Availability measures the percentage of time a system accepts requests and responds to clients.
Example 1: A certain system may be 90% available but only reliable 80% of the time.
Example 2: Suppose we consider our “system” the stuff inside a data center (hardware + software). Let’s assume this data center suffers a network failure such that no outsider traffic is coming in and no insider traffic is going out. In this case, instantaneous availability might be zero (because clients cannot reach the service) even though inside the data center, all systems are perfectly functioning (instantaneous reliability 100%).
We use both of them (reliability and availability) in different contexts. For example, storage vendors often quote MTTF for their disks. Most online services use uptime (as a measure of availability) in their SLAs. For example, the uptime of EC2 virtual machines is 99.95%.
The reason for such sharing is to excite the technical community that the company is solving complex problems. They also hope to motivate more people to join their company. Such public blogs can also help to advertise company products to B2B customers. Additionally, such material helps the company train potential future workers independently.
Important Latencies
Component | Time (nanoseconds) |
---|---|
L1 cache reference | 0.9 |
L2 cache reference | 2.8 |
L3 cache reference | 12.9 |
Main memory reference | 100 |
Compress 1KB with Snzip | 3,000 (3 microseconds) |
Read 1 MB sequentially from memory | 9,000 (9 microseconds) |
Read 1 MB sequentially from SSD | 200,000 (200 microseconds) |
Round trip within same datacenter | 500,000 (500 microseconds) |
Read 1 MB sequentially from SSD with speed ~1GB/sec SSD | 1,000,000 (1 milliseconds) |
Disk seek | 4,000,000 (4 milliseconds) |
Read 1 MB sequentially from disk | 2,000,000 (2 milliseconds) |
Send packet SF->NYC | 71,000,000 (71 milliseconds) |
Types of Requirements
Requirements will have two sub-categories:
- Functional requirements: These represent the features a user of the designed system can use. For example, the system will allow users to search for content using the search bar.
- Non-functional requirements (NFRs): The non-functional requirements are criteria based on which a system user will consider the system usable. NFR may include requirements like high availability, low latency, scalability, etc.
Further reads
- Latency vs Throughput: https://lnkd.in/gSBsmijw
- CAP Theorem: https://lnkd.in/gV7NunUD
- ACID Transactions: https://lnkd.in/gpQMxV9u
- Consistent Hashing: https://lnkd.in/gaCVWBJM
- Rate Limiting: https://lnkd.in/gjkrHkGu
- Microservices Architecture: https://lnkd.in/gy3kRzep
- API Design: https://lnkd.in/ghcbQySg
- Strong vs Eventual Consistency: https://lnkd.in/g2ACr56Q
- Synchronous vs asynchronous communications: https://lnkd.in/gYZ8Acth
- REST vs RPC: https://lnkd.in/gs7htCMG
- Batch Processing vs Stream Processing: https://lnkd.in/gBKHzqAe
- Fault Tolerance: https://lnkd.in/ggzdZVhM
- Consensus Algorithms: https://lnkd.in/gUcVEhUx
- Gossip Protocol: https://lnkd.in/gvkckQGY
- Serverless Architecture: https://lnkd.in/g3EYA3nz
- Service Discovery: https://lnkd.in/gt84khQG
- Disaster Recovery: https://lnkd.in/grpEFGfD
- Distributed Tracing: https://lnkd.in/ga5FJuH2
-
- Horizontal vs Vertical Scaling: https://www.spiceworks.com/tech/cloud/articles/horizontal-vs-vertical-cloud-scaling/
- Content Delivery Network (CDN): https://www.cloudflare.com/learning/cdn/what-is-a-cdn/
- Domain Name System (DNS): https://www.cloudflare.com/learning/dns/what-is-dns/
- Caching: https://medium.com/must-know-computer-science/system-design-caching-acbd1b02ca01
- Distributed Caching: https://redis.com/glossary/distributed-caching/
- Load Balancing: https://aws.amazon.com/what-is/load-balancing/
- SQL vs NoSQL: https://www.integrate.io/blog/the-sql-vs-nosql-difference/
- API Gateway: https://www.nginx.com/learn/api-gateway/
- Distributed Locking: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
- Checksum: https://www.lifewire.com/what-does-checksum-mean-2625825
-
Tree: https://lnkd.in/g2v9qf87
-
To Queue Or Not To Queue: https://lnkd.in/gh5tigTk
-
Hash Tables: https://lnkd.in/gsmg6XSA
-
Heaps: https://lnkd.in/g4xAGQa8
-
Linked List: https://lnkd.in/gN7fUxbJ
-
Recursion: https://lnkd.in/gvMiZWb8
-
Tries: https://lnkd.in/gbfm2DVR
-
Stacks and Overflows: https://lnkd.in/gKhNktj6
-
Binary Search: https://lnkd.in/gv_rDTUa
-
Dynamic Programming: https://lnkd.in/g_AYf32w
-
BFS Traversal - Going Broad In A Graph: https://lnkd.in/gVirya_Q
-
Introduction To Graph Theory: https://lnkd.in/geXpetJH
-
Substring problems: https://lnkd.in/gfV2PeeR
-
DFS Traversal - Deep Dive through a Graph: https://lnkd.in/gy-4mbgN
-
Finding The Shortest Path: https://lnkd.in/gA4Zz425