System Design

Consistency


Availability

Availability Percentages versus Service Downtime
| Availability % | Downtime per Year | Downtime per Month | Downtime per Week |
|---|---|---|---|
| 90% (1 nine) | 36.5 days | 72 hours | 16.8 hours |
| 99% (2 nines) | 3.65 days | 7.20 hours | 1.68 hours |
| 99.5% (two and a half nines) | 1.83 days | 3.60 hours | 50.4 minutes |
| 99.9% (3 nines) | 8.76 hours | 43.2 minutes | 10.1 minutes |
| 99.99% (4 nines) | 52.56 minutes | 4.32 minutes | 1.01 minutes |
| 99.999% (5 nines) | 5.26 minutes | 25.9 seconds | 6.05 seconds |
| 99.9999% (6 nines) | 31.5 seconds | 2.59 seconds | 0.605 seconds |
| 99.99999% (7 nines) | 3.15 seconds | 0.259 seconds | 0.0605 seconds |
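
The downtime figures follow from simple arithmetic: downtime = (1 - availability) x period length. Below is a minimal Python sketch of that calculation. The helper name `downtime_seconds` and the period lengths (365-day year, 30-day month, 7-day week) are assumptions made for illustration.

```python
# Period lengths in seconds. The 30-day month is an assumption; a
# calendar-average month (365/12 days) gives slightly larger values.
PERIOD_SECONDS = {
    "year": 365 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "week": 7 * 24 * 3600,
}


def downtime_seconds(availability_percent: float, period: str) -> float:
    """Return the downtime budget in seconds for the given period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * PERIOD_SECONDS[period]


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        per_year_hours = downtime_seconds(pct, "year") / 3600
        print(f"{pct}% -> {per_year_hours:.2f} hours of downtime per year")
```

Each extra nine divides the downtime budget by ten, which is why moving from three to four nines is a much bigger engineering step than the percentages suggest.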

Reliability

Reliability measures how well a system performs its intended operations (functional requirements). It is usually expressed with averages such as Mean Time to Failure (MTTF) and Mean Time to Repair (MTTR).

Availability measures the percentage of time a system accepts requests and responds to clients.

Example 1: A certain system may be 90% available but only reliable 80% of the time.

Example 2: Suppose we define our “system” as everything inside a data center (hardware and software). Assume this data center suffers a network failure such that no outside traffic comes in and no inside traffic goes out. In this case, instantaneous availability might be zero (because clients cannot reach the service) even though every system inside the data center is functioning perfectly (instantaneous reliability of 100%).

We use both of them (reliability and availability) in different contexts. For example, storage vendors often quote MTTF for their disks. Most online services use uptime (as a measure of availability) in their SLAs. For example, the uptime of EC2 virtual machines is 99.95%.
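
The two measures are linked. A commonly used steady-state estimate is availability = MTTF / (MTTF + MTTR). The sketch below illustrates that relationship only; the function name and the sample numbers are hypothetical and do not come from any vendor's datasheet.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Estimate availability as MTTF / (MTTF + MTTR).

    Assumes failures and repairs alternate, so each cycle consists of
    mttf_hours of uptime followed by mttr_hours of downtime.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical figures: one failure every 1,000 hours, 1 hour to repair.
    availability = steady_state_availability(mttf_hours=1000, mttr_hours=1)
    print(f"Availability: {availability:.4%}")
```

The formula shows two ways to raise availability: fail less often (larger MTTF) or recover faster (smaller MTTR).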

Why

Companies share how they build their systems for several reasons. Doing so excites the technical community by showing that the company is solving complex problems, and the company hopes this will motivate more people to join. Public engineering blogs also help advertise company products to B2B customers. Additionally, the material lets potential future employees train themselves independently.

Important Latencies

| Component | Time (nanoseconds) |
|---|---|
| L1 cache reference | 0.9 |
| L2 cache reference | 2.8 |
| L3 cache reference | 12.9 |
| Main memory reference | 100 |
| Compress 1 KB with Snzip | 3,000 (3 microseconds) |
| Read 1 MB sequentially from memory | 9,000 (9 microseconds) |
| Read 1 MB sequentially from SSD | 200,000 (200 microseconds) |
| Round trip within same datacenter | 500,000 (500 microseconds) |
| Read 1 MB sequentially from SSD (~1 GB/sec) | 1,000,000 (1 millisecond) |
| Disk seek | 4,000,000 (4 milliseconds) |
| Read 1 MB sequentially from disk | 2,000,000 (2 milliseconds) |
| Send packet SF->NYC | 71,000,000 (71 milliseconds) |
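
Numbers like these are mostly useful for back-of-the-envelope comparisons. The sketch below copies a few rows from the table into a dictionary and expresses one operation as a multiple of another. The names `LATENCY_NS` and `times_slower` are hypothetical, chosen only for this illustration.

```python
# A few rows from the latency table, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.9,
    "Main memory reference": 100,
    "Read 1 MB sequentially from memory": 9_000,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 4_000_000,
    "Send packet SF->NYC": 71_000_000,
}


def times_slower(slow: str, fast: str) -> float:
    """Return how many times longer `slow` takes than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    ratio = times_slower("Round trip within same datacenter", "Main memory reference")
    print(f"A datacenter round trip costs about {ratio:,.0f} main memory references")
    ratio = times_slower("Send packet SF->NYC", "Round trip within same datacenter")
    print(f"An SF->NYC packet costs about {ratio:,.0f} datacenter round trips")
```

Ratios of this kind help decide, for example, whether a remote cache lookup is worth it compared with recomputing a value from local memory.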

Types of Requirements

Requirements will have two sub-categories:

  1. Functional requirements: These represent the features a user of the designed system can use. For example, the system will allow users to search for content using the search bar.
  2. Non-functional requirements (NFRs): These are the criteria by which a user judges whether the system is usable. NFRs include requirements such as high availability, low latency, and scalability.

Further reads

  1. Latency vs Throughput: https://lnkd.in/gSBsmijw
  2. CAP Theorem: https://lnkd.in/gV7NunUD
  3. ACID Transactions: https://lnkd.in/gpQMxV9u
  4. Consistent Hashing: https://lnkd.in/gaCVWBJM
  5. Rate Limiting: https://lnkd.in/gjkrHkGu
  6. Microservices Architecture: https://lnkd.in/gy3kRzep
  7. API Design: https://lnkd.in/ghcbQySg
  8. Strong vs Eventual Consistency: https://lnkd.in/g2ACr56Q
  9. Synchronous vs asynchronous communications: https://lnkd.in/gYZ8Acth
  10. REST vs RPC: https://lnkd.in/gs7htCMG
  11. Batch Processing vs Stream Processing: https://lnkd.in/gBKHzqAe
  12. Fault Tolerance: https://lnkd.in/ggzdZVhM
  13. Consensus Algorithms: https://lnkd.in/gUcVEhUx
  14. Gossip Protocol: https://lnkd.in/gvkckQGY
  15. Serverless Architecture: https://lnkd.in/g3EYA3nz
  16. Service Discovery: https://lnkd.in/gt84khQG
  17. Disaster Recovery: https://lnkd.in/grpEFGfD
  18. Distributed Tracing: https://lnkd.in/ga5FJuH2

Data structures and algorithms

  1. Tree: https://lnkd.in/g2v9qf87
  2. To Queue Or Not To Queue: https://lnkd.in/gh5tigTk
  3. Hash Tables: https://lnkd.in/gsmg6XSA
  4. Heaps: https://lnkd.in/g4xAGQa8
  5. Linked List: https://lnkd.in/gN7fUxbJ
  6. Recursion: https://lnkd.in/gvMiZWb8
  7. Tries: https://lnkd.in/gbfm2DVR
  8. Stacks and Overflows: https://lnkd.in/gKhNktj6
  9. Binary Search: https://lnkd.in/gv_rDTUa
  10. Dynamic Programming: https://lnkd.in/g_AYf32w
  11. BFS Traversal - Going Broad In A Graph: https://lnkd.in/gVirya_Q
  12. Introduction To Graph Theory: https://lnkd.in/geXpetJH
  13. Substring problems: https://lnkd.in/gfV2PeeR
  14. DFS Traversal - Deep Dive through a Graph: https://lnkd.in/gy-4mbgN
  15. Finding The Shortest Path: https://lnkd.in/gA4Zz425

Thoughts 🤔 by Soumendra Kumar Sahoo is licensed under CC BY 4.0