System Sizing — Full Document
Back-of-Napkin System Sizing
Estimating infrastructure for a social media app with 10M monthly active users. Uses CalcMark’s built-in network, storage, and capacity planning functions.
User Activity
monthly_users = 10M → 10M
daily_users = 40% of monthly_users → 4M
A 40% DAU/MAU ratio is typical for an engaged social platform. Assume each daily active user posts about twice per week. Express posting as a weekly total, then convert it to a daily rate:
posts_per_user_per_week = 2 → 2
weekly_posts = daily_users * posts_per_user_per_week → 8M
daily_posts = weekly_posts/week over 1 day → 1.14M
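The weekly-to-daily conversion is plain arithmetic, so it is easy to check outside CalcMark. A minimal Python sketch, assuming only the inputs stated above (10M MAU, a 40% DAU/MAU ratio, two posts per daily active user per week):

```python
# Inputs taken from the CalcMark lines above
monthly_users = 10_000_000
dau_mau_ratio = 0.40             # assumed engagement ratio
posts_per_user_per_week = 2

daily_users = monthly_users * dau_mau_ratio             # 4,000,000
weekly_posts = daily_users * posts_per_user_per_week    # 8,000,000
daily_posts = weekly_posts / 7                          # spread a week's posts over 7 days

print(f"daily_users  = {daily_users:,.0f}")
print(f"weekly_posts = {weekly_posts:,.0f}")
print(f"daily_posts  = {daily_posts:,.0f}")             # about 1.14M
```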
daily_posts_napkin = daily_posts as napkin → ~1.1M
Read vs Write Ratio
Social media is read-heavy – users scroll far more than they post. Assume 100 reads per user per day (timeline, profiles, search):
reads_per_user_per_day = 100 → 100
daily_reads = daily_users * reads_per_user_per_day → 400M
read_write_ratio = daily_reads / daily_posts → 350
A 350:1 read-to-write ratio means you should invest heavily in caching and read replicas.
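The user count cancels out of this ratio: it is simply 100 reads per day divided by 2/7 posts per day. A quick Python check, assuming the same inputs as the CalcMark lines:

```python
monthly_users = 10_000_000
daily_users = 0.40 * monthly_users
daily_posts = daily_users * 2 / 7          # two posts per active user per week
reads_per_user_per_day = 100

daily_reads = daily_users * reads_per_user_per_day    # 400M
read_write_ratio = daily_reads / daily_posts          # reads per write

print(f"daily_reads = {daily_reads:,.0f}")
print(f"read:write  = {read_write_ratio:.0f}:1")
```

Because the ratio depends only on per-user behavior, it stays at 350:1 whether the app has 1M or 100M users.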
Traffic Rates
Convert daily totals to per-second rates for capacity planning:
read_rate = (daily_reads)/day per second → 4.63K/s
write_rate = (daily_posts)/day per second → 13.227513/s
Assume peak traffic runs at about 3x the average (lunch hour, evenings, breaking news):
peak_multiplier = 3 → 3
peak_read_rate = read_rate * peak_multiplier → 13.89K/s
Storage Requirements
Average post size (text + metadata, excluding media):
avg_post_size = 2 KB → 2 KB
daily_post_storage = daily_posts * avg_post_size → 2.18 GB
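The 2.18 GB result is consistent with 1024-based units (1 GB = 1,048,576 KB); with SI units the same quantity would read about 2.29 GB. This Python sketch compares both conventions. Which one CalcMark uses is inferred here from its output, not from its documentation:

```python
daily_posts = 8_000_000 / 7          # from the User Activity section
avg_post_size_kb = 2

daily_kb = daily_posts * avg_post_size_kb

binary_gb = daily_kb / 1024**2       # 1 GB = 1,048,576 KB (1024-based)
decimal_gb = daily_kb / 1000**2      # 1 GB = 1,000,000 KB (SI)

print(f"binary:  {binary_gb:.2f} GB")   # matches the 2.18 GB above
print(f"decimal: {decimal_gb:.2f} GB")
```

The gap is under 5% at this scale, but it compounds when you move between KB, GB, and TB, so it is worth fixing one convention before comparing against vendor quotes.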
yearly_post_storage = daily_post_storage * 365 → 796 GB
30% of posts include an image averaging 500 KB:
daily_media_storage = daily_posts * 30% * 500 KB → 163 GB
yearly_media_storage = daily_media_storage * 365 → 58.3 TB
Compress the text with gzip (images are already compressed, so they are stored as-is):
compressed_posts = compress(yearly_post_storage, gzip) → 265 GB
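The yearly total can be reproduced outside CalcMark. This Python sketch assumes 1024-based units and a 3:1 gzip ratio for text; both are inferred from the results above (796 GB compressing to 265 GB), not taken from CalcMark's documentation:

```python
DAYS_PER_YEAR = 365
daily_posts = 8_000_000 / 7

# Text: 2 KB per post, converted KB -> GB with 1024-based units
yearly_text_gb = daily_posts * 2 / 1024**2 * DAYS_PER_YEAR
compressed_text_gb = yearly_text_gb / 3          # assumed 3:1 gzip ratio

# Media: 30% of posts carry a 500 KB image, stored without recompression
yearly_media_tb = daily_posts * 0.30 * 500 / 1024**2 * DAYS_PER_YEAR / 1024

total_tb = compressed_text_gb / 1024 + yearly_media_tb
print(f"text (gzip): {compressed_text_gb:.0f} GB")
print(f"media:       {yearly_media_tb:.1f} TB")
print(f"total:       {total_tb:.1f} TB")
```

Media accounts for more than 99% of the total, so the image rate and image size are the assumptions worth validating first.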
total_yearly_storage = compressed_posts + yearly_media_storage → 58.5 TB
Database Sizing
Size the read replicas against peak read traffic, and the primaries against the much smaller write rate:
db_read_replicas = peak_read_rate at 5000 req/s per server with 20% buffer → 4 server
db_primaries = write_rate at 2000 req/s per server with 25% buffer → 1 server
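The `at ... per server with ... buffer` form reduces to a ceiling division. This Python sketch assumes the buffer inflates demand (demand × (1 + buffer)) before dividing by per-server capacity; the helper `servers_needed` is hypothetical, and CalcMark may apply the buffer differently. Derating each server's capacity by 20% instead also yields 4 replicas here:

```python
import math

SECONDS_PER_DAY = 86_400

def servers_needed(rate: float, per_server: float, buffer: float) -> int:
    """Hypothetical helper: servers required once demand is padded by `buffer`."""
    return math.ceil(rate * (1 + buffer) / per_server)

daily_reads = 400_000_000
daily_posts = 8_000_000 / 7
peak_read_rate = daily_reads / SECONDS_PER_DAY * 3   # 3x peak multiplier
write_rate = daily_posts / SECONDS_PER_DAY           # average rate, as in the CalcMark line

replicas = servers_needed(peak_read_rate, 5000, 0.20)
primaries = servers_needed(write_rate, 2000, 0.25)
print(replicas, primaries, replicas + primaries)     # 4 1 5
```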
total_db_servers = db_read_replicas + db_primaries → 5 server
Storage I/O Performance
A typical database query reads ~5 MB of data. Compare storage tiers:
hdd_query_time = seek(hdd) + read 5 MB from hdd → 0.0433 second
ssd_query_time = seek(ssd) + read 5 MB from ssd → 0.0092 second
nvme_query_time = seek(nvme) + read 5 MB from nvme → 0.0014 second
For this query, NVMe is about 30x faster than HDD (0.0014 s vs 0.0433 s), which makes it the right choice for hot data.
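The query-time model here is one seek plus a sequential read. The device parameters in this Python sketch (seek latency and throughput per tier) are hypothetical values chosen so the results match the figures above; CalcMark's built-in constants may differ:

```python
# Hypothetical device profiles: (seek latency in seconds, sequential throughput in MB/s).
# Chosen to reproduce the CalcMark output above; the real built-in values may differ.
DEVICES = {
    "hdd": (10e-3, 150),
    "ssd": (100e-6, 550),
    "nvme": (10e-6, 3500),
}

def query_time(device: str, size_mb: float) -> float:
    """Seek once, then stream `size_mb` sequentially."""
    seek_s, mb_per_s = DEVICES[device]
    return seek_s + size_mb / mb_per_s

times = {name: query_time(name, 5) for name in DEVICES}
for name, t in times.items():
    print(f"{name:>4}: {t:.4f} s")
print(f"HDD/NVMe: {times['hdd'] / times['nvme']:.0f}x")
```

For a 5 MB read, transfer time dominates on every tier; the HDD's seek penalty matters more for small, random reads.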
Network Latency Budget
Build a latency budget for a single API response:
network_rtt = rtt(regional) → 0.01 second
db_query = nvme_query_time → 0.0014 second
app_processing = 10 ms → 10 millisecond
total_latency = network_rtt + db_query + app_processing → 0.0214 second
At about 21 ms, the response is well under 100 ms, the commonly cited threshold below which users perceive a response as instant.
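A latency budget is just a sum of components, so it is convenient to keep it as data and extend it (for example, with a cache lookup). A minimal Python sketch using the component values above; the 100 ms threshold is the rule of thumb quoted in the text:

```python
# Latency budget in milliseconds, using the component values above
budget_ms = {
    "network_rtt": 10.0,      # rtt(regional)
    "db_query": 1.4,          # nvme_query_time, rounded
    "app_processing": 10.0,
}
INSTANT_THRESHOLD_MS = 100    # common rule of thumb for "feels instant"

total_ms = sum(budget_ms.values())
headroom_ms = INSTANT_THRESHOLD_MS - total_ms
print(f"total:    {total_ms:.1f} ms")
print(f"headroom: {headroom_ms:.1f} ms")
```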
Bandwidth Requirements
The average API response is 10 KB. Peak bandwidth per second:
peak_bandwidth = (peak_read_rate over 1 second) * 10 KB → 136 MB
Compare to standard network links:
gigabit_capacity = throughput(gigabit) → 125 MB/s
ten_gig_capacity = throughput(ten_gig) → 1.22 GB/s
At 136 MB per second, peak traffic exceeds a single gigabit link (125 MB/s). A 10-gig link would run at roughly 11% utilization, leaving plenty of headroom.
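The same comparison in Python, working in bytes per second. This sketch assumes 1 KB = 1024 bytes (the convention the storage results suggest) and treats link capacity as the raw line rate divided by 8, ignoring protocol overhead, so real headroom is somewhat smaller:

```python
SECONDS_PER_DAY = 86_400
peak_read_rate = 400_000_000 / SECONDS_PER_DAY * 3   # requests/s at 3x peak
response_bytes = 10 * 1024                            # 10 KB, 1024-based

peak_bytes_per_s = peak_read_rate * response_bytes

# Link capacities: line rate in bits/s divided by 8 (ignores protocol overhead)
links = {"gigabit": 1e9 / 8, "ten_gig": 10e9 / 8}

for name, capacity in links.items():
    utilization = peak_bytes_per_s / capacity
    print(f"{name:>8}: {utilization:.0%} utilized")
```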
CDN and Caching
Target a 95% cache hit rate so only 5% of reads hit origin:
cache_hit_target = 95% → 95%
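Origin load is the read rate multiplied by the miss ratio. The CalcMark calculation that follows uses the average read rate; this Python sketch, assuming the 3x peak multiplier from the Traffic Rates section, also shows what the origin sees at peak:

```python
SECONDS_PER_DAY = 86_400
read_rate = 400_000_000 / SECONDS_PER_DAY   # average reads/s
peak_multiplier = 3
cache_hit_target = 0.95

miss_ratio = 1 - cache_hit_target
origin_avg = read_rate * miss_ratio
origin_peak = read_rate * peak_multiplier * miss_ratio

print(f"origin reads, average: {origin_avg:.0f}/s")   # about 231/s
print(f"origin reads, peak:    {origin_peak:.0f}/s")  # about 694/s
```

Even at peak, the origin load sits far below a single read replica's assumed 5,000 req/s, which is the payoff of a high hit rate.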
origin_read_rate = read_rate * (1 - cache_hit_target) → 231.481481/s
Media transfer time from origin to CDN edge:
media_transfer = transfer 500 KB across continental ten_gig → 0.0504 second
Availability and Error Budget
Three nines (99.9%) is a common SaaS availability target:
monthly_downtime = downtime(99.9%, month) → 43.2 minute
yearly_downtime = downtime(99.9%, year) → 8.76 hour
Raw downtime numbers mean little until you spend them. Assume each deploy takes 5 minutes of downtime (rolling restarts, health checks, draining):
deploy_time = 5 minutes → 5 minute
deploys_per_month = 8 → 8
total_deploy_downtime = deploy_time * deploys_per_month → 40 minute
remaining_error_budget = monthly_downtime - total_deploy_downtime → 3.2 minute
At three nines, 8 deploys leave only 3.2 minutes of error budget for everything else, including unplanned incidents.
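The `downtime()` results can be reproduced by multiplying the unavailability (1 - SLO) by the length of the period. This Python sketch assumes a 30-day month, which is what the 43.2-minute figure implies; the helper `allowed_downtime` is hypothetical:

```python
MINUTES_PER_MONTH = 30 * 24 * 60      # 30-day month, matches the 43.2-minute figure
HOURS_PER_YEAR = 365 * 24

def allowed_downtime(slo: float, period: float) -> float:
    """Downtime permitted by an availability target over `period` (same units out)."""
    return (1 - slo) * period

monthly_budget = allowed_downtime(0.999, MINUTES_PER_MONTH)   # minutes
yearly_budget = allowed_downtime(0.999, HOURS_PER_YEAR)       # hours

deploy_minutes = 5 * 8                                        # 8 deploys x 5 minutes
remaining = monthly_budget - deploy_minutes
print(f"monthly: {monthly_budget:.1f} min, yearly: {yearly_budget:.2f} h")
print(f"left after deploys: {remaining:.1f} min")
```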
Four nines (99.99%) rules out deploys that require downtime:
strict_monthly_downtime = downtime(99.99%, month) → 4.32 minute
strict_deploy_budget = strict_monthly_downtime - total_deploy_downtime → -35.7 minute
The negative budget means four nines cannot absorb downtime deploys: even a single 5-minute deploy exceeds the 4.32-minute monthly allowance. Zero-downtime deploys (blue-green, canary) are a hard requirement.
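Another way to read the same numbers is to divide each monthly budget by the cost of one deploy. This Python sketch, assuming 5-minute deploys and a 30-day month, counts how many downtime deploys each target can absorb before any incident at all:

```python
import math

MINUTES_PER_MONTH = 30 * 24 * 60
DEPLOY_DOWNTIME_MIN = 5

max_deploys = {}
for slo in (0.999, 0.9999):
    budget = (1 - slo) * MINUTES_PER_MONTH
    max_deploys[slo] = math.floor(budget / DEPLOY_DOWNTIME_MIN)
    print(f"{slo:.2%}: {budget:.2f} min/month -> at most {max_deploys[slo]} downtime deploys")
```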
Summary
Key metrics rounded for stakeholder conversations:
storage_napkin = total_yearly_storage as napkin → ~58.7 TB
traffic_napkin = daily_reads as napkin → 400M
servers_napkin = total_db_servers as napkin → ~5 server
error_budget_napkin = remaining_error_budget as napkin → 3.2 minute
For 10M MAU, plan for 5 database servers (4 read replicas and 1 primary), roughly 59 TB of storage per year, and 3.2 minutes of monthly error budget after 8 deploys.