System Sizing — Full Document

Back-of-Napkin System Sizing

This document estimates infrastructure for a social media app with 10M monthly active users, using CalcMark’s built-in network, storage, and capacity planning functions.

User Activity

monthly_users = 10M → 10M
daily_users = 40% of monthly_users → 4M

A 40% DAU/MAU ratio is a reasonable assumption for a highly engaged social platform. Assume each daily active user posts about twice per week. Express that as a weekly total, then convert it to a daily figure:

posts_per_user_per_week = 2 → 2
weekly_posts = daily_users * posts_per_user_per_week → 8M
daily_posts = weekly_posts/week over 1 day → 1.14M
daily_posts_napkin = daily_posts as napkin → ~1.1M
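
As a cross-check outside CalcMark, the same arithmetic can be sketched in plain Python. This is an illustration only: the variable names mirror the lines above, and the 7-day week is the only conversion factor assumed.

```python
# Cross-check of the user-activity arithmetic (a sketch, not CalcMark itself).
monthly_users = 10_000_000
dau_ratio = 0.40                 # 40% DAU/MAU, as assumed above
posts_per_user_per_week = 2
DAYS_PER_WEEK = 7

daily_users = monthly_users * dau_ratio
weekly_posts = daily_users * posts_per_user_per_week
daily_posts = weekly_posts / DAYS_PER_WEEK

print(f"daily_users  = {daily_users / 1e6:.0f}M")
print(f"weekly_posts = {weekly_posts / 1e6:.0f}M")
print(f"daily_posts  = {daily_posts / 1e6:.2f}M")
```

The printed daily figure agrees with the 1.14M result above.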

Read vs Write Ratio

Social media is read-heavy – users scroll far more than they post. Assume 100 reads per user per day (timeline, profiles, search):

reads_per_user_per_day = 100 → 100
daily_reads = daily_users * reads_per_user_per_day → 400M
read_write_ratio = daily_reads / daily_posts → 350

A 350:1 read-to-write ratio means the read path dominates, so invest heavily in caching and read replicas.

Traffic Rates

Convert daily totals to per-second rates for capacity planning:

read_rate = (daily_reads)/day per second → 4.63K/s
write_rate = (daily_posts)/day per second → 13.227513/s

Peak traffic is typically 3x average (lunch hour, evenings, breaking news):

peak_multiplier = 3 → 3
peak_read_rate = read_rate * peak_multiplier → 13.89K/s
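
The per-second conversions above amount to dividing by the 86,400 seconds in a day. A minimal Python sketch of the same steps, with inputs copied from the earlier sections:

```python
# Daily totals to per-second rates (a sketch of the conversion above).
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

daily_reads = 400_000_000
daily_posts = 8_000_000 / 7      # weekly posts spread over 7 days
peak_multiplier = 3

read_rate = daily_reads / SECONDS_PER_DAY
write_rate = daily_posts / SECONDS_PER_DAY
peak_read_rate = read_rate * peak_multiplier

print(f"read_rate      = {read_rate:,.0f}/s")
print(f"write_rate     = {write_rate:.2f}/s")
print(f"peak_read_rate = {peak_read_rate:,.0f}/s")
```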

Storage Requirements

Average post size (text + metadata, excluding media):

avg_post_size = 2 KB → 2 KB
daily_post_storage = daily_posts * avg_post_size → 2.18 GB
yearly_post_storage = daily_post_storage * 365 → 796 GB

30% of posts include an image averaging 500 KB:

daily_media_storage = daily_posts * 30% * 500 KB → 163 GB
yearly_media_storage = daily_media_storage * 365 → 58.3 TB

Compress text storage with gzip (images are already compressed, so they are stored as-is):

compressed_posts = compress(yearly_post_storage, gzip) → 265 GB
total_yearly_storage = compressed_posts + yearly_media_storage → 58.5 TB
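
The storage figures above can be reproduced with a short Python sketch. Two things here are assumptions rather than documented CalcMark behavior: binary units (1 KB = 1024 bytes), which is what makes the numbers match, and a 3:1 gzip ratio, inferred from 796 GB shrinking to 265 GB.

```python
# Yearly storage estimate (a sketch; unit convention and gzip ratio are assumptions).
KB = 1024
GB = 1024 ** 3
TB = 1024 ** 4
GZIP_RATIO = 3.0                 # assumption: inferred from 796 GB -> 265 GB above

daily_posts = 8_000_000 / 7
avg_post_size = 2 * KB
image_share = 0.30               # 30% of posts carry an image
avg_image_size = 500 * KB

daily_post_storage = daily_posts * avg_post_size
yearly_post_storage = daily_post_storage * 365
daily_media_storage = daily_posts * image_share * avg_image_size
yearly_media_storage = daily_media_storage * 365

compressed_posts = yearly_post_storage / GZIP_RATIO   # only text is compressed
total_yearly_storage = compressed_posts + yearly_media_storage

print(f"yearly_post_storage  = {yearly_post_storage / GB:.0f} GB")
print(f"yearly_media_storage = {yearly_media_storage / TB:.1f} TB")
print(f"compressed_posts     = {compressed_posts / GB:.0f} GB")
print(f"total_yearly_storage = {total_yearly_storage / TB:.1f} TB")
```

Media accounts for over 99% of the total, so the gzip ratio chosen for text barely moves the final number.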

Database Sizing

Size the read replicas against peak read traffic and the primaries against the average write rate:

db_read_replicas = peak_read_rate at 5000 req/s per server with 20% buffer → 4 server
db_primaries = write_rate at 2000 req/s per server with 25% buffer → 1 server
total_db_servers = db_read_replicas + db_primaries → 5 server
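
One plausible reading of the `at ... per server with N% buffer` phrasing is: inflate the demand by the buffer, divide by per-server capacity, and round up. The helper below, `servers_needed`, is hypothetical and only illustrates that reading; CalcMark's exact rule is not stated here.

```python
import math

def servers_needed(rate_per_s: float, capacity_per_server: float, buffer: float) -> int:
    """Servers required to carry rate_per_s with fractional headroom `buffer`.

    Assumption: the buffer inflates demand (rate * (1 + buffer)) before dividing.
    """
    return math.ceil(rate_per_s * (1 + buffer) / capacity_per_server)

peak_read_rate = 400_000_000 / 86_400 * 3    # about 13.89K/s
write_rate = (8_000_000 / 7) / 86_400        # about 13.2/s

db_read_replicas = servers_needed(peak_read_rate, 5000, 0.20)
db_primaries = servers_needed(write_rate, 2000, 0.25)
total_db_servers = db_read_replicas + db_primaries

print(db_read_replicas, db_primaries, total_db_servers)
```

Under this reading the counts match the results above: 4 read replicas, 1 primary, 5 servers in total.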

Storage I/O Performance

A typical database query reads ~5 MB of data. Compare storage tiers:

hdd_query_time = seek(hdd) + read 5 MB from hdd → 0.0433 second
ssd_query_time = seek(ssd) + read 5 MB from ssd → 0.0092 second
nvme_query_time = seek(nvme) + read 5 MB from nvme → 0.0014 second

NVMe is roughly 30x faster than HDD for this query, which makes it the right choice for hot data.
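
Each query time above is a seek latency plus a sequential read at the device's throughput. The sketch below uses device constants chosen so that the results match the figures shown; they are assumptions for illustration, not CalcMark's documented values.

```python
# Query time = seek latency + bytes / throughput (a sketch with assumed constants).
# Assumed (seek seconds, MB per second) pairs, back-fitted to the results above.
DEVICES = {
    "hdd":  (0.010,   150.0),
    "ssd":  (0.0001,  550.0),
    "nvme": (0.00001, 3500.0),
}

def query_time(device: str, read_mb: float = 5.0) -> float:
    """Seconds to seek once and read read_mb megabytes sequentially."""
    seek_s, mb_per_s = DEVICES[device]
    return seek_s + read_mb / mb_per_s

for name in DEVICES:
    print(f"{name:<4} {query_time(name):.4f} s")

speedup = query_time("hdd") / query_time("nvme")
print(f"NVMe vs HDD: about {speedup:.0f}x")
```

For a 5 MB read the transfer term dominates on every tier, so the speedup tracks the throughput ratio more closely than the seek ratio.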

Network Latency Budget

Build a latency budget for a single API response:

network_rtt = rtt(regional) → 0.01 second
db_query = nvme_query_time → 0.0014 second
app_processing = 10 ms → 10 millisecond
total_latency = network_rtt + db_query + app_processing → 0.0214 second

At about 21 ms, the total is well under the ~100 ms threshold below which users perceive a response as instant.
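
The budget mixes seconds and milliseconds, so it helps to normalize to one unit before summing. A small Python sketch of the same budget, assuming the 10 ms regional round trip shown above:

```python
# Latency budget in milliseconds (a sketch; component values copied from above).
budget_ms = {
    "network_rtt": 10.0,     # rtt(regional), 0.01 s
    "db_query": 1.4,         # NVMe query time, 0.0014 s
    "app_processing": 10.0,
}
INSTANT_THRESHOLD_MS = 100.0

total_ms = sum(budget_ms.values())

for name, ms in budget_ms.items():
    print(f"{name:<15} {ms:5.1f} ms  ({ms / total_ms:.0%} of total)")
print(f"{'total':<15} {total_ms:5.1f} ms, "
      f"headroom {INSTANT_THRESHOLD_MS - total_ms:.1f} ms")
```

Network and application time each take close to half of the budget, so a faster storage tier alone would not change the total much.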

Bandwidth Requirements

Average API response is 10 KB. Peak bandwidth per second:

peak_bandwidth = (peak_read_rate over 1 second) * 10 KB → 136 MB

Compare to standard network links:

gigabit_capacity = throughput(gigabit) → 125 MB/s
ten_gig_capacity = throughput(ten_gig) → 1.22 GB/s

At about 136 MB/s, peak traffic exceeds a single gigabit link (125 MB/s). A 10-gigabit link leaves roughly 9x headroom.
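
Because payload sizes and link speeds are often quoted in different unit conventions, it is worth doing the comparison in raw bytes per second. The sketch below assumes 1 KB = 1024 bytes for the response size and decimal bits for link speeds (1 Gbit/s = 10^9 bit/s); both are assumptions about the conventions in play.

```python
# Peak bandwidth vs link capacity, compared in bytes per second (a sketch).
peak_read_rate = 400_000_000 / 86_400 * 3      # requests per second at peak
response_bytes = 10 * 1024                     # assumption: 1 KB = 1024 bytes

peak_bytes_per_s = peak_read_rate * response_bytes

LINKS = {
    "gigabit": 1e9 / 8,      # 125,000,000 bytes/s
    "ten_gig": 10e9 / 8,     # 1,250,000,000 bytes/s
}

print(f"peak: {peak_bytes_per_s / 1024**2:.0f} MB/s (binary)")
for name, capacity in LINKS.items():
    utilisation = peak_bytes_per_s / capacity
    verdict = "insufficient" if utilisation > 1 else "ok"
    print(f"{name:<8} utilisation {utilisation:.0%} -> {verdict}")
```

The conclusion does not depend on the unit convention: the peak exceeds one gigabit link either way and uses a little over a tenth of a 10-gigabit link.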

CDN and Caching

Target a 95% cache hit rate so only 5% of reads hit origin:

cache_hit_target = 95% → 95%
origin_read_rate = read_rate * (1 - cache_hit_target) → 231.481481/s

Media transfer time from origin to CDN edge:

media_transfer = transfer 500 KB across continental ten_gig → 0.0504 second
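
Both CDN figures can be reproduced by hand. The sketch below assumes that a transfer costs one network latency plus size divided by link throughput, and that `continental` corresponds to about 50 ms; that latency is inferred from the 0.0504 s result, not taken from CalcMark documentation.

```python
# Origin load after caching, and one media transfer (a sketch with assumed latency).
read_rate = 400_000_000 / 86_400          # average reads per second
cache_hit_target = 0.95

origin_read_rate = read_rate * (1 - cache_hit_target)

CONTINENTAL_LATENCY_S = 0.050             # assumption: inferred from the result above
TEN_GIG_BYTES_PER_S = 10e9 / 8
media_bytes = 500 * 1024

media_transfer = CONTINENTAL_LATENCY_S + media_bytes / TEN_GIG_BYTES_PER_S

print(f"origin_read_rate = {origin_read_rate:.1f}/s")
print(f"media_transfer   = {media_transfer:.4f} s")
```

Latency accounts for nearly all of the transfer time at this file size, so a faster link would not help much; placing edges closer to origin would. Note that the origin rate uses the average read rate; at the 3x peak it would be three times higher.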

Availability and Error Budget

Three nines (99.9%) is the standard SaaS target:

monthly_downtime = downtime(99.9%, month) → 43.2 minute
yearly_downtime = downtime(99.9%, year) → 8.76 hour

But raw downtime means nothing until you spend it. Assume each deploy takes 5 minutes of downtime (rolling restarts, health checks, draining):

deploy_time = 5 minutes → 5 minute
deploys_per_month = 8 → 8
total_deploy_downtime = deploy_time * deploys_per_month → 40 minute
remaining_error_budget = monthly_downtime - total_deploy_downtime → 3.2 minute

At three nines, 8 deploys leave only 3.2 minutes of error budget.

Four nines (99.99%) leaves no room for deploys that incur downtime:

strict_monthly_downtime = downtime(99.99%, month) → 4.32 minute
strict_deploy_budget = strict_monthly_downtime - total_deploy_downtime → -35.7 minute

A negative budget means you cannot afford deploys that cause downtime at four nines: a single 5-minute deploy already exceeds the 4.32-minute monthly allowance. Zero-downtime deploys (blue-green, canary) are a hard requirement.
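
The error-budget arithmetic is simple enough to check directly: allowed downtime is the unavailable fraction times the period. The Python sketch below assumes a 30-day month, which is what reproduces the 43.2-minute figure; `allowed_downtime_min` is a hypothetical helper name.

```python
import math

MINUTES_PER_MONTH = 30 * 24 * 60      # assumption: 30-day month

def allowed_downtime_min(availability: float, period_min: float = MINUTES_PER_MONTH) -> float:
    """Minutes of downtime permitted at the given availability over the period."""
    return (1 - availability) * period_min

deploy_time_min = 5
deploys_per_month = 8
deploy_downtime = deploy_time_min * deploys_per_month

for label, availability in [("three nines", 0.999), ("four nines", 0.9999)]:
    allowed = allowed_downtime_min(availability)
    remaining = allowed - deploy_downtime
    max_deploys = math.floor(allowed / deploy_time_min)
    print(f"{label}: allowed {allowed:.2f} min, remaining {remaining:.2f} min, "
          f"at most {max_deploys} deploys with downtime")
```

At three nines the budget covers exactly the 8 planned deploys with little to spare; at four nines it covers none.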

Summary

Key metrics rounded for stakeholder conversations:

storage_napkin = total_yearly_storage as napkin → ~58.7 TB
traffic_napkin = daily_reads as napkin → 400M
servers_napkin = total_db_servers as napkin → ~5 server
error_budget_napkin = remaining_error_budget as napkin → 3.2 minute

For 10M MAU, plan for 5 database servers, ~58.7 TB of storage per year, and 3.2 minutes of monthly error budget remaining after 8 deploys.