System Sizing

Back-of-napkin infrastructure estimation for a 10M user app with storage, bandwidth, and capacity planning.

How many servers do you need for a social media app with 10 million users? This walkthrough sizes the infrastructure from scratch – user activity, storage, database capacity, network latency, and availability – using CalcMark’s built-in system functions.

The complete CalcMark file is available at testdata/examples/system-sizing.cm.


User Activity

Start with 10M monthly active users. A 40% DAU/MAU ratio is typical for an engaged social platform – Instagram-class engagement. The % of syntax reads like plain English.

monthly_users = 10M
daily_users = 40% of monthly_users
Results
monthly_users = 10M → 10M
daily_users = 40% of monthly_users → 4M

Each user posts about twice per week. Express that as a weekly total, then use over 1 day to get the daily count. The as napkin modifier rounds to two significant figures for quick reference.

posts_per_user_per_week = 2
weekly_posts = daily_users * posts_per_user_per_week
daily_posts = weekly_posts/week over 1 day
daily_posts_napkin = daily_posts as napkin
Results
posts_per_user_per_week = 2 → 2
weekly_posts = daily_users * posts_per_user_per_week → 8M
daily_posts = weekly_posts/week over 1 day → 1.14M
daily_posts_napkin = daily_posts as napkin → ~1.1M

That gives 4M daily active users generating ~1.14M posts per day.

CalcMark features: Multiplier suffixes (10M); % of for percentages; rate accumulation (over); as napkin for human-readable rounding.
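The same estimate can be cross-checked in plain Python – a sketch of the underlying arithmetic, not CalcMark itself:

```python
# Cross-check of the user-activity estimates.
monthly_users = 10_000_000
daily_users = 0.40 * monthly_users                    # 40% DAU/MAU -> 4M

posts_per_user_per_week = 2
weekly_posts = daily_users * posts_per_user_per_week  # 8M posts/week
daily_posts = weekly_posts / 7                        # ~1.14M posts/day

print(f"{daily_users:,.0f} DAU, ~{daily_posts:,.0f} posts/day")
```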


Read vs Write Ratio

Social media is read-heavy – users scroll far more than they post. Assume 100 reads per user per day across timeline, profiles, and search.

reads_per_user_per_day = 100
daily_reads = daily_users * reads_per_user_per_day
read_write_ratio = daily_reads / daily_posts
Results
reads_per_user_per_day = 100 → 100
daily_reads = daily_users * reads_per_user_per_day → 400M
read_write_ratio = daily_reads / daily_posts → 350

400M daily reads versus ~1.14M daily writes – a read-to-write ratio of 350:1. This ratio drives your caching and replication strategy: invest heavily in read replicas and edge caching.

CalcMark features: Plain division for ratios; variable references across sections.
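The ratio in plain Python, reusing the numbers from the previous section:

```python
# Read-heavy workload: 100 timeline/profile/search reads per user per day.
daily_users = 4_000_000
daily_posts = 8_000_000 / 7           # ~1.14M writes/day

daily_reads = daily_users * 100       # 400M reads/day
read_write_ratio = daily_reads / daily_posts
print(round(read_write_ratio))        # 350, i.e. a 350:1 read/write ratio
```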


Traffic Rates

You need per-second rates for capacity planning. CalcMark’s rate conversion syntax turns daily totals into rates. Peak traffic is typically 3x average (lunch hour, evening scrolling, breaking news).

read_rate = (daily_reads)/day per second
write_rate = (daily_posts)/day per second

peak_multiplier = 3
peak_read_rate = read_rate * peak_multiplier
Results
read_rate = (daily_reads)/day per second → 4.63K/s
write_rate = (daily_posts)/day per second → 13.227513/s
peak_multiplier = 3 → 3
peak_read_rate = read_rate * peak_multiplier → 13.89K/s

~4.63K reads/s average, peaking at ~13.89K reads/s. Write rate is a modest ~13.2/s – your bottleneck is reads, not writes.

CalcMark features: Rate conversion ((value)/day per second); arithmetic on rate values.
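The rate conversion is just division by 86,400 seconds per day, which is easy to verify:

```python
# Daily totals -> per-second rates, with a 3x peak multiplier.
SECONDS_PER_DAY = 86_400
daily_reads = 400_000_000
daily_posts = 8_000_000 / 7

read_rate = daily_reads / SECONDS_PER_DAY    # ~4.63K req/s average
write_rate = daily_posts / SECONDS_PER_DAY   # ~13.2 req/s
peak_read_rate = read_rate * 3               # ~13.89K req/s
```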


Storage Requirements

Each post is about 2 KB of text and metadata. 30% of posts include an image averaging 500 KB. CalcMark’s compress() function estimates gzip compression on the text portion.

avg_post_size = 2 KB
daily_post_storage = daily_posts * avg_post_size
yearly_post_storage = daily_post_storage * 365

daily_media_storage = daily_posts * 30% * 500 KB
yearly_media_storage = daily_media_storage * 365

compressed_posts = compress(yearly_post_storage, gzip)
total_yearly_storage = compressed_posts + yearly_media_storage
Results
avg_post_size = 2 KB → 2 KB
daily_post_storage = daily_posts * avg_post_size → 2.18 GB
yearly_post_storage = daily_post_storage * 365 → 796 GB
daily_media_storage = daily_posts * 30% * 500 KB → 163 GB
yearly_media_storage = daily_media_storage * 365 → 58.3 TB
compressed_posts = compress(yearly_post_storage, gzip) → 265 GB
total_yearly_storage = compressed_posts + yearly_media_storage → 58.5 TB

Text storage is ~796 GB/year before compression. compress() applies a typical 3:1 gzip ratio, bringing it down to ~265 GB. Media dominates at ~58.3 TB/year. Total yearly storage lands at ~58.5 TB.

CalcMark features: Data size units (KB, GB, TB); inline 30% for percentages; compress() with compression algorithm argument.
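The same sizing in Python. Note that CalcMark's figures match binary units (KB = 1024 bytes), and the 3:1 gzip ratio is an assumption that reproduces the compress() result:

```python
# Storage sizing with binary units (matches the 796 GB / 58.3 TB figures above).
KB, GB, TB = 1024, 1024**3, 1024**4
daily_posts = 8_000_000 / 7

yearly_post_storage = daily_posts * 2 * KB * 365             # ~796 GB of text
yearly_media_storage = daily_posts * 0.30 * 500 * KB * 365   # ~58.3 TB of images

compressed_posts = yearly_post_storage / 3                   # assumed 3:1 gzip ratio
total_yearly_storage = compressed_posts + yearly_media_storage  # ~58.5 TB
```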


Database Sizing

CalcMark’s capacity planning syntax reads like a sentence. You specify a request rate, a per-server capacity, and a buffer percentage. It returns the number of servers needed.

db_read_replicas = peak_read_rate at 5000 req/s per server with 20% buffer
db_primaries = write_rate at 2000 req/s per server with 25% buffer
total_db_servers = db_read_replicas + db_primaries
Results
db_read_replicas = peak_read_rate at 5000 req/s per server with 20% buffer → 4 server
db_primaries = write_rate at 2000 req/s per server with 25% buffer → 1 server
total_db_servers = db_read_replicas + db_primaries → 5 server

At 13.89K peak reads/s with each server handling 5,000 req/s and a 20% headroom buffer, you need 4 read replicas. The write rate is low enough for 1 primary. Total: 5 database servers.

CalcMark features: at ... per server with N% buffer capacity planning syntax.
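A plain-Python sketch of the capacity formula. The buffer semantics are an assumption (buffer applied to demand); applying it to capacity instead gives the same answer for these inputs:

```python
import math

def servers_needed(rate, capacity_per_server, buffer):
    # Scale demand up by the headroom buffer, then round up to whole servers.
    return math.ceil(rate * (1 + buffer) / capacity_per_server)

db_read_replicas = servers_needed(13_889, 5000, 0.20)   # 4
db_primaries = servers_needed(13.2, 2000, 0.25)         # 1
total_db_servers = db_read_replicas + db_primaries      # 5
```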


Storage I/O Performance

A typical database query reads ~5 MB of data. CalcMark’s seek() and read ... from syntax uses well-known device characteristics to estimate I/O time across HDD, SSD, and NVMe.

hdd_query_time = seek(hdd) + read 5 MB from hdd
ssd_query_time = seek(ssd) + read 5 MB from ssd
nvme_query_time = seek(nvme) + read 5 MB from nvme
Results
hdd_query_time = seek(hdd) + read 5 MB from hdd → 0.0433 second
ssd_query_time = seek(ssd) + read 5 MB from ssd → 0.0092 second
nvme_query_time = seek(nvme) + read 5 MB from nvme → 0.0014 second

HDD: ~43 ms. SSD: ~9.2 ms. NVMe: ~1.4 ms. NVMe is 30x faster than HDD for this workload. The right choice for hot data is clear.

CalcMark features: seek() for device seek latency; read ... from natural language syntax for data read time by device type.
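The model behind these numbers is seek latency plus size divided by throughput. The device figures below are illustrative assumptions chosen to roughly reproduce the results, not CalcMark's exact built-in constants:

```python
# Rough I/O time model: seek latency + transfer time.
DEVICES = {          # device: (seek latency in ms, sequential throughput in MB/s)
    "hdd":  (10.0, 150),
    "ssd":  (0.1, 550),
    "nvme": (0.02, 3500),
}

def query_time_ms(device, size_mb):
    seek_ms, throughput = DEVICES[device]
    return seek_ms + size_mb / throughput * 1000
```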


Network Latency Budget

Build a latency budget for an API response: network round-trip, database query, and application processing time.

network_rtt = rtt(regional)
db_query = nvme_query_time
app_processing = 10 ms
total_latency = network_rtt + db_query + app_processing
Results
network_rtt = rtt(regional) → 0.01 second
db_query = nvme_query_time → 0.0014 second
app_processing = 10 ms → 10 millisecond
total_latency = network_rtt + db_query + app_processing → 0.0214 second

rtt(regional) returns 10 ms for a regional network hop. Adding the NVMe query (~1.4 ms) and 10 ms of app processing, your total latency budget is ~21.4 ms. Well under the 100 ms threshold users notice.

CalcMark features: rtt() with network distance argument; millisecond units; time addition.
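The budget is a straight sum, which also makes the headroom against the 100 ms threshold explicit:

```python
# Latency budget in milliseconds.
network_rtt = 10.0       # rtt(regional)
db_query = 1.4           # NVMe query time from the previous section
app_processing = 10.0

total_latency = network_rtt + db_query + app_processing  # ~21.4 ms
headroom = 100 - total_latency                           # ~78.6 ms to spare
```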


Bandwidth Requirements

Each response averages 10 KB. The over keyword accumulates the peak read rate over one second, then multiplies by response size to get bandwidth.

peak_bandwidth = (peak_read_rate over 1 second) * 10 KB

gigabit_capacity = throughput(gigabit)
ten_gig_capacity = throughput(ten_gig)
Results
peak_bandwidth = (peak_read_rate over 1 second) * 10 KB → 136 MB
gigabit_capacity = throughput(gigabit) → 125 MB/s
ten_gig_capacity = throughput(ten_gig) → 1.22 GB/s

Peak bandwidth is ~136 MB/s. A single gigabit link handles 125 MB/s – not enough. A 10-gigabit link at ~1.22 GB/s gives plenty of headroom.

CalcMark features: over for rate accumulation; throughput() for standard network link capacities.
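The bandwidth math, cross-checked in Python. The request math uses binary MB to match CalcMark's 136 MB figure; link speeds are decimal, as network vendors quote them:

```python
# Peak bandwidth vs a gigabit link.
peak_read_rate = 13_889            # req/s at peak
response_bytes = 10 * 1024         # 10 KB per response

peak_bandwidth_mb = peak_read_rate * response_bytes / 1024**2  # ~136 MB/s
gigabit_mb = 1e9 / 8 / 1e6                                     # 125 MB/s
```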


CDN and Caching

With a 95% cache hit target, only 5% of reads hit the origin. The transfer ... across syntax estimates time to push a media file from origin to CDN edge.

cache_hit_target = 95%
origin_read_rate = read_rate * (1 - cache_hit_target)

media_transfer = transfer 500 KB across continental ten_gig
Results
cache_hit_target = 95% → 95%
origin_read_rate = read_rate * (1 - cache_hit_target) → 231.481481/s
media_transfer = transfer 500 KB across continental ten_gig → 0.0504 second

Origin sees ~231 req/s instead of ~4,630 req/s – a 20x reduction. Transfer time is ~50 ms to push a 500 KB image across a continental distance over a 10-gig link.

CalcMark features: Percentage syntax (95%); transfer ... across natural language syntax with data size, distance, and link speed.
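The cache offload and a transfer-time sketch in Python. The 50 ms continental latency below is an assumed constant that happens to reproduce the 0.0504 s result; CalcMark's internal figure may differ:

```python
# Origin load at a 95% cache hit rate, plus propagation + serialization time.
read_rate = 400_000_000 / 86_400            # ~4630 req/s average
origin_read_rate = read_rate * (1 - 0.95)   # ~231 req/s hit the origin
reduction = read_rate / origin_read_rate    # 20x fewer origin requests

media_transfer_s = 0.050 + 500 * 1024 * 8 / 10e9  # assumed 50 ms latency + 10 Gb/s serialization
```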


Availability and Error Budget

The downtime() function converts an availability target into concrete allowed downtime. But raw downtime means nothing until you spend it against real operational costs.

monthly_downtime = downtime(99.9%, month)
yearly_downtime = downtime(99.9%, year)
Results
monthly_downtime = downtime(99.9%, month) → 43.2 minute
yearly_downtime = downtime(99.9%, year) → 8.76 hour

Three nines gives you 43.2 minutes/month or 8.76 hours/year. Now subtract deploy costs – assume each rolling deploy takes 5 minutes (draining, health checks, restart):

deploy_time = 5 minutes
deploys_per_month = 8
total_deploy_downtime = deploy_time * deploys_per_month
remaining_error_budget = monthly_downtime - total_deploy_downtime
Results
deploy_time = 5 minutes → 5 minute
deploys_per_month = 8 → 8
total_deploy_downtime = deploy_time * deploys_per_month → 40 minute
remaining_error_budget = monthly_downtime - total_deploy_downtime → 3.2 minute

Eight deploys consume 40 of your 43.2 minutes, leaving only 3.2 minutes for actual incidents. That is a razor-thin margin.

Now check four nines:

strict_monthly_downtime = downtime(99.99%, month)
strict_deploy_budget = strict_monthly_downtime - total_deploy_downtime
Results
strict_monthly_downtime = downtime(99.99%, month) → 4.32 minute
strict_deploy_budget = strict_monthly_downtime - total_deploy_downtime → -35.7 minute

The budget goes negative (-35.7 minutes). At four nines your entire monthly allowance is 4.32 minutes – eight 5-minute deploys alone consume more than nine times that allowance. This is the calculation that forces a zero-downtime deployment strategy (blue-green, canary).

CalcMark features: downtime() with percentage availability target and time period; Duration arithmetic (subtraction, scaling) to spend the error budget.
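The whole error-budget argument is two lines of arithmetic, easy to check in Python (assuming a 30-day month, which matches the 43.2-minute figure):

```python
# Availability target -> allowed downtime, then spend it on deploys.
def downtime_minutes(availability, period_minutes):
    return (1 - availability) * period_minutes

MONTH_MINUTES = 30 * 24 * 60                        # 43,200

monthly = downtime_minutes(0.999, MONTH_MINUTES)    # 43.2 min at three nines
strict = downtime_minutes(0.9999, MONTH_MINUTES)    # 4.32 min at four nines

deploy_cost = 8 * 5                                 # eight 5-minute deploys
remaining = monthly - deploy_cost                   # 3.2 min left for incidents
strict_remaining = strict - deploy_cost             # negative: over budget
```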


Summary

The as napkin modifier rounds everything into quick-reference numbers for stakeholder conversations.

storage_napkin = total_yearly_storage as napkin
traffic_napkin = daily_reads as napkin
servers_napkin = total_db_servers as napkin
error_budget_napkin = remaining_error_budget as napkin
Results
storage_napkin = total_yearly_storage as napkin → ~58.7 TB
traffic_napkin = daily_reads as napkin → 400M
servers_napkin = total_db_servers as napkin → ~5 server
error_budget_napkin = remaining_error_budget as napkin → 3.2 minute

Bottom line: ~58.7 TB of storage per year, 400M daily reads, 5 database servers, and ~3.2 minutes of monthly error budget after deploys. That is the back-of-napkin infrastructure for a 10M-user social media app.

CalcMark features: as napkin for executive-summary rounding.
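One plausible implementation of the rounding rule, sketched in Python. This is an assumption – the walkthrough describes as napkin as two significant figures, though some of the outputs above (~58.7 TB) suggest CalcMark's exact rule may vary:

```python
import math

def napkin(x, sig_figs=2):
    # Round to `sig_figs` significant figures -- a guess at `as napkin`,
    # not CalcMark's actual implementation.
    if x == 0:
        return 0
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, sig_figs - 1 - magnitude)

napkin(1_142_857)   # 1,100,000 -> "~1.1M"
```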


Features Demonstrated

This example showcases the following CalcMark features:

  • Multiplier suffixes – 10M expands to 10,000,000
  • Percentage syntax – 40%, 30%, 95%, 99.9% for readable proportions
  • % of – 40% of monthly_users reads like plain English
  • as napkin – human-readable rounding to 2 significant figures
  • Rate conversion – (value)/day per second for daily-to-per-second rates
  • over – peak_read_rate over 1 second accumulates a rate over time
  • Capacity planning – at ... per server with N% buffer
  • compress() – storage compression estimation with algorithm argument
  • seek() and read ... from – storage I/O latency by device type (HDD, SSD, NVMe)
  • rtt() – network round-trip time by distance
  • throughput() – standard network link capacities
  • transfer ... across – data transfer time across distance and link speed
  • downtime() – availability target to allowed downtime conversion; duration arithmetic to spend the error budget
  • Units – data sizes (KB, MB, GB, TB), durations (ms), rates (req/s), and counts (server)
  • Markdown prose – headings, paragraphs, and inline commentary between calculations
  • Template interpolation – {{variable}} to embed computed values in prose

Try It

testdata/examples/system-sizing.cm
cm testdata/examples/system-sizing.cm