# System Sizing
Back-of-napkin infrastructure estimation for a 10M user app with storage, bandwidth, and capacity planning.
How many servers do you need for a social media app with 10 million users? This walkthrough sizes the infrastructure from scratch – user activity, storage, database capacity, network latency, and availability – using CalcMark’s built-in system functions.
The complete CalcMark file is available at `testdata/examples/system-sizing.cm`.
## User Activity
Start with 10M monthly active users. A 40% DAU/MAU ratio is typical for an engaged social platform – Instagram-class engagement. The `% of` syntax reads like plain English.
```
monthly_users = 10M
daily_users = 40% of monthly_users
```

| Expression | | Result |
| --- | --- | --- |
| monthly_users = 10M | → | 10M |
| daily_users = 40% of monthly_users | → | 4M |
Each user posts about twice per week. Express that as a weekly total, then use `over 1 day` to get the daily count; `as napkin` rounds to two significant figures for quick reference.
```
posts_per_user_per_week = 2
weekly_posts = daily_users * posts_per_user_per_week
daily_posts = weekly_posts/week over 1 day
daily_posts_napkin = daily_posts as napkin
```

| Expression | | Result |
| --- | --- | --- |
| posts_per_user_per_week = 2 | → | 2 |
| weekly_posts = daily_users * posts_per_user_per_week | → | 8M |
| daily_posts = weekly_posts/week over 1 day | → | 1.14M |
| daily_posts_napkin = daily_posts as napkin | → | ~1.1M |
That gives 4M daily active users generating ~1.14M posts per day.
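The same arithmetic is easy to sanity-check in plain Python – a sketch of the computation, not of CalcMark's implementation:

```python
# Sanity-check the user-activity numbers with plain arithmetic.
monthly_users = 10_000_000
daily_users = 0.40 * monthly_users     # 40% DAU/MAU ratio

posts_per_user_per_week = 2
weekly_posts = daily_users * posts_per_user_per_week
daily_posts = weekly_posts / 7         # "over 1 day" applied to a weekly rate

print(f"{daily_users:,.0f} DAU, {daily_posts:,.0f} posts/day")
# 4,000,000 DAU, 1,142,857 posts/day
```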
CalcMark features: Multiplier suffixes (`10M`); `% of` for percentages; rate accumulation (`over`); `as napkin` for human-readable rounding.
## Read vs Write Ratio
Social media is read-heavy – users scroll far more than they post. Assume 100 reads per user per day across timeline, profiles, and search.
```
reads_per_user_per_day = 100
daily_reads = daily_users * reads_per_user_per_day
read_write_ratio = daily_reads / daily_posts
```

| Expression | | Result |
| --- | --- | --- |
| reads_per_user_per_day = 100 | → | 100 |
| daily_reads = daily_users * reads_per_user_per_day | → | 400M |
| read_write_ratio = daily_reads / daily_posts | → | 350 |
400M daily reads versus ~1.14M daily writes – a read-to-write ratio of 350:1. This ratio drives your caching and replication strategy: invest heavily in read replicas and edge caching.
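In plain Python the ratio is the same one-liner, with values carried over from the previous section:

```python
daily_users = 4_000_000
daily_reads = daily_users * 100     # 100 reads per user per day
daily_posts = 8_000_000 / 7         # ~1.14M posts per day
read_write_ratio = daily_reads / daily_posts
print(round(read_write_ratio))      # 350
```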
CalcMark features: Plain division for ratios; variable references across sections.
## Traffic Rates
You need per-second rates for capacity planning. CalcMark’s rate conversion syntax turns daily totals into rates. Peak traffic is typically 3x average (lunch hour, evening scrolling, breaking news).
```
read_rate = (daily_reads)/day per second
write_rate = (daily_posts)/day per second
peak_multiplier = 3
peak_read_rate = read_rate * peak_multiplier
```

| Expression | | Result |
| --- | --- | --- |
| read_rate = (daily_reads)/day per second | → | 4.63K/s |
| write_rate = (daily_posts)/day per second | → | 13.227513/s |
| peak_multiplier = 3 | → | 3 |
| peak_read_rate = read_rate * peak_multiplier | → | 13.89K/s |
~4.63K reads/s average, peaking at ~13.89K reads/s. Write rate is a modest ~13.2/s – your bottleneck is reads, not writes.
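The daily-to-per-second conversion is just division by 86,400; a Python sketch reproduces the rates above:

```python
SECONDS_PER_DAY = 86_400

daily_reads = 400_000_000
daily_posts = 8_000_000 / 7

read_rate = daily_reads / SECONDS_PER_DAY    # ~4,630 reads/s average
write_rate = daily_posts / SECONDS_PER_DAY   # ~13.2 writes/s
peak_read_rate = read_rate * 3               # ~13,889 reads/s at peak
```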
CalcMark features: Rate conversion (`(value)/day per second`); arithmetic on rate values.
## Storage Requirements
Each post is about 2 KB of text and metadata. 30% of posts include an image averaging 500 KB. CalcMark’s compress() function estimates gzip compression on the text portion.
```
avg_post_size = 2 KB
daily_post_storage = daily_posts * avg_post_size
yearly_post_storage = daily_post_storage * 365
daily_media_storage = daily_posts * 30% * 500 KB
yearly_media_storage = daily_media_storage * 365
compressed_posts = compress(yearly_post_storage, gzip)
total_yearly_storage = compressed_posts + yearly_media_storage
```

| Expression | | Result |
| --- | --- | --- |
| avg_post_size = 2 KB | → | 2 KB |
| daily_post_storage = daily_posts * avg_post_size | → | 2.18 GB |
| yearly_post_storage = daily_post_storage * 365 | → | 796 GB |
| daily_media_storage = daily_posts * 30% * 500 KB | → | 163 GB |
| yearly_media_storage = daily_media_storage * 365 | → | 58.3 TB |
| compressed_posts = compress(yearly_post_storage, gzip) | → | 265 GB |
| total_yearly_storage = compressed_posts + yearly_media_storage | → | 58.5 TB |
Text storage is ~796 GB/year before compression. compress() applies a typical 3:1 gzip ratio, bringing it down to ~265 GB. Media dominates at ~58.3 TB/year. Total yearly storage lands at ~58.5 TB.
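These figures can be reproduced in Python. Note two assumptions needed to match the table: binary (1024-based) units, and the 3:1 gzip ratio stated above, which is an estimate rather than a measured value:

```python
KB, GB, TB = 1024, 1024**3, 1024**4   # binary units reproduce the table's numbers

daily_posts = 8_000_000 / 7
daily_post_storage = daily_posts * 2 * KB              # bytes of text/metadata
yearly_post_storage = daily_post_storage * 365
yearly_media_storage = daily_posts * 0.30 * 500 * KB * 365
compressed_posts = yearly_post_storage / 3             # assumed 3:1 gzip ratio
total_yearly_storage = compressed_posts + yearly_media_storage

print(f"{yearly_post_storage / GB:.0f} GB text -> {compressed_posts / GB:.0f} GB gzipped; "
      f"{yearly_media_storage / TB:.1f} TB media; {total_yearly_storage / TB:.1f} TB total")
# 796 GB text -> 265 GB gzipped; 58.3 TB media; 58.5 TB total
```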
CalcMark features: Data size units (`KB`, `GB`, `TB`); inline `30%` for percentages; `compress()` with compression algorithm argument.
## Database Sizing
CalcMark’s capacity planning syntax reads like a sentence. You specify a request rate, a per-server capacity, and a buffer percentage. It returns the number of servers needed.
```
db_read_replicas = peak_read_rate at 5000 req/s per server with 20% buffer
db_primaries = write_rate at 2000 req/s per server with 25% buffer
total_db_servers = db_read_replicas + db_primaries
```

| Expression | | Result |
| --- | --- | --- |
| db_read_replicas = peak_read_rate at 5000 req/s per server with 20% buffer | → | 4 server |
| db_primaries = write_rate at 2000 req/s per server with 25% buffer | → | 1 server |
| total_db_servers = db_read_replicas + db_primaries | → | 5 server |
At 13.89K peak reads/s with each server handling 5,000 req/s and a 20% headroom buffer, you need 4 read replicas. The write rate is low enough for 1 primary. Total: 5 database servers.
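Under the hood this is a ceiling division with headroom. A hypothetical Python equivalent – one plausible reading of the buffer semantics (headroom added on top of demand), which matches the results above:

```python
import math

def servers_needed(rate, per_server, buffer):
    # Capacity = demand plus headroom buffer, rounded up to whole servers.
    # (A sketch of one plausible formula, not CalcMark's exact definition.)
    return math.ceil(rate * (1 + buffer) / per_server)

db_read_replicas = servers_needed(13_889, 5_000, 0.20)   # 4
db_primaries = servers_needed(13.23, 2_000, 0.25)        # 1
total_db_servers = db_read_replicas + db_primaries       # 5
```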
CalcMark features: `at ... per server with N% buffer` capacity planning syntax.
## Storage I/O Performance
A typical database query reads ~5 MB of data. CalcMark’s seek() and read ... from syntax uses well-known device characteristics to estimate I/O time across HDD, SSD, and NVMe.
```
hdd_query_time = seek(hdd) + read 5 MB from hdd
ssd_query_time = seek(ssd) + read 5 MB from ssd
nvme_query_time = seek(nvme) + read 5 MB from nvme
```

| Expression | | Result |
| --- | --- | --- |
| hdd_query_time = seek(hdd) + read 5 MB from hdd | → | 0.0433 second |
| ssd_query_time = seek(ssd) + read 5 MB from ssd | → | 0.0092 second |
| nvme_query_time = seek(nvme) + read 5 MB from nvme | → | 0.0014 second |
HDD: ~43 ms. SSD: ~9.2 ms. NVMe: ~1.4 ms. NVMe is 30x faster than HDD for this workload. The right choice for hot data is clear.
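A Python sketch of the same model. The seek latencies and throughputs below are common ballpark figures that happen to reproduce the results above; CalcMark's built-in device table may use slightly different values:

```python
# (seek latency in seconds, sequential throughput in MB/s) -- ballpark figures
DEVICES = {
    "hdd":  (10e-3,   150),
    "ssd":  (0.1e-3,  550),
    "nvme": (0.02e-3, 3_500),
}

def query_time(device, size_mb):
    seek, throughput = DEVICES[device]
    return seek + size_mb / throughput    # seconds

for dev in DEVICES:
    print(f"{dev}: {query_time(dev, 5) * 1000:.1f} ms")
# hdd: 43.3 ms, ssd: 9.2 ms, nvme: 1.4 ms
```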
CalcMark features: `seek()` for device seek latency; `read ... from` natural language syntax for data read time by device type.
## Network Latency Budget
Build a latency budget for an API response: network round-trip, database query, and application processing time.
```
network_rtt = rtt(regional)
db_query = nvme_query_time
app_processing = 10 ms
total_latency = network_rtt + db_query + app_processing
```

| Expression | | Result |
| --- | --- | --- |
| network_rtt = rtt(regional) | → | 0.01 second |
| db_query = nvme_query_time | → | 0.0014 second |
| app_processing = 10 ms | → | 10 millisecond |
| total_latency = network_rtt + db_query + app_processing | → | 0.0214 second |
rtt(regional) returns 10 ms for a regional network hop. Adding the NVMe query (~1.4 ms) and 10 ms of app processing, your total latency budget is ~21.4 ms. Well under the 100 ms threshold users notice.
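The budget is plain addition of durations; in Python, working in milliseconds:

```python
network_rtt = 10.0       # ms, regional round trip
db_query = 1.4           # ms, NVMe query from the previous section
app_processing = 10.0    # ms
total_latency = network_rtt + db_query + app_processing
print(total_latency)     # 21.4
```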
CalcMark features: `rtt()` with network distance argument; millisecond units; time addition.
## Bandwidth Requirements
Each response averages 10 KB. The over keyword accumulates the peak read rate over one second, then multiplies by response size to get bandwidth.
```
peak_bandwidth = (peak_read_rate over 1 second) * 10 KB
gigabit_capacity = throughput(gigabit)
ten_gig_capacity = throughput(ten_gig)
```

| Expression | | Result |
| --- | --- | --- |
| peak_bandwidth = (peak_read_rate over 1 second) * 10 KB | → | 136 MB |
| gigabit_capacity = throughput(gigabit) | → | 125 MB/s |
| ten_gig_capacity = throughput(ten_gig) | → | 1.22 GB/s |
Peak bandwidth is ~136 MB/s. A single gigabit link handles 125 MB/s – not enough. A 10-gigabit link at ~1.22 GB/s gives plenty of headroom.
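A quick Python check of the headline comparison. Note the mix of conventions it assumes: binary KB for the response size, but the usual decimal 125 MB/s for a gigabit link:

```python
KB = 1024
peak_read_rate = 13_889                     # requests/s at peak
peak_bandwidth = peak_read_rate * 10 * KB   # bytes/s, 10 KB per response

gigabit_capacity = 1e9 / 8                  # 125 MB/s (decimal)
print(peak_bandwidth / 1024**2)             # ~136 MiB/s
print(peak_bandwidth > gigabit_capacity)    # True: one gigabit link is not enough
```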
CalcMark features: `over` for rate accumulation; `throughput()` for standard network link capacities.
## CDN and Caching
With a 95% cache hit target, only 5% of reads hit the origin. The transfer ... across syntax estimates time to push a media file from origin to CDN edge.
```
cache_hit_target = 95%
origin_read_rate = read_rate * (1 - cache_hit_target)
media_transfer = transfer 500 KB across continental ten_gig
```

| Expression | | Result |
| --- | --- | --- |
| cache_hit_target = 95% | → | 95% |
| origin_read_rate = read_rate * (1 - cache_hit_target) | → | 231.481481/s |
| media_transfer = transfer 500 KB across continental ten_gig | → | 0.0504 second |
Origin sees ~231 req/s instead of ~4,630 req/s – a 20x reduction. Transfer time is ~50 ms to push a 500 KB image across a continental distance over a 10-gig link.
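The origin-offload arithmetic in Python:

```python
read_rate = 400_000_000 / 86_400   # ~4,630 reads/s average
cache_hit_target = 0.95
origin_read_rate = read_rate * (1 - cache_hit_target)
reduction = read_rate / origin_read_rate
print(f"{origin_read_rate:.0f} req/s at origin, a {reduction:.0f}x reduction")
# 231 req/s at origin, a 20x reduction
```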
CalcMark features: Percentage syntax (`95%`); `transfer ... across` natural language syntax with data size, distance, and link speed.
## Availability and Error Budget
The downtime() function converts an availability target into concrete allowed downtime. But raw downtime means nothing until you spend it against real operational costs.
```
monthly_downtime = downtime(99.9%, month)
yearly_downtime = downtime(99.9%, year)
```

| Expression | | Result |
| --- | --- | --- |
| monthly_downtime = downtime(99.9%, month) | → | 43.2 minute |
| yearly_downtime = downtime(99.9%, year) | → | 8.76 hour |
Three nines gives you 43.2 minutes/month or 8.76 hours/year. Now subtract deploy costs – assume each rolling deploy takes 5 minutes (draining, health checks, restart):
```
deploy_time = 5 minutes
deploys_per_month = 8
total_deploy_downtime = deploy_time * deploys_per_month
remaining_error_budget = monthly_downtime - total_deploy_downtime
```

| Expression | | Result |
| --- | --- | --- |
| deploy_time = 5 minutes | → | 5 minute |
| deploys_per_month = 8 | → | 8 |
| total_deploy_downtime = deploy_time * deploys_per_month | → | 40 minute |
| remaining_error_budget = monthly_downtime - total_deploy_downtime | → | 3.2 minute |
Eight deploys consume 40 of your 43.2 minutes, leaving only 3.2 minutes for actual incidents. That is a razor-thin margin.
Now check four nines:
```
strict_monthly_downtime = downtime(99.99%, month)
strict_deploy_budget = strict_monthly_downtime - total_deploy_downtime
```

| Expression | | Result |
| --- | --- | --- |
| strict_monthly_downtime = downtime(99.99%, month) | → | 4.32 minute |
| strict_deploy_budget = strict_monthly_downtime - total_deploy_downtime | → | -35.7 minute |
The budget goes negative (-35.7 minutes). At four nines your entire monthly allowance is 4.32 minutes – eight 5-minute deploys alone consume roughly nine times that. This is the calculation that forces a zero-downtime deployment strategy (blue-green, canary).
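The downtime conversion itself is simple enough to verify by hand – the figures above imply a 30-day month:

```python
def allowed_downtime_min(availability, period_minutes):
    # Unavailability fraction times the period length.
    return (1 - availability) * period_minutes

MONTH_MIN = 30 * 24 * 60                         # 43,200 minutes
three_nines = allowed_downtime_min(0.999, MONTH_MIN)    # 43.2 min
four_nines = allowed_downtime_min(0.9999, MONTH_MIN)    # 4.32 min

deploy_downtime = 5 * 8                          # eight 5-minute deploys
print(round(three_nines - deploy_downtime, 2))   # 3.2
print(round(four_nines - deploy_downtime, 2))    # -35.68
```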
CalcMark features: `downtime()` with percentage availability target and time period; duration arithmetic (subtraction, scaling) to spend the error budget.
## Summary
The as napkin modifier rounds everything into quick-reference numbers for stakeholder conversations.
```
storage_napkin = total_yearly_storage as napkin
traffic_napkin = daily_reads as napkin
servers_napkin = total_db_servers as napkin
error_budget_napkin = remaining_error_budget as napkin
```

| Expression | | Result |
| --- | --- | --- |
| storage_napkin = total_yearly_storage as napkin | → | ~58.7 TB |
| traffic_napkin = daily_reads as napkin | → | 400M |
| servers_napkin = total_db_servers as napkin | → | ~5 server |
| error_budget_napkin = remaining_error_budget as napkin | → | 3.2 minute |
Bottom line: ~58.7 TB of storage per year, 400M daily reads, 5 database servers, and ~3.2 minutes of monthly error budget after deploys. That is the back-of-napkin infrastructure for a 10M-user social media app.
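For reference, rounding to two significant figures is all the napkin step does; a hypothetical Python equivalent (a sketch, not CalcMark's implementation):

```python
import math

def napkin(x, sig=2):
    # Round to `sig` significant figures (a sketch of `as napkin`).
    if x == 0:
        return 0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

print(napkin(1_142_857))   # 1100000  (~1.1M)
print(napkin(4_629.6))     # 4600.0   (~4.6K)
```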
CalcMark features: `as napkin` for executive-summary rounding.
## Features Demonstrated
This example showcases the following CalcMark features:
- Multiplier suffixes – `10M` expands to 10,000,000
- Percentage syntax – `40%`, `30%`, `95%`, `99.9%` for readable proportions
- `% of` – `40% of monthly_users` reads like plain English
- `as napkin` – human-readable rounding to 2 significant figures
- Rate conversion – `(value)/day per second` for daily-to-per-second rates
- `over` – `peak_read_rate over 1 second` accumulates a rate over time
- Capacity planning – `at ... per server with N% buffer`
- `compress()` – storage compression estimation with algorithm argument
- `seek()` and `read ... from` – storage I/O latency by device type (HDD, SSD, NVMe)
- `rtt()` – network round-trip time by distance
- `throughput()` – standard network link capacities
- `transfer ... across` – data transfer time across distance and link speed
- `downtime()` – availability target to allowed downtime conversion; duration arithmetic to spend the error budget
- Data size and other units – `KB`, `MB`, `GB`, `TB`, `ms`, `req/s`, `server`
- Markdown prose – headings, paragraphs, and inline commentary between calculations
- Template interpolation – `{{variable}}` to embed computed values in prose
## Try It

Open `testdata/examples/system-sizing.cm` to run the calculations yourself.