How to Choose a Sample Size for a Survey

A survey sample size is not a number you guess at — it falls out of three decisions you have to make explicitly: how confident you want to be, how tight a margin of error you can tolerate, and how the question is likely to split. This guide walks through the Cochran formula behind every sample-size calculator, the finite-population correction that makes small populations cheaper to survey, and the practical mistakes that wreck a study before the first interview.


What "sample size" actually means in a survey

A survey sample size is the number of completed, usable responses you need to estimate something about a larger population to a chosen precision. It is not a budget question, not a sense of how many people would be impressive, and not a fixed number that every study uses. It falls out of three explicit decisions: how confident you want to be in the final estimate, how tight a margin of error you can tolerate, and how the question is likely to split among respondents. Once those three are pinned down, the sample size calculator returns a single number — the minimum sample that delivers the requested precision under standard assumptions.

That number is independent of the population's absolute size for any practical purpose above a few hundred thousand. Polling the United States, with roughly 330 million people, needs the same sample as polling the United Kingdom, with roughly 67 million: 385 respondents at the canonical 95% confidence / ±5% margin of error. The intuition that bigger populations need bigger samples is wrong, and it is one of the most persistent misconceptions in applied statistics. What does change the requirement is a tighter margin or a higher confidence level, either of which can multiply the sample quickly.

The Cochran formula behind the calculator

The closed-form sample size for estimating a population proportion comes from inverting the normal-approximation confidence interval for a proportion. If you want a sample large enough that the observed proportion p̂ falls within ±E of the true proportion p with probability 1 − α, the required sample size is:

n₀ = z² · p · (1 − p) / E²

Here z is the two-sided standard-normal critical value for the chosen confidence level (1.96 for 95%, 2.576 for 99%, 1.645 for 90%), p is the expected response proportion expressed as a decimal between 0 and 1, and E is the desired margin of error as a proportion (0.05 for ±5 percentage points). The formula is published in Cochran's 1977 Sampling Techniques §4.2 and in the NIST/SEMATECH e-Handbook of Statistical Methods §7.2.4.2, and it is the basis of most textbook treatments and online sample size calculators for proportion estimation.

Three properties of the formula are worth internalising. First, n₀ scales with the square of the z-score, so extra confidence is not free: moving from 95% to 99% multiplies the sample by (2.576 / 1.96)² ≈ 1.73. Second, n₀ scales with 1 / E², so halving the margin quadruples the requirement; this is why opinion polls live with ±3 to ±5 percentage points rather than ±1. Third, the variance term p · (1 − p) is maximised at p = 0.5 and equals 0.25 there; it falls toward zero at the extremes, which makes 50% the most conservative, and therefore safest, default when no prior estimate is available.
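The formula is short enough to sketch directly. The snippet below is a minimal illustration, not the code behind any particular calculator; the function names z_score, cochran_n0 and required_completes are made up for this example. It takes z from Python's standard-library NormalDist rather than the rounded 1.96, so the unrounded result at the default settings is about 384.15 rather than 384.16, which rounds up to the same 385.

```python
from math import ceil
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided standard-normal critical value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def cochran_n0(confidence: float = 0.95, margin: float = 0.05, p: float = 0.5) -> float:
    """Unrounded infinite-population sample size: z^2 * p * (1 - p) / E^2."""
    z = z_score(confidence)
    return z * z * p * (1 - p) / margin ** 2


def required_completes(confidence: float = 0.95, margin: float = 0.05, p: float = 0.5) -> int:
    """Round up, because a fraction of a respondent cannot be interviewed."""
    return ceil(cochran_n0(confidence, margin, p))


# Vary one input at a time to see the z^2 and 1 / E^2 scaling.
for conf, margin in [(0.95, 0.05), (0.99, 0.05), (0.95, 0.025), (0.95, 0.01)]:
    print(f"{conf:.0%} confidence, margin {margin:.1%}: {required_completes(conf, margin)}")
```

Under these assumptions the four settings come out at 385, 664, 1,537 and 9,604 completed responses.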

The finite-population correction (FPC)

The Cochran formula above assumes an infinite or unknown population, which is a reasonable approximation when the population is large relative to the sample. When the population N is finite and the unadjusted sample is a meaningful fraction of it, the formula understates the information each respondent provides — picking 80 people out of a 100-person team tells you almost everything about that team, far more than picking 80 strangers out of a city. The finite-population correction reflects that with a simple adjustment:

n = n₀ / (1 + (n₀ − 1) / N)

At 95% confidence and a ±5% margin with p = 0.5, the unadjusted Cochran figure is 385. Applying the correction to the unrounded 384.16 and rounding up, the requirement is 278 for a population of 1,000, 218 for 500, 132 for 200 and 80 for 100. For populations above roughly 100,000 the correction trims the sample by a respondent or two and is normally ignored. The sample size calculator applies the FPC automatically whenever a population size is entered; leave the population blank or set it to zero to get the unadjusted infinite-population figure.
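As a rough sketch of the correction, assuming it is applied to the unrounded n₀ and the result is then rounded up: the helper name fpc_adjust is hypothetical, and treating a population of zero or less as "infinite" simply mirrors the calculator behaviour described above rather than any published convention.

```python
from math import ceil


def fpc_adjust(n0: float, population: int) -> int:
    """Finite-population correction: n = n0 / (1 + (n0 - 1) / N), rounded up.

    A population of zero or less is treated as unknown or infinite
    (an assumption made for this sketch), so n0 is returned unadjusted.
    """
    if population <= 0:
        return ceil(n0)
    return ceil(n0 / (1 + (n0 - 1) / population))


N0 = 384.16  # unrounded Cochran figure at 95% / ±5% / p = 0.5, using z = 1.96

for population in (100, 200, 500, 1_000, 10_000, 100_000):
    print(f"N = {population:>7,}: {fpc_adjust(N0, population)} completes")
```

With ceiling rounding, N = 500 comes out at 218 (217.49 before rounding), and N = 100,000 at 383, two fewer than the unadjusted 385.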

Worked example: a national opinion poll

Suppose a polling team is fielding a survey on a referendum question and wants ±5 percentage points of error at 95% confidence. With no prior estimate of how the question will split, the team uses p = 0.5 and an infinite population (a national electorate is effectively unbounded for sampling purposes). Plugging in:

n₀ = 1.96² · 0.5 · 0.5 / 0.05² = 3.8416 · 0.25 / 0.0025 = 384.16 → 385 respondents (ceiling-rounded)

With these defaults, 385 is the figure that sample-size calculators and polling methodology notes commonly quote as the minimum for a national survey. The same calculation at 99% confidence with the same ±5% margin gives 2.576² · 0.25 / 0.0025 ≈ 664, almost three-quarters again as many interviews for the higher confidence level. Tightening to ±2.5% at 95% takes the requirement to 1,537. Going all the way to ±1% needs 9,604, twenty-five times the original sample for one-fifth the margin.

Now consider a different scenario: a private-club committee polling 1,000 paying members on a proposed fee change, again at 95% / ±5% / p = 0.5. The unadjusted figure is the same 385, but the finite-population correction pulls it down sharply:

n = 384.16 / (1 + 383.16 / 1000) = 384.16 / 1.38316 = 277.74 → 278 respondents

That 28% saving is the FPC at work. The committee polls 278 members rather than 385, runs the survey for two weeks instead of three, and arrives at the same precision. The same calculation for a 100-person engineering team drops to 80 respondents — at which point the survey is closer to a census than a sample.

Factors that change the required sample

Confidence level

95% is the standard for almost every published survey. It corresponds to z = 1.96 and is what most methodology footnotes assume unless stated otherwise. 90% (z = 1.645) shrinks the sample to about 70% of the 95% figure and is fine for exploratory or internal work. 99% (z = 2.576) needs about 1.73 times the 95% sample and is appropriate for clinical trials, pharmaceutical work and high-stakes industrial decisions. 99.9% (z ≈ 3.291) requires about 2.8 times the 95% sample and is rare outside regulated industries.
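Because n₀ is proportional to z², the cost of a confidence level relative to the 95% baseline is just the squared ratio of the two z-scores. A small sketch, with the hypothetical helper name sample_multiplier, makes the comparison concrete:

```python
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided standard-normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_multiplier(confidence: float, baseline: float = 0.95) -> float:
    """How many times larger the sample is than at the baseline level."""
    return (z_score(confidence) / z_score(baseline)) ** 2


for level in (0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%}: z = {z_score(level):.3f}, "
          f"sample = {sample_multiplier(level):.2f} x the 95% figure")
```

The multipliers come out at roughly 0.70 for 90%, 1.73 for 99% and 2.82 for 99.9%.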

Margin of error

The most expensive lever. Because the formula scales with 1 / E², the margin of error dominates the sample size. Pollsters live with ±3 to ±5 percentage points because anything tighter quickly becomes unaffordable. Pew Research Center, for example, reports a typical margin of ±3 to ±4 on its national surveys of roughly 1,000 to 1,500 adults. Cutting that in half would multiply the sample by four — and the cost of the survey along with it.

Expected proportion

Use 50% unless you have a credible prior. The variance term p · (1 − p) is symmetric around 0.5, so an expected split of 60/40 produces the same sample as 40/60, and both are slightly smaller than the 50/50 conservative figure. The saving only becomes substantial at the extremes: at p = 0.9 (or 0.1), the required sample is about 36% of the p = 0.5 figure. For new surveys, 50% is the safe default; for repeat waves of established surveys, plugging in the previous wave's result is a legitimate way to tighten the sample.
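The size of the saving is easy to check, since everything else in the formula cancels and only the ratio of variance terms remains. The function name relative_sample below is illustrative only:

```python
def relative_sample(p: float) -> float:
    """Required sample at proportion p as a fraction of the p = 0.5 sample."""
    return p * (1 - p) / 0.25


for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    # p and 1 - p give the same value, so 0.6 also covers 0.4, and so on.
    print(f"p = {p:.1f}: {relative_sample(p):.0%} of the p = 0.5 sample")
```

A 60/40 split needs 96% of the conservative sample, 70/30 needs 84%, and 90/10 needs 36%.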

Population size

Only matters when the population is small relative to the unadjusted sample, roughly when n₀ exceeds 5% of N. At 95% / ±5%, a population of 10,000 trims the requirement only from 385 to 370, and the saving shrinks further as N grows. For populations of 1,000 or fewer the correction is substantial: about 28% at N = 1,000 and about 79% at N = 100. The sample size calculator handles this automatically.

Response rate

Strictly speaking, response rate does not change the required sample, but it does change the gross number of invitations. The Cochran formula gives the number of completed, usable responses. If a phone or web survey expects a 20% completion rate, the gross contact list must be five times the calculated sample: a target of 385 completes needs around 1,925 contacts. Response rate also raises the spectre of nonresponse bias: people who opt out of surveys often differ systematically from those who agree, which is why pollsters weight achieved samples back to known demographic benchmarks.

How to choose your inputs

  • Start with the decision. What action will the survey result drive? Whatever it is, work backwards from the precision that action requires. A budget decision that swings on a 5-point shift in support needs a margin meaningfully tighter than 5 points — otherwise the survey cannot resolve the question.
  • Default to 95% / ±5% / p = 0.5. This is the canonical setting for a reason. It produces a sample of 385 for an open population and that is rarely overkill for a one-shot survey. Move only when you have a clear reason.
  • Enter the population if it is small. An internal employee survey, a membership poll, a single school's parents — anywhere the population is in the hundreds or low thousands, the finite-population correction is worth claiming.
  • Inflate for response rate. Compute the required completes with the sample size calculator, then divide by your expected response rate to get the gross contact list.
  • Build in a buffer. Real fieldwork loses respondents to screening failures, partial completes and data-quality screens. A buffer of 10–20% above the minimum sample protects against the cleaned dataset falling short.
  • Consider stratification. If you need to report results separately for subgroups (men vs women, age bands, regions), the sample size calculation applies to each subgroup independently. Reporting 95% / ±5% on the overall result does not give you 95% / ±5% on a quarter-sized subgroup — that subgroup needs its own minimum.
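The response-rate and buffer steps in the list above amount to two lines of arithmetic. The sketch below is one way to chain them, assuming the buffer is applied to the completes before dividing by the response rate; the function name gross_contacts and that ordering are choices made for this example, not a fixed convention.

```python
from math import ceil


def gross_contacts(completes: int, response_rate: float, buffer: float = 0.0) -> int:
    """Invitations needed to end up with the required completed responses.

    completes:     minimum completed, usable responses from the calculator
    response_rate: expected share of invitations that yield a complete (0 to 1)
    buffer:        extra share of completes to allow for cleaning losses
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in the interval (0, 1]")
    target_completes = ceil(completes * (1 + buffer))
    # round() guards against floating-point noise pushing ceil() up by one.
    return ceil(round(target_completes / response_rate, 9))


print(gross_contacts(385, 0.20))               # no buffer
print(gross_contacts(385, 0.20, buffer=0.15))  # 15% cleaning buffer
```

At a 20% response rate, 385 completes need 1,925 invitations; adding a 15% buffer raises the target to 443 completes and the contact list to 2,215.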

Common mistakes

Treating the result as the gross contact list. The calculator returns completed responses. If response rate is anything below 100%, you need to contact more people. Treating the calculator output as the number of invitations sent virtually guarantees the final sample misses the target precision.

Forgetting that subgroups need their own sample. A national survey of 1,000 with a ±3% margin sounds tight, but a regional subgroup of 100 has a ±10% margin — much weaker. If the analysis plan includes subgroup comparisons, plan the sample around the smallest reportable cell.
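Running the formula in reverse gives the margin a given number of completes can support, which is the quickest way to sanity-check a subgroup. This is a minimal sketch for a simple random sample with no finite-population correction and no design effect; margin_of_error is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


for label, n in [("full sample", 1000), ("regional subgroup", 100)]:
    print(f"{label:>17} (n = {n}): ±{margin_of_error(n):.1%}")
```

The full sample of 1,000 supports about ±3.1 percentage points, while the subgroup of 100 supports only about ±9.8.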

Confusing this with statistical power. The Cochran formula sizes a confidence interval, not a hypothesis test. If the study is built around testing a difference between two groups or detecting an effect of a specified size, power-analysis tools like G*Power, R's pwr package or Python's statsmodels.stats.power are the right machinery, not a proportion-based sample-size calculator.

Picking p < 0.5 to shrink the sample. Plugging in p = 0.3 because the sample is cheaper is wishful thinking unless that figure is supported by a pilot or previous wave. If the true proportion turns out closer to 0.5, the achieved margin will be wider than promised. The conservative 50% default protects against that.
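To put a number on that risk, the sketch below sizes a survey at an optimistic p = 0.3 and then asks what margin the same sample delivers if the true split is 50/50. The names planned_sample and achieved_margin are illustrative, and the scenario is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def planned_sample(p_assumed: float, margin: float = 0.05) -> int:
    """Cochran sample size for the proportion assumed at planning time."""
    return ceil(Z95 ** 2 * p_assumed * (1 - p_assumed) / margin ** 2)


def achieved_margin(n: int, p_true: float) -> float:
    """Margin actually delivered by n completes if the true proportion is p_true."""
    return Z95 * sqrt(p_true * (1 - p_true) / n)


n = planned_sample(0.3)
print(f"planned at p = 0.3: n = {n}")
print(f"margin if p really is 0.3: ±{achieved_margin(n, 0.3):.2%}")
print(f"margin if p really is 0.5: ±{achieved_margin(n, 0.5):.2%}")
```

Planning at p = 0.3 gives 323 completes instead of 385; if the true proportion is 0.5, the achieved margin widens to roughly ±5.5 percentage points.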

When to seek a statistician

For straightforward single-question surveys with one population and one reported proportion, the sample size calculator on this page is enough. Get expert input when the design gets more complex: multi-stage cluster samples, stratified sampling with different precision targets per stratum, longitudinal surveys with attrition, surveys feeding into causal inference, or anything where the analysis plan includes regression coefficients rather than simple proportions. Clinical trials, regulated quality control and academic studies that will face peer review almost always benefit from a power analysis run by someone with formal training in survey methodology.

Frequently asked questions

Answers to the most common questions about sample size, the Cochran formula and the finite-population correction are listed in the FAQ block on this page. For other statistical work, see the related standard deviation calculator and average calculator.


Why does the 'magic number' for a national poll keep coming out as 385?

Because almost every national poll uses the same three defaults: 95% confidence, a ±5 percentage-point margin of error, and the conservative assumption that the question could split anywhere, including 50/50. Plug those into the Cochran formula n₀ = z² · p · (1 − p) / E² and you get 1.96² · 0.5 · 0.5 / 0.05² = 384.16, which rounds up to 385. Some sources quote 384, but rounding the unadjusted figure up gives 385 and is the more defensible choice for a minimum sample. The number is independent of country population: polling the roughly 330 million people of the United States needs the same sample as polling the roughly 67 million people of the United Kingdom, because the precision of the estimate depends on how many people you ask, not on how many exist.

How do I pick the expected proportion p when I have no idea what it will be?

Leave it at 50%. The variance term p · (1 − p) is maximised at p = 0.5 — that value of 0.25 is the largest the term can take, and it shrinks toward zero at the extremes of 0 or 1. Using p = 0.5 therefore gives the largest, most conservative sample size and protects you against any actual response distribution. If you genuinely have a strong prior — perhaps a pilot study or a previous wave of the same survey — plugging in the real p produces a smaller, cheaper sample, but the saving is modest until p is well away from 0.5. At p = 0.7 the required sample is 84% of the p = 0.5 figure; at p = 0.9 it drops to 36%. For a brand-new survey, 50% is the safe default.

When does the finite-population correction actually matter?

Apply it when the unadjusted sample n₀ would be a meaningful fraction of the population N; a common rule of thumb is when n₀ exceeds 5% of N. The correction is n = n₀ / (1 + (n₀ − 1) / N), and it shrinks the requirement because each respondent carries more information when the universe is bounded. For a 1,000-person population at 95% / ±5%, the requirement falls from 385 to 278. For a 100-person team it falls to 80. For a population of one million or more, the correction changes the requirement by less than one respondent and can safely be ignored. The sample-size calculator applies it automatically when a population is entered.

Is this the same as statistical power for a hypothesis test?

No. Sample size for a confidence interval — the problem this calculator solves — asks how many respondents you need to estimate a single proportion within a given margin of error. Power analysis for a hypothesis test asks how large a sample you need to detect a specified effect size at chosen significance (α) and power (1 − β) levels. The two are related, but the inputs and formulas differ. For two-sample comparisons of means, two-sample proportion tests, ANOVA and regression, tools like G*Power, the pwr package in R, or statsmodels.stats.power in Python are appropriate. Confusing the two is a classic mistake — sample-size calculators built for proportions will under-power a hypothesis test that needs to detect a small effect.
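To see how far apart the two problems can be, here is a rough sketch of a power-style calculation for comparing two proportions. It uses the common textbook normal approximation with unpooled variances, n per group = (z for α/2 + z for power)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)², which is a simplification and not necessarily what G*Power, pwr or statsmodels compute internally. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided two-proportion test."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ; there is no effect to detect")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Detecting a 5-point difference (50% vs 55%) at alpha = 0.05 and 80% power.
print(n_per_group_two_proportions(0.50, 0.55))
```

Under this approximation, detecting a 5-point difference takes on the order of 1,500 to 1,600 respondents in each group, several times the 385 needed to estimate a single proportion to within ±5 points.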

Does my response rate change the sample size I need?

It changes how many invitations you need to send, not how many completed responses you need. The Cochran formula gives the number of completed, usable responses. If you expect a 25% response rate, you need to send the calculated sample divided by 0.25 — for a target of 385 completes that means contacting 1,540 people. Response rate also matters because nonresponse bias is real: people who decline a survey often differ systematically from people who agree. Online and phone surveys frequently see response rates in the single digits, so inflate the gross contact list aggressively and consider weighting the achieved sample to known population benchmarks.

Why does halving the margin of error quadruple the sample size?

Because the formula scales with 1 / E². The required sample is inversely proportional to the square of the margin, so a tighter interval is expensive: ±5% at 95% confidence needs 385 respondents; ±2.5% needs 1,537; ±1% needs 9,604. The same effect runs in reverse — a survey of 100 respondents at p = 0.5 has a margin of about ±10 percentage points at 95% confidence, which is rarely tight enough to justify any decisions. The sample-size calculator makes this explicit: nudge the margin slider and watch the required sample jump non-linearly. Pollsters live with ±3 to ±5 because anything tighter is rapidly unaffordable.

What confidence level should I use — 90%, 95% or 99%?

95% is the usual default for opinion polls, market research, social-science studies and most quality-control work. It corresponds to a z-score of 1.96 and is what most published methodology pages assume unless stated otherwise. 90% is acceptable for exploratory or low-stakes work and gives roughly 70% of the 95% sample. 99% is used in clinical trials, pharmaceutical work and high-stakes industrial quality control where the cost of a wrong conclusion is large; it needs about 1.73 times the 95% sample. 99.9% is rare outside of regulated industries: it requires about 2.8 times the 95% sample for a marginal gain in certainty.

How do I calculate sample size for a mean rather than a proportion?

For a mean μ, the formula becomes n = (z · σ / E)², where σ is the population standard deviation in the same units as the margin of error E. You need a prior estimate of σ from a pilot study, published research or comparable data. The z-score and the finite-population correction work identically. The calculator on this site is built for proportions because most surveys ask categorical questions ('do you support X?', 'have you done Y in the last month?'); for studies measuring continuous outcomes like income, blood pressure or response time, switch to the mean formula and remember that σ dominates the answer.
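A minimal sketch of the mean version, assuming σ comes from a pilot or prior study and applying the same finite-population correction when a population is supplied. The function name and the example figures (σ = 15, E = 2) are invented for illustration.

```python
from math import ceil
from statistics import NormalDist
from typing import Optional


def sample_size_for_mean(sigma: float, margin: float, confidence: float = 0.95,
                         population: Optional[int] = None) -> int:
    """n = (z * sigma / E)^2, with an optional finite-population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * sigma / margin) ** 2
    if population:
        # Same correction as for proportions: n0 / (1 + (n0 - 1) / N).
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


# Hypothetical study: sigma of 15 units, margin of ±2 units, 95% confidence.
print(sample_size_for_mean(sigma=15, margin=2))
print(sample_size_for_mean(sigma=15, margin=2, population=1000))
```

With these made-up inputs the requirement is 217 for an open population and 178 for a population of 1,000. Doubling σ to 30 quadruples the unrounded requirement, which is why a poor estimate of σ matters more than any other input.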
