P-Value Calculator
Convert a z, t or chi-squared test statistic into a p-value. Pick one- or two-tailed, enter your degrees of freedom (for t and χ²), and the calculator returns the p-value to four decimal places and whether the result is significant at α = 0.05 and α = 0.01.
p-value (two-tailed): 0.0500
CDF value F(statistic): 0.975002
Significance at α = 0.05: Yes, reject H₀
Significance at α = 0.01: No, fail to reject H₀
Under the standard normal null distribution, the two-tailed p-value for a statistic of 1.96 is 0.0500 — significant at α = 0.05 (not at 0.01).
How to use this calculator
Pick the distribution your test statistic comes from. Use z when the population standard deviation is known, or when the sample is large (n ≥ 30) and the test is on a mean or proportion. Use t when σ is unknown and n is small; most regression coefficient tests and one-sample mean tests with unknown σ are t-tests. Use χ² for goodness-of-fit and independence tests, or for a variance test with a right-tailed alternative (H₁: σ² > σ₀²).
Enter the value of your test statistic exactly as your software reports it (signed for z and t, non-negative for χ²).
Enter degrees of freedom for t (usually n − 1, or the residual df from your regression) and for χ² (usually (rows − 1)(cols − 1) for a contingency table, or k − 1 for goodness-of-fit).
Choose the tail that matches your alternative hypothesis: two-tailed for H₁: μ ≠ μ₀, one-tailed (right) for H₁: μ > μ₀, one-tailed (left) for H₁: μ < μ₀. The χ² option is always right-tailed, and the calculator forces that automatically.
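The degrees-of-freedom rules above are easy to get wrong by one, so here is a minimal Python sketch of them. The function names are illustrative and are not part of the calculator; the goodness-of-fit rule assumes no parameters were estimated from the data.

```python
def df_one_sample_t(n: int) -> int:
    """Degrees of freedom for a one-sample t-test on n observations."""
    if n < 2:
        raise ValueError("need at least 2 observations")
    return n - 1


def df_contingency(rows: int, cols: int) -> int:
    """Degrees of freedom for a chi-squared test on a rows x cols table."""
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2 x 2 table")
    return (rows - 1) * (cols - 1)


def df_goodness_of_fit(k: int) -> int:
    """Degrees of freedom for a goodness-of-fit test with k categories.

    Assumes the expected proportions were fixed in advance, with no
    parameters estimated from the data.
    """
    if k < 2:
        raise ValueError("need at least 2 categories")
    return k - 1


print(df_one_sample_t(11), df_contingency(3, 4), df_goodness_of_fit(6))
```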
How the calculation works
The p-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one you got. The calculator evaluates the cumulative distribution function F of the relevant null distribution at your statistic, then takes the appropriate tail. For z it uses the standard normal CDF Φ via the Abramowitz & Stegun erf approximation. For t it uses the regularised incomplete beta function I_x(a,b) — F_T(t; ν) = 1 − ½·I_{ν/(ν+t²)}(ν/2, ½) for t ≥ 0, with the symmetric reflection for t < 0. For χ² it uses the regularised lower incomplete gamma — F(x; k) = P(k/2, x/2). Two-tailed p-values are 2·min(F, 1−F) when the distribution is symmetric (z and t). One-tailed (right) is 1 − F; one-tailed (left) is F. The continued-fraction implementations match R, Python’s scipy.stats and a TI-84 to at least four decimal places across the usual statistical range.
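The z branch and the tail rules described above can be sketched in a few lines of Python. This is an illustration, not the calculator's actual source: the function names are hypothetical, and the coefficients are the published Abramowitz & Stegun 7.1.26 values.

```python
import math

# Coefficients of Abramowitz & Stegun formula 7.1.26.
_P = 0.3275911
_A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)


def erf_as(x: float) -> float:
    """Approximate erf(x) with A&S 7.1.26 (absolute error about 1.5e-7)."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + _P * x)
    # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
    poly = 0.0
    for a in reversed(_A):
        poly = (poly + a) * t
    return sign * (1.0 - poly * math.exp(-x * x))


def normal_cdf(z: float) -> float:
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf_as(z / math.sqrt(2.0)))


def p_from_cdf(F: float, tail: str) -> float:
    """Turn a CDF value into a p-value for the chosen tail."""
    if tail == "two":      # valid for symmetric distributions (z and t)
        return 2.0 * min(F, 1.0 - F)
    if tail == "right":
        return 1.0 - F
    if tail == "left":
        return F
    raise ValueError("tail must be 'two', 'right' or 'left'")


F = normal_cdf(1.96)
print(f"F = {F:.6f}, two-tailed p = {p_from_cdf(F, 'two'):.4f}")
```

For z = 1.96 this should land very close to the sample output at the top of the page (F near 0.975002, p near 0.0500).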
Worked example
You ran a two-sample t-test in your spreadsheet and it returned t = 2.228 with df = 10. You want a two-tailed p-value. Select t, enter 2.228, df = 10, two-tailed. The calculator returns p ≈ 0.0500, because 2.228 is the textbook two-tailed 5% critical value for t(10). The result is right on the edge of significance at α = 0.05 and not significant at α = 0.01. If you had used z (ignoring df) instead, you would have read p ≈ 0.0259, about half the t-test p-value. For small df, the t-distribution’s heavier tails matter, and using z when you should have used t is a common way to overstate significance.
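To check this example by hand, the following Python sketch evaluates the t CDF through the regularised incomplete beta function, using the formula quoted in "How the calculation works". It is an assumption-laden illustration: the helper names are hypothetical, and the continued fraction follows the widely used Numerical Recipes layout, which may differ in detail from the calculator's own routine.

```python
import math

TINY = 1e-300  # guards against division by zero in the Lentz recurrences


def _beta_cf(a: float, b: float, x: float,
             tol: float = 1e-14, max_iter: int = 400) -> float:
    """Continued fraction for the incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h  # returns the best estimate even if the cap was reached


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_cdf(t: float, nu: float) -> float:
    """Student t CDF: 1 - I_{nu/(nu+t^2)}(nu/2, 1/2) / 2 for t >= 0."""
    upper = 0.5 * reg_inc_beta(nu / 2.0, 0.5, nu / (nu + t * t))
    return 1.0 - upper if t >= 0 else upper


F = t_cdf(2.228, 10)
p_t = 2.0 * min(F, 1.0 - F)
# Two-tailed normal p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
p_z = math.erfc(2.228 / math.sqrt(2.0))
print(f"t(10): p = {p_t:.4f}   z: p = {p_z:.4f}")
```

The normal p-value here uses the exact `math.erfc` rather than the A&S approximation, which makes no difference at four decimal places.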
Frequently asked questions
What does a p-value actually tell me?
It is the probability of observing a test statistic at least as extreme as yours, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and it is not the probability that your result is due to chance. A small p-value means your data are unlikely under H₀ — it does not by itself say anything about effect size, practical importance, or the probability that any specific alternative hypothesis is correct.
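One way to make this definition concrete is to simulate it. The Python sketch below draws test statistics from a standard normal null distribution and counts how often they are at least as extreme as the observed one. The function name, seed and number of draws are arbitrary choices for illustration; the calculator itself uses the analytic CDF, not simulation.

```python
import random


def simulated_two_tailed_p(observed: float,
                           n_sims: int = 200_000,
                           seed: int = 0) -> float:
    """Estimate P(|Z| >= |observed|) when H0 is true, by simulation."""
    rng = random.Random(seed)
    threshold = abs(observed)
    # Each draw is a test statistic generated under the null hypothesis.
    extreme = sum(1 for _ in range(n_sims)
                  if abs(rng.gauss(0.0, 1.0)) >= threshold)
    return extreme / n_sims


# The estimate should be close to the analytic two-tailed value of about 0.05.
print(simulated_two_tailed_p(1.96))
```

Nothing in the simulation refers to the probability that H₀ is true: H₀ is assumed throughout, which is exactly why the p-value cannot be read as that probability.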
When should I use a one-tailed vs a two-tailed test?
Use a two-tailed test whenever your alternative hypothesis is "different from" (H₁: μ ≠ μ₀). Use a one-tailed test only when your alternative is directional ("greater than" or "less than") and you specified that direction before looking at the data. Switching to one-tailed after seeing the result so you can halve the p-value is a form of p-hacking: it inflates the false-positive rate, and reviewers are likely to flag it. When in doubt, go two-tailed.
Why does my p-value not match what R or scipy gives?
For typical statistics (|z| < 8, |t| < 30, χ² up to a few hundred) the calculator agrees with R’s pnorm/pt/pchisq, Python’s scipy.stats.norm.cdf/t.cdf/chi2.cdf, and a TI-84 to at least four decimal places. If you see a mismatch, check three things. First, the tail — some software returns one-tailed p by default. Second, the sign of your statistic — for one-tailed tests this matters. Third, degrees of freedom — make sure you are passing n − 1 (or the right residual df), not n.
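When chasing a mismatch, it can help to print all three tails for both signs of the statistic and see which one your other software reported. The sketch below does this for z with Python's standard-library `statistics.NormalDist`; the helper name `all_tails` is hypothetical, and the same idea carries over to t and χ² with whichever CDF you have available.

```python
from statistics import NormalDist


def all_tails(z: float) -> dict:
    """Left, right and two-tailed p-values for a z statistic."""
    F = NormalDist().cdf(z)  # standard normal CDF
    return {"left": F, "right": 1.0 - F, "two": 2.0 * min(F, 1.0 - F)}


for z in (1.96, -1.96):
    tails = all_tails(z)
    print(f"z = {z:+.2f}: " + ", ".join(f"{k} = {v:.4f}" for k, v in tails.items()))
```

Flipping the sign swaps the left and right p-values but leaves the two-tailed value unchanged, so a reported value near 0.975 where you expected 0.025 points to a sign or tail mix-up rather than a numerical disagreement.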
Why is the χ² option locked to one-tailed (right)?
The chi-squared test is right-tailed by construction. The test statistic is a sum of squared standardised differences, so larger values always mean stronger evidence against H₀. Small values just mean the observed counts are close to expected, which is never a reason to reject H₀ in the standard goodness-of-fit or independence tests. Software that lets you pick "left-tailed χ²" is testing something else entirely (e.g. a one-sample variance test with the alternative H₁: σ² < σ₀²), which is rare enough that this calculator does not expose it.
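A small Python sketch shows why only the right tail matters. It computes the goodness-of-fit statistic as a sum of (observed − expected)² / expected and then the right-tail p-value. To keep the example short it uses the closed form of the χ² survival function that holds for even degrees of freedom, not the incomplete-gamma routine the calculator describes. The function names and the counts are made up for illustration.

```python
import math


def chi2_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_right_tail_even_df(x: float, df: int) -> float:
    """Right-tail p-value for even df only.

    Uses Q(m, x/2) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!  with m = df/2.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


expected = [20, 20, 20]  # three categories, so df = 3 - 1 = 2
for observed in ([18, 22, 20], [30, 20, 10]):
    x = chi2_statistic(observed, expected)
    p = chi2_right_tail_even_df(x, df=2)
    print(f"observed = {observed}: chi2 = {x:.2f}, p = {p:.4f}")
```

Counts close to the expected values give a small statistic and a large p-value; counts far from expected give a large statistic and a small p-value.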
Is p < 0.05 the same as "the result matters"?
No. The 0.05 threshold is a convention, not a law of nature, and with a large enough sample any nonzero effect, however trivial, will eventually reach significance. A p-value below 0.05 with a large sample can correspond to an effect too small to act on; a p-value above 0.05 with a small sample can hide an important effect. Report the effect size and a confidence interval alongside the p-value. Together they show how compatible the data are with H₀ and whether the plausible effect is big enough to care about.
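The sample-size point is easy to see numerically. The Python sketch below holds a tiny true difference fixed (0.01 standard deviations, a number chosen purely for illustration) and assumes the sample mean lands exactly on it, then computes the two-tailed z-test p-value for growing n. The function name is hypothetical.

```python
import math


def two_tailed_p_for_mean(effect: float, sigma: float, n: int) -> float:
    """Two-tailed z-test p-value when the observed mean differs from mu0 by `effect`."""
    z = effect / (sigma / math.sqrt(n))
    # 2 * (1 - Phi(|z|)) written as erfc to stay accurate for large |z|.
    return math.erfc(abs(z) / math.sqrt(2.0))


for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {two_tailed_p_for_mean(0.01, 1.0, n):.3g}")
```

The effect size never changes, yet the p-value moves from clearly non-significant to vanishingly small as n grows.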
How precise are these p-values?
The incomplete gamma and incomplete beta routines use Lentz’s continued fraction with a relative tolerance of 1e-14 and up to 400 iterations, which is enough to converge to machine precision across the usual statistical range. The Abramowitz & Stegun 7.1.26 erf used for the normal CDF has |error| ≤ 1.5 × 10⁻⁷. For p-values reported to four decimal places, none of these approximation errors are visible.
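As a rough illustration of the incomplete gamma routine, here is a Python sketch of the χ² CDF using a modified Lentz continued fraction with the tolerance and iteration cap quoted above. Two things are assumptions and are not stated by this page: the switch to a power series when x < a + 1 (a common convention) and the exact form of the recurrences, which follow the Numerical Recipes layout. All function names are hypothetical.

```python
import math

TINY = 1e-300  # guards against division by zero in the Lentz recurrences


def _log_prefactor(a: float, x: float) -> float:
    """log of x^a * exp(-x) / Gamma(a)."""
    return -x + a * math.log(x) - math.lgamma(a)


def _gamma_p_series(a: float, x: float, tol: float, max_iter: int) -> float:
    """Lower regularised incomplete gamma P(a, x) by power series."""
    term = total = 1.0 / a
    ap = a
    for _ in range(max_iter):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * tol:
            break
    return total * math.exp(_log_prefactor(a, x))


def _gamma_q_cf(a: float, x: float, tol: float, max_iter: int) -> float:
    """Upper regularised incomplete gamma Q(a, x) by modified Lentz."""
    b = x + 1.0 - a
    c = 1.0 / TINY
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = b + an / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h * math.exp(_log_prefactor(a, x))


def chi2_cdf(x: float, k: float, tol: float = 1e-14, max_iter: int = 400) -> float:
    """Chi-squared CDF: F(x; k) = P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, half = k / 2.0, x / 2.0
    if half < a + 1.0:  # assumed switch point between the two expansions
        return _gamma_p_series(a, half, tol, max_iter)
    return 1.0 - _gamma_q_cf(a, half, tol, max_iter)


print(f"right-tail p for chi2 = 3.841, df = 1: {1.0 - chi2_cdf(3.841, 1):.4f}")
```

For df = 1 the result can be checked against the identity F(x; 1) = erf(√(x/2)), and for df = 2 against 1 − exp(−x/2).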