Question 1

What does a p-value actually tell me?

Accepted Answer

It is the probability of observing a test statistic at least as extreme as yours, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and it is not the probability that your result is due to chance. A small p-value means your data are unlikely under H₀ — it does not by itself say anything about effect size, practical importance, or the probability that any specific alternative hypothesis is correct.
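
To make the definition concrete, here is a minimal sketch that assumes the simplest case: a z-test whose statistic is standard normal when H₀ is true. It computes the two-tailed p-value in closed form, then estimates the same probability by simulating many statistics under H₀ and counting how often they are at least as extreme as the observed one. The function names are illustrative and are not part of the calculator.

```python
import math
import random


def two_tailed_p_normal(z: float) -> float:
    """Two-tailed p-value P(|Z| >= |z|) for Z ~ N(0, 1), i.e. assuming H0.

    erfc(|z| / sqrt(2)) is that tail probability in closed form.
    """
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_p(z: float, n_draws: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo version of the same definition: draw statistics under H0
    and return the fraction that are at least as extreme as z."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if abs(rng.gauss(0.0, 1.0)) >= abs(z))
    return hits / n_draws


z_obs = 2.0
print(two_tailed_p_normal(z_obs))  # about 0.0455
print(simulated_p(z_obs))          # close to the exact value
```

Nothing in either calculation refers to an alternative hypothesis or to the size of the effect, which is why a p-value alone cannot speak to them.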

Question 2

When should I use a one-tailed vs a two-tailed test?

Accepted Answer

Use a two-tailed test whenever your alternative hypothesis is "different from" (H₁: μ ≠ μ₀). Use a one-tailed test only when your alternative is directional ("greater than", H₁: μ > μ₀, or "less than", H₁: μ < μ₀) and you specified that direction before looking at the data. Switching to a one-tailed test after seeing the result, so that you can halve the p-value, is a form of p-hacking: for a symmetric statistic such as z or t it doubles the false-positive rate (from 5% to 10% at α = 0.05), and reviewers are likely to flag it. When in doubt, use a two-tailed test.
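
The inflation is easy to check by simulation. The sketch below assumes a z-test and generates every statistic with H₀ true. One analyst always uses the two-tailed p; the other picks whichever one-tailed p matches the sign of the observed z. The helper names, the number of simulations, and the fixed seed are illustrative choices.

```python
import math
import random


def upper_p(z: float) -> float:
    """One-tailed p for H1: mu > mu0, i.e. P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def lower_p(z: float) -> float:
    """One-tailed p for H1: mu < mu0, i.e. P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def two_p(z: float) -> float:
    """Two-tailed p for H1: mu != mu0, i.e. P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_sims: int = 100_000, alpha: float = 0.05, seed: int = 7):
    """Return (honest two-tailed rate, post hoc one-tailed rate) under H0."""
    rng = random.Random(seed)
    honest = posthoc = 0
    for _ in range(n_sims):
        z = rng.gauss(0.0, 1.0)  # H0 is true in every simulated study
        if two_p(z) < alpha:
            honest += 1
        # Post hoc choice: use the tail that matches the observed sign.
        if (upper_p(z) if z > 0 else lower_p(z)) < alpha:
            posthoc += 1
    return honest / n_sims, posthoc / n_sims


print(false_positive_rates())
```

With these settings the honest rate comes out near 0.05 and the post hoc rate near 0.10, because choosing the tail after the fact rejects whenever |z| > 1.645 instead of |z| > 1.96.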

Question 3

Why does my p-value not match what R or scipy gives?

Accepted Answer

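For typical statistics (|z| < 8, |t| < 30, χ² up to a few hundred) the calculator agrees with R's pnorm/pt/pchisq, Python's scipy.stats.norm.cdf/t.cdf/chi2.cdf, and a TI-84 to at least four decimal places. If you see a mismatch, check three things. First, the tail: some software returns a one-tailed p by default, and the CDF functions listed above return the lower-tail probability P(T ≤ t) rather than a p-value (use lower.tail = FALSE in R or the .sf method in scipy for the upper tail, and double the smaller tail for a two-tailed test). Second, the sign of your statistic: for one-tailed tests this matters. Third, the degrees of freedom: make sure you are passing n − 1 for a one-sample t-test (or the right residual df for other models), not n.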
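
All three checks can be reproduced in a small sketch. To stay within Python's standard library it uses the closed-form Student's t CDFs for 2 and 3 degrees of freedom instead of a general routine; `t_cdf_df2`, `t_cdf_df3` and `p_from_cdf` are hypothetical helper names. Like R's pt and scipy's t.cdf, the two CDF helpers return the lower-tail probability, so the tail has to be chosen explicitly.

```python
import math


def t_cdf_df2(t: float) -> float:
    """Lower-tail CDF P(T <= t) of Student's t with 2 df (closed form)."""
    return 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))


def t_cdf_df3(t: float) -> float:
    """Lower-tail CDF P(T <= t) of Student's t with 3 df (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def p_from_cdf(cdf_at_stat: float, tail: str) -> float:
    """Convert a lower-tail CDF value into a p-value for the chosen tail."""
    if tail == "left":
        return cdf_at_stat
    if tail == "right":
        return 1.0 - cdf_at_stat
    if tail == "two":
        return 2.0 * min(cdf_at_stat, 1.0 - cdf_at_stat)
    raise ValueError("tail must be 'left', 'right' or 'two'")


t_stat = 4.303  # one-sample t statistic computed from n = 3 observations
print(p_from_cdf(t_cdf_df2(t_stat), "two"))     # df = n - 1 = 2: about 0.050
print(p_from_cdf(t_cdf_df3(t_stat), "two"))     # df = n = 3 (wrong): about 0.023
print(p_from_cdf(t_cdf_df2(t_stat), "right"))   # one-tailed: about 0.025
print(p_from_cdf(t_cdf_df2(-t_stat), "right"))  # sign flipped: about 0.975
```

Passing n instead of n − 1 more than halves the p-value here, which is the kind of discrepancy the third check catches.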

Question 4

Why is the χ² option locked to one-tailed (right)?

Accepted Answer

The chi-squared test is right-tailed by construction. The test statistic (Pearson's Σ(O − E)²/E) is a sum of squared standardised differences between observed and expected counts, so larger values always mean stronger evidence against H₀, and small values just mean the observed counts are close to expected, which is never a reason to reject H₀ in the standard goodness-of-fit or independence tests. Software that lets you pick "left-tailed χ²" is testing something else entirely (for example, a one-sided test that a population variance is smaller than a hypothesised value), which is rare enough that this calculator does not expose it.
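
A short sketch of why only the right tail matters, assuming a Pearson goodness-of-fit test with three categories. That gives 2 degrees of freedom, where the right-tail probability has the exact closed form exp(−x/2); the shortcut does not hold for other df. The counts are made up for illustration and the helper names are hypothetical.

```python
import math


def pearson_chi2(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_df2(x):
    """Right-tail probability P(X >= x) for chi-squared with exactly 2 df."""
    return math.exp(-x / 2.0)


expected = [40, 40, 20]                       # 3 categories, so df = 3 - 1 = 2
far = pearson_chi2([50, 30, 20], expected)    # 2.5 + 2.5 + 0 = 5.0
close = pearson_chi2([41, 39, 20], expected)  # counts near expected: 0.05
print(far, chi2_sf_df2(far))      # 5.0, p about 0.082
print(close, chi2_sf_df2(close))  # 0.05, p about 0.975
```

Every term is non-negative, so the statistic can only grow as the counts move away from expected; a small value simply pushes the p-value toward 1.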

Question 5

Is p < 0.05 the same as "the result matters"?

Accepted Answer

No. The 0.05 threshold is a convention, not a law of nature, and with a large enough sample any nonzero effect, however trivial, will eventually reach significance. A p-value below 0.05 with a large sample can correspond to an effect too small to act on; a p-value above 0.05 with a small sample can hide an important effect. Report the effect size and a confidence interval alongside the p-value: together they show how large the effect is estimated to be, how precisely it has been estimated, and whether it is plausibly big enough to care about.
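
A minimal sketch of reporting the three numbers together, assuming the simplest setting: a one-sample z-test with a known population standard deviation, Cohen's d as the effect size, and a normal-theory 95% confidence interval for the mean difference. The inputs are invented to mirror the two situations described above, and `z_test_summary` is a hypothetical helper.

```python
import math


def z_test_summary(mean_diff: float, sd: float, n: int, z_crit: float = 1.959964):
    """One-sample z-test with known SD.

    mean_diff is the sample mean minus the H0 value mu0.
    Returns (two-tailed p, Cohen's d, 95% CI for the mean difference).
    """
    se = sd / math.sqrt(n)
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
    d = mean_diff / sd                    # standardised effect size
    ci = (mean_diff - z_crit * se, mean_diff + z_crit * se)
    return p, d, ci


# Tiny effect, huge sample: p is astronomically small, yet d = 0.01.
print(z_test_summary(mean_diff=0.01, sd=1.0, n=1_000_000))
# Moderate effect, small sample: p is above 0.05, but the CI is wide.
print(z_test_summary(mean_diff=0.5, sd=1.0, n=10))
```

In the first case the interval (roughly 0.008 to 0.012) excludes zero but contains only negligible effects; in the second it runs from slightly below zero to above 1, so the data cannot rule out either no effect or a large one.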

Question 6

How precise are these p-values?

Accepted Answer

The incomplete gamma and incomplete beta routines use Lentz’s continued fraction with a relative tolerance of 1e-14 and up to 400 iterations, which is enough to converge to machine precision across the usual statistical range. The Abramowitz & Stegun 7.1.26 erf used for the normal CDF has |error| ≤ 1.5 × 10⁻⁷. For p-values reported to four decimal places, none of these approximation errors are visible.
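
Both ingredients are standard, so they can be sketched in plain Python. This is an illustration written to the tolerance and iteration cap quoted above, following the common textbook arrangement (a power series for P(a, x) when x < a + 1, Lentz's continued fraction for Q(a, x) otherwise, as in Numerical Recipes). It is not the calculator's own source, it omits the incomplete beta routine, and the function names are hypothetical. Python's math.erf and math.erfc serve as reference values.

```python
import math

EPS = 1e-14     # relative tolerance quoted above
ITMAX = 400     # iteration cap quoted above
FPMIN = 1e-300  # keeps Lentz's method away from division by zero


def erf_as7126(x: float) -> float:
    """Abramowitz & Stegun 7.1.26 approximation to erf, |error| <= 1.5e-7."""
    sign = 1.0 if x >= 0 else -1.0
    x = abs(x)
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = t * (0.254829592
                + t * (-0.284496736
                + t * (1.421413741
                + t * (-1.453152027
                + t * 1.061405429))))
    return sign * (1.0 - poly * math.exp(-x * x))


def upper_gamma_q(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) for a > 0, x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series for P(a, x), then Q = 1 - P.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(ITMAX):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * EPS:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with modified Lentz.
    b = x + 1.0 - a
    c = 1.0 / FPMIN
    d = 1.0 / b
    h = d
    for i in range(1, ITMAX + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < FPMIN:
            d = FPMIN
        c = b + an / c
        if abs(c) < FPMIN:
            c = FPMIN
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return math.exp(log_prefactor) * h


def chi2_right_tail(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-squared(df), computed as Q(df/2, x/2)."""
    return upper_gamma_q(df / 2.0, x / 2.0)


print(chi2_right_tail(3.84, 1))  # close to 0.05
print(abs(erf_as7126(1.0) - math.erf(1.0)))  # well under 1.5e-7
```

Switching between the series and the continued fraction at x = a + 1 keeps each method in the region where it converges quickly, so the 400-iteration cap is rarely approached.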

P-Value Calculator