P-value Calculator

Calculate p-values for statistical hypothesis testing

Enter your calculated Z value

Common significance levels (α): 0.05, 0.01

P-value

0.0500

Reject null hypothesis

Test Statistic

1.960

Z-value

Statistical Interpretation

• P-value: 0.0500

• Significance level (α): 0.05

• Confidence level: 95%

• Conclusion: Statistically significant - Reject the null hypothesis
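The readout above follows a simple decision rule: compare the p-value to α and report the matching confidence level, 1 − α. A minimal sketch of that rule is below. The function name `interpret` and the wording of the non-significant branch are illustrative assumptions, not taken from the calculator itself, and the strict `<` at the boundary is one common convention.

```python
def interpret(p_value, alpha=0.05):
    """Summarise a test result in the same shape as the readout above (hypothetical helper)."""
    # Assumption: reject only when p is strictly below alpha.
    reject = p_value < alpha
    return {
        "p_value": round(p_value, 4),
        "alpha": alpha,
        "confidence_level_pct": round((1 - alpha) * 100, 2),
        "conclusion": (
            "Statistically significant - Reject the null hypothesis"
            if reject
            else "Not statistically significant - Fail to reject the null hypothesis"
        ),
    }


# z = 1.960 gives a two-tailed p of about 0.049996, which displays as 0.0500
summary = interpret(0.049996, alpha=0.05)
print(summary)
```

Because the displayed p-value is rounded, a value shown as 0.0500 can still sit just below α, which is why the example rejects the null.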

How it works

A p-value is the probability of seeing a result at least as extreme as your data if the null hypothesis (no effect) were true. A small p-value means the data would be unlikely under 'no effect,' which is evidence against the null. It's compared to a significance threshold, usually 0.05.
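To make "at least as extreme" concrete, here is a small sketch using a hypothetical coin-flip experiment (not part of the calculator): 60 heads in 100 flips, with the null hypothesis that the coin is fair. It adds up the exact probability of every outcome that is as far from 50 heads as the observed one, or farther. The function name is an assumption chosen for illustration.

```python
from math import comb


def two_sided_binomial_p(heads, flips):
    """Exact P(outcome at least as far from flips/2 as observed | fair coin)."""
    observed_gap = abs(heads - flips / 2)
    # Count the equally likely flip sequences whose head count is this extreme.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_gap
    )
    return extreme / 2 ** flips


p = two_sided_binomial_p(60, 100)
print(f"p = {p:.4f}")
```

In this example the p-value lands slightly above 0.05, so 60 heads in 100 flips would not quite count as significant at the usual threshold.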

P-value and significance

Compute a test statistic, then p = P(statistic at least this extreme | null is true)

  • test statistic: z, t, or chi-square computed from the data
  • α: significance level, often 0.05
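For a z statistic, the formula above reduces to a tail area of the standard normal distribution. The sketch below is one way to compute it with Python's standard library. The function name `z_p_value` and the `tails` argument are illustrative assumptions, and the t and chi-square cases would need their own distributions.

```python
import math


def z_p_value(z, tails="two"):
    """P-value for a z statistic under a standard normal null distribution."""
    upper = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    if tails == "two":
        # Area in both tails beyond |z|
        return math.erfc(abs(z) / math.sqrt(2))
    if tails == "upper":
        return upper
    if tails == "lower":
        return 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    raise ValueError("tails must be 'two', 'upper', or 'lower'")


print(round(z_p_value(1.96), 4))
```

Using `math.erfc` rather than `1 - cdf` keeps precision for very large |z|, where the tail area is tiny.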

Worked example

  • A test yields a z-score of 2.1
  • Two-tailed test
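  1. Find the tail probability beyond 2.1: P(Z > 2.1) ≈ 0.0179 in each tail
  2. Double it for the two-tailed test: 2 × 0.0179 ≈ 0.036

p ≈ 0.036, which is below 0.05, so the result is statistically significant.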
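The worked example can be checked in a few lines. This sketch assumes the z-score follows a standard normal distribution under the null and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

z = 2.1
one_tail = 1 - NormalDist().cdf(z)  # area above +2.1
p_two_tailed = 2 * one_tail         # add the mirror-image area below -2.1

print(round(one_tail, 4), round(p_two_tailed, 3))
```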

Good to know

  • p < 0.05 is a common threshold, but it's a convention, not a law of nature.
  • A small p-value means the data would be unlikely if chance alone were at work; it does not measure the size or importance of the effect.
  • A p-value is not the probability that a hypothesis is true; it is calculated assuming the null is true and asks how surprising the data would be.

Frequently Asked Questions

What is a p-value?

A p-value is the probability of observing results at least as extreme as yours if the null hypothesis were true. A small p-value means your data would be surprising under the null — it is not the probability that the null hypothesis is true.

What does p < 0.05 mean?

It means results at least this extreme would occur less than 5% of the time if the null hypothesis were true, so the result is called "statistically significant" at the conventional 0.05 level. The threshold is a convention, not a law; 0.01 and 0.10 are also used depending on the field and stakes.

Should I use a one-tailed or two-tailed test?

Two-tailed tests detect differences in either direction and are the safer default. One-tailed tests put all the rejection probability on one side, giving more power but only when you can justify — before seeing the data — that only one direction matters.
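The trade-off can be seen numerically. In the sketch below, a hypothetical z-score of 1.8 is significant at 0.05 under a one-tailed test in the predicted direction but not under a two-tailed test. The same one-tailed test has essentially no chance of flagging an equally large effect in the other direction. The helper name `upper_tail` is an assumption for illustration.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = 1.8  # hypothetical test statistic
p_one_tailed = upper_tail(z)           # only the upper tail counts
p_two_tailed = 2 * upper_tail(abs(z))  # both tails count
p_opposite = upper_tail(-z)            # same-size effect, wrong direction

print(round(p_one_tailed, 4), round(p_two_tailed, 4), round(p_opposite, 4))
```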

Which statistical test should I use?

Use a Z-test for large samples or known population standard deviation, a t-test for small samples with unknown standard deviation, a chi-square test for categorical data (independence or goodness of fit), and an F-test for comparing variances or in ANOVA.

Is a statistically significant result always meaningful?

No. With a large enough sample, even a trivial effect can become statistically significant. Always pair the p-value with the effect size and confidence interval to judge whether the difference is large enough to matter in practice.
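A quick sketch of this sample-size effect, assuming a one-sample z-test with a known standard deviation of 1 and a hypothetical observed difference of 0.01 standard deviations (both numbers are made up for illustration):

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_sample_z(mean_diff, sigma, n):
    """z = observed difference divided by its standard error."""
    return mean_diff / (sigma / math.sqrt(n))


effect = 0.01  # hypothetical: a difference of 1% of a standard deviation
for n in (100, 10_000, 1_000_000):
    z = one_sample_z(effect, 1.0, n)
    print(f"n={n:>9,}  z={z:6.2f}  p={two_tailed_p(z):.3g}")
```

The effect size is identical in every row; only the sample size changes, yet the p-value moves from clearly non-significant to vanishingly small.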