P-value Calculator
Calculate p-values for statistical hypothesis testing
How it works
A p-value is the probability of seeing a result at least as extreme as your data if the null hypothesis (no effect) were true. A small p-value means the data would be unlikely under 'no effect,' which is evidence against the null. It's compared to a significance threshold, usually 0.05.
P-value and significance
Compute a test statistic, then p = P(statistic at least this extreme | null hypothesis is true)
- test statistic: z, t, or chi-square, computed from the data
- α: significance level, often 0.05
Worked example
- A test yields a z-score of 2.1
- Two-tailed test
- Find the tail probability beyond ±2.1
p = 2 × P(Z > 2.1) ≈ 0.036, which is below 0.05, so the result is statistically significant.
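The worked example can be reproduced in a few lines of Python. This is a minimal sketch that assumes the test statistic follows a standard normal distribution under the null; the function name `two_tailed_p_from_z` is made up for illustration and is not part of this calculator.

```python
import math

def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a z-score, assuming a standard normal null."""
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

z = 2.1        # z-score from the worked example
alpha = 0.05   # significance level

p = two_tailed_p_from_z(z)
print(f"z = {z}, two-tailed p = {p:.3f}")   # z = 2.1, two-tailed p = 0.036
print("statistically significant" if p < alpha else "not statistically significant")
```

Only the standard library is used here; a statistics package would give the same number through its normal-distribution tail function.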
Good to know
- p < 0.05 is a common threshold, but it's a convention, not a law of nature.
- A small p-value means the data would be unlikely if there were no effect; it does not measure the size or importance of the effect.
- A p-value is not the probability that the null (or any other) hypothesis is true; it is computed assuming the null is true and asks how surprising the data are.
Frequently Asked Questions
What is a p-value?
A p-value is the probability of observing results at least as extreme as yours if the null hypothesis were true. A small p-value means your data would be surprising under the null — it is not the probability that the null hypothesis is true.
What does p < 0.05 mean?
It means results at least this extreme would occur less than 5% of the time by chance if the null hypothesis were true, so the result is called "statistically significant" at the conventional 0.05 level. The threshold is a convention, not a law: 0.01 and 0.10 are also used depending on the field and stakes.
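One way to see what the 5% refers to is a small simulation. The sketch below assumes the null hypothesis is true, so each simulated z statistic is drawn directly from a standard normal distribution; the trial count and seed are arbitrary choices for illustration.

```python
import math
import random

def two_tailed_p(z: float) -> float:
    # Two-tailed p-value under a standard normal null
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)       # fixed seed so the run is repeatable
n_trials = 10_000    # number of simulated experiments with no real effect
alpha = 0.05

# Count how often a "no effect" experiment still lands below alpha
false_positives = sum(
    two_tailed_p(random.gauss(0.0, 1.0)) < alpha for _ in range(n_trials)
)
rate = false_positives / n_trials
print(f"fraction of null experiments with p < {alpha}: {rate:.3f}")
```

The printed fraction should land close to 0.05: when nothing is going on, about 1 experiment in 20 still comes out "significant" at this threshold.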
Should I use a one-tailed or two-tailed test?
Two-tailed tests detect differences in either direction and are the safer default. One-tailed tests put all the rejection probability on one side, giving more power but only when you can justify — before seeing the data — that only one direction matters.
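The difference is easy to see numerically. The sketch below assumes a z statistic with a standard normal null; the function names and the `tail` labels ("two-sided", "greater", "less") are illustrative choices, not options of this calculator.

```python
import math

def normal_upper_tail(z: float) -> float:
    # P(Z >= z) for a standard normal Z
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value_from_z(z: float, tail: str = "two-sided") -> float:
    if tail == "two-sided":
        return 2 * normal_upper_tail(abs(z))   # both tails beyond +/-|z|
    if tail == "greater":
        return normal_upper_tail(z)            # upper tail only
    if tail == "less":
        return normal_upper_tail(-z)           # lower tail only
    raise ValueError(f"unknown tail: {tail!r}")

z = 1.8
p_two = p_value_from_z(z, "two-sided")
p_greater = p_value_from_z(z, "greater")
p_less = p_value_from_z(z, "less")
print(f"two-sided: {p_two:.3f}, greater: {p_greater:.3f}, less: {p_less:.3f}")
```

For z = 1.8 the two-tailed p-value is above 0.05 while the one-tailed "greater" p-value is below it. That is the extra power mentioned above, and it is also why the direction has to be fixed before looking at the data.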
Which statistical test should I use?
Use a Z-test for large samples or known population standard deviation, a t-test for small samples with unknown standard deviation, a chi-square test for categorical data (independence or goodness of fit), and an F-test for comparing variances or in ANOVA.
Is a statistically significant result always meaningful?
No. With a large enough sample, even a trivial effect becomes statistically significant. Always pair the p-value with the effect size and confidence interval to judge whether the difference is large enough to matter in practice.
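A quick calculation illustrates this. The sketch assumes a one-sample z-test with the effect expressed in standard-deviation units, so z = effect × √n; the effect size of 0.01 and the sample sizes are made-up values for illustration.

```python
import math

def one_sample_z_p(effect_in_sd: float, n: int) -> float:
    """Two-tailed p-value when the observed mean is effect_in_sd SDs from the null."""
    z = effect_in_sd * math.sqrt(n)           # z grows with the square root of n
    return math.erfc(abs(z) / math.sqrt(2))

tiny_effect = 0.01   # one hundredth of a standard deviation

for n in (100, 10_000, 1_000_000):
    p = one_sample_z_p(tiny_effect, n)
    ci_half_width = 1.96 / math.sqrt(n)       # approximate 95% CI half-width, in SD units
    print(f"n = {n:>9,}: p = {p:.3g}, effect = {tiny_effect} ± {ci_half_width:.4f} SD")
```

The effect stays at 0.01 standard deviations in every row; only the sample size changes. The p-value falls from far above 0.05 to far below it, while the effect size and interval show that the difference is still tiny.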