
SciPy - Statistical Tests and Inference
Statistical tests and inference involve drawing conclusions about a population from sample data. These methods are fundamental for validating hypotheses, analyzing data trends and making informed decisions in research, economics, engineering and many other fields. SciPy's scipy.stats module offers a comprehensive set of tools for performing statistical tests and inference.
Important Statistical Tests in SciPy
The scipy.stats module includes functions for common tests such as t-tests, chi-squared tests and ANOVA, which help you validate assumptions and test hypotheses in different applications.
Each test is designed for a particular type of data and returns at least a test statistic and a p-value, which together indicate whether an observed difference or relationship is statistically significant.
t-Test
A t-test assesses whether the means of two groups differ from one another, for example when comparing the results of two sample groups. The scipy.stats.ttest_ind() function performs a t-test on two independent samples; by default it assumes that both groups have equal variances (Student's t-test).
The following example demonstrates how to perform a t-test on two datasets −
```python
from scipy.stats import ttest_ind
import numpy as np

# Generate sample data
group1 = np.random.normal(0, 1, 100)
group2 = np.random.normal(0.5, 1, 100)

# Conduct the t-test
stat, p_value = ttest_ind(group1, group2)
print(f"t-statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
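Here is the result of the t-test, showing the t-statistic and p-value, which help us determine whether the difference between the two group means is statistically significant. Because the data are generated randomly without a fixed seed, your exact values will differ from run to run −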
```
t-statistic: -3.1020
p-value: 0.0022
```
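Since ttest_ind() assumes equal variances by default, it can be useful to compare it with Welch's t-test, which drops that assumption and is selected with equal_var=False. The sketch below is illustrative only: the seed, group sizes and standard deviations are arbitrary choices made so that the two variances clearly differ.

```python
import numpy as np
from scipy.stats import ttest_ind

# Fixed seed (an arbitrary choice) so the example is reproducible
rng = np.random.default_rng(0)

# Two groups with very different spreads and sizes,
# so the equal-variance assumption is questionable
group1 = rng.normal(0.0, 1.0, 100)
group2 = rng.normal(0.5, 3.0, 60)

# Student's t-test (default, equal_var=True) vs Welch's t-test (equal_var=False)
student = ttest_ind(group1, group2)
welch = ttest_ind(group1, group2, equal_var=False)

print(f"Student: t = {student.statistic:.4f}, p = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

Welch's statistic is (mean1 - mean2) / sqrt(var1/n1 + var2/n2) with sample variances, so it does not pool the two variances; when the groups have unequal spreads and sizes, it is generally the safer choice.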
Chi-Squared Test
The Chi-Squared Test is typically used to analyze categorical data, determining whether there is an association between two categorical variables. It's useful in situations like contingency tables where data is grouped into categories.
To perform a chi-squared test of independence on a contingency table, SciPy provides the scipy.stats.chi2_contingency() function −
```python
from scipy.stats import chi2_contingency
import numpy as np

# Example data in a contingency table
data = np.array([[10, 20], [20, 30]])

# Run the chi-squared test
chi2_stat, p_val, dof, expected = chi2_contingency(data)
print(f"Chi-squared statistic: {chi2_stat:.4f}")
print(f"p-value: {p_val:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"Expected values: \n{expected}")
```
Below is the output of the Chi-squared test showing the statistic, p-value, degrees of freedom, and expected values:
```
Chi-squared statistic: 0.1280
p-value: 0.7205
Degrees of freedom: 1
Expected values: 
[[11.25 18.75]
 [18.75 31.25]]
```
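The expected values are the counts one would see if the two variables were independent: each cell is (row total × column total) / grand total. The sketch below, using the same table as the example above, recomputes them by hand. It also shows that for a 2×2 table (1 degree of freedom) chi2_contingency() applies Yates' continuity correction by default, whereas correction=False returns the plain Pearson statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same 2x2 contingency table as in the example above
data = np.array([[10, 20], [20, 30]])

# Expected counts under independence: (row total * column total) / grand total
row_totals = data.sum(axis=1)
col_totals = data.sum(axis=0)
expected_manual = np.outer(row_totals, col_totals) / data.sum()
print(expected_manual)

# Default: Yates' continuity correction is applied because dof == 1
chi2_yates, p_yates, _, expected = chi2_contingency(data)
# Plain Pearson chi-squared statistic, without the correction
chi2_plain, p_plain, _, _ = chi2_contingency(data, correction=False)

print(f"With Yates' correction:    {chi2_yates:.4f}")  # 0.1280
print(f"Without Yates' correction: {chi2_plain:.4f}")  # 0.3556
```

The correction shrinks each |observed - expected| difference by 0.5 before squaring, which makes the test more conservative for small 2×2 tables.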
ANOVA (Analysis of Variance)
ANOVA tests whether there are significant differences among the means of three or more groups. It's useful when comparing multiple datasets to determine if at least one of them is different from the others.
To perform a one-way ANOVA, use the scipy.stats.f_oneway() function. The following example applies it to three groups −
```python
from scipy.stats import f_oneway
import numpy as np

# Example data from three groups
group1 = np.random.normal(0, 1, 100)
group2 = np.random.normal(1, 1, 100)
group3 = np.random.normal(2, 1, 100)

# Run one-way ANOVA
f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
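Here's the result of the ANOVA test, showing the F-statistic and p-value, which help us assess whether the group means differ significantly. The data are random, so your values will vary; a p-value printed as 0.0000 is not exactly zero but smaller than 0.00005, so it rounds to zero at four decimal places −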
```
F-statistic: 75.5012
p-value: 0.0000
```
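To see what f_oneway() computes, the F-statistic can be rebuilt by hand as the ratio of the between-group mean square to the within-group mean square, with the p-value taken from the upper tail of the F distribution (scipy.stats.f.sf()). This is a minimal sketch that assumes a fixed random seed and the same three group means as the example above; the manual values should agree with f_oneway() up to floating-point error.

```python
import numpy as np
from scipy import stats

# Fixed seed (an arbitrary choice) so the run is reproducible
rng = np.random.default_rng(42)
groups = [rng.normal(mu, 1.0, 100) for mu in (0.0, 1.0, 2.0)]

# Grand mean over all observations
all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Between-group sum of squares: distance of each group mean from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

k = len(groups)      # number of groups
n = len(all_data)    # total number of observations
df_between, df_within = k - 1, n - k

# F = mean square between / mean square within
f_manual = (ss_between / df_between) / (ss_within / df_within)
# p-value: probability of an F value at least this large under H0
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_stat, p_value = stats.f_oneway(*groups)
print(f"Manual:   F = {f_manual:.4f}, p = {p_manual:.3e}")
print(f"f_oneway: F = {f_stat:.4f}, p = {p_value:.3e}")
```

Printing the p-value in scientific notation (:.3e) avoids the misleading 0.0000 that fixed four-decimal formatting produces for very small values.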
Normality Tests
To check whether a dataset follows a normal distribution, SciPy offers normality tests such as the Shapiro-Wilk test (scipy.stats.shapiro()) and D'Agostino and Pearson's test (scipy.stats.normaltest()). The following example runs the Shapiro-Wilk test −
```python
from scipy.stats import shapiro
import numpy as np

# Example data
data = np.random.normal(0, 1, 100)

# Perform Shapiro-Wilk normality test
stat, p_value = shapiro(data)
print(f"Test statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
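Following is the output of the Shapiro-Wilk test, which helps evaluate whether the sample data are consistent with a normal distribution. A large p-value means the test found no evidence against normality; it does not prove that the data are normal. Because the sample is random, your values will differ −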
```
Test statistic: 0.9878
p-value: 0.4939
```
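D'Agostino and Pearson's test, available as scipy.stats.normaltest(), combines the sample's skewness and kurtosis into a single statistic. The sketch below runs both normality tests on a normal sample and on a clearly non-normal (exponential) sample. The seed, sample size and choice of an exponential distribution are illustrative assumptions.

```python
import numpy as np
from scipy.stats import normaltest, shapiro

# Fixed seed (an arbitrary choice) so the example is reproducible
rng = np.random.default_rng(1)
normal_data = rng.normal(0.0, 1.0, 200)
skewed_data = rng.exponential(1.0, 200)   # strongly right-skewed, not normal

results = {}
for name, sample in [("normal", normal_data), ("exponential", skewed_data)]:
    k2, p_k2 = normaltest(sample)   # D'Agostino and Pearson's test
    w, p_w = shapiro(sample)        # Shapiro-Wilk test
    results[name] = (p_k2, p_w)
    print(f"{name:12s} normaltest p = {p_k2:.4g}, shapiro p = {p_w:.4g}")
```

For the exponential sample both p-values should be far below 0.05, so normality is rejected. For the normal sample the p-values will usually be large, although roughly 5% of normal samples will still fall below 0.05 by chance.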
Using Statistical Inference in SciPy
SciPy provides essential tools for making inferences about a population from sample data, such as −
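- p-value: The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A p-value below a chosen significance level (commonly 0.05) is usually taken as grounds to reject the null hypothesis.
- Confidence Intervals: A range of values, computed from sample data, that is expected to contain a population parameter (such as the mean) at a stated confidence level, for example 95%.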
- Effect Size: Quantifies the magnitude of an observed effect or difference.
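The three ideas above can be combined in one short sketch: a p-value compared with a significance level, a 95% confidence interval for a mean built from the t distribution (scipy.stats.t.interval() with scipy.stats.sem()), and an effect size. SciPy does not, to our knowledge, provide a Cohen's d function, so the cohens_d helper below is a hypothetical function written for illustration using the pooled-standard-deviation formula; the seed and data are also assumptions.

```python
import numpy as np
from scipy import stats

# Fixed seed and simulated data (illustrative assumptions)
rng = np.random.default_rng(7)
group1 = rng.normal(0.0, 1.0, 50)
group2 = rng.normal(0.5, 1.0, 50)

# 1. p-value and decision at significance level alpha = 0.05
alpha = 0.05
t_stat, p_value = stats.ttest_ind(group1, group2)
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# 2. 95% confidence interval for the mean of group1 (t distribution, n - 1 dof)
mean1 = group1.mean()
sem1 = stats.sem(group1)   # standard error of the mean, ddof=1 by default
ci_low, ci_high = stats.t.interval(0.95, len(group1) - 1, loc=mean1, scale=sem1)

# 3. Effect size: hypothetical helper for Cohen's d using the pooled standard deviation
def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

d = cohens_d(group1, group2)

print(f"t = {t_stat:.4f}, p = {p_value:.4f} -> {decision}")
print(f"95% CI for mean of group1: ({ci_low:.4f}, {ci_high:.4f})")
print(f"Cohen's d: {d:.4f}")
```

A p-value shows whether a difference is detectable, and Cohen's d shows how large it is in standard-deviation units. Reporting both, together with a confidence interval, gives a fuller picture than the p-value alone.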
Used together, these tools let researchers carry out thorough statistical analyses and base their decisions on solid evidence from their data.