SciPy - Statistical Tests and Inference



Statistical tests and inference involve deriving conclusions about a population from sample data. These methodologies are fundamental for validating hypotheses, analyzing data trends, and making informed decisions in research, economics, engineering, and many other fields. SciPy's scipy.stats module offers a comprehensive set of tools to perform various statistical tests and data inferences.

Important Statistical Tests in SciPy

The scipy.stats library in Python includes a variety of functions to execute tests such as t-tests, chi-square tests and ANOVA, helping you validate assumptions and test hypotheses in different applications.

SciPy provides several statistical tests designed to assess different types of data and determine if observed differences or relationships are statistically significant. These tests play a critical role in hypothesis testing and analysis.

t-Test

A t-test is used to assess whether the means of two groups differ from one another; it is typically applied in situations such as comparing the results of two sample groups. The scipy.stats.ttest_ind() function performs a t-test on two independent samples.

The following example demonstrates how to perform a t-test on two datasets −

from scipy.stats import ttest_ind
import numpy as np

# Generate sample data
group1 = np.random.normal(0, 1, 100)
group2 = np.random.normal(0.5, 1, 100)

# Conduct the t-test
stat, p_value = ttest_ind(group1, group2)

print(f"t-statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")

Here is the result of the t-test, showing the t-statistic and p-value, which help us determine whether the difference between the two groups is statistically significant −

t-statistic: -3.1020
p-value: 0.0022
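
As a quick follow-up, the p-value can be compared against a chosen significance level to decide whether the difference is significant. The sketch below assumes the conventional 0.05 threshold, which is a common choice rather than a value fixed by SciPy −

from scipy.stats import ttest_ind
import numpy as np

# Significance level (0.05 is a conventional choice, assumed for this sketch)
alpha = 0.05

# Generate sample data (results vary between runs since no seed is set)
group1 = np.random.normal(0, 1, 100)
group2 = np.random.normal(0.5, 1, 100)

# Conduct the t-test and interpret the p-value
stat, p_value = ttest_ind(group1, group2)

if p_value < alpha:
    print("Reject the null hypothesis: the group means appear to differ.")
else:
    print("Fail to reject the null hypothesis: no significant difference detected.")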

Chi-Squared Test

The Chi-Squared Test is typically used to analyze categorical data, determining whether there is an association between two categorical variables. It's useful in situations like contingency tables where data is grouped into categories.

To perform a Chi-Squared test, SciPy provides the scipy.stats.chi2_contingency() function −

from scipy.stats import chi2_contingency
import numpy as np

# Example data in a contingency table
data = np.array([[10, 20], [20, 30]])

# Run the chi-squared test
chi2_stat, p_val, dof, expected = chi2_contingency(data)

print(f"Chi-squared statistic: {chi2_stat:.4f}")
print(f"p-value: {p_val:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"Expected values: \n{expected}")

Below is the output of the Chi-squared test showing the statistic, p-value, degrees of freedom, and expected values:

Chi-squared statistic: 0.1280
p-value: 0.7205
Degrees of freedom: 1
Expected values:
[[11.25 18.75]
 [18.75 31.25]]

ANOVA (Analysis of Variance)

ANOVA tests whether there are significant differences among the means of three or more groups. It's useful when comparing multiple datasets to determine if at least one of them is different from the others.

To perform a one-way ANOVA, we can use the scipy.stats.f_oneway() function. The following example performs the ANOVA test −

from scipy.stats import f_oneway
import numpy as np

# Example data from three groups
group1 = np.random.normal(0, 1, 100)
group2 = np.random.normal(1, 1, 100)
group3 = np.random.normal(2, 1, 100)

# Run one-way ANOVA
f_stat, p_value = f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Here's the result of the ANOVA test showing the F-statistic and p-value, which help us assess whether the group means are statistically different:

F-statistic: 75.5012
p-value: 0.0000

Normality Tests

To determine whether a dataset follows a normal distribution, we can use normality tests such as the Shapiro-Wilk test or D'Agostino and Pearson's test, both available in SciPy. The scipy.stats.shapiro() function conducts the Shapiro-Wilk test to check normality −

from scipy.stats import shapiro
import numpy as np

# Example data
data = np.random.normal(0, 1, 100)

# Perform Shapiro-Wilk normality test
stat, p_value = shapiro(data)

print(f"Test statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")

Following is the output of the Shapiro-Wilk test, which helps to evaluate whether the sample data is consistent with a normal distribution −

Test statistic: 0.9878
p-value: 0.4939
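
The D'Agostino and Pearson's test mentioned above is also available in SciPy as scipy.stats.normaltest(). The following is a minimal sketch of its use on randomly generated data, so the exact statistic and p-value will vary between runs −

from scipy.stats import normaltest
import numpy as np

# Example data (randomly generated, so results will vary)
data = np.random.normal(0, 1, 100)

# Perform the D'Agostino and Pearson's normality test
stat, p_value = normaltest(data)

print(f"Test statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")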

Using Statistical Inference in SciPy

SciPy provides essential tools for making inferences about a population from sample data. The key concepts are listed below, followed by a short sketch that ties them together −

  • p-value: This is used to determine the statistical significance of test results. A p-value below a threshold (commonly 0.05) suggests a significant result.
  • Confidence Intervals: Estimate the range in which a population parameter (such as the mean) lies based on sample data.
  • Effect Size: Quantifies the magnitude of an observed effect or difference.
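
The short sketch below ties these three ideas together on two randomly generated samples. It is only an outline under common assumptions: the 0.05 threshold and the 95% confidence level are conventional choices, and Cohen's d is computed by hand here as one possible measure of effect size −

from scipy import stats
import numpy as np

# Two sample groups (randomly generated, so numbers will vary between runs)
group1 = np.random.normal(0, 1, 100)
group2 = np.random.normal(0.5, 1, 100)

# p-value: independent two-sample t-test, judged against a 0.05 threshold
t_stat, p_value = stats.ttest_ind(group1, group2)
significant = "significant" if p_value < 0.05 else "not significant"
print(f"p-value: {p_value:.4f} ({significant} at the 0.05 level)")

# Confidence interval: 95% interval for the mean of group1 using the t distribution
ci_low, ci_high = stats.t.interval(0.95, len(group1) - 1,
                                   loc=np.mean(group1), scale=stats.sem(group1))
print(f"95% confidence interval for group1 mean: ({ci_low:.4f}, {ci_high:.4f})")

# Effect size: Cohen's d from the pooled standard deviation of the two groups
pooled_std = np.sqrt((np.var(group1, ddof=1) + np.var(group2, ddof=1)) / 2)
cohens_d = (np.mean(group1) - np.mean(group2)) / pooled_std
print(f"Cohen's d: {cohens_d:.4f}")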

Using these methods, researchers can perform thorough statistical analyses and make decisions backed by solid evidence from their data.
