
SciPy - Descriptive Statistics
Descriptive statistics is a branch of statistics that focuses on summarizing and organizing data to reveal meaningful insights. It helps in understanding the distribution, central tendency and variability of data. The Python library SciPy, particularly its stats module, provides various functions to compute descriptive statistics efficiently.
Key Measures in Descriptive Statistics
Descriptive statistics are used to summarize and describe the main features of a dataset. These measures fall into three main categories: measures of central tendency, measures of dispersion and measures of shape (skewness and kurtosis).
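SciPy can also compute several of these measures in a single call with scipy.stats.describe(). The minimal sketch below reuses the sample data from the examples in this chapter and assumes SciPy's default settings for describe(): the variance divides by N − 1 (ddof=1) and the kurtosis is the excess (Fisher) kurtosis.

```python
from scipy import stats

data = [10, 20, 30, 40, 50]

# describe() returns a named tuple with several summary statistics at once
summary = stats.describe(data)

print("Observations:", summary.nobs)
print("Min / max:", summary.minmax)
print("Mean:", summary.mean)
print("Variance (N - 1):", summary.variance)
print("Skewness:", summary.skewness)
print("Excess kurtosis:", summary.kurtosis)
```

Each field of the result can be accessed by name, so it is a convenient first look at a dataset before computing individual measures.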
Measures of Central Tendency in SciPy
Measures of central tendency summarize a dataset by identifying a single value that represents the center or "typical" value of the data. The three main measures of central tendency are described below −
Mean (Arithmetic Average)
The mean is calculated by summing all data points and dividing by the total number of points. It is sensitive to outliers, which can shift its value significantly. The formula for the mean is given below −
$$\text{Mean} = \frac{\sum X}{N}$$
Below is an example of finding the mean with the scipy.stats.tmean() function. tmean() computes a trimmed mean, and when no limits are given it returns the ordinary arithmetic mean −
```python
from scipy import stats

data = [10, 20, 30, 40, 50]

# Calculate mean using SciPy (tmean with no limits = arithmetic mean)
mean_value = stats.tmean(data)
print("Mean:", mean_value)
```
Here is the output of the mean calculated with the scipy.stats.tmean() function −
Mean: 30.0
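To illustrate the sensitivity to outliers mentioned above, the sketch below appends one hypothetical extreme value (500) to the sample data and compares numpy.mean() with the median and with two SciPy alternatives: tmean() with its limits parameter, and trim_mean(), which drops a proportion of the lowest and highest values before averaging. The outlier and the chosen limits are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Same values as above plus one hypothetical outlier (500)
data = [10, 20, 30, 40, 50, 500]

mean_all = np.mean(data)                      # pulled upward by the outlier
median_all = np.median(data)                  # barely affected by the outlier
limited = stats.tmean(data, limits=(0, 100))  # averages only values within [0, 100]
trimmed = stats.trim_mean(data, 0.2)          # cuts int(0.2 * 6) = 1 value from each end

print("Mean:", mean_all)
print("Median:", median_all)
print("tmean with limits=(0, 100):", limited)
print("trim_mean with proportiontocut=0.2:", trimmed)
```

The single outlier moves the plain mean far from the bulk of the data, while the median and both trimmed variants stay close to it.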
Median
The median is the value that falls in the center of a sorted dataset. When there is an even number of data points, the median is the average of the two middle values. Unlike the mean, the median is less affected by outliers.
Here is an example that calculates the median as the 50th percentile with the scipy.stats.scoreatpercentile() function −
```python
from scipy import stats

# Sample data
data = [10, 20, 30, 40, 50]

# Calculate median using SciPy's scoreatpercentile (the 50th percentile)
median_value = stats.scoreatpercentile(data, 50)
print("Median:", median_value)
```
Below is the output of the median calculated using the function scipy.stats.scoreatpercentile() −
Median: 30.0
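For an even number of points, the median averages the two middle values. The sketch below uses a hypothetical four-value dataset and shows that scoreatpercentile() agrees with numpy.median() and numpy.percentile(); SciPy's own documentation points to numpy.percentile() as the preferred alternative to scoreatpercentile().

```python
import numpy as np
from scipy import stats

# Hypothetical even-length data: the two middle values are 20 and 30
data = [10, 20, 30, 40]

median_numpy = np.median(data)
median_scipy = stats.scoreatpercentile(data, 50)
median_pct = np.percentile(data, 50)

print("np.median:", median_numpy)
print("scoreatpercentile:", median_scipy)
print("np.percentile:", median_pct)
```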
Mode
The mode is the value that occurs most frequently in the dataset. If more than one value shares the highest frequency, the dataset is called multimodal; in that case scipy.stats.mode() returns only the smallest of those values.
Following is an example that calculates the mode with the scipy.stats.mode() function −
```python
from scipy import stats

# Sample data
data = [10, 20, 20, 30, 40]

# Calculate mode using SciPy
mode_value = stats.mode(data)

# Access the mode and its count from the result object
print("Mode:", mode_value.mode, "Frequency:", mode_value.count)
```
Below is the output of the mode calculated using the function scipy.stats.mode() −
Mode: 20 Frequency: 2
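Because scipy.stats.mode() reports a single value even for multimodal data, the sketch below uses hypothetical bimodal data (20 and 30 each appear twice) to show that behaviour, then uses numpy.unique() with return_counts=True to list every mode.

```python
import numpy as np
from scipy import stats

# Hypothetical bimodal data: 20 and 30 both appear twice
data = [10, 20, 20, 30, 30, 40]

result = stats.mode(data)
print("scipy.stats.mode:", result.mode, "count:", result.count)

# Count every distinct value and keep all values with the highest count
values, counts = np.unique(data, return_counts=True)
all_modes = values[counts == counts.max()]
print("All modes:", all_modes)
```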
Measures of Dispersion in SciPy
Measures of dispersion indicate how data values are spread out or dispersed within a dataset. They help determine the variability or consistency of data points relative to each other. The key measures of dispersion are described below −
Range
The range is the simplest way to measure dispersion, calculated by subtracting the smallest value from the largest value in the dataset. Although it gives a quick sense of data spread, it is highly influenced by outliers.
Here is an example that shows how to compute the range using the numpy.ptp() function −
```python
import numpy as np

# Sample data
data = [10, 20, 20, 30, 40]

# numpy.ptp() ("peak to peak") returns max(data) - min(data)
range_value = np.ptp(data)
print("Range:", range_value)
```
Here is the output of the range calculation −
Range: 30
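The sketch below shows how strongly a single extreme value affects the range. The appended value 400 is hypothetical; the plain-Python max() - min() expression is included to confirm it matches numpy.ptp().

```python
import numpy as np

data = [10, 20, 20, 30, 40]
# One hypothetical extreme value appended to the same data
data_with_outlier = data + [400]

range_before = np.ptp(data)               # 40 - 10
range_after = np.ptp(data_with_outlier)   # 400 - 10
manual = max(data) - min(data)            # plain-Python equivalent of np.ptp(data)

print("Range without outlier:", range_before)
print("Range with outlier:", range_after)
print("max - min:", manual)
```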
Variance
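Variance measures how much the data values deviate from the mean. The population variance averages the squared differences between each data point and the mean, while the sample variance divides the sum of squared differences by N − 1 instead of N. A higher variance indicates more spread-out data.
The mathematical representation of variance is given below, first for a population and then for a sample −
$$\sigma^2 = \frac{\sum (X - \text{Mean})^2}{N}$$
$$s^2 = \frac{\sum (X - \text{Mean})^2}{N - 1}$$
The following example calculates variance using the scipy.stats.tvar() function, which returns the sample variance (denominator N − 1) −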
```python
from scipy import stats

data = [10, 20, 30, 40, 50]

# Calculate variance using SciPy
variance_value = stats.tvar(data)
print("Variance:", variance_value)
```
Here is the output of the variance calculation using the scipy.stats.tvar() function −
Variance: 250.0
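The result of 250.0 comes from dividing by N − 1. The sketch below works through the arithmetic by hand for the same data and compares it with numpy.var(), whose ddof parameter selects the denominator (ddof=0 divides by N, ddof=1 by N − 1).

```python
import numpy as np
from scipy import stats

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n                               # 30.0
squared_dev = sum((x - mean) ** 2 for x in data)   # 400 + 100 + 0 + 100 + 400

pop_var = squared_dev / n           # population variance, divide by N
sample_var = squared_dev / (n - 1)  # sample variance, divide by N - 1

print("Population variance:", pop_var)
print("Sample variance:", sample_var)
print("np.var, ddof=0:", np.var(data))
print("np.var, ddof=1:", np.var(data, ddof=1))
print("stats.tvar:", stats.tvar(data))
```

Only the sample variance matches tvar(), which is why the earlier example prints 250.0 rather than 200.0.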
Standard Deviation
Standard deviation is the square root of the variance and measures data dispersion in the same units as the original dataset. It indicates how much the values typically differ from the mean.
The example below shows how to compute the standard deviation using the scipy.stats.tstd() function, which, like tvar(), uses N − 1 in the denominator −
```python
from scipy import stats

data = [10, 20, 30, 40, 50]

# Calculate standard deviation using SciPy
std_deviation = stats.tstd(data)
print("Standard Deviation:", std_deviation)
```
Below is the output of the standard deviation calculation using the scipy.stats.tstd() function −
Standard Deviation: 15.811388300841896
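As a quick check, the sketch below confirms that tstd() equals the square root of tvar() and matches numpy.std() with ddof=1, while numpy.std()'s default (ddof=0) gives the population standard deviation instead.

```python
import math
import numpy as np
from scipy import stats

data = [10, 20, 30, 40, 50]

sample_std = stats.tstd(data)                 # N - 1 in the denominator, like tvar()
from_variance = math.sqrt(stats.tvar(data))   # square root of the sample variance
numpy_sample_std = np.std(data, ddof=1)       # NumPy equivalent of tstd()
numpy_population_std = np.std(data)           # default ddof=0 divides by N

print("tstd:", sample_std)
print("sqrt(tvar):", from_variance)
print("np.std, ddof=1:", numpy_sample_std)
print("np.std, ddof=0:", numpy_population_std)
```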
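Measures of Shape in SciPy
Measures of shape describe the form of a distribution, such as how symmetric it is and how heavy its tails are. The two common measures are described below −
Skewness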
Skewness measures the asymmetry of a dataset's distribution around its mean. A positive skewness indicates a long right tail, a negative skewness indicates a long left tail, and a perfectly symmetric distribution has a skewness of zero. With its default setting bias=True, scipy.stats.skew() computes the skewness from the central moments as shown below −
$$g_1 = \frac{m_3}{m_2^{3/2}}, \qquad m_k = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^k$$
Below is an example of how to calculate skewness using the scipy.stats.skew() function −
```python
from scipy import stats

data = [10, 20, 20, 30, 40, 50, 60]

# Calculate skewness using SciPy
skewness_value = stats.skew(data)
print("Skewness:", skewness_value)
```
Here is the output when calculating skewness using the function scipy.stats.skew() −
Skewness: 0.28372927689018057
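The sketch below reproduces this value directly from the central moments m2 and m3, and also shows the bias=False option, which applies the sample-size correction factor sqrt(n(n − 1)) / (n − 2) described in SciPy's documentation for skew().

```python
import numpy as np
from scipy import stats

data = np.array([10, 20, 20, 30, 40, 50, 60], dtype=float)
n = data.size
dev = data - data.mean()

# Central moments m_k = (1/n) * sum((x - mean)^k)
m2 = np.mean(dev ** 2)
m3 = np.mean(dev ** 3)

manual_skew = m3 / m2 ** 1.5
print("Manual g1:", manual_skew)
print("stats.skew:", stats.skew(data))

# bias=False corrects for sample size
adjusted = stats.skew(data, bias=False)
print("stats.skew, bias=False:", adjusted)
```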
Kurtosis
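Kurtosis measures the heaviness of the tails of a data distribution. High kurtosis suggests the presence of outliers or extreme values, while low kurtosis indicates a distribution with fewer outliers. By default, scipy.stats.kurtosis() returns the excess (Fisher) kurtosis, which subtracts 3 so that a normal distribution scores 0; a negative result therefore means lighter tails than a normal distribution. The formula it uses (with the default bias=True) is given below −
$$g_2 = \frac{m_4}{m_2^{2}} - 3, \qquad m_k = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^k$$
Below is an example of calculating kurtosis using the scipy.stats.kurtosis() function −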
```python
from scipy import stats

data = [10, 20, 20, 30, 40, 50, 60]

# Calculate kurtosis using SciPy
kurtosis_value = stats.kurtosis(data)
print("Kurtosis:", kurtosis_value)
```
Here is the output when calculating kurtosis using the function scipy.stats.kurtosis() −
Kurtosis: -1.2208044982698956
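The sketch below reproduces this value from the central moments m2 and m4 and contrasts the default Fisher definition with fisher=False, which returns the Pearson kurtosis m4 / m2² (equal to 3 for a normal distribution) without subtracting 3.

```python
import numpy as np
from scipy import stats

data = np.array([10, 20, 20, 30, 40, 50, 60], dtype=float)
dev = data - data.mean()

# Central moments m_k = (1/n) * sum((x - mean)^k)
m2 = np.mean(dev ** 2)
m4 = np.mean(dev ** 4)

pearson = m4 / m2 ** 2   # Pearson kurtosis: 3 for a normal distribution
excess = pearson - 3     # Fisher (excess) kurtosis: 0 for a normal distribution

print("Manual excess kurtosis:", excess)
print("stats.kurtosis (fisher=True):", stats.kurtosis(data))
print("stats.kurtosis (fisher=False):", stats.kurtosis(data, fisher=False))
```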