SciPy - Descriptive Statistics

Descriptive statistics is the branch of statistics that summarizes and organizes data to reveal its main features: distribution, central tendency and variability. SciPy's stats module (scipy.stats) provides functions for computing these measures efficiently.
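
As a quick orientation, the sketch below uses scipy.stats.describe() to obtain several summary statistics in a single call. The sample data is chosen only for illustration. Note that the variance reported by describe() uses the sample form (division by N - 1).

```python
from scipy import stats

data = [10, 20, 30, 40, 50]

# describe() returns a named tuple holding several summary statistics
summary = stats.describe(data)

print("Count:", summary.nobs)
print("Min/Max:", summary.minmax)
print("Mean:", summary.mean)
print("Variance (N - 1):", summary.variance)
print("Skewness:", summary.skewness)
print("Kurtosis:", summary.kurtosis)
```

Each of these values can also be computed individually with the dedicated functions covered in the sections that follow.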

Key Measures in Descriptive Statistics

Descriptive statistics summarize the main features of a dataset. The measures covered here fall into three categories: central tendency (mean, median, mode), dispersion (range, variance, standard deviation) and shape (skewness, kurtosis).

Measures of Central Tendency in SciPy

Measures of central tendency summarize a dataset by identifying a single value that represents the center or "typical" value of the data. The three main measures of central tendency are described below −

Mean (Arithmetic Average)

The mean is calculated by summing all data points and dividing by the number of points. It is sensitive to outliers, which can shift its value significantly. The formula for the mean is given below −

$$\text{Mean} = \frac{\sum X}{N}$$

The following example computes the mean with the scipy.stats.tmean() function. tmean() computes a trimmed mean, but when no limits are passed it uses every value and returns the ordinary arithmetic mean −

from scipy import stats

data = [10, 20, 30, 40, 50]

# Calculate mean using SciPy
mean_value = stats.tmean(data)
print("Mean:", mean_value)

Here is the output of the mean computed with the scipy.stats.tmean() function −

Mean: 30.0
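
To illustrate both the formula and the mean's sensitivity to outliers, the sketch below compares a hand-computed mean with tmean() on data containing one extreme value. The dataset and the (0, 100) limits are assumptions chosen purely for illustration.

```python
from scipy import stats

# Same kind of data as above, but the last value is an outlier
data = [10, 20, 30, 40, 500]

# Mean straight from the formula: sum of values divided by their count
manual_mean = sum(data) / len(data)

# tmean() with no limits uses every value, so the outlier is included
scipy_mean = stats.tmean(data)

# With limits, only values inside [0, 100] (inclusive by default) are averaged
trimmed_mean = stats.tmean(data, limits=(0, 100))

print("Manual mean:", manual_mean)
print("tmean, no limits:", scipy_mean)
print("tmean, limits (0, 100):", trimmed_mean)
```

The untrimmed mean is pulled far above most of the data by the single value 500, while the trimmed mean ignores it.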

Median

The median is the value that falls in the center of a sorted dataset. When there is an even number of data points then the median is calculated as the average of the two middle values. Unlike the mean, the median is less affected by outliers.

The following example calculates the median as the 50th percentile using the scipy.stats.scoreatpercentile() function −

from scipy import stats

# Sample data
data = [10, 20, 30, 40, 50]

# Calculate median using SciPy's scoreatpercentile
median_value = stats.scoreatpercentile(data, 50)
print("Median:", median_value)

Below is the output of the median calculated using the function scipy.stats.scoreatpercentile() −

Median: 30.0
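
The two properties described above, averaging the middle pair for an even count and resistance to outliers, can be checked with a short sketch. It also uses numpy.median(), which gives the same result as the 50th percentile. Both datasets are assumptions made up for the illustration.

```python
import numpy as np
from scipy import stats

even_data = [10, 20, 30, 40]          # even count: middle values are 20 and 30
with_outlier = [10, 20, 30, 40, 500]  # one extreme value

# For an even count, the median is the average of the two middle values
median_even = stats.scoreatpercentile(even_data, 50)
median_np = np.median(even_data)

# The outlier moves the mean a lot but leaves the median at the middle value
mean_outlier = np.mean(with_outlier)
median_outlier = np.median(with_outlier)

print("Median (even count):", median_even, median_np)
print("Mean with outlier:", mean_outlier)
print("Median with outlier:", median_outlier)
```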

Mode

The mode is the value that occurs most frequently in the dataset. A dataset in which several values share the highest frequency is called multimodal.

The following example calculates the mode with the scipy.stats.mode() function, which returns only one value even when several are tied −

from scipy import stats

# Sample data
data = [10, 20, 20, 30, 40]

# Calculate mode using SciPy
mode_value = stats.mode(data)

# Access mode and count correctly
print("Mode:", mode_value.mode, "Frequency:", mode_value.count)

Below is the output of the Mode calculated using the function scipy.stats.mode() −

Mode: 20 Frequency: 2
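
For multimodal data, one possible way to list every mode is to count the distinct values with numpy.unique() and keep those that reach the highest count. This is a minimal sketch on made-up data, not a dedicated SciPy feature.

```python
import numpy as np
from scipy import stats

data = [10, 20, 20, 30, 30, 40]  # 20 and 30 both occur twice

# stats.mode() reports a single value even when there is a tie
single_mode = stats.mode(data).mode

# Count every distinct value, then keep all values that reach the top count
values, counts = np.unique(data, return_counts=True)
all_modes = values[counts == counts.max()]

print("stats.mode:", single_mode)
print("All modes:", all_modes)
```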

Measures of Dispersion in SciPy

Measures of dispersion indicate how data values are spread out or dispersed within a dataset. They help determine the variability or consistency of data points relative to each other. The key measures of dispersion are described below −

Range

The range is the simplest way to measure dispersion, calculated by subtracting the smallest value from the largest value in the dataset. Although it gives a quick sense of data spread, it is highly influenced by outliers.

Here is an example that computes the range using Python's built-in max() and min() functions. The numpy.ptp() function returns the same value for array data −

# Sample data
data = [10, 20, 20, 30, 40]

range_value = max(data) - min(data)
print("Range:", range_value)

Here is the output of the range calculation −

Range: 30
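
The same calculation can be sketched with numpy.ptp() ("peak to peak"), which returns the maximum minus the minimum in one call. The comparison below assumes NumPy is installed alongside SciPy.

```python
import numpy as np

data = [10, 20, 20, 30, 40]

# Built-in approach: largest value minus smallest value
range_builtin = max(data) - min(data)

# NumPy approach: ptp() computes max - min directly
range_ptp = np.ptp(data)

print("Range (max - min):", range_builtin)
print("Range (np.ptp):", range_ptp)
```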

Variance

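Variance measures how far the data values deviate from the mean. The population variance is the average of the squared differences between each data point and the mean. A higher variance indicates more spread-out data.

The mathematical representation of the population variance is given below −

$$\text{Variance} = \frac{\sum (X - \text{Mean})^2}{N}$$

The following example calculates variance using the scipy.stats.tvar() function. By default tvar() returns the sample variance, which divides by N - 1 instead of N, so the result for this data is 250.0 rather than the population value of 200.0 −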

from scipy import stats

data = [10, 20, 30, 40, 50]

# Calculate variance using SciPy
variance_value = stats.tvar(data)
print("Variance:", variance_value)

Here is the output of the variance calculation using scipy.stats.tvar() function −

Variance: 250.0
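
To see where 250.0 comes from, the sketch below works the calculation by hand with both divisors, N and N - 1, and compares the results with numpy.var() and scipy.stats.tvar(). The variable names are illustrative only.

```python
import numpy as np
from scipy import stats

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n

# Squared difference of each point from the mean
squared_diffs = [(x - mean) ** 2 for x in data]

population_var = sum(squared_diffs) / n        # divide by N
sample_var = sum(squared_diffs) / (n - 1)      # divide by N - 1

print("Population variance (manual):", population_var)
print("Sample variance (manual):", sample_var)
print("np.var, default ddof=0:", np.var(data))
print("stats.tvar, default ddof=1:", stats.tvar(data))
```

The sample form is usually preferred when the data is a sample drawn from a larger population, because dividing by N - 1 corrects the downward bias of the N form.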

Standard Deviation

Standard deviation is the square root of the variance and measures dispersion in the same units as the original data. It indicates how far values typically lie from the mean.

The example below computes the standard deviation using the scipy.stats.tstd() function, which returns the sample standard deviation (based on N - 1) by default −

from scipy import stats

data = [10, 20, 30, 40, 50]

# Calculate standard deviation using SciPy
std_deviation = stats.tstd(data)
print("Standard Deviation:", std_deviation)

Below is the output of the standard deviation calculation using scipy.stats.tstd() function −

Standard Deviation: 15.811388300841896
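
The relationship between the two measures can be verified with a short sketch: the value returned by tstd() should equal the square root of the value returned by tvar(). For contrast, numpy.std() with its default settings gives the population form (division by N).

```python
import math
import numpy as np
from scipy import stats

data = [10, 20, 30, 40, 50]

sample_std = stats.tstd(data)               # sample form, N - 1
sqrt_of_var = math.sqrt(stats.tvar(data))   # square root of sample variance
population_std = np.std(data)               # population form, N

print("tstd:", sample_std)
print("sqrt(tvar):", sqrt_of_var)
print("np.std (population):", population_std)
```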

Skewness

Skewness measures the asymmetry of a dataset's distribution around its mean. Positive skewness indicates a long right tail, whereas negative skewness indicates a long left tail. The moment-based formula that scipy.stats.skew() uses by default is given below, where $\bar{X}$ is the mean and n is the number of data points −

$$\text{Skewness} = \frac{\frac{1}{n}\sum_i (X_i - \bar{X})^3}{\left(\frac{1}{n}\sum_i (X_i - \bar{X})^2\right)^{3/2}}$$

Below is an example of how to calculate Skewness using the scipy.stats.skew() function −

from scipy import stats

data = [10, 20, 20, 30, 40, 50, 60]

# Calculate skewness using SciPy
skewness_value = stats.skew(data)
print("Skewness:", skewness_value)

Here is the output when calculating Skewness using the function scipy.stats.skew() −

Skewness: 0.28372927689018057
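
As a cross-check, the sketch below computes skewness by hand as the third central moment divided by the second central moment raised to the power 3/2, which is what skew() returns with its default bias=True. It also calls skew() with bias=False, which applies a small-sample correction. The names m2 and m3 are illustrative.

```python
import numpy as np
from scipy import stats

data = [10, 20, 20, 30, 40, 50, 60]
x = np.asarray(data, dtype=float)

# Deviations from the mean
dev = x - x.mean()

# Second and third central moments (both divide by n)
m2 = np.mean(dev ** 2)
m3 = np.mean(dev ** 3)

manual_skew = m3 / m2 ** 1.5
adjusted_skew = stats.skew(data, bias=False)  # bias-corrected estimate

print("Manual skewness:", manual_skew)
print("stats.skew (default):", stats.skew(data))
print("stats.skew (bias=False):", adjusted_skew)
```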

Kurtosis

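Kurtosis measures the heaviness of the tails of a data distribution. High kurtosis suggests the presence of outliers or extreme values, while low kurtosis indicates lighter tails with fewer outliers. By default scipy.stats.kurtosis() returns excess (Fisher) kurtosis, for which a normal distribution scores 0. Its formula is given below, where $\bar{X}$ is the mean and n is the number of data points −

$$\text{Kurtosis} = \frac{\frac{1}{n}\sum_i (X_i - \bar{X})^4}{\left(\frac{1}{n}\sum_i (X_i - \bar{X})^2\right)^{2}} - 3$$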

Below is an example of calculating Kurtosis using the scipy.stats.kurtosis() function −

from scipy import stats

data = [10, 20, 20, 30, 40, 50, 60]

# Calculate kurtosis using SciPy
kurtosis_value = stats.kurtosis(data)
print("Kurtosis:", kurtosis_value)

Here is the output when calculating Kurtosis using the function scipy.stats.kurtosis() −

Kurtosis: -1.2208044982698956
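
The negative result can be reproduced by hand as the fourth central moment divided by the squared second central moment, minus 3. The sketch below does this and also passes fisher=False, which returns the Pearson form without the subtraction of 3. The names m2 and m4 are illustrative.

```python
import numpy as np
from scipy import stats

data = [10, 20, 20, 30, 40, 50, 60]
x = np.asarray(data, dtype=float)

# Deviations from the mean
dev = x - x.mean()

# Second and fourth central moments (both divide by n)
m2 = np.mean(dev ** 2)
m4 = np.mean(dev ** 4)

excess_kurtosis = m4 / m2 ** 2 - 3                     # Fisher definition
pearson_kurtosis = stats.kurtosis(data, fisher=False)  # no "- 3"

print("Manual excess kurtosis:", excess_kurtosis)
print("stats.kurtosis (default):", stats.kurtosis(data))
print("stats.kurtosis (fisher=False):", pearson_kurtosis)
```

A negative excess kurtosis, as here, indicates tails lighter than those of a normal distribution.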
