
SciPy - Cox Proportional Hazards Model Survival Analysis
The Cox Proportional Hazards Model is a popular statistical method for survival analysis. It estimates how a set of variables (covariates) affects the time until an event, such as failure or death, occurs. The model is particularly valuable for censored data, where some individuals have not yet experienced the event by the end of the study.
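In the usual textbook notation (these symbols are not lifelines identifiers), the model writes the hazard of a subject with covariates x_1, ..., x_p as an unspecified baseline hazard h_0(t) scaled by an exponential term. Taking the ratio of the hazards of two subjects a and b cancels h_0(t), so the ratio does not depend on time. This is why the hazards are called proportional, and why exp(beta_k) can be read as the hazard ratio for a one-unit increase in x_k with the other covariates held fixed:

```latex
h(t \mid x) = h_0(t)\,\exp\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\right)

\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\left(\sum_{k=1}^{p} \beta_k \,(x_{a,k} - x_{b,k})\right)
```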
SciPy does not include a built-in Cox Proportional Hazards model. However, the lifelines library, which builds on NumPy, SciPy and pandas, offers comprehensive support for survival analysis, including the Cox Proportional Hazards model.
Steps to Fit a Cox Proportional Hazards Model in Python Using lifelines
The following steps show how to fit a Cox Proportional Hazards model in Python using lifelines −
Install the lifelines Library
First, install the lifelines library by running the following command at the command prompt (skip this step if it is already installed) −
```shell
pip install lifelines
```
Importing the Libraries
Next, import the required libraries −
```python
import pandas as pd
import numpy as np
from lifelines import CoxPHFitter
```
Prepare the Data
For the Cox Proportional Hazards model, the dataset must contain, in addition to the covariates, at least the following two columns −
- Duration: the time until the event or censoring occurs.
- Event indicator: 1 if the event occurred, 0 if the observation was censored.
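These two columns usually have to be derived from raw follow-up records. The following is a minimal sketch, assuming each subject has a start date and an event date that is None when no event was observed. The helper name to_duration_event, the study end date and the use of years (days / 365.25) as the time unit are illustrative assumptions, not part of lifelines.

```python
from datetime import date
from typing import Optional, Tuple

# Hypothetical end of follow-up for the study (an illustrative assumption)
STUDY_END = date(2024, 12, 31)


def to_duration_event(start: date,
                      event_date: Optional[date],
                      study_end: date = STUDY_END) -> Tuple[float, int]:
    """Return (duration in years, event indicator) for one subject.

    A subject whose event was observed on or before study_end gets
    event = 1 and a duration measured up to the event date. Everyone
    else is right-censored: event = 0, duration measured up to study_end.
    """
    if event_date is not None and event_date <= study_end:
        return (event_date - start).days / 365.25, 1
    return (study_end - start).days / 365.25, 0


# Usage: one subject with an observed event, one still event-free at study end
print(to_duration_event(date(2020, 1, 1), date(2022, 1, 1)))  # event observed
print(to_duration_event(date(2020, 1, 1), None))              # right-censored
```

A censored subject still contributes a duration: the model knows that subject survived at least that long, which is exactly the information the event indicator preserves.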
Below is an example dataset for fitting a Cox Proportional Hazards model −
```python
# Example dataset
data = {
    'age': [60, 65, 70, 80, 85],
    'sex': [1, 0, 1, 1, 0],         # 1 for male, 0 for female
    'duration': [5, 6, 7, 8, 9],    # Duration in years
    'event': [1, 0, 1, 1, 0]        # 1 for event (e.g., death), 0 for censored
}

# Create a DataFrame
df = pd.DataFrame(data)
```
Fit the Cox Proportional Hazards Model
Next, we create a CoxPHFitter, fit it to the dataset and print a summary of the model. The summary lists the estimated coefficient, standard error, z-score and p-value for each covariate, which help us understand each covariate's effect on the hazard. The exp(coef) column gives the corresponding hazard ratio.
```python
# Instantiate the Cox Proportional Hazards model
cph = CoxPHFitter()

# Fit the model with the dataset
cph.fit(df, duration_col='duration', event_col='event')

# Print the model summary
cph.print_summary()
```
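Behind cph.fit, the coefficients are chosen to maximize the Cox partial log-likelihood. This quantity compares each subject who had the event with everyone still at risk at that time. Censored subjects contribute only through these risk sets, and the baseline hazard cancels out. The sketch below evaluates the quantity in plain Python for the five-row dataset above. It is for illustration only. It assumes event times are distinct (true for that dataset, and the case in which the common tie-handling rules coincide), and the function name cox_partial_log_likelihood is hypothetical, not a lifelines API.

```python
import math
from typing import Sequence


def cox_partial_log_likelihood(beta: Sequence[float],
                               X: Sequence[Sequence[float]],
                               durations: Sequence[float],
                               events: Sequence[int]) -> float:
    """Cox partial log-likelihood, assuming no tied event times.

    For every subject i with an observed event, add
        beta . x_i - log( sum over j at risk at durations[i] of exp(beta . x_j) ),
    where "at risk" means durations[j] >= durations[i].
    """
    # Linear predictor beta . x for every subject
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(durations, events)):
        if e_i == 0:
            continue  # censored subjects only appear inside risk sets
        risk_sum = sum(math.exp(eta[j])
                       for j, t_j in enumerate(durations) if t_j >= t_i)
        total += eta[i] - math.log(risk_sum)
    return total


# The five-row example dataset from above (covariates: age, sex)
X = [[60, 1], [65, 0], [70, 1], [80, 1], [85, 0]]
durations = [5, 6, 7, 8, 9]
events = [1, 0, 1, 1, 0]

# With beta = 0 each event contributes -log(number at risk): 5, 3 and 2
# subjects are at risk at t = 5, 7 and 8, so the value is -log(30)
print(cox_partial_log_likelihood([0.0, 0.0], X, durations, events))
```

The value at beta = 0 is the no-covariate baseline. The "log-likelihood ratio test" line in the lifelines summary compares the maximized partial log-likelihood against that kind of no-covariate model.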
Making Predictions
Once the model is fitted, we can predict the survival function of new individuals or compute their cumulative hazard.
```python
# Predict the survival function for a new individual
new_data = pd.DataFrame({
    'age': [75],
    'sex': [1],
})

# Predict survival for the new individual
survival_function = cph.predict_survival_function(new_data)
print(survival_function)
```
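The survival function S(t) and the cumulative hazard H(t) carry the same information, linked by S(t) = exp(-H(t)). lifelines exposes the cumulative hazard through cph.predict_cumulative_hazard and median lifetimes through cph.predict_median. As a sketch that needs only the standard library, the code below takes the survival probabilities printed at the end of the full example on this page (a 70-year-old male smoker), converts them to a cumulative hazard and reads off the first time at which survival drops to 0.5 or below. The helper name median_survival_time and that step-function convention are assumptions of this sketch.

```python
import math

# Survival probabilities printed at the end of the full example on this page
# (70-year-old male smoker); keys are times in years
survival = {5.0: 0.918415, 6.0: 0.734865, 7.0: 0.610552,
            8.0: 0.435392, 9.0: 0.191870}

# Cumulative hazard from the identity S(t) = exp(-H(t)), i.e. H(t) = -log S(t)
cumulative_hazard = {t: -math.log(s) for t, s in survival.items()}


def median_survival_time(surv: dict) -> float:
    """First time at which the step survival curve is at or below 0.5
    (math.inf if it never gets there)."""
    for t in sorted(surv):
        if surv[t] <= 0.5:
            return t
    return math.inf


for t, h in cumulative_hazard.items():
    print(f"t = {t}: S = {survival[t]:.6f}, H = {h:.4f}")
print("median survival time:", median_survival_time(survival))
```

Because S(t) only decreases over time, H(t) only increases, and a curve that never reaches 0.5 has no finite median.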
Example
The following example simulates a dataset with several risk factors (age, sex and smoking status) and analyzes their effect on survival time −
```python
import pandas as pd
from lifelines import CoxPHFitter

# Simulated dataset: survival data with age, sex and smoking status
data = {
    'age': [60, 62, 65, 70, 72, 75, 80, 85],            # Age of individuals
    'sex': [1, 0, 1, 1, 0, 1, 0, 1],                    # 1 for male, 0 for female
    'smoking_status': [1, 0, 1, 0, 1, 0, 1, 0],         # 1 for smoker, 0 for non-smoker
    'duration': [5, 6, 7, 8, 9, 5, 6, 7],               # Survival time in years
    'event': [1, 1, 0, 1, 1, 0, 1, 1]                   # 1 for event (e.g., death), 0 for censored
}

# Create DataFrame
df = pd.DataFrame(data)

# Instantiate the Cox Proportional Hazards model
cph = CoxPHFitter()

# Fit the model with the dataset
cph.fit(df, duration_col='duration', event_col='event')

# Print the model summary to analyze the results
cph.print_summary()

# Predict the survival function for a new individual (e.g., 70-year-old smoker)
new_data = pd.DataFrame({
    'age': [70],
    'sex': [1],             # Male
    'smoking_status': [1],  # Smoker
})

# Predict survival for the new individual
survival_function = cph.predict_survival_function(new_data)

# Print the survival function for the new individual
print(survival_function)
```
Here is the output of the above example −
```text
<lifelines.CoxPHFitter: fitted with 8 total observations, 2 right-censored observations>
             duration col = 'duration'
                event col = 'event'
      baseline estimation = breslow
   number of observations = 8
number of events observed = 6
   partial log-likelihood = -7.39
         time fit was run = 2025-01-31 12:06:31 UTC

---
                 coef  exp(coef)  se(coef)  coef lower 95%  coef upper 95%  exp(coef) lower 95%  exp(coef) upper 95%
covariate
age             -0.02       0.98      0.06           -0.14            0.11                 0.87                 1.12
sex             -0.23       0.79      1.06           -2.31            1.84                 0.10                 6.32
smoking_status  -0.55       0.58      1.03           -2.57            1.47                 0.08                 4.33

                cmp to      z     p  -log2(p)
covariate
age               0.00  -0.26  0.80      0.33
sex               0.00  -0.22  0.83      0.28
smoking_status    0.00  -0.54  0.59      0.76
---
Concordance = 0.47
Partial AIC = 20.79
log-likelihood ratio test = 0.33 on 3 df
-log2(p) of ll-ratio test = 0.07

            0
5.0  0.918415
6.0  0.734865
7.0  0.610552
8.0  0.435392
9.0  0.191870
```
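Most columns in the summary follow from coef and se(coef). The sketch below assumes the usual Wald construction with a normal approximation, which the printed 95% columns are consistent with. It recomputes the smoking_status row from its printed coef = -0.55 and se(coef) = 1.03. These inputs are already rounded to two decimals, so the results match the table only approximately. For example, z comes out as -0.53 rather than -0.54, and -log2(p) as 0.75 rather than 0.76. The other rows are more sensitive to this rounding, so only this row is reproduced.

```python
import math

# Printed values for the smoking_status row of the summary above
coef, se = -0.55, 1.03

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

hazard_ratio = math.exp(coef)         # exp(coef)
ci_low = coef - Z_975 * se            # coef lower 95%
ci_high = coef + Z_975 * se           # coef upper 95%
z = coef / se                         # Wald z statistic (compared with the "cmp to" value 0)
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value: 2 * (1 - Phi(|z|))

print(f"exp(coef)        = {hazard_ratio:.2f}")
print(f"coef 95% CI      = ({ci_low:.2f}, {ci_high:.2f})")
print(f"exp(coef) 95% CI = ({math.exp(ci_low):.2f}, {math.exp(ci_high):.2f})")
print(f"z = {z:.2f}, p = {p:.2f}, -log2(p) = {-math.log2(p):.2f}")
```

A hazard ratio below 1 would mean a lower hazard for smokers. However, the 95% interval for exp(coef) runs from about 0.08 to 4.33 and p is about 0.59, so these eight simulated observations provide no evidence either way about the effect of smoking status.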