SciPy - Home
SciPy - Introduction
SciPy - Environment Setup
SciPy - Basic Functionality
SciPy - Relationship with NumPy
SciPy Clusters
SciPy - Clusters
SciPy - Hierarchical Clustering
SciPy - K-means Clustering
SciPy - Distance Metrics
SciPy Constants
SciPy - Constants
SciPy - Mathematical Constants
SciPy - Physical Constants
SciPy - Unit Conversion
SciPy - Astronomical Constants
SciPy - Fourier Transforms
SciPy - FFTpack
SciPy - Discrete Fourier Transform (DFT)
SciPy - Fast Fourier Transform (FFT)
SciPy Integration Equations
SciPy - Integrate Module
SciPy - Single Integration
SciPy - Double Integration
SciPy - Triple Integration
SciPy - Multiple Integration
SciPy Differential Equations
SciPy - Differential Equations
SciPy - Integration of Stochastic Differential Equations
SciPy - Integration of Ordinary Differential Equations
SciPy - Discontinuous Functions
SciPy - Oscillatory Functions
SciPy - Partial Differential Equations
SciPy Interpolation
SciPy - Interpolate
SciPy - Linear 1-D Interpolation
SciPy - Polynomial 1-D Interpolation
SciPy - Spline 1-D Interpolation
SciPy - Grid Data Multi-Dimensional Interpolation
SciPy - RBF Multi-Dimensional Interpolation
SciPy - Polynomial & Spline Interpolation
SciPy Curve Fitting
SciPy - Curve Fitting
SciPy - Linear Curve Fitting
SciPy - Non-Linear Curve Fitting
SciPy - Input & Output
SciPy - Input & Output
SciPy - Reading & Writing Files
SciPy - Working with Different File Formats
SciPy - Efficient Data Storage with HDF5
SciPy - Data Serialization
SciPy Linear Algebra
SciPy - Linalg
SciPy - Matrix Creation & Basic Operations
SciPy - Matrix LU Decomposition
SciPy - Matrix QU Decomposition
SciPy - Singular Value Decomposition
SciPy - Cholesky Decomposition
SciPy - Solving Linear Systems
SciPy - Eigenvalues & Eigenvectors
SciPy Image Processing
SciPy - Ndimage
SciPy - Reading & Writing Images
SciPy - Image Transformation
SciPy - Filtering & Edge Detection
SciPy - Top Hat Filters
SciPy - Morphological Filters
SciPy - Low Pass Filters
SciPy - High Pass Filters
SciPy - Bilateral Filter
SciPy - Median Filter
SciPy - Non - Linear Filters in Image Processing
SciPy - High Boost Filter
SciPy - Laplacian Filter
SciPy - Morphological Operations
SciPy - Image Segmentation
SciPy - Thresholding in Image Segmentation
SciPy - Region-Based Segmentation
SciPy - Connected Component Labeling
SciPy Optimize
SciPy - Optimize
SciPy - Special Matrices & Functions
SciPy - Unconstrained Optimization
SciPy - Constrained Optimization
SciPy - Matrix Norms
SciPy - Sparse Matrix
SciPy - Frobenius Norm
SciPy - Spectral Norm
SciPy Condition Numbers
SciPy - Condition Numbers
SciPy - Linear Least Squares
SciPy - Non-Linear Least Squares
SciPy - Finding Roots of Scalar Functions
SciPy - Finding Roots of Multivariate Functions
SciPy - Signal Processing
SciPy - Signal Filtering & Smoothing
SciPy - Short-Time Fourier Transform
SciPy - Wavelet Transform
SciPy - Continuous Wavelet Transform
SciPy - Discrete Wavelet Transform
SciPy - Wavelet Packet Transform
SciPy - Multi-Resolution Analysis
SciPy - Stationary Wavelet Transform
SciPy - Statistical Functions
SciPy - Stats
SciPy - Descriptive Statistics
SciPy - Continuous Probability Distributions
SciPy - Discrete Probability Distributions
SciPy - Statistical Tests & Inference
SciPy - Generating Random Samples
SciPy - Kaplan-Meier Estimator Survival Analysis
SciPy - Cox Proportional Hazards Model Survival Analysis
SciPy Spatial Data
SciPy - Spatial
SciPy - Special Functions
SciPy - Special Package
SciPy Advanced Topics
SciPy - CSGraph
SciPy - ODR
SciPy Useful Resources
SciPy - Reference
SciPy - Quick Guide
SciPy - Cheatsheet
SciPy - Useful Resources
SciPy - Discussion

SciPy - Cox Proportional Hazards Model Survival Analysis

Quiz

The Cox Proportional Hazards Model is a popular statistical method used for survival analysis. It helps estimate the effect of various variables on the time it takes for an event such as failure or death to occur. This model is particularly valuable when dealing with censored data where some individuals may not have experienced the event by the end of the study.

Although SciPy doesn't have a built-in Cox Proportional Hazards model, the lifelines library which is based on SciPy, offers comprehensive support for survival analysis, including the Cox Proportional Hazards model.

Steps of Cox Proportional Hazards in Python using lifelines

Following are the steps that need to be followed to implement the Cox Proportional Hazards in Python using lifelines −

Install Lifeline Library

First we have to install the lifeline library by executing the below command in the command prompt, if haven't installed before −

pip install lifelines

Importing the Libraries

After that we have to import all the necessary libraries −

import pandas as pd
import numpy as np
from lifelines import CoxPHFitter

Prepare our Data

For the Cox Proportional Hazards model our dataset should include at least two components which are mentioned as follows −

Duration, which is the time until the event or censoring occurs.
Event/Censoring Indicator, 1 if the event occurred, 0 if the observation was censored.

Below is the example dataset to implement the Cox Proportional Hazards Model −

# Example dataset
data = {
    'age': [60, 65, 70, 80, 85],
    'sex': [1, 0, 1, 1, 0],  # 1 for male, 0 for female
    'duration': [5, 6, 7, 8, 9],  # Duration in years
    'event': [1, 0, 1, 1, 0]  # 1 for event (e.g., death), 0 for censored
}
# Create a DataFrame
df = pd.DataFrame(data)

Fit the ox Proportional Hazards Model

Here we will display a summary of the model with estimated coefficients, standard errors, z-scores and p-values for each predictor variable which helps us to understand their effects on the hazard ratio.

# Instantiate the Cox Proportional Hazards model
cph = CoxPHFitter()

# Fit the model with the dataset
cph.fit(df, duration_col='duration', event_col='event')

# Print the model summary
cph.print_summary()

Making Predictions

Once the model is fitted we can predict survival functions for new data or calculate the cumulative hazard.

# Predict the survival function for a new individual
new_data = pd.DataFrame({
    'age': [75],
    'sex': [1],
})

# Predict survival for the new individual
survival_function = cph.predict_survival_function(new_data)
print(survival_function)

Example

Following is the example which simulates a dataset with different risk factors such as age, gender, smoking status and analyzes their effect on survival time −

import pandas as pd
from lifelines import CoxPHFitter

# Simulated dataset: Survival data with age, sex, and smoking status
data = {
    'age': [60, 62, 65, 70, 72, 75, 80, 85],      # Age of individuals
    'sex': [1, 0, 1, 1, 0, 1, 0, 1],              # 1 for male, 0 for female
    'smoking_status': [1, 0, 1, 0, 1, 0, 1, 0],   # 1 for smoker, 0 for non-smoker
    'duration': [5, 6, 7, 8, 9, 5, 6, 7],         # Survival time in years
    'event': [1, 1, 0, 1, 1, 0, 1, 1]             # 1 for event (e.g., death), 0 for censored
}

# Create DataFrame
df = pd.DataFrame(data)

# Instantiate the Cox Proportional Hazards model
cph = CoxPHFitter()

# Fit the model with the dataset
cph.fit(df, duration_col='duration', event_col='event')

# Print the model summary to analyze the results
cph.print_summary()

# Predict the survival function for a new individual (e.g., 70-year-old smoker)
new_data = pd.DataFrame({
    'age': [70],
    'sex': [1],          # Male
    'smoking_status': [1],  # Smoker
})

# Predict survival for the new individual
survival_function = cph.predict_survival_function(new_data)

# Print the survival function for the new individual
print(survival_function)

Here is the output of the above example −

< lifelines.CoxPHFitter: fitted with 8 total observations, 2 right-censored observations >
             duration col = 'duration'
                event col = 'event'
      baseline estimation = breslow
   number of observations = 8
number of events observed = 6
   partial log-likelihood = -7.39
         time fit was run = 2025-01-31 12:06:31 UTC

---
                coef exp(coef)  se(coef)  coef lower 95%  coef upper 95% exp(coef) lower 95% exp(coef) upper 95%
covariate
age            -0.02      0.98      0.06           -0.14            0.11                0.87                1.12
sex            -0.23      0.79      1.06           -2.31            1.84                0.10                6.32
smoking_status -0.55      0.58      1.03           -2.57            1.47                0.08                4.33

                cmp to     z    p  -log2(p)
covariate
age               0.00 -0.26 0.80      0.33
sex               0.00 -0.22 0.83      0.28
smoking_status    0.00 -0.54 0.59      0.76
---
Concordance = 0.47
Partial AIC = 20.79
log-likelihood ratio test = 0.33 on 3 df
-log2(p) of ll-ratio test = 0.07

            0
5.0  0.918415
6.0  0.734865
7.0  0.610552
8.0  0.435392
9.0  0.191870

Print Page