SciPy - Kaplan-Meier Estimator Survival Analysis



The Kaplan-Meier estimator is a non-parametric statistical method for estimating the probability of survival over time when some observations are censored, that is, when a subject's event time is unknown because the subject was lost to follow-up or the study ended before the event occurred. It's commonly applied in survival analysis in medical research, reliability engineering, and other fields.

SciPy has no dedicated survival-analysis module, so this chapter uses Python's lifelines library to perform the estimation and visualize the survival function. Below is an explanation of the Kaplan-Meier method and how we can apply it.

Basics of Kaplan-Meier Estimator

The Kaplan-Meier estimator calculates the survival function, which represents the probability that a subject survives past a given time. It does so by considering both observed event times and censored data points, i.e., subjects for whom the event had not occurred by the time they left the study.
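In symbols, the estimate is usually written as a product over the distinct observed times. The notation below is the conventional textbook one and is introduced here for illustration: t_i is a distinct time at which at least one event occurred, d_i is the number of events at t_i, and n_i is the number of subjects still at risk just before t_i.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored subjects contribute no factor of their own, but they are counted in n_i for every time up to and including the time at which they were censored.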

The estimate is a step function: the curve steps down at each time an event occurs and stays flat between events. Censored observations do not cause a step, but they reduce the number of subjects at risk at later times.
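To make the step-function behaviour concrete, here is a minimal sketch that looks up the curve's value at an arbitrary time. The step times and probabilities are hypothetical numbers chosen only for illustration, and survival_at is a helper name invented for this example.

```python
from bisect import bisect_right

# Hypothetical step points: the curve starts at 1.0 and drops at times 2, 3 and 5
step_times = [0, 2, 3, 5]
step_probs = [1.0, 0.9, 0.8, 0.6]

def survival_at(t):
    """Return the step-function value at time t (assumes t >= 0)."""
    # bisect_right counts the step times that are <= t; the last of them applies
    index = bisect_right(step_times, t) - 1
    return step_probs[index]

print(survival_at(1.5))  # before the first drop
print(survival_at(3))    # exactly at a drop: the new, lower value applies
print(survival_at(4.9))  # flat between the drops at 3 and 5
```

The value at a drop time is the lower one, which matches the usual right-continuous convention for survival curves.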

Implementation Using lifelines library

Although SciPy provides robust numerical and statistical functions, lifelines is designed specifically for survival analysis and offers a ready-made Kaplan-Meier fitter. Here's how we can implement the Kaplan-Meier estimator using the lifelines library −

Install lifelines

First, install the lifelines library from the command prompt using the following command −

pip install lifelines

Manual Implementation of the Kaplan-Meier Estimator

The Kaplan-Meier estimator is a non-parametric method for estimating the survival function from lifetime data. At each distinct time, the running survival probability is multiplied by the fraction of at-risk subjects who did not experience the event at that time.

Here's how we can compute the Kaplan-Meier estimator in Python using only NumPy and pandas, without the lifelines library −

import numpy as np
import pandas as pd

# Define function to compute Kaplan-Meier estimator
def kaplan_meier_estimator(event_times, events_observed):
    # Convert inputs to arrays so the element-wise comparisons below work
    event_times = np.asarray(event_times)
    events_observed = np.asarray(events_observed)

    # Distinct times in ascending order; each time must be processed only once,
    # otherwise tied observations would apply the same factor repeatedly
    unique_times = np.unique(event_times)

    # Initialize the survival probabilities
    survival_probs = []
    survival_prob = 1.0

    # Iterate through the distinct times to calculate survival probabilities
    for time in unique_times:
        # Number of events at this time (number of deaths)
        deaths = np.sum((event_times == time) & (events_observed == 1))

        # Number of individuals at risk just before this time
        risk_set = np.sum(event_times >= time)

        # Update the survival probability (unchanged if only censoring occurred)
        survival_prob *= 1 - deaths / risk_set
        survival_probs.append(survival_prob)

    # Create a DataFrame for the Kaplan-Meier estimate
    km_df = pd.DataFrame({
        'Event Time': unique_times,
        'Survival Probability': survival_probs
    })

    return km_df

# Example Data
event_times = np.array([5, 6, 6, 2, 4, 3, 8, 10, 7])  # Time to event (or censoring)
events_observed = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1])  # 1 = event observed, 0 = censored

# Compute the Kaplan-Meier estimator
km_df = kaplan_meier_estimator(event_times, events_observed)

print(km_df)

Following is the output of the manual implementation of the Kaplan-Meier estimator without using the lifelines library −

   Event Time  Survival Probability
0           2              0.888889
1           3              0.777778
2           4              0.777778
3           5              0.648148
4           6              0.518519
5           7              0.345679
6           8              0.172840
7          10              0.172840
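As a hand check of the product-limit arithmetic, the following sketch recomputes the estimate for the same example data with exact fractions from Python's standard library. It visits each distinct time once, so the tied time 6 (one event, one censored subject) contributes a single factor. The names observations and exact_curve are chosen for this illustration only.

```python
from fractions import Fraction

# Example data from this section as (time, flag) pairs; 1 = event observed, 0 = censored
observations = [(5, 1), (6, 1), (6, 0), (2, 1), (4, 0), (3, 1), (8, 1), (10, 0), (7, 1)]

survival = Fraction(1)
exact_curve = {}

# Walk through the distinct times in ascending order
for t in sorted({time for time, _ in observations}):
    # Subjects whose recorded time is at least t are still at risk just before t
    at_risk = sum(1 for time, _ in observations if time >= t)
    # Only observed events (flag == 1) reduce the survival probability
    deaths = sum(1 for time, flag in observations if time == t and flag == 1)
    survival *= 1 - Fraction(deaths, at_risk)
    exact_curve[t] = survival

for t, s in exact_curve.items():
    print(t, s, round(float(s), 6))
```

Working with fractions makes each step easy to verify by hand: for example, 5 of the 9 subjects have a recorded time of 6 or later, so the single event at time 6 multiplies the running probability by 4/5.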

Using lifelines library

For a more straightforward approach, we can use the lifelines library installed earlier −

from lifelines import KaplanMeierFitter
import matplotlib.pyplot as plt
import numpy as np

# Example Data
event_times = np.array([5, 6, 6, 2, 4, 3, 8, 10, 7])  # Time to event (or censoring)
events_observed = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1])  # 1 = event observed, 0 = censored

# Instantiate the KaplanMeierFitter
kmf = KaplanMeierFitter()

# Fit the model to the data
kmf.fit(event_times, event_observed=events_observed)

# Display the survival probabilities at each time point
print(kmf.survival_function_)

# Plot the Kaplan-Meier estimate with its confidence interval
kmf.plot_survival_function()
plt.show()

Here is the output of the Kaplan-Meier estimator computed using the lifelines library −

          KM_estimate
timeline
0.0          1.000000
2.0          0.888889
3.0          0.777778
4.0          0.777778
5.0          0.648148
6.0          0.518519
7.0          0.345679
8.0          0.172840
10.0         0.172840

Customizing the Plot

After fitting the Kaplan-Meier estimator, we can customize the plot's title, axis labels, and line style. For example, we can change the color or make the line dashed by passing linestyle='--'. Below is the code −

import numpy as np
from lifelines import KaplanMeierFitter
import matplotlib.pyplot as plt

# Example data: event times and event indicators
event_times = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
events_observed = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0])  # 1 = event observed, 0 = censored

# Initialize the Kaplan-Meier fitter
kmf = KaplanMeierFitter()

# Fit the model to the data
kmf.fit(event_times, event_observed=events_observed)

# Customize the Kaplan-Meier plot
kmf.plot_survival_function(color='blue', linestyle='-', label='Survival Curve')

# Adding titles and axis labels
plt.title('Customized Kaplan-Meier Survival Curve')
plt.xlabel('Time (Months)')
plt.ylabel('Survival Probability')

# Display the plot
plt.show()

Following is the customized Kaplan-Meier plot produced using the lifelines library −

[Figure: Kaplan-Meier customized plot]