NumPy - Time Series Analysis



Time Series Analysis in NumPy

Time series analysis studies data points recorded in time order, such as stock prices, sales figures, or temperature readings, to uncover trends and patterns. NumPy's vectorized array operations and its datetime64 data type let you store, manipulate, and analyze such data.

In this tutorial, we will explore how to use NumPy for time series analysis: creating time series arrays, computing statistics and day-to-day changes, visualizing trends over time, and resampling data to a different frequency.

Creating Time Series Data with NumPy

Creating a time series in NumPy typically involves generating an array of datetime64 values that correspond to a sequence of time points. The unit in the dtype, such as datetime64[D] for days, sets the resolution of the timestamps.

Once you have the time points, you can store the associated data, such as stock prices or temperatures, in a second array of the same length, so that each value lines up with its timestamp by index.

Example

In the following example, we will create a simple time series array representing daily timestamps and an associated data array representing stock prices −

import numpy as np

# Create a time series array for daily data
dates = np.array(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'], dtype='datetime64[D]')

# Create an array of stock prices corresponding to the dates
stock_prices = np.array([150.25, 152.75, 153.50, 155.00, 154.25])

print("Dates:", dates)
print("Stock Prices:", stock_prices)

The output will display the corresponding time series data −

Dates: ['2024-01-01' '2024-01-02' '2024-01-03' '2024-01-04' '2024-01-05']
Stock Prices: [150.25 152.75 153.5  155.   154.25]
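
Typing every date by hand becomes impractical for longer series. As a minimal sketch, np.arange can generate evenly spaced datetime64 values, and a boolean mask built from the dates can select the matching prices. The names mask and selected_prices are illustrative only and do not appear elsewhere in this tutorial.

```python
import numpy as np

# Generate five consecutive days; the end date is exclusive
dates = np.arange('2024-01-01', '2024-01-06', dtype='datetime64[D]')
stock_prices = np.array([150.25, 152.75, 153.50, 155.00, 154.25])

# Boolean mask: True for dates on or after 2024-01-03
mask = dates >= np.datetime64('2024-01-03')
selected_prices = stock_prices[mask]

print("Selected Dates:", dates[mask])
print("Selected Prices:", selected_prices)
```

Because both arrays share the same length and ordering, a mask computed on one can index the other directly.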

Statistical Analysis on Time Series Data

NumPy provides several statistical functions that are useful for time series analysis. For instance, you can compute the mean, standard deviation, and cumulative sums over time to observe trends and fluctuations in the data.

Example

In the following example, we will compute the mean stock price and the cumulative sum of stock prices over the time series −

import numpy as np

# Time series of stock prices
stock_prices = np.array([150.25, 152.75, 153.50, 155.00, 154.25])

# Compute the mean stock price
mean_price = np.mean(stock_prices)

# Compute the cumulative sum of stock prices
cumulative_sum = np.cumsum(stock_prices)

print("Mean Stock Price:", mean_price)
print("Cumulative Sum of Stock Prices:", cumulative_sum)

The output will be as follows −

Mean Stock Price: 153.15
Cumulative Sum of Stock Prices: [150.25 303.   456.5  611.5  765.75]
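
The standard deviation mentioned above can be computed in the same way. The sketch below also shows one possible way to smooth the series with a 3-day simple moving average using np.convolve; the window length of 3 is an arbitrary choice for illustration, not a recommendation.

```python
import numpy as np

stock_prices = np.array([150.25, 152.75, 153.50, 155.00, 154.25])

# Population standard deviation (ddof=0 is NumPy's default)
std_population = np.std(stock_prices)

# Sample standard deviation (divides by n - 1 instead of n)
std_sample = np.std(stock_prices, ddof=1)

# 3-day simple moving average: convolve with a uniform window.
# mode='valid' keeps only positions where the window fully overlaps the data,
# so the result has len(stock_prices) - window + 1 elements.
window = 3
moving_avg = np.convolve(stock_prices, np.ones(window) / window, mode='valid')

print("Population Std Dev:", std_population)
print("Sample Std Dev:", std_sample)
print("3-Day Moving Average:", moving_avg)
```

Whether to use ddof=0 or ddof=1 depends on whether the prices are treated as the full population or as a sample from a longer history.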

Calculating Differences and Changes Over Time

In time series analysis, it is often useful to calculate the differences between consecutive data points to understand changes over time.

NumPy's diff() function computes the difference between each pair of adjacent values in an array and returns an array one element shorter than its input. These differences are the starting point for identifying trends, volatility, or growth rates.

Example

In this example, we will calculate the daily change in stock prices using NumPy's diff() function −

import numpy as np

# Time series of stock prices
stock_prices = np.array([150.25, 152.75, 153.50, 155.00, 154.25])

# Calculate the daily change in stock prices
price_changes = np.diff(stock_prices)

print("Price Changes:", price_changes)

The output will display the daily changes in stock prices −

Price Changes: [ 2.5   0.75  1.5  -0.75]
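
Absolute differences are not yet growth rates. As a rough sketch, dividing each difference by the previous day's price gives the percentage change, and applying np.diff to the dates themselves returns the gaps between observations as timedelta64 values. The names pct_changes and day_gaps are illustrative only.

```python
import numpy as np

dates = np.array(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'], dtype='datetime64[D]')
stock_prices = np.array([150.25, 152.75, 153.50, 155.00, 154.25])

# Percentage change relative to the previous day's price
pct_changes = np.diff(stock_prices) / stock_prices[:-1] * 100

# Gaps between consecutive timestamps, as timedelta64 values
day_gaps = np.diff(dates)

print("Percentage Changes:", np.round(pct_changes, 2))
print("Day Gaps:", day_gaps)
```

Checking the gaps is a quick way to confirm that the series is evenly spaced before interpreting the differences as daily changes.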

Visualizing Time Series Data

Visualizing time series data helps to identify trends, patterns, and anomalies over time. While NumPy doesn't have built-in plotting functions, you can easily use external libraries like Matplotlib to visualize time series data alongside NumPy arrays.

Example

In the following example, we will plot the time series of stock prices using Matplotlib −

import numpy as np
import matplotlib.pyplot as plt

# Time series of dates and stock prices
dates = np.array(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'], dtype='datetime64[D]')
stock_prices = np.array([150.25, 152.75, 153.50, 155.00, 154.25])

# Plot the stock prices over time
plt.plot(dates, stock_prices)
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.title('Stock Price Time Series')
plt.xticks(rotation=45)
plt.show()

This will generate a simple line chart showing the trend of stock prices over time −

[Figure: line chart of stock prices over time]

Resampling Time Series Data

Resampling is an important technique in time series analysis that involves changing the frequency of the data. You may want to resample your data from daily to monthly or from hourly to daily, depending on the nature of your analysis.

While NumPy doesn't directly provide resampling functions, you can achieve the same effect by slicing the array into equal-length blocks, reshaping it so that each row holds one block, and aggregating along the rows.

Example

In this example, we will resample daily stock prices to weekly averages by taking the mean over each complete block of 7 days −

import numpy as np

# Daily stock prices for 10 days
daily_prices = np.array([150.25, 152.75, 153.50, 155.00, 154.25, 156.00, 158.00, 160.25, 162.50, 163.75])

# Resample to weekly averages (assuming 7 days per week for simplicity).
# Only complete weeks are used; here the 3 trailing days are dropped.
n_full = (daily_prices.size // 7) * 7
weekly_avg = np.mean(daily_prices[:n_full].reshape(-1, 7), axis=1)

print("Weekly Average Stock Prices:", weekly_avg)

Because the 10 days of data contain only one complete week, the output holds a single weekly average −

Weekly Average Stock Prices: [154.25]
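
Fixed-size blocks do not work for calendar months, which vary in length. One possible approach, sketched below under the assumption that each price has a matching datetime64 timestamp, is to truncate the dates to month resolution and average the prices within each month label. The data here is made up for illustration: 60 consecutive days with prices rising by 1.0 per day.

```python
import numpy as np

# Hypothetical data: 2024-01-01 through 2024-02-29 (31 + 29 = 60 days)
dates = np.arange('2024-01-01', '2024-03-01', dtype='datetime64[D]')
prices = np.linspace(150.0, 209.0, dates.size)  # 150.0, 151.0, ..., 209.0

# Truncate each date to its month, e.g. 2024-01-17 -> 2024-01
months = dates.astype('datetime64[M]')

# inverse[i] is the index of dates[i]'s month within unique_months
unique_months, inverse = np.unique(months, return_inverse=True)

# Sum and count the prices per month, then divide to get the mean
monthly_sum = np.bincount(inverse, weights=prices)
monthly_count = np.bincount(inverse)
monthly_avg = monthly_sum / monthly_count

print("Months:", unique_months)
print("Monthly Average Prices:", monthly_avg)
```

The same grouping pattern works for other frequencies by changing the unit in astype(), for example datetime64[Y] for yearly averages.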