SciPy - Linear Least Squares



The linear least squares (LLS) is a method for finding the best approximation to an over-determined system of linear equations. An over-determined system has more equations than unknowns and is typically inconsistent, so an exact solution does not exist.

Mathematical equation of the Linear Least squares given for a system Ax=b by minimize the residual is given as follows −

Axb22

Where A is an mn matrix (mn), b is an m-dimensional vector, x is an n-dimensional vector.

Implementation of Linear Least Squares

SciPy provides several tools to solve the linear least-squares problem efficiently. The primary function for implementing the Linear Least Squares is with the function scipy.linalg.lstsq().

This function solves the least-squares problem using Singular Value Decomposition (SVD) internally by ensuring numerical stability and robustness. It works for both full-rank and rank-deficient matrices.

How Linear Least Squares Method Works

Here are the steps how the Linear Least squares works in scipy −

  • Normal Equations: These equations while theoretically valid, solving via normal equations can lead to numerical instability. The equation can be given as −
    ATAx = ATb
  • Singular Value Decomposition (SVD): The function internally uses SVD to decompose A −
    A = UVT

    The least-squares solution is computed as −

    x = V-1UTb

    The above approach avoids the numerical issues of normal equations.

Syntax

Following is the syntax for using the function scipy.linalg.lstsq()

scipy.linalg.lstsq(a, b, cond=None, overwrite_a=False, overwrite_b=False, check_finite=True, lapack_driver=None)

Parameters

Here are the parameters of the function scipy.linalg.lstsq()

  • a: The m x n coefficient matrix A.
  • b: The m-dimensional vector or matrix with multiple columns for multiple right-hand sides.
  • cond(optional): Cutoff for small singular values. Singular values smaller than cond * max(singular_values) are treated as zero.
  • overwrite_a: If True then overwrites matrix A to save memory.
  • overwrite_b: If True then overwrites vector b to save memory.
  • check_finite: If True then checks that input matrices contain only finite numbers.
  • lapack_driver: This specifies the LAPACK driver used to solve the problem. Options include gelsd, gelsy or gelss.

Over-determined System (m>n)

An over-determined system occurs when there are more equations than unknowns (m>n) by making the system likely inconsistent. The linear least squares method finds the solution x that minimizes the residual i.e., the sum of squared differences between the observed values (b) and the values predicted by the model (Ax).

Following is the example which helps us to find the best fit solution for the given overdetermined system with the help of the function scipy.linalg.lstsq()

import numpy as np
from scipy.linalg import lstsq

# Define the overdetermined system
A = np.array([[1, 1], [1, 2], [1, 3]])
b = np.array([1, 2, 2])

# Solve using scipy.linalg.lstsq
x, residuals, rank, singular_values = lstsq(A, b)

# Output results
print("Matrix A:")
print(A)
print("\nVector b:")
print(b)
print("\nSolution (x):", x)
print("Residuals:", residuals)
print("Rank of A:", rank)
print("Singular values of A:", singular_values)

Following is the output of the function scipy.linalg.lstsq() function which is used for overdetermined system −

Matrix A:
[[1 1]
 [1 2]
 [1 3]]

Vector b:
[1 2 2]

Solution (x): [0.66666667 0.5       ]
Residuals: 0.16666666666666677
Rank of A: 2
Singular values of A: [4.07914333 0.60049122]

Underdetermined System (m<n)

An underdetermined system occurs when there are fewer equations than unknowns (m<n). This means the system has infinitely many solutions as there are not enough constraints to uniquely determine all variables. In such cases the least-squares solution minimizes the residual and provides the solution with the smallest Euclidean norm.

Following is the example which helps us to find the best fit solution for the given underdetermined system with the help of the function scipy.linalg.lstsq()

import numpy as np
from scipy.linalg import lstsq

# Define the underdetermined system
A = np.array([[1, 1, 1], [1, 2, 3]])
b = np.array([1, 2])

# Solve using scipy.linalg.lstsq
x, residuals, rank, singular_values = lstsq(A, b)

# Output results
print("Matrix A:")
print(A)
print("\nVector b:")
print(b)
print("\nSolution (x):", x)
print("Residuals:", residuals)
print("Rank of A:", rank)
print("Singular values of A:", singular_values)

Here is the output of the function scipy.linalg.lstsq() function which is used for underdetermined system −

Matrix A:
[[1 1 1]
 [1 2 3]]

Vector b:
[1 2]

Solution (x): [0.33333333 0.33333333 0.33333333]
Residuals: []
Rank of A: 2
Singular values of A: [4.07914333 0.60049122]
Advertisements