SciPy - Sparse Matrix



Sparse Matrix in SciPy

A sparse matrix is a matrix in which most of the elements are zero. Instead of storing every element, as a dense matrix does, SciPy's sparse matrices store only the non-zero elements along with their positions, i.e., their indices.

This can significantly reduce memory usage and speed up computation on large matrices, particularly in scientific, engineering and machine learning problems, where matrices are often very large and sparse.
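To make the memory saving concrete, the following minimal sketch compares the storage of a dense NumPy array with the same data held in CSR format. The matrix size (1000 x 1000 with only the diagonal filled) is an assumption chosen purely for illustration, and the sparse byte count depends on the index type SciPy selects on your platform.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative assumption: a 1000 x 1000 matrix with 1000 non-zero entries
n = 1000
dense = np.zeros((n, n), dtype=np.float64)
dense[np.arange(n), np.arange(n)] = 1.0  # only the diagonal is non-zero

sparse = csr_matrix(dense)

dense_bytes = dense.nbytes
# CSR keeps three arrays: values, column indices and row pointers
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes

print("Dense storage  :", dense_bytes, "bytes")
print("Sparse storage :", sparse_bytes, "bytes")
print("Stored elements:", sparse.nnz)
```

The dense array always needs 8 bytes per element regardless of its content, while the sparse storage grows with the number of stored elements rather than with the matrix shape.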

SciPy provides a module called scipy.sparse which is designed for handling sparse matrices efficiently. The sparse matrix module in SciPy supports several formats and operations that allow for efficient storage and manipulation of large sparse data.

Why Use Sparse Matrices?

As discussed above, sparse matrices contain a large number of zero elements. Exploiting this saves significant memory and computation, especially for large datasets in which most entries are zero. The key reasons to use sparse matrices are −

  • Memory Efficiency: Sparse matrices use much less memory by storing only the non-zero elements, which is especially useful for large matrices with many zeros.
  • Faster Computation: Sparse matrix algorithms skip over zeros, leading to faster matrix operations such as multiplication and solving linear systems.
  • Scalability: They allow large datasets, such as graphs or text data (for example, term-document matrices), to be processed more efficiently.
  • Compression: Sparse matrices are ideal for compressing large datasets by representing only the essential non-zero elements.
  • Optimized Algorithms: Many algorithms, particularly in linear algebra and optimization, are designed to work with sparse matrices, which improves performance.
  • Better I/O: Storing matrices in sparse form reduces disk space and speeds up data transfer, especially in large-scale systems.
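As a rough illustration of the computational points above, the sketch below performs a sparse matrix-vector product and solves a linear system with scipy.sparse.linalg.spsolve. The small tridiagonal matrix and the vector x_true are assumptions made up for this example; real savings only appear on much larger matrices.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

# A small tridiagonal system, chosen only for illustration
A = csr_matrix(np.array([
    [4.0, 1.0, 0.0],
    [1.0, 4.0, 1.0],
    [0.0, 1.0, 4.0],
]))
x_true = np.array([1.0, 2.0, 3.0])

# Sparse matrix-vector product: only the stored entries take part
b = A @ x_true

# Solve A x = b with a sparse direct solver, without forming a dense inverse
x = spsolve(A, b)

print("b =", b)
print("x =", x)
```

Here b evaluates to [6, 12, 14], and the solver recovers x_true up to floating-point rounding.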

Sparse Matrix Formats

The following table lists the sparse matrix formats provided by SciPy −

S.No.  Sparse Matrix Format and Function
       Description

1      Compressed Sparse Row (CSR) format: scipy.sparse.csr_matrix()
       Optimized for fast row slicing and matrix-vector multiplication. Stores the non-zero values in a 1D array, along with their column indices and an array of row pointers.

2      Compressed Sparse Column (CSC) format: scipy.sparse.csc_matrix()
       Optimized for fast column slicing and matrix-vector multiplication. Similar to CSR, but stores row indices and column pointers instead of column indices and row pointers.

3      Coordinate List (COO) format: scipy.sparse.coo_matrix()
       Stores the matrix as parallel arrays of row indices, column indices and their corresponding values. Useful for building a sparse matrix quickly from lists of entries before converting it to CSR or CSC. It does not support element assignment or slicing.

4      Diagonal (DIA) format: scipy.sparse.dia_matrix()
       Designed for matrices with a small number of non-zero diagonals. Stores each diagonal compactly together with its offset from the main diagonal.

5      List of Lists (LIL) format: scipy.sparse.lil_matrix()
       Stores each row as a list of column indices and a list of the corresponding non-zero values. Efficient for incrementally constructing sparse matrices and modifying individual elements.

6      Block Sparse Row (BSR) format: scipy.sparse.bsr_matrix()
       Used for block-sparse matrices, where the non-zero entries are grouped into dense sub-blocks. Efficient for block-wise operations on matrices such as block diagonal matrices.

7      Dictionary Of Keys (DOK) format: scipy.sparse.dok_matrix()
       Stores the non-zero entries in a dictionary that maps (row, column) keys to values. This allows efficient access to and modification of individual elements, which makes it suitable for incremental construction.
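To see what the CSR description means in practice, the following sketch inspects the three arrays that a csr_matrix exposes: data, indices and indptr. The 3 x 3 input matrix is an assumption chosen only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative input with four non-zero entries
dense = np.array([
    [1, 0, 0],
    [0, 0, 3],
    [4, 0, 5],
])
csr = csr_matrix(dense)

print("data   :", csr.data)     # non-zero values, row by row
print("indices:", csr.indices)  # column index of each stored value
print("indptr :", csr.indptr)   # where each row starts in data/indices

# The entries of row i live in data[indptr[i]:indptr[i + 1]]
for i in range(csr.shape[0]):
    start, end = csr.indptr[i], csr.indptr[i + 1]
    print(f"row {i}: columns {csr.indices[start:end]}, values {csr.data[start:end]}")
```

For this input, data holds 1, 3, 4, 5, indices holds 0, 2, 0, 2 and indptr holds 0, 1, 2, 4. Because each row occupies a contiguous slice, row slicing and row-wise products are cheap in CSR; CSC applies the same idea to columns.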

Conversion Between Sparse Formats

Converting between sparse matrix formats in SciPy is straightforward and often necessary, as each format is optimized for specific operations. SciPy provides methods such as .tocsr(), .tocsc(), .tocoo(), .tolil(), .todok() and .todia() to perform these conversions.

Example

The following example shows how to convert between sparse formats. We create a sparse matrix in COO (Coordinate) format and convert it to other formats using the scipy.sparse module −

from scipy.sparse import coo_matrix

# Define a COO sparse matrix
row = [0, 1, 2]
col = [0, 2, 0]
data = [1, 3, 4]
coo = coo_matrix((data, (row, col)), shape=(3, 3))

print("COO Matrix:")
print(coo)

# Convert COO to CSR (Compressed Sparse Row)
csr = coo.tocsr()
print("\nConverted to CSR Format:")
print(csr)

# Convert COO to CSC (Compressed Sparse Column)
csc = coo.tocsc()
print("\nConverted to CSC Format:")
print(csc)

# Convert COO to LIL (List of Lists)
lil = coo.tolil()
print("\nConverted to LIL Format:")
print(lil)

# Convert COO to DOK (Dictionary of Keys)
dok = coo.todok()
print("\nConverted to DOK Format:")
print(dok)

Following is the output of the above code (the exact layout of the printed summary may vary with the SciPy version) −

COO Matrix:
<COOrdinate sparse matrix of dtype 'int64'
        with 3 stored elements and shape (3, 3)>
  Coords        Values
  (0, 0)        1
  (1, 2)        3
  (2, 0)        4

Converted to CSR Format:
<Compressed Sparse Row sparse matrix of dtype 'int64'
        with 3 stored elements and shape (3, 3)>
  Coords        Values
  (0, 0)        1
  (1, 2)        3
  (2, 0)        4

Converted to CSC Format:
<Compressed Sparse Column sparse matrix of dtype 'int64'
        with 3 stored elements and shape (3, 3)>
  Coords        Values
  (0, 0)        1
  (2, 0)        4
  (1, 2)        3

Converted to LIL Format:
<List of Lists sparse matrix of dtype 'int64'
        with 3 stored elements and shape (3, 3)>
  Coords        Values
  (0, 0)        1
  (1, 2)        3
  (2, 0)        4

Converted to DOK Format:
<Dictionary Of Keys sparse matrix of dtype 'int64'
        with 3 stored elements and shape (3, 3)>
  Coords        Values
  (0, 0)        1
  (1, 2)        3
  (2, 0)        4
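
Since each format suits different operations, one common pattern is to build a matrix in a format that supports cheap element assignment and convert it once before doing arithmetic. The sketch below assumes LIL for construction and CSR for computation; the values are the same illustrative entries used in the COO example, and other format pairs may suit other workloads better.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Build incrementally in LIL, which supports cheap element assignment
lil = lil_matrix((3, 3), dtype=np.float64)
lil[0, 0] = 1.0
lil[1, 2] = 3.0
lil[2, 0] = 4.0

# Convert once construction is finished; CSR suits arithmetic
csr = lil.tocsr()
row_sums = np.asarray(csr.sum(axis=1)).ravel()

print("Format  :", csr.format)
print("Row sums:", row_sums)

# The conversion changes only the storage layout, not the values
print("Same values:", np.array_equal(lil.toarray(), csr.toarray()))
```

Converting back and forth repeatedly inside a loop would cost more than it saves, so it is usually better to convert once, after construction is complete.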