SciPy - Reading and Writing Files



SciPy is primarily used for scientific and mathematical computing, but it also offers functions for reading and writing several file formats that are common in scientific work.

In practice, plain array files are handled through NumPy's file I/O functions, which SciPy builds on. To save data we use np.savetxt() for text files or np.save() for binary .npy files, which store arrays efficiently. These files are loaded with np.loadtxt() and np.load() respectively.
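
The text-file functions mentioned above can be sketched as a short round trip. This is a minimal illustration, not a prescribed workflow: the sample values, the file name table.csv and the header text are assumptions made for the example, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: two columns of measurements
table = np.array([[0.0, 1.5], [1.0, 2.5], [2.0, 3.5]])

with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, "table.csv")

    # Write a human-readable text file; savetxt prefixes the header with '# '
    np.savetxt(txt_path, table, delimiter=",", header="time,value", fmt="%.2f")

    # loadtxt treats lines starting with '#' as comments, so the header is skipped
    restored = np.loadtxt(txt_path, delimiter=",")

print(restored.shape)
print(np.allclose(table, restored))
```

Text files are easy to inspect and share, but they store only as many digits as the fmt string asks for, so binary .npy files are usually the safer choice when exact values matter.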

For MATLAB files, SciPy provides scipy.io.savemat() and scipy.io.loadmat(). For other formats such as .wav audio, scipy.io.wavfile.read() and scipy.io.wavfile.write() are available. SciPy also supports sparse matrices through scipy.sparse, and scipy.io.mmread() and scipy.io.mmwrite() read and write the Matrix Market format.
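
As a rough sketch of the audio and Matrix Market functions named above, the following writes a short tone and a small sparse matrix and reads both back. The sample rate, the 440 Hz tone, the matrix entries and the file names are all assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import mmread, mmwrite, wavfile
from scipy.sparse import coo_matrix

sample_rate = 8000  # samples per second (assumed value)
t = np.arange(sample_rate) / sample_rate  # one second of time stamps
# 16-bit PCM samples of a 440 Hz sine wave at half of full scale
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

# A small 3 x 3 sparse matrix with two non-zero entries
sparse = coo_matrix(([1.0, 2.0], ([0, 1], [1, 2])), shape=(3, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    # .wav round trip: write(filename, rate, data), read returns (rate, data)
    wav_path = os.path.join(tmp_dir, "tone.wav")
    wavfile.write(wav_path, sample_rate, tone)
    rate_back, tone_back = wavfile.read(wav_path)

    # Matrix Market round trip: mmread returns the matrix in COO form
    mtx_path = os.path.join(tmp_dir, "matrix.mtx")
    mmwrite(mtx_path, sparse)
    sparse_back = mmread(mtx_path)

print(rate_back, tone_back.dtype)
print(sparse_back.toarray())
```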

Below are some of the common ways SciPy is used to read and write files, focusing on data formats often encountered in scientific computing −

Working with .mat Files (MATLAB files)

MAT files are data files used by MATLAB which is a high-level programming language and environment for numerical computation and visualization. The .mat file format is designed to store variables, arrays and other data structures in a way that MATLAB can easily read and write.

MAT files can be read and written by MATLAB as well as by other programming languages, such as Python (through libraries like SciPy) and R.

SciPy provides tools for working with MATLAB .mat files, enabling Python users to exchange data with MATLAB users. These files are handled through the scipy.io module.

Loading .mat Files

To load .mat files in Python we can use the loadmat function from SciPy's io module. This function reads .mat files and converts them into a Python dictionary, allowing us to access MATLAB variables in Python.

Here is the step by step guide to load the .mat file with the help of scipy −

  • First we have to import the loadmat function.
  • After that we have to specify the path of the .mat file. loadmat reads the file and returns a dictionary where the keys are the variable names and the values are the data.
  • Next access variables in the .mat file by referring to the dictionary keys.

Example

Here is an example which loads a .mat file with the help of the loadmat() function of the SciPy library −

from scipy.io import loadmat

# loading the .mat file from the local drive
data = loadmat('/files/array_file.mat')

# Access a variable named 'my_array' in the MATLAB file
my_array = data['my_array']
print(my_array)

Here is the output after loading the .mat file with the help of loadmat() function −

[[1 2 3]
 [4 5 6]]
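
The dictionary returned by loadmat also contains a few bookkeeping entries next to the MATLAB variables. The sketch below first creates a small .mat file so that it can run on its own (the file name and contents are assumptions mirroring the example above), then lists the stored variables with scipy.io.whosmat and filters out the entries whose names start with a double underscore.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat, whosmat

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a stand-in for the .mat file used in the example above
    mat_path = os.path.join(tmp_dir, "array_file.mat")
    savemat(mat_path, {"my_array": np.array([[1, 2, 3], [4, 5, 6]])})

    # whosmat reports (name, shape, class) for each variable without loading the data
    summary = whosmat(mat_path)

    data = loadmat(mat_path)

# Drop bookkeeping entries such as __header__, __version__ and __globals__
variables = {key: value for key, value in data.items() if not key.startswith("__")}

print(summary)
print(list(variables))
```

Filtering the keys this way is convenient when looping over every variable in a file whose contents are not known in advance.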

Writing into .mat files

To write .mat files in Python we can use the savemat function from SciPy's io module. This function saves data as a .mat file, allowing Python data to be shared with MATLAB or other programs that support this format.

Following are the steps to be followed to write the data into the .mat file −

  • Import the function: First we have to import the savemat function from the scipy.io module.
  • Prepare the Data: The data to be saved should be in the form of a dictionary, where each key is the variable name (as it should appear in MATLAB) and each value is the data to be saved (usually as a NumPy array or other serializable object).
  • Save the Data to a .mat File: Use savemat to specify the filename and data dictionary.

Example

from scipy.io import savemat
import numpy as np

# Data to save
data = {
    'array1': np.array([1, 2, 3]),
    'matrix1': np.array([[1, 2], [3, 4]])
}
# Save data to a .mat file; each dictionary key becomes a MATLAB variable
savemat('/files/written_matfile.mat', data)
print("Data written into the Mat file")

Here is the output after writing into the .mat file with the help of savemat() function −

Data written into the Mat file
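
One detail worth checking after writing is how array shapes come back. MATLAB has no one-dimensional arrays, so a 1-D NumPy array is stored as a row and is loaded as a 2-D array by default. The following sketch, which uses a temporary file and hypothetical variable names, shows the default behaviour and the squeeze_me option of loadmat.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

payload = {
    "array1": np.array([1, 2, 3]),
    "matrix1": np.array([[1, 2], [3, 4]]),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    mat_path = os.path.join(tmp_dir, "roundtrip.mat")
    savemat(mat_path, payload)

    default_load = loadmat(mat_path)
    # squeeze_me=True removes axes of length one when loading
    squeezed_load = loadmat(mat_path, squeeze_me=True)

print(default_load["array1"].shape)   # 1-D input comes back as a 1 x 3 row
print(squeezed_load["array1"].shape)  # length-one axis removed
print(default_load["matrix1"].shape)  # 2-D input keeps its shape
```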

Reading and Writing .npz and .npy Files

.npy and .npz are file formats used by NumPy to store arrays efficiently in binary format. They are commonly used for saving and loading data in Python particularly for handling large arrays in a compact, fast-access format.

A .npy file includes metadata such as the data type and shape, which enables efficient and accurate reconstruction of the array when it is loaded. In a .npz file, each array is stored as a separate .npy file within the archive and is accessed by its key.

Writing into .npy files

To write .npy files in Python we can use the np.save() function from NumPy. This function stores a single NumPy array in a binary format together with its metadata, which makes it efficient for saving large datasets.

Here are the steps to be followed to write the data into the .npy file −

  • Import NumPy: First we need to import NumPy.
  • Prepare the Data: We need to have a NumPy array that we want to save.
  • Save the Array: We have to use the np.save() function to save the array to a .npy file.

Example

import numpy as np

# Create a NumPy array
array_data = np.array([1, 2, 3, 4, 5])

# Save the array to a .npy file
np.save('/files/written_npyfile.npy', array_data)
print("Data saved to the npy file")

Here is the output after writing into the .npy file with the help of np.save() function −

Data saved to the npy file

Reading the .npy files

To read .npy files in Python we use the np.load() function from NumPy. This function loads a .npy file and restores the array with its original shape and data type.

Below are the steps to be followed to read the data from the .npy file −

  • Import NumPy: First we need to import NumPy.
  • Load the Data: We have to use the np.load() function to read the .npy file by specifying the filename.

Example

import numpy as np

# Load a .npy file
array = np.load('/files/written_npyfile.npy')
print(array)

Here is the output after reading the .npy file with the help of np.load() function −

[1 2 3 4 5]
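
To illustrate that the data type and shape survive a save and load cycle, here is a small self-contained sketch. The array contents and file names are assumptions for the example. It also shows that np.save() appends the .npy extension when the given filename does not already have it.

```python
import os
import tempfile

import numpy as np

# A 2 x 3 array with a non-default data type
grid = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "grid.npy")
    np.save(npy_path, grid)
    restored = np.load(npy_path)

    # Saving without an extension creates 'grid_copy.npy'
    base_path = os.path.join(tmp_dir, "grid_copy")
    np.save(base_path, grid)
    extension_added = os.path.exists(base_path + ".npy")

print(restored.dtype, restored.shape)
print(extension_added)
```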

Reading the .npz files

To read .npz files in Python we can use the np.load() function from NumPy. An .npz file is a zip archive (optionally compressed) containing multiple .npy files, each corresponding to a separate array. Loading an .npz file returns an NpzFile object which behaves like a dictionary. Each array inside the .npz file can be accessed by its corresponding key.

Below are the steps to be followed to read the data from the .npz file −

  • Import NumPy: First we must import the NumPy library.
  • Load the .npz file: For loading the .npz file we have to use the np.load() function.
  • Access the Arrays: Access the individual arrays inside the .npz file by their keys, which are the variable names used when the .npz file was saved.

Example

import numpy as np

# Load the .npz file
data = np.load('/files/data_arrays.npz')

# Access individual arrays using their keys
array1 = data['array1']
array2 = data['array2']

# Print the arrays
print(array1)
print(array2)

# Print all keys in the .npz file
print(data.files)

Following is the output after reading the .npz file with the help of np.load() function −

[1 2 3]
[[4 5]
 [6 7]]
['array1', 'array2']
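
An NpzFile keeps the underlying archive open and reads each array only when its key is accessed, so it can be helpful to use it as a context manager. The sketch below assumes the same two array names as the example above and creates the archive in a temporary directory first so that it runs on its own.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "data_arrays.npz")
    np.savez(npz_path, array1=np.array([1, 2, 3]), array2=np.array([[4, 5], [6, 7]]))

    # The with-statement closes the archive once the arrays have been read
    with np.load(npz_path) as archive:
        names = list(archive.files)
        arrays = {name: archive[name] for name in names}

for name, value in arrays.items():
    print(name, value.shape)
```

Copying the arrays into an ordinary dictionary, as done here, keeps them available after the archive has been closed.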

Writing the .npz files

To write .npz files in Python we can use the np.savez() function from NumPy. It stores several arrays in a single zip archive, with each array saved as a separate .npy file under the name we give it. These names later serve as the keys for retrieving the arrays with np.load().

Below are the steps to be followed to write the data into the .npz file −

  • Import NumPy: First we should make sure that NumPy is imported in our script.
  • Prepare the data: We can store multiple arrays in a .npz file. Each array is stored under a unique name, similar to a dictionary key-value pair.
  • Save the Data to a .npz File: We can use the np.savez() function to save multiple arrays to a .npz file. Alternatively, we can use the np.savez_compressed() function to compress the .npz file and reduce the file size.

Example

import numpy as np

# Create two NumPy arrays
array1 = np.array([1, 2, 3])
array2 = np.array([[4, 5], [6, 7]])

# Save arrays into a .npz file
np.savez('/files/data_arrays.npz', array1=array1, array2=array2)

# Alternatively, to save with compression
np.savez_compressed('/files/data_arrays_compressed.npz', array1=array1, array2=array2)
print("Files saved with and without compression")

Following is the output of writing the .npz files without and with compression −

Files saved with and without compression
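
How much space compression saves depends on the data. As a rough illustration, the sketch below saves the same highly repetitive array (all zeros, an assumption chosen because it compresses very well) with both functions and compares the resulting file sizes.

```python
import os
import tempfile

import numpy as np

# One million float64 zeros: about 8 MB of raw data
zeros = np.zeros((1000, 1000))

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "plain.npz")
    packed_path = os.path.join(tmp_dir, "packed.npz")

    np.savez(plain_path, zeros=zeros)
    np.savez_compressed(packed_path, zeros=zeros)

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

print("uncompressed bytes:", plain_size)
print("compressed bytes:", packed_size)
```

Data with little repetition, such as random noise, may shrink far less, so the extra time spent compressing is a trade-off to judge for each dataset.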

Working with Sparse Matrices

Working with sparse matrices is an important concept when dealing with large datasets where most of the elements are zero. Sparse matrices are memory-efficient as they only store the non-zero elements and their positions rather than the entire matrix. SciPy provides various functions for working with sparse matrices.

Sparse Matrix Formats in SciPy

SciPy offers several sparse matrix formats each optimized for different types of operations −

  • CSR (Compressed Sparse Row): Efficient for row slicing and matrix-vector products.
  • CSC (Compressed Sparse Column): Efficient for column slicing and matrix-vector products.
  • COO (Coordinate List): Efficient for constructing sparse matrices from (row, column, value) triplets and for fast conversion to CSR or CSC.
  • LIL (List of Lists): Efficient for constructing sparse matrices incrementally, one element or row at a time.
  • DIA (Diagonal): Efficient for diagonal matrices.
  • BSR (Block Sparse Row): Efficient for block-sparse matrices.
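
A common pattern that follows from the list above is to build a matrix in a construction-friendly format and then convert it to CSR for arithmetic. The following is a minimal sketch with made-up values. It builds the same 3 x 3 matrix once from COO triplets and once by element assignment in LIL, and converts both to CSR.

```python
import numpy as np
from scipy.sparse import coo_matrix, lil_matrix

# COO: supply all (row, column, value) triplets at once
rows = np.array([0, 1, 2])
cols = np.array([2, 0, 1])
vals = np.array([5, 7, 9])
coo = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# LIL: fill the matrix one element at a time
lil = lil_matrix((3, 3), dtype=np.int64)
lil[0, 2] = 5
lil[1, 0] = 7
lil[2, 1] = 9

# Convert both to CSR before doing arithmetic
csr_from_coo = coo.tocsr()
csr_from_lil = lil.tocsr()

# Matrix-vector product with a vector of ones gives the row sums
product = csr_from_coo @ np.ones(3)

print(csr_from_coo.format, csr_from_lil.format)
print(product)
```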

Saving and Loading Sparse Matrices

We can save and load sparse matrices using the scipy.sparse module and the .npz format. A sparse matrix is stored as the set of arrays that define it (for example its data and index arrays), which an .npz archive can hold together in one file.

Saving Sparse Matrices

To save sparse matrices we use scipy.sparse.save_npz(), which saves a sparse matrix to a .npz file. The function takes the filename and the sparse matrix to save, plus an optional compressed flag that defaults to True. Following is an example which saves a sparse matrix −

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import save_npz

# Create a sparse matrix (CSR format)
data = np.array([1, 2, 3, 4])
row_indices = np.array([0, 1, 2, 3])
col_indices = np.array([0, 1, 2, 3])
sparse_matrix = csr_matrix((data, (row_indices, col_indices)), shape=(4, 4))

# Save the sparse matrix to a .npz file
save_npz('/files/sparse_matrix.npz', sparse_matrix)
print("Done saving the sparse matrices")

Following is the output of saving the sparse matrix using the scipy library −

Done saving the sparse matrices

Loading Sparse Matrices

To load a sparse matrix we use scipy.sparse.load_npz(). This function loads a sparse matrix stored in .npz format and returns it in the sparse format it was saved in, such as CSR or CSC. Here is an example of loading the sparse matrix −

from scipy.sparse import load_npz

# Load the sparse matrix from a .npz file
loaded_sparse_matrix = load_npz('/files/sparse_matrix.npz')

# Print the loaded sparse matrix
print(loaded_sparse_matrix)

Following is the output of loading the sparse matrix using the scipy library −

<Compressed Sparse Row sparse matrix of dtype 'int32'
        with 4 stored elements and shape (4, 4)>
  Coords        Values
  (0, 0)        1
  (1, 1)        2
  (2, 2)        3
  (3, 3)        4
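
To check that a round trip preserves both the values and the storage format, a sketch along these lines may help. It uses a CSC matrix with made-up entries and a temporary file, and compares the result with the original.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csc_matrix, load_npz, save_npz

# A small matrix stored in CSC format (values chosen for illustration)
original = csc_matrix(np.array([[0, 1, 0], [2, 0, 0], [0, 0, 3]]))

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "sparse_matrix.npz")
    save_npz(npz_path, original)
    restored = load_npz(npz_path)

# The format is stored in the file, so CSC comes back as CSC
print(restored.format)

# Compare values by converting both to dense arrays (fine for small matrices)
print(np.array_equal(original.toarray(), restored.toarray()))
```

For large matrices, converting to dense arrays defeats the purpose of sparse storage, so comparing with (original != restored).nnz == 0 is a more memory-friendly check.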

The next chapter covers the other file types that SciPy can work with.
