Compressed Sparse formats CSR and CSC in Python
In Python scientific computing, SciPy’s CSR and CSC formats efficiently store sparse matrices by keeping only non-zero values. CSR is ideal for fast row operations, while CSC is suited for quick column access and transposition.
What is CSR Format?
The CSR (Compressed Sparse Row) format is optimized for fast row slicing and matrix-vector products. It compresses the sparse matrix by storing only the non-zero elements, the column index of each one, and a pointer to where each row begins. The structure is divided into three one-dimensional arrays:
- Data: Stores the non-zero values, row by row.
- Indices: Stores the column index of each element in the data array.
- Indptr: Stores the index pointers to the start of each row in the data array. Row i occupies positions indptr[i] up to, but not including, indptr[i+1].
Example:
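import numpy as np
from scipy.sparse import csr_matrix

# Dense 3x3 matrix with three non-zero entries
m = np.array([
    [0, 0, 1],
    [4, 0, 0],
    [0, 0, 3]
])

# Convert the dense array to CSR format
csr = csr_matrix(m)
print(csr)

# Inspect the three arrays that make up the CSR structure
print("\nData:", csr.data)
print("Indices:", csr.indices)
print("Indptr:", csr.indptr)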
Output
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 3 stored elements and shape (3, 3)>
Coords Values
(0, 2) 1
(1, 0) 4
(2, 2) 3
Data: [1 4 3]
Indices: [2 0 2]
Indptr: [0 1 2 3]
Explanation:
- csr_matrix(m) converts the dense array to CSR format, storing only non-zero values efficiently.
- data, indices and indptr store the values, their column indices and row start positions in data.
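To make the role of indptr concrete, the following minimal sketch walks the three arrays by hand and lists the stored entries of each row. The helper name csr_row_entries is hypothetical and introduced only for illustration; it is not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix

m = np.array([
    [0, 0, 1],
    [4, 0, 0],
    [0, 0, 3],
])
csr = csr_matrix(m)

def csr_row_entries(mat, i):
    """Hypothetical helper: return (column, value) pairs stored for row i."""
    # Row i lives in the half-open slice indptr[i]:indptr[i + 1]
    start, end = mat.indptr[i], mat.indptr[i + 1]
    return [
        (int(col), int(val))
        for col, val in zip(mat.indices[start:end], mat.data[start:end])
    ]

rows = [csr_row_entries(csr, i) for i in range(csr.shape[0])]
for i, entries in enumerate(rows):
    print(f"row {i}: {entries}")
```

For the matrix above, each row holds exactly one stored entry, which is why indptr increases by one at every step.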
What is CSC Format?
The CSC (Compressed Sparse Column) format is optimized for fast column slicing and efficient arithmetic operations. It is similar to CSR but compresses the matrix column by column, storing only the non-zero elements, the row index of each one, and a pointer to where each column begins. The structure is divided into three one-dimensional arrays:
- Data: Stores the non-zero values, column by column.
- Indices: Stores the row index of each element in the data array.
- Indptr: Stores the index pointers to the start of each column in the data array. Column j occupies positions indptr[j] up to, but not including, indptr[j+1].
Example:
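import numpy as np
from scipy.sparse import csc_matrix

# Dense 3x3 matrix with three non-zero entries
m = np.array([
    [0, 0, 1],
    [4, 0, 0],
    [0, 0, 3]
])

# Convert the dense array to CSC format
csc = csc_matrix(m)
print(csc)

# Inspect the three arrays that make up the CSC structure
print("\nData:", csc.data)
print("Indices:", csc.indices)
print("Indptr:", csc.indptr)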
Output
<Compressed Sparse Column sparse matrix of dtype 'int64'
with 3 stored elements and shape (3, 3)>
Coords Values
(1, 0) 4
(0, 2) 1
(2, 2) 3
Data: [4 1 3]
Indices: [1 0 2]
Indptr: [0 1 1 3]
Explanation:
- csc_matrix(m) converts the dense array to CSC format, efficiently storing only non-zero values column-wise.
- data, indices and indptr store the non-zero values, their row indices and the starting index of each column in data.
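The same kind of walk works column by column for CSC. The sketch below lists the stored entries of each column and then checks, for this small matrix, that the CSC arrays of m match the CSR arrays of its transpose. The helper name csc_column_entries is hypothetical and used only for illustration.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

m = np.array([
    [0, 0, 1],
    [4, 0, 0],
    [0, 0, 3],
])
csc = csc_matrix(m)

def csc_column_entries(mat, j):
    """Hypothetical helper: return (row, value) pairs stored for column j."""
    # Column j lives in the half-open slice indptr[j]:indptr[j + 1]
    start, end = mat.indptr[j], mat.indptr[j + 1]
    return [
        (int(row), int(val))
        for row, val in zip(mat.indices[start:end], mat.data[start:end])
    ]

cols = [csc_column_entries(csc, j) for j in range(csc.shape[1])]
for j, entries in enumerate(cols):
    print(f"column {j}: {entries}")

# Compare the CSC arrays of m with the CSR arrays of m transposed
csr_t = csr_matrix(m.T)
same_layout = (
    np.array_equal(csc.data, csr_t.data)
    and np.array_equal(csc.indices, csr_t.indices)
    and np.array_equal(csc.indptr, csr_t.indptr)
)
print("CSC(m) matches CSR(m.T):", same_layout)
```

Column 1 has no stored entries, so its slice is empty. That is why the value 1 appears twice in a row in indptr.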
Difference Between CSR and CSC Formats
Knowing the difference between CSR and CSC helps you pick the right format for your task: CSR for fast row access, CSC for efficient column operations.
| Characteristic | CSR Format | CSC Format |
|---|---|---|
| Storage Orientation | Row-wise compression | Column-wise compression |
| Efficient Operations | Row slicing, matrix-vector multiplication | Column slicing, matrix transposition |
| Indexing | Uses row pointers to store the start of each row | Uses column pointers to store the start of each column |
| Applications | Solving linear systems, iterative methods, and sparse row operations | Matrix factorizations, eigenvalue problems, and sparse column operations |
| Conversion | Can be converted to/from other formats like CSC easily | Can be converted to/from other formats like CSR easily |
| Space Efficiency | Depends on the sparsity pattern | Depends on the sparsity pattern |
| Example Usage in Python | csr_matrix from SciPy | csc_matrix from SciPy |
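As a rough illustration of the table, the sketch below converts between the two formats with tocsc() and tocsr(), takes a row slice from the CSR matrix and a column slice from the CSC matrix, and computes a matrix-vector product. The vector v is an arbitrary example value chosen for this sketch.

```python
import numpy as np
from scipy.sparse import csr_matrix

m = np.array([
    [0, 0, 1],
    [4, 0, 0],
    [0, 0, 3],
])

csr = csr_matrix(m)
csc = csr.tocsc()    # CSR -> CSC
back = csc.tocsr()   # CSC -> CSR (round trip)
print(csr.format, csc.format, back.format)

# Row slicing is the natural access pattern for CSR
row1 = csr[1, :].toarray()
print("Row 1:", row1)

# Column slicing is the natural access pattern for CSC
col2 = csc[:, 2].toarray()
print("Column 2:", col2.ravel())

# Matrix-vector product with an arbitrary example vector
v = np.array([1, 2, 3])
product = csr @ v
print("csr @ v:", product)
```

Both formats support both kinds of slicing. The difference is in cost, which only becomes noticeable on much larger matrices than this one.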