bundles / scipy latest / scipy / linalg / _sketches / clarkson_woodruff_transform
function
scipy.linalg._sketches:clarkson_woodruff_transform
source: /scipy/linalg/_sketches.py :57
Signature
def clarkson_woodruff_transform ( input_matrix , sketch_size , rng = None , * , seed = None ) Summary
Applies a Clarkson-Woodruff Transform/sketch to the input matrix.
Extended Summary
Given an input_matrix A of size (n, d), compute a matrix A' of size (sketch_size, d) so that
with high probability via the Clarkson-Woodruff Transform, otherwise known as the CountSketch matrix.
The documentation is written assuming array arguments are of specified "core" shapes. However, array argument(s) of this function may have additional "batch" dimensions prepended to the core shape. In this case, the array is treated as a batch of lower-dimensional slices; see linalg_batch for details.
Parameters
input_matrix: array_like, shape (..., n, d)Input matrix.
sketch_size: intNumber of rows for the sketch.
rng: {None, int, `numpy.random.Generator`}, optionalIf
rngis passed by keyword, types other than numpy.random.Generator are passed to numpy.random.default_rng to instantiate aGenerator. Ifrngis already aGeneratorinstance, then the provided instance is used. Specifyrngfor repeatable function behavior.If this argument is passed by position or
seedis passed by keyword, legacy behavior for the argumentseedapplies:If
seedis None (or numpy.random), the numpy.random.RandomState singleton is used.If
seedis an int, a newRandomStateinstance is used, seeded withseed.If
seedis already aGeneratororRandomStateinstance then that instance is used.
Returns
A': array_likeSketch of the input matrix
A, of size(sketch_size, d).
Notes
To make the statement
precise, observe the following result which is adapted from the proof of Theorem 14 of [2] via Markov's Inequality. If we have a sketch size sketch_size=k which is at least
Then for any fixed vector x,
with probability at least one minus delta.
This implementation takes advantage of sparsity: computing a sketch takes time proportional to A.nnz. Data A which is in scipy.sparse.csc_matrix format gives the quickest computation time for sparse input.
>>> import numpy as np >>> from scipy import linalg >>> from scipy import sparse >>> rng = np.random.default_rng() >>> n_rows, n_columns, density, sketch_n_rows = 15000, 100, 0.01, 200 >>> A = sparse.rand(n_rows, n_columns, density=density, format='csc') >>> B = sparse.rand(n_rows, n_columns, density=density, format='csr') >>> C = sparse.rand(n_rows, n_columns, density=density, format='coo') >>> D = rng.standard_normal((n_rows, n_columns)) >>> SA = linalg.clarkson_woodruff_transform(A, sketch_n_rows) # fastest >>> SB = linalg.clarkson_woodruff_transform(B, sketch_n_rows) # fast >>> SC = linalg.clarkson_woodruff_transform(C, sketch_n_rows) # slower >>> SD = linalg.clarkson_woodruff_transform(D, sketch_n_rows) # slowest
That said, this method does perform well on dense inputs, just slower on a relative scale.
Examples
Create a big dense matrix ``A`` for the example:import numpy as np from scipy import linalg n_rows, n_columns = 15000, 100 rng = np.random.default_rng() A = rng.standard_normal((n_rows, n_columns))✓
sketch_n_rows = 200 sketch = linalg.clarkson_woodruff_transform(A, sketch_n_rows, seed=rng) sketch.shape✓
linalg.norm(A) linalg.norm(sketch)✗
b = rng.standard_normal(n_rows) x = linalg.lstsq(A, b)[0] Ab = np.hstack((A, b.reshape(-1, 1))) SAb = linalg.clarkson_woodruff_transform(Ab, sketch_n_rows, seed=rng) SA, Sb = SAb[:, :-1], SAb[:, -1] x_sketched = linalg.lstsq(SA, Sb)[0]✓
linalg.norm(A @ x - b) linalg.norm(A @ x_sketched - b)✗
Aliases
-
scipy.linalg.clarkson_woodruff_transform