Scientific Python: NumPy

Last update: 19. Mar 2019

NumPy: numerical computing with powerful numerical arrays objects, and routines to manipulate them. http://www.numpy.org/

[http://scipy-lectures.org/intro/intro.html#the-scientific-python-ecosystem]

NumPy is:

Prerequisites

  • Python 3
  • Python IDE
  • Project folder w/ virtual environment set up
  • NumPy:
    $ pip install numpy
In [1]:
import numpy as np # NumPy import convention

Intro

NumPy arrays are like lists:

In [4]:
l = list(range(1_000_000))
print(type(l), l[:10])

a = np.array(l)
print(type(a), a[:10])
<class 'list'> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'numpy.ndarray'> [0 1 2 3 4 5 6 7 8 9]

... but fast:

In [3]:
import time

t_start = time.time()

l_sum = sum([i*i for i in l])

print("list comprehension mul + sum took",  time.time() - t_start, "seconds")
print(l_sum)


t_start = time.time()

a_sum = (a*a).sum()

print("ndarray mul + ndarray.sum took", time.time() - t_start, "seconds")
print(a_sum)
list comprehension mul + sum took 0.13187670707702637 seconds
333332833333500000
ndarray mul + ndarray.sum took 0.007658958435058594 seconds
333332833333500000

Note: NumPy methods (functions) are implemented in C. Python itself is intepreted, not compiled, hence slow. If you care for speed of your computations take care to use NumPy all along.

Note: For most of the methods/properties/operators of NumPy arrays, there exists an underlying universal NumPy function (ufunc), such as for the .sum() method above there exists np.sum() ufunc.

Exercise

How fast do you think sum(a*a) and np.sum(a*a) will be in comparison? Test it.

Dimensions

The numpy.ndarray type name stands for NumPy N-dimensional array. In Python terms these can be thought of as nested lists of numbers of equal lengths at each level of nesting.

numpy.ndarray has following size-related properties:

  • .ndim - number of dimensions,
  • .shape - size of each dimension,
  • .size - total number of elements in the array.

0D array = scalar (number), 1D array = vector (list), 2D array = matrix (list of lists), but 3D or more are also frequently used (e.g. as lists of lists of matrices rather than as tensors):

In [5]:
def print_dim_info(a):
    print(a)
    print("ndim =", a.ndim)
    print("shape =", a.shape)
    print("size =", a.size)
    print()

num = np.array(0)
print_dim_info(num)
0
ndim = 0
shape = ()
size = 1

In [6]:
vec = np.array([0, 1, 2])
print_dim_info(vec)
[0 1 2]
ndim = 1
shape = (3,)
size = 3

In [7]:
mat = np.array([
    [0, 1, 2],
    [3, 4, 5],
])
print_dim_info(mat)
[[0 1 2]
 [3 4 5]]
ndim = 2
shape = (2, 3)
size = 6

In [8]:
arr = np.array([
    [
        [0, 1, 2],
        [3, 4, 5],
    ],
    [
        [6, 7, 8],
        [9, 10, 11],
    ],
])
print_dim_info(arr)
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
ndim = 3
shape = (2, 2, 3)
size = 12

Excercise

  1. Without coding, can you guess output of the following commands: vec[1], mat[:,1], arr[:,:,1]), including shape? Check your answer.

  2. What does len() return for NumPy arrays?

Iterating over

for loop iterates over a first dimension of a np.ndarray array:

In [13]:
for i_el in vec:
    print(i_el)
print()

for i_row in mat:
    print(i_row)
print()

for i_mat in arr:
    print(i_mat)
print()
0
1
2

[0 1 2]
[3 4 5]

[[0 1 2]
 [3 4 5]]
[[ 6  7  8]
 [ 9 10 11]]

len() value and single dim indexing correspond to the for loop behaviour, much like you would expect from nested lists:

In [14]:
for i in range(len(mat)):
    print(mat[i])
print()
[0 1 2]
[3 4 5]

To iterate over all elements use np.nditer/np.ndenumerate

Note: NumPy arrays are stored row-wise (unlike in MATLAB where arrays are stored column-wise)

In [15]:
for i_el in np.nditer(arr):
    print(i_el, end=", ")
print()
print()

for i, i_el in np.ndenumerate(arr):
    print("a[{}] = {}".format(i, i_el))
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 

a[(0, 0, 0)] = 0
a[(0, 0, 1)] = 1
a[(0, 0, 2)] = 2
a[(0, 1, 0)] = 3
a[(0, 1, 1)] = 4
a[(0, 1, 2)] = 5
a[(1, 0, 0)] = 6
a[(1, 0, 1)] = 7
a[(1, 0, 2)] = 8
a[(1, 1, 0)] = 9
a[(1, 1, 1)] = 10
a[(1, 1, 2)] = 11

Looking for functions

  1. Your favourite Internet search engine
  2. On-line reference: https://docs.scipy.org/doc/numpy/reference/
  3. Built-in docs:
In [16]:
np.lookfor('iterate over elements')
np.nd*?
Search results for 'iterate over elements'
------------------------------------------
numpy.nditer
    Efficient multi-dimensional iterator object to iterate over arrays.
numpy.einsum_path
    Evaluates the lowest cost contraction order for an einsum expression by
numpy.lib.arrayterator.Arrayterator
    Buffered iterator for big arrays.

Ways of creating arrays

You've seen already that np.array() creates NumPy array from explicit nested lists. You will usually want to create NumPy arrays in more automated fashion.

Evenly spaced vectors by step size (default: 1), like range():

In [17]:
np.arange(1, 11, 2)
Out[17]:
array([1, 3, 5, 7, 9])

Evenly spaced vectors by number of points (default: 50):

In [18]:
np.linspace(0, 1, 10)
Out[18]:
array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

Single value N-dim arrays:

In [19]:
print( np.zeros(3), "\n" )
print( np.ones(3), "\n" )
print( np.zeros((3,3)), "\n" )
print( np.ones((3,3)) * 3, "\n" )
print( np.zeros((2,3,4)), "\n" )
[0. 0. 0.] 

[1. 1. 1.] 

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]] 

[[3. 3. 3.]
 [3. 3. 3.]
 [3. 3. 3.]] 

[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]] 

Diagonal matrices:

In [20]:
print( np.diag([1, 2, 3]), "\n" )
print( np.eye(3), "\n" ) # same as: np.identity(3)
[[1 0 0]
 [0 2 0]
 [0 0 3]] 

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]] 

Random samples:

In [21]:
# fix the random seed if needed
np.random.seed(123)

size = (2,3)
print( np.random.randint(low=0, high=3, size=size), "\n" )
print( np.random.uniform(size=size), "\n" ) # same as: np.random.rand(*size)
print( np.random.normal(size=size), "\n" ) # same as: np.random.randn(*size)
[[2 1 2]
 [2 0 2]] 

[[0.55131477 0.71946897 0.42310646]
 [0.9807642  0.68482974 0.4809319 ]] 

[[-1.61930007 -1.11396442 -0.44744072]
 [ 1.66840161 -0.14337247 -0.6191909 ]] 

Excercise

  1. Experiment with examples above.
  2. Create a lower triangular 9x9 matrix:
     | 1 0 ... 0 |
     | 2 1 ... 0 |
     ...
     | 8 7 ... 0 |
     | 9 8 ... 1 |
    Hint: check ?np.eye

(*) Grid of evenly spaced "coordinates" e.g to evaluate a 2D function on a grid:

In [23]:
nrow, ncol = 5, 3
x = np.linspace(0, 1, nrow)
y = np.linspace(0, 1, ncol)
xx, yy = np.meshgrid(x, y, indexing="ij")
print(xx)
print(yy)
#
# for i in range(nrow):
#     for j in range(ncol):
#         zz[i,j] = f(xx[i,j], yy[i,j])
#
[[0.   0.   0.  ]
 [0.25 0.25 0.25]
 [0.5  0.5  0.5 ]
 [0.75 0.75 0.75]
 [1.   1.   1.  ]]
[[0.  0.5 1. ]
 [0.  0.5 1. ]
 [0.  0.5 1. ]
 [0.  0.5 1. ]
 [0.  0.5 1. ]]

Basic data types

NumPy arrays elements have their data type objects numpy.dtype, corresponding to (machine-specific) C data types.

Data type of an element is accessible by .dtype property. By default, for numeric arrays, it's a float (notice the trailing .):

In [24]:
v1 = np.zeros(3)
print( v1 )
print( v1.dtype )
print( type(v1.dtype) )
[0. 0. 0.]
float64
<class 'numpy.dtype'>

but it can be implicitly inherited from Python objects:

In [25]:
v2 = np.array([0, 0, 0])
print( v2 )
print( v2.dtype )
[0 0 0]
int64

or specified explicitly via dtype keyword argument of many array-creating functions:

In [26]:
v3 = np.zeros(3, dtype=int)
print(v3)
print(v3.dtype)
[0 0 0]
int64

Data types determine a size of an array in the memory:

In [27]:
a10int = np.arange(10)
print( a10int.itemsize ) # in bytes; same as `a1.dtype.itemsize`
print( a10int.nbytes ) # .nbytes = .itemsize * .size
8
80

Some standard basic Python and NumPy data types and their corresponding sizes:

In [28]:
def print_itemsize(label, type_):
    print(label, "\titemsize =", np.dtype(type_).itemsize, "bytes")

print_itemsize("Python complex", complex)
print()
print_itemsize("Python bool", bool) # bool
print()
print_itemsize("Python int", int) # long long int
print_itemsize("NumPy uint", np.uint) # unsigned long long int
print_itemsize("NumPy int16", np.int16) # short int
print()
print_itemsize("Python float", float) # double
print_itemsize("NumPy float32", np.float32) # float
print()
print_itemsize("Python str", str) # unknown size
print_itemsize("NumPy 3 char", 'S3') # max-len (byte) strings
Python complex 	itemsize = 16 bytes

Python bool 	itemsize = 1 bytes

Python int 	itemsize = 8 bytes
NumPy uint 	itemsize = 8 bytes
NumPy int16 	itemsize = 2 bytes

Python float 	itemsize = 8 bytes
NumPy float32 	itemsize = 4 bytes

Python str 	itemsize = 0 bytes
NumPy 3 char 	itemsize = 3 bytes

String max-len is deduced, but note that Python3 strings are Unicode by default:

In [29]:
chararr = np.array(['abc', 'ef', 'g'])
print(chararr.dtype) # "<" = little-endian byte-order
print(chararr.itemsize)
<U3
12

Use np.chararray to create fixed size char arrays:

In [30]:
chararr2 = np.chararray(chararr.shape, itemsize=8)
chararr2[:] = chararr[:]
print(chararr2)
print(chararr2.dtype) # "|" = 'not applicable' byte-order
[b'abc' b'ef' b'g']
|S8

To change/cast types use .astype() method of an array:

In [31]:
print(chararr)
chararr3 = chararr.astype("S8")
print(chararr3)
print()

v1 = np.zeros(3)
print(v1)
v3 = v1.astype(int)
print(v3)
['abc' 'ef' 'g']
[b'abc' b'ef' b'g']

[0. 0. 0.]
[0 0 0]

See arrays.dtypes reference for data types string specifiers (like S3), or structured data types (with fields) for array elements.

Indexing and slicing

NumPy arrays use API for indexing and slicing compatible with Python (nested) lists:

In [32]:
arr = np.array([
    [
        [0, 1, 2],
        [3, 4, 5],
    ],
    [
        [6, 7, 8],
        [9, 10, 11],
    ],
])

mat = arr[0]
print(mat, "\n")

vec = mat[0]
print(vec, "\n")

num = vec[0]
print(num, "\n")
[[0 1 2]
 [3 4 5]] 

[0 1 2] 

0 

In [33]:
print(vec[-1])
print(vec[:2]) # start:end (note: not including last index)
print(vec[::2]) # start:end:step
print(vec[:0:-1]) # start is deduced as len(vec)-1; end=0 is not included
2
[0 1]
[0 2]
[2 1]

NumPy throws in multi-dim indexing, either using index tuples, or directly:

In [34]:
print( arr )
print( arr[0, 1, 2] ) # first matrix, second row, third column; same as: arr[(0, 1, 2)]
print( arr[0, 1] ) # first matrix, second row
print( arr[0, ::-1, 2] ) # first matrix, all rows reversed (start:end deduced), third column
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
5
[3 4 5]
[5 2]

Views vs. copies

Beware: slicing doesn't copy the original array in memory, but only creates a "view" on an existing memory block. There is no implicit copy-on-write behaviour here (like in MATLAB).

In [35]:
vec1 = np.arange(10)
print(vec1)
vec2 = vec1[::2]
print(vec2)
vec2[:] = vec2[::-1]
print(vec2)
print(vec1) # the original array has also changed!
[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]
[8 6 4 2 0]
[8 1 6 3 4 5 2 7 0 9]

Use np.may_share_memory() to check if two arrays may share memory (this is a heuristic check that may give false positives).

In [36]:
np.may_share_memory(vec1, vec2)
Out[36]:
True

You can make explicitly copies using .copy() method:

In [37]:
vec1 = np.arange(10)
vec2 = vec1[::2].copy() # copy, not a view
print(np.may_share_memory(vec1, vec2))
vec2[:] = vec2[::-1]
print(vec2)
print(vec1) # the original array is intact
False
[8 6 4 2 0]
[0 1 2 3 4 5 6 7 8 9]

"Fancy" indexing

"Fancy" indexing means indexing with boolean or integer masks, which is very convenient for filtering array elements:

In [38]:
vec = np.random.uniform(-1, 1, size=5)
print(vec)

mask = (vec < 0)
print(mask)

vec[vec < 0] = 0
print(vec)

vec = vec[vec != 0]
print(vec)
[ 0.47599081 -0.63501654 -0.64909649  0.06310275  0.06365517]
[False  True  True False False]
[0.47599081 0.         0.         0.06310275 0.06365517]
[0.47599081 0.06310275 0.06365517]

Using integer masks is like slicing, but with custom lists of indices for each dimension:

In [39]:
arr = np.diag(np.arange(1,6))
arr[:2, [0, 1]]
Out[39]:
array([[1, 0],
       [0, 2]])

When using NumPy arrays for indexing, shape of the index is used in the result:

In [40]:
vec = 1+np.arange(10)
idx = np.array([
    [1, 3],
    [4, 6],
])
vec[idx]
Out[40]:
array([[2, 4],
       [5, 7]])

Excercise

  1. Experiment with examples above. (What if mask has a different size? Can you combine slicing/lists and NumPy arrays in multi-dim indexing?)
  2. Write a matrix_elements_at function that takes a matrix and list of tuples of (i,j) positons in a matrix and returns a vector of elements at these positions. Test it: matrix_elements_at(np.diag(range(4)), [(3,3),(2,2),(0,1)]) should return a 3, 2, 0 vector.

Manipulating dimensions

Changes in dimensions of an array in most cases creates a view of the original array, not its copy.

Shape of an array can be changed using .reshape() method:

In [42]:
vec = np.arange(8)
print(vec, "\n")

mat = vec.reshape(2,4) # products of `.shape` must agree
print(mat, "\n")

arr = mat.reshape(2,2,2)
print(arr, "\n")

# .reshape() returns a view
arr[1,1,1] = 0
print(vec)
[0 1 2 3 4 5 6 7] 

[[0 1 2 3]
 [4 5 6 7]] 

[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]] 

[0 1 2 3 4 5 6 0]

You can conveniently get a vector view (or copy) of an array of any shape using .ravel() (or .flatten()):

In [43]:
print(arr.ravel()) # same as: arr.reshape(np.prod(arr.shape))
print(arr.flatten()) # same as: arr.ravel().copy()
[0 1 2 3 4 5 6 0]
[0 1 2 3 4 5 6 0]

To conveniently add dimension (of size 1) to an existing array you can use numpy.newaxis in combination with slicing:

In [44]:
vec = np.arange(10)
print(vec)
print(vec.shape)
print()

mat_row = vec[np.newaxis,:] # single row matrix; same as: vec.reshape(1, vec.shape[0])
print(mat_row.shape) 
print(mat_row)
print()

mat_col = vec[:,np.newaxis] # single column matrix (column vector)
print(mat_col.shape)
print(mat_col)
print()
[0 1 2 3 4 5 6 7 8 9]
(10,)

(1, 10)
[[0 1 2 3 4 5 6 7 8 9]]

(10, 1)
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]

In addition to .reshape() method there is also .resize() method which changes an array "in-place" and allows for size changes:

In [45]:
arr = np.arange(8)
print(arr, "\n")

arr.resize(2,5) # fill w/ zeros
print(arr, "\n")

arr.resize(2,2) # trim
print(arr, "\n")
[0 1 2 3 4 5 6 7] 

[[0 1 2 3 4]
 [5 6 7 0 0]] 

[[0 1]
 [2 3]] 

Side note: np.resize() function works differently - it creates a copy and fills-up added size with repeats of the original array (not zeros):

In [46]:
arr2 = np.resize(arr, (3,3))
print(arr2, "\n")
print(np.may_share_memory(arr, arr2))
[[0 1 2]
 [3 0 1]
 [2 3 0]] 

False

Transposition allows for easy re-shuffling of dimensions of an array:

In [47]:
mat = np.arange(6).reshape(3,2)
print(mat)
print(mat.T) # same as: mat.transpose()
[[0 1]
 [2 3]
 [4 5]]
[[0 2 4]
 [1 3 5]]

where .T always reverses order of dimensions and .transpose() allows for any permutation:

In [48]:
arr = mat.reshape(1,2,3)
print(arr.shape)
print(arr.T.shape)
print(arr.transpose(1, 2, 0).shape)
(1, 2, 3)
(3, 2, 1)
(2, 3, 1)

Concatenating and splitting

np.concatenate([...], axis=...) concatenates arrays along given axis (default: 0); np.hstack/np.vstack are convenience functions for a first (horizontal) and second (vectical) axis:

In [49]:
vec = np.arange(4)
vec2h = np.hstack([vec, vec]) # same as: np.concatenate([...])
print(vec2h)
vec2v = np.vstack([vec, vec])
print(vec2v)
[0 1 2 3 0 1 2 3]
[[0 1 2 3]
 [0 1 2 3]]
In [50]:
mat = vec.reshape(2,2)
mat2h = np.hstack([mat, mat]) # same as: np.concatenate([...])
print(mat2h)
mat2v = np.vstack([mat, mat]) # same as: np.concatenate([...], axis=1)
print(mat2v)
[[0 1 0 1]
 [2 3 2 3]]
[[0 1]
 [2 3]
 [0 1]
 [2 3]]

Opposite to concatenating functions are np.split, np.hsplit, np.vsplit, which take list of indices to split array with:

In [51]:
vec = np.arange(9)

# .split returns N+1 arrays, where N is nr of split indices
vec1, vec2, vec3 = np.split(vec, [3, 5])
print(vec1)
print(vec2)
print(vec3)
[0 1 2]
[3 4]
[5 6 7 8]
In [52]:
mat = vec.reshape(3,3)

mat_up, mat_down = np.vsplit(mat, [1])
print(mat_up)
print(mat_down)
print()

mat_left, mat_right = np.hsplit(mat, [1])
print(mat_left)
print(mat_right)
[[0 1 2]]
[[3 4 5]
 [6 7 8]]

[[0]
 [3]
 [6]]
[[1 2]
 [4 5]
 [7 8]]

Numerical operations

All operations are element-wise:

In [53]:
vec = np.arange(5)
print( 2 * vec )
print( vec + 1 )
print( vec + vec )
print( vec * vec ) # element-wise multiplication
[0 2 4 6 8]
[1 2 3 4 5]
[0 2 4 6 8]
[ 0  1  4  9 16]

For dot-product and matrix multiplication use @ or .dot:

In [54]:
vec = np.arange(3)
print( vec @ vec ) # same as: vec.dot(vec)
print()

mat = np.ones((3,3))
mat_id = np.eye(3)
print( mat * mat_id ) # element-wise multiplication
print( mat @ mat_id ) # same as: mat.dot(mat_id)
5

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
In [55]:
# vector times matrix is associative
print( vec @ mat_id )
print( mat_id @ vec )

# sanity check: matrix times matrix is not associative
print( mat_id @ vec.reshape(3,1) ) 
print( mat_id @ vec.reshape(1,3) )
[0. 1. 2.]
[0. 1. 2.]
[[0.]
 [1.]
 [2.]]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-bab654a5456a> in <module>
      5 # sanity check: matrix times matrix is not associative
      6 print( mat_id @ vec.reshape(3,1) )
----> 7 print( mat_id @ vec.reshape(1,3) )

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 3)

Element-wise logical operations:

In [56]:
a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)

print( np.logical_or(a, b) ) # for bool arrays same as bit-wise: a | b
print( np.logical_and(a, b) ) # for bool arrays asame as bit-wise: a & b

# but short-circuiting or/and won't work on arrays
a or b
[ True  True  True False]
[ True False False False]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-56-6d068b6cb513> in <module>
      6 
      7 # but short-circuiting or/and won't work on arrays
----> 8 a or b

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

NumPy has special numeric values np.inf, np.nan which propagate themselves as expected during vectorized numerical operations:

In [57]:
vec_div = np.hstack([np.ones(3), np.nan]) / np.arange(4)
print(vec_div)

print("is inf:", vec_div == np.inf)
print("is nan:", np.isnan(vec_div))
print("sum =", np.sum(vec_div))

# Note: there are built-in NaN-safe aggregations
print("nansum =", np.nansum(vec_div))
?np.nan*
[inf 1.  0.5 nan]
is inf: [ True False False False]
is nan: [False False False  True]
sum = nan
nansum = inf

All the basic math functions like np.abs, np.sqrt, np.log, np.sin etc. are implemented as vectorized functions.

Some basic array aggregations (reducing the dimension):

  • np.sum/np.prod: sum/product of elements
  • np.min/np.max: minimum/maximum value
  • np.argmin/np.argmax: index of minimum/maximum value
  • np.mean/np.median: mean of elements
  • np.std/np.var: standard deviation/variance of elements
  • np.any/np.all: check if any/all elements are True
  • np.array_equal/np.allclose: check if arrays are equal/approximately equal

Some basic array functions:

  • np.cumsum/np.cumprod: cumulative sum/product of elements
  • np.minimum/np.maximum: element-wise minimum/maximum
  • np.where/np.argwhere: find elements/indices of array satysfying a condition
  • np.logical_or/np.logical_and: elelemnt-wise or/and
  • np.sort/np.argsort: sort along an axis/using a mask

For linear see numpy.linalg module, but note that scipy.linalg module is recommended due to more efficient implementations (using BLAS and LAPACK).

If you know MATLAB, you might find this table of rough-euqivalents helpful.

Iterating to apply a function

If you have a pure-Python function, np.vectorize creates a convenience wrapper for a for loop

In [58]:
from math import sqrt

def transform(x, threshold=1):
    if x < threshold:
        return x*x
    return sqrt(abs(x))

vec = np.random.uniform(0, 2, size=6)
print( vec )

transform_each = np.vectorize(transform)
print( transform_each(vec) )

# using kwargs
print( transform_each(vec, threshold=0) )
[1.26880192 1.69886359 1.44891065 1.22204702 1.44488677 0.64591783]
[1.12641108 1.30340461 1.20370704 1.10546236 1.20203443 0.41720984]
[1.12641108 1.30340461 1.20370704 1.10546236 1.20203443 0.80369013]

Use np.apply_along_axis (np.apply_over_axes) to apply function along a chosen axis (over chosen axes):

In [59]:
mat = vec.reshape(2,3)
# apply row-wise
print( np.apply_along_axis(transform_each, 0, mat) )
# apply col-wise
print( np.apply_along_axis(transform_each, 1, mat) )

# Note: since our transfromation is axis-agnostic, we can just apply it directly to the matrix
print( transform_each(mat) )
[[1.12641108 1.30340461 1.20370704]
 [1.10546236 1.20203443 0.41720984]]
[[1.12641108 1.30340461 1.20370704]
 [1.10546236 1.20203443 0.41720984]]
[[1.12641108 1.30340461 1.20370704]
 [1.10546236 1.20203443 0.41720984]]

Remember: if possible, before using for loop / np.vectorize / np.apply_along_axis / np.apply_over_axes, check if there is a way to compute what you want using only NumPy functions.

Excercise

  1. Write a pure-NumPy implementation of transform_each function. Note: indexing creates views, not copies.
  2. For each matrix arr[i,:,:], where arr = np.arrange(12).reshape(2, 2, 3), compute a sum of its all elements. Try at least two different implementations (hint: np.lookfor('sum dimension')).

Broadcasting

NumPy can do operations on arrays of different sizes if they can be replicated in a dimension to match a common shape.

Src: https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html

"replicated in a dimension" does not mean "cycled" - broadcasting only works with new or existing dimensions of size 1:

In [62]:
np.ones(8).reshape(2,4) + np.arange(2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-3ec60fce4a3c> in <module>
----> 1 np.ones(8).reshape(2,4) + np.arange(2)

ValueError: operands could not be broadcast together with shapes (2,4) (2,) 

Excercise

  1. Add a row number to each element in a random 3x4 matrix.
  2. Create a 3x5 matrix of odd numbers starting with 0 value at position (0,0) and increasing by 2 with both each row and each column:
     |  0  2  4  6  8 |
     |  2  4  6  8 10 |
     |  4  6  8 10 12 |