numpy: provides data containers for linear algebra, e.g. vectors and matrices, plus basic linear algebra algorithms
matplotlib: provides plotting routines
scipy: common algorithms on vectors and matrices, e.g. statistics and routines from numerical analysis
scikits: for special topics, e.g. scikit-learn for machine learning, others for time series analysis, image analysis, statistics, etc.
pandas: tries to mimic the R data.frame type. In contrast to numpy containers, every column has its own type. See http://pandas.pydata.org/pandas-docs/stable/10min.html#min
Before we start: check that import numpy, import scipy and import matplotlib work on your machine.
numpy: Python module with efficient data structures for vectors and matrices.
If you are not familiar with vectors: a vector is like a list of floating point or integer numbers, all having the same type.
Before we start we import numpy and give it a shorter name:
import numpy as np
print(np.__version__)
One way to create a vector is to construct it from a given Python list.
For example we use the list [1, 2, 3, 4] for creating a vector using the array function from numpy:
a = np.array([1, 2, 3, 4])
print(a)
This looks like a list without commas.
To see what common type the elements have:
print(a.dtype)
So this is a vector of 64 bit integers (the default integer type on most platforms).
But in most cases we need a vector with floating point values.
Either start with a list of floating point values:
a = np.array([1.0, 2.0, 3.0, 4.0])
print(a)
print(a.dtype)
Or pass the desired type to array as follows:
a = np.array([1, 2, 3, 4], dtype=float)
print(a)
print(a.dtype)
Often you need an equispaced vector. To create a vector of 10 elements starting at 1.0 and ending with 3.0, the linspace function will help us:
x = np.linspace(1.0, 3.0, 10)
print(x)
print(x.dtype)
If you have the starting value, the end value and the distance between two values of the vector, arange does the job:
x = np.arange(2, 4, .5) # upper limit is exclusive !!!
print(x)
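To compare the two functions side by side, here is a minimal sketch that builds the same vector once by number of points and once by step size. The variable names by_count, by_step and spacing are chosen for illustration only.

```python
import numpy as np

# linspace: we fix the number of points, both end points are included.
by_count = np.linspace(2.0, 3.5, 4)

# arange: we fix the step size, the upper limit (4.0) is excluded.
by_step = np.arange(2.0, 4.0, 0.5)

print(by_count)
print(by_step)

# The spacing used by linspace is (stop - start) / (num - 1):
spacing = (3.5 - 2.0) / (4 - 1)
print(spacing)
```

As a rule of thumb, linspace is the safer choice when the end point must be part of the vector.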
A so-called ufunc (numpy slang for "universal function") works on a vector element by element. That is, it creates a new vector from a given one by transforming every element of the input vector.
This is shorter code than using lists, and much faster for large vectors:
print(x)
y = np.sin(x)
print(y)
Using the already introduced math module does not work here, because its functions expect single numbers, not vectors:
import math
math.sin(y)
Other ufunc functions: np.exp etc., in fact most of the functions you find in the math module.
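The difference between the math module and a ufunc can be sketched as follows. The names angles, looped and vectorized are made up for this illustration.

```python
import math
import numpy as np

angles = np.array([0.0, 0.5, 1.0])

# math.sin expects a single number, so we have to loop ourselves:
looped = np.array([math.sin(value) for value in angles])

# np.sin is a ufunc and handles the whole vector in one call:
vectorized = np.sin(angles)

print(np.allclose(looped, vectorized))

# Calling math.sin on the whole vector raises a TypeError:
try:
    math.sin(angles)
except TypeError as error:
    print("math.sin failed:", error)
```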
Operations such as * involving two vectors are element-wise too:
print(x * x)
print(x / x)
print(x ** 3)
So-called broadcast operations: one argument is a plain float, the other one is a numpy array. The operations are again element-wise:
print(3 * x)
x -= 0.1
print(x)
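One pitfall with in-place operations such as -= is the element type of the vector. The following sketch, with the hypothetical names counts, values and shifted, shows why a float vector is usually what you want:

```python
import numpy as np

counts = np.array([1, 2, 3, 4])                # integer vector
values = np.array([1, 2, 3, 4], dtype=float)   # float vector

values -= 0.1          # fine: float array and float scalar
print(values)

try:
    counts -= 0.1      # the float result can not be stored in an integer array
except TypeError as error:
    print("in-place update failed:", error)

shifted = counts - 0.1  # not in place: numpy creates a new float array
print(shifted.dtype)
```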
Aggregation functions map a vector to a single number:
print(x)
print(np.min(x), np.max(x))
print(np.mean(x), np.std(x), np.std(x, ddof=1))
What is this ddof parameter? See http://stackoverflow.com/questions/7482205/precision-why-do-matlab-and-python-numpy-give-so-different-outputs/7482413#7482413
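In short, ddof ("delta degrees of freedom") changes the divisor in the standard deviation from N to N - ddof. A small sketch that recomputes both variants by hand (samples and the other names are chosen for this example only):

```python
import numpy as np

samples = np.array([1.9, 2.4, 2.9, 3.4])
n = len(samples)
squared_deviations = (samples - np.mean(samples)) ** 2

# ddof=0 (the default): divide by n
std_population = np.sqrt(np.sum(squared_deviations) / n)
# ddof=1: divide by n - 1
std_sample = np.sqrt(np.sum(squared_deviations) / (n - 1))

print(std_population, np.std(samples))
print(std_sample, np.std(samples, ddof=1))
```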
np.sum(x), np.prod(x)
Vector length (the Euclidean norm, not the number of elements):
np.linalg.norm(x)
Dot product, also called "scalar" or "inner" product:
np.dot(x, y)
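Norm and dot product are closely related, which the following sketch illustrates with two small vectors u and w (names made up for this example):

```python
import numpy as np

u = np.array([3.0, 4.0])
w = np.array([1.0, 2.0])

print(np.linalg.norm(u))       # Euclidean length: sqrt(3**2 + 4**2)
print(np.sqrt(np.dot(u, u)))   # the same value computed via the dot product
print(np.dot(u, w))            # 3 * 1 + 4 * 2
print(np.sum(u * w))           # dot product written as element-wise product plus sum
print(len(u))                  # number of elements, not the norm
```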
Before we start, we print x again, so it is easier to understand the output of the following examples:
print(x)
Slicing works as usual:
y = x[1: -1]
print(type(y))
print(y)
Select values at given positions:
print(x[[0, 2, 3]])
Select values with a certain property, e.g. values which are larger than 2.0:
print(x[x > 2.0]) # read this as "x where x is greater than 2.0"
How does this work ?
First observation: we can index a numpy array with another numpy array of the same size holding boolean values:
idx = np.array([True, False, True, False])
print(x[idx])
Second observation: x > 2.0 delivers such a boolean array:
print(x > 2.0)
This syntax can be used on the left side of = for assigning values:
print(x)
x[x > 2.0] = -1
print(x)
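Boolean arrays can also be combined. The sketch below goes slightly beyond the examples above and uses the operators & (and) and ~ (not). The names data, inside and clipped are chosen for illustration:

```python
import numpy as np

data = np.array([1.9, 2.4, 2.9, 3.4])

# Combine two conditions with &, the parentheses are required:
inside = (data > 2.0) & (data < 3.0)
print(inside)
print(data[inside])

# True counts as 1 and False as 0, so the sum counts the matching elements:
print(np.sum(inside))

clipped = data.copy()     # work on a copy, so that data stays unchanged
clipped[~inside] = 0.0    # ~ negates the boolean array
print(clipped)
```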
We handle matrices at the end of the script.
# IGNORE THE LINE BELOW, IT IS ONLY FOR CREATING THIS SCRIPT!
%matplotlib inline
x = np.linspace(0.0, 2 * np.pi, 20)
y = np.sin(x)
import pylab
pylab.plot(x, y)
pylab.show()
Blue is the standard color. To change this use:
pylab.plot(x, y, "green")
pylab.show()
To plot dots, e.g. red dots:
pylab.plot(x, y, "r.")
pylab.show()
pylab.plot(x, y)
pylab.plot(x, y, "ro") # "o" means: big dots
pylab.plot(x, np.cos(x), "green")
pylab.show()
pylab.plot(x, y)
pylab.grid(True)
pylab.show()
pylab.plot(x, y, label="wave")
pylab.plot(x, np.cos(x), label="shifted wave")
pylab.grid(True)
pylab.xlabel("time")
pylab.ylabel("amplitude")
pylab.title("sine wave over time")
pylab.legend() # activates the legend
pylab.show()
# 1 row, 2 columns, first plot:
pylab.subplot(1, 2, 1)
pylab.plot(x, y, label="sin")
pylab.legend()
# 1 row, 2 columns, second plot:
pylab.subplot(1, 2, 2)
pylab.plot(x, np.cos(x), "green")
pylab.grid(True)
pylab.show()
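The same two-panel figure can also be written with matplotlib.pyplot and explicit axes objects instead of pylab. This is only a sketch of an alternative style, and the names t, fig, left and right are chosen for this example:

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0.0, 2 * np.pi, 20)

# 1 row, 2 columns: subplots returns the figure and one axes object per plot
fig, (left, right) = plt.subplots(1, 2)

left.plot(t, np.sin(t), label="sin")
left.legend()

right.plot(t, np.cos(t), color="green")
right.grid(True)
```

In a notebook the figure is displayed after the cell has run. In a plain script you would finish with plt.show() or fig.savefig(...).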
pylab.plot(x, y)
pylab.savefig("sin.png")
!ls -l sin.png
N = 50
x = np.random.rand(N) # N random numbers, uniformly distributed in range 0 .. 1
y = np.random.rand(N) # ditto
radiuses = 15 * np.random.rand(N) # N uniformly distributed random radiuses in range 0 .. 15
areas = np.pi * radiuses ** 2 # you see numpy here ?
colors = np.random.rand(N) # and N random colors
pylab.scatter(x, y, s=areas, c=colors, alpha=0.5)
pylab.show()
N = 2500
values_1 = 0.5 * np.random.randn(N) # N normally distributed values
values_2 = 0.2 + np.tan(0.2 * np.random.randn(N)) # some strange distribution
# alpha is transparency, "stepfilled" omits the lines around the single rectangles:
pylab.hist(values_1, bins=50, color="green", alpha=0.3, histtype="stepfilled")
pylab.hist(values_2, bins=50, color="red", alpha=0.6)
pylab.show()
That's not all folks, many more examples at:
Beyond matplotlib, there are some new libraries, see:
scipy provides stable and high quality algorithms for lots of math related tasks.
Overview: http://docs.scipy.org/doc/scipy/reference/tutorial/index.html
For example, scipy.integrate.quad takes the function to integrate and the lower and upper integration limits:
import scipy.integrate
area, error_estimate = scipy.integrate.quad(np.sin, 0.0, np.pi)
print(area)
print(error_estimate)
An example with a more complicated function:
def strange(x):
    return np.sin(np.cos(x) + 1.0)
print(scipy.integrate.quad(strange, 0.0, 1.0))
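To get a feeling for the accuracy, we can compare quad with integrals whose exact value is known. The sketch below uses the integral of sin from 0 to pi (exactly 2) and, as an extra, the Gaussian integral over the whole real axis (exactly sqrt(pi)). The variable names are chosen for this example:

```python
import numpy as np
import scipy.integrate

# integral of sin(x) from 0 to pi is -cos(pi) + cos(0) = 2
exact = 2.0
approx, error_bound = scipy.integrate.quad(np.sin, 0.0, np.pi)
print(abs(approx - exact), error_bound)

# quad also accepts infinite integration limits:
gauss, _ = scipy.integrate.quad(lambda t: np.exp(-t ** 2), -np.inf, np.inf)
print(gauss, np.sqrt(np.pi))
```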
We demonstrate how to fit $$f(x) = a \sin(2 x) + b \cos(c + x)$$ to given data:
def function(x, a, b, c):
    """x is a vector holding the x values for evaluation
    a, b and c are parameters of the function
    """
    return a * np.sin(2 * x) + b * np.cos(c + x)
Generate test data:
a = 1.0
b = 2.0
c = 0.1
x = np.linspace(0.0, 6.0, 40)
# artificial y(x) for given parameters:
y = function(x, a, b, c)
# we add normally distributed "noise":
y_measured = y + 0.5 * np.random.randn(len(x))
pylab.plot(x, y)
pylab.plot(x, y_measured, "o")
pylab.show()
Now we want to calculate estimates for a, b and c from x and y_measured:
import scipy.optimize
estimated, __ = scipy.optimize.curve_fit(function, x, y_measured)
print(a, b, c)
print(estimated)
a_est, b_est, c_est = estimated
y_estimated = function(x, a_est, b_est, c_est)
pylab.plot(x, y, label="y")
pylab.plot(x, y_estimated, label="y_estimated")
pylab.plot(x, y_measured, "o")
pylab.legend()
pylab.show()
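The second return value of curve_fit, which is ignored above, is the estimated covariance matrix of the parameters. A common way to use it is to take the square roots of its diagonal as rough standard errors. The following sketch repeats the fit with a fixed random seed. The names model, rng, noisy and std_errors as well as the start values p0 are assumptions made for this illustration:

```python
import numpy as np
import scipy.optimize


def model(t, a, b, c):
    """Same model as above: a * sin(2 t) + b * cos(c + t)."""
    return a * np.sin(2 * t) + b * np.cos(c + t)


rng = np.random.default_rng(0)   # fixed seed, so the run is reproducible
t = np.linspace(0.0, 6.0, 40)
true_params = (1.0, 2.0, 0.1)
clean = model(t, *true_params)
noisy = clean + 0.5 * rng.standard_normal(len(t))

# p0 holds the start values for a, b and c
params, covariance = scipy.optimize.curve_fit(model, t, noisy, p0=[1.0, 1.0, 0.0])
std_errors = np.sqrt(np.diag(covariance))

for name, value, error in zip("abc", params, std_errors):
    print(f"{name} = {value:.3f} +/- {error:.3f}")
```

The standard errors are only meaningful if the model is appropriate and the noise is roughly normally distributed, as it is in this artificial example.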
numpy
To create a matrix you can start from a nested list, similar to the way we created vectors in the beginning of the script:
m = np.array([[1, 2, 3], [2, 3, 4]], dtype=float)
print(m)
m has two rows and three columns:
print(m.shape)
For special matrices we have functions eye, ones and zeros:
print(np.eye(3)) # identity matrix
print(np.ones((3, 2), dtype=float))
print(np.zeros((2, 3), dtype=float))
ufuncs work again:
print(np.sin(m))
+ and * work too:
print(m + 1)
print(2 * m)
print(m + m)
* is element-by-element multiplication, not matrix-matrix multiplication:
print(m * m)
.T transposes:
print(m.T)
The dimensions of m and m.T fit for matrix-matrix multiplication. As said, * does not do this, so we have to use the dot function:
print(np.dot(m, m.T))
Python 3.5 introduced the new operator @ solely for the purpose of simplifying matrix multiplication in numpy code:
print(m @ m.T)
This notation is much more readable for longer expressions and very similar to the mathematical notation:
print(m @ m.T @ m @ m.T)
np.dot(m, np.dot(m.T, np.dot(m, m.T)))
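Whether two matrices "fit" depends on their shapes: the number of columns of the left factor must equal the number of rows of the right factor. A short sketch, using a separate matrix named mat for illustration:

```python
import numpy as np

mat = np.array([[1, 2, 3], [2, 3, 4]], dtype=float)   # shape (2, 3)

print((mat @ mat.T).shape)   # (2, 3) times (3, 2) gives (2, 2)
print((mat.T @ mat).shape)   # (3, 2) times (2, 3) gives (3, 3)

# (2, 3) times (2, 3) does not fit and raises a ValueError:
try:
    mat @ mat
except ValueError as error:
    print("shapes do not fit:", error)
```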
To solve the equation system
$x + 2 y + 3 z = 14$
$x + y = 3$
$y + 2 z = 8$
we start with the coefficient matrix:
A = np.array([[1, 2, 3], [1, 1, 0], [0, 1, 2]], dtype=float)
print(A)
The right hand side is
y = np.array([14, 3, 8], dtype=float)
And the solution is computed using linalg.solve from numpy:
x = np.linalg.solve(A, y)
print(x)
To check our result, we "insert x into the equation" and compare it to y:
print(np.dot(A, x) - y)
print(A @ x - y)
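Because of rounding errors the differences above are in general tiny numbers rather than exact zeros, so np.allclose is a convenient way to do such a check. The sketch below repeats the example with the illustrative names coefficients, rhs and solution and also compares with the explicit inverse:

```python
import numpy as np

coefficients = np.array([[1, 2, 3], [1, 1, 0], [0, 1, 2]], dtype=float)
rhs = np.array([14, 3, 8], dtype=float)

solution = np.linalg.solve(coefficients, rhs)
print(solution)

# Compare up to rounding errors instead of looking at the differences:
print(np.allclose(coefficients @ solution, rhs))

# Same result via the inverse matrix:
via_inverse = np.linalg.inv(coefficients) @ rhs
print(np.allclose(solution, via_inverse))
```

For solving a single system, np.linalg.solve is usually preferred over computing the inverse, as it needs less work and tends to be more accurate.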