A standard Python installation not only provides the Python interpreter, but also a huge collection of modules.
The reference documentation can be found at https://docs.python.org/3/library/index.html.
The website "Python Module of the Week" (https://pymotw.com/3/) introduces a selection of the standard library with more examples and less technical explanations.
import math
print(math.pi)
print(math.cos(math.atan(1)))
Python also supports complex numbers; the imaginary unit is written as 1j:
1j ** 2
import cmath
print(cmath.sqrt(-1))
# exp(pi * i) == -1
print(cmath.exp(math.pi * 1j))
The statistics module offers numerically robust implementations of basic statistics:
import statistics
print(statistics.median(range(12)))
print(statistics.variance(range(12)))
The random module offers pseudo-random number generators for different distributions:
import random
print(random.gauss(mu=1.0, sigma=1.0))
print(random.uniform(2, 3))
Floating point arithmetic is not exact:
f = 1.1 + 2.2
print("f is", f)
# the fractions module supports exact rational arithmetic:
import fractions
print()
print("using fractions")
f = fractions.Fraction(11, 10) + fractions.Fraction(22, 10)
print("f is", f)
print("float(f) is", float(f))
# the decimal module supports arbitrary-precision decimal floats:
import decimal
print()
print("using decimal")
f = decimal.Decimal('1.1') + decimal.Decimal('2.2')
print("f is", f)
print("float(f) is", float(f))
A defaultdict is a dictionary-like data structure with a specified default value for unknown keys. The signature is defaultdict(function), where function() delivers the default value.
from collections import defaultdict
d = defaultdict(lambda: 3)
print(d[0])
Here are two typical use cases:
# int() results in 0:
int()
Thus defaultdict(int) can be used to simplify counting:
data = "adffjjkjwet"
counter = defaultdict(int)
for c in data:
    counter[c] += 1
print(counter.items())
And defaultdict(list) for grouping data:
# list() results in []:
list()
grouped_values = defaultdict(list)
values = [1, 2, 3, 2, 1, 3, 4]
groups = [0, 1, 1, 0, 1, 1, 0]
for g, v in zip(groups, values):
    grouped_values[g].append(v)
print(grouped_values)
for g, values in grouped_values.items():
    print('average of group', g, 'is', sum(values) / len(values))
The previous example for counting can be simplified further:
from collections import Counter
c = Counter(data)
print(c)
print(c.most_common(1))
Python tuples are helpful for grouping data; a namedtuple extends this by assigning names to the elements:
from collections import namedtuple
Point = namedtuple("Point", ["x", "y", "z"])
p = Point(1, 2, 3)
print(p.x)
print(p[0])
Similar to the builtin tuple type, a namedtuple is immutable:
p.y = 2  # raises AttributeError
# further useful modules, only mentioned here:
import queue
import heapq
import itertools # combinatorics and more
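As a small taste of two of these modules, here is a minimal sketch (the values are chosen just for illustration):

```python
import heapq
import itertools

# heapq turns a plain list into a min-heap in place:
numbers = [5, 1, 4, 2, 3]
heapq.heapify(numbers)
# heappop always removes the smallest remaining element:
smallest_two = [heapq.heappop(numbers) for _ in range(2)]
print(smallest_two)  # [1, 2]

# itertools.combinations enumerates all pairs:
pairs = list(itertools.combinations("abc", 2))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```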
import time
# seconds since Jan 1, 1970 (the "Unix epoch"):
print(time.time())
The number of function calls grows exponentially with n for this naive implementation of Fibonacci numbers:
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)
started = time.time()
print(fib(36))
print(time.time() - started)
A cache prevents repeated function evaluations for known pairs of (arguments, return value):
import functools
@functools.lru_cache()
def fib_cached(n):
    if n < 2:
        return 1
    return fib_cached(n - 1) + fib_cached(n - 2)
started = time.time()
print(fib_cached(36))
print(time.time() - started)
print(fib_cached.cache_info())
LRU means "least recently used": cache entries are discarded based on when they were last used. The default lru_cache holds up to 128 entries. This can be modified by passing another numerical value:
@functools.lru_cache(maxsize=2)
def fib_cached(n):
    if n < 2:
        return 1
    return fib_cached(n - 1) + fib_cached(n - 2)
started = time.time()
print(fib_cached(36))
print(time.time() - started)
print(fib_cached.cache_info())
Comment: if you use maxsize=None the cache is unlimited, but this could use up all your memory!
Regular expressions are helpful for parsing complex strings.
Here we look for a sequence starting with an a, followed by zero or more digits, and terminated by a second a:
import re
m = re.search("a([0-9]*)a", "xyza12345abc")
m.group(0)
m = re.search("a([0-9]*)a", "xyza12345abca111ax")
print(m.group(0))
print(m.group(1))
m = re.search("a([0-9]*)a", "xyz12345abc111ax")
print(m)
More about regular expressions: https://developers.google.com/edu/python/regular-expressions and https://stackabuse.com/introduction-to-regular-expressions-in-python/
import os
import pprint # pretty print
current = os.getcwd()
os.chdir("/tmp")
print()
print("files in current folder", os.getcwd())
pprint.pprint(os.listdir("."))
print()
os.chdir(current)
if not os.path.exists("abc"):
    print("abc does not exist")
# touch:
open("abc", "w").close()
if os.path.exists("abc"):
    os.remove("abc")
    print("deleted abc")
print(os.path.abspath("."))
print(os.path.dirname(os.path.abspath(".")))
print(os.path.basename(os.path.abspath(".")))
print(os.path.join("a", "b", "c.txt"))
print(os.path.splitext("abc.py"))
if not os.path.exists("/tmp/a/b/c/d"):
    os.makedirs("/tmp/a/b/c/d")
Recent Python 3 versions also include the pathlib module, which makes path manipulations more expressive than os.path.
import pathlib
here = pathlib.Path(".")
print(here)
print(here.resolve())
print(here.resolve().parent.parent)
print(here.resolve().parent.parent / "xyz")
Iterate over files in folders based on "globbing" patterns:
import glob
for nb in glob.glob("/private/*/*.ipynb"):
    print(nb)
# more advanced operations on file system, like recursive
# copying or deletion of folders:
import shutil
shutil.copytree
shutil.rmtree
shutil.copy;
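A minimal sketch of copytree and rmtree, using a temporary folder so nothing in your working directory is touched:

```python
import os
import shutil
import tempfile

# build a small folder tree in a temporary location:
base = tempfile.mkdtemp()
src = os.path.join(base, "src")
os.makedirs(os.path.join(src, "sub"))
open(os.path.join(src, "sub", "data.txt"), "w").close()  # "touch"

# recursive copy of the whole tree:
dst = os.path.join(base, "dst")
shutil.copytree(src, dst)
copied = os.path.exists(os.path.join(dst, "sub", "data.txt"))
print("copy worked:", copied)

# recursive deletion:
shutil.rmtree(base)
print("base still exists:", os.path.exists(base))
```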
from datetime import datetime
n = datetime.now()
print(type(n))
print(n)
print(n.month, n.hour)
n = datetime.now()
time.sleep(1)
delta = datetime.now() - n
print(type(delta))
print(delta)
Formatting and parsing date time values:
datetime.strftime
datetime.strptime;
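A minimal sketch of both (the format string here is just one common choice):

```python
from datetime import datetime

# strftime formats a datetime as a string:
d = datetime(2020, 1, 2, 13, 30)
text = d.strftime("%Y-%m-%d %H:%M")
print(text)  # 2020-01-02 13:30

# strptime parses it back, using the same format string:
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M")
print(parsed == d)  # True
```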
import pickle
complex_data = {0: [1, 2], 1: {2: [1, 2, (3, 4)]}}
bytestream = pickle.dumps(complex_data)
print(bytestream)
back = pickle.loads(bytestream)
print(back)
print(back == complex_data)
pickle.dump
pickle.load
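In contrast to dumps / loads, pickle.dump and pickle.load work with open file objects; a minimal sketch using a temporary file:

```python
import os
import pickle
import tempfile

complex_data = {0: [1, 2], 1: {2: [1, 2, (3, 4)]}}

# write the pickled bytes to a file, then read them back:
path = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(path, "wb") as fh:
    pickle.dump(complex_data, fh)
with open(path, "rb") as fh:
    back = pickle.load(fh)
print(back == complex_data)  # True
os.remove(path)
```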
The sqlite3 module provides access to SQLite, a database system without management overhead. A SQLite database is just a plain file. SQLite is not efficient for multi-client access, but very fast (also for large databases) when accessed from a single client, and as such is often used as an application file format. See also https://en.wikipedia.org/wiki/SQLite#Notable_users
import os
import sqlite3
if os.path.exists("data.db"):
    os.remove("data.db")
db = sqlite3.connect("data.db")
db.execute("CREATE TABLE points (x REAL, y REAL, z REAL);")
points = [(i, i + 1, i + 2) for i in range(10)]
db.executemany("INSERT INTO points VALUES (?, ?, ?)", points)
db.commit()
query = db.execute("SELECT x, y, z, x + y + z FROM points WHERE x > 3 AND z < 8")
for row in query.fetchall():
    print(row)
sqlite3 also shines for spatial data and fuzzy text search.
data = [1, [2, 3], 3]
data_copy = data
Assignment using = only creates another name for the existing object. Thus:
data_copy is data
The copy module provides shallow and deep copying using copy.copy and copy.deepcopy:
import copy
data_copy = copy.copy(data)
print(data_copy is data)
print(data_copy[1] is data[1])
data_copy = copy.deepcopy(data)
print(data_copy is data)
print(data_copy[1] is data[1])
import zlib, zipfile, gzip, tarfile
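These modules handle common compression and archive formats; as a small taste, zlib compresses byte strings in memory (a minimal sketch):

```python
import zlib

# highly redundant data compresses very well:
data = b"hello " * 1000
compressed = zlib.compress(data)
print(len(data), len(compressed))

# decompress restores the original bytes exactly:
restored = zlib.decompress(compressed)
print(restored == data)  # True
```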
import csv
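The csv module reads and writes comma separated values; a minimal sketch using io.StringIO in place of a real file:

```python
import csv
import io

# write rows to a CSV formatted in-memory buffer:
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["x", "y"])
writer.writerow([1, 2])

# rewind and read the rows back (all fields come back as strings):
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows)  # [['x', 'y'], ['1', '2']]
```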
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import os
import time
def compute(argument):
    started = time.time()
    print("process", os.getpid(), "starts computation for argument", argument)
    time.sleep(argument)
    print("process", os.getpid(), "finished computation for argument", argument)
    return (os.getpid(), argument, time.time() - started)
n = multiprocessing.cpu_count()
print("number cores=", n)
started = time.time()
with ProcessPoolExecutor(n - 1) as p_pool:  # n workers might freeze the machine until computations are finished.
    for worker_id, argument, needed in p_pool.map(compute, (3, 1, 2, 1, 2, 3, 1, 1, 2)):
        print(f"worker {worker_id} got argument {argument} and needed {needed:.2f} seconds")
print("overall time {:.2f} seconds".format(time.time() - started))
You can see that the 9 function evaluations were distributed to 7 workers such that some workers performed multiple function evaluations.
In total the first evaluation took the longest and dominates the overall runtime.
The easiest way to call external executables is os.system. You don't see the generated output; the return value is 0 for successful execution.
The following example assumes that you work on linux, so you should adapt it if you work on Windows:
print(os.system("ls -al"))
The subprocess module is more versatile and allows finer-grained access to stdin and/or stdout of the executable:
import subprocess
p = subprocess.check_output("ls -al *.ipynb", shell=True)
print(str(p, "utf-8"))
Here we start a Python process (-i is crucial to make this work), remotely "enter" a line of code, and capture the output.
p = subprocess.Popen("python -i -u -B", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
print(p.stdout.readline())
print(p.stdout.readline())
print(p.stdout.readline())
p.stdin.write(b"print(2 ** 10);\n") # remove this later and try again
p.stdin.flush()
print("read result")
print(p.stdout.readline())
p.terminate()
Such code is fragile and prone to hanging. Just remove the indicated line and you will see that the p.stdout.readline() call hangs.
This does not mean that this approach is not recommended, but you have to think about a communication protocol including error handling.
from urllib import request
response = request.urlopen('https://www.python.org/static/img/python-logo@2x.png')
with open("python_logo.png", "wb") as fh:
    fh.write(response.read())
# this is jupyter notebook specific command and not supported
# by Python itself:
!ls -l python_logo.png
Note: urllib is not easy to use in more complicated cases, like authentication, etc. In such cases install and use the requests library.
Although Python is dynamically typed, one can add type information to variables and functions since Python 3.5.
Such type annotations are NOT checked by the interpreter. They can be used for documenting code and can be accessed by external tools like mypy to check for potential type conflicts.
There are also external libraries which perform the type checks during runtime: E.g. https://github.com/agronholm/typeguard
PyCharm also offers type checking, see this tutorial and the PyCharm Documentation
Values are annotated with a :, except for return values, where we use ->:
def add_integers(a: int, b: int) -> int:
    c: int = a + b
    return c
In case you are curious, the annotations are stored in the __annotations__ attribute of the function:
print(add_integers.__annotations__)
As one can see the type checks are not performed:
add_integers("a", "b")
The typing module from the standard library allows declaration of more complex types. For example this does not work (on Python versions before 3.9):
def a(x: list[int]) -> int:
    pass
Instead this works:
import typing
def add_many(values: typing.List[float]) -> float:
    return sum(values)
There is also a more abstract type declaration in typing which is more general:
def add_many(values: typing.Sequence[float]) -> float:
    return sum(values)
Another example is typing.Union, which can be read as "or":
Number = typing.Union[bool, int, float]
def add_numbers(a: Number, b: Number) -> Number:
    return a + b