Some advanced topics

This script introduces some advanced topics and helpful functions to bridge the gap from the Python Introduction course to the Python Challenges.

More about functions

Python functions support so-called "default" arguments:

def greet(name, ending="!"):
    print("hi", name, ending)

If we now call greet with one argument, Python uses the default value of the second:

greet("jesus")
hi jesus !

But we can still override it:

greet("darling", "<3")
hi darling <3

This is very useful for providing sensible default values for the parameters of an algorithm: "non-expert" users can rely on them, while an expert can still override them.
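
As a small sketch of how default arguments combine with positional and keyword arguments (the function power and its parameter names are made up for illustration and are not part of the course material):

```python
def power(base, exponent=2):
    # exponent falls back to 2 if the caller does not provide it
    return base ** exponent

print(power(3))              # uses the default: 3 ** 2
print(power(3, 3))           # overrides the default by position
print(power(3, exponent=4))  # overrides the default by name
```

Passing the argument by name, as in the last call, can make it easier to see which default is being replaced.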

Python functions also support multiple return values. If a function computes more than one value, we list the results separated with ",":

def sum_and_diff(x, y):
    return x + y, x - y

To store the results in separate variables when we call this function, we use the same number of variables on the left side of =:

s, d = sum_and_diff(10, 3)
print(s, d)
13 7

Exercise:

  • Implement a function product which takes up to three values and computes their product. Choosing suitable default values helps. Called as product(1, 2, 3), product(2, 3) or product(6), it should always return 6.

  • Implement a function which returns the sum and the average of a given list of numbers.

Tuples and tuple unpacking

Tuples are "immutable lists". This means that once constructed, you cannot change them any more (no append, no replacement of single entries).

The main reason tuples exist is that keys in dictionaries must be immutable. As a consequence, lists are not allowed as keys, but tuples are.
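
A minimal sketch of both statements, assuming you are comfortable with try / except (which this script does not introduce); the variable names are only illustrative:

```python
tp = (1, 2, 3)
try:
    tp[0] = 99          # tuples do not support item assignment
except TypeError as error:
    print("cannot modify tuple:", error)

d = {}
d[(1, 2)] = "ok"        # a tuple works as a dictionary key
try:
    d[[1, 2]] = "fails"  # a list is rejected as a dictionary key
except TypeError as error:
    print("cannot use list as key:", error)

print(d)
```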

To extract all elements of a tuple (or list) at the same time, we can write:

tp = (1, 2, 3)
a, b, c = tp

instead of

a = tp[0]
b = tp[1]
c = tp[2]

We can write tuples more briefly without parentheses:

tp = 1, 2, 3

So if a function returns multiple comma-separated values, it still returns a single item, which is a tuple. When we extract the return values, we do tuple unpacking:

# repeated from above

def sum_and_diff(x, y):
    # returns tuple with two elements:
    return x + y, x - y

# unpack tuple:
a, b = sum_and_diff(10, 3)

Tuples can also serve as keys in a dictionary:

def matrix():
    result = {}
    for i in range(1, 5):
        for j in range(1, 5):
            result[i, j] = i * j   # tuple as key!
    return result

m = matrix()
print(m[2, 3])
6

When lists, when tuples?

There is no clear rule when to use lists and when to use tuples. But a rule of thumb is:

  • use tuples for grouping heterogeneous data (e.g. one entry of an address book)
  • use lists to collect items of the same "type" (e.g. a list of addresses)

address_1 = ("jesus", "heaven", 17)
address_2 = ("devil", "hell", 666)
address_3 = ("eth zurich", "rämistrasse", 1)

addresses = [address_1, address_2, address_3]

Sets

Python has a container data type set to represent mathematical sets. In a set all elements are unique and have no ordering.

numbers = {1, 2, 3}
print(numbers)
{1, 2, 3}

Using {} for an empty set is not possible because {} already denotes an empty dictionary. Instead we write:

a = set()

To add elements to the set:

a.add(1)
a.add(2)
a.add(1)  # does nothing, 1 is already in the set !
print(a)
{1, 2}

Sets are optimized for "membership lookup". The time to check whether a given element is in a set is more or less independent of the size of the set, whether it has 1,000 or 1,000,000 elements.

print(1 in a)
print(3 in a)
True
False
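
To get a feeling for the difference, one could time the same lookup in a list and in a set. This is only a rough sketch using the standard library module timeit, which this script does not cover; the size n and the number of repetitions are arbitrary choices, and the measured times depend on your machine:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# look up the last element, which is the worst case for the list
t_list = timeit.timeit(lambda: (n - 1) in as_list, number=100)
t_set = timeit.timeit(lambda: (n - 1) in as_set, number=100)

print("list lookup:", t_list, "seconds")
print("set lookup: ", t_set, "seconds")
```

On typical machines the set lookup should be faster by several orders of magnitude.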

Furthermore, the data type supports standard mathematical set operations such as intersection, union and set difference:

numbers = {1, 2, 3, 4}
even = {2, 4, 6, 8}

For set operations we usually have two alternatives, an operator and a method:

# intersection
print(numbers & even)
print(numbers.intersection(even))
{2, 4}
{2, 4}
# union
print(numbers | even)
print(numbers.union(even))
{1, 2, 3, 4, 6, 8}
{1, 2, 3, 4, 6, 8}
# difference
print(numbers - even)
print(numbers.difference(even))
{1, 3}
{1, 3}
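
The same operations work for any kind of element, not only numbers. A small sketch with made-up participant names; sorted is used here only to get a reproducible order when printing:

```python
course_a = {"anna", "ben", "carla"}
course_b = {"ben", "dora", "emil"}

in_both = course_a & course_b       # intersection
in_any = course_a | course_b        # union
only_in_a = course_a - course_b     # difference

print(sorted(in_both))    # ['ben']
print(sorted(in_any))     # ['anna', 'ben', 'carla', 'dora', 'emil']
print(sorted(only_in_a))  # ['anna', 'carla']
```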

More about iterators

Most iterators we have seen so far are used after the keyword in of a for statement:

for i in range(3):
    print(i)
0
1
2

Other examples of iterators are:

  • file handles (for line in fh:)
  • strings (for character in "abcde":)
  • dictionaries (for key in ...:) (always iterates over keys)
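
A short sketch of the string and dictionary cases from the list above (the dictionary ages and its content are made up for illustration):

```python
characters = []
for character in "abc":
    characters.append(character)
print(characters)  # ['a', 'b', 'c']

ages = {"anna": 31, "ben": 27}
for key in ages:
    # iterating over a dictionary yields its keys
    print(key, ages[key])
```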

The functions list, set and tuple take arbitrary iterators and construct a list, set or tuple from them.

print(range(4))
print(list(range(4)))
range(0, 4)
[0, 1, 2, 3]
print(list((1, 2, 3)))
print(tuple([1, 2]))
[1, 2, 3]
(1, 2)
print(list("sfdkadjf"))
['s', 'f', 'd', 'k', 'a', 'd', 'j', 'f']

Another useful function is sorted, which takes an iterator and returns a sorted list:

print(sorted("dskfj"))
['d', 'f', 'j', 'k', 's']

Combined with set, we can implement some operations efficiently, for example removing duplicates or checking whether all elements are distinct:

li = [1, 2, 1, 3, 2]
unique_elements = set(li)
print(unique_elements)
{1, 2, 3}

def all_unique(data):
    # True if no element of data occurs more than once
    return len(set(data)) == len(data)

print(all_unique("abcde"))
print(all_unique("abcdea"))
print(all_unique([1, 2, 3]))
True
False
True
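
The functions set and sorted can also be chained. A small sketch which removes duplicates and sorts the remaining elements (the word list is made up):

```python
words = ["pear", "apple", "pear", "fig", "apple"]

# set removes the duplicates, sorted turns the result into a sorted list
unique_sorted = sorted(set(words))
print(unique_sorted)  # ['apple', 'fig', 'pear']
```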

List comprehensions

Python has some shortcuts ("syntactic sugar") for common list operations:

a = [i**2 for i in range(10)]
print(a)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

is the same as

a = []
for i in range(10):
    a.append(i**2)
print(a)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

and

b = [ai + 1 for ai in a if ai % 2 == 0]
print(b)
[1, 5, 17, 37, 65]

is the same as

b = []
for ai in a:
    if ai % 2 == 0:
        b.append(ai + 1)
print(b)
[1, 5, 17, 37, 65]

The general form is [f(xi) for xi in <iterable>] or, with a filter, [f(xi) for xi in <iterable> if <condition>].
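
Both forms side by side, as a small sketch with a made-up sentence. Here len plays the role of f, and len(w) > 3 is the condition:

```python
words = "the quick brown fox".split()

# without condition: apply len to every word
lengths = [len(w) for w in words]
print(lengths)     # [3, 5, 5, 3]

# with condition: keep only words with more than three letters
long_words = [w for w in words if len(w) > 3]
print(long_words)  # ['quick', 'brown']
```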

Dictionary comprehensions

Dictionaries can be constructed with dictionary comprehensions, which work similarly to list comprehensions:

sizes = {word: len(word) for word in "a bc def ghij".split() if len(word) > 1}
print(sizes)
{'bc': 2, 'def': 3, 'ghij': 4}
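
As for list comprehensions, this can be spelled out as a loop. The following sketch should produce the same dictionary:

```python
sizes = {}
for word in "a bc def ghij".split():
    if len(word) > 1:
        sizes[word] = len(word)
print(sizes)  # {'bc': 2, 'def': 3, 'ghij': 4}
```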

Exercise

What do the following list comprehensions compute?

[ai.upper() for ai in "hello you" if ai != "o"]
[i for i in range(10) if i % 2 == 0]
[word.upper() for word in ["hi", "you", "how", "do", "you", "do"] if word[0] in "hy"]

assert statement

assert checks if a given condition is True and "throws an exception" (an AssertionError) if not. It can be used to check whether function arguments fulfill a given condition, to ensure that the function works as expected. You can also specify a helpful error message.
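
A minimal sketch of this behaviour. The function reciprocal is made up for illustration, and try / except is used only to show the error message without stopping the script:

```python
def reciprocal(x):
    # the message after the comma is shown if the condition is False
    assert x != 0, "x must not be zero"
    return 1 / x

print(reciprocal(4))  # 0.25

try:
    reciprocal(0)
except AssertionError as error:
    print("AssertionError:", error)  # AssertionError: x must not be zero
```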

The following function only works for non-empty lists; otherwise we divide by zero:

def average_1(li):
    return sum(li) / len(li)

To avoid this, we write:

def average_2(li):
    assert len(li) > 0, "this function only works for non-empty lists"
    return sum(li) / len(li)

We can also introduce type checks:

def average_3(li):
    assert isinstance(li, list), "this function only works for lists"
    assert len(li) > 0, "this function only works for non-empty lists"
    return sum(li) / len(li)

Exercise:

  • type the examples above
  • compare the error message of average_1([]) to the error message of average_2([])
  • compare the error message of average_1(17) to the error message of average_3(17)

Helpful modules: os, glob, collections.

os provides operating-system-specific operations, mostly for file handling:

import os
print(os.path.exists("abc.txt"))
with open("abc.txt", "w") as fh:
    pass  # pass is needed for empty code blocks !
print(os.path.exists("abc.txt"))
os.remove("abc.txt")
print(os.path.exists("abc.txt"))
False
True
False
print([name for name in os.listdir(".") if name.endswith(".txt")])
['codon_table.txt', 'logistic_data.txt', 'logistic_data_multi.txt', 'requirements.txt']

glob allows listing files based on wildcards:

import glob
print(glob.glob("*.tx?"))
['codon_table.txt', 'logistic_data.txt', 'logistic_data_multi.txt', 'requirements.txt']
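
The output above depends on the files in the current folder. As a self-contained sketch, the following creates a few files in a temporary folder first. It assumes the standard library module tempfile and the functions os.path.join and os.path.basename, none of which are covered in this script; the file names are made up:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # create three small files inside the temporary folder
    for name in ["a.txt", "b.txt", "c.csv"]:
        with open(os.path.join(folder, name), "w") as fh:
            fh.write("demo")

    # glob returns full paths, so we strip the folder part again
    pattern = os.path.join(folder, "*.txt")
    found = sorted(os.path.basename(path) for path in glob.glob(pattern))

print(found)  # ['a.txt', 'b.txt']
```

The temporary folder and its content are removed automatically when the with block ends.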

defaultdict from the collections module can be used to define default values for undefined keys:

# without defaultdict
counts = {}
for letter in "abcdabcda":
    if letter not in counts:
        counts[letter] = 0
    counts[letter] += 1
    
print(counts)
{'a': 3, 'b': 2, 'c': 2, 'd': 2}
from collections import defaultdict

dd = defaultdict(int)
print(dd[3])  # creates "0" entry for unknown key
print(dd)     # now you see the entry 
0
defaultdict(<class 'int'>, {3: 0})
counts = defaultdict(int)
for letter in "abcdabcda":
    counts[letter] += 1
    
print(counts)
defaultdict(<class 'int'>, {'a': 3, 'b': 2, 'c': 2, 'd': 2})

The argument of defaultdict is a function which is called without arguments and returns the default value:

# for defaultdict(int):
print(int())
0
dd = defaultdict(list)
print(dd[0])
[]
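
A common use of defaultdict(list) is grouping items. The following sketch groups some made-up words by their first letter:

```python
from collections import defaultdict

groups = defaultdict(list)
for word in ["apple", "avocado", "banana", "blueberry", "cherry"]:
    # an unknown first letter automatically starts with an empty list
    groups[word[0]].append(word)

print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```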
# just for fun, quite useless

import random

dd = defaultdict(random.random)

for i in range(5):
    print(i, dd[i])

print(dd)
0 0.5306118160044974
1 0.27278478920589166
2 0.12325868154686914
3 0.7767983744253266
4 0.9034008545348685
defaultdict(<built-in method random of Random object at 0x7fb602862e18>, {0: 0.5306118160044974, 1: 0.27278478920589166, 2: 0.12325868154686914, 3: 0.7767983744253266, 4: 0.9034008545348685})

String formatting

To fill in placeholders in a string with given values, we can use the .format method of strings. The simplest form of a placeholder is {}:

print("{} + {} = {}".format(1, 2, 3))
1 + 2 = 3

We can control the formatting like this: :e is scientific notation, :f is for floats and :d is for integer numbers:

print("{:e} + {:f} = {:d}".format(1, 2, 3))
1.000000e+00 + 2.000000 = 3

The specification for such format instructions is quite complex, see https://mkaz.tech/code/python-string-format-cookbook/

A few more examples: + means that the sign is always printed, .2 means two digits after the decimal point, and 7 means "at least 7 characters":

print("{:+e} + {:.2f} = {:7d}".format(1, 2, 3))
+1.000000e+00 + 2.00 =       3

You can also use a different order when substituting into the "template":

print("{2} = {0} + {1}".format(1, 2, 3))
3 = 1 + 2
print("{c} = {a} + {b}".format(a=1, b=2, c=3))
3 = 1 + 2

Both can be combined (name or position before :, format specification after :):

# field names:
print("{c:+e} {b:7.2f}".format(b=3.14141, c=12))

# positions:
print("{1:+e} {0:7.2f}".format(3.14141, 12))
+1.200000e+01    3.14
+1.200000e+01    3.14
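
A fixed width is handy for aligning numbers in a small report. A sketch with made-up measurements, using only the format instructions introduced above:

```python
measurements = {"length": 12.3456, "width": 0.5, "height": 100.0}

lines = []
for name, value in measurements.items():
    # at least 7 characters wide, two digits after the decimal point
    lines.append("{} = {:7.2f}".format(name, value))

for line in lines:
    print(line)
```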