Looping with for

Programming languages such as C and C++ traditionally only know "counting loops": loop from a start value to an end value with a given step size.

In C, such for loops typically look like this:

for (int i = 0; i < 10; i += 2) {
...
}

The Python equivalent is:

for i in range(0, 10, 2):
    print(i)
0
2
4
6
8

But Python loops are more versatile. There are many objects you can "loop over":

for c in "abcde":
    print(c)
a
b
c
d
e

data = [1, 2, 4, -2]
for item in data:
    print(item, end=" ")
1 2 4 -2 

The previous example is much shorter than the counting loop version:

data = [1, 2, 4, -2]
for index in range(len(data)):
    item = data[index]
    print(item, end=" ")
1 2 4 -2 

Tuples also work:

for item in (1, 2, 4, -2):
    print(item, end=" ")
1 2 4 -2 

And if we iterate over a dictionary, we iterate over the keys:

data = {1: 1, 2: 4, 3: 9}
for number in data:
    print(number)
1
2
3
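
If the values are needed as well, a dictionary can also be iterated via its values() and items() methods. A minimal sketch, reusing the dictionary from above (the variable names values and pairs are illustrative only):

```python
data = {1: 1, 2: 4, 3: 9}

# values() yields only the values
values = [value for value in data.values()]
print(values)

# items() yields (key, value) pairs which can be unpacked in the loop header
pairs = []
for number, square in data.items():
    pairs.append((number, square))
    print(number, "->", square)
```

Since Python 3.7 dictionaries keep their insertion order, so the loop visits the keys in the order in which they were added.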

Finally we can also iterate over a text file:

file_name = 'text.txt'

# prepare example file

with open(file_name, 'wt') as fh:
    print('hi jo', file=fh)
    print('second line', file=fh)
    
# now read

print('using readlines')       
with open(file_name, 'rt') as fh:
    print(fh.readlines())
    
print()
print('using for loop')
with open(file_name, 'rt') as fh:
    for line in fh:
        print(repr(line))
using readlines
['hi jo\n', 'second line\n']

using for loop
'hi jo\n'
'second line\n'

Comment: Using a for loop to iterate over a text file reads one line at a time, which allows processing files that do not fit into your computer's memory. In contrast, readlines() loads all lines into memory at once.
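
The line-by-line approach can be sketched as follows. The file name numbers.txt and the temporary folder are arbitrary choices for this example; the point is that only the current line and two counters are kept in memory, no matter how large the file is:

```python
import os
import tempfile

# prepare a small example file in a temporary directory
# (the file name "numbers.txt" is an arbitrary choice for this sketch)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "numbers.txt")
with open(path, "wt") as fh:
    for i in range(1000):
        print(i, file=fh)

# only one line at a time is held in memory
total = 0
n_lines = 0
with open(path, "rt") as fh:
    for line in fh:
        total += int(line)
        n_lines += 1

print(n_lines, "lines, sum is", total)
```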

enumerate and zip

enumerate iterates over an iterable and also provides the iteration number (starting at 0):

with open(file_name, 'rt') as fh:
    for i, line in enumerate(fh):
        print('line', i, 'is', line.rstrip())
line 0 is hi jo
line 1 is second line

for i, c in enumerate("ABCD"):
    print('character', i, 'is', c)
character 0 is A
character 1 is B
character 2 is C
character 3 is D

zip iterates over two or more iterables at the same time (like a "zipper"). The shortest iterable determines the number of iterations:

for ai, bi in zip('abcd', range(2, 10, 3)):
    print(ai, bi)
a 2
b 5
c 8

for ai, bi, ci in zip('abcd', range(2, 10, 3), [4,3,2,1]):
    print(ai, bi, ci)
a 2 4
b 5 3
c 8 2
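
Two variations might be useful in practice: enumerate accepts a start value, and itertools.zip_longest from the standard library continues until the longest iterable is exhausted instead of stopping at the shortest. A short sketch (the variable names are chosen for illustration only):

```python
from itertools import zip_longest

# enumerate accepts a start value, e.g. for 1-based numbering
numbered = list(enumerate("ABC", start=1))
print(numbered)

# zip stops at the shortest iterable ...
short = list(zip("abcd", [1, 2]))
print(short)

# ... whereas zip_longest pads the shorter ones with a fill value
padded = list(zip_longest("abcd", [1, 2], fillvalue=None))
print(padded)
```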

Catch only the exceptions you expect

First we create some data required in the following example:

with open("abc", "w") as fh:
    print("hi", file=fh)

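Just a demo function. Note that it contains a bug: it refers to f instead of fh, so calling it raises a NameError (assuming no global variable f exists):

def read(path):
    with open(path, "r") as fh:
        return f.read()  # bug: should be fh.read()

Catching Exception is too broad. It also swallows the NameError and reports a misleading message:

try:
    read("abc")
except Exception:
    print("can not read 'abc'")
can not read 'abc'

Catching only IOError (an alias of OSError in Python 3) lets the NameError propagate, so the actual bug becomes visible:

try:
    read("abc")
except IOError:
    print("can not read 'abc'")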
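
The difference between the two handlers can be reproduced in a self-contained way. The following sketch uses a hypothetical function read_buggy with the same kind of typo and records which handler fires; the names read_buggy, path and messages are assumptions made for this illustration:

```python
import os
import tempfile

# create a small file so that opening it succeeds
path = os.path.join(tempfile.mkdtemp(), "abc")
with open(path, "w") as fh:
    print("hi", file=fh)


def read_buggy(path):
    with open(path, "r") as fh:
        return f.read()  # deliberate typo: f is not defined


messages = []

# too broad: the NameError is reported as if the file was unreadable
try:
    read_buggy(path)
except Exception:
    messages.append("can not read file")

# specific: OSError does not match, so the NameError propagates
try:
    try:
        read_buggy(path)
    except OSError:
        messages.append("can not read file")
except NameError as error:
    messages.append("bug found: {}".format(error))

print(messages)
```

The outer try block exists only so that the sketch runs to the end; in real code the uncaught NameError would simply stop the program with a traceback pointing at the faulty line.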

Some basic design decisions of Python

import this is one of Python's Easter eggs (by the way: did you already try import antigravity?). It prints some guiding design principles of Python:

import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

The ones I like most as general principles of software development:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Readability counts.
  • Errors should never pass silently.
  • In the face of ambiguity, refuse the temptation to guess.

This has some important consequences for general programming:

  • The simpler your solution, the better. If you have a solution, rethink it, and if you come up with a simpler, more elegant and maybe more general approach, consider starting again.
  • Don't overdesign your software; only implement needed and requested features. But keep future features in mind when you think about the general layout / architecture of your program.
  • Don't implement "magic" to handle corner cases. If you encounter something unusual, raise an exception and stop. Don't try to correct invalid configuration files; instead report that the configuration is not valid.
  • Readability counts: you read code more often than you write it, so make sure that you will still understand your code when you want to fix it in a few weeks or months. Colleagues might also want to fix or extend your code!
    • Read your code again when finished and try to find dirty corners which will cause headaches in a few weeks.
    • Use clear variable and function names.
    • Write many short functions.
  • Be defensive: check function arguments for validity and raise an exception if they are invalid. (You might use assert statements for this, but note that assertions are skipped when Python runs with the -O option.)
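
As an illustration of the "be defensive" rule, the following sketch validates its arguments and raises exceptions instead of silently repairing invalid input. The function scale and its parameters are made up for this example:

```python
def scale(values, factor):
    """Multiply all numbers in values by factor."""
    # fail early and loudly instead of guessing what the caller meant
    if not isinstance(values, list):
        raise TypeError("values must be a list")
    if not isinstance(factor, (int, float)):
        raise TypeError("factor must be a number")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [value * factor for value in values]


print(scale([1, 2, 3], 2))

try:
    scale([1, 2, 3], -1)
except ValueError as error:
    print("invalid call:", error)
```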

One real life example

One problem I worked on recently:

  • The internal config handling of a software package for cosmological computations had default values which were hidden in the software.
  • Users could provide config files to overwrite parts of the config; otherwise the default values were used.
  • Some configuration values were, in some cases, computed from other configuration settings.

Negative effects:

  • The results were pretty painful to debug and to understand, because the config settings actually used were not explicit.
  • If default settings were changed, existing scripts depending on the software might break or produce different results.

My solution is:

  • no default values
  • replace warning messages by exceptions
  • every config file must specify all available parameters
  • every config file is checked for consistency (are all settings specified? does the config file use invalid settings? are all settings consistent?)

This caused many (early) failures for the users, but the users could directly see what they did wrong. Before this change, computations could finish without the users actually understanding the relation between results and configuration.
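
A strict check in the spirit of this solution could look like the following sketch. The setting names and the function validate_config are hypothetical and not taken from the actual software package:

```python
REQUIRED_SETTINGS = {"epsilon", "num_iter"}  # hypothetical setting names


def validate_config(config):
    """Raise ValueError unless config specifies exactly the required settings."""
    missing = REQUIRED_SETTINGS - set(config)
    unknown = set(config) - REQUIRED_SETTINGS
    if missing:
        raise ValueError("missing settings: " + ", ".join(sorted(missing)))
    if unknown:
        raise ValueError("unknown settings: " + ", ".join(sorted(unknown)))
    # example of a sanity check on a single value
    if config["num_iter"] <= 0:
        raise ValueError("num_iter must be positive")


validate_config({"epsilon": 1e-6, "num_iter": 100})  # passes silently

try:
    validate_config({"num_iter": 100})
except ValueError as error:
    print("invalid config:", error)
```

Raising an exception instead of printing a warning means that a computation cannot start with an incomplete or misspelled configuration.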

About readability

The official style recommendations for Python code are called PEP 8 (https://www.python.org/dev/peps/pep-0008/).

Most important rules:

  • Don't mix tabs and spaces. A modern editor usually converts tabs to spaces automatically, or you can enable this.
  • Use multiples of four spaces for indentation.
  • Prefer lower case letters + underscores for function and variable names (so-called snake case):

    e.g. this_is_my_counter = 0.

  • Use upper case + lower case letters for classes (so-called CamelCase),

    e.g. class MyCounter:

  • Use spaces around binary operators and assignments:

    x = y + z instead of x=y+z.

  • Use a space after a comma:

    my_func(1, 2, 3) instead of my_func(1,2,3).

  • PEP 8 itself limits lines to 79 characters; many find a limit of 100 more practical.

Swap values

No temporary variable required:

a = 3
b = 7
a, b = b, a
print(a, b)
7 3
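
This works because the right-hand side is packed into a tuple first and then unpacked into the names on the left. The same mechanism handles more than two names, as this small sketch shows:

```python
a, b, c = 1, 2, 3

# the right-hand side (b, c, a) is evaluated completely before assignment
a, b, c = b, c, a
print(a, b, c)

# classic use: Fibonacci numbers without a temporary variable
x, y = 0, 1
for _ in range(5):
    x, y = y, x + y
print(x, y)
```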

Building large strings

Strings are immutable, so result += ... may copy the whole string in every iteration. In some circumstances and for some Python versions, the following code can therefore get slow for very long strings:

result = ""
for i in range(100000):
    result += str(i) * 100

It is better to collect your parts in a list of strings and finally use "".join(...) to construct the final string:

parts = []
for i in range(100000):
    parts.append(str(i) * 100)

result2 = "".join(parts)
assert result == result2
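
The explicit list can also be skipped by passing a generator expression to join directly, and io.StringIO from the standard library offers a file-like alternative. A small sketch with a reduced loop count n (an arbitrary choice to keep it quick):

```python
import io

n = 1000  # smaller than in the example above, just for illustration

# variant 1: generator expression, no explicit list needed
result_join = "".join(str(i) * 100 for i in range(n))

# variant 2: write the parts to an in-memory text buffer
buffer = io.StringIO()
for i in range(n):
    buffer.write(str(i) * 100)
result_buffer = buffer.getvalue()

assert result_join == result_buffer
print(len(result_join))
```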

dicts and sets

Dictionaries and sets in Python are very fast. Lookup time is approximately constant, independent of the size of the actual dict or set (at the cost of some memory overhead). In contrast, the time for checking whether a given element occurs in a list is on average proportional to the size of the list.
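
The difference can be measured with the timeit module from the standard library. The sizes and repetition counts below are arbitrary choices for this sketch, and the absolute timings depend on the machine:

```python
import timeit

n = 100000
as_list = list(range(n))
as_set = set(as_list)

# look up the worst case for the list: an element at the very end
wanted = n - 1

t_list = timeit.timeit(lambda: wanted in as_list, number=100)
t_set = timeit.timeit(lambda: wanted in as_set, number=100)

print("list lookup: {:.6f} s".format(t_list))
print("set lookup:  {:.6f} s".format(t_set))
```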

Furthermore, set operations can be very expressive:

def check_for_duplicates(collection):
    # True means: no duplicates found (all items are distinct)
    return len(set(collection)) == len(collection)

print(check_for_duplicates("abcde"))
print(check_for_duplicates("abcdea"))

numbers = [1, 2, 3, 1]
print(check_for_duplicates(numbers))
True
False
False

data1 = [1, 2, 3, 4]
data2 = [2, 3, 6]
n_common = len((set(data1) & set(data2)))
print('both lists have', n_common, 'elements in common')
both lists have 2 elements in common

required = {'epsilon', 'num_iter', 'use_optimizations'}

config = {'num_iter': 1000, 'use_optimizations': True, 'delta': 1e-6}

missing = required - set(config)
unknown = set(config) - required

if missing:
    print("setting(s) for {} are missing".format(", ".join(missing)))
if unknown:
    print("setting(s) for {} not known".format(", ".join(unknown)))
setting(s) for epsilon are missing
setting(s) for delta not known

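any and all

all returns True if every item of an iterable is true; any returns True if at least one item is true: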

print(all([True, False]))
print(all([True, True]))
print(all([False, False]))
False
True
False

print(any([True, False]))
print(any([True, True]))
print(any([False, False]))
True
True
False

def average(data):
    assert isinstance(data, list), "data must be of type list"
    assert all(isinstance(item, (int, float)) for item in data), "data must be numbers"
    if not data:
        return None
    return sum(data) / len(data)

print(average([1, 2.0, 3]))
2.0

print(average([1, 2.0, 3, "4"]))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-27-46c4c8f3f0b3> in <module>()
----> 1 print(average([1, 2.0, 3, "4"]))

<ipython-input-25-59e214767e2d> in average(data)
      1 def average(data):
      2     assert isinstance(data, list), "data must be of type list"
----> 3     assert all(isinstance(item, (int, float)) for item in data), "data must be numbers"
      4     if not data:
      5         return None

AssertionError: data must be numbers
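
Two properties of any and all are worth knowing in addition to the examples above: both stop as soon as the result is known, and for an empty iterable all returns True while any returns False. The helper is_positive below is made up for this sketch and records which items were actually inspected:

```python
checked = []


def is_positive(x):
    checked.append(x)  # remember which items were inspected
    return x > 0


# any stops at the first item for which the check is True
found = any(is_positive(x) for x in [-1, 2, 3, 4])
print(found, checked)

# empty iterables
print(all([]), any([]))
```

Because all([]) is True, the second assertion in average passes for an empty list, which is why the function handles that case separately.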

Implicit conversion to bool

Some values are treated as True or False when used in logical checks.

For example None, "", [], (), {}, set(), 0, 0.0 are considered False; most other values are considered True.

def check(value):
    if value:
        print("{!r} is interpreted as True".format(value))
    else:
        print("{!r} is interpreted as False".format(value))
    
check([])
check([1])
check(())
check((1,2))
check("")
check("abc")
check({})
check({1: 2})
check(set())
[] is interpreted as False
[1] is interpreted as True
() is interpreted as False
(1, 2) is interpreted as True
'' is interpreted as False
'abc' is interpreted as True
{} is interpreted as False
{1: 2} is interpreted as True
set() is interpreted as False
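
One consequence is that a plain truth test cannot distinguish between "no value" and a legitimate falsy value such as 0. If that difference matters, compare with None explicitly. A minimal sketch; the function describe is invented for illustration:

```python
def describe(value):
    if value is None:
        return "missing"
    if not value:
        return "present but empty or zero"
    return "present"


for candidate in [None, 0, 0.0, "", [], 3, "abc"]:
    print(repr(candidate), "->", describe(candidate))
```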