1. Preface

address reader with you.

more and more people code, learn basics of progamming language, focussed on solving actual problems, no / little education in fields as computer science and software enngineering.

our history

existing books: target programmers with working experience, principles of object oriented programming must be understood, technical language, too many aspects which are not required for scientific programmers

Main goal: reduce work, robustness, reusability

2. About this book

Main goal: reduce work, robustness, reusability

Gain control over the process.

Why should scientists care about professional software development ? (reproducability crisis, publications with code, reusability, clear process of distribution and reuse.

Techincal dept, speed over time

2.1. On the shoulders of giants

Clean Code / Pragmatic Programmer / GOF / …​

existing books: target programmers with working experience, principles of object oriented programming must be understood, technical language, too many aspects which are not required for scientific programmers

2.2. Motivation, Why this book ?

filtered introductions into several aspects of professional software development. books intention is to provide good basics and introductions so that interested programmers can deepen their knowledge on their own.

links to advanced books provided.

2.3. How to read this book

chapters / sections are inteded to be short so that they can be consumed as bites. Book should be read several times, only reading is not sufficient, pratice is king.

2.4. But this so much to learn…​

First focus on solving problems, Learn one technique until automized, then proceed.

2.5. Why learning to code is so hard

constant learning: reserve some time for reading blogs, books, python standard library, tutorials

2.6. when to switch from prototype to propper development

3. The programming process

3.1. Meta thinking

The answer is that they paid attention to what they were doing while they were doing it — and then they tried to do it better.
— from the foreword of The Pragmatic Programmer

Main thread of thinking is solving the problem. But also think about what you do when programming.

To consider:

  • Is this the only solution ? Can I find another approach which might be better ?

  • What could go wrong if this code runs ?

  • Will I understand my code when I work on it in a few weeks / months ?

  • Will others be able to work on this code without my assistance ?

Numpy is not broken.

Numpy is not broken

3.2. Programming by incidence / (better coincidence ?)

program works but you dont know why, piles up and finally you cant fix it. not tested, depends on environment, no clear contracts, no decoupling

dont ignore warnings

benefit: improves programming skills !

3.3. Fail early

Also in development process: be brave to thrash your first approach if you

3.4. Have a plan, what is a prototype, when to prototype

3.5. The broken window principle

3.6. Version control

3.7. Testing Harness

Test Your Software, or Your Users Will

3.8. Mindset

making mistakes is crutial part of programming process, see it as a challenge and part of your usual work if debugging: realy understand what goes wrong, avoid fixing by incidence

3.9. Benefit of small changes

3.10. What is refactoring ?

3.11. Learn to use an Debugger / IDE, Know your tools.

also: learn some bash, use on IDE and learn it, learn to use a debugger.

3.12. Rubber ducking

4. General principles and techniques

4.1. DRY

4.2. KISS / YAGNI

4.4. Design by contract / modularization / decoupling / orthogonalization

Example: encapsulation of code from SO. Changes should be localized.

4.5. Why global variables are bad

bla bla bla

4.6. Defensive programming

Others will not use your code as you intended. Using asserts

4.7. The power of abstraction

Binary search ?

4.8. Trace bullets

5. Clean code

example: obfuscated hello world.

"programming style as configuration"

You read code more often than you write it.

Code communicates and shares ideas and concepts.

separate overall idea from impelemntation details

5.1. How does good code look like

  • Little scrolling needed

  • Minimal level of identation

  • Clear naming

  • Directly undertandable

  • No suprises

  • Brief, important parts stand out.

5.2. General comments about code layout

PEP 8

About and why PEP 8, max 100 chars per line, snake case, camel case
About bike shedding, style wars, there is no optimal style.
spaces around operators
space after comma.

Use empty lines to group lines

The code style recommendations of PEP 8 are about readability in code editors usually showing at least 25 lines per page. As the layout of a book is not the same, and as we have text refering to code lines, we often do not follow PEP 8 recommendations and use only one empty line to separate functions and classes from other code.

How to avoid very long lines

Python examples with brackets, strings, backslash

About Python PEPs
Minimize clutter

DRY

5.3. Names

Good names minimize comments.

Avoids clutter, duplicate changes of code and comments.

Meaningful names
def compute(a, b):
    return a + b

i, j, k for loops, n for number.

no puns. pronouncable names, ..

Good names avoid mental mapping

Some examples here

Abstract names support reusability

Example

Explanatory Variables

Example

Consistant naming

Benefits and examples.

Avoid puns

example

speakable names

example

aux variables

clarifies understanding

5.4. Comments

comments can become lies !

Comment the unexpected

Examples, Add links to SO solutions in comments.

Don’t comment what the code acutally tells you

Example

Will others understand my comments ?

Example

Use docstrings, no comments before function

Example including exceptions and assertions !

Don’t comment if you can use a good variable or function name instead

Example

No dead code

Why ?, Version control !

5.5. Minimize identation

Functions help

More later

early return

Example

also helps to break out of two or more nested loops

Use break and continue

Example

5.6. Exceptions

Instead return codes don’t catch Exception unless you have good reasons.

5.7. Code smells

bad code needs many comments.

dictionaries in Python can replace long if/else chains

5.8. Functions

small do one thing

example

reduce sections to functions

example with comments for sections

one level of abstraction

examples

reduce number of arguments

nametuples, dicts, parameter objects

example: distance of two 3d points or: tuples for nd-points

connsitent return values

exampl

no global side effects

use classes if needed

no dead functions

cleanup your code, broken windows principle, git manages it

scientific1 functions

use default values for arguemnts, use named argments when calling if list is still long.

extra: no mutables as default values

5.9. Consistency

consistent naming, consistant error handling vs None.

6. Git introduction

git is a distributed version control system.

birth in 2005 from linux developer community.

not all explanations here are 100% exact, but provide basic abstraction to understand git.

only basics, local git without branches is still better than not git.

6.1. Install git

windows vs linux vs mac

6.2. basics

patch management tool, commit, history, staging area

no huge binary files (git annex, git lsf instead).

6.3. First time setup

plus: line ending setup for windows vs unix

6.4. git init

edit .gitignore

everything else from handson, plus:

  • git rm

  • rename / move files

  • git commit --amend

7. Automated code testing

Benefits of testing, how does programming without tests look like, personal experiences.

from harrys book:

I became scared of making changes to my code. I was no longer sure what depended on what, and what might happen if I changed this code over here, oh gosh, I think that bit over there inherits from it—no, it doesn’t, it’s overriden. Oh, but it depends on that class variable. Right, well, as long as I override the override it should be fine. I’ll just check—but checking was getting much harder. There were lots of sections to the site now, and clicking through them all manually was starting to get impractical. Better to leave well enough alone, forget refactoring, just make do.

Soon I had a hideous, ugly mess of code. New development became painful.

minimal example for a unit test

7.1. Kinds of tests

Unit tests

bla bla

Regression tests

bla bla

CI tests

ensure code runs independent of development machine, can run longer

7.2. Test runners

  • Test discovery

  • Test reporting

  • Good diagnostic messages

7.3. Principles of unit tests

  • Independent

  • Order does not matter

  • fixtures

7.4. What to test

  • expected behaviour

  • corner cases

  • error cases / error handling

7.5. Test coverage

example

7.6. Testable code is good code

small units

small functions benefits of small functions, execution paths, …​

what is hard to test

GUI, …​

maximize testability

refactoring example

7.7. Regression tests

Example

7.8. Testing strategies

How to fix bugs
    tests
.How to existing code unter tests.

8. The Python standard library

8.1. General overview

8.2. Helpful modules

os

blafase

glob

blafase

csv

blafase

namedtuple

blafase

Counter

blafase

defaultdict

blafase

pathlib

blafasel

shutil

blafasek

subprocess

blafasel

math

math.fsum, hypot, atan2, …​

statistics

blafasel

9. The scientific Python stack

9.1. conda introduction

9.2. virtual envs

9.3. numpy

…​

9.4. scipy

…​

9.5. pandas

…​

9.6. matplotlib

…​

9.7. scikit learn

…​

9.8. snakemake

…​

10. Practical setup of Python Projects

  • which python version to use ?

  • python2 vs python 3

  • minmal setup for python 2 compatibility

10.1. virtualenvs

incl. upload to pypi with twine

10.3. pytest

10.4. tox

10.5. sphinx

10.6. gitlab-ci / travis

10.7. code linters

TODO: Python pitfalls

11. Object oriented Programming

11.1. What is it about ?

Every variable we use in Python refers to an object. Let it be a number, a string, a list or a numpy array. Also modules and functions are objects.

A class is a recipe or template for creating an object. For example Python offers one class str but as many objects of this class as you need.

An object is also called an instance of a class.

Object oriented programming refers to a programming style solving a programming problem by implementing own classes.

In the 90s object oriented programming was considered as the silver bullet to implement reusable code. Thus some programming languages such as JAVA require that you implement everything, even a program which only prints your name, as a class. Python and other programming languages are more flexible and the use of classes is not always the best solution for a programming problem.

11.1.1. Inspection of the attributes of a class.

Methods of Python objects are also attributes. The only difference is that methods can be called like functions. To inspect the available attributes, Python offers the dir function:

1 2 3 4 5 6 7 8 9
# dir_example.py def add(x, y): return x + y data = {1 : 2} print("methods of add", dir(add)) print("methods of data", dir(data))

The output shows

1 2 3
$ python dir_example.py methods of add ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] methods of data ['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']

All methods of the function add start and end with double underscores __ which are internal methods not indented for general use. But if you look at the methods of data you will recognize some known methods.

11.1.2. Encapsulation

Usually an object holds some data and offers methods to work with this data. The internal data is also called internal state of the object.

Objects allows grouping data and operations on this data in one place. Another advantage is that you can use classes and objects without caring about their internal details. Instead a good implemented class offers a simple interface to the user and thus supports reusability. Such an interface is also called an

API which is an abbreviation for Application Programming Interface. A class API is a specification of the methods of a class including their functionality and description of arguments and return values. The code using such an API is also named client code.

An API offers an abstract layer for the user of a class. Client code does not need to be changed if internal details of the class change.

Example

A Python dictionary allows handling of a mapping data structure without caring about the internal details how this data is organized. Skilled computer scientists implemented the details and took care that lookup time is very fast without bothering you with such details. If computer scientists develop better algorithms for handling dictionaries these could be implemented in future Python versions without demanding changes in existing code.

Use cases for classes

Your code benefits from using classes in following cases:

  • You have a set of functions which either pass complex data structures around or your functions and their argument lists would become simpler by using global variables (see also Why global variables are bad). Classes offer a good solution in this situation.

  • You want to implement your own data types.

  • You work with data records, e.g. postal address and address books. An address could be implemented as a class, and the address book would be a class to manage the addresses. In this situation a simple solution would be to resort to tuples or [namedtuple]s for the records and a list for the address book, but a class based implementation would hide implementation details.

  • You want to encapsulate complex operations and offer an easy to use API.

Example Address Book

Your address record object could offer attributes like first_name, second_name, street, street_number, zip_code, city and country. The address book object could implement methods like add, find_by_name, delete and save_to_file.

Example Web Service

Communication with a web service, for example a web service providing weather data, consists of many technical details. A class offering methods like current_temperature_in(city_name) can be used to built up tools for global weather analysis focussing on the details of the analysis. Switching to another web service with faster response time or more accurate data would only affect the internals of the class leaving the client code untouched.

11.2. Introduction to Python classes

The next sections will teach you the basics of how to implement classes in Python.

11.2.1. Our first class

1 2 3 4 5 6 7 8
# __file_ _ class Greeter: (1) def say_hello(self): (2) print("hi you") g = Greeter() (3) g.say_hello() (4)
1 the class statement followed by the name of the class starts the definition of the class, we indent every declaration related to the class, for example declaration of methods.
2 methods are declared similar to funtions, ignore the self, we explain this later.
3 we create an instance of class Greeter
4 we call the method say_hello of g, we don’t specify the self argument from the definition.

And this is the output:

1 2
$ python greeter.py hi you

11.2.2. What is self ?

Next we investigate what self is:

1 2 3 4 5 6 7 8 9 10 11 12 13
# what_is_self.py class WhatIsSelf: def print_self(self): print("self is", self) u = WhatIsSelf() print("u is", u) u.print_self() v = WhatIsSelf() print("v is", v) v.print_self()

And this is the generated output:

1 2 3 4 5
$ python what_is_self.py u is <__main__.WhatIsSelf object at 0x7f3a6e986a20> self is <__main__.WhatIsSelf object at 0x7f3a6e986a20> v is <__main__.WhatIsSelf object at 0x7f3a6e986be0> self is <__main__.WhatIsSelf object at 0x7f3a6e986be0>

Lines 2 to 5: the values after at are the hexadecimal representations of the memory location of the corresponding object.

Comparing the code and the created output we conclude:

Insights
  1. self is the current instance of the class

  2. We always must declare self as the first argument of a method.

  3. We don’t specify the value for self when we call a method. Python inserts the current object for us.

11.2.3. Initializing an object

Python special methods start and end with two underscores __ and have a predefined function. Such methods are also called dunder methods (abbreviation for double underscore methods).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
# point_2d.py class Point2D: def __init__(self, x, y): (1) self.x = x self.y = y def move(self, delta_x, delta_y): self.x += delta_x self.y += delta_y if __name__ == "__main__": p = Point2D(1, 2) (2) print(p.x, p.y) (3) p.move(41, 21) print(p.x, p.y)
1 The __init__ method initializes an object. Here we set x and y as attributes of the current instance of the class.
2 We pass values 1 and 2 as arguments to __init__
3 Based on the implementation of __init__ both values are accessible as attributes of the object.

And this is the output of the program:

1 2 3
$ python point_2d.py 1 2 42 23

11.3. Inheritance

We can use existing classes to create new classes based on existing classes. This is called inheritance. If we inherit from a given class we inherit its methods and can add new methods or overwrite existing ones.

The class we inherit from is called base class, classes inheriting from a base class are called sub classes.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# vector_2d.py import math from point_2d import Point2D class Vector2D(Point2D): (1) def length(self): return math.hypot(self.x, self.y) def add(self, other): result = Vector2D(self.x, self.y) result.move(other.x, other.y) (2) return result if __name__ == "__main__": v1 = Vector2D(3, 4) (3) v2 = v1.add(v1) print(v1.length()) print(v2.x, v2.y)
1 The declaration class Vector2D(Point2D): defines a new class Vector2D starting with the methods of the existing class Point2D. Here we add two extra methods length and add.
2 As we inherit the method move from the base class Point2D we can use the move method.
3 Works because we inherit the dunder method __init__ from Point2D.
1 2 3
$ python vector_2d.py 5.0 6 8

11.4. Special methods *

Python offers more dunder methods than __init__. These support syntactic sugar to make client code look more pythonic. This means such methods are not required for object oriented programming, but can help to offer a more convenient API.

Here we show a few examples:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
# vector_2d_fancy.py import math from point_2d import Point2D class Vector2D(Point2D): def length(self): return math.hypot(self.x, self.y) def __str__(self): (1) return "Vector2D({}, {})".format(self.x, self.y) def __add__(self, other): (2) result = Vector2D(self.x, self.y) result.move(other.x, other.y) return result def __getitem__(self, index): (3) if index not in (0, 1): raise IndexError("index {} must be 0 or 1".format(index)) return (self.x, self.y)[index] if __name__ == "__main__": v1 = Vector2D(3, 4) v2 = Vector2D(4, 5) print(v1, v2) (4) v3 = v1 + v2 (5) print(v3[0], v3[1]) (6)
1 __str__ implements string conversion for instances of the class. This happens if we call str(v1) or if we print an object. See also comment for line 26.
2 __add__ implements + for an object of the class. See also comment for line 27. Python lists and tuples also implement this method.
3 __getitem__ implements indexed access using square brackets. See also comment for line 28. Python strings, lists, tuples and dictionaries also implement this method.
4 print tries to convert the argument to a string if possible, thus calls __str__ in this case. You see the result in the output below.
5 Is the same as v3 = v1.__add__(v2).
6 v3[0] is the same as v3.__getitem__[0].

Compare the output to the code above and check the previous explanations again:

1 2 3
$ python vector_2d_fancy.py Vector2D(3, 4) Vector2D(4, 5) 7 9

11.5. Private methods

Sometimes we implement methods which are used from within other methods of the class but are not intended to be used by client code. Such methods are called private methods. Programming languages such as JAVA or C++ provide syntax to mark methods are private.

Python lacks supporting syntax, instead most Python programmers mark such methods by preceding the name with a single underscore _. This serves as a signal for clients "don’t call me, and if you do, don’t expect that this works as expected".

Implementing methods with names starting but not ending with double underscores have a different purpose and should be avoided, unless you exactly understand what you do. Nevertheless you will find bad examples using such names on the web.

11.6. Open/Closed Principle

In object-oriented programming, the open/closed principle states "software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification"; that is, such an entity can allow its behaviour to be extended without modifying its source code.
— https://en.wikipedia.org/wiki/Open/closed_principle

This might sound theoretical and slightly vague, so we better go on with some examples.

11.6.1. Template Method Pattern

The so called template method pattern is a strategy to support the open/closed principle. The basic idea is to implement modifications of a base "operation" by means of sub classes.

The following base class Adder adds numbers of a given list or tuple of numbers, the sub class Multiplier implements a variant:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
# template_method_pattern.py class Adder: def accumulate(self, values): (1) result = values[0] for value in values[1:]: result = self.operation(result, value) (2) return result def operation(self, value_1, value_2): (3) return value_1 + value_2 class Multiplier(Adder): def operation(self, value_1, value_2): (4) return value_1 * value_2 data = [1, 2, 3, 4] adder = Adder() print("sum of", data, "is", adder.accumulate(data)) multiplier = Multiplier() print("product of", data, "is", multiplier.accumulate(data))
1 accumulate adds up given numbers based on the method operation.
2 accumulate uses the operation method.
3 operation just adds two numbers.
4 The Multiplier class reuses the accumulate method but operation implements multiplication of two numbers.

And this is the output:

1 2 3
$ python template_method_pattern.py sum of [1, 2, 3, 4] is 10 product of [1, 2, 3, 4] is 24

This is a very simple example for an algorithm which is open for extension while the core code stays closed for modification.

Benefits
  • No "monster classes" with if-elif-else constructs to switch between algorithm variants.

  • Instead of scattering implementation details of a variant over multiple methods in a single class, these details can be grouped within sub classes. This supports the idea of avoiding divergent changes.

The code above would fail if we pass other data than lists of tuples of numbers or if this collection is empty (see line 5). A better implementation would check data first. See also defensive programming.
Abstract base classes

An abstract base class is a class which does not implement all methods required to make it work. Here we could implement a base class Accumulator with method accumulate but without the method operation. Instead we would implement base classes Adder and Multiplier offering this missing method.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
# abstract_base_class.py class AbstractAccumulator: def accumulate(self, values): result = values[0] for value in values[1:]: result = self.operation(result, value) return result def operation(self, value_1, value_2): (1) raise NotImplementedError() class Adder(AbstractAccumulator): def operation(self, value_1, value_2): (2) return value_1 + value_2 class Multiplier(AbstractAccumulator): def operation(self, value_1, value_2): (4) return value_1 * value_2 data = [1, 2, 3, 4] adder = Adder() print("sum of", data, "is", adder.accumulate(data)) multiplier = Multiplier() print("product of", data, "is", multiplier.accumulate(data))
1 We avoid using the abstract base class directly. A better, but longer example would use the abc module from the standard library.

11.6.2. Strategy pattern

In contrast to the template method pattern we configure the behaviour of a class by passing objects to the initalizer:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
# strategy_pattern.py class Accumulator: def __init__(self, operation): self._operation = operation (2) def accumulate(self, values): result = values[0] for value in values[1:]: result = self._operation(result, value) (3) return result data = [1, 2, 3, 4] def add(a, b): return a + b adder = Accumulator(add) (1) print("sum of", data, "is", adder.accumulate(data)) def multiply(a, b): return a * b multiplier = Accumulator(multiply) (4) print("product of", data, "is", multiplier.accumulate(data))
1 To configure the object We pass the function add to the initializer.
2 We set this function as a private attribute.
3 Within acuumulate we use this attribute, in this example to sum the given values.
4 We can re use the same class but with a different behaviour by providing a different function.

This is a simple example using functions, in more complex situations we would pass one or also more objects with specified methods to __init__.

Strategy pattern outside object oriented programming

The basic idea of the strategy pattern can be found in numerical algorithms without requiring object oriented programming.

The following function approx_zero implements the bisection method to find a zero of a given function:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
# bisect.py def approx_zero(f, a, b, eps=1e-8): assert f(a) * f(b) < 0 (2) while abs(a - b) > eps: middle = (a + b) / 2 if f(middle) * f(a) > 0: (3) a = middle else: b = middle return (a + b) / 2.0 def polynome(x): return x ** 2 - 2 sqrt2 = approx_zero(polynome, 1, 2) (1) print("zero is", sqrt2, "with residual", polynome(sqrt2))
1 We want to approximate the square root of 2 by approximating the zero of the given function polynome. We know that the searched for value lies between 1 and 2.
2 To ensure that the algorithm works we must ensure that f takes different signs for a and b.
3 This checks if f(middle) and f(a) have the same sign.
1 2
$ python bisect.py zero is 1.414213564246893 with residual 5.29990096254096e-09

This was not possible in older programming languages such as Fortran 77, where the function name was hard coded in the bisection method implementation and reusing the algorithm in different places was error prone.

12. Basics of Code optimization

12.1. An example

The following function count_matches takes two lists reference and data and counts how many elements of reference occur also in data. This is a simple and slightly inrealistic example, but good enough to explain a few fundamental principles of code performance and optimization.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
# time_measurement.py import random import time def count_matches(reference, data): """counts how many items in data are in present in reference""" matches = 0 for value in reference: if value in data: matches += 1 return matches for n in (2000, 4000, 8000, 16000): reference = [random.randint(0, n) for _ in range(n)(1) data = [random.randint(0, 2 * n) for _ in range(n)] started = time.time() (2) count_matches(reference, data) needed = time.time() - started (3) print("n = {:5d} time: {:5.2f} seconds".format(n, needed))
1 random.randint(0, n) creates a pseudo random number in the range 0 to n (limits included).
2 time.time() returns the number of seconds (including figures after the decimal point) since 1st of January 1970 (this date is also called the Unix epoch).
3 Calling time.time() and subtracting the value of started we get the observed run time for executing count_matches.

Timing measuremens in this chapter may vary, so don’t expect to get exactly the same output here and in following examples:

1 2 3 4 5
$ python time_measurement.py n = 2000 time: 0.07 seconds n = 4000 time: 0.28 seconds n = 8000 time: 1.12 seconds n = 16000 time: 4.49 seconds

We observe that every time we double the size, the overall runtime approximately increases by a factor of four. This is the same as saying that the runtime is proportional to n2.

A simple calculation shows that n = 250000 would result in about 10 minutes processing time, running the code with n = 610000 would need about one hour to finish !
Profiling

Tools called profilers provide means to understand how the overall runtime of a program is distributed over functions and individual lines of code. At the time of writing this book, the Python package line_profiler is our preferred profiling tool. measurement.

To analyze code with line_profiler we first have to install it using pip install line_profiler.[1]

Next we decorate the function we want to inspect with @profile:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
# profiling.py import random import time @profile def count_matches(reference, data): matches = 0 for value in reference: if value in data: matches += 1 return matches n = 2000 reference = [random.randint(0, n) for _ in range(n)] data = [random.randint(0, 2 * n) for _ in range(n)] count_matches(reference, data)

Installing line_profiler also installs a command line tool called kernprof which we use as follows: Unresolved directive in chapter_12_basics_of_code_optimization/main.adoc - include::run_kernprof_output.adoc[]

The output shows us:

  • Starting from line 13 we see measuerments per code line of the decorated function.

  • Column Hits counts how often the actual line was executed.

  • Column Time is the overall time spent executing this line in µs.

  • Column Per Hit is the average execution time of this line in µs.

  • Column % Time is the fraction of the overall time running the function.

So we see that about 98% of the runtime is spent in if value in data !

Analysis

Checking if an item is element of a list is pretty slow. The Python interpreter has in the worst case to iterate through the full list to do this check. So on average we can expect that the runtime of this line is proportional to the lenght of the list which is n. We further see that the specific line is executed n times which supports our observation that the runtime is proportional to n2.

How to fix this ?

The appropriate and optimized data type for checking membership is a set and not a list. For sets such an operations run time is on average a constant independent of n.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
# profiling_optimized.py import random import time @profile def count_matches(reference, data): matches = 0 data = set(data) (1) for value in reference: if value in data: matches += 1 return matches n = 2000 reference = [random.randint(0, n) for _ in range(n)] data = [random.randint(0, 2 * n) for _ in range(n)] count_matches(reference, data)
1 This is the modification. Unresolved directive in chapter_12_basics_of_code_optimization/main.adoc - include::run_kernprof_optimized_output.adoc[]

12.2. Some theory

Example:

O(n) etc O(n2):: We say a program has runtime O(n2) if the overall runtime is proportional to n2 plus some terms of lower order, e.g. n.

io bound vs cpu bound

microoptimzations: mesaure multiple times, see python std library

12.3. Python dictionaries

13. Misc

13.1. Floating point basics

13.2. Python code as configuration

13.3. Poor mans data management

Glossary

select is not broken

…​..

bike shedding

…​..

pythonic

…​.

syntactic sugar

…​.

Profiler

..

Pseudo Random

..

References

  • [Hunt 1997] Andy Hunt & Dave Thomas. The Pragmatic Programmer: From Journeyman to Master. Addison-Wesley. 1999.

  • [2] Erich Gamma, Richard Helm, Ralph Johnson & John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley. 1994.

14. Some Asciidoctor Examples

14.1. Example 1

Rubies are red, Topazes are blue. My eye is black abc

14.2. Example 2

this is a sidebar

this is another sidebar

This is a note block with complex content

A list
  • item

  • item

  • item

This is anote asdfl ja ljd ldakfsj dalfjdlkjfaljdflöja adsflkja dfl kjads öljdf ölajdf öldfjöldaf jadlf jadlfj dlfj adsflkja dfl kjads öljdf ölajdf öldfjöldaf jadlf jadlfj dlfj adsflkja dfl kjads öljdf ölajdf öldfjöldaf jadlf jadlfj dlfj
Single line tip
Don’t forget…​
Watch out for…​
Ensure that…​

14.3. Quote

Four score and seven years ago our fathers brought forth on this continent a new nation…​

14.4. TODO

  • asciidoc comdline + file observer + liver reload im browser, themes ? pygments

    • subitem

  • grobsturktur aufsetzen

  • in gitlab einchecken

14.5. With numbers

  1. asciidoc comdline + file observer + liver reload im browser, themes ? pygments

    1. subitem

  2. grobsturktur aufsetzen

  3. in gitlab einchecken

14.6. Source code listing

Code listings look cool with Asciidoctor and pygments.

1 2 3 4 5 6 7
#!/usr/bin/env python import antigravity try: antigravity.fly() (1) except FlytimeError as e: (2) # um...not sure what to do now. pass
1 does not work
2 this is a fake exception

14.7. In-document refernes

A statement with a footnote.[2]

bla bla bla
bla bla bla
bla bla bla
bla bla bla

14.8. Other formatting

indented by one line in adoc file
indented by one line in adoc file

bold italic

This word is in backticks.

“Well the H2O formula written on their whiteboard could be part of a shopping list, but I don’t think the local bodega sells E=mc2,” Lazarus replied.

Werewolves are allergic to cinnamon.

14.9. Tables

Table 1. An example table
Col 1 Col 2 Col 3

6

Three items

d

1

Item 1

a

2

Item 2

b

3

Item 3

c

14.10. Images

This should be some floating text.
This should be some floating text.
This should be some floating text.
This should be some floating text.
This should be some floating text.
This should be some floating text.

14.11. Quotes

  • Quotes:

    It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness…​
    — Charles Dickens
    A Tale of Two Cities

End of quote

14.12. Chapter with some references

The Pragmatic Programmer [Hunt 1997] should be required reading for all developers. To learn all about design patterns, refer to the book by the “Gang of Four” [2].


1. In case you are still using Python 2.7, first run pip install 'ipython<6', else installing line_profiler will fail.
2. Clarification about this statement.