1. Preface
address reader with you.
more and more people code, learn basics of progamming language, focussed on solving actual problems, no / little education in fields as computer science and software enngineering.
our history
existing books: target programmers with working experience, principles of object oriented programming must be understood, technical language, too many aspects which are not required for scientific programmers
Main goal: reduce work, robustness, reusability
2. About this book
Main goal: reduce work, robustness, reusability
Gain control over the process.
Why should scientists care about professional software development ? (reproducability crisis, publications with code, reusability, clear process of distribution and reuse.
Techincal dept, speed over time
2.1. On the shoulders of giants
Clean Code / Pragmatic Programmer / GOF / …
existing books: target programmers with working experience, principles of object oriented programming must be understood, technical language, too many aspects which are not required for scientific programmers
2.2. Motivation, Why this book ?
filtered introductions into several aspects of professional software development. books intention is to provide good basics and introductions so that interested programmers can deepen their knowledge on their own.
links to advanced books provided.
2.3. How to read this book
chapters / sections are inteded to be short so that they can be consumed as bites. Book should be read several times, only reading is not sufficient, pratice is king.
2.4. But this so much to learn…
First focus on solving problems, Learn one technique until automized, then proceed.
2.5. Why learning to code is so hard
constant learning: reserve some time for reading blogs, books, python standard library, tutorials
2.6. when to switch from prototype to propper development
3. The programming process
3.1. Meta thinking
The answer is that they paid attention to what they were doing while they were doing it — and then they tried to do it better.
Main thread of thinking is solving the problem. But also think about what you do when programming.
To consider:
-
Is this the only solution ? Can I find another approach which might be better ?
-
What could go wrong if this code runs ?
-
Will I understand my code when I work on it in a few weeks / months ?
-
Will others be able to work on this code without my assistance ?
Numpy is not broken
3.2. Programming by incidence / (better coincidence ?)
program works but you dont know why, piles up and finally you cant fix it. not tested, depends on environment, no clear contracts, no decoupling
dont ignore warnings
benefit: improves programming skills !
3.3. Fail early
Also in development process: be brave to thrash your first approach if you
3.4. Have a plan, what is a prototype, when to prototype
3.5. The broken window principle
3.6. Version control
3.7. Testing Harness
Test Your Software, or Your Users Will
3.8. Mindset
making mistakes is crutial part of programming process, see it as a challenge and part of your usual work if debugging: realy understand what goes wrong, avoid fixing by incidence
3.9. Benefit of small changes
3.10. What is refactoring ?
3.11. Learn to use an Debugger / IDE, Know your tools.
also: learn some bash, use on IDE and learn it, learn to use a debugger.
3.12. Rubber ducking
4. General principles and techniques
4.1. DRY
4.2. KISS / YAGNI
4.3. Avoid divergent changes
4.4. Design by contract / modularization / decoupling / orthogonalization
Example: encapsulation of code from SO. Changes should be localized.
4.5. Why global variables are bad
bla bla bla
4.6. Defensive programming
Others will not use your code as you intended. Using asserts
4.7. The power of abstraction
Binary search ?
4.8. Trace bullets
5. Clean code
example: obfuscated hello world.
"programming style as configuration"
You read code more often than you write it.
Code communicates and shares ideas and concepts.
separate overall idea from impelemntation details
5.1. How does good code look like
-
Little scrolling needed
-
Minimal level of identation
-
Clear naming
-
Directly undertandable
-
No suprises
-
Brief, important parts stand out.
5.2. General comments about code layout
About and why PEP 8, max 100 chars per line, snake case, camel case
About bike shedding, style wars, there is no optimal style.
spaces around operators
space after comma.
Use empty lines to group lines
The code style recommendations of PEP 8 are about readability in code editors usually showing at least 25 lines per page. As the layout of a book is not the same, and as we have text refering to code lines, we often do not follow PEP 8 recommendations and use only one empty line to separate functions and classes from other code. |
Python examples with brackets, strings, backslash
About Python PEPs |
DRY
5.3. Names
Avoids clutter, duplicate changes of code and comments.
def compute(a, b):
return a + b
i
, j
, k
for loops, n
for number.
no puns. pronouncable names, ..
Some examples here
Example
Example
Benefits and examples.
example
example
clarifies understanding
5.4. Comments
comments can become lies !
Examples, Add links to SO solutions in comments.
Example
Example
Example including exceptions and assertions !
Example
Why ?, Version control !
5.5. Minimize identation
More later
Example
also helps to break out of two or more nested loops
Example
5.6. Exceptions
Instead return codes don’t catch Exception unless you have good reasons.
5.7. Code smells
bad code needs many comments.
dictionaries in Python can replace long if/else chains
5.8. Functions
example
example with comments for sections
examples
nametuples, dicts, parameter objects
example: distance of two 3d points or: tuples for nd-points
exampl
use classes if needed
cleanup your code, broken windows principle, git manages it
use default values for arguemnts, use named argments when calling if list is still long.
extra: no mutables as default values
5.9. Consistency
consistent naming, consistant error handling vs None.
6. Git introduction
git is a distributed version control system.
birth in 2005 from linux developer community.
not all explanations here are 100% exact, but provide basic abstraction to understand git.
only basics, local git without branches is still better than not git.
6.1. Install git
windows vs linux vs mac
6.2. basics
patch management tool, commit, history, staging area
no huge binary files (git annex, git lsf instead).
6.3. First time setup
plus: line ending setup for windows vs unix
6.4. git init
edit .gitignore
everything else from handson, plus:
-
git rm
-
rename / move files
-
git commit --amend
7. Automated code testing
Benefits of testing, how does programming without tests look like, personal experiences.
from harrys book:
I became scared of making changes to my code. I was no longer sure what depended on what, and what might happen if I changed this code over here, oh gosh, I think that bit over there inherits from it—no, it doesn’t, it’s overriden. Oh, but it depends on that class variable. Right, well, as long as I override the override it should be fine. I’ll just check—but checking was getting much harder. There were lots of sections to the site now, and clicking through them all manually was starting to get impractical. Better to leave well enough alone, forget refactoring, just make do.
Soon I had a hideous, ugly mess of code. New development became painful.
minimal example for a unit test
7.1. Kinds of tests
bla bla
bla bla
ensure code runs independent of development machine, can run longer
7.2. Test runners
-
Test discovery
-
Test reporting
-
Good diagnostic messages
7.3. Principles of unit tests
-
Independent
-
Order does not matter
-
fixtures
7.4. What to test
-
expected behaviour
-
corner cases
-
error cases / error handling
7.5. Test coverage
example
7.6. Testable code is good code
small functions benefits of small functions, execution paths, …
GUI, …
refactoring example
7.7. Regression tests
Example
7.8. Testing strategies
tests .How to existing code unter tests.
8. The Python standard library
8.1. General overview
8.2. Helpful modules
blafase
blafase
blafase
blafase
blafase
blafase
blafasel
blafasek
blafasel
math.fsum, hypot, atan2, …
blafasel
9. The scientific Python stack
9.1. conda introduction
9.2. virtual envs
9.3. numpy
…
9.4. scipy
…
9.5. pandas
…
9.6. matplotlib
…
9.7. scikit learn
…
9.8. snakemake
…
10. Practical setup of Python Projects
-
which python version to use ?
-
python2 vs python 3
-
minmal setup for python 2 compatibility
10.1. virtualenvs
10.2. packaging, cookie cutter
incl. upload to pypi with twine
10.3. pytest
10.4. tox
10.5. sphinx
10.6. gitlab-ci / travis
10.7. code linters
TODO: Python pitfalls
11. Object oriented Programming
11.1. What is it about ?
Every variable we use in Python refers to an object. Let it be a number, a string, a list or a numpy array. Also modules and functions are objects.
A class is a recipe or template for creating an object. For example Python
offers one class An object is also called an instance of a class. Object oriented programming refers to a programming style solving a programming problem by implementing own classes. |
In the 90s object oriented programming was considered as the silver bullet to implement reusable code. Thus some programming languages such as JAVA require that you implement everything, even a program which only prints your name, as a class. Python and other programming languages are more flexible and the use of classes is not always the best solution for a programming problem.
11.1.1. Inspection of the attributes of a class.
Methods of Python objects are also attributes. The only difference is that
methods can be called like functions. To inspect the available attributes,
Python offers the dir
function:
1
2
3
4
5
6
7
8
9 # dir_example.py
def add(x, y):
return x + y
data = {1 : 2}
print("methods of add", dir(add))
print("methods of data", dir(data))
The output shows
1
2
3 $ python dir_example.py
methods of add ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
methods of data ['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
All methods of the function add
start and end with double underscores __
which are internal methods not indented for general use. But if you look at the
methods of data
you will recognize some known methods.
11.1.2. Encapsulation
Usually an object holds some data and offers methods to work with this data. The internal data is also called internal state of the object.
Objects allows grouping data and operations on this data in one place. Another advantage is that you can use classes and objects without caring about their internal details. Instead a good implemented class offers a simple interface to the user and thus supports reusability. Such an interface is also called an
API which is an abbreviation for Application Programming Interface. A class API is a specification of the methods of a class including their functionality and description of arguments and return values. The code using such an API is also named client code. |
An API offers an abstract layer for the user of a class. Client code does not need to be changed if internal details of the class change.
A Python dictionary allows handling of a mapping data structure without caring about the internal details how this data is organized. Skilled computer scientists implemented the details and took care that lookup time is very fast without bothering you with such details. If computer scientists develop better algorithms for handling dictionaries these could be implemented in future Python versions without demanding changes in existing code.
Your code benefits from using classes in following cases:
-
You have a set of functions which either pass complex data structures around or your functions and their argument lists would become simpler by using global variables (see also Why global variables are bad). Classes offer a good solution in this situation.
-
You want to implement your own data types.
-
You work with data records, e.g. postal address and address books. An address could be implemented as a class, and the address book would be a class to manage the addresses. In this situation a simple solution would be to resort to tuples or [namedtuple]s for the records and a list for the address book, but a class based implementation would hide implementation details.
-
You want to encapsulate complex operations and offer an easy to use API.
Your address record object could offer attributes like first_name
, second_name
,
street
, street_number
, zip_code
, city
and country
. The address book object
could implement methods like add
, find_by_name
, delete
and save_to_file
.
Communication with a web service, for example a web service providing weather data,
consists of many technical details. A class offering methods like
current_temperature_in(city_name)
can be used to built up tools for global
weather analysis focussing on the details of the analysis. Switching to another
web service with faster response time or more accurate data would only affect
the internals of the class leaving the client code untouched.
11.2. Introduction to Python classes
The next sections will teach you the basics of how to implement classes in Python.
11.2.1. Our first class
1
2
3
4
5
6
7
8 # __file_ _
class Greeter: (1)
def say_hello(self): (2)
print("hi you")
g = Greeter() (3)
g.say_hello() (4)
1 | the class statement followed by the name of the class starts the definition of the class,
we indent every declaration related to the class, for example declaration of methods. |
2 | methods are declared similar to funtions, ignore the self , we explain this later. |
3 | we create an instance of class Greeter |
4 | we call the method say_hello of g , we don’t specify the self argument from the definition. |
And this is the output:
1
2 $ python greeter.py
hi you
11.2.2. What is self
?
Next we investigate what self
is:
1
2
3
4
5
6
7
8
9
10
11
12
13 # what_is_self.py
class WhatIsSelf:
def print_self(self):
print("self is", self)
u = WhatIsSelf()
print("u is", u)
u.print_self()
v = WhatIsSelf()
print("v is", v)
v.print_self()
And this is the generated output:
1
2
3
4
5 $ python what_is_self.py
u is <__main__.WhatIsSelf object at 0x7f3a6e986a20>
self is <__main__.WhatIsSelf object at 0x7f3a6e986a20>
v is <__main__.WhatIsSelf object at 0x7f3a6e986be0>
self is <__main__.WhatIsSelf object at 0x7f3a6e986be0>
Lines 2
to 5
: the values after at
are the hexadecimal representations of the memory location
of the corresponding object.
Comparing the code and the created output we conclude:
Insights
|
11.2.3. Initializing an object
Python special methods start and end with two underscores __
and have a predefined function.
Such methods are also called dunder methods (abbreviation for double underscore methods).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17 # point_2d.py
class Point2D:
def __init__(self, x, y): (1)
self.x = x
self.y = y
def move(self, delta_x, delta_y):
self.x += delta_x
self.y += delta_y
if __name__ == "__main__":
p = Point2D(1, 2) (2)
print(p.x, p.y) (3)
p.move(41, 21)
print(p.x, p.y)
1 | The __init__ method initializes an object. Here we set x and y as attributes of
the current instance of the class. |
2 | We pass values 1 and 2 as arguments to __init__ |
3 | Based on the implementation of __init__ both values are accessible as attributes of the object. |
And this is the output of the program:
1
2
3 $ python point_2d.py
1 2
42 23
11.3. Inheritance
We can use existing classes to create new classes based on existing classes. This is called inheritance. If we inherit from a given class we inherit its methods and can add new methods or overwrite existing ones.
The class we inherit from is called base class, classes inheriting from a base class are called sub classes.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20 # vector_2d.py
import math
from point_2d import Point2D
class Vector2D(Point2D): (1)
def length(self):
return math.hypot(self.x, self.y)
def add(self, other):
result = Vector2D(self.x, self.y)
result.move(other.x, other.y) (2)
return result
if __name__ == "__main__":
v1 = Vector2D(3, 4) (3)
v2 = v1.add(v1)
print(v1.length())
print(v2.x, v2.y)
1 | The declaration class Vector2D(Point2D): defines a new class Vector2D starting with
the methods of the existing class Point2D . Here we add two extra methods length and add . |
2 | As we inherit the method move from the base class Point2D we can use the move method. |
3 | Works because we inherit the dunder method __init__ from Point2D . |
1
2
3 $ python vector_2d.py
5.0
6 8
11.4. Special methods *
Python offers more dunder methods than __init__
. These support
syntactic sugar to make client code look more
pythonic. This means such methods are not required for object
oriented programming, but can help to offer a more convenient API.
Here we show a few examples:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28 # vector_2d_fancy.py
import math
from point_2d import Point2D
class Vector2D(Point2D):
def length(self):
return math.hypot(self.x, self.y)
def __str__(self): (1)
return "Vector2D({}, {})".format(self.x, self.y)
def __add__(self, other): (2)
result = Vector2D(self.x, self.y)
result.move(other.x, other.y)
return result
def __getitem__(self, index): (3)
if index not in (0, 1):
raise IndexError("index {} must be 0 or 1".format(index))
return (self.x, self.y)[index]
if __name__ == "__main__":
v1 = Vector2D(3, 4)
v2 = Vector2D(4, 5)
print(v1, v2) (4)
v3 = v1 + v2 (5)
print(v3[0], v3[1]) (6)
1 | __str__ implements string conversion for instances of the class.
This happens if we call str(v1) or if we print an object. See also comment for line 26. |
2 | __add__ implements + for an object of the class. See also comment for line 27. Python lists
and tuples also implement this method. |
3 | __getitem__ implements indexed access using square brackets. See also comment
for line 28. Python strings, lists, tuples and dictionaries also implement this method. |
4 | print tries to convert the argument to a string if possible, thus calls
__str__ in this case. You see the result in the output below. |
5 | Is the same as v3 = v1.__add__(v2) . |
6 | v3[0] is the same as v3.__getitem__[0] . |
Compare the output to the code above and check the previous explanations again:
1
2
3 $ python vector_2d_fancy.py
Vector2D(3, 4) Vector2D(4, 5)
7 9
11.5. Private methods
Sometimes we implement methods which are used from within other methods of the class but are not intended to be used by client code. Such methods are called private methods. Programming languages such as JAVA or C++ provide syntax to mark methods are private.
Python lacks supporting syntax, instead most Python programmers mark such methods by
preceding the name with a single underscore _
. This serves as a signal for clients
"don’t call me, and if you do, don’t expect that this works as expected".
Implementing methods with names starting but not ending with double underscores have a different purpose and should be avoided, unless you exactly understand what you do. Nevertheless you will find bad examples using such names on the web.
11.6. Open/Closed Principle
In object-oriented programming, the open/closed principle states "software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification"; that is, such an entity can allow its behaviour to be extended without modifying its source code.
This might sound theoretical and slightly vague, so we better go on with some examples.
11.6.1. Template Method Pattern
The so called template method pattern is a strategy to support the open/closed principle. The basic idea is to implement modifications of a base "operation" by means of sub classes.
The following base class Adder
adds numbers of a given
list or tuple of numbers, the sub class Multiplier
implements a variant:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24 # template_method_pattern.py
class Adder:
def accumulate(self, values): (1)
result = values[0]
for value in values[1:]:
result = self.operation(result, value) (2)
return result
def operation(self, value_1, value_2): (3)
return value_1 + value_2
class Multiplier(Adder):
def operation(self, value_1, value_2): (4)
return value_1 * value_2
data = [1, 2, 3, 4]
adder = Adder()
print("sum of", data, "is", adder.accumulate(data))
multiplier = Multiplier()
print("product of", data, "is", multiplier.accumulate(data))
1 | accumulate adds up given numbers based on the method operation . |
2 | accumulate uses the operation method. |
3 | operation just adds two numbers. |
4 | The Multiplier class reuses the accumulate method but operation implements
multiplication of two numbers. |
And this is the output:
1
2
3 $ python template_method_pattern.py
sum of [1, 2, 3, 4] is 10
product of [1, 2, 3, 4] is 24
This is a very simple example for an algorithm which is open for extension while the core code stays closed for modification.
-
No "monster classes" with
if
-elif
-else
constructs to switch between algorithm variants. -
Instead of scattering implementation details of a variant over multiple methods in a single class, these details can be grouped within sub classes. This supports the idea of avoiding divergent changes.
The code above would fail if we pass other data than lists of tuples of numbers or if this collection is empty (see line 5). A better implementation would check data first. See also defensive programming. |
An abstract base class is a class which does not implement all methods required
to make it work. Here we could implement a base class Accumulator
with method
accumulate
but without the method operation
. Instead we would implement
base classes Adder
and Multiplier
offering this missing method.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29 # abstract_base_class.py
class AbstractAccumulator:
def accumulate(self, values):
result = values[0]
for value in values[1:]:
result = self.operation(result, value)
return result
def operation(self, value_1, value_2): (1)
raise NotImplementedError()
class Adder(AbstractAccumulator):
def operation(self, value_1, value_2): (2)
return value_1 + value_2
class Multiplier(AbstractAccumulator):
def operation(self, value_1, value_2): (4)
return value_1 * value_2
data = [1, 2, 3, 4]
adder = Adder()
print("sum of", data, "is", adder.accumulate(data))
multiplier = Multiplier()
print("product of", data, "is", multiplier.accumulate(data))
1 | We avoid using the abstract base class directly. A better, but longer example would use the abc module from the standard library. |
11.6.2. Strategy pattern
In contrast to the template method pattern we configure the behaviour of a class by passing objects to the initalizer:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25 # strategy_pattern.py
class Accumulator:
def __init__(self, operation):
self._operation = operation (2)
def accumulate(self, values):
result = values[0]
for value in values[1:]:
result = self._operation(result, value) (3)
return result
data = [1, 2, 3, 4]
def add(a, b):
return a + b
adder = Accumulator(add) (1)
print("sum of", data, "is", adder.accumulate(data))
def multiply(a, b):
return a * b
multiplier = Accumulator(multiply) (4)
print("product of", data, "is", multiplier.accumulate(data))
1 | To configure the object We pass the function add to the initializer. |
2 | We set this function as a private attribute. |
3 | Within acuumulate we use this attribute, in this example to sum the given values. |
4 | We can re use the same class but with a different behaviour by providing a different function. |
This is a simple example using functions, in more complex situations we would pass
one or also more objects with specified methods to __init__
.
The basic idea of the strategy pattern can be found in numerical algorithms without requiring object oriented programming.
The following function approx_zero
implements the
bisection method
to find a zero of a given function:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16 # bisect.py
def approx_zero(f, a, b, eps=1e-8):
assert f(a) * f(b) < 0 (2)
while abs(a - b) > eps:
middle = (a + b) / 2
if f(middle) * f(a) > 0: (3)
a = middle
else:
b = middle
return (a + b) / 2.0
def polynome(x):
return x ** 2 - 2
sqrt2 = approx_zero(polynome, 1, 2) (1)
print("zero is", sqrt2, "with residual", polynome(sqrt2))
1 | We want to approximate the square root of 2 by approximating the zero
of the given function polynome . We know that the searched for value lies between
1 and 2 . |
2 | To ensure that the algorithm works we must ensure that f takes different
signs for a and b . |
3 | This checks if f(middle) and f(a) have the same sign. |
1
2 $ python bisect.py
zero is 1.414213564246893 with residual 5.29990096254096e-09
This was not possible in older programming languages such as Fortran 77, where the function name was hard coded in the bisection method implementation and reusing the algorithm in different places was error prone.
12. Basics of Code optimization
12.1. An example
The following function count_matches
takes two lists reference
and data
and counts
how many elements of reference
occur also in data
. This is a simple and slightly
inrealistic example, but good enough to explain a few fundamental principles of code
performance and optimization.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21 # time_measurement.py
import random
import time
def count_matches(reference, data):
"""counts how many items in data are in present in reference"""
matches = 0
for value in reference:
if value in data:
matches += 1
return matches
for n in (2000, 4000, 8000, 16000):
reference = [random.randint(0, n) for _ in range(n)(1)
data = [random.randint(0, 2 * n) for _ in range(n)]
started = time.time() (2)
count_matches(reference, data)
needed = time.time() - started (3)
print("n = {:5d} time: {:5.2f} seconds".format(n, needed))
1 | random.randint(0, n) creates a pseudo random number in the range
0 to n (limits included). |
2 | time.time() returns the number of seconds (including figures after the decimal point)
since 1st of January 1970 (this date is also called the Unix epoch). |
3 | Calling time.time() and subtracting the value of started we get the observed
run time for executing count_matches . |
Timing measuremens in this chapter may vary, so don’t expect to get exactly the same output here and in following examples:
1
2
3
4
5 $ python time_measurement.py
n = 2000 time: 0.07 seconds
n = 4000 time: 0.28 seconds
n = 8000 time: 1.12 seconds
n = 16000 time: 4.49 seconds
We observe that every time we double the size, the overall runtime approximately increases by a factor of four. This is the same as saying that the runtime is proportional to n2.
A simple calculation shows that n = 250000 would result in about
10 minutes processing time, running the code with n = 610000 would need about one hour to
finish !
|
Tools called profilers provide means to understand how the overall runtime of a program is distributed over functions and individual lines of code. At the time of writing this book, the Python package line_profiler is our preferred profiling tool. measurement.
To analyze code with line_profiler
we first have to install it using
pip install line_profiler
.[1]
Next we decorate the function we want to inspect with @profile
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16 # profiling.py
import random
import time
@profile
def count_matches(reference, data):
matches = 0
for value in reference:
if value in data:
matches += 1
return matches
n = 2000
reference = [random.randint(0, n) for _ in range(n)]
data = [random.randint(0, 2 * n) for _ in range(n)]
count_matches(reference, data)
Installing line_profiler
also installs a command line tool called kernprof
which
we use as follows:
Unresolved directive in chapter_12_basics_of_code_optimization/main.adoc - include::run_kernprof_output.adoc[]
The output shows us:
-
Starting from line 13 we see measuerments per code line of the decorated function.
-
Column
Hits
counts how often the actual line was executed. -
Column
Time
is the overall time spent executing this line in µs. -
Column
Per Hit
is the average execution time of this line in µs. -
Column
% Time
is the fraction of the overall time running the function.
So we see that about 98%
of the runtime is spent in if value in data
!
Checking if an item is element of a list is pretty slow.
The Python interpreter has in the worst case to iterate through the full list to
do this check. So on average we can expect that the runtime of this line is proportional
to the lenght of the list which is n
. We further see that the specific line
is executed n
times which supports our observation that the runtime is
proportional to n2
.
The appropriate and optimized data type for checking membership is a set
and not
a list
. For sets such an operations run time is on average a constant
independent of n
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17 # profiling_optimized.py
import random
import time
@profile
def count_matches(reference, data):
matches = 0
data = set(data) (1)
for value in reference:
if value in data:
matches += 1
return matches
n = 2000
reference = [random.randint(0, n) for _ in range(n)]
data = [random.randint(0, 2 * n) for _ in range(n)]
count_matches(reference, data)
1 | This is the modification. Unresolved directive in chapter_12_basics_of_code_optimization/main.adoc - include::run_kernprof_optimized_output.adoc[] |
12.2. Some theory
Example:
O(n) etc O(n2):: We say a program has runtime O(n2) if the overall runtime is proportional to n2 plus some terms of lower order, e.g. n.
io bound vs cpu bound
microoptimzations: mesaure multiple times, see python std library
12.3. Python dictionaries
13. Misc
13.1. Floating point basics
13.2. Python code as configuration
13.3. Poor mans data management
Glossary
- select is not broken
-
…..
- bike shedding
-
…..
- pythonic
-
….
- syntactic sugar
-
….
- Profiler
-
..
- Pseudo Random
-
..
References
14. Some Asciidoctor Examples
14.1. Example 1
Rubies are red, Topazes are blue. My eye is black abc
14.2. Example 2
This is a note block with complex content A list
|
This is anote asdfl ja ljd ldakfsj dalfjdlkjfaljdflöja adsflkja dfl kjads öljdf ölajdf öldfjöldaf jadlf jadlfj dlfj adsflkja dfl kjads öljdf ölajdf öldfjöldaf jadlf jadlfj dlfj adsflkja dfl kjads öljdf ölajdf öldfjöldaf jadlf jadlfj dlfj |
Single line tip |
Don’t forget… |
Watch out for… |
Ensure that… |
14.3. Quote
Four score and seven years ago our fathers brought forth on this continent a new nation…
14.4. TODO
-
asciidoc comdline + file observer + liver reload im browser, themes ? pygments
-
subitem
-
-
grobsturktur aufsetzen
-
in gitlab einchecken
14.5. With numbers
-
asciidoc comdline + file observer + liver reload im browser, themes ? pygments
-
subitem
-
-
grobsturktur aufsetzen
-
in gitlab einchecken
14.6. Source code listing
Code listings look cool with Asciidoctor and pygments.
1
2
3
4
5
6
7 #!/usr/bin/env python
import antigravity
try:
antigravity.fly() (1)
except FlytimeError as e: (2)
# um...not sure what to do now.
pass
1 | does not work |
2 | this is a fake exception |
14.7. In-document refernes
A statement with a footnote.[2]
bla bla bla
bla bla bla
bla bla bla
bla bla bla
Go to line with footnote !
Go to previous Python example !
14.8. Other formatting
indented by one line in adoc file indented by one line in adoc file
bold italic
This word
is in backticks.
“Well the H2O formula written on their whiteboard could be part of a shopping list, but I don’t think the local bodega sells E=mc2,” Lazarus replied.
Werewolves are allergic to cinnamon.
14.9. Tables
Col 1 | Col 2 | Col 3 |
---|---|---|
6 |
Three items |
d |
1 |
Item 1 |
a |
2 |
Item 2 |
b |
3 |
Item 3 |
c |
Goto An example table.
14.10. Images
This should be some floating text.
This should be some floating text.
This should be some floating text.
This should be some floating text.
This should be some floating text.
This should be some floating text.
14.11. Quotes
-
Quotes:
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness…
— Charles Dickens
A Tale of Two Cities
End of quote
14.12. Chapter with some references
The Pragmatic Programmer [Hunt 1997] should be required reading for all developers. To learn all about design patterns, refer to the book by the “Gang of Four” [2].