We learned:
- for .. in range(..):
- if et al
- while
- break
- while loops

We introduced strings in the second script. Here is a short summary about strings:
- Strings are delimited by ".
- The len function returns the length of a string.
- If we apply + to two strings, they are concatenated.
- The number 123 and the string "123" are different things, although they result in the same output when printed.
- The input function always returns a string, so we have to use type conversion if we ask the user for numbers which are used in subsequent numerical computations.
- Single characters of a string are accessed with [].

If you do not understand all points of the list, first review the introduction in the second script!
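The difference between a number and a string of digits can be seen in a small sketch. To keep it runnable without typing anything, the variable user_text is a hypothetical stand-in for what the input function would return:

```python
# Stand-in for a value returned by input(): always a string, even for digits.
user_text = "123"

# "+" applied to two strings concatenates them.
as_text = user_text + "1"

# After type conversion we can do arithmetic.
as_number = int(user_text) + 1

print(as_text)         # 1231
print(as_number)       # 124
print(len(user_text))  # 3
print(user_text[0])    # 1
```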
When we said that strings are delimited by double quotes ", we told only part of the truth. We can choose other delimiters: ' (single quote), """ (three double quotes) and ''' (three single quotes).
The only restriction is that we use the same delimiter on both ends of the string.
This is handy because " is itself a character and thus may occur as part of the string.
print('here we have a " in a string')
Using " as the delimiter in this example would confuse Python: the " inside the string would be interpreted as the closing delimiter, and the remaining in a string") would be read as Python code, which is syntactically incorrect:
print("here we have a " in a string")
This works the other way round too:
print("here we have a ' in a string")
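If a string has to contain both kinds of quotes, one of the three character delimiters can be used. The following is a small sketch with a made-up example text:

```python
# The string is delimited by three double quotes,
# so single ' and " characters can both appear inside.
text = """it's called a "string" in Python"""
print(text)
```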
There are some "special" characters, which are encoded using the so-called escape character \. So the two characters \n result in a line break when printed:
print("hi\nyou")
Although we type two characters, \n is interpreted as a single character, the so-called new line character:
print(len("hi\nyou"))
To "see" what a string really "contains", we can use the repr
function:
a = "line 1\nline 2"
print(a)
print(repr(a))
\n
is the most used special character and is the only one we will face in this course.
The short excursion to special characters helps to explain how the delimiters """
and '''
work: In contrast to the one character delimiters they can delimit strings over multiple lines !
sequence = """GCA ATC GCT TTA GGA CCT
              GCA ATC GCT TTA GGA CCT"""
print(repr(sequence))
If you look at the output you can see the line break followed by some spaces.
The following snippet is valid Python code with fewer spaces in the multi line string:
sequence = """GCA ATC GCT TTA GGA CCT
GCA ATC GCT TTA GGA CCT"""
print(repr(sequence))
We can compare strings the same way as we did it for numbers:
print("abc" == "ABC")
print("abc" != "ABC")
Using < for strings works as well. We consider one string to be smaller than another string if it would appear before the other string in a phone book:
print("abcde" < "abcdfg")
This is called lexicographical or phone book ordering.
In this system capital letters are smaller than lower case letters:
print("ABC" < "abc")
Other comparison operators work the same way:
print("abc" >= "ABC")
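A few more comparisons may help to get a feeling for the phone book ordering. The example strings are made up for illustration:

```python
# A string is smaller than any longer string it is the beginning of.
prefix_is_smaller = "abc" < "abcd"

# The first differing position decides: "d" comes after "c".
decided_at_third_position = "abd" < "abcd"

# Capital letters come before lower case letters.
capital_first = "Zebra" < "apple"

print(prefix_is_smaller)          # True
print(decided_at_third_position)  # False
print(capital_first)              # True
```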
Attention: Before we go on, make sure that you know what the argument(s) and the return value of a function are. If not, review the corresponding section in the second script!
When we introduced the import
statement we said that the imported functions and values are attributes of the module. So cos
and pi
are attributes of the module math
:
import math
print(math.cos(math.pi))
In general when we see expressions like x.y
in Python we say y is an attribute of x.
Not only modules have attributes; most data types in Python have attributes as well! So the str
type has attributes which can be used like functions. This kind of attribute is called a method.
For example the count
method takes one argument and returns an integer value:
sequence = "GCA ATC GCT TTA GGA CCT"
print(sequence.count("G"))
Here we counted the number of occurrences of "G"
in sequence
.
Another example is the upper
method which has zero arguments and computes a new string where all alphabetical characters are converted to upper case letters:
x = "Hi You !"
y = x.upper()
print(y)
If you call a function or method with zero arguments you still have to use ()
to call it, as we did in the previous example. If you forget this, Python behaves as follows:
print("hi".upper)
This tells you that upper
is a method of the string "hi"
but does not execute the method because we forgot to append ()
for calling this method.
Another helpful method is replace
which takes two arguments and computes a new string:
sequence = "GCA ATC GCT TTA GGA CCT"
print(sequence.replace(" ", "-"))
Here all occurrences of spaces were replaced by -. This can also be used to delete certain characters. In the following code snippet we replace spaces by empty strings:
sequence = "GCA ATC GCT TTA GGA CCT"
print(sequence.replace(" ", ""))
We can call methods directly on strings:
print("Hi".upper())
And we can assign the results of a method call to variables as usual:
greeting = "Hi You !"
upper_greeting = greeting.upper()
print(upper_greeting)
Further we can chain an arbitrary number of method calls, which are executed in listed order:
print("abcA".upper().count("A"))
The evaluation in the previous example is as follows:
- "abcA".upper() evaluates to "ABCA", which is a new (intermediate) string.
- The count("A") method of this intermediate string is called, which evaluates to 2.

Another example is to "clean up" multi line strings:
sequence = """GCA ATC GCT TTA GGA CCT
GCA ATC GCT TTA GGA CCT"""
short_sequence = sequence.replace("\n", "").replace(" ", "")
print(short_sequence)
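A chain of method calls can always be unrolled into single steps. The following sketch uses a shortened, made-up sequence and a hypothetical intermediate variable without_breaks to show that both ways give the same result:

```python
sequence = "GCA ATC\nGCT TTA"

# Chained version: the calls are executed from left to right.
chained = sequence.replace("\n", "").replace(" ", "")

# The same computation step by step with an intermediate variable.
without_breaks = sequence.replace("\n", "")
stepwise = without_breaks.replace(" ", "")

print(repr(without_breaks))
print(chained)
print(chained == stepwise)
```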
- Modify the computation of the GC content we did before so that lower case inputs are handled just like their upper case equivalents and spaces are ignored. The solution should not contain a for loop anymore.
- Write a program which checks whether a sequence entered by the user consists only of the symbols T, C, A and G. Finally the program prints an appropriate message. (Hint: count the number of A, T, G, and C symbols in the given sequence. For a correct sequence the sum of these counts is the same as the length of the sequence.)
- Write a program which asks the user for a sequence and computes its GC content. If the user input is invalid (as implemented in the preceding exercise), first an appropriate message should be printed and then the user is asked again. (Tip: infinite loop)

Remember: In Python we can access single characters of a string using square brackets. The notation [i], where i is zero or a positive integer (called index), extracts the character at position i.
Indexing starts with zero, so the first character is accessed with [0], the second character with [1] and so on!
print("abc"[0])
name = "uwe"
print(name[1])
Negative indices count from the end of the string:
print("uwe"[-1])
print("uwe"[-2])
print("uwe"[-3])
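Negative indices are a shortcut for counting back from the length of the string. A minimal sketch of this relation:

```python
name = "uwe"
n = len(name)

# Index -1 is the same position as index n - 1, -2 the same as n - 2, ...
last_matches = name[-1] == name[n - 1]
first_matches = name[-n] == name[0]

print(last_matches)   # True
print(first_matches)  # True
```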
We cannot use the bracket notation to replace a given character, because a string cannot be modified in place. The following assignment fails with an error:
seq = "TGCAG"
seq[2] = "?"
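To see which error Python reports without stopping the program, the failing assignment can be wrapped as in the following sketch. Note that try and except have not been introduced in this script; they are used here only as an assumption to keep the example running:

```python
seq = "TGCAG"

try:
    seq[2] = "?"               # strings do not support item assignment
    assignment_worked = True
except TypeError as error:     # Python raises a TypeError here
    assignment_worked = False
    print("not allowed:", error)

# The string is unchanged.
print(seq)
```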
To solve this we need so-called slicing (like "slicing bread").
The general form of slicing is [n:m], which computes the substring starting at index n up to index m (exclusive!):
print("012345"[1:4])
print("012345"[2:-1])
Python knows two abbreviations, [:n] and [m:]. The first one starts at the beginning of the string, the second one goes to the end:
print("012345"[:3])
print("012345"[4:])
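Because the upper limit of a slice is exclusive, the two abbreviations fit together without overlap. A small sketch, where the split position n is chosen arbitrarily:

```python
text = "012345"
n = 2

left = text[:n]    # characters at index 0 .. n-1
right = text[n:]   # characters at index n .. end

print(left)                  # 01
print(right)                 # 2345
print(left + right == text)  # True

# The length of the slice [1:4] is 4 - 1.
print(len(text[1:4]))        # 3
```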
We can use this to replace a character of a given string by computing a new string:
seq = "TGCAG"
seq_new = seq[:2] + "?" + seq[3:]
print(seq)
print(seq_new)
The following program checks if a given string is a palindrome (so if it reads the same forwards and backwards):
txt = "racecar"
found_invalid_pair = False
for i in range(len(txt)):
    i_back = len(txt) - i - 1
    if txt[i] != txt[i_back]:
        found_invalid_pair = True
        break
if found_invalid_pair:
    print(txt, "is not a palindrome")
else:
    print(txt, "is a palindrome")
What does the following statement display ? First use pen and paper then use Python to check your result:
text = "abcdefghijk"
print(text[:2] + text[3:4] < text[0:2] + text[3:len(text)].upper())
- Repeat the examples above.
- Try to understand the palindrome check. It helps to simulate the computer using pen and paper and to run the palindrome check for the inputs "ABCBA" and "ABCDA".
- Why does the program still work without the break?
- Implement an alternative solution by first computing the reverse of the given string, then use == to check if both are the same.
- Use a for loop to simulate the count method: The user provides a string and your program counts the number of spaces in the string. (You need a variable for counting spaces. Initialize it with 0 and increment it for every hit.)
- Write a program which asks the user for a valid nucleotide sequence and prints all positions of G followed by C. The output for the input AGCCCGCAGC should be similar to
found GC starting at position 1
found GC starting at position 5
found GC starting at position 8
  Hint: you have to check for every position if the character at that position is G and if the following character is C. Use a for loop, but pay attention to the upper limit of the range function!
- [...] if and friends. (Look up the definition of "reverse complement" if you don't know what this means.)
- [...] 1 ... 100. (The Collatz update rule for a given n was to compute n//2 for even numbers and 3 * n + 1 for odd numbers until we reach the sentinel 1.)

Internally the computer stores characters as numbers. So A is stored as 65 and B as 66. The numbers in the range 0 to 127 are called ASCII codes (ASCII is an acronym for "American Standard Code for Information Interchange"). The following table lists the codes 32 to 127:
32 | 44 , | 56 8 | 68 D | 80 P | 92 \ | 104 h | 116 t
33 ! | 45 - | 57 9 | 69 E | 81 Q | 93 ] | 105 i | 117 u
34 " | 46 . | 58 : | 70 F | 82 R | 94 ^ | 106 j | 118 v
35 # | 47 / | 59 ; | 71 G | 83 S | 95 _ | 107 k | 119 w
36 $ | 48 0 | 60 < | 72 H | 84 T | 96 ` | 108 l | 120 x
37 % | 49 1 | 61 = | 73 I | 85 U | 97 a | 109 m | 121 y
38 & | 50 2 | 62 > | 74 J | 86 V | 98 b | 110 n | 122 z
39 ' | 51 3 | 63 ? | 75 K | 87 W | 99 c | 111 o | 123 {
40 ( | 52 4 | 64 @ | 76 L | 88 X | 100 d | 112 p | 124 |
41 ) | 53 5 | 65 A | 77 M | 89 Y | 101 e | 113 q | 125 }
42 * | 54 6 | 66 B | 78 N | 90 Z | 102 f | 114 r | 126 ~
43 + | 55 7 | 67 C | 79 O | 91 [ | 103 g | 115 s | 127
Comment: Do you remember how strings are ordered in Python? A is considered to be smaller than a because the corresponding number codes have this ordering!
Python provides two functions for conversion from number to character and vice versa:
ord
computes the ASCII code from a given character:
print(ord("A"))
And chr
computes a character from a given code:
print(chr(66))
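With ord we can check the comment on string ordering ourselves, and we can see that chr reverses ord. A minimal sketch:

```python
code_upper = ord("A")
code_lower = ord("a")
print(code_upper, code_lower)   # 65 97

# The ordering of the characters follows the ordering of their codes.
codes_ordered = code_upper < code_lower
chars_ordered = "A" < "a"
print(codes_ordered, chars_ordered)  # True True

# chr undoes ord.
round_trip = chr(ord("G"))
print(round_trip)               # G
```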
This can be used to transform strings. The following example transforms letters in a string to their uppercase equivalents:
name = "uwe"
new_name = ""
for i in range(len(name)):
    c = name[i]
    code = ord(c)
    if 97 <= code <= 122:
        code = code - 32
    new_name = new_name + chr(code)
print(new_name)
This is an educational example; usually we would use the .upper method we learned already.
Again we see the pattern: we start with an empty string and assemble the result during the iterations.
- Write a program which computes the rot+1 transformation of a given string. This transformation maps A to B, B to C, ... Z to A. So the rot+1 transformation of BABEL is CBCFM. This is a very simple encryption method. Hint: First write a script transforming a single character; you need to handle the shift Z to A separately.

find method of strings

The find method looks for occurrences of a given string in another string:
print("abcdef".find("cde"))
So the string "cde" appears in "abcdef" at index 2. For multiple matches only the index of the first occurrence is returned:
print("abcdefabcdef".find("def"))
The value -1
indicates a missing match:
print("abcdef".find("xzy"))
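The index returned by find fits together with slicing: cutting out as many characters as the pattern is long, starting at the found index, gives the pattern again. The following sketch also shows how the value -1 can be handled with if:

```python
text = "abcdef"
pattern = "cde"

position = text.find(pattern)
if position == -1:
    print(pattern, "not found")
else:
    # Slice out the matching part of text.
    match = text[position:position + len(pattern)]
    print("found", match, "at index", position)
```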
To find all occurrences we make use of an extra feature of find
: we can provide the starting position to look for a match:
print("abcdefabcdef".find("def", 4))
sequence = "GCTGGCAGTCATGCCAACGGGCATGC"
pattern = "GC"
position = sequence.find(pattern)
while position > -1:
    print(position)
    position = sequence.find(pattern, position + 1)
Rewrite the last example to use an infinite while
loop instead. The following sketch might help:
while True:
    # look for occurrence at a given starting position
    # if not found: stop looping
    # else: report match and update starting position
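A search loop with find that advances the starting position by one also reports overlapping matches, whereas the count method counts only non-overlapping occurrences. The following sketch shows the difference for a made-up sequence:

```python
sequence = "AAAA"
pattern = "AA"

# count skips past every match, so overlapping matches are not counted.
non_overlapping = sequence.count(pattern)

# Searching again one position after the last match also finds overlaps.
overlapping = 0
position = sequence.find(pattern)
while position > -1:
    overlapping = overlapping + 1
    position = sequence.find(pattern, position + 1)

print(non_overlapping)  # 2
print(overlapping)      # 3
```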
- .count(substring) counts non-overlapping occurrences of substring.
- .replace(a_string, b_string) replaces all occurrences of a_string by b_string.
- .lower() and .upper() convert characters to lower resp. upper case.
- .strip() removes all whitespace (space, tab and new line characters) from both ends of the string.
- .strip(characters) removes all single characters occurring in characters from both ends of the string.
- .lstrip() works as .strip() but only at the beginning of the string.
- .rstrip() works as .strip() but only at the end of the string.
- .startswith(txt) checks if the given string starts with txt.
- .endswith(txt) checks if the given string ends with txt.

"ABABAB".count("AB")
"ABABABA".replace("AB", "x")
"abAB".lower()
"abAB".upper()
" abcd cde\n ".strip()
" abcd cde\n ".lstrip()
"ABCAxBCBA".rstrip("ABC")
"my name".startswith("my")
"my name".endswith("uwe")