Example solutions for script 05_sequences

Exercise 2.2

txt = input("gimme some text: ")
if txt == txt.upper():
    print("all upper case !")
gimme some text: ABC
all upper case !

Exercise 2.3

Skipped, see solution for exercise 2.4

Exercise 2.4

seq = input("please enter sequence: ")
seq = seq.upper().replace(" ", "")

gc_count = seq.count("G") + seq.count("C")
print("GC content is", 100 * gc_count / len(seq), "%")
please enter sequence: GC gc agc
GC content is 85.71428571428571 %

Exercise 2.5

while True:
    
    seq = input("please enter sequence: ")

    if seq.count("A") + seq.count("T") + seq.count("G") + seq.count("C") == len(seq):
        break
    else:
        print("the sequence contains invalid symbols, try again !")
        
gc_count = seq.count("G") + seq.count("C")
print("GC content is", 100 * gc_count / len(seq), "%")
please enter sequence: TGCX
the sequence contains invalid symbols, try again !
please enter sequence: TGCA
GC content is 50.0 %
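
Note that an empty input passes the count check above (0 == 0) and the following division by len(seq) then fails with a ZeroDivisionError. The sketch below is one possible way to guard against that. The helper name is_valid_dna is an assumption made for illustration and is not part of the exercise.

```python
def is_valid_dna(seq):
    """Return True if seq is non-empty and consists only of A, T, G and C."""
    # all(...) is True only if every symbol passes the membership test
    return len(seq) > 0 and all(symbol in "ATGC" for symbol in seq)


print(is_valid_dna("TGCA"))  # valid sequence
print(is_valid_dna("TGCX"))  # contains the invalid symbol X
print(is_valid_dna(""))      # empty input is rejected as well
```

Inside the while loop, the condition could then read if is_valid_dna(seq): instead of the long comparison of counts.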

Exercise 3.3

The code works because found_invalid_pair is set to True as soon as the first conflicting pair of letters is found, and nothing in the following iterations sets it back to False, whether or not further conflicting pairs are found.

The drawback of the version without break is that it keeps iterating over the remaining positions even though the result is already known.
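
To make the difference concrete, the following sketch counts the loop iterations of both variants. The exercise code is not reproduced here, so the pair check (letters at the same position of two equally long strings must match) and the function names check_without_break and check_with_break are assumptions chosen for illustration. Only found_invalid_pair is taken from the exercise.

```python
def check_without_break(a, b):
    """Return (found_invalid_pair, iterations), always scanning all positions."""
    found_invalid_pair = False
    iterations = 0
    for i in range(len(a)):
        iterations += 1
        if a[i] != b[i]:
            # once set to True, the flag is never reset to False
            found_invalid_pair = True
    return found_invalid_pair, iterations


def check_with_break(a, b):
    """Return (found_invalid_pair, iterations), stopping at the first conflict."""
    found_invalid_pair = False
    iterations = 0
    for i in range(len(a)):
        iterations += 1
        if a[i] != b[i]:
            found_invalid_pair = True
            break
    return found_invalid_pair, iterations


# the two strings differ only at position 1
print(check_without_break("GATTACA", "GCTTACA"))
print(check_with_break("GATTACA", "GCTTACA"))
```

Both variants report True, but the first one needs 7 iterations and the second one only 2.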

Exercise 3.4

txt = "racecarx"

txt_reverted = ""

for i in range(len(txt)):
    txt_reverted += txt[len(txt) - i - 1]
    
if txt == txt_reverted:
    print(txt, "is a palindrome")
else:
    print(txt, "is not a palindrome")
racecarx is not a palindrome
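
As an aside, Python can also reverse a string with slice notation, which replaces the explicit loop. This is a minimal sketch of that alternative, assuming slicing with a negative step is allowed for this exercise.

```python
txt = "racecar"

# a slice with step -1 walks through the string from the last to the first symbol
txt_reverted = txt[::-1]

if txt == txt_reverted:
    print(txt, "is a palindrome")
else:
    print(txt, "is not a palindrome")
```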

Exercise 3.5

seq = input("gimme some text: ")

count = 0
for i in range(len(seq)):
    if seq[i] == " ":
        count = count + 1

print("the text contains", count, "spaces")
gimme some text: hi how are you ?
the text contains 4 spaces

Exercise 3.6

seq = input("give me a sequence: ")

for i in range(len(seq) - 1):
    if seq[i] == "G" and seq[i + 1] == "C":
        print("found GC at position", i)
give me a sequence: GCTCGCAG
found GC at position 0
found GC at position 4

Comment: The upper limit of the for loop is essential. If you choose len(seq) as the upper limit and the given sequence ends with G, the check for the following C tries to access a symbol beyond the end of the string and thus raises an IndexError.

To be clearer:

If we iterated with for i in range(len(seq)):, i would be len(seq) - 1 in the last iteration. The following seq[i + 1] would then access the character at position len(seq), which is invalid because string positions start at 0 and the last valid position is len(seq) - 1.
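
The failure can be reproduced with a fixed sequence instead of user input. The sketch below deliberately uses the wrong upper limit and catches the resulting error. The sequence GCTCG and the variable error_raised are chosen for illustration only.

```python
seq = "GCTCG"  # ends with G, so the check for the following C is evaluated

error_raised = False
try:
    for i in range(len(seq)):  # upper limit is too large by one
        if seq[i] == "G" and seq[i + 1] == "C":
            print("found GC at position", i)
except IndexError as error:
    error_raised = True
    print("error in iteration", i, ":", error)
```

The match at position 0 is reported first, then the loop fails in iteration 4 when it evaluates seq[5]. If the sequence does not end with G, the error does not show up, because and skips the second comparison as soon as the first one is False.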

Exercise 3.7

seq = input("give me a sequence: ")

reverse_complement = ""

for i in range(len(seq)):
    
    symbol = seq[i]
    
    if symbol == "G":
        complement = "C"
    elif symbol == "C":
        complement = "G"
    elif symbol == "T":
        complement = "A"
    elif symbol == "A":
        complement = "T"
    else:
        print("found invalid symbol", symbol, "at position", i)
        complement = "."
    
    reverse_complement = complement + reverse_complement
    
print("the reverse complement of", seq, "is", reverse_complement)
    
give me a sequence: ATCCAXXY
found invalid symbol X at position 5
found invalid symbol X at position 6
found invalid symbol Y at position 7
the reverse complement of ATCCAXXY is ...TGGAT

Exercise 3.8

txt = input("gimme some text: ")

is_palindrome = True

for i in range(len(txt)):
    if txt[i] != txt[len(txt) - 1 - i]:
        is_palindrome = False
        break
        
print("this is", end=" ")
if not is_palindrome:
    print("not", end=" ")
print("a palindrome")
gimme some text: reliefpfeilerr
this is not a palindrome

Exercise 4.3 to 4.5

plaintext = "PYTHON IS FUN"

# a shift of 13 is the ROT13 variant of the Caesar cipher:
shift = 13

# number of letters from A to Z, needed to wrap around the alphabet
alphabet_size = ord('Z') - ord('A') + 1

encrypted = ""
for i in range(len(plaintext)):
    c = plaintext[i]
    code = ord(c)
    if ord('A') <= code <= ord('Z'):
        code = code + shift
        if code > ord('Z'):
            code = code - alphabet_size
            
    encrypted = encrypted + chr(code)
    
print("rot", shift, "encryption of", plaintext, "is", encrypted)

decrypted = ""
for i in range(len(encrypted)):
    c = encrypted[i]
    code = ord(c)
    if ord('A') <= code <= ord('Z'):
        code = code - shift
        if code < ord('A'):
            code = code + alphabet_size
            
    decrypted = decrypted + chr(code)
    
print("rot", shift, "decryption of", encrypted, "is", decrypted)
rot 13 encryption of PYTHON IS FUN is CLGUBA VF SHA
rot 13 decryption of CLGUBA VF SHA is PYTHON IS FUN
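
The wrap-around can also be expressed with the modulo operator, which handles encryption and decryption with the same code. This is a minimal sketch, assuming that writing a function is acceptable here. The name rot is chosen for illustration and does not come from the script.

```python
def rot(text, shift):
    """Shift the upper case letters A-Z by shift positions, wrapping around."""
    result = ""
    for c in text:
        if "A" <= c <= "Z":
            # position in the alphabet (0..25) after shifting; % 26 wraps around
            # and also yields a non-negative result for negative shifts
            offset = (ord(c) - ord("A") + shift) % 26
            result += chr(ord("A") + offset)
        else:
            # all other symbols, such as spaces, stay unchanged
            result += c
    return result


encrypted = rot("PYTHON IS FUN", 13)
print(encrypted)
print(rot(encrypted, -13))
```

Because the alphabet has 26 letters, applying a shift of 13 twice returns the original text, so rot(encrypted, 13) gives the same result as rot(encrypted, -13).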

Exercise 5.3

sequence = "GCTGGCAGTCATGCCAACGGGCATGC"
pattern = "GC"

position = 0

while True:
    position = sequence.find(pattern, position)
    if position == -1:
        break
    print(position)
    # continue the search one symbol after the current match
    position = position + 1
0
4
12
20
24
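
The same search can be wrapped in a function that collects the positions instead of printing them. This is a sketch under the assumption that functions and lists may be used here. The name find_all is hypothetical and is not a built-in string method.

```python
def find_all(sequence, pattern):
    """Return the start positions of all (possibly overlapping) matches."""
    positions = []
    position = sequence.find(pattern)
    while position != -1:
        positions.append(position)
        # continue the search one symbol after the current match
        position = sequence.find(pattern, position + 1)
    return positions


print(find_all("GCTGGCAGTCATGCCAACGGGCATGC", "GC"))
```

Advancing by one symbol rather than by len(pattern) means overlapping matches are reported too, for example positions 0 and 1 for the pattern AA in AAA.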