Example solutions for script 06_introduction_to_files

Exercise 2.3

You will see a long string of mysterious symbols, because every file is just a sequence of symbols, regardless of whether it is a Word document, an Excel sheet, or a Python program. It is up to the associated application to interpret those symbols when it opens the file. For example, a Word document contains much more than the typed text: it also stores formatting and structure information, which a plain text file holding only the text could not represent.
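
As a rough illustration of this point, the following sketch writes a small text file and reads it back twice: once in binary mode (the raw symbols) and once in text mode (the interpreted characters). The file name example.txt and its content are made up for this example and are not part of the script.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Write two short lines of text to the file.
with open(path, "w", encoding="utf-8") as fh:
    fh.write("ACGT\nTTGA\n")

# Binary mode returns the raw bytes without any interpretation.
with open(path, "rb") as fh:
    raw = fh.read()

# Text mode decodes the same bytes into a string.
with open(path, "r", encoding="utf-8") as fh:
    text = fh.read()

print(raw)   # a bytes object, shown with a leading b
print(text)  # the decoded text
```

Both reads come from the same file; only the way the content is interpreted differs.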

Exercise 3.2

acc = 0

fh = open("numbers.txt", "r")
for line in fh:
    acc += int(line.rstrip())

fh.close()

print(acc)
15

Exercise 3.3

fh = open("short.fasta", "r")
for line in fh:
    if line[0] == ">":
        print(line.rstrip())
fh.close()
>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765657|emb|Z78532.1|CCZ78532 C.californicum 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765656|emb|Z78531.1|CFZ78531 C.fasciculatum 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765655|emb|Z78530.1|CMZ78530 C.margaritaceum 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765654|emb|Z78529.1|CLZ78529 C.lichiangense 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765652|emb|Z78527.1|CYZ78527 C.yatabeanum 5.8S rRNA gene and ITS1 and ITS2 DNA

Exercise 4.2

acc = 0

with open("numbers.txt", "r") as fh:
    for line in fh:
        acc += int(line.rstrip())

print(acc)
15

with open("short.fasta", "r") as fh:
    for line in fh:
        if line[0] == ">":
            print(line.rstrip())
>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765657|emb|Z78532.1|CCZ78532 C.californicum 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765656|emb|Z78531.1|CFZ78531 C.fasciculatum 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765655|emb|Z78530.1|CMZ78530 C.margaritaceum 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765654|emb|Z78529.1|CLZ78529 C.lichiangense 5.8S rRNA gene and ITS1 and ITS2 DNA
>gi|2765652|emb|Z78527.1|CYZ78527 C.yatabeanum 5.8S rRNA gene and ITS1 and ITS2 DNA

Exercise 4.3

with open("status_lines.txt", "w") as fh_out:
    with open("short.fasta", "r") as fh_in:
        for line in fh_in:
            if line[0] == ">":
                print(line.rstrip(), file=fh_out)
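
The two nested with statements can also be written as one with statement that opens both files. The sketch below shows this variant on a made-up miniature FASTA file (tiny.fasta, created in a temporary directory) so that it runs on its own; the file name and its content are assumptions for illustration only, not the real short.fasta.

```python
import os
import tempfile

# Made-up miniature FASTA file, only for illustration (not the real short.fasta).
workdir = tempfile.mkdtemp()
fasta_path = os.path.join(workdir, "tiny.fasta")
out_path = os.path.join(workdir, "status_lines.txt")

with open(fasta_path, "w") as fh:
    fh.write(">seq1 made-up header\nACGT\n\n>seq2 made-up header\nTTGACA\n")

# One with statement manages both files; both are closed when the block ends.
with open(fasta_path, "r") as fh_in, open(out_path, "w") as fh_out:
    for line in fh_in:
        if line.startswith(">"):
            print(line.rstrip(), file=fh_out)

# Read the result back to check what was written.
with open(out_path, "r") as fh:
    status_lines = fh.read().splitlines()

print(status_lines)
```

Whether the nested or the combined form is easier to read is a matter of taste; both close the files in the same way.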

Exercise 5.1

If you want to understand how a program works, or why a program does not work as intended, you can trace the flow of execution and the current state of the variables by inserting appropriate print function calls:

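# Defined up front so that the print calls below also work if the file
# does not start with a status line.
last_status = None
count = 0

with open("short.fasta", "r") as fh:
    for line in fh:
        line = line.rstrip()

        # Trace output: show the line that is currently processed.
        print()
        print("line:", line)

        if len(line) > 0 and line[0] == ">":
            last_status = line
            count = 0
        elif line == "":
            print("COUNT:", count, last_status)
        else:
            count += len(line)

        # Trace output: show the state of the variables after this line.
        print("last_status:", last_status)
        print("count:", count)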

I omitted the long output.

The plain print() creates an empty line, which improves readability. In addition, I marked the "regular" output with COUNT: to distinguish it from the tracing lines.
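
To try the tracing technique on an input small enough to read the complete output, one option is to replace the file with an in-memory text object from io.StringIO. This is only a sketch: the two records below are made up, and the results list is an extra helper added here to make the outcome easy to check.

```python
import io

# Made-up miniature FASTA content; each record ends with an empty line.
fake_file = io.StringIO(
    ">seq1 made-up header\nACGT\nAC\n\n>seq2 made-up header\nTTG\n\n"
)

results = []        # helper for this sketch: collects (status line, count) pairs
last_status = None
count = 0

for line in fake_file:
    line = line.rstrip()
    print("line:", repr(line))  # trace output

    if line.startswith(">"):
        last_status = line
        count = 0
    elif line == "":
        print("COUNT:", count, last_status)  # "regular" output
        results.append((last_status, count))
    else:
        count += len(line)

    print("  count:", count)  # trace output

print(results)
```

Using repr(line) in the trace makes empty lines visible as '' instead of printing nothing.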