- \n is called the "line break" character and creates a line break when printed.
- The repr function helps us to examine the exact content of a string.
- Strings have methods such as .count, .upper and .replace.
- In x.y we say y is an attribute of x.
- After import math we can use sin or pi as attributes of math.
- Methods are attributes which we can call (for example "hi".upper()).

import math
print(math.sin(math.e))
print("hey joe".replace(" ", "-"))
You can imagine a file on disk as a string: all files are sequences of single symbols. Even complex files such as Word documents consist of a sequence of single characters.
If we want to access a file we first have to "open" it. The open function accepts two string arguments: the first one is the name of the file, the second is the so-called "access mode":
- "r" opens a file for reading.
- "w" opens a file for writing. If the file already exists, its content is deleted first!
- "a" opens a file for appending. If the file does not exist yet, it is created. If it exists, writing to the file appends new content at the end. (We will not use "a" in the exercises.)

The return value of open is a so-called file handle.
fh = open("test.txt", "w")
print(type(fh))
fh.write("hi")
fh.write("you")
fh.close()
Type and run the previous example; we explain the details below. Afterwards you should see a new file in PyCharm next to your script. You can open it in PyCharm with a double click.
Explanations:
- The file handle fh offers the methods write, to write a string to the file, and close, to finalize our operations on the file.
- fh is just an ordinary variable name, you might choose other names as you like.
Closing a file is important: if you forget to close a file, its content might be damaged.
If you close a file, further operations (such as another call of write) are not allowed:
fh = open("test.txt", "w")
fh.write("hi")
fh.close()
fh.write("you")
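The failure can be made visible without crashing the script. The following is a minimal sketch, assuming a scratch file in a temporary folder (the names tmp_dir, path and error_message are only illustrative). Python raises a ValueError when we write to a closed file:

```python
import os
import tempfile

# Illustrative scratch location so the sketch does not touch your own files.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test.txt")

fh = open(path, "w")
fh.write("hi")
fh.close()

error_message = None
try:
    fh.write("you")  # not allowed: the file is already closed
except ValueError as error:
    error_message = str(error)

print("writing failed:", error_message)
print("file is closed:", fh.closed)
```

The attribute fh.closed tells us whether a file handle has already been closed.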
To read the full content of a file we use the read method:
fh_in = open("test.txt", "r")
content = fh_in.read()
print(content)
fh_in.close()
A more convenient way to write to a file is a variant of print. The extra argument file=fh in the following example redirects the output to the given file:
fh = open("numbers.txt", "w")
for number in range(1, 6):
    print(number, file=fh)
fh.close()
The file= part is a keyword argument of print: the name file is fixed and it must come after the values to print. The variable name fh can be arbitrary, but must refer to a file opened for writing.
This works for all variants of print, for example:
fh = open("square_numbers.txt", "w")
for number in range(1, 6):
    print(number, "squared is", number ** 2, file=fh)
fh.close()
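As a sketch of what other variants of print can look like, the following example combines file= with the standard keyword arguments sep and end of print. The file name variants.txt and the temporary folder are assumptions chosen for illustration only:

```python
import os
import tempfile

# Illustrative scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "variants.txt")

fh = open(path, "w")
print(1, 2, 3, sep=";", file=fh)         # values separated by ";"
print("no line break", end="", file=fh)  # suppress the trailing "\n"
print(file=fh)                           # writes only a line break
fh.close()

fh = open(path, "r")
content = fh.read()
fh.close()
print(repr(content))
```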
- Use read and repr to examine the content of "numbers.txt".
- Use open with mode "rb" (not introduced before) and then read to display the content of the file.

Using for to read from a file

When we write for x in range(10) we say "we iterate over range(10)". Python is very flexible in this respect, and there are other objects we can iterate over.
For example, we can iterate over the characters of a string. Instead of
txt = "abc"
for i in range(len(txt)):
    print(txt[i])
we can write:
txt = "abc"
for char in txt:
    print(char)
Objects we can iterate over with for are called iterables. Besides range and str objects, the file handle we introduced above is another iterable!
If we loop over a file handle we iterate over the lines of the file:
fh = open("numbers.txt", "r")
for line in fh:
    print(line)
fh.close()
If you wonder why we see empty lines in the output, you can modify the snippet to use repr, which provides details about the actual content of the lines:
fh = open("numbers.txt", "r")
for line in fh:
    print(repr(line))
fh.close()
So you can see that if we iterate over the lines of a file with for, we get the full line including the line break!
We can get rid of trailing line breaks and spaces using the .rstrip method of strings:
fh = open("numbers.txt", "r")
for line in fh:
    print(line.rstrip())
fh.close()
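Called without arguments, rstrip removes all trailing whitespace, not only the line break. A small sketch with a made-up example string shows the difference to rstrip("\n"), which removes only trailing line breaks:

```python
line = "42  \n"

stripped_all = line.rstrip()          # removes trailing spaces and "\n"
stripped_newline = line.rstrip("\n")  # removes only the trailing "\n"

print(repr(line))
print(repr(stripped_all))
print(repr(stripped_newline))
```

For files containing numbers, as in numbers.txt, both variants work because int ignores surrounding whitespace.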
Using print(..., file=...) instead of write has some advantages:
- write only accepts strings.
- With write you have to include the line break character \n yourself, print does this for you.

We introduced read and write for didactical reasons; in practice print and the for-based approach are more powerful and easier to use.

- Write a script which reads "numbers.txt" and computes the sum of the given numbers.
- [...] >. Suppress empty lines in the output! Lines starting with > are called status or description lines.
- What are the results of "abcde".find("bc") and "abcde".find("gx")? Try to forecast the result before you check it with Python.
- Use the find method + while to find all positions of GC in the sequence GCTGGCAGTCATGCCAACGGGCATGC
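Coming back to the comparison of write and print(..., file=...): the following sketch writes the same numbers with both approaches and checks that the resulting files have identical content. The file names and the temporary folder are assumptions for illustration only:

```python
import os
import tempfile

# Illustrative scratch files in a temporary folder.
folder = tempfile.mkdtemp()
path_write = os.path.join(folder, "with_write.txt")
path_print = os.path.join(folder, "with_print.txt")

fh = open(path_write, "w")
for number in range(1, 4):
    # write needs a string and an explicit line break
    fh.write(str(number) + "\n")
fh.close()

fh = open(path_print, "w")
for number in range(1, 4):
    # print converts the number and appends the line break
    print(number, file=fh)
fh.close()

fh = open(path_write, "r")
content_write = fh.read()
fh.close()

fh = open(path_print, "r")
content_print = fh.read()
fh.close()

print(content_write == content_print)
```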
We can work with several files at the same time. The following script iterates over the numbers in numbers.txt and writes the squares of the numbers to a new file:
fh_in = open("numbers.txt", "r")
fh_out = open("squared.txt", "w")
for line in fh_in:
    current_number = int(line.rstrip())
    print(current_number ** 2, file=fh_out)
fh_in.close()
fh_out.close()
fh = open("squared.txt", "r")
for line in fh:
    print(line.rstrip())
fh.close()
Explanations:
- We need two file handles and use the names fh_in and fh_out for this. As said, it is up to you to choose meaningful and descriptive variable names.

Since version 2.5 Python provides an alternative way to work with files which prevents forgetting to close the file. The following snippet replaces the previous one with the new syntax:
with open("numbers.txt", "r") as fh:
    for line in fh:
        print(repr(line.rstrip()))
The with statement "protects" the following code block (here the block has two lines): as soon as the execution of the code block ends, Python closes the file automatically, even if an error occurs inside the block. This is why you do not see a fh.close() call anymore.
Using with is highly recommended. We introduced the other method for didactical reasons, and if you read other people's code you may still find the outdated approach.
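To see the automatic closing at work, we can inspect the closed attribute of the file handle after the block. This is a minimal sketch using a scratch file in a temporary folder (an assumption for illustration); the second block fails on purpose to show that the file is closed even then:

```python
import os
import tempfile

# Illustrative scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")

with open(path, "w") as fh:
    print(1, file=fh)
closed_after_block = fh.closed

try:
    with open(path, "r") as fh_in:
        int("not a number")  # raises ValueError inside the block
except ValueError:
    pass
closed_after_error = fh_in.closed

print(closed_after_block, closed_after_error)
```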
If we want to work with two open files at the same time we can nest with statements. The first with protects the following four lines, the second with the following three lines:
with open("numbers.txt", "r") as fh_in:
    with open("squared.txt", "w") as fh_out:
        for line in fh_in:
            current_number = int(line.rstrip())
            print(current_number ** 2, file=fh_out)
Another option is to chain multiple open calls after with, separated by a comma, like this:
with open("numbers.txt", "r") as fh_in, open("squared.txt", "w") as fh_out:
    for line in fh_in:
        current_number = int(line.rstrip())
        print(current_number ** 2, file=fh_out)
A quick and dirty check of the result file:
print(open("squared.txt", "r").read())
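- [...] with approach.

The FASTA file we downloaded in exercise 3 has the nice property that it contains a blank line after every sequence. This is not always the case for FASTA files, but it helps us here to implement code which displays, for every sequence, the overall length of the sequence followed by the corresponding status line.
Open the FASTA file and inspect it before you continue!
The strategy is as follows: we iterate over the lines using for and:
- if the line is a status line (it starts with >), we remember it and reset the symbol count to 0,
- if the line is empty, the current sequence has ended and we print the symbol count together with the remembered status line,
- otherwise the line is part of the current sequence and we add its length to the symbol count.

Open the FASTA file and read this strategy again!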
with open("short.fasta", "r") as fh:
    for line in fh:
        line = line.rstrip()
        # line might be empty after rstrip, in this case line[0] would be an error:
        if len(line) > 0 and line[0] == ">":
            last_status = line
            count = 0
        elif line == "":
            print("symbol count:", count, "in", last_status)
        else:
            # the current line is neither a status line nor empty,
            # thus line must be part of the current sequence:
            count = count + len(line)
The if, elif and else correspond one-to-one to the three items in the list above describing the strategy!
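As noted above, not every FASTA file has a blank line after each sequence. The following is only a sketch of one possible variant for that case, not part of the original strategy: a new status line, or the end of the file, marks the end of the previous sequence. The sample data is made up, io.StringIO stands in for an opened file, and the string method startswith replaces the check of line[0]:

```python
import io

# Made-up sample data without blank lines between the sequences.
fasta_text = ">seq1 first example\nGCTGGC\nAGT\n>seq2 second example\nCATG\n"

report = ""
last_status = None
count = 0
for line in io.StringIO(fasta_text):
    line = line.rstrip()
    if line.startswith(">"):
        # a new status line ends the previous sequence, if there was one
        if last_status is not None:
            report = report + str(count) + " " + last_status + "\n"
        last_status = line
        count = 0
    else:
        # sequence line (an empty line adds 0 to the count)
        count = count + len(line)

# the last sequence is not followed by another status line
if last_status is not None:
    report = report + str(count) + " " + last_status + "\n"

print(report)
```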
- [...] print statements to show the values of the variables in every iteration).
- [...] GC content of every sequence and write this to the csv file as an additional column.

If you want to access (read or write) a file at a different place than next to your Python script, you will have to provide a so-called path to open. This is a string describing the location of a file on your computer. You have to follow the file system hierarchy folder by folder as seen in the examples below.
Example: you want to write to a file data.txt in the sub-folder Documents of your home folder.
For Windows:
Usually you have to navigate from the top of drive C: to the folder Users, then to the folder with your name and finally to the Documents folder. Inside a Python string every \ has to be written as \\. Using a path this writes as:
with open("C:\\Users\\uweschmitt\\Documents\\data.txt", "w") as fh:
    print("hi", file=fh)
For Mac OS the folder structure differs and the path is:
with open("/Users/uweschmitt/Documents/data.txt", "w") as fh:
    print("hi", file=fh)
And on Linux (it fails here, because I work with a Mac):
with open("/home/uweschmitt/Documents/data.txt", "w") as fh:
    print("hi", file=fh)
So both the delimiter between the folders and the location of the home folder depend on the operating system.
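One way to avoid hard-coding these differences is the pathlib module from the Python standard library. The following sketch only builds and prints the path and does not write anything; that a Documents folder exists in your home folder is an assumption:

```python
from pathlib import Path

# Path.home() is the home folder of the current user on the current operating system.
home = Path.home()

# The / operator joins folder names with the delimiter of the operating system.
target = home / "Documents" / "data.txt"

print(target)
```

open accepts such a Path object in place of a string, for example open(target, "w").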
Python provides helpful data types which collect data, and these types are often called container types. list is one of them.
A list in Python starts with an opening [ and ends with a closing ]. The following example uses a list holding three values of type int, namely 1, 2 and 3:
data = [1, 2, 3]
print(data)
print(type(data))
Similar to using quotes for delimiting a string, square brackets are used to delimit the elements of a list.
The types of the items in a list are arbitrary and can be mixed:
mixed_list = [1, "2", 3.14]
To compute the number of items in a list, we use the len function:
print(len(mixed_list))
The empty list is written []:
print(len([]))
To access elements in a list we use [] as we did to access characters in a string. Again indexing starts with 0:
print([1, 2, 3][0])
print(mixed_list[2])
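Since indexing starts with 0, the valid indices of a list run from 0 to len(...) - 1. A short sketch, reusing mixed_list from above, shows the first and the last item and what happens for an index which is too large (Python raises an IndexError):

```python
mixed_list = [1, "2", 3.14]

first = mixed_list[0]
last = mixed_list[len(mixed_list) - 1]

error_message = None
try:
    mixed_list[3]  # there is no item at index 3
except IndexError as error:
    error_message = str(error)

print(first, last, error_message)
```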
So we see the use of brackets in different situations:
Python ships with a module named csv which helps to read and write .csv files.
Why use this module?
- The csv module is able to handle the different variants (so-called "dialects") of this file format and also special cases, e.g. when the actual delimiter is part of a cell.
- The csv module represents a row of a .csv file as a list containing the cell elements. This simplifies handling of .csv files.

Thus it is recommended to use this module instead of resorting to manual string handling as we did in exercise block 5.
Again we first have to import csv to access its attributes.
To write to .csv files we use the writer function from the csv module:
- The csv.writer function requires a file handle to a file opened in write mode.
- The object returned by csv.writer has a writerow method.
- This object represents the csv file, and all interactions with this file are executed using methods of this object:

import csv
with open("example.csv", "w", newline="") as fh:
    w = csv.writer(fh, delimiter=",")
    w.writerow(["a", "b", "c"])
    w.writerow([1, "2", ","])
    w.writerow([2, 3, 7])
About the previous example:
- We open the file with an extra argument newline="" which is required on Windows and does no harm on Mac OS or Linux.
- We create the csv writing handle w by calling csv.writer. w is just a variable name and may be modified.
- w.writerow accepts a single Python list representing a row, the types of the cells (list elements) are arbitrary.
- The second call of w.writerow shows why self-written csv handling code might fail: we have a cell containing the , delimiter as data.

Comment: if you look at the previous example you see that writerow accepts a list where the types of the values can be mixed. You also see that when calling writerow we only pass the cell contents and do not have to care about the chosen delimiter.
Now we display the result of the previous script. You should also see the csv file in the project explorer of PyCharm. If you repeat the example, the output might differ slightly on your machine depending on your operating system:
with open("example.csv", "r", newline="") as fh:
    for line in fh:
        print(repr(line))
You can see that the cell with , is written as ",". This is according to the csv file format specification.
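A small sketch illustrates why manual string handling fails for such a line: splitting the written line at every , yields four pieces instead of the three cells we wrote. The variable names are only illustrative:

```python
# The second data row as written by csv.writer, without the line break.
line = '1,2,","'

cells = line.split(",")
print(len(cells), cells)
```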
Reading from a csv file can be done by iterating with for over the reader object returned by csv.reader.
In this case the for iterates over the lines of the input file and transforms the cells of the current line into a list. So in every iteration you get a list of cell contents:
import csv
with open("example.csv", "r", newline="") as fh:
    for row in csv.reader(fh):
        print("current row as list is", row)
Comments:
- The cell containing , is retrieved correctly.

Delimiters other than ,

Often .csv files have ; as delimiter, and .tsv files use tab characters. In this case you can specify the delimiter when calling csv.reader and csv.writer with the extra named parameter delimiter. For example:
import csv
with open("example2.csv", "w", newline="") as fh:
    w = csv.writer(fh, delimiter="\t")  # this is new!
    w.writerow(["a", "b", "c"])
    w.writerow([1, "2", ","])
    w.writerow([2, 3, 7])
with open("example2.csv", "r", newline="") as fh:
    for line in csv.reader(fh, delimiter="\t"):  # this is new!
        print("current line as list is", line)
Comments:
- Replace "\t" by ";", rerun the code and look at the created file.

To skip the header line we can use the next function. This function reads one row and returns it. A following for will start from the current row, not from the beginning:
import csv
with open("example2.csv", "r", newline="") as fh:
    r = csv.reader(fh, delimiter="\t")
    header = next(r)  # reads one row from the csv file
    print("header is", header)
    for line in r:
        a = line[0]
        b = line[1]
        print("a+b is", int(a) + int(b))
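The int(...) calls in the previous example are needed because csv.reader returns every cell as a string, no matter what type was written. The following sketch uses made-up in-memory data (io.StringIO stands in for an opened csv file) to show the difference between adding the converted numbers and adding the raw cells:

```python
import csv
import io

# Made-up in-memory data standing in for a small csv file.
fh = io.StringIO("a,b\n1,2\n")

r = csv.reader(fh)
header = next(r)  # skip the header row
row = next(r)     # first data row

print(type(row[0]))

total = int(row[0]) + int(row[1])  # numeric addition
joined = row[0] + row[1]           # string concatenation
print(total, joined)
```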
- [...] csv module.
- [...] csv module to display the content of this file.
- [...] csv module, check for every row (= list) if the one-letter symbol matches and, if this is the case, extract the needed information from the current list (row)!