- \n is called the "line break" character and creates a line break when printed.
- The repr function helps us to examine the exact content of a string.
- Strings have methods such as .count, .upper and .replace.
- If we write x.y we say y is an attribute of x.
- After import math we can use sin or pi as attributes of math.
- Methods are attributes as well (for example "hi".upper()).
import math
print(math.sin(math.e))
print("hey joe".replace(" ", "-"))
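The first recap item can be checked with a short sketch. The names text and shown are chosen only for this illustration:

```python
text = "hi\nyou"

# print interprets the line break, so the output spans two lines
print(text)

# repr shows the exact content: the line break appears as \n
shown = repr(text)
print(shown)

# the line break counts as one single character
print(len(text))
```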
You can imagine a file on disk as a string: all files are sequences of single symbols. Even complex files such as Word documents consist of a sequence of single characters.
If we want to access a file we first have to "open" it. The open function accepts two string arguments: the first one is the name of the file, the second is the so-called "access mode":
- "r" opens a file for reading,
- "w" opens a file for writing. If the file already exists, its content is deleted first!
- "a" opens a file for appending. If the file does not exist yet, it is created. If it exists, writing to the file appends new content at the end. (We will not use "a" in the exercises.)
The return value of open is a so-called file handle. A file handle
fh = open("test.txt", "w")
print(type(fh))
fh.write("hi")
fh.write("you")
fh.close()
Type and run the previous example; we explain the details below. Afterwards you should see a new file in PyCharm next to your script. You can open it in PyCharm with a double click.
Explanations:
- open returns the file handle, which we store in the variable fh.
- We call write to write a string to the file.
- We call close to finalize our operations on the file.
- fh is just a variable name; you may choose another name if you like.
Closing a file is important: if you forget to close a file, its content might be incomplete or damaged.
After you close a file, further operations (such as another call of write) are not allowed and raise a ValueError:
fh = open("test.txt", "w")
fh.write("hi")
fh.close()
fh.write("you")
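If you want the script to continue after such a failed call, the error can be caught. The following is only a sketch: try and except are not introduced in this text, and the file name closed_demo.txt in the temporary folder is a hypothetical choice so that your own files stay untouched.

```python
import os
import tempfile

# hypothetical file in the system's temporary folder
path = os.path.join(tempfile.gettempdir(), "closed_demo.txt")

fh = open(path, "w")
fh.write("hi")
fh.close()

try:
    fh.write("you")  # not allowed: the file is already closed
except ValueError as error:
    message = str(error)
    print("writing failed:", message)

os.remove(path)  # tidy up the demo file
```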
To read the full content of a file we use the read method:
fh_in = open("test.txt", "r")
content = fh_in.read()
print(content)
fh_in.close()
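A small round trip, as a sketch, shows that read returns the whole file as one single string and that write does not add line breaks on its own. The file name read_demo.txt in the temporary folder is an assumption made for this illustration.

```python
import os
import tempfile

# hypothetical file in the system's temporary folder
path = os.path.join(tempfile.gettempdir(), "read_demo.txt")

fh = open(path, "w")
fh.write("hi")
fh.write("you")
fh.close()

fh_in = open(path, "r")
content = fh_in.read()
fh_in.close()

# both strings end up directly next to each other
print(repr(content))

os.remove(path)  # tidy up the demo file
```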
A more convenient way to write to a file is a variant of print. The extra argument file=fh in the following example redirects the output to the given file:
fh = open("numbers.txt", "w")
for number in range(1, 6):
    print(number, file=fh)
fh.close()
The name file= is fixed and, being a keyword argument, it must come after the values to print. The variable name fh can be arbitrary, but it must refer to a file opened for writing.
This works for all variants of print, for example:
fh = open("square_numbers.txt", "w")
for number in range(1, 6):
    print(number, "squared is", number ** 2, file=fh)
fh.close()
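Other keyword arguments of print can be combined with file= as well. The following sketch uses sep and end, which are standard arguments of print but are not discussed in this text; the file name print_demo.txt in the temporary folder is a hypothetical choice.

```python
import os
import tempfile

# hypothetical file in the system's temporary folder
path = os.path.join(tempfile.gettempdir(), "print_demo.txt")

fh = open(path, "w")
print(1, 2, 3, sep=";", file=fh)            # values separated by ; instead of a space
print("no line break", end="", file=fh)     # end="" suppresses the line break
print(" here", file=fh, end="!\n")          # the order of keyword arguments does not matter
fh.close()

fh_in = open(path, "r")
content = fh_in.read()
fh_in.close()
print(repr(content))

os.remove(path)  # tidy up the demo file
```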
- Use read and repr to examine the content of "numbers.txt".
- ... open with mode "rb" (not introduced before) and then read to display the content of the file as a string.
Using for to read from a file
When we write for x in range(10) we say "we iterate over range(10)". Python is very flexible in this respect, and there are other objects we can iterate over.
For example, we can iterate over the characters of a string. Instead of
txt = "abc"
for i in range(len(txt)):
    print(txt[i])
we can write:
txt = "abc"
for char in txt:
    print(char)
Objects we can iterate over with for are called iterables. Besides range and str objects, the file handle we introduced above is another iterable!
If we loop over a file handle we iterate over the lines of a file:
fh = open("numbers.txt", "r")
for line in fh:
    print(line)
fh.close()
If you wonder why we see the empty lines in the output you can modify the snippet to use repr which provides details about the actual content of the lines:
fh = open("numbers.txt", "r")
for line in fh:
    print(repr(line))
fh.close()
So you can see that if we iterate over the lines of a file with for, we get the full line including its line break! print then adds a second line break, which explains the empty lines in the earlier output.
We can get rid of trailing line breaks and spaces using the .rstrip method of strings:
fh = open("numbers.txt", "r")
for line in fh:
    print(line.rstrip())
fh.close()
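Note that rstrip without arguments removes every kind of trailing whitespace. If trailing spaces matter, rstrip can be given the characters to remove. This is a small sketch with a made-up line:

```python
line = "1  \n"

# without arguments: spaces and the line break are removed
stripped_all = line.rstrip()
print(repr(stripped_all))

# with "\n" as argument: only the line break is removed
stripped_break = line.rstrip("\n")
print(repr(stripped_break))
```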
Using print(..., file=...) instead of write has some advantages:
- write only accepts strings.
- With write you have to include the line break character \n yourself, print does this for you.
We introduced read and write for didactic reasons; in practice print and the for-based approach are more powerful and easier to use.
- Write a script which reads "numbers.txt" and computes the sum of the given numbers.
- ... >. Suppress empty lines in the output! Lines starting with > are called status or description lines.
- What are the results of "abcde".find("bc") and "abcde".find("gx")? Try to forecast the results before you check them with Python.
- Use the find method + while to find all positions of GC in the sequence GCTGGCAGTCATGCCAACGGGCATGC.
We can work with several files at the same time. The following script iterates over the numbers in numbers.txt and writes the squares of the numbers to a new file:
fh_in = open("numbers.txt", "r")
fh_out = open("squared.txt", "w")
for line in fh_in:
    current_number = int(line.rstrip())
    print(current_number ** 2, file=fh_out)
fh_in.close()
fh_out.close()
fh = open("squared.txt", "r")
for line in fh:
    print(line.rstrip())
fh.close()
Explanations:
- We work with two file handles and use the names fh_in and fh_out for this. As said, it is up to you to choose meaningful and descriptive variable names.
Since version 2.5 Python provides an alternative way to work with files, the with statement, which prevents forgetting to close the file. The following snippet rewrites the previous one with the new syntax:
with open("numbers.txt", "r") as fh:
    for line in fh:
        print(repr(line.rstrip()))
The with statement "protects" the following code block (here the block has two lines): as soon as the execution of the code block ends, Python takes care of closing the file automatically. This is why you do not see a fh.close() call anymore.
Using with is highly recommended. We introduced the other method for didactic reasons, and if you read other people's code you may still find the older approach.
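That the file really gets closed can be observed with the closed attribute of the file handle. The following is a minimal sketch; the temporary folder and the names inside and outside are chosen for this illustration only.

```python
import os
import tempfile

# a temporary folder keeps this sketch away from your own files
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "with_demo.txt")
    with open(path, "w") as fh:
        print("hi", file=fh)
        inside = fh.closed   # still open inside the block
    outside = fh.closed      # closed automatically after the block

print("closed inside the block:", inside)
print("closed after the block:", outside)
```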
If we want to work with two open files at the same time we have to nest with statements: The first with protects the following four lines, the second with the following three lines:
with open("numbers.txt", "r") as fh_in:
    with open("squared.txt", "w") as fh_out:
        for line in fh_in:
            current_number = int(line.rstrip())
            print(current_number ** 2, file=fh_out)
Another option is to list several open calls after a single with, separated by commas, like this:
with open("numbers.txt", "r") as fh_in, open("squared.txt", "w") as fh_out:
    for line in fh_in:
        current_number = int(line.rstrip())
        print(current_number ** 2, file=fh_out)
A quick and dirty check of the result file:
print(open("squared.txt", "r").read())
- ... with approach.
The FASTA file we downloaded in exercise 3 has the nice property that it contains a blank line after every sequence. This is not always the case for FASTA files, but it helps us here to implement code which displays for every sequence its overall length followed by the corresponding status line.
Open the FASTA file and inspect it before you continue !
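The strategy is as follows: We iterate over the lines using for and:
- if the line is a status line (it starts with >), we remember it and reset the symbol count to 0,
- if the line is empty, the current sequence is complete and we print the symbol count together with the remembered status line,
- else the line is part of the current sequence and we add its length to the symbol count.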
Open the FASTA file and read this strategy again !
with open("short.fasta", "r") as fh:
    for line in fh:
        line = line.rstrip()
        # line might be empty after rstrip, in this case line[0] would be an error:
        if len(line) > 0 and line[0] == ">":
            last_status = line
            count = 0
        elif line == "":
            print("symbol count:", count, "in", last_status)
        else:
            # the current line is neither a status line nor empty,
            # thus line must be part of the current sequence:
            count = count + len(line)
The if, elif and else correspond one-to-one to the three items in the list above describing the strategy!
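As mentioned, not every FASTA file has a blank line after each sequence. One possible variant, shown here only as a sketch, reports a sequence when the next status line appears and once more after the loop. The function name sequence_lengths and the example lines are made up for this illustration, and def, tuples and append are not introduced in this text.

```python
def sequence_lengths(lines):
    """Return (status line, symbol count) pairs for the given FASTA lines."""
    results = []
    last_status = None
    count = 0
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # a new status line also ends the previous sequence
            if last_status is not None:
                results.append((last_status, count))
            last_status = line
            count = 0
        elif line != "":
            # blank lines are ignored, all other lines belong to the sequence
            count = count + len(line)
    # the last sequence is not followed by another status line
    if last_status is not None:
        results.append((last_status, count))
    return results


# made-up example: only the second sequence is followed by a blank line
example = [">seq1 first\n", "GCTGG\n", "CAG\n", ">seq2 second\n", "TCATG\n", "\n"]
lengths = sequence_lengths(example)
for status, count in lengths:
    print("symbol count:", count, "in", status)
```

Because a file handle is an iterable over lines, it could be passed to the function instead of the list.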
- ... print statements to show the values of the variables in every iteration).
- ... GC content of every sequence and write this to the csv file as an additional column.
If you want to access (read or write) a file at a different place than next to your Python script, you have to provide a so-called path to open. This is a string describing the location of a file on your computer. You have to follow the file system hierarchy folder by folder as seen in the examples below.
Example: you want to write to a file data.txt in the sub-folder Documents of your home folder.
For Windows:
Usually you have to navigate from the top of drive C: to the folder Users, then to the folder with your name and finally to the Documents folder. In a Python string every backslash of a Windows path has to be written as \\. As a path this reads:
with open("C:\\Users\\uweschmitt\\Documents\\data.txt", "w") as fh:
    print("hi", file=fh)
For Mac OS the folder structure differs and the path is:
with open("/Users/uweschmitt/Documents/data.txt", "w") as fh:
    print("hi", file=fh)
And on Linux (it fails here, because I work with a Mac):
with open("/home/uweschmitt/Documents/data.txt", "w") as fh:
    print("hi", file=fh)
So both the delimiter between the folders and the location of the home folder depend on the operating system.
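Python's standard library can build such paths for the current operating system. The following sketch uses pathlib, which is not covered in this text. It only prints the path below the home folder and writes to a temporary folder instead, because a Documents folder may not exist on every machine.

```python
import tempfile
from pathlib import Path

# Path.home() is the home folder on Windows, macOS and Linux,
# and / joins folder names with the delimiter of the operating system
target = Path.home() / "Documents" / "data.txt"
print(target)

# writing works as before, open accepts such a path object
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "data.txt"
    with open(path, "w") as fh:
        print("hi", file=fh)
    with open(path, "r") as fh:
        content = fh.read()

print(repr(content))
```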
Python provides helpful data types which collect other values; these types are often called container types. list is one of them.
A list in Python starts with an opening [ and ends with a closing ]. The following example uses a list holding three values of type int, namely 1, 2 and 3:
data = [1, 2, 3]
print(data)
print(type(data))
Similar to using quotes for delimiting a string, square brackets are used to delimit the elements of a list.
The types of the items in a list are arbitrary and can be mixed:
mixed_list = [1, "2", 3.14]
To compute the number of items in a list, we use the len function:
print(len(mixed_list))
The empty list is written []:
print(len([]))
To access elements in a list we use [] as we did to access characters in a string. Again, indexing starts with $0$:
print([1, 2, 3][0])
print(mixed_list[2])
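So we see the use of brackets in different situations:
- to write down a list, as in [1, 2, 3],
- to access an element of a list or a character of a string, as in data[0].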
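Both uses of brackets side by side, as a short sketch. The variable names are chosen for this illustration, and the loop at the end relies on the fact that lists are iterables too:

```python
data = [1, 2, 3]   # brackets delimit the list
txt = "abc"

first_item = data[0]             # brackets access an element of a list
first_char = txt[0]              # ... or a character of a string
last_item = data[len(data) - 1]  # the last valid index is len(data) - 1

print(first_item, first_char, last_item)

# like strings and file handles, a list can be used in a for loop
for item in data:
    print(item)
```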
Python ships with a module named csv which helps to read and write .csv files.
Why use this module?
- The csv module is able to handle the common variants (so-called "dialects") of this file format and also special cases, e.g. when the actual delimiter is part of a cell.
- The csv module represents a row of a .csv file as a list containing the cell elements. This simplifies handling of .csv files.
Thus it is recommended to use this module instead of resorting to manual string handling as we did in exercise block 5.
Again we first have to import csv to access its attributes.
To write to .csv files we use the writer function from the csv module:
- The csv.writer function requires a file handle to a file opened in write mode.
- A row is written with the writerow method.
- The returned object represents the .csv file and all interactions with this file are executed using methods of this object:
import csv
with open("example.csv", "w", newline="") as fh:
    w = csv.writer(fh, delimiter=",")
    w.writerow(["a", "b", "c"])
    w.writerow([1, "2", ","])
    w.writerow([2, 3, 7])
About the previous example:
- We open the file with an extra argument newline="", which is required on Windows and does no harm on Mac OS or Linux.
- We create the csv writing handle w by calling csv.writer(fh, delimiter=","). w is just a variable name and may be changed.
- w.writerow accepts a single Python list representing a row; the types of the cells (list elements) are arbitrary.
- The second call of w.writerow shows why self-written csv handling code might fail: we have a cell containing the delimiter , as data.
Comment: if you look at the previous example you see that writerow accepts a list where the types of the values can be mixed. You also see that you only pass the cell contents to writerow and do not have to care about the chosen delimiter.
Now we display the result of the previous script; you should see the csv file in the project explorer of PyCharm as well (if you repeat the example, the output might differ slightly on your machine depending on your operating system):
with open("example.csv", "r", newline="") as fh:
    for line in fh:
        print(repr(line))
You can see that the cell with , is written as ",". This is according to the csv file format specification.
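To see why this quoting matters, the following sketch compares a hand-made row with the row produced by csv.writer. It writes to an in-memory file (io.StringIO) instead of a file on disk, and str.join as well as the list comprehension are not introduced in this text.

```python
import csv
import io

row = [1, "2", ","]

# manual approach: convert all cells to strings and glue them with ,
manual = ",".join([str(cell) for cell in row])
print(repr(manual))

# splitting the manual line at , yields four cells instead of three
cells_manual = manual.split(",")
print(len(cells_manual))

# csv.writer puts the cell containing the delimiter into quotes
buffer = io.StringIO()  # behaves like a file opened for writing
csv.writer(buffer, delimiter=",").writerow(row)
written = buffer.getvalue()
print(repr(written))
```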
Reading from a csv file can be done by iterating with for over the reader object returned by csv.reader.
In this case for iterates over the lines of the input file and transforms the cells of the current line into a list. So in every iteration you get a list of cell contents:
import csv
with open("example.csv", "r", newline="") as fh:
    for row in csv.reader(fh):
        print("current row as list is", row)
Comments:
- The cell containing , is retrieved correctly.
Delimiters other than ,
Often .csv files use ; as delimiter, and .tsv files use tab characters. In this case you can specify the delimiter when calling csv.reader and csv.writer with the extra named parameter delimiter. For example:
import csv
with open("example2.csv", "w", newline="") as fh:
    w = csv.writer(fh, delimiter="\t") ### this is new !
    w.writerow(["a", "b", "c"])
    w.writerow([1, "2", ","])
    w.writerow([2, 3, 7])
with open("example2.csv", "r", newline="") as fh:
    for line in csv.reader(fh, delimiter="\t"): ### this is new !
        print("current line as list is", line)
Comments:
- Replace "\t" by ";", rerun the code and look at the created file.
To skip the header line we can use the next function. This function reads one row and returns it. A following for will start from the current row, not from the beginning:
import csv
with open("example2.csv", "r", newline="") as fh:
    r = csv.reader(fh, delimiter="\t")
    header = next(r) # reads one row from the csv file
    print("header is", header)
    for line in r:
        a = line[0]
        b = line[1]
        print("a+b is", int(a) + int(b))
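The calls to int in the loop are needed because csv.reader returns every cell as a string. The following sketch illustrates this with the same content held in an in-memory file (io.StringIO); the names text_sum and number_sum are chosen for this illustration.

```python
import csv
import io

# same rows as in example2.csv, kept in memory instead of on disk
fh = io.StringIO("a\tb\tc\n1\t2\t,\n2\t3\t7\n")
r = csv.reader(fh, delimiter="\t")

header = next(r)  # first row: the header
first = next(r)   # second row: the first data row

# cells are strings, so + glues them together ...
text_sum = first[0] + first[1]
# ... and int is needed to compute with them as numbers
number_sum = int(first[0]) + int(first[1])

print("header is", header)
print(repr(text_sum), number_sum)
```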
- ... csv module.
- ... csv module to display the content of this file.
- ... csv module, check for every row (= list) if the one-letter symbol matches and, if this is the case, extract the needed information from the current list (row)!