
8. File I/O

Open a text file (the traditional way) for writing

The following example creates a text file containing two lines hi\n and ho\n:

fh = open("say_hi.txt", "w")
print("hi", file=fh)
print("ho", file=fh)

# always close a file because data may still be in a buffer
# instead of actually being written to the file:
fh.close()

open(path, mode) returns a file object (file handle) which we use for manipulating the given file.
mode may be "r", "w" or "a" for reading, writing and appending text files (there are more modes if needed).
The named parameter file=fh when calling print redirects the output to the file.
fh.close() closes the file.
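
The difference between the three modes can be sketched as follows. The file name modes_demo.txt is made up for this illustration, and readlines is a method for reading which is explained in the section about reading further below:

```python
# "w" creates the file (or empties an existing one) before writing
fh = open("modes_demo.txt", "w")
print("first line", file=fh)
fh.close()

# "a" keeps the existing content and writes at the end of the file
fh = open("modes_demo.txt", "a")
print("second line", file=fh)
fh.close()

# "r" opens the file for reading only
fh = open("modes_demo.txt", "r")
lines = fh.readlines()
fh.close()

print(lines)
```

If you run this a second time the result stays the same, because opening with "w" starts again with an empty file.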

# this is not Python, but a Jupyter feature to show the content
# of a file:
!cat say_hi.txt

hi
ho

This "traditional way" is dangerous if you forget to close the file, or if an error interrupts program execution before the close method is called!

Background: data is not written to disk immediately when you call write or print. Python collects it until an internal memory region (buffer) is filled, and the operating system may buffer it further. So you never know exactly what is still in a buffer and what is already on disk. Only after closing the file or calling fh.flush() can you be sure that Python has passed your data on to the operating system.
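
A minimal sketch of the effect of flush, assuming a made-up file name flush_demo.txt: we write one line, call flush without closing the file, and then look at the file through a second, independent file handle.

```python
fh = open("flush_demo.txt", "w")
print("hi", file=fh)

# empty the buffer now, without closing the file:
fh.flush()

# a second, independent file handle now sees the line
fh_check = open("flush_demo.txt", "r")
lines = fh_check.readlines()
fh_check.close()
print(lines)

fh.close()
```

Without the call to flush the second file handle would typically still see an empty file here, although the exact behaviour depends on the buffer size.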

Open a file (modern way) using `with`:

Since Python 2.5 the with statement is supported. This statement executes the following body in a safe way, so that the file is always closed, even if an error occurs inside the body.

with open("say_hi.txt", "w") as fh:
    print("hi", file=fh)
    print("ho", file=fh)

!cat say_hi.txt

hi
ho
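
To check the claim that the file is closed even in case of an error, here is a small sketch. The file name with_demo.txt and the ValueError are only assumptions for this illustration; the attribute closed of a file handle tells us whether the file is still open.

```python
try:
    with open("with_demo.txt", "w") as fh:
        print("hi", file=fh)
        # simulate an error in the middle of the body
        raise ValueError("something went wrong")
except ValueError as error:
    print("caught:", error)

# the with statement closed the file although the body failed
print(fh.closed)
```

The last line prints True, and the line written before the error is in the file.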

Reading from a text file

For text files there are two ways to read. The first one is readlines, which returns the lines of the file as a list of strings:

with open("say_hi.txt", "r") as fh:
    print(fh.readlines())

['hi\n', 'ho\n']

Comment: there is also a method called readline (no s at the end!) which reads only one line. So take care to use the right method name.
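
The behaviour of readline can be sketched like this (the file name readline_demo.txt is made up so that the example does not depend on the cells above):

```python
with open("readline_demo.txt", "w") as fh:
    print("hi", file=fh)
    print("ho", file=fh)

with open("readline_demo.txt", "r") as fh:
    first = fh.readline()   # reads only the first line
    second = fh.readline()  # continues with the next line
    rest = fh.readline()    # nothing left: returns an empty string

print(repr(first), repr(second), repr(rest))
```

Every call continues where the previous one stopped, and an empty string signals the end of the file.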

A very convenient and readable feature of Python is that you can loop over the lines in a file using for:

with open("say_hi.txt", "r") as fh:
    for line in fh:
        print(line)

hi

ho

Why those empty lines? We can still access the last value of line:

print(repr(line))

'ho\n'

So line also contains the line break \n from the file. And as print automatically starts a new line when done, we get the empty extra lines.

To get rid of the \n we can use the .rstrip method of strings, which removes trailing whitespace (whitespace characters are characters which "you don't see", like regular spaces, tabs and line breaks):

with open("say_hi.txt", "r") as fh:
    for line in fh:
        line = line.rstrip()
        print(line)

hi
ho

Performance tip: use the for loop for iterating over a file. For huge files this reads only as many bytes as needed in every iteration and thus works for files which are larger than your computer's memory!
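
As an illustration of this tip, the following sketch counts the lines of a file without ever holding the full content in memory. The file name count_demo.txt and the 1000 lines are assumptions for the example; a real use case would be a much larger file.

```python
# create a file with 1000 lines to have something to read
with open("count_demo.txt", "w") as fh:
    for i in range(1000):
        print("line", i, file=fh)

# only the current line is kept in the variable line
number_of_lines = 0
with open("count_demo.txt", "r") as fh:
    for line in fh:
        number_of_lines += 1

print(number_of_lines)
```

Using readlines here would give the same count, but it would first build a list of all lines in memory.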

Again: if you are not sure what an iterator produces you may use list (unless the iterator is infinite):

with open("say_hi.txt", "r") as fh:
    print(list(fh))

['hi\n', 'ho\n']

Working with multiple files at the same time:

with open("say_hi.txt", "r") as fh_in:
    with open("say_hi_upper.txt", "w") as fh_out:
        for line in fh_in:
            print(line.rstrip().upper(), file=fh_out)

The two with statements can also be combined into a single one:

with open("say_hi.txt", "r") as fh_in, open("say_hi_upper.txt", "w") as fh_out:
    for line in fh_in:
        print(line.rstrip().upper(), file=fh_out)

!cat say_hi_upper.txt

HI
HO

Reading and writing csv files

If you work with csv files, do not implement your own reader and writer; use the csv module instead. There are some tricky corner cases (for example a cell which contains a "," or a line break "\n"), and there are several variations of the format (dialects).

The csv module is not covered in this script; you can find instructions at https://pymotw.com/3/csv/index.html.

Another option is to install pandas, a library for handling so-called "data frames". pandas can read from and write to multiple sources, like csv and xlsx files, but also tables from relational databases.

!cat data/test.csv

a,b
1,3
2,4
3,5

We can load this file as a data frame and print it (more about pandas and data frames in later chapters):

import pandas as pd

data = pd.read_csv("data/test.csv", delimiter=",")
print(data)

And we can write it in a different format:

data.to_csv("test2.csv", sep=";", index=False)
!cat test2.csv

a;b
1;3
2;4
3;5

Reading and writing binary files

For binary files, like images, the modes for opening are "rb", "wb" and "ab" (reading, writing and appending).

In addition to the methods we introduced above, the file handle has the methods read and write. These are mostly used for binary files and not for text files.
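
A minimal sketch of read and write with a binary file. The file name binary_demo.bin and the five byte values are made up for this illustration; bytes is the Python type for raw binary data.

```python
# five raw byte values, each between 0 and 255
data = bytes([0, 1, 2, 254, 255])

with open("binary_demo.bin", "wb") as fh:
    fh.write(data)

# read() without an argument returns the full content as bytes
with open("binary_demo.bin", "rb") as fh:
    data_back = fh.read()

print(data_back)
print(data_back == data)
```

In binary mode nothing is translated or decoded, so what we read is exactly what we wrote.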

Exercise block 8

Reread the examples above carefully.

Programming exercise

Write a script which writes the square numbers 1, 4, 9, ..., 100 line by line to a text file. Check the content with your file system explorer, then write some code to read the numbers again and compute their product.

Optional exercises*

Look up how to use the csv module and use it to write a 10 x 10 multiplication table to a csv file.
Use the same module to read the data from the file again and compute the sum of all entries.