File Management
Working with Files
The term file management in the context of computers refers to the manipulation of data in a file
or files and documents on a computer. Though everybody has an understanding of the term file, we
present a formal definition anyway:
A file or a computer file is a chunk of logically related data or information which can be used by
computer programs. Usually a file is kept on a permanent storage medium, e.g. a hard disk drive.
A unique name and path are used by human users or in programs or scripts to access a file
for reading and modification purposes.
The term "file" as we have described it in the paragraph above appeared very early in the history
of computers. Its usage can be traced back to the year 1952, when punch cards were used.
A programming language without the capability to store and retrieve previously stored
information would hardly be useful.
The most basic tasks involved in file manipulation are reading data from files and
writing or appending data to files.
The syntax for reading and writing files in Python is similar to that of programming languages
like C, C++ or Perl, but a lot easier to handle.
In our first example we want to show how to read data from a file. We tell Python that we
want to read from a file by using the open function. The first parameter is the name
of the file we want to read. With the second parameter, set to the value "r", we
state that we want to read from the file:
fobj = open("ad_lesbiam.txt", "r")
The "r" is optional. If open() is called with just a filename, the file is opened for reading by default.
The open() function returns a file object, which offers attributes and methods.
fobj = open("ad_lesbiam.txt")
After we have finished working with a file, we have to close it again by using the
file object method close():
fobj.close()
Now we finally want to open and read a file. The method rstrip() in the following example is used to strip off whitespace (newlines included) from the right side of the string "line":
fobj = open("ad_lesbiam.txt")
for line in fobj:
    print(line.rstrip())
fobj.close()
If we save this script and call it, we get the following output, provided that the text file is available:
$ python file_read.py
V. ad Lesbiam

VIVAMUS mea Lesbia, atque amemus,
rumoresque senum severiorum
omnes unius aestimemus assis!
soles occidere et redire possunt:
nobis cum semel occidit breuis lux,
nox est perpetua una dormienda.
da mi basia mille, deinde centum,
dein mille altera, dein secunda centum,
deinde usque altera mille, deinde centum.
dein, cum milia multa fecerimus,
conturbabimus illa, ne sciamus,
aut ne quis malus inuidere possit,
cum tantum sciat esse basiorum.
(GAIUS VALERIUS CATULLUS)
By the way, the poem above is a love poem by Catullus, who was hopelessly in love with a woman called Lesbia.
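The reading loop above can also be written with a with statement, which closes the file automatically when the block is left, so that the explicit call of close() is no longer needed. The following is only a minimal sketch and not part of the original script: as ad_lesbiam.txt may not be available, it first writes a short excerpt into a temporary file. The names sample_text, path and lines are assumptions made for this illustration.

```python
import os
import tempfile

# Hypothetical stand-in for ad_lesbiam.txt, so that the sketch runs anywhere.
sample_text = "V. ad Lesbiam \n\nVIVAMUS mea Lesbia, atque amemus,\n"
path = os.path.join(tempfile.mkdtemp(), "ad_lesbiam.txt")
with open(path, "w") as fobj:      # "w" opens the file for writing
    fobj.write(sample_text)

# The with statement closes the file when the block ends,
# even if an exception is raised inside the block.
lines = []
with open(path) as fobj:
    for line in fobj:
        lines.append(line.rstrip())
        print(line.rstrip())

# After the block the file object is already closed.
print(fobj.closed)
```

With this form it is not possible to forget the close() call, which is why it is often preferred for short scripts.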
Write into a File
Writing to a file is as easy as reading from a file. To open a file for writing
we use a "w" instead of an "r" as the second parameter. To actually write the data
into this file, we use the method write() of the file object.
An example of reading from one file and writing to another at the same time:
fobj_in = open("ad_lesbiam.txt")
fobj_out = open("ad_lesbiam2.txt","w")
i = 1
for line in fobj_in:
    print(line.rstrip())
    fobj_out.write(str(i) + ": " + line)
    i = i + 1
fobj_in.close()
fobj_out.close()
Every line of the input text file is prefixed by its line number. So the result looks
like this:
$ more ad_lesbiam2.txt
1: V. ad Lesbiam
2:
3: VIVAMUS mea Lesbia, atque amemus,
4: rumoresque senum severiorum
5: omnes unius aestimemus assis!
6: soles occidere et redire possunt:
7: nobis cum semel occidit breuis lux,
8: nox est perpetua una dormienda.
9: da mi basia mille, deinde centum,
10: dein mille altera, dein secunda centum,
11: deinde usque altera mille, deinde centum.
12: dein, cum milia multa fecerimus,
13: conturbabimus illa, ne sciamus,
14: aut ne quis malus inuidere possit,
15: cum tantum sciat esse basiorum.
16: (GAIUS VALERIUS CATULLUS)
There is one possible problem which we have to point out: What happens if we open a file for writing and this file already exists? You can consider yourself fortunate if the content of this file was of no importance, or if you have a backup of it. Otherwise you have a problem, because as soon as an open() with a "w" has been executed, the previous content of the file is erased, i.e. the file is truncated to zero length. This is often what you want, but sometimes you just want to append to the file, as is the case with log files.
If you want to append something to an existing file, you have to use "a" instead of "w".
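The difference between "w" and "a" can be sketched as follows. This is just an illustration under assumptions: the file name example.log and the written entries are hypothetical, and the file is created in a temporary directory so that no existing data is touched. The with statement used here closes each file automatically at the end of its block.

```python
import os
import tempfile

# Hypothetical log file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.log")

# "w" creates the file, or empties it if it already exists.
with open(path, "w") as fobj:
    fobj.write("first entry\n")

# "a" keeps the existing content and writes at the end of the file.
with open(path, "a") as fobj:
    fobj.write("second entry\n")

with open(path) as fobj:
    after_append = fobj.read()

# Opening with "w" again discards both earlier entries.
with open(path, "w") as fobj:
    fobj.write("fresh start\n")

with open(path) as fobj:
    after_rewrite = fobj.read()

print(repr(after_append))
print(repr(after_rewrite))
```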
Reading in one go
So far we have worked on files line by line by using a for loop. Very often, especially if the file is not too large, it's more convenient to read the whole file into one data structure, e.g. a string or a list. The file can be closed after reading, and all further work is done on this data structure:
>>> poem = open("ad_lesbiam.txt").readlines()
>>> print(poem)
['V. ad Lesbiam \n', '\n', 'VIVAMUS mea Lesbia, atque amemus,\n', 'rumoresque senum severiorum\n', 'omnes unius aestimemus assis!\n', 'soles occidere et redire possunt:\n', 'nobis cum semel occidit breuis lux,\n', 'nox est perpetua una dormienda.\n', 'da mi basia mille, deinde centum,\n', 'dein mille altera, dein secunda centum,\n', 'deinde usque altera mille, deinde centum.\n', 'dein, cum milia multa fecerimus,\n', 'conturbabimus illa, ne sciamus,\n', 'aut ne quis malus inuidere possit,\n', 'cum tantum sciat esse basiorum.\n', '(GAIUS VALERIUS CATULLUS)']
>>> print(poem[2])
VIVAMUS mea Lesbia, atque amemus,
In the above example, the complete poem is read into the list poem. We can access e.g. the 3rd line with
poem[2].
Another convenient way to read in a file is the method read() of the file object returned by open(). With this method we can read the complete file into a string, as we can see in the next example:
>>> poem = open("ad_lesbiam.txt").read()
>>> print(poem[16:34])
VIVAMUS mea Lesbia
>>> type(poem)
<class 'str'>
>>>
This string contains the complete content of the file, including the line endings, which Python's text mode delivers as "\n" characters.
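The two ways of reading a file in one go can be put side by side in a small sketch, this time closing the file right after reading. It is an illustration under assumptions: a short excerpt of the poem is written to a temporary file first, and the names sample_text, as_list, as_string and stripped are made up for this example.

```python
import os
import tempfile

# Hypothetical stand-in for ad_lesbiam.txt (first three lines only).
sample_text = "V. ad Lesbiam \n\nVIVAMUS mea Lesbia, atque amemus,\n"
path = os.path.join(tempfile.mkdtemp(), "ad_lesbiam.txt")
with open(path, "w") as fobj:
    fobj.write(sample_text)

# readlines() returns a list with one string per line, newlines kept.
with open(path) as fobj:
    as_list = fobj.readlines()

# read() returns the whole file as a single string.
with open(path) as fobj:
    as_string = fobj.read()

# splitlines() cuts the string into lines without the trailing newlines.
stripped = as_string.splitlines()

print(as_list[2])
print(as_string[16:34])
print(stripped)
```

Joining the elements of as_list gives back exactly the string returned by read(), so both forms carry the same information.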
"How to get into a Pickle"
We don't mean what the heading says. On the contrary, we want to prevent any nasty situation, like
losing the data which your Python program has calculated. So we will show you how you can save
your data in an easy way, so that you, or better your program, can reread it at a later date.
We are "pickling" the data, so that nothing gets lost.
Python offers a module for this purpose, which is called "pickle".
With the algorithms of the pickle module we can serialize and de-serialize Python object structures.
"Pickling" denotes the process which converts a Python object hierarchy into a byte stream,
and "unpickling" on the other hand is the inverse operation, i.e. the byte stream is converted back
into an object hierarchy. What we call pickling (and unpickling) is also known as "serialization"
or "flattening" a data structure.
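The idea of the byte stream can be seen without any file by using the functions pickle.dumps() and pickle.loads(), which work on bytes objects in memory. This is only a small sketch; the list cities and the names data and restored are chosen for illustration.

```python
import pickle

cities = ["Paris", "Dijon", "Lyon", "Strasbourg"]

# Pickling: the object hierarchy is converted into a byte stream.
data = pickle.dumps(cities)
print(type(data))

# Unpickling: the byte stream is converted back into an object hierarchy.
restored = pickle.loads(data)

# The restored list has the same content, but it is a new object.
print(restored == cities)
print(restored is cities)
```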
An object can be dumped with the dump function of the pickle module:
pickle.dump(obj, file[,protocol, *, fix_imports=True])
dump() writes a pickled representation of obj to the open file object file. The optional protocol argument tells the pickler to use the given protocol:
- Protocol version 0 is the original (before Python 3) human-readable (ASCII) protocol and is backwards compatible with previous versions of Python.
- Protocol version 1 is the old binary format which is also compatible with previous versions of Python.
- Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.
- Protocol version 3 was introduced with Python 3.0. It has explicit support for bytes and cannot be unpickled by Python 2.x pickle modules. It was the default protocol from Python 3.0 to 3.7; later releases added the protocol versions 4 (Python 3.4) and 5 (Python 3.8).
Objects which have been dumped to a file with pickle.dump can be reread into a program by using the function pickle.load(file). pickle.load automatically recognizes which protocol was used for writing the data.
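That pickle.load() does not need to be told the protocol can be checked with a short sketch. It is an illustration under assumptions: the file names data_p0.pkl to data_p3.pkl and the dictionary results are hypothetical, and the files are written into a temporary directory.

```python
import os
import pickle
import tempfile

cities = ["Paris", "Dijon", "Lyon", "Strasbourg"]
tmpdir = tempfile.mkdtemp()

results = {}
for protocol in (0, 1, 2, 3):
    path = os.path.join(tmpdir, "data_p%d.pkl" % protocol)
    # Pickle files are binary files, so we open them with "wb" and "rb".
    with open(path, "wb") as fh:
        pickle.dump(cities, fh, protocol)
    # load() is not told which protocol was used for writing.
    with open(path, "rb") as fh:
        results[protocol] = pickle.load(fh)

# Every protocol gives back a list equal to the original one.
print(all(obj == cities for obj in results.values()))
```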
A simple example:
>>> import pickle
>>> cities = ["Paris", "Dijon","Lyon","Strasbourg"]
>>> fh = open("data.pkl","wb")
>>> pickle.dump(cities,fh)
>>> fh.close()
The file data.pkl can be read in again by Python in the same or another session or
by a different program:
>>> import pickle
>>> f = open("data.pkl","rb")
>>> villes = pickle.load(f)
>>> print(villes)
['Paris', 'Dijon', 'Lyon', 'Strasbourg']
>>>
Only the objects and not their names are saved. That's why we have to assign the result of pickle.load
to a name in the previous example, i.e. villes = pickle.load(f).
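If the names are needed after reloading, one possible approach is to pickle a dictionary which maps names to objects. The following is a minimal sketch of this idea and not the only way to do it; the dictionary session, its keys and the file name session.pkl are assumptions made for this example.

```python
import os
import pickle
import tempfile

cities = ["Paris", "Dijon", "Lyon", "Strasbourg"]

# The keys of the dictionary play the role of the names,
# so they are saved together with the objects.
session = {"cities": cities, "capital": "Paris"}

path = os.path.join(tempfile.mkdtemp(), "session.pkl")
with open(path, "wb") as fh:
    pickle.dump(session, fh)

with open(path, "rb") as fh:
    loaded = pickle.load(fh)

print(sorted(loaded))
print(loaded["cities"])
```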

