File Management


Working with Files


In the context of computers, the term file management refers to the manipulation of data in files and documents on a computer. Although everybody has an intuitive understanding of the term "file", we present a formal definition anyway:

A file or computer file is a chunk of logically related data or information which can be used by computer programs. Usually a file is kept on a permanent storage medium, e.g. a hard disk drive. Human users, programs and scripts access a file for reading and modification through its unique name and path.

The term "file" as described in the paragraph above appeared very early in the history of computers. Its usage can be traced back to 1952, when punch cards were used.

A programming language without the capability to store and retrieve previously stored information would hardly be useful.

The most basic tasks involved in file manipulation are reading data from files and writing or appending data to files.

The syntax for reading and writing files in Python is similar to that of programming languages like C, C++ or Perl, but much easier to handle.

In our first example, we want to show how to read data from a file. To tell Python that we want to read from a file, we use the open() function. The first parameter is the name of the file we want to read, and with the second parameter, set to the value "r", we state that we want to read from the file:

fobj = open("ad_lesbiam.txt", "r")
The "r" is optional: if open() is called with just a file name, the file is opened for reading by default. The open() function returns a file object, which offers attributes and methods.
fobj = open("ad_lesbiam.txt")
After we have finished working with a file, we have to close it again using the file object's close() method:
fobj.close()
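
As a small illustration of the attributes mentioned above, the following sketch opens a file and inspects its name, mode and closed attributes before and after close(). It does not rely on ad_lesbiam.txt: a hypothetical file sample.txt is first created in a temporary directory, so the snippet runs on its own.

```python
import os
import tempfile

# Hypothetical sample file, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
f = open(path, "w")
f.write("VIVAMUS mea Lesbia, atque amemus,\n")
f.close()

fobj = open(path)          # no mode given, so "r" is used
print(fobj.name == path)   # True: the name the file was opened with
print(fobj.mode)           # r
print(fobj.closed)         # False while the file is open
fobj.close()
print(fobj.closed)         # True after close()
```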

Now we finally want to open and read a file. In the following example, the string method rstrip() is used to strip whitespace (newlines included) from the right side of the string line:
fobj = open("ad_lesbiam.txt")
for line in fobj:
    print(line.rstrip())
fobj.close()

If we save this script as file_read.py and call it, we get the following output, provided that the text file is available:
$ python file_read.py 
V. ad Lesbiam

VIVAMUS mea Lesbia, atque amemus,
rumoresque senum severiorum
omnes unius aestimemus assis!
soles occidere et redire possunt:
nobis cum semel occidit breuis lux,
nox est perpetua una dormienda.
da mi basia mille, deinde centum,
dein mille altera, dein secunda centum,
deinde usque altera mille, deinde centum.
dein, cum milia multa fecerimus,
conturbabimus illa, ne sciamus,
aut ne quis malus inuidere possit,
cum tantum sciat esse basiorum.
(GAIUS VALERIUS CATULLUS)
By the way, the poem above is a love poem by Catullus, who was hopelessly in love with a woman called Lesbia.
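
Calling close() by hand is easy to forget, and it is skipped entirely if an exception occurs inside the loop. A common alternative is the with statement, which closes the file automatically when the block is left. The following is a minimal sketch of file_read.py rewritten this way; a hypothetical poem.txt containing the first lines of the poem stands in for ad_lesbiam.txt so that the code is self-contained:

```python
import os
import tempfile

# Hypothetical stand-in for ad_lesbiam.txt
path = os.path.join(tempfile.mkdtemp(), "poem.txt")
with open(path, "w") as fobj:
    fobj.write("V. ad Lesbiam \n\nVIVAMUS mea Lesbia, atque amemus,\n")

stripped = []
with open(path) as fobj:
    for line in fobj:
        print(line.rstrip())
        stripped.append(line.rstrip())

# Outside the with block the file has already been closed
print(fobj.closed)  # True
```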



Write into a File

Writing to a file is as easy as reading from a file. To open a file for writing, we pass "w" instead of "r" as the second parameter. To actually write data into the file, we use the write() method of the file object.
The following example reads from one file and writes to another at the same time:

fobj_in = open("ad_lesbiam.txt")
fobj_out = open("ad_lesbiam2.txt","w")
i = 1
for line in fobj_in:
    print(line.rstrip())
    fobj_out.write(str(i) + ": " + line)
    i = i + 1
fobj_in.close()
fobj_out.close()

In the output file, every line of the input text is prefixed by its line number, so the result looks like this:
$ more ad_lesbiam2.txt 
1: V. ad Lesbiam 
2: 
3: VIVAMUS mea Lesbia, atque amemus,
4: rumoresque senum severiorum
5: omnes unius aestimemus assis!
6: soles occidere et redire possunt:
7: nobis cum semel occidit breuis lux,
8: nox est perpetua una dormienda.
9: da mi basia mille, deinde centum,
10: dein mille altera, dein secunda centum,
11: deinde usque altera mille, deinde centum.
12: dein, cum milia multa fecerimus,
13: conturbabimus illa, ne sciamus,
14: aut ne quis malus inuidere possit,
15: cum tantum sciat esse basiorum.
16: (GAIUS VALERIUS CATULLUS)
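
The manual counter i in the example above works, but Python's built-in enumerate() can supply the line numbers for us. This sketch combines it with a single with statement that opens both files; the file names in.txt and out.txt are placeholders created in a temporary directory, and the f-string requires Python 3.6 or later:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "in.txt")    # placeholder input file
out_path = os.path.join(tmp_dir, "out.txt")  # placeholder output file
with open(in_path, "w") as f:
    f.write("V. ad Lesbiam\n\nVIVAMUS mea Lesbia, atque amemus,\n")

# enumerate(..., start=1) yields (line_number, line) pairs,
# replacing the manually incremented counter
with open(in_path) as fobj_in, open(out_path, "w") as fobj_out:
    for i, line in enumerate(fobj_in, start=1):
        fobj_out.write(f"{i}: {line}")

with open(out_path) as f:
    numbered = f.read()
print(numbered)
```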
There is one possible problem we have to point out: what happens if we open a file for writing and this file already exists? You can consider yourself fortunate if the content of this file was of no importance or if you have a backup of it. Otherwise you have a problem, because as soon as open() with "w" has been executed, the existing content of the file is erased (the file is truncated to zero length). This is often what you want, but sometimes you just want to append to the file, as is the case with log files.
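
The truncation is easy to verify. In this minimal sketch, a hypothetical file important.txt is written, then opened a second time with "w" and closed without writing anything; its size afterwards is zero:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "important.txt")  # hypothetical file

with open(path, "w") as f:
    f.write("precious data\n")

# Opening the same file again with "w" empties it immediately,
# even though nothing new is written
with open(path, "w") as f:
    pass

print(os.path.getsize(path))  # 0: the previous content is gone
```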

If you want to append something to an existing file, you have to use "a" instead of "w".
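
A typical use of "a" is a log file. The sketch below is illustrative only (app.log is a placeholder name): it reopens the same file in append mode three times, and every message is added after the existing ones instead of replacing them:

```python
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")  # placeholder log file

# "a" creates the file if it does not exist yet and writes at its end,
# so earlier entries are kept
for message in ["started", "working", "finished"]:
    with open(log_path, "a") as log:
        log.write(message + "\n")

with open(log_path) as log:
    entries = log.read().splitlines()
print(entries)  # ['started', 'working', 'finished']
```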

Reading in one go

So far, we have worked on files line by line using a for loop. Very often, especially if the file is not too large, it's more convenient to read the whole file into a data structure, e.g. a string or a list. The file can then be closed right after reading, and the work is done on this data structure:
>>> poem = open("ad_lesbiam.txt").readlines()
>>> print(poem)
['V. ad Lesbiam \n', '\n', 'VIVAMUS mea Lesbia, atque amemus,\n', 'rumoresque senum severiorum\n', 'omnes unius aestimemus assis!\n', 'soles occidere et redire possunt:\n', 'nobis cum semel occidit breuis lux,\n', 'nox est perpetua una dormienda.\n', 'da mi basia mille, deinde centum,\n', 'dein mille altera, dein secunda centum,\n', 'deinde usque altera mille, deinde centum.\n', 'dein, cum milia multa fecerimus,\n', 'conturbabimus illa, ne sciamus,\n', 'aut ne quis malus inuidere possit,\n', 'cum tantum sciat esse basiorum.\n', '(GAIUS VALERIUS CATULLUS)']
>>> print(poem[2])
VIVAMUS mea Lesbia, atque amemus,
In the above example, the complete poem is read into the list poem. We can access, for example, the third line with poem[2].
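
The one-liner above does not close the file explicitly. A sketch of the same idea with a with block, using a hypothetical shortened copy of the poem, might look like this; note that every list element still carries its trailing newline:

```python
import os
import tempfile

# Hypothetical shortened copy of the poem
path = os.path.join(tempfile.mkdtemp(), "poem.txt")
with open(path, "w") as f:
    f.write("V. ad Lesbiam \n\nVIVAMUS mea Lesbia, atque amemus,\n"
            "rumoresque senum severiorum\n")

# Read the whole file into a list; the file is closed when the block ends
with open(path) as fobj:
    poem = fobj.readlines()

print(len(poem))         # 4
print(poem[2].rstrip())  # VIVAMUS mea Lesbia, atque amemus,

# Remove only the newline characters, keeping other trailing whitespace
clean = [line.rstrip("\n") for line in poem]
```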

Another convenient way to read a file is the read() method of the file object returned by open(). With this method, we can read the complete file into a single string, as we can see in the next example:
>>> poem = open("ad_lesbiam.txt").read()
>>> print(poem[16:34])
VIVAMUS mea Lesbia
>>> type(poem)
<class 'str'>
>>> 
This string contains the complete content of the file, including the newline characters ("\n") at the end of each line.
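
If you need the individual lines again after reading the file with read(), the string method splitlines() splits the string at line boundaries and, unlike readlines(), removes the newline characters. A minimal sketch, again with a hypothetical poem.txt containing the first three lines of the poem:

```python
import os
import tempfile

# Hypothetical stand-in for ad_lesbiam.txt
path = os.path.join(tempfile.mkdtemp(), "poem.txt")
with open(path, "w") as f:
    f.write("V. ad Lesbiam \n\nVIVAMUS mea Lesbia, atque amemus,\n")

with open(path) as fobj:
    poem = fobj.read()

print(poem[16:34])  # VIVAMUS mea Lesbia

# splitlines() breaks the string at line boundaries and drops the "\n"
lines = poem.splitlines()
print(lines)  # ['V. ad Lesbiam ', '', 'VIVAMUS mea Lesbia, atque amemus,']
```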

"How to get into a Pickle"

We don't mean what the heading says. On the contrary, we want to prevent any nasty situation, like losing the data your Python program has calculated. So we will show you how to save your data in an easy way, so that you, or better your program, can reread it at a later date. We are "pickling" the data, so that nothing gets lost.

For this purpose, Python offers a module called pickle. With the algorithms of the pickle module, we can serialize and deserialize Python object structures. "Pickling" denotes the process which converts a Python object hierarchy into a byte stream, and "unpickling", on the other hand, is the inverse operation, i.e. the byte stream is converted back into an object hierarchy. What we call pickling (and unpickling) is also known as "serialization" or "flattening" a data structure.
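
Pickling does not require a file at all. The functions pickle.dumps() and pickle.loads() convert between an object and a bytes object in memory, which makes the "object hierarchy to byte stream" idea easy to see. The dictionary below is just sample data:

```python
import pickle

data = {"cities": ["Paris", "Dijon"], "visited": (True, False)}  # sample data

# Pickling: object hierarchy -> byte stream
byte_stream = pickle.dumps(data)
print(type(byte_stream))  # <class 'bytes'>

# Unpickling: byte stream -> an equal, but newly created, object hierarchy
restored = pickle.loads(byte_stream)
print(restored == data, restored is data)  # True False
```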

An object can be dumped with the dump() function of the pickle module:

pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)
dump() writes a pickled representation of obj to the open file object file, which has to be opened in binary mode. The optional protocol argument tells the pickler which protocol to use; if it is omitted, pickle.DEFAULT_PROTOCOL is used. This default depends on the Python version: it was 3 in early Python 3 releases and became 4 in Python 3.8.
Objects which have been dumped to a file with pickle.dump() can be read back into a program with the function pickle.load(file). pickle.load() automatically detects which protocol was used to write the data.
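
The protocol argument and the automatic detection by pickle.load() can be tried out without a file on disk. In this sketch, an io.BytesIO buffer stands in for a file opened in binary mode, and protocol 2 is chosen arbitrarily to show that load() needs no protocol argument:

```python
import io
import pickle

cities = ["Paris", "Dijon", "Lyon", "Strasbourg"]

# Both values depend on the Python version
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# An in-memory binary buffer stands in for a file opened with "wb"
buffer = io.BytesIO()
pickle.dump(cities, buffer, protocol=2)  # explicitly request protocol 2

buffer.seek(0)  # rewind before reading
# No protocol argument here: load() detects it from the data
restored = pickle.load(buffer)
print(restored)
```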
A simple example:
>>> import pickle
>>> cities = ["Paris", "Dijon", "Lyon", "Strasbourg"]
>>> fh = open("data.pkl", "wb")
>>> pickle.dump(cities, fh)
>>> fh.close()
The file data.pkl can be read in again by Python in the same or another session or by a different program:
>>> import pickle
>>> f = open("data.pkl","rb")
>>> villes = pickle.load(f)
>>> print(villes)
['Paris', 'Dijon', 'Lyon', 'Strasbourg']
>>>
Only the objects are saved, not their names. That's why we assign the result of pickle.load() to a name in the previous example, i.e. villes = pickle.load(f).
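
Because names are not stored, several objects dumped into the same file have to be loaded back in the same order, and each can be bound to any name we like. A minimal sketch with a temporary data.pkl and a second, purely illustrative list of rivers:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.pkl")

cities = ["Paris", "Dijon", "Lyon", "Strasbourg"]
rivers = ["Seine", "Loire"]  # illustrative second object

# Two objects are dumped one after another into the same file
with open(path, "wb") as fh:
    pickle.dump(cities, fh)
    pickle.dump(rivers, fh)

# Only the objects were stored, so we load them in the same order
# and bind them to whatever names we like
with open(path, "rb") as fh:
    villes = pickle.load(fh)
    fleuves = pickle.load(fh)

print(villes)   # ['Paris', 'Dijon', 'Lyon', 'Strasbourg']
print(fleuves)  # ['Seine', 'Loire']
```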