Synthetical Test Data With Python



Definition of Synthetical Data


Hardly any engineer or scientist doubts the need for synthetical data, also called synthetic data. But what exactly do we mean by synthetical test data? There are many situations in which a scientist or an engineer needs training or test data, but real data, i.e. a sample from a population obtained by measurement, is hard or impossible to get. The challenge of creating synthetical data is to produce data which resembles, or comes quite close to, the intended "real life" data. Python is well suited for producing such data, because it has powerful numerical and string-processing capabilities.

Synthetic data are also necessary to satisfy specific needs or certain conditions that may not be found in the "real life" data. Another use case of synthetical data is protecting the privacy of the real data.

In our previous chapter "Python, Numpy and Probability", we wrote some functions which we will need in the following, among them cartesian_choice and weighted_cartesian_choice.

You should be familiar with how these functions work.

We saved the functions in a module with the name bk_random.
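
In case the module from the previous chapter is not at hand, the following is a minimal sketch of how the two functions used in this chapter might look. It is an assumption based on how they are called further down, not necessarily the original implementation: cartesian_choice takes any number of sequences, and weighted_cartesian_choice takes any number of (sequence, weights) pairs. Both return a list with one randomly drawn element per sequence.

```python
import random

def cartesian_choice(*iterables):
    """Draw one random element from each iterable; return them as a list.
    (Sketch of the bk_random function, signature assumed from its usage.)"""
    return [random.choice(list(population)) for population in iterables]

def weighted_cartesian_choice(*iterables):
    """Each argument is a pair (population, weights). Draw one element
    from each population according to its weights; return them as a list.
    (Sketch of the bk_random function, signature assumed from its usage.)"""
    res = []
    for population, weights in iterables:
        # random.choices returns a list of k elements, we need just one
        res.append(random.choices(population, weights=weights, k=1)[0])
    return res

print(cartesian_choice(["John", "Eve"], ["Smith", "Baker"]))
print(weighted_cartesian_choice((["John", "Eve"], [0.9, 0.1]),
                                (["Smith", "Baker"], [0.5, 0.5])))
```

Saving these two functions in a file called bk_random.py would be enough to run the examples of this chapter under the stated assumptions.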



Definition of the Scope of Synthetic Data Creation

We want to provide solutions to the following task:

We have n finite sets containing data of various types:

D1, D2, ... Dn

The sets Di are the data sets from which we want to deduce our synthetical data.

In the actual implementation, the sets will be tuples or lists for practical reasons.

The process of creating synthetic data can be defined by two functions "synthesizer" and "synthesize". Usually, the word synthesizer is used for a computerized electronic device which produces sound. Our synthesizer produces strings or alternatively tuples with data, as we will see later.

The function synthesizer creates the function synthesize:

synthesize = synthesizer( (D1, D2, ... Dn) )

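The function synthesize, which may also be a generator as in our implementation, takes no arguments. The result of a call synthesize() will be either a list or tuple (d1, d2, ... dn), where each di is drawn at random from Di, or a string built from these elements.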

Let us start with a simple example. We have a list of first names and a list of surnames. We want to hire employees for an institute or company. Of course, it will be a lot easier in our synthetical Python environment to find and hire specialists than in real life. The function "cartesian_choice" from the bk_random module and the concatenation of the randomly drawn first names and surnames is all it takes.

import bk_random 
firstnames = ["John", "Eve", "Jane", "Paul", 
              "Frank", "Laura", "Robert", 
              "Kathrin", "Roger", "Simone",
              "Bernard", "Sarah", "Yvonne"]
surnames = ["Singer", "Miles", "Moore", 
            "Looper", "Rampman", "Chopman", 
            "Smiley", "Bychan", "Smith",
            "Baker", "Miller", "Cook"]
   
number_of_specialists = 15
    
employees = set()
while len(employees) < number_of_specialists:
    employee = bk_random.cartesian_choice(firstnames, surnames)
    employees.add(" ".join(employee))
print(employees)
{'Laura Smith', 'Yvonne Miles', 'Sarah Cook', 'Jane Smith', 'Paul Moore', 'Jane Miles', 'Jane Looper', 'Frank Singer', 'Frank Miles', 'Jane Cook', 'Frank Chopman', 'Laura Cook', 'Yvonne Bychan', 'Eve Miles', 'Simone Cook'}
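
As an aside, the loop above keeps drawing until the set is large enough. When all names are equally likely, an alternative is to build all combinations once and sample from them without replacement. The following is only a sketch using the standard library, with shortened name lists chosen for illustration:

```python
import itertools
import random

# shortened lists, for illustration only
firstnames = ["John", "Eve", "Jane", "Paul"]
surnames = ["Singer", "Miles", "Moore"]

number_of_specialists = 5

# all 4 * 3 = 12 possible (firstname, surname) pairs
all_pairs = list(itertools.product(firstnames, surnames))

# random.sample draws without replacement, so no name can occur twice
employees = [" ".join(pair)
             for pair in random.sample(all_pairs, number_of_specialists)]
print(employees)
```

This approach needs memory for every combination, so it is only sensible for small data sets, and it does not cover the weighted case discussed later.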

This was easy enough, but we want to do it now in a more structured way, using the synthesizer approach we mentioned before. The code for the case in which the parameter "weights" is not None is still missing in the following implementation:

import bk_random 
firstnames = ["John", "Eve", "Jane", "Paul", 
              "Frank", "Laura", "Robert", 
              "Kathrin", "Roger", "Simone",
              "Bernard", "Sarah", "Yvonne"]
surnames = ["Singer", "Miles", "Moore", 
            "Looper", "Rampman", "Chopman", 
            "Smiley", "Bychan", "Smith",
            "Baker", "Miller", "Cook"]
def synthesizer( data, weights=None, format_func=None, repeats=True):
    """
    data is a tuple or list of lists or tuples containing the 
    data
    weights is a list or tuple of lists or tuples with the 
    corresponding weights of the data lists or tuples
    format_func is a reference to a function which defines
    how a random result of the generator "synthesize" will be formatted. 
    If None, "synthesize" will yield the list "res".
    If repeats is set to True, the results of "synthesize" are not 
    guaranteed to be unique
    """
    def synthesize():
        if not repeats:
            memory = set()
        while True:
            res = bk_random.cartesian_choice(*data)
            if not repeats:
                sres = str(res)
                while sres in memory:
                    res = bk_random.cartesian_choice(*data)
                    sres = str(res)
                memory.add(sres)
            if format_func:
                yield format_func(res)
            else:
                yield res
    return synthesize
        
recruit_employee = synthesizer( (firstnames, surnames), 
                                 format_func=lambda x: " ".join(x),
                                 repeats=False)
employee = recruit_employee()
for _ in range(15):
    print(next(employee))
    
Sarah Baker
Frank Smiley
Simone Smiley
Frank Bychan
Sarah Moore
Simone Chopman
Frank Chopman
Eve Rampman
Bernard Miller
Simone Bychan
Jane Singer
Roger Smith
John Baker
Robert Cook
Kathrin Cook
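
One thing to keep in mind: with repeats=False, the inner while loop never ends once every possible combination has been yielded. With 13 first names and 12 surnames this happens after 156 names. The following sketch shows one possible guard. The function name bounded_unique_synthesizer is hypothetical, and it uses random.choice directly instead of bk_random so that it runs on its own:

```python
import math
import random

def bounded_unique_synthesizer(data, format_func=None):
    """Hypothetical variant: yields unique results and stops
    when all combinations are used up."""
    # number of distinct combinations of the data sets
    max_results = math.prod(len(set(d)) for d in data)

    def synthesize():
        memory = set()
        while len(memory) < max_results:
            res = tuple(random.choice(d) for d in data)
            if res in memory:
                continue          # already seen, draw again
            memory.add(res)
            yield format_func(res) if format_func else res
    return synthesize

draw = bounded_unique_synthesizer((["a", "b"], [1, 2, 3]))
results = list(draw())   # terminates, because the generator is finite
print(len(results))      # 6
```

Because the generator is finite here, a caller using next() has to be prepared for a StopIteration exception.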

Every name, i.e. first name and last name, had the same likelihood of being drawn in the previous example. This is not very realistic, because in countries like the US or England we would expect names like Smith and Miller to occur more often than names like Rampman or Bychan. We will extend our synthesizer function with additional code for the "weighted" case, i.e. weights is not None. If weights are given, we have to use the function weighted_cartesian_choice from the bk_random module. If "weights" is None, we call the function cartesian_choice. We put this decision into a separate subfunction of synthesizer to keep the function synthesize clearer.

We do not want to fiddle around with probabilities between 0 and 1 when defining the weights, so we take a detour via integers, which we normalize afterwards.

from bk_random import cartesian_choice, weighted_cartesian_choice
weighted_firstnames = [ ("John", 80), ("Eve", 70), ("Jane", 2), 
                        ("Paul", 8), ("Frank", 20), ("Laura", 6), 
                        ("Robert", 17), ("Zoe", 3), ("Roger", 8), 
                        ("Edgar", 4), ("Susanne", 11), ("Dorothee", 22),
                        ("Tim", 17), ("Donald", 12), ("Igor", 15),
                        ("Simone", 9), ("Bernard", 8), ("Sarah", 7),
                        ("Yvonne", 11), ("Bill", 12), ("Bernd", 10)]
weighted_surnames = [('Singer', 2), ('Miles', 2), ('Moore', 5),
                     ('Strongman', 5), ('Romero', 3), ("Yiang", 4),
                     ('Looper', 1), ('Rampman', 1), ('Chopman', 1), 
                     ('Smiley', 1), ('Bychan', 1), ('Smith', 150), 
                     ('Baker', 144), ('Miller', 87), ('Cook', 5),
                     ('Joyce', 1), ('Bush', 5), ('Shorter', 6), 
                     ('Wagner', 10), ('Sundigos', 10), ('Firenze', 8),
                     ('Puttner', 20), ('Faulkner', 10), ('Bowman', 11),
                     ('Klein', 1), ('Jungster', 14), ("Warner", 14),
                     ('Tiller', 9), ('Wogner', 10), ('Blumenthal', 16)]
firstnames, weights = zip(*weighted_firstnames)
wsum = sum(weights)
weights_firstnames = [ x / wsum for x in weights]
surnames, weights = zip(*weighted_surnames)
wsum = sum(weights)
weights_surnames = [ x / wsum for x in weights]
weights = (weights_firstnames, weights_surnames)
def synthesizer( data, weights=None, format_func=None, repeats=True):
    """
    "data" is a tuple or list of lists or tuples containing the 
    data.
    
    "weights" is a list or tuple of lists or tuples with the 
    corresponding weights of the data lists or tuples.
    
    "format_func" is a reference to a function which defines
    how a random result of the generator "synthesize" will be formatted. 
    If None, the generator "synthesize" will yield the list "res".
    
    If "repeats" is set to True, the output values yielded by 
    "synthesize" are not guaranteed to be unique.
    """
        
    def choice(data, weights):
        if weights:
            return weighted_cartesian_choice(*zip(data, weights))
        else:
            return cartesian_choice(*data)
        
    def synthesize():
        if not repeats:
            memory = set()
        while True:
            res = choice(data, weights)
            if not repeats:
                sres = str(res)
                while sres in memory:
                    res = choice(data, weights)
                    sres = str(res)
                memory.add(sres)
            if format_func:
                yield format_func(res)
            else:
                yield res
    return synthesize
        
recruit_employee = synthesizer( (firstnames, surnames), 
                                weights = weights,
                                format_func=lambda x: " ".join(x),
                                repeats=False)
employee = recruit_employee()
for _ in range(12):
    print(next(employee))
Frank Baker
Frank Smith
Eve Smith
Dorothee Baker
John Smith
Bill Bush
John Sundigos
Laura Blumenthal
Zoe Smith
Igor Baker
Bill Miller
Eve Baker
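
To convince ourselves that normalized integer weights really shape the result, we can count how often each name is drawn. The following sketch uses random.choices from the standard library as a stand-in for the bk_random functions, and only a few of the surnames from above:

```python
import random
from collections import Counter

# a small excerpt of the weighted surnames used above
weighted_surnames = [("Smith", 150), ("Baker", 144),
                     ("Miller", 87), ("Rampman", 1)]

surnames, weights = zip(*weighted_surnames)
wsum = sum(weights)
probabilities = [w / wsum for w in weights]   # normalized, sums to 1

draws = random.choices(surnames, weights=probabilities, k=10_000)
counts = Counter(draws)

for name, p in zip(surnames, probabilities):
    observed = counts[name] / len(draws)
    print(f"{name:8s} expected {p:.3f}  observed {observed:.3f}")
```

The observed frequencies should come close to the expected probabilities, and they get closer the more values we draw.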



Wine Example


Let's imagine that you have to describe a dozen wines. Probably a pleasant thought for many, but I have to admit that it is not for me, mainly because I am not a wine drinker!

We can write a little Python program which uses our synthesizer function to automatically create "sophisticated criticisms" like this one:

This wine is light-bodied with a conveniently juicy bouquet leading to a lingering flamboyant finish!

Try to find some adverbs, like "seamlessly", "assertively", and some adjectives, like "fruity" and "refined", to describe the aroma.

Once you have defined your lists, you can use the synthesizer function.

Here is our solution, in case you don't want to do it on your own:

import bk_random
body = ['light-bodied', 'medium-bodied', 'full-bodied']
    
adverbs = ['appropriately', 'assertively', 'authoritatively', 
           'compellingly', 'completely', 'continually', 
           'conveniently', 'credibly', 'distinctively', 
           'dramatically', 'dynamically', 'efficiently', 
           'energistically', 'enthusiastically', 'fungibly', 
           'globally', 'holisticly', 'interactively', 
           'intrinsically', 'monotonectally', 'objectively', 
           'phosfluorescently', 'proactively', 'professionally', 
           'progressively', 'quickly', 'rapidiously', 
           'seamlessly', 'synergistically', 'uniquely']
noun = ['aroma', 'bouquet', 'flavour']
aromas = ['angular', 'bright', 'lingering', 'butterscotch', 
          'buttery', 'chocolate', 'complex', 'earth', 'flabby', 
          'flamboyant', 'fleshy', 'flowers', 'food friendly', 
          'fruits', 'grass', 'herbs', 'jammy', 'juicy', 'mocha', 
          'oaked', 'refined', 'structured', 'tight', 'toast',
          'toasty', 'tobacco', 'unctuous', 'unoaked', 'vanilla', 
          'velvetly']
          
example = """This wine is light-bodied with a completely buttery 
bouquet leading to a lingering fruity  finish!"""
def describe(data):
    body, adv, adj, noun, adj2 = data
    format_str = "This wine is %s with a %s %s %s\nleading to"
    format_str += " a lingering %s finish!"
    return format_str % (body, adv, adj, noun, adj2)  
    
t = bk_random.cartesian_choice(body, adverbs, aromas, noun, aromas)
data = (body, adverbs, aromas, noun, aromas)
synthesize = synthesizer( data, weights=None, format_func=describe, repeats=True)
criticism = synthesize()
for i in range(1, 13):
    print("{0:d}. wine:".format(i))
    print(next(criticism))
    print()
1. wine:
This wine is light-bodied with a progressively earth bouquet
leading to a lingering complex finish!
2. wine:
This wine is medium-bodied with a energistically unctuous bouquet
leading to a lingering vanilla finish!
3. wine:
This wine is medium-bodied with a synergistically flamboyant flavour
leading to a lingering unoaked finish!
4. wine:
This wine is light-bodied with a uniquely toasty flavour
leading to a lingering juicy finish!
5. wine:
This wine is full-bodied with a holisticly flowers flavour
leading to a lingering tobacco finish!
6. wine:
This wine is full-bodied with a energistically toasty flavour
leading to a lingering chocolate finish!
7. wine:
This wine is full-bodied with a proactively tobacco bouquet
leading to a lingering velvetly finish!
8. wine:
This wine is full-bodied with a authoritatively mocha aroma
leading to a lingering juicy finish!
9. wine:
This wine is light-bodied with a dynamically vanilla flavour
leading to a lingering juicy finish!
10. wine:
This wine is medium-bodied with a dynamically structured flavour
leading to a lingering complex finish!
11. wine:
This wine is full-bodied with a distinctively fruits flavour
leading to a lingering complex finish!
12. wine:
This wine is medium-bodied with a conveniently tight aroma
leading to a lingering chocolate finish!
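
Some of the generated sentences read "a energistically" or "a authoritatively". A format function can pick the article depending on the following word. The sketch below is one possible variant of describe. The helper article is hypothetical and uses a simple first-letter heuristic, which gets words like "uniquely" wrong, since the choice really depends on the sound and not on the letter:

```python
def article(word):
    """Return 'an' before a vowel letter, otherwise 'a' (rough heuristic)."""
    return "an" if word[0].lower() in "aeiou" else "a"

def describe(data):
    body, adv, adj, noun, adj2 = data
    format_str = "This wine is {} with {} {} {} {}\n"
    format_str += "leading to a lingering {} finish!"
    return format_str.format(body, article(adv), adv, adj, noun, adj2)

print(describe(("medium-bodied", "energistically",
                "unctuous", "bouquet", "vanilla")))
```

This version of describe can be passed as format_func to synthesizer in the same way as the original one.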



Exercise: International Disaster Operation


It would be wonderful if the problem described in this exercise were purely synthetic, i.e. if there were no further catastrophes in the world. Completely unrealistic, but a nice daydream. The task of this exercise is to provide synthetical test data for an international disaster operation. The countries taking part in this mission might be, for example, France, Switzerland, Germany, Canada, The Netherlands, The United States, Austria, Belgium and Luxembourg.

We want to create a file with random entries of aides. Each line should consist of:

UniqueIdentifier, FirstName, LastName, Country, Field

For example:

001, Jean-Paul,  Rennier, France, Medical Aid
002, Nathan, Bloomfield, Canada, Security Aid
003, Michael, Mayer, Germany, Social Worker

For practical reasons, we will reduce the countries to France, Italy, Switzerland and Germany in the following example implementation:

from bk_random import cartesian_choice, weighted_cartesian_choice
countries = ["France", "Italy", "Switzerland", "Germany"]
w_firstnames = { "France" : [ ("Marie", 10), ("Thomas", 10), 
                            ("Camille", 10), ("Nicolas", 9),
                            ("Léa", 10), ("Julien", 9), 
                            ("Manon", 9), ("Quentin", 9), 
                            ("Chloé", 8), ("Maxime", 9), 
                            ("Laura", 7), ("Alexandre", 6),
                            ("Clementine", 2), ("Grégory", 2), 
                            ("Sandra", 1), ("Philippe", 1)],
               "Switzerland": [ ("Sarah", 10), ("Hans", 10), 
                            ("Laura", 9), ("Peter", 8),
                            ("Mélissa", 9), ("Walter", 7), 
                            ("Océane", 7), ("Daniel", 7), 
                            ("Noémie", 6), ("Reto", 7), 
                            ("Laura", 7), ("Bruno", 6),
                            ("Eva", 2), ("Urli", 4), 
                            ("Sandra", 1), ("Marcel", 1)],
               "Germany": [ ("Ursula", 10), ("Peter", 10), 
                            ("Monika", 9), ("Michael", 8),
                            ("Brigitte", 9), ("Thomas", 7), 
                            ("Stefanie", 7), ("Andreas", 7), 
                            ("Maria", 6), ("Wolfgang", 7), 
                            ("Gabriele", 7), ("Manfred", 6),
                            ("Nicole", 2), ("Matthias", 4), 
                            ("Christine", 1), ("Dirk", 1)],
               "Italy" : [ ("Francesco", 20), ("Alessandro", 19), 
                            ("Mattia", 19), ("Lorenzo", 18),
                            ("Leonardo", 16), ("Andrea", 15), 
                            ("Gabriele", 14), ("Matteo", 14), 
                            ("Tommaso", 12), ("Riccardo", 11), 
                            ("Sofia", 20), ("Aurora", 18),
                            ("Giulia", 16), ("Giorgia", 15), 
                            ("Alice", 14), ("Martina", 13)]}       
                        
w_surnames = { "France" : [ ("Matin", 10), ("Bernard", 10), 
                          ("Camille", 10), ("Nicolas", 9),
                          ("Dubois", 10), ("Petit", 9), 
                            ("Durand", 8), ("Leroy", 8), 
                            ("Fournier", 7), ("Lambert", 6), 
                            ("Mercier", 5), ("Rousseau", 4),
                            ("Mathieu", 2), ("Fontaine", 2), 
                            ("Muller", 1), ("Robin", 1)],
               "Switzerland": [ ("Müller", 10), ("Meier", 10), 
                            ("Schmid", 9), ("Keller", 8),
                            ("Weber", 9), ("Huber", 7), 
                            ("Schneider", 7), ("Meyer", 7), 
                            ("Steiner", 6), ("Fischer", 7), 
                            ("Gerber", 7), ("Brunner", 6),
                            ("Baumann", 2), ("Frei", 4), 
                            ("Zimmermann", 1), ("Moser", 1)],
               "Germany": [ ("Müller", 10), ("Schmidt", 10), 
                            ("Schneider", 9), ("Fischer", 8),
                            ("Weber", 9), ("Meyer", 7), 
                            ("Wagner", 7), ("Becker", 7), 
                            ("Schulz", 6), ("Hoffmann", 7), 
                            ("Schäfer", 7), ("Koch", 6),
                            ("Bauer", 2), ("Richter", 4), 
                            ("Klein", 2), ("Schröder", 1)],
               "Italy" : [ ("Rossi", 20), ("Russo", 19), 
                            ("Ferrari", 19), ("Esposito", 18),
                            ("Bianchi", 16), ("Romano", 15), 
                            ("Colombo", 14), ("Ricci", 14), 
                            ("Marino", 12), ("Grecco", 11), 
                            ("Bruno", 10), ("Gallo", 12),
                            ("Conti", 16), ("De Luca", 15), 
                            ("Costa", 14), ("Giordano", 13),
                            ("Mancini", 14), ("Rizzo", 13),
                            ("Lombardi", 11), ("Moretto", 9)]}
# separate names and weights
synthesize = {}
identifier = 1
for country in w_firstnames:
    firstnames, weights = zip(*w_firstnames[country])
    wsum = sum(weights)
    weights_firstnames = [ x / wsum for x in weights]
    w_firstnames[country] = [firstnames, weights_firstnames]
    surnames, weights = zip(*w_surnames[country])
    wsum = sum(weights)
    weights_surnames = [ x / wsum for x in weights]
    w_surnames[country] = [surnames, weights_surnames]
    synthesize[country] = synthesizer( (firstnames, surnames), 
                                       (weights_firstnames, 
                                        weights_surnames),
                                 format_func=lambda x: " ".join(x),
                                 repeats=False)
nation_prob = [("Germany", 0.3), 
               ("France", 0.4), 
               ("Switzerland", 0.2),
               ("Italy", 0.1)]
profession_prob = [("Medical Aid", 0.3), 
                   ("Social Worker", 0.6), 
                   ("Security Aid", 0.1)]
helpers = []
for _ in range(200):
    country = weighted_cartesian_choice(zip(*nation_prob))
    profession = weighted_cartesian_choice(zip(*profession_prob))
    country, profession = country[0], profession[0]
    s = synthesize[country]()
    uid = "{id:05d}".format(id=identifier)
    helpers.append((uid, country, next(s), profession ))
    identifier += 1
    
print(helpers)
[('00001', 'Germany', 'Brigitte Wagner', 'Social Worker'), ('00002', 'France', 'Chloé Muller', 'Medical Aid'), ('00003', 'Switzerland', 'Laura Steiner', 'Medical Aid'), ('00004', 'France', 'Laura Matin', 'Medical Aid'), ('00005', 'France', 'Léa Fontaine', 'Social Worker'), ('00006', 'Switzerland', 'Océane Meyer', 'Social Worker'), ('00007', 'France', 'Léa Fournier', 'Social Worker'), ('00008', 'France', 'Marie Matin', 'Social Worker'), ('00009', 'France', 'Laura Durand', 'Security Aid'), ('00010', 'France', 'Maxime Dubois', 'Social Worker'), ('00011', 'France', 'Nicolas Mercier', 'Social Worker'), ('00012', 'Italy', 'Mattia Gallo', 'Medical Aid'), ('00013', 'France', 'Quentin Leroy', 'Social Worker'), ('00014', 'Germany', 'Wolfgang Koch', 'Medical Aid'), ('00015', 'France', 'Manon Matin', 'Social Worker'), ('00016', 'Switzerland', 'Mélissa Schneider', 'Social Worker'), ('00017', 'Germany', 'Thomas Koch', 'Social Worker'), ('00018', 'Germany', 'Wolfgang Schäfer', 'Medical Aid'), ('00019', 'Germany', 'Peter Schäfer', 'Security Aid'), ('00020', 'Italy', 'Alice Costa', 'Medical Aid'), ('00021', 'Switzerland', 'Océane Steiner', 'Social Worker'), ('00022', 'France', 'Manon Durand', 'Medical Aid'), ('00023', 'Switzerland', 'Daniel Meier', 'Social Worker'), ('00024', 'France', 'Laura Fournier', 'Social Worker'), ('00025', 'Switzerland', 'Daniel Schneider', 'Security Aid'), ('00026', 'Germany', 'Maria Weber', 'Social Worker'), ('00027', 'Switzerland', 'Sarah Weber', 'Medical Aid'), ('00028', 'Germany', 'Wolfgang Weber', 'Social Worker'), ('00029', 'Germany', 'Michael Fischer', 'Social Worker'), ('00030', 'Germany', 'Stefanie Hoffmann', 'Social Worker'), ('00031', 'France', 'Laura Mercier', 'Social Worker'), ('00032', 'France', 'Nicolas Leroy', 'Social Worker'), ('00033', 'Germany', 'Peter Becker', 'Social Worker'), ('00034', 'France', 'Maxime Petit', 'Social Worker'), ('00035', 'France', 'Maxime Matin', 'Security Aid'), ('00036', 'Germany', 'Stefanie Becker', 'Medical 
Aid'), ('00037', 'France', 'Laura Petit', 'Social Worker'), ('00038', 'Switzerland', 'Hans Fischer', 'Social Worker'), ('00039', 'France', 'Nicolas Leroy', 'Medical Aid'), ('00040', 'France', 'Léa Matin', 'Social Worker'), ('00041', 'Switzerland', 'Bruno Fischer', 'Social Worker'), ('00042', 'France', 'Julien Dubois', 'Medical Aid'), ('00043', 'France', 'Alexandre Petit', 'Social Worker'), ('00044', 'France', 'Camille Camille', 'Social Worker'), ('00045', 'France', 'Camille Rousseau', 'Medical Aid'), ('00046', 'France', 'Julien Lambert', 'Social Worker'), ('00047', 'France', 'Léa Dubois', 'Social Worker'), ('00048', 'Italy', 'Lorenzo Mancini', 'Security Aid'), ('00049', 'Germany', 'Ursula Hoffmann', 'Social Worker'), ('00050', 'Germany', 'Brigitte Meyer', 'Medical Aid'), ('00051', 'France', 'Sandra Lambert', 'Social Worker'), ('00052', 'Italy', 'Alice Rizzo', 'Medical Aid'), ('00053', 'France', 'Chloé Nicolas', 'Social Worker'), ('00054', 'Germany', 'Gabriele Schröder', 'Social Worker'), ('00055', 'France', 'Thomas Durand', 'Medical Aid'), ('00056', 'France', 'Léa Dubois', 'Medical Aid'), ('00057', 'France', 'Maxime Mercier', 'Social Worker'), ('00058', 'Germany', 'Peter Schmidt', 'Social Worker'), ('00059', 'France', 'Quentin Durand', 'Social Worker'), ('00060', 'France', 'Camille Petit', 'Social Worker'), ('00061', 'Switzerland', 'Laura Schmid', 'Medical Aid'), ('00062', 'Italy', 'Gabriele Lombardi', 'Social Worker'), ('00063', 'Switzerland', 'Peter Meier', 'Medical Aid'), ('00064', 'Switzerland', 'Reto Huber', 'Medical Aid'), ('00065', 'Italy', 'Matteo Mancini', 'Medical Aid'), ('00066', 'France', 'Marie Petit', 'Social Worker'), ('00067', 'Germany', 'Manfred Hoffmann', 'Medical Aid'), ('00068', 'Germany', 'Brigitte Schmidt', 'Medical Aid'), ('00069', 'France', 'Manon Matin', 'Medical Aid'), ('00070', 'France', 'Nicolas Petit', 'Social Worker'), ('00071', 'France', 'Léa Petit', 'Social Worker'), ('00072', 'Germany', 'Monika Schulz', 'Social Worker'), ('00073', 
'Italy', 'Mattia Rizzo', 'Social Worker'), ('00074', 'Italy', 'Sofia Colombo', 'Social Worker'), ('00075', 'Germany', 'Michael Schäfer', 'Medical Aid'), ('00076', 'Germany', 'Matthias Hoffmann', 'Social Worker'), ('00077', 'Germany', 'Wolfgang Schneider', 'Social Worker'), ('00078', 'France', 'Julien Dubois', 'Social Worker'), ('00079', 'Germany', 'Peter Fischer', 'Social Worker'), ('00080', 'France', 'Julien Leroy', 'Social Worker'), ('00081', 'France', 'Julien Bernard', 'Social Worker'), ('00082', 'Germany', 'Michael Schmidt', 'Social Worker'), ('00083', 'France', 'Manon Bernard', 'Social Worker'), ('00084', 'Switzerland', 'Hans Huber', 'Security Aid'), ('00085', 'Germany', 'Monika Schneider', 'Medical Aid'), ('00086', 'Switzerland', 'Noémie Müller', 'Security Aid'), ('00087', 'Switzerland', 'Sarah Gerber', 'Medical Aid'), ('00088', 'Germany', 'Thomas Müller', 'Medical Aid'), ('00089', 'Switzerland', 'Sarah Weber', 'Medical Aid'), ('00090', 'France', 'Laura Petit', 'Medical Aid'), ('00091', 'Switzerland', 'Sarah Gerber', 'Medical Aid'), ('00092', 'Switzerland', 'Reto Schmid', 'Medical Aid'), ('00093', 'Germany', 'Monika Schneider', 'Medical Aid'), ('00094', 'France', 'Quentin Matin', 'Social Worker'), ('00095', 'Italy', 'Aurora Colombo', 'Social Worker'), ('00096', 'Germany', 'Ursula Meyer', 'Social Worker'), ('00097', 'Germany', 'Manfred Weber', 'Social Worker'), ('00098', 'Italy', 'Giulia Ferrari', 'Medical Aid'), ('00099', 'France', 'Thomas Muller', 'Social Worker'), ('00100', 'Switzerland', 'Daniel Schneider', 'Medical Aid'), ('00101', 'France', 'Maxime Camille', 'Medical Aid'), ('00102', 'France', 'Laura Petit', 'Social Worker'), ('00103', 'Germany', 'Manfred Schmidt', 'Medical Aid'), ('00104', 'Italy', 'Martina Lombardi', 'Social Worker'), ('00105', 'Switzerland', 'Sarah Baumann', 'Medical Aid'), ('00106', 'Switzerland', 'Bruno Gerber', 'Security Aid'), ('00107', 'Switzerland', 'Laura Müller', 'Social Worker'), ('00108', 'Germany', 'Andreas Weber', 'Social 
Worker'), ('00109', 'Switzerland', 'Hans Fischer', 'Social Worker'), ('00110', 'Switzerland', 'Daniel Meyer', 'Social Worker'), ('00111', 'France', 'Julien Rousseau', 'Security Aid'), ('00112', 'Switzerland', 'Reto Schmid', 'Social Worker'), ('00113', 'Switzerland', 'Urli Schneider', 'Social Worker'), ('00114', 'France', 'Grégory Rousseau', 'Medical Aid'), ('00115', 'France', 'Marie Durand', 'Social Worker'), ('00116', 'France', 'Léa Durand', 'Social Worker'), ('00117', 'France', 'Camille Matin', 'Medical Aid'), ('00118', 'Germany', 'Wolfgang Schneider', 'Social Worker'), ('00119', 'France', 'Julien Matin', 'Social Worker'), ('00120', 'France', 'Marie Leroy', 'Social Worker'), ('00121', 'Switzerland', 'Mélissa Brunner', 'Security Aid'), ('00122', 'Germany', 'Ursula Schneider', 'Social Worker'), ('00123', 'France', 'Camille Mercier', 'Social Worker'), ('00124', 'France', 'Julien Camille', 'Social Worker'), ('00125', 'Switzerland', 'Laura Schmid', 'Medical Aid'), ('00126', 'France', 'Camille Durand', 'Social Worker'), ('00127', 'France', 'Marie Camille', 'Medical Aid'), ('00128', 'Germany', 'Monika Wagner', 'Social Worker'), ('00129', 'Italy', 'Giorgia Esposito', 'Security Aid'), ('00130', 'France', 'Clementine Mercier', 'Social Worker'), ('00131', 'France', 'Marie Matin', 'Social Worker'), ('00132', 'Switzerland', 'Noémie Brunner', 'Medical Aid'), ('00133', 'France', 'Nicolas Leroy', 'Security Aid'), ('00134', 'France', 'Camille Camille', 'Social Worker'), ('00135', 'Germany', 'Wolfgang Fischer', 'Medical Aid'), ('00136', 'Germany', 'Brigitte Müller', 'Medical Aid'), ('00137', 'Germany', 'Peter Schneider', 'Social Worker'), ('00138', 'Switzerland', 'Laura Schneider', 'Medical Aid'), ('00139', 'France', 'Chloé Rousseau', 'Social Worker'), ('00140', 'Italy', 'Alice De Luca', 'Medical Aid'), ('00141', 'France', 'Thomas Bernard', 'Social Worker'), ('00142', 'Italy', 'Francesco Grecco', 'Medical Aid'), ('00143', 'Switzerland', 'Peter Frei', 'Medical Aid'), ('00144', 
'France', 'Philippe Mercier', 'Security Aid'), ('00145', 'Germany', 'Monika Meyer', 'Social Worker'), ('00146', 'France', 'Alexandre Lambert', 'Medical Aid'), ('00147', 'Switzerland', 'Sarah Brunner', 'Security Aid'), ('00148', 'Germany', 'Wolfgang Schneider', 'Social Worker'), ('00149', 'Germany', 'Manfred Müller', 'Social Worker'), ('00150', 'France', 'Léa Dubois', 'Medical Aid'), ('00151', 'Switzerland', 'Reto Schmid', 'Medical Aid'), ('00152', 'France', 'Manon Lambert', 'Social Worker'), ('00153', 'France', 'Chloé Fournier', 'Social Worker'), ('00154', 'France', 'Grégory Bernard', 'Social Worker'), ('00155', 'Italy', 'Martina Bruno', 'Social Worker'), ('00156', 'France', 'Marie Nicolas', 'Social Worker'), ('00157', 'Italy', 'Giorgia Romano', 'Social Worker'), ('00158', 'France', 'Thomas Mercier', 'Security Aid'), ('00159', 'Germany', 'Manfred Richter', 'Social Worker'), ('00160', 'Germany', 'Wolfgang Schäfer', 'Social Worker'), ('00161', 'Germany', 'Peter Müller', 'Security Aid'), ('00162', 'Switzerland', 'Océane Meyer', 'Social Worker'), ('00163', 'Germany', 'Monika Schneider', 'Social Worker'), ('00164', 'France', 'Chloé Dubois', 'Social Worker'), ('00165', 'Germany', 'Peter Fischer', 'Social Worker'), ('00166', 'Germany', 'Christine Müller', 'Social Worker'), ('00167', 'Switzerland', 'Walter Steiner', 'Security Aid'), ('00168', 'Germany', 'Dirk Bauer', 'Medical Aid'), ('00169', 'Germany', 'Matthias Schmidt', 'Social Worker'), ('00170', 'Germany', 'Andreas Schneider', 'Medical Aid'), ('00171', 'Italy', 'Gabriele Grecco', 'Medical Aid'), ('00172', 'France', 'Léa Matin', 'Security Aid'), ('00173', 'France', 'Nicolas Dubois', 'Social Worker'), ('00174', 'Switzerland', 'Bruno Fischer', 'Social Worker'), ('00175', 'France', 'Camille Matin', 'Social Worker'), ('00176', 'Switzerland', 'Mélissa Zimmermann', 'Social Worker'), ('00177', 'Germany', 'Stefanie Becker', 'Medical Aid'), ('00178', 'France', 'Maxime Leroy', 'Social Worker'), ('00179', 'Germany', 'Michael 
Fischer', 'Security Aid'), ('00180', 'Germany', 'Stefanie Schmidt', 'Medical Aid'), ('00181', 'Germany', 'Peter Schneider', 'Social Worker'), ('00182', 'Switzerland', 'Laura Huber', 'Social Worker'), ('00183', 'France', 'Marie Fournier', 'Medical Aid'), ('00184', 'Italy', 'Leonardo Moretto', 'Social Worker'), ('00185', 'Germany', 'Peter Meyer', 'Social Worker'), ('00186', 'France', 'Alexandre Durand', 'Social Worker'), ('00187', 'Switzerland', 'Walter Müller', 'Social Worker'), ('00188', 'France', 'Chloé Leroy', 'Medical Aid'), ('00189', 'Switzerland', 'Walter Weber', 'Social Worker'), ('00190', 'Switzerland', 'Sarah Steiner', 'Social Worker'), ('00191', 'Germany', 'Wolfgang Fischer', 'Social Worker'), ('00192', 'Germany', 'Matthias Becker', 'Security Aid'), ('00193', 'Germany', 'Ursula Schäfer', 'Social Worker'), ('00194', 'Switzerland', 'Océane Keller', 'Security Aid'), ('00195', 'Germany', 'Brigitte Richter', 'Medical Aid'), ('00196', 'Germany', 'Ursula Müller', 'Medical Aid'), ('00197', 'Italy', 'Tommaso Rizzo', 'Social Worker'), ('00198', 'Switzerland', 'Marcel Fischer', 'Social Worker'), ('00199', 'France', 'Léa Petit', 'Medical Aid'), ('00200', 'France', 'Nicolas Camille', 'Security Aid')]
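
A remark on the output above: some names, e.g. 'Nicolas Leroy', occur more than once for the same country, although repeats was set to False. The reason is that synthesize[country]() creates a fresh generator with an empty memory in every pass of the loop. If unique names per country are wanted, the generators have to be created once and reused. The following sketch shows the idea with a simplified, hypothetical generator function unique_names and tiny name lists:

```python
import random

def unique_names(firstnames, surnames):
    """Hypothetical stand-in for synthesize: yields each name only once."""
    seen = set()
    while True:
        name = random.choice(firstnames) + " " + random.choice(surnames)
        if name not in seen:
            seen.add(name)
            yield name

# create ONE generator per country, so each keeps its own memory
name_sources = {
    "France": unique_names(["Marie", "Thomas", "Léa"],
                           ["Dubois", "Petit", "Leroy"]),
    "Germany": unique_names(["Ursula", "Peter", "Monika"],
                            ["Müller", "Schmidt", "Weber"]),
}

helpers = []
# 8 draws stay below the 9 possible names per country
for identifier in range(1, 9):
    country = random.choice(list(name_sources))
    uid = "{:05d}".format(identifier)
    helpers.append((uid, country, next(name_sources[country])))

print(helpers)
```

Applied to the program above, this would mean calling synthesize[country]() once per country before the loop and only calling next() inside it.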
with open("disaster_mission.txt", "w") as fh:
    fh.write("Reference number,Country,Name,Function\n")
    for el in helpers:
        fh.write(",".join(el) + "\n")
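
Joining the fields with commas works as long as no field contains a comma itself. As a more robust alternative, the file could be written with the csv module of the standard library. This is a sketch with two of the entries from above and a different file name, chosen so that it does not overwrite the file written before:

```python
import csv

helpers = [("00001", "Germany", "Brigitte Wagner", "Social Worker"),
           ("00140", "Italy", "Alice De Luca", "Medical Aid")]

# newline="" is recommended by the csv documentation;
# utf-8 makes sure names like "Léa" or "Müller" are stored correctly
with open("disaster_mission.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["Reference number", "Country", "Name", "Function"])
    writer.writerows(helpers)

# read the file back to check the result
with open("disaster_mission.csv", newline="", encoding="utf-8") as fh:
    rows = list(csv.reader(fh))
print(rows)
```

The csv writer takes care of quoting, so a field like "Mayer, Michael" would not break the column structure.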