8 faces with fully connected networks

In this exercise we work with the 8 faces dataset. This dataset has 350 images of 8 celebrities.
To get an overview of the data, open the notebook 8 faces overview and look at the celebrities and the images.
The data is a random sample of 8 persons from the Oxford VGG Face dataset (over 2600 persons).
For more information, see http://www.robots.ox.ac.uk/~vgg/data/vgg_face/

a) Open the notebook 8 faces only fc, build the network shown below, and then train it.
How good is the model? Look at the training, validation, and test accuracy.

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_1 (Dense)                  (None, 8)             55304       dense_input_1[0][0]              
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 8)             0           dense_1[0][0]                    
====================================================================================================
Total params: 55,304
Trainable params: 55,304
Non-trainable params: 0
____________________________________________________________________________________________________
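
The parameter count in the summary above can be checked by hand. The sketch below assumes that each image is flattened to 48 x 48 x 3 = 6912 input values; this input size is inferred from the parameter count and from the CNN summaries later in this sheet, it is not stated explicitly. dense_params is a hypothetical helper written for this check, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


# Assumption: 48 x 48 RGB images, flattened before the dense layer.
n_inputs = 48 * 48 * 3

# The only layer maps the flattened image directly to the 8 classes.
fc_only_total = dense_params(n_inputs, 8)
print(fc_only_total)  # 55304, matching dense_1 in the summary
```

With no hidden layer, the model is a linear classifier on the raw pixels, so every parameter is a direct pixel-to-class weight or a class bias.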

b) Now let’s add some hidden layers to the network.
Restart the notebook and build a new network with hidden layers, as shown below.
How good is this model? Look at the training, validation, and test accuracy.

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_1 (Dense)                  (None, 400)           2765200     dense_input_1[0][0]              
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 400)           0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 200)           80200       activation_1[0][0]               
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 200)           0           dense_2[0][0]                    
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 8)             1608        activation_2[0][0]               
____________________________________________________________________________________________________
activation_3 (Activation)        (None, 8)             0           dense_3[0][0]                    
====================================================================================================
Total params: 2,847,008
Trainable params: 2,847,008
Non-trainable params: 0
____________________________________________________________________________________________________
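
The same arithmetic explains why the hidden layers increase the parameter count so sharply. This is a sketch under the assumption of a flattened 48 x 48 x 3 input (inferred from the parameter counts, not stated explicitly); dense_params is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units


# Assumption: 48 x 48 RGB images, flattened before the first dense layer.
n_inputs = 48 * 48 * 3

# Layer widths taken from the summary: two hidden layers and the 8-class output.
layer_sizes = [400, 200, 8]

per_layer = []
fan_in = n_inputs
for units in layer_sizes:
    per_layer.append(dense_params(fan_in, units))
    fan_in = units  # the output of this layer feeds the next one

print(per_layer)       # [2765200, 80200, 1608]
print(sum(per_layer))  # 2847008
```

Almost all parameters sit in the first layer, because every one of its 400 units is connected to all 6912 input values.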

8 faces with convolutional neural networks

Hint: training the networks takes some time because we compute only on the CPU
(up to 1 h for the last network).

a) Open the notebook 8 faces cnn, build the network shown below, and then train it.
Do you expect it to be better than the last one with only fully connected layers?
How good is the model? Look at the training, validation, and test accuracy.

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_1 (Convolution2D)  (None, 48, 48, 32)    896         convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 48, 48, 32)    0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 48, 48, 32)    9248        activation_1[0][0]               
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 48, 48, 32)    0           convolution2d_2[0][0]            
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 24, 24, 32)    0           activation_2[0][0]               
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 24, 24, 64)    18496       maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
activation_3 (Activation)        (None, 24, 24, 64)    0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 24, 24, 64)    36928       activation_3[0][0]               
____________________________________________________________________________________________________
activation_4 (Activation)        (None, 24, 24, 64)    0           convolution2d_4[0][0]            
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 12, 12, 64)    0           activation_4[0][0]               
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 9216)          0           maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 200)           1843400     flatten_1[0][0]                  
____________________________________________________________________________________________________
activation_5 (Activation)        (None, 200)           0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 8)             1608        activation_5[0][0]               
____________________________________________________________________________________________________
activation_6 (Activation)        (None, 8)             0           dense_2[0][0]                    
====================================================================================================
Total params: 1,910,576
Trainable params: 1,910,576
Non-trainable params: 0
____________________________________________________________________________________________________
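
The shapes and parameter counts in this summary can be reproduced with a few lines of arithmetic. The sketch assumes 3 x 3 kernels, padding that preserves the spatial size, 2 x 2 max pooling, and a 3-channel input; these settings are inferred from the numbers in the summary, not stated in the text. conv_params and dense_params are hypothetical helpers, not Keras functions.

```python
def conv_params(kernel, in_channels, filters):
    """Parameters of a 2D convolution: one kernel per (input channel, filter) pair plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units


# (input channels, filters) for the four convolutions; 3 input channels is an assumption (RGB).
conv_layers = [(3, 32), (32, 32), (32, 64), (64, 64)]
conv_counts = [conv_params(3, c_in, c_out) for c_in, c_out in conv_layers]

# Two 2 x 2 max-pooling steps halve the side length twice: 48 -> 24 -> 12.
side = 48 // 2 // 2
flat = side * side * 64  # size of the flattened feature map

dense_counts = [dense_params(flat, 200), dense_params(200, 8)]
total = sum(conv_counts) + sum(dense_counts)

print(conv_counts)   # [896, 9248, 18496, 36928]
print(flat)          # 9216
print(dense_counts)  # [1843400, 1608]
print(total)         # 1910576
```

The four convolutions together need fewer than 70,000 parameters because their weights are shared across all image positions; the first dense layer after flattening still accounts for most of the total.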

b) Now let’s add the tricks that we already used on the MNIST dataset.
Restart the notebook, build the same network as above, and add dropout layers, as shown below.
How good is the model now? Look at the training, validation, and test accuracy.

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_1 (Convolution2D)  (None, 48, 48, 32)    896         convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 48, 48, 32)    0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 48, 48, 32)    9248        activation_1[0][0]               
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 48, 48, 32)    0           convolution2d_2[0][0]            
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 24, 24, 32)    0           activation_2[0][0]               
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 24, 24, 32)    0           maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 24, 24, 64)    18496       dropout_1[0][0]                  
____________________________________________________________________________________________________
activation_3 (Activation)        (None, 24, 24, 64)    0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 24, 24, 64)    36928       activation_3[0][0]               
____________________________________________________________________________________________________
activation_4 (Activation)        (None, 24, 24, 64)    0           convolution2d_4[0][0]            
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 12, 12, 64)    0           activation_4[0][0]               
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 12, 12, 64)    0           maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 9216)          0           dropout_2[0][0]                  
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 200)           1843400     flatten_1[0][0]                  
____________________________________________________________________________________________________
activation_5 (Activation)        (None, 200)           0           dense_1[0][0]                    
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 200)           0           activation_5[0][0]               
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 8)             1608        dropout_3[0][0]                  
____________________________________________________________________________________________________
activation_6 (Activation)        (None, 8)             0           dense_2[0][0]                    
====================================================================================================
Total params: 1,910,576
Trainable params: 1,910,576
Non-trainable params: 0
____________________________________________________________________________________________________
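
Note that the dropout layers add no parameters: the total is the same as for the network without dropout. The following plain-Python sketch illustrates what a dropout layer does during training, using the common "inverted dropout" formulation. The rate of 0.5 and the function name dropout are assumptions for illustration; the rates used in the notebook are not given here.

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`
    and scale the surviving values by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10

# Assumption: a dropout rate of 0.5, chosen only for illustration.
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

The rescaling keeps the expected value of each activation unchanged, so the layer can simply be switched off at test time.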

c) Finally, add batch normalization to your network.
Restart the notebook, build the same network as above, and add batch normalization layers, as shown below.
How good is the model now? Look at the training, validation, and test accuracy.

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_1 (Convolution2D)  (None, 48, 48, 32)    896         convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 48, 48, 32)    128         convolution2d_1[0][0]            
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 48, 48, 32)    0           batchnormalization_1[0][0]       
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 48, 48, 32)    9248        activation_1[0][0]               
____________________________________________________________________________________________________
batchnormalization_2 (BatchNorma (None, 48, 48, 32)    128         convolution2d_2[0][0]            
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 48, 48, 32)    0           batchnormalization_2[0][0]       
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 24, 24, 32)    0           activation_2[0][0]               
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 24, 24, 32)    0           maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 24, 24, 64)    18496       dropout_1[0][0]                  
____________________________________________________________________________________________________
batchnormalization_3 (BatchNorma (None, 24, 24, 64)    256         convolution2d_3[0][0]            
____________________________________________________________________________________________________
activation_3 (Activation)        (None, 24, 24, 64)    0           batchnormalization_3[0][0]       
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 24, 24, 64)    36928       activation_3[0][0]               
____________________________________________________________________________________________________
batchnormalization_4 (BatchNorma (None, 24, 24, 64)    256         convolution2d_4[0][0]            
____________________________________________________________________________________________________
activation_4 (Activation)        (None, 24, 24, 64)    0           batchnormalization_4[0][0]       
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 12, 12, 64)    0           activation_4[0][0]               
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 12, 12, 64)    0           maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 9216)          0           dropout_2[0][0]                  
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 200)           1843400     flatten_1[0][0]                  
____________________________________________________________________________________________________
batchnormalization_5 (BatchNorma (None, 200)           800         dense_1[0][0]                    
____________________________________________________________________________________________________
activation_5 (Activation)        (None, 200)           0           batchnormalization_5[0][0]       
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 200)           0           activation_5[0][0]               
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 8)             1608        dropout_3[0][0]                  
____________________________________________________________________________________________________
activation_6 (Activation)        (None, 8)             0           dense_2[0][0]                    
====================================================================================================
Total params: 1,912,144
Trainable params: 1,911,360
Non-trainable params: 784
____________________________________________________________________________________________________
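
This is the first summary with non-trainable parameters. The counts are consistent with each batch normalization layer holding four values per feature: a scale and a shift that are learned, plus a moving mean and a moving variance that are updated during training but not by gradient descent. The sketch below checks this reading against the summary; batchnorm_params is a hypothetical helper, not a Keras function.

```python
def batchnorm_params(n_features):
    """Return (trainable, non_trainable) parameter counts for one batch normalization layer."""
    trainable = 2 * n_features      # scale (gamma) and shift (beta)
    non_trainable = 2 * n_features  # moving mean and moving variance
    return trainable, non_trainable


# Number of features normalized by each of the five layers, read from the summary.
bn_features = [32, 32, 64, 64, 200]

# Total of the same network without batch normalization, from the summary in b).
base_total = 1910576

trainable_bn = sum(batchnorm_params(n)[0] for n in bn_features)
non_trainable_bn = sum(batchnorm_params(n)[1] for n in bn_features)

print(non_trainable_bn)                            # 784
print(base_total + trainable_bn)                   # 1911360 trainable
print(base_total + trainable_bn + non_trainable_bn)  # 1912144 in total
```

In the summary, each batch normalization layer sits between a convolution or dense layer and its activation, so the normalization is applied before the nonlinearity.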