CNN : Watching the world through Neural Networks - Part 2

Using a pretrained convnet

A common and highly effective approach to deep learning on small image datasets is to use a pretrained network. A pretrained network is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. If this original dataset is large enough and general enough, then the spatial hierarchy of features learned by the pretrained network can effectively act as a generic model of the visual world.

For instance, you might train a network on ImageNet (where classes are mostly animals and everyday objects) and then repurpose this trained network for something as remote as identifying furniture items in images. Such portability of learned features across different problems is a key advantage of deep learning compared to many older, shallow-learning approaches, and it makes deep learning very effective for small-data problems.

ImageNet Dataset & VGG16

In this case, let’s consider a large convnet trained on the ImageNet dataset (1.4 million labeled images and 1,000 different classes). ImageNet contains many animal classes, including different species of cats and dogs, and you can thus expect a model trained on it to perform well on the dogs-versus-cats classification problem. You’ll use the VGG16 architecture, developed by Karen Simonyan and Andrew Zisserman in 2014; it’s a simple and widely used convnet architecture for ImageNet. Its architecture is similar to what you’re already familiar with and is easy to understand without introducing any new concepts.

Procedure

There are two ways to use a pretrained network:

  • Feature Extraction
  • Fine Tuning

Method #1: Feature Extraction

Feature extraction consists of using the representations learned by a previous network to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch.

Concretely, it means taking the convolutional base of a previously trained network, running the new data through it, and training a new classifier on top of the output.

[Figure: the original densely connected classifier is swapped out for a new one, while the same convolutional base is kept and reused]

Why only reuse the convolutional base? Could you reuse the densely connected classifier as well?

  • In general, doing so should be avoided
  • The reason is that the representations learned by the convolutional base are likely to be more generic and therefore more reusable
  • The feature maps of a convnet are presence maps of generic concepts over a picture, which are likely to be useful regardless of the computer-vision problem at hand
  • Representations found in densely connected layers no longer contain any information about where objects are located in the input image: these layers get rid of the notion of space, whereas object location is still described by the convolutional feature maps
  • For problems where object location matters, densely connected features are largely useless (the short shape check after this list illustrates the difference)
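
A quick way to see the difference is to compare output shapes. The following is only a minimal sketch (weights=None is used purely to avoid downloading the weights, since only the shapes matter here):

from keras.applications import VGG16

conv_only = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))
print(conv_only.output_shape)   # (None, 4, 4, 512) -> still a 4x4 spatial grid

full_net = VGG16(weights=None, include_top=True)   # default 224x224 input, Dense classifier on top
print(full_net.output_shape)    # (None, 1000)     -> the spatial layout is gone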

Note that the level of generality (and therefore reusability) of the representations extracted by specific convolution layers depends on the depth of the layer in the model. Layers that come earlier in the model extract local, highly generic feature maps (such as visual edges, colors, and textures), whereas layers that are higher up extract more-abstract concepts (such as “cat ear” or “dog eye”).

So if your new dataset differs a lot from the dataset on which the original model was trained, you may be better off using only the first few layers of the model to do feature extraction, rather than using the entire convolutional base.
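
One possible way to do that is to cut the model at an intermediate layer with the Keras functional API. This is only a sketch (it assumes the conv_base defined further below and the standard VGG16 layer names), not something used in the rest of this post:

from keras import models
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

# Stop at block3_pool instead of block5_pool: the earlier blocks encode more generic features
truncated_base = models.Model(inputs=conv_base.input,
                              outputs=conv_base.get_layer('block3_pool').output)
print(truncated_base.output_shape)   # (None, 18, 18, 256) for 150x150 inputs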

Preprocessing (same as before)

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
/kaggle/input/dogs-vs-cats-redux-kernels-edition/sample_submission.csv
/kaggle/input/dogs-vs-cats-redux-kernels-edition/train.zip
/kaggle/input/dogs-vs-cats-redux-kernels-edition/test.zip
/kaggle/input/vgg16-mod/cats_and_dogs_transfer_learning_1_vgg16.h5
import zipfile
with zipfile.ZipFile("../input/dogs-vs-cats-redux-kernels-edition/" + "train" + ".zip", "r") as z:
    z.extractall(".")
import zipfile
with zipfile.ZipFile("../input/dogs-vs-cats-redux-kernels-edition/" + "test" + ".zip", "r") as z:
    z.extractall(".")
import os, cv2, re, random
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array, load_img
from keras import layers, models, optimizers
from keras import backend as K
from sklearn.model_selection import train_test_split
img_width = 150
img_height = 150
TRAIN_DIR = '/kaggle/working/train/'
TEST_DIR = '/kaggle/working/test/'
train_images_dogs_cats = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)] # use this for full dataset
test_images_dogs_cats = [TEST_DIR+i for i in os.listdir(TEST_DIR)]
print(len(train_images_dogs_cats))
print(len(test_images_dogs_cats))
25000
12500
def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    return [atoi(c) for c in re.split(r'(\d+)', text)]
print(natural_keys("cat.0.txt"))
['cat.', 0, '.txt']
num_of_each_sample = 1500
train_images_dogs_cats.sort(key=natural_keys)
print(len(train_images_dogs_cats))
25000
train_images_dogs_cats = train_images_dogs_cats[0:num_of_each_sample] + train_images_dogs_cats[12500:12500+num_of_each_sample] 
test_images_dogs_cats.sort(key=natural_keys)
print(len(train_images_dogs_cats))
3000
def prepare_data(list_of_images):
    """
    Returns two arrays:
        x is an array of resized images
        y is an array of labels
    """
    x = []  # images as arrays
    y = []  # labels

    for image in list_of_images:
        x.append(cv2.resize(cv2.imread(image), (img_width, img_height), interpolation=cv2.INTER_CUBIC))

    for i in list_of_images:
        if 'dog' in i:
            y.append(1)
        elif 'cat' in i:
            y.append(0)
        # else: the filename contains neither 'cat' nor 'dog' (e.g. the test images), so no label is appended
    return x, y
X, Y = prepare_data(train_images_dogs_cats)
print(K.image_data_format())
channels_last
# Split the 3,000 selected images into two sets: roughly two thirds for training and one third for validation
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.333334, random_state=1)
nb_train_samples = len(X_train)
nb_validation_samples = len(X_val)
batch_size = 16

print(nb_train_samples)
print(nb_validation_samples)
1999
1001

VGG16

The VGG16 model, among others, comes prepackaged with Keras. You can import it from the keras.applications module. Here’s the list of image-classification models (all pretrained on the ImageNet dataset) that are available as part of keras.applications:

  • Xception
  • Inception V3
  • ResNet50
  • VGG16
  • VGG19
  • MobileNet
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(img_width, img_height, 3))
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 1s 0us/step

You pass three arguments to the constructor:

  • weights specifies the weight checkpoint from which to initialize the model
  • include_top refers to including (or not) the densely connected classifier on top of the network
  • input_shape is the shape of the image tensors that you’ll feed to the network (this argument is purely optional: if you don’t pass it, the network will be able to process inputs of any size, as the quick check after this list shows)
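
To verify that last point, you can instantiate the convolutional base without an input_shape. This is just an illustrative check, not part of the pipeline (weights=None is used here only to skip the weight download):

from keras.applications import VGG16

flexible_base = VGG16(weights=None, include_top=False)   # no input_shape given
print(flexible_base.input_shape)   # (None, None, None, 3) -> any spatial size is accepted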

Architecture

conv_base.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 150, 150, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

The final feature map has shape (4, 4, 512). That’s the feature map on top of which you’ll stick a densely connected classifier.

What to do now?

There are two possible options:

  • Running the convolutional base over your dataset, recording its output to a Numpy array on disk, and then using this data as input to a standalone, densely connected classifier

    • Merits: fast
    • Demerits: can’t use data augmentation
  • Extending the model you have (conv_base) by adding Dense layers on top, and running the whole thing end to end on the input data (a brief sketch follows this list)

    • Merits: allows data augmentation
    • Demerits: slow and expensive
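
For completeness, here is a minimal sketch of what the second option looks like; it is not trained in this post (only the first, faster option is used below). The key step is freezing conv_base so that only the newly added Dense layers get trained:

from keras import layers, models

model_option2 = models.Sequential()
model_option2.add(conv_base)                 # the pretrained VGG16 convolutional base
model_option2.add(layers.Flatten())
model_option2.add(layers.Dense(256, activation='relu'))
model_option2.add(layers.Dense(1, activation='sigmoid'))

conv_base.trainable = False                  # freeze the VGG16 weights before compiling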

Method 1 Part 1: Fast Feature Extraction without Data Augmentation

Features are extracted by calling the predict method of the conv_base model on batches of images.

datagen = ImageDataGenerator(rescale=1./255)
def extract_features(X_INPUT, Y_OUTPUT, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow(np.array(X_INPUT), Y_OUTPUT, batch_size=batch_size)
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:
            # Note that since generators yield data indefinitely in a loop,
            # we must `break` after every image has been seen once.
            break
    return features, labels
train_features, train_labels = extract_features(X_train,Y_train, nb_train_samples)
validation_features, validation_labels = extract_features(X_val, Y_val,nb_validation_samples)
train_features = np.reshape(train_features, (nb_train_samples, 4 * 4 * 512))
validation_features = np.reshape(validation_features, (nb_validation_samples, 4 * 4 * 512))

Dropout is used for regularization: during training it randomly zeroes a fraction of the layer’s activations, which helps reduce overfitting.

from keras import models
from keras import layers
from keras import optimizers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=4 * 4 * 512))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy',
              metrics=['acc'])

history = model.fit(train_features, train_labels,
                    epochs=30,
                    batch_size=batch_size,
                    validation_data=(validation_features, validation_labels))
Epoch 1/30
125/125 [==============================] - 2s 14ms/step - loss: 0.5894 - acc: 0.6768 - val_loss: 0.3824 - val_acc: 0.8511
Epoch 2/30
125/125 [==============================] - 2s 12ms/step - loss: 0.3750 - acc: 0.8369 - val_loss: 0.3344 - val_acc: 0.8472
Epoch 3/30
125/125 [==============================] - 2s 13ms/step - loss: 0.3237 - acc: 0.8609 - val_loss: 0.2774 - val_acc: 0.8881
Epoch 4/30
125/125 [==============================] - 2s 12ms/step - loss: 0.2762 - acc: 0.8834 - val_loss: 0.2614 - val_acc: 0.8911
Epoch 5/30
125/125 [==============================] - 2s 16ms/step - loss: 0.2553 - acc: 0.8909 - val_loss: 0.2571 - val_acc: 0.8991
Epoch 6/30
125/125 [==============================] - 2s 12ms/step - loss: 0.2368 - acc: 0.9015 - val_loss: 0.2406 - val_acc: 0.9021
Epoch 7/30
125/125 [==============================] - 2s 13ms/step - loss: 0.2095 - acc: 0.9185 - val_loss: 0.2362 - val_acc: 0.9001
Epoch 8/30
125/125 [==============================] - 2s 13ms/step - loss: 0.2087 - acc: 0.9105 - val_loss: 0.2440 - val_acc: 0.9001
Epoch 9/30
125/125 [==============================] - 2s 13ms/step - loss: 0.1881 - acc: 0.9255 - val_loss: 0.2300 - val_acc: 0.9011
Epoch 10/30
125/125 [==============================] - 2s 13ms/step - loss: 0.1772 - acc: 0.9325 - val_loss: 0.2271 - val_acc: 0.9041
Epoch 11/30
125/125 [==============================] - 2s 13ms/step - loss: 0.1727 - acc: 0.9350 - val_loss: 0.2278 - val_acc: 0.9011
Epoch 12/30
125/125 [==============================] - 2s 13ms/step - loss: 0.1598 - acc: 0.9365 - val_loss: 0.2252 - val_acc: 0.9041
Epoch 13/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1529 - acc: 0.9400 - val_loss: 0.2246 - val_acc: 0.9011
Epoch 14/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1449 - acc: 0.9470 - val_loss: 0.2283 - val_acc: 0.9031
Epoch 15/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1403 - acc: 0.9465 - val_loss: 0.2240 - val_acc: 0.9061
Epoch 16/30
125/125 [==============================] - 2s 12ms/step - loss: 0.1299 - acc: 0.9505 - val_loss: 0.2246 - val_acc: 0.9051
Epoch 17/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1247 - acc: 0.9550 - val_loss: 0.2233 - val_acc: 0.9011
Epoch 18/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1173 - acc: 0.9565 - val_loss: 0.2295 - val_acc: 0.9031
Epoch 19/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1187 - acc: 0.9580 - val_loss: 0.2240 - val_acc: 0.9021
Epoch 20/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1052 - acc: 0.9695 - val_loss: 0.2264 - val_acc: 0.9041
Epoch 21/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1025 - acc: 0.9680 - val_loss: 0.2255 - val_acc: 0.9081
Epoch 22/30
125/125 [==============================] - 1s 12ms/step - loss: 0.1045 - acc: 0.9665 - val_loss: 0.2268 - val_acc: 0.9051
Epoch 23/30
125/125 [==============================] - 1s 12ms/step - loss: 0.0942 - acc: 0.9670 - val_loss: 0.2281 - val_acc: 0.9051
Epoch 24/30
125/125 [==============================] - 2s 12ms/step - loss: 0.0901 - acc: 0.9725 - val_loss: 0.2306 - val_acc: 0.9051
Epoch 25/30
125/125 [==============================] - 2s 12ms/step - loss: 0.0859 - acc: 0.9750 - val_loss: 0.2349 - val_acc: 0.9031
Epoch 26/30
125/125 [==============================] - 2s 12ms/step - loss: 0.0851 - acc: 0.9700 - val_loss: 0.2308 - val_acc: 0.9041
Epoch 27/30
125/125 [==============================] - 1s 12ms/step - loss: 0.0824 - acc: 0.9725 - val_loss: 0.2319 - val_acc: 0.9031
Epoch 28/30
125/125 [==============================] - 1s 12ms/step - loss: 0.0796 - acc: 0.9760 - val_loss: 0.2337 - val_acc: 0.9031
Epoch 29/30
125/125 [==============================] - 1s 12ms/step - loss: 0.0761 - acc: 0.9785 - val_loss: 0.2350 - val_acc: 0.9041
Epoch 30/30
125/125 [==============================] - 1s 12ms/step - loss: 0.0720 - acc: 0.9815 - val_loss: 0.2449 - val_acc: 0.9081
model.save('dogsVScats_TL_VGG16_feature_extraction.h5')
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

[Plot: training and validation accuracy]

[Plot: training and validation loss]

Validation accuracy has increased to about 90%, much better than with the previous techniques. But the plots indicate that the model is overfitting despite using dropout with a fairly large rate. The reason is that this technique doesn’t use data augmentation, which is essential for preventing overfitting with small image datasets.
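
For reference, here is a sketch of the kind of augmentation the slower, end-to-end option would use; the parameter values are illustrative, not tuned:

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Validation and test data must only be rescaled, never augmented
val_datagen = ImageDataGenerator(rescale=1./255)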

Prediction

X_test, Y_test = prepare_data(test_images_dogs_cats) #Y_test in this case will be []
nb_test_samples = len(test_images_dogs_cats)
print(nb_test_samples)
12500
def extract_test_features(X_INPUT, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    generator = datagen.flow(np.array(X_INPUT), batch_size=batch_size)
    i = 0
    for inputs_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        i += 1
        if i * batch_size >= sample_count:
            # Note that since generators yield data indefinitely in a loop,
            # we must `break` after every image has been seen once.
            break
    return features
test_features = extract_test_features(X_test,nb_test_samples)
test_features = np.reshape(test_features, (nb_test_samples, 4 * 4 * 512))
prediction_probabilities = model.predict(test_features, verbose=1)
391/391 [==============================] - 1s 3ms/step
# prediction_probabilities_binary = []
# for p in prediction_probabilities:
#     if p >= 0.5:
#         prediction_probabilities_binary.append(1)
#     else:
#         prediction_probabilities_binary.append(0)
# print(len(prediction_probabilities_binary))
print(prediction_probabilities.shape)
(12500, 1)
counter = range(1, len(test_images_dogs_cats) + 1)
solution = pd.DataFrame({"id": counter, "label": list(prediction_probabilities)})
cols = ['label']

for col in cols:
    # each entry is a length-1 array such as [0.97]; strip the brackets and convert to a plain float
    solution[col] = solution[col].map(lambda x: str(x).lstrip('[').rstrip(']')).astype(float)

solution.to_csv("dogsVScats_TL_VGG16_feature_extraction.csv", index=False)