A common and highly effective approach to deep learning on small image datasets is to use a pretrained network. A pretrained network is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. If this original dataset is large enough and general enough, then the spatial hierarchy of features learned by the pretrained network can effectively act as a generic model of the visual world.
For instance, you might train a network on ImageNet (where classes are mostly animals and everyday objects) and then repurpose this trained network for something as remote as identifying furniture items in images. Such portability of learned features across different problems is a key advantage of deep learning compared to many older, shallow-learning approaches, and it makes deep learning very effective for small-data problems.
ImageNet Dataset & VGG16
In this case, let’s consider a large convnet trained on the ImageNet dataset (1.4 million labeled images and 1,000 different classes). ImageNet contains many animal classes, including different species of cats and dogs, so you can expect a model pretrained on it to perform well on the dogs-versus-cats classification problem. You’ll use the VGG16 architecture, developed by Karen Simonyan and Andrew Zisserman in 2014; it’s a simple and widely used convnet architecture for ImageNet. Its architecture is similar to what you’re already familiar with and is easy to understand without introducing any new concepts.
Procedure
There are two ways to use a pretrained network:
Feature Extraction
Fine Tuning
Method #1 : Feature Extraction
Feature extraction consists of using the representations learned by a previously trained network to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch.
Concretely, you take the convolutional base of the previously trained network, run the new data through it, and train a new classifier on top of the output.
Why only reuse the convolutional base? Could you reuse the densely connected classifier as well?
In general, doing so should be avoided
The reason is that the representations learned by the convolutional base are likely to be more generic and therefore more reusable
The feature maps of a convnet are presence maps of generic concepts over a picture, which are likely to be useful regardless of the computer-vision problem at hand
Representations found in densely connected layers no longer contain any information about where objects are located in the input image: these layers get rid of the notion of space, whereas object location is still described by the convolutional feature maps
For problems where object location matters, densely connected features are largely useless
Note that the level of generality (and therefore reusability) of the representations extracted by specific convolution layers depends on the depth of the layer in the model. Layers that come earlier in the model extract local, highly generic feature maps (such as visual edges, colors, and textures), whereas layers that are higher up extract more-abstract concepts (such as “cat ear” or “dog eye”)
So if your new dataset differs a lot from the dataset on which the original model was trained, you may be better off using only the first few layers of the model to do feature extraction, rather than using the entire convolutional base
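For instance (a minimal sketch, assuming Keras and the standard layer names of its VGG16 application, not code from the original notebook), you could cut the convolutional base at the end of an earlier block and use only those more generic layers:

from keras.applications import VGG16
from keras import models

# Full convolutional base, pretrained on ImageNet
full_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

# Keep only the layers up to the end of block 3 ('block3_pool' is one of VGG16's
# standard layer names); these earlier layers encode more generic features
truncated_base = models.Model(inputs=full_base.input,
                              outputs=full_base.get_layer('block3_pool').output)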
Old Preprocessing
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

import zipfile
with zipfile.ZipFile("../input/dogs-vs-cats-redux-kernels-edition/" + "train" + ".zip", "r") as z:
    z.extractall(".")
import zipfile
with zipfile.ZipFile("../input/dogs-vs-cats-redux-kernels-edition/" + "test" + ".zip", "r") as z:
    z.extractall(".")
import os, cv2, re, random
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array, load_img
from keras import layers, models, optimizers
from keras import backend as K
from sklearn.model_selection import train_test_split
img_width = 150
img_height = 150
TRAIN_DIR = '/kaggle/working/train/'
TEST_DIR = '/kaggle/working/test/'
train_images_dogs_cats = [TRAIN_DIR + i for i in os.listdir(TRAIN_DIR)]  # use this for full dataset
test_images_dogs_cats = [TEST_DIR + i for i in os.listdir(TEST_DIR)]
def prepare_data(list_of_images):
    """
    Returns two arrays:
        x is an array of resized images
        y is an array of labels
    """
    x = []  # images as arrays
    y = []  # labels
    for image in list_of_images:
        x.append(cv2.resize(cv2.imread(image), (img_width, img_height), interpolation=cv2.INTER_CUBIC))
    for i in list_of_images:
        if 'dog' in i:
            y.append(1)
        elif 'cat' in i:
            y.append(0)
        # else:
        #     print('neither cat nor dog name present in images')
    return x, y
X, Y = prepare_data(train_images_dogs_cats)
print(K.image_data_format())
channels_last
# Split the data in two sets: roughly two-thirds for training and one-third for validation
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.333334, random_state=1)
The VGG16 model, among others, comes prepackaged with Keras; you can import it from the keras.applications module. Other image-classification models available as part of keras.applications (all pretrained on the ImageNet dataset) include Xception, Inception V3, ResNet50, VGG19, and MobileNet.
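The download log below is produced when the convolutional base is instantiated. The exact call isn’t shown above; here is a minimal sketch of it, with the arguments inferred from the discussion that follows and the 150 × 150 input size used earlier:

from keras.applications import VGG16

conv_base = VGG16(weights='imagenet',         # initialize from the ImageNet checkpoint
                  include_top=False,          # leave out the densely connected classifier
                  input_shape=(150, 150, 3))  # matches img_width x img_height above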
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 1s 0us/step
You pass three arguments to the constructor :
weights specifies the weight checkpoint from which to initialize the model
include_top refers to including (or not) the densely connected classifier on top of the network
input_shape is the shape of the image tensors that you’ll feed to the network (This argument is purely optional: if you don’t pass it, the network will be able to process inputs of any size)
The final feature map has shape (4, 4, 512). That’s the feature on top of which you’ll stick a densely connected classifier.
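As a quick check (assuming conv_base was instantiated as sketched above), you can read this shape off the model object itself:

print(conv_base.output_shape)  # expected: (None, 4, 4, 512)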
What to do now ?
There are 2 possible options :
Running the convolutional base over your dataset, recording its output to a Numpy array on disk, and then using this data as input to a standalone, densely connected classifier
Merits : fast
Demerits : can’t use data augmentation
Extending the model you have (conv_base) by adding Dense layers on top, and running the whole thing end to end on the input data.
Merits : allows data augmentation
Demerits : slow and expensive
Method 1 Part 1 : Fast Feature Extraction without Data Augmentation
Features are extracted by the predict method of the conv_base model
datagen = ImageDataGenerator(rescale=1./255)
def extract_features(X_INPUT, Y_OUTPUT, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow(np.array(X_INPUT), Y_OUTPUT, batch_size=batch_size)
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:
            # Note that since generators yield data indefinitely in a loop,
            # we must `break` after every image has been seen once.
            break
    return features, labels
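The code that extracts the features and trains the classifier on top of them isn’t shown here; the following sketch fills in that step in the usual way. The batch size (which extract_features relies on but which is never defined above), layer width, dropout rate, learning rate, and epoch count are assumptions, not values from the original notebook:

batch_size = 20  # assumed value; extract_features expects this to be defined

train_features, train_labels = extract_features(X_train, Y_train, len(X_train))
val_features, val_labels = extract_features(X_val, Y_val, len(X_val))

# Flatten the (4, 4, 512) feature maps so they can be fed to Dense layers
train_features = np.reshape(train_features, (len(X_train), 4 * 4 * 512))
val_features = np.reshape(val_features, (len(X_val), 4 * 4 * 512))

# Small densely connected classifier trained from scratch on the extracted features
model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=4 * 4 * 512))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy',
              metrics=['acc'])

history = model.fit(train_features, train_labels,
                    epochs=30,
                    batch_size=20,
                    validation_data=(val_features, val_labels))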
Validation accuracy has increased to 90%, much better than with the previous techniques. But the plots indicate that the model is overfitting despite using dropout with a fairly large rate. The reason is that this technique doesn’t use data augmentation, which is essential for preventing overfitting with small image datasets.
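For reference, using data augmentation here would mean taking the second option above (extending conv_base with Dense layers and training end to end) and feeding the images through an augmented generator rather than the plain rescaling one, along these lines (the specific parameter values are illustrative, not taken from the notebook):

augmented_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)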
Prediction
X_test, Y_test = prepare_data(test_images_dogs_cats)  # Y_test in this case will be []
nb_test_samples = len(test_images_dogs_cats)
print(nb_test_samples)
12500
def extract_test_features(X_INPUT, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    # shuffle=False keeps the extracted features in the same order as the test filenames
    generator = datagen.flow(np.array(X_INPUT), batch_size=batch_size, shuffle=False)
    i = 0
    for inputs_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        i += 1
        if i * batch_size >= sample_count:
            # Note that since generators yield data indefinitely in a loop,
            # we must `break` after every image has been seen once.
            break
    return features
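The commented-out snippet below refers to a prediction_probabilities array whose computation isn’t shown above; presumably it comes from running the test features through the trained classifier, roughly like this (variable names follow the sketches above):

# Assumed prediction step: extract and flatten the test features, then predict probabilities
test_features = extract_test_features(X_test, nb_test_samples)
test_features = np.reshape(test_features, (nb_test_samples, 4 * 4 * 512))
prediction_probabilities = model.predict(test_features)
print(prediction_probabilities.shape)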
# prediction_probabilities_binary = []
# for p in prediction_probabilities:
#     if p >= 0.5:
#         prediction_probabilities_binary.append(1)
#     else:
#         prediction_probabilities_binary.append(0)
# print(len(prediction_probabilities_binary))