D:\Anaconda3\envs\tf_env\lib\site-packages\tensorflow_core\python\keras\datasets\imdb.py:129: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
x_train, y_train = np.array(xs[:idx]), np.array(labels[:idx])
D:\Anaconda3\envs\tf_env\lib\site-packages\tensorflow_core\python\keras\datasets\imdb.py:130: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
x_test, y_test = np.array(xs[idx:]), np.array(labels[idx:])
def convert_to_english(sequence):
    # word_index is a dictionary mapping words to an integer index
    word_index = imdb.get_word_index()
    # We reverse it, mapping integer indices to words
    reverse_word_index = dict(
        [(value, key) for (key, value) in word_index.items()]
    )
    # We decode the review; note that our indices were offset by 3
    # because 0, 1 and 2 are reserved indices for "padding",
    # "start of sequence", and "unknown".
    decoded_review = " ".join(
        [reverse_word_index.get(i - 3, '?') for i in sequence]  # if not found, replace with '?'
    )
    return decoded_review
print(convert_to_english(train_data[0]))
? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all
Preparing the data
We cannot feed lists of integers into a neural network. We have to turn our lists into tensors. There are two ways we could do that:
We could pad our lists so that they all have the same length, turn them into an integer tensor of shape (samples, word_indices), and then use, as the first layer of our network, a layer capable of handling such integer tensors (the Embedding layer, which we will cover in detail later in the book); a quick sketch of this route follows below.
We could one-hot-encode our lists to turn them into vectors of 0s and 1s. Concretely, this would mean for instance turning the sequence [3, 5] into a 10,000-dimensional vector that would be all-zeros except for indices 3 and 5, which would be ones. Then we could use as first layer in our network a Dense layer, capable of handling floating point vector data.
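For reference, the first option would look roughly like the sketch below. This section never builds such a model, so treat the maxlen of 256 and the embedding size of 8 as illustrative placeholders rather than recommended values.

# Sketch of the padding + Embedding route (not the one we use below).
# maxlen=256 and the embedding size of 8 are arbitrary illustrative values.
from tensorflow.keras.preprocessing.sequence import pad_sequences

padded_train = pad_sequences(train_data, maxlen=256)   # integer tensor of shape (samples, 256)

embedding_model = models.Sequential()
embedding_model.add(layers.Embedding(LIMIT_WORD, 8, input_length=256))
embedding_model.add(layers.Flatten())
embedding_model.add(layers.Dense(1, activation='sigmoid'))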
We will go with the latter solution. Let’s vectorize our data, which we will do manually for maximum clarity:
# one-hot encoding
def vectorize_sequences(sequences, dimension=LIMIT_WORD):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1  # set specific indices of results[i] to 1s
    return results
# vectorize train and test data
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(LIMIT_WORD,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Compile
Lastly, we need to pick a loss function and an optimizer. Since we are facing a binary classification problem and the output of our network is a probability (we end our network with a single-unit layer with a sigmoid activation), it is best to use the binary_crossentropy loss. It isn’t the only viable choice: you could use, for instance, mean_squared_error. But crossentropy is usually the best choice when you are dealing with models that output probabilities. Crossentropy is a quantity from the field of information theory that measures the “distance” between probability distributions, or in our case, between the ground-truth distribution and our predictions.
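The compile step matching this discussion could look like the sketch below. The section does not show its exact call, so the rmsprop optimizer and the accuracy metric are assumptions; only the binary_crossentropy loss follows directly from the text.

# Compile sketch: the loss follows the text above; the optimizer and metric
# are standard choices assumed here, not taken from this section.
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])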
Note that the call to model.fit() returns a history object. This object has a member history, which is a dictionary containing data about everything that happened during training. Let’s take a look at it:
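The plotting code below reads loss, val_loss, and epochs out of that dictionary. Here is a minimal sketch of how they could be produced, assuming the raw labels are still available as train_labels from the load_data call and that 10,000 reviews are held out for validation (the split size, epoch count, and batch size are illustrative, not taken from this section):

import numpy as np

# Labels as float vectors (assumes train_labels comes from imdb.load_data).
y_train = np.asarray(train_labels).astype('float32')

# Hold out 10,000 reviews for validation (split size is illustrative).
x_val, partial_x_train = x_train[:10000], x_train[10000:]
y_val, partial_y_train = y_train[:10000], y_train[10000:]

history = model.fit(partial_x_train, partial_y_train,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))

history_dict = history.history
print(history_dict.keys())   # loss, val_loss, and the accuracy metrics

# Values used by the plotting code below.
loss = history_dict['loss']
val_loss = history_dict['val_loss']
epochs = range(1, len(loss) + 1)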
# "bo" is for "blue dot" plt.plot(epochs, loss, 'bo', label='Training loss') # b is for "solid blue line" plt.plot(epochs, val_loss, 'b', label='Validation loss') plt.title('Training and validation loss') plt.xlabel('Epochs') plt.ylabel('Loss') plt.legend()
As you can see, the training loss decreases with every epoch and the training accuracy increases with every epoch. That’s what you would expect when running gradient-descent optimization: the quantity you are trying to minimize should get lower with every iteration.
But that isn’t the case for the validation loss and accuracy: they seem to peak at the fourth epoch. This is an example of what we were warning against earlier: a model that performs better on the training data isn’t necessarily a model that will do better on data it has never seen before.
In precise terms, what you are seeing is “overfitting”: after the second epoch, we are over-optimizing on the training data, and we end up learning representations that are specific to the training data and do not generalize to data outside of the training set.
In this case, to prevent overfitting, we could simply stop training after three epochs.
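A sketch of that fix is shown below: rebuild the same architecture and train it from scratch for a fixed, small number of epochs. The epoch count follows the sentence above, and y_test is assumed to be the test labels converted to a float vector the same way as y_train; neither detail is shown in this section.

# Retrain a fresh copy of the model for only a few epochs to avoid overfitting.
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(LIMIT_WORD,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=3, batch_size=512)

# y_test is assumed to be the test labels prepared like y_train above.
results = model.evaluate(x_test, y_test)
print(results)   # [test loss, test accuracy]

Alternatively, a keras.callbacks.EarlyStopping callback monitoring val_loss stops training around the same point without hand-picking the epoch count.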