Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz
2113536/2110848 [==============================] - 2s 1us/step
D:\Anaconda3\envs\tf_env\lib\site-packages\tensorflow_core\python\keras\datasets\reuters.py:113: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
x_train, y_train = np.array(xs[:idx]), np.array(labels[:idx])
D:\Anaconda3\envs\tf_env\lib\site-packages\tensorflow_core\python\keras\datasets\reuters.py:114: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
x_test, y_test = np.array(xs[idx:]), np.array(labels[idx:])
print(train_data.shape)
print(test_data.shape)
(8982,)
(2246,)
def convert_to_english(sequence):
    # word_index is a dictionary mapping words to an integer index
    word_index = reuters.get_word_index()
    # We reverse it, mapping integer indices to words
    reverse_word_index = dict(
        [(value, key) for (key, value) in word_index.items()]
    )
    # We decode the review; note that our indices were offset by 3
    # because 0, 1 and 2 are reserved indices for "padding",
    # "start of sequence", and "unknown".
    decoded_review = " ".join(
        [reverse_word_index.get(i - 3, '?') for i in sequence]  # if not found, replace with '?'
    )
    return decoded_review
print(convert_to_english(train_data[0]))
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters_word_index.json
557056/550378 [==============================] - 1s 1us/step
? ? ? said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3
Preparing the data
We cannot feed lists of integers into a neural network. We have to turn our lists into tensors. There are two ways we could do that:
We could pad our lists so that they all have the same length, turn them into an integer tensor of shape (samples, word_indices), and then use as the first layer in our network a layer capable of handling such integer tensors (the Embedding layer, which we will cover in detail later in the book; a short sketch of this option follows the list below).
We could one-hot-encode our lists to turn them into vectors of 0s and 1s. Concretely, this would mean for instance turning the sequence [3, 5] into a 10,000-dimensional vector that would be all-zeros except for indices 3 and 5, which would be ones. Then we could use as first layer in our network a Dense layer, capable of handling floating point vector data.
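For reference, here is a minimal sketch of the first option, which we will not use here. Everything specific in it is an assumption for illustration: the maximum sequence length (200), the embedding size (32), and the use of Keras's pad_sequences helper. LIMIT_WORD is the vocabulary-size constant used elsewhere in this notebook.

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers, models

# Pad/truncate every review to a fixed length of 200 word indices (arbitrary choice).
padded_train = pad_sequences(train_data, maxlen=200)

embedding_model = models.Sequential([
    # Map each of the LIMIT_WORD possible word indices to a 32-dimensional vector.
    layers.Embedding(input_dim=LIMIT_WORD, output_dim=32, input_length=200),
    layers.Flatten(),
    layers.Dense(46, activation='softmax'),
])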
We will go with the latter solution. Let’s vectorize our data, which we will do manually for maximum clarity:
# one hot encoding for data
def vectorize_sequences(sequences, dimension=LIMIT_WORD):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1  # set specific indices of results[i] to 1s
    return results
# vectorize train and test data
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
print(x_train[0])
[0. 1. 1. ... 0. 0. 0.]
# one hot encoding for labels
def to_one_hot(labels, dimension=46):
    # Create an all-zero matrix of shape (len(labels), dimension)
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1  # set specific indices of results[i] to 1s
    return results
# can also use the built-in keras function
# from keras.utils.np_utils import to_categorical
#
# y_train = to_categorical(train_labels)
# y_test = to_categorical(test_labels)
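The call that actually produces y_train and y_test is not shown in this excerpt; presumably it applies the helper above, along these lines (train_labels and test_labels are assumed to be the integer labels returned by reuters.load_data):

# one-hot encode the integer labels (assumed to come from reuters.load_data)
y_train = to_one_hot(train_labels)
y_test = to_one_hot(test_labels)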
The number of hidden units in the intermediate layers should be large enough for the network to learn to separate 46 different classes; a layer that is too small can create an information bottleneck, permanently dropping relevant information.
As this is a multiclass classification problem, we will use the softmax activation function in the output layer. The network will then output a probability distribution over the 46 different output classes, and the 46 probabilities will sum to 1.
Compile
Lastly, we need to pick a loss function and an optimizer. The best loss function to use in this case is categorical_crossentropy. It measures the distance between two probability distributions: here, between the probability distribution output by the network and the true distribution of the labels. By minimizing the distance between these two distributions, you train the network to output something as close as possible to the true labels.
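Neither the definition behind the get_model() helper used further down nor the initial training run appears in this excerpt. The following is a minimal sketch under stated assumptions: two 64-unit hidden layers (any size comfortably above 46 would do), the rmsprop optimizer, a 1,000-sample validation split, and 20 training epochs.

from tensorflow import keras
from tensorflow.keras import layers

def get_model():
    # Two intermediate layers wide enough to avoid an information bottleneck,
    # followed by a 46-way softmax that outputs a probability distribution.
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=(LIMIT_WORD,)),
        layers.Dense(64, activation='relu'),
        layers.Dense(46, activation='softmax'),
    ])
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# hold out the first 1,000 samples for validation (assumed split)
x_val, partial_x_train = x_train[:1000], x_train[1000:]
y_val, partial_y_train = y_train[:1000], y_train[1000:]

model = get_model()
history = model.fit(partial_x_train, partial_y_train,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))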
Note that the call to model.fit() returns a history object. This object has a member history, which is a dictionary containing data about everything that happened during training. Let’s take a look at it:
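For example (the exact key names depend on the Keras version: 'accuracy'/'val_accuracy' in recent releases, 'acc'/'val_acc' in older ones):

print(history.history.keys())
# e.g. dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])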
# "bo" is for "blue dot" plt.plot(epochs, loss, 'bo', label='Training loss') # b is for "solid blue line" plt.plot(epochs, val_loss, 'b', label='Validation loss') plt.title('Training and validation loss') plt.xlabel('Epochs') plt.ylabel('Loss') plt.legend()
As you can see, the training loss decreases with every epoch and the training accuracy increases with every epoch. That is what you would expect when running gradient-descent optimization: the quantity you are trying to minimize should get lower with every iteration.
But that isn't the case for the validation loss and accuracy: they seem to peak at the 10th epoch. This is an example of what we were warning against earlier: a model that performs better on the training data isn't necessarily a model that will do better on data it has never seen before.
In precise terms, what you are seeing is overfitting: after the 10th epoch, we are over-optimizing on the training data, and we end up learning representations that are specific to the training data and do not generalize to data outside of the training set.
In this case, to prevent overfitting, we could simply stop training after 10 epochs.
We can see that our model starts overfitting after the 10th epoch.
# now training on the full train data and up to epoch 10
model = get_model()
model.fit(
    x_train, y_train,
    epochs=10,
    batch_size=512
)
results = model.evaluate(x_test, y_test)
print(results)
Another way to handle the labels would be to leave them as plain integer tensors rather than one-hot vectors. The only thing this would change is the choice of the loss function: our previous loss, categorical_crossentropy, expects the labels to follow a categorical (one-hot) encoding. With integer labels, we should use sparse_categorical_crossentropy instead.
This new loss function is still mathematically the same as categorical_crossentropy; it just has a different interface.
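As a quick illustration (train_labels is assumed to be the raw integer labels from reuters.load_data, and the architecture is unchanged), only the compile step differs:

# same architecture as before; only the loss changes
model = get_model()
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',  # works directly with integer labels
              metrics=['accuracy'])
model.fit(x_train, train_labels, epochs=10, batch_size=512)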