Because different features are on different scales, it is good practice (and makes training easier for the neural network) to normalize them. For each feature in the input data (a column of the input data matrix), we subtract the mean of the feature and divide by its standard deviation, so that the feature is centered around 0 and has unit standard deviation.
```python
mean = train_data.mean(axis=0)
train_data -= mean
std = train_data.std(axis=0)
train_data /= std

test_data -= mean
test_data /= std
```
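As a quick sanity check (a minimal sketch, assuming NumPy is available), each column of the normalized training data should now have approximately zero mean and unit standard deviation:

```python
import numpy as np

# Each feature column should now be ~0 mean and ~1 std
print(np.allclose(train_data.mean(axis=0), 0, atol=1e-6))  # True
print(np.allclose(train_data.std(axis=0), 1, atol=1e-6))   # True
```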
A very important note
Note that the quantities used to normalize the test data were computed on the training data. We should never use any quantity computed on the test data in our workflow, even for something as simple as data normalization.
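The same rule applies if we use a preprocessing library instead of normalizing by hand. As a minimal sketch with scikit-learn (an assumption; it is not used elsewhere in this walkthrough), the scaler is fit on the training data only and then applied to both sets:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)  # statistics come from train only
test_data = scaler.transform(test_data)        # reuse the training statistics
```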
Building the neural network
Architecture
As the dataset is really small, it is easy to overfit, so a small network helps avoid that.

In the output layer, there is no activation function, as that would constrain the output. This is a typical setup for scalar regression (i.e. regression where we are trying to predict a single continuous value). For instance, if we applied a sigmoid activation to the last layer, the network could only learn to predict values between 0 and 1. Because the last layer is purely linear, the network is free to learn to predict values in any range.
Compile
Loss function: MSE (mean squared error)
Optimizer: RMSprop
Metrics: MAE (mean absolute error)
MSE is a widely used loss function for regression problems.

Naturally, the concept of accuracy doesn't apply to regression. A common regression metric is mean absolute error (MAE): the average absolute difference between the predictions and the targets. For instance, an MAE of 0.5 on this problem would mean that our predictions are off by $500 on average.
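To make the two metrics concrete, here is a minimal sketch computing MSE and MAE by hand on a few made-up predictions and targets (the numbers are purely illustrative):

```python
import numpy as np

preds = np.array([12.0, 25.0, 31.0])    # hypothetical predicted prices
targets = np.array([10.0, 26.0, 30.0])  # hypothetical true prices

mse = np.mean((preds - targets) ** 2)   # penalizes large errors quadratically
mae = np.mean(np.abs(preds - targets))  # average absolute error
print(mse, mae)  # 2.0 1.3333...
```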
```python
from keras import models
from keras import layers

def get_model():
    # Because we will need to instantiate the same model
    # multiple times, we use a function to construct it.
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu',
                           input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model
```
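A quick way to confirm the architecture is to instantiate the model once and print its summary (a minimal sketch):

```python
model = get_model()
model.summary()  # prints the layers and parameter counts
```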
K-fold validation
As the dataset is small, a small validation set would mean our validation scores could vary a lot depending on which data points happen to land in it. K-fold validation lets us tune the model against a more reliable estimate, and is the best practice in this situation.
```python
import numpy as np

k = 4
num_val_samples = len(train_data) // k
num_epochs = 100
all_scores = []
for i in range(k):
    print('processing fold #', i)
    # Prepare the validation data: data from partition #k
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]

    # Prepare the training data: data from all other partitions
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)

    # Build the Keras model (already compiled)
    model = get_model()
    # Train the model (in silent mode, verbose=0)
    model.fit(partial_train_data, partial_train_targets,
              epochs=num_epochs, batch_size=1, verbose=0)
    # Evaluate the model on the validation data
    val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
    all_scores.append(val_mae)
```
As you can see, the different runs do indeed show rather different validation scores, from 2.1 to 2.9. Their average (2.4) is a much more reliable metric than any single one of these scores; that is the entire point of K-fold cross-validation. In this case, we are off by $2,400 on average, which is still significant considering that the prices range from $10,000 to $50,000.
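The per-fold scores and their mean can be inspected directly:

```python
print(all_scores)           # validation MAE for each of the k folds
print(np.mean(all_scores))  # the average reported above
```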
Let's try training the network for a bit longer: 500 epochs. To keep a record of how well the model does at each epoch, we will modify the training loop to save the per-epoch validation scores:
```python
num_epochs = 500
all_mae_histories = []
for i in range(k):
    print('processing fold #', i)
    # Prepare the validation data: data from partition #k
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]

    # Prepare the training data: data from all other partitions
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)

    # Build the Keras model (already compiled)
    model = get_model()
    # Train the model (in silent mode, verbose=0) and keep the
    # per-epoch validation MAE from the History object
    history = model.fit(partial_train_data, partial_train_targets,
                        validation_data=(val_data, val_targets),
                        epochs=num_epochs, batch_size=1, verbose=0)
    mae_history = history.history['val_mae']
    all_mae_histories.append(mae_history)
```
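We can then average the per-epoch MAE across the k folds and plot the result (a minimal sketch, assuming matplotlib is available):

```python
import matplotlib.pyplot as plt

# For each epoch, average the validation MAE across the k folds
average_mae_history = [
    np.mean([fold[i] for fold in all_mae_histories])
    for i in range(num_epochs)]

plt.plot(range(1, len(average_mae_history) + 1), average_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
```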
According to this plot, it seems that validation MAE stops improving significantly after ~40 epochs. Past that point, we start overfitting.
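If the raw curve is too noisy to read, one common trick is to smooth it with an exponential moving average before picking the best epoch. A minimal sketch (`smooth_curve` is a helper written here for illustration, and skipping the first 10 epochs is an assumption to avoid the large initial values squashing the plot):

```python
def smooth_curve(points, factor=0.9):
    # Exponential moving average: blend each raw value with
    # the previous smoothed value
    smoothed = []
    for point in points:
        if smoothed:
            smoothed.append(smoothed[-1] * factor + point * (1 - factor))
        else:
            smoothed.append(point)
    return smoothed

smooth_mae_history = smooth_curve(average_mae_history[10:])
plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Smoothed validation MAE')
plt.show()
```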
Once we are done tuning the other parameters of the model (besides the number of epochs, we could also adjust the size of the hidden layers), we can train a final "production" model on all of the training data with the best parameters, then look at its performance on the test data:
```python
# Get a fresh, compiled model.
model = get_model()
# Train it on the entirety of the training data.
model.fit(train_data, train_targets,
          epochs=40, batch_size=16, verbose=0)
test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)
```
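Once we are happy with the test score, the same model can produce price predictions for new samples (a minimal sketch; `predict` returns an array of shape `(num_samples, 1)`, in thousands of dollars):

```python
predictions = model.predict(test_data)
print(predictions[0])  # predicted price of the first test sample
```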