why MSE on test set is very low and doesn't seem to evolve (not increasing after increasing epochs)


February 2019


67 time


I am working on a problem of predicting stock values using LSTMs.

My work is based on the following project . I use a data set (time series of stock prices) of total length 12075 that I split into train and test set (almost 10%). It is the same used in the link project.

train_data.shape (11000,)

test_data.shape (1075,)enter image description here

In our model, we start by training it on a many-to-many lstm model, where we provide N sequence of input (stock prices) and N sequence of labels (which are sampled by sequencing the train_data into N segments as inputs and labels are sampled as the following value sequence of the inputs).

Then we start to predict each value separately and providing it as input the next time till we reach num_predictions predictions.

Loss is simply the MSE between the predicted values and the actual values.

The predictions at the end seem not bad. However, I just don't understand why the training error decreases dramatically and the test error is always very very low (though it keeps decreasing by very little). I know that normally the test error should also start to increase after some number of epochs because of overfitting. I have tested with a simpler code and with a different dataset and I have encountered relatively similar MSE graphs.

Here is my mane loop:

for ep in range(epochs):

# ========================= Training =====================================
for step in range(num_batches):

    u_data, u_labels = data_gen.unroll_batches()

    feed_dict = {}
    for ui,(dat,lbl) in enumerate(zip(u_data,u_labels)):
        feed_dict[train_inputs[ui]] = dat.reshape(-1,1)
        feed_dict[train_outputs[ui]] = lbl.reshape(-1,1)

    feed_dict.update({tf_learning_rate: 0.0001, tf_min_learning_rate:0.000001})

    _, l = session.run([optimizer, loss], feed_dict=feed_dict)

    average_loss += l

# ============================ Validation ==============================
if (ep+1) % valid_summary == 0:

  average_loss = average_loss/(valid_summary*num_batches)

  # The average loss
  if (ep+1)%valid_summary==0:
    print('Average loss at step %d: %f' % (ep+1, average_loss))


  average_loss = 0 # reset loss

  predictions_seq = []

  mse_test_loss_seq = []

  # ===================== Updating State and Making Predicitons ========================
  for w_i in test_points_seq:
    mse_test_loss = 0.0
    our_predictions = []

    if (ep+1)-valid_summary==0:
      # Only calculate x_axis values in the first validation epoch

    # Feed in the recent past behavior of stock prices
    # to make predictions from that point onwards
    for tr_i in range(w_i-num_unrollings+1,w_i-1):
      current_price = all_mid_data[tr_i]
      feed_dict[sample_inputs] = np.array(current_price).reshape(1,1)
      _ = session.run(sample_prediction,feed_dict=feed_dict)

    feed_dict = {}

    current_price = all_mid_data[w_i-1]

    feed_dict[sample_inputs] = np.array(current_price).reshape(1,1)

    # Make predictions for this many steps
    # Each prediction uses previous prediciton as it's current input
    for pred_i in range(n_predict_once):

      pred = session.run(sample_prediction,feed_dict=feed_dict)


      feed_dict[sample_inputs] = np.asarray(pred).reshape(-1,1)

      if (ep+1)-valid_summary==0:
        # Only calculate x_axis values in the first validation epoch

      mse_test_loss += 0.5*(pred-all_mid_data[w_i+pred_i])**2



    mse_test_loss /= n_predict_once

    if (ep+1)-valid_summary==0:

  current_test_mse = np.mean(mse_test_loss_seq)

  # Learning rate decay logic
  if len(test_mse_ot)>0 and current_test_mse > min(test_mse_ot):
      loss_nondecrease_count += 1
      loss_nondecrease_count = 0

  if loss_nondecrease_count > loss_nondecrease_threshold :
        loss_nondecrease_count = 0
        print('\tDecreasing learning rate by 0.5')

  #print('\tTest MSE: %.5f'%np.mean(mse_test_loss_seq))
  print('\tTest MSE: %.5f' % current_test_mse)
  print('\tFinished Predictions')

Could this be normal ? should I just increase the size of test set ? Is there any thing done wrong ? any ideas please on how to test/investigate that ?

1 answers


The reason behind above difference in MSE between training and test is that we are not computing the same thing. During training, the MSE is the average of the sum of errors over time steps for every sample in the training data and so it is big. During test, we are making N=50 predictions and computing the average error between predictions and actual values. This average is always very small, and it appears almost constant in the above graph.