For the Analysis, it is crucial to split the data into a training set used to build the algorithm, and a test set that is kept apart and only used to assess performance – it would be serious cheating to look at the test set before we are ready with our algorithm.