Early StoppingBy Eric Antoine Scuccimarra
I recently began using the early stopping feature of Light GBM, which allows you to stop training when the validation score doesn't improve for a certain number of rounds. This is especially useful if you are bagging models, as you don't need to watch each one and figure out when training should stop. The way it works is you specify a number of rounds, and if the validation score doesn't improve during that number of rounds the training is stopped and the round with the best validation score is used.
When working with this I noticed that often the best validation round is a very early round, which has a very good validation score but an incredibly low training score. As an example here is the output from a model I am currently training. Normally the training F1 gets up to the high 0.90s:
Early stopping, best iteration is:  train's macroF1: 0.525992 valid's macroF1: 0.390373
Out of at least 400 rounds of training, the best performance on the validation set was on the 7th, at which time it was performing incredibly poorly on the training data. This indicates overfitting to the validation set, which is just as bad as overfitting to the training set in that the model is not likely to generalize well.
So what to do about this issue? The obvious solution would be to provide a minimum number of rounds and begin to monitor the validation score for early stopping once that number of rounds has passed, but I don't see any way to do this through the LGB API.
I am running this code using sklearn's joblib to do parallel processing, so I have create a list of the estimators to fit and then pass that list to the parallel processing which calls a function which fits the estimator to the data and returns it. The early stopping is taken care of by LGB, so what I did is after the estimator is fit I manually get the validation results and the train performance for the best validation round. If the train performance is above a specified threshold I return the estimator as normal. If, however, the train performance is below that threshold I recursively call the function again.
The downside to this is that it is possible to get into an infinite loop, but if the thresholds are properly tuned this should be easily avoidable.