The solution came from an unexpected direction. In 1994, a group at the University of Pennsylvania led by Mitchell Marcus published the Penn Treebank, a collection of one million words of text from the Wall Street Journal together with the structure of the sentences in the form of syntactic trees [61]. From the treebank, it is possible to read off all of the context-free rules necessary to assign the correct tree to every sentence therein. This pretty much solved the grammar leakage problem.
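
To make "reading off" the rules concrete, here is a minimal sketch in Python. It assumes trees written in a simple bracketed notation and uses two invented toy sentences rather than actual Wall Street Journal data. The helper names (`parse_tree`, `rules`, `read_off_grammar`) are hypothetical, and the sketch ignores the traces, function tags, and empty outer brackets found in the real treebank files. Each node of a tree contributes one context-free rule: the node's label rewrites to the sequence of its children's labels (or to the word itself at a leaf).

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples.

    A leaf word is kept as a plain string. Assumes well-formed input
    where every bracket opens with a label, e.g. "(NP (DT the) (NN dog))".
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def rules(tree):
    """Yield one (lhs, rhs) rule per node, top-down."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def read_off_grammar(bracketed_trees):
    """Count every rule used across a collection of bracketed trees."""
    counts = Counter()
    for text in bracketed_trees:
        counts.update(rules(parse_tree(text)))
    return counts


# Toy trees, invented for illustration (not taken from the treebank).
toy_treebank = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT the) (NN cat)) (VP (VBD saw) (NP (DT the) (NN dog))))",
]

grammar = read_off_grammar(toy_treebank)
for (lhs, rhs), count in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{count}]")
```

On these two toy trees the rule `S -> NP VP` is counted twice and `NP -> DT NN` three times. By construction, every rule needed to rebuild the trees in the collection appears in the resulting grammar, which is the property the paragraph above relies on.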

