Two Talks: “Generalization through Memorization: Nearest Neighbor Language Models” and “Probing Neural NLP: Ideas and Problems”
Lecture | November 18 | 4:10-5 p.m. | 202 South Hall
Urvashi Khandelwal & John Hewitt
Generalization through Memorization: Nearest Neighbor Language Models
Neural language models (LMs) are typically trained on large amounts of data. However, generalizing to a larger corpus or to a different domain requires additional training, which is expensive. This raises an important question: how can LMs generalize better without additional training? In this talk, I will introduce kNN-LMs, which extend a pre-trained LM by linearly interpolating it with a k-nearest neighbors (kNN) model. Distances are computed in the pre-trained LM's embedding space, and neighbors can be drawn from any text collection, including the original LM training set. Experiments show that using the original LM training data alone, without further training, can substantially improve performance. kNN-LM also scales efficiently to larger training sets and enables effective domain adaptation simply by swapping the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge. Together, these results strongly suggest that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search can help LMs effectively use data without having to train on it.
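The interpolation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation: it assumes each datastore entry pairs a precomputed context embedding (the "key") with the token that followed that context, that neighbor weights are a softmax over negative squared L2 distances, and that the mixing weight `lam` is a fixed hyperparameter. The names `knn_distribution` and `knn_lm` and the toy vectors are hypothetical.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k):
    """Return p_kNN over next tokens for one query embedding.

    datastore: non-empty list of (key_vector, next_token) pairs. Each key is
    assumed to be the pre-trained LM's embedding of some context, and
    next_token is the token that followed that context in the text collection.
    """
    # Squared L2 distance between the query and every stored key.
    scored = []
    for key, token in datastore:
        dist = sum((q - x) ** 2 for q, x in zip(query, key))
        scored.append((dist, token))
    # Keep only the k closest neighbors.
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # Softmax over negative distances, summed per target token.
    # Shifting by the smallest distance avoids underflow in exp().
    d_min = neighbors[0][0]
    weights = defaultdict(float)
    for dist, token in neighbors:
        weights[token] += math.exp(-(dist - d_min))
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def knn_lm(p_lm, query, datastore, k=4, lam=0.25):
    """Linear interpolation: lam * p_kNN + (1 - lam) * p_LM."""
    p_knn = knn_distribution(query, datastore, k)
    vocab = set(p_lm) | set(p_knn)
    return {
        w: lam * p_knn.get(w, 0.0) + (1 - lam) * p_lm.get(w, 0.0)
        for w in vocab
    }


# Toy example: 2-d "embeddings" stand in for real LM context vectors.
datastore = [
    ([0.0, 0.0], "Paris"),
    ([0.1, 0.0], "Paris"),
    ([1.0, 1.0], "London"),
    ([5.0, 5.0], "Rome"),
]
p_lm = {"Paris": 0.2, "London": 0.5, "Rome": 0.3}
p = knn_lm(p_lm, query=[0.0, 0.0], datastore=datastore, k=3, lam=0.5)
for token in sorted(p, key=p.get, reverse=True):
    print(token, round(p[token], 3))
```

In the toy example, the base LM alone prefers "London", but the two nearby "Paris" entries in the datastore shift most of the probability mass to "Paris". Because only the datastore changes, pointing it at a different text collection changes the predictions without touching the LM's weights.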
Probing Neural NLP: Ideas and Problems
Recent work in NLP has attempted to explore the basic linguistic skills induced by neural NLP models. Probing methods investigate these skills through supervised analyses of models' representations of sentences. In this talk, I'll cover a new way of thinking about how neural networks can implicitly encode discrete structures, and provide probing evidence that ELMo and BERT have internal representations of syntax. I'll then present work that challenges the premises of probing: it demonstrates that the methodology can produce false positive results and shows how probes can be designed and interpreted to avoid them.
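The abstract does not spell out the probe, so the following is only a hedged illustration of one way a network could implicitly encode a discrete structure: treat a parse tree as a distance metric over words (the number of edges between two words), and ask whether a linear map `B` makes squared distances between transformed word vectors approximate those tree distances. The sketch assumes that formulation, scores it with a simple L1 gap averaged over word pairs, and omits training `B`; the function names and toy data are hypothetical.

```python
from itertools import combinations


def probe_sq_distance(B, h_i, h_j):
    """Squared L2 norm of B (h_i - h_j); B is given as a list of rows."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(b * d for b, d in zip(row, diff)) for row in B]
    return sum(v * v for v in projected)


def probe_l1_loss(B, vectors, tree_dist):
    """Mean absolute gap between probe distances and gold tree distances.

    vectors: one representation per word in a sentence.
    tree_dist: maps (i, j) with i < j to the number of parse-tree edges
    between words i and j.
    """
    pairs = list(combinations(range(len(vectors)), 2))
    total = sum(
        abs(probe_sq_distance(B, vectors[i], vectors[j]) - tree_dist[(i, j)])
        for i, j in pairs
    )
    return total / len(pairs)


# Toy 3-word sentence whose tree is a chain: word0 - word1 - word2.
vectors = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
tree_dist = {(0, 1): 1, (1, 2): 1, (0, 2): 2}
identity = [[1.0, 0.0], [0.0, 1.0]]
collapse_y = [[1.0, 0.0], [0.0, 0.0]]  # discards the second dimension

loss_identity = probe_l1_loss(identity, vectors, tree_dist)
loss_collapsed = probe_l1_loss(collapse_y, vectors, tree_dist)
print(loss_identity, loss_collapsed)
```

With the identity map, these toy vectors reproduce the tree distances exactly, so the loss is zero; a map that collapses one dimension loses that information and the loss rises. A real probe would fit `B` on representations from many sentences, which is where the question of false positives becomes relevant.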