Latent language models have recently gained attention for their potential as an alternative to (or proxy for) explicit knowledge bases (KBs).
In this position paper, we examine this hypothesis, identify strengths and limitations of both latent language models and knowledge bases, and discuss the complementary nature of the two paradigms.
We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture. Our model extends short-term context by caching local hidden states and adds global long-term memory by retrieving a set of nearest-neighbor tokens at each timestep.
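As a rough sketch of this mechanism (not the paper's implementation; the class name, dimensions, and cache size below are invented for illustration), recent hidden states can sit in a bounded local cache while evicted states become keys for nearest-neighbor lookup:

```python
# Illustrative sketch only: a short-term cache of recent hidden states plus a
# long-term memory queried by nearest neighbors. Names and sizes are assumptions.
import numpy as np

class EpisodicMemory:
    def __init__(self, dim, cache_size=512):
        self.cache = []                      # short-term: recent (state, token) pairs
        self.cache_size = cache_size
        self.mem_keys = np.empty((0, dim))   # long-term: key vectors of evicted states
        self.mem_tokens = []                 # tokens paired with those keys

    def write(self, hidden_state, token):
        self.cache.append((hidden_state, token))
        if len(self.cache) > self.cache_size:
            old_state, old_token = self.cache.pop(0)           # evict the oldest entry
            self.mem_keys = np.vstack([self.mem_keys, old_state])
            self.mem_tokens.append(old_token)

    def retrieve(self, query, k=4):
        # Return the k nearest-neighbor tokens from long-term memory.
        if not self.mem_tokens:
            return []
        dists = np.linalg.norm(self.mem_keys - query, axis=1)
        return [self.mem_tokens[i] for i in np.argsort(dists)[:k]]
```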
Language models do not fully understand the context and sensitivity of text and can sometimes memorize phrases or sentences present in their training sets.
In this paper, we investigate whether they not only memorize but also plagiarize training samples when generating artificial texts.
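One crude way to picture what such an audit involves (a simplification of the study's setup, which also considers paraphrase and idea reuse, not just verbatim copying) is to check generated text for exact n-gram overlap with training documents:

```python
# Minimal verbatim-overlap check; n=8 and the example strings are arbitrary.
def ngrams(text, n=8):
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(generated, training_docs, n=8):
    train = set()
    for doc in training_docs:
        train |= ngrams(doc, n)
    return ngrams(generated, n) & train   # shared n-grams are copy candidates

sample = "the model reproduced this exact eight word training sentence verbatim today"
print(verbatim_overlap(sample, [sample]))  # non-empty: the text is a verbatim copy
```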
We propose recitation-augmented generation (RECITE), a new paradigm that helps large language models generate more accurate factual knowledge without retrieving from an external corpus. We show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance on various closed-book question answering (CBQA) tasks.
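A minimal sketch of such a recite-and-answer loop, assuming only some text-generation callable `generate` (a placeholder, not the paper's interface) and with prompt wording invented for the example:

```python
def recite_and_answer(question, generate):
    # Step 1: ask the model to recite a relevant passage from its own memory.
    recite_prompt = (
        "Recite a passage you know that is relevant to the question.\n"
        f"Question: {question}\nPassage:"
    )
    passage = generate(recite_prompt)

    # Step 2: answer conditioned on the recited passage, not the bare question.
    answer_prompt = (
        f"Passage: {passage}\nQuestion: {question}\n"
        "Answer using only the passage above.\nAnswer:"
    )
    return generate(answer_prompt)
```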
We present the pre-trained large and base robertson-robertson-alpha models, along with the corresponding performance evaluations.
We conduct a rigorous study to explore the underlying prediction mechanisms of pre-trained masked language models (MLMs) across different extraction paradigms. By investigating the behaviors of MLMs, we find that the decent performance reported previously mainly owes to biased prompts that overfit dataset artifacts.
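For concreteness, this is the kind of fill-mask probe such extraction studies rely on (the prompt below is my own example, not one from the paper; running it requires the transformers package and downloads bert-base-uncased on first use):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
# Rephrasing the prompt and watching the predictions shift is one simple way to
# see how much an "extracted fact" depends on prompt wording rather than knowledge.
```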
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks, while the latter consists in calling external modules such as a code interpreter.
Reinforcement learning (RL) is challenging since specifying human notions of desired behavior via reward functions may be difficult or require many expert demonstrations. This paper explores how to simplify reward design by prompting a large language model (LLM) such as GPT-3 as a proxy reward function, where the user provides a textual prompt containing a few examples (few-shot) or a description (zero-shot) of the desired behavior.
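A hedged sketch of the idea, with `query_llm` standing in for whatever completion API is available and the prompt format invented for illustration (the paper's actual prompting setup may differ):

```python
def llm_proxy_reward(task_description, episode_summary, query_llm):
    # Describe the episode in text, ask the LLM for a yes/no judgment,
    # then map that judgment to a scalar reward for the RL agent.
    prompt = (
        f"Task: {task_description}\n"
        f"Agent behavior: {episode_summary}\n"
        "Did the agent behave as the task describes? Answer Yes or No:"
    )
    verdict = query_llm(prompt).strip().lower()
    return 1.0 if verdict.startswith("yes") else 0.0
```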
Creole languages such as Nigerian Pidgin English and Haitian Creole are
under-resourced and largely ignored in the NLP literature. Creoles typically
result from the fusion of a foreign language with m
We present a text-only approach to augment language models with non-differentiable tools, and an iterative "self-play" technique to bootstrap performance starting from few tool demonstrations. At a given model scale, tool-augmented language models significantly outperform non-augmented language models.
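As a toy illustration of text-only tool use (the call syntax and the single calculator tool below are invented for this sketch, not the paper's format): the model emits a tool call as plain text, a controller executes it, and the result is spliced back into the text before generation continues.

```python
import re

# One toy tool; eval is restricted by stripping builtins, for this sketch only.
TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def run_tool_calls(model_output):
    def substitute(match):
        name, arg = match.group(1), match.group(2)
        return TOOLS[name](arg) if name in TOOLS else match.group(0)
    return re.sub(r"(\w+)\(([^)]*)\)", substitute, model_output)

print(run_tool_calls("The total cost is calculator(3 * 7) dollars."))
# -> The total cost is 21 dollars.
```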
Recent work has suggested that language models (LMs) store both common-sense and factual knowledge learned from pre-training data. In this paper, we leverage this implicit knowledge to create an effective end-to-end fact checker using solely a language model, without any external knowledge or explicit retrieval components.
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore, which allows them to learn through explicitly memorizing the training datapoints. While effective, these models often require retrieval from a large datastore at test time, significantly increasing the inference overhead.
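A back-of-the-envelope sketch of one such non-parametric step in the kNN-LM style (the shapes, softmax temperature, and interpolation weight are illustrative assumptions, not values from the paper):

```python
import numpy as np

def knn_next_token_probs(query, keys, next_tokens, vocab_size, k=8, temp=1.0):
    # Retrieve the k datastore entries closest to the current context vector
    # and turn their stored next tokens into a distribution over the vocabulary.
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temp)
    probs = np.zeros(vocab_size)
    for i, w in zip(nearest, weights):
        probs[next_tokens[i]] += w
    return probs / probs.sum()

def interpolate(p_model, p_knn, lam=0.25):
    # Mix the parametric model's distribution with the retrieved one.
    return (1 - lam) * p_model + lam * p_knn
```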
We address the problem of any-code completion: generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree, which we call structural language modeling (SLM).
Embedding recycling (ER): re-using activations from previous model runs when performing training or inference.
We show that our method provides a 100% speedup during training and a 55-86% speedup for inference for text classification and entity recognition tasks in the scientific domain.
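A simplified sketch of the idea, not the authors' implementation: cache the activations produced by the frozen lower layers, keyed by the input, and recompute only the upper task-specific layers on later runs. Both layer arguments below are placeholder callables standing in for parts of a model.

```python
_activation_cache = {}

def encode_with_recycling(text, lower_layers, upper_layers):
    # lower_layers: frozen, expensive part of the model (run once per input)
    # upper_layers: task-specific part (run on every training/inference pass)
    if text not in _activation_cache:
        _activation_cache[text] = lower_layers(text)
    return upper_layers(_activation_cache[text])
```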
Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs.
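For a concrete picture of a structure-based method, here is a TransE-style score, used purely as an illustration of embedding entities and relations in one vector space (it is not necessarily the method discussed in the paper):

```python
import numpy as np

def transe_score(head, relation, tail):
    # Lower is better: a plausible triple has head + relation close to tail.
    return np.linalg.norm(head + relation - tail)

h, r, t = np.random.randn(3, 50)   # toy 50-dimensional embeddings
print(transe_score(h, r, t))
```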
The factual knowledge acquired during pretraining and stored in the parameters of Language Models (LM) can be useful in downstream tasks (e.g., question answering or textual inference). However, some facts can be wrongly induced or become obsolete over time.
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples.
To obtain high-quality sentence embeddings from pretrained language models, they must either be augmented with additional pretraining objectives or finetuned on large amounts of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size.
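A common un-augmented baseline that such approaches improve on is simply pooling a pretrained model's token representations, for example by mean pooling (illustrative sketch; requires torch and transformers, and downloads bert-base-uncased on first use):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state    # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)              # mean-pooled sentence vector

print(embed("A pretrained LM used as a sentence encoder.").shape)  # torch.Size([768])
```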
Language models (LMs) are trained on collections of documents, written by individual human agents to achieve specific goals in an outside world. During training, LMs have access only to the text of these documents, with no direct evidence of the internal states of the agents that produced them.