Large pre-trained neural networks have enabled impressive results on a variety of downstream tasks, but even the largest existing models still make errors, and even accurate predictions may become outdated over time.
Because detecting all such failures at training time is impossible, it is desirable to enable both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact.
However, the distributed, black-box nature of the representations learned by large neural networks makes producing such targeted edits difficult.
If presented with only a single problematic input and new desired output, fine-tuning approaches tend to overfit; other editing algorithms are either computationally infeasible or simply ineffective when applied to very large models.
To enable easy post-hoc editing at scale, we propose MEND, a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model.
Our experiments with T5, GPT, BERT, and BART models show that MEND is the only approach to model editing that produces effective edits for models with tens of millions to over 10 billion parameters.
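To make the editing interface described above concrete, the sketch below instantiates it on a toy one-layer model in PyTorch: a small auxiliary network transforms the fine-tuning gradient from a single input-output pair into a parameter update. The GradientEditor architecture, the loss, and the learning rate are illustrative assumptions, not the paper's actual design, and the editor here is untrained.

```python
# Minimal sketch, assuming a hypothetical editor design; not MEND's exact method.
import torch
import torch.nn as nn

# A toy "pre-trained" model: one linear layer standing in for a single
# weight matrix of a large network.
model = nn.Linear(16, 16)

class GradientEditor(nn.Module):
    """Hypothetical auxiliary editor: a small MLP that maps the raw
    gradient of one weight matrix to a transformed parameter update."""
    def __init__(self, in_dim, out_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim * out_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, in_dim * out_dim),
        )

    def forward(self, grad):
        return self.net(grad.flatten()).view_as(grad)

editor = GradientEditor(16, 16)

def apply_edit(x, y_desired, lr=1e-2):
    """Edit the model's weight using a single desired input-output pair."""
    loss = nn.functional.mse_loss(model(x), y_desired)
    (grad,) = torch.autograd.grad(loss, model.weight)
    with torch.no_grad():
        # Apply the transformed (not raw) gradient as a local edit.
        model.weight -= lr * editor(grad)

# One input-output pair drives the entire edit.
x = torch.randn(1, 16)
y_desired = torch.randn(1, 16)
apply_edit(x, y_desired)
```

In practice the editor network itself would be trained so that its edits fix the target example while preserving the model's behavior on unrelated inputs; the sketch only shows the data flow at edit time.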
Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning