Self-Instruct: Aligning Language Models with Self-Generated Instructions
We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations.
Our pipeline generates instruction, input, and output samples from a language model, then prunes invalid or similar ones before using the rest to finetune the original model.
Applying our method to the vanilla GPT-3 model, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.
Authors
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
We introduce Self-Instruct, a semi-automated process for instruction-tuning a pretrained language model using instructional signals from the model itself.
The overall process is an iterative bootstrapping algorithm, which starts from a limited seed set of manually written instructions (175 in our study) that are used to guide the overall generation.
In the first phase, the model is prompted to generate instructions for new tasks.
This step leverages the existing collection of instructions to create broader-coverage instructions that define (often new) tasks.
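To make this concrete, the following is a minimal sketch of how such a prompt could be assembled. The helper name build_instruction_prompt and the exact wording are illustrative stand-ins; the paper similarly samples a handful of instructions from the pool (a mix of human-written and model-generated) and asks the model to continue the list.

```python
import random

def build_instruction_prompt(task_pool, num_examples=8):
    """Assemble a few-shot prompt asking the model to propose new task
    instructions, seeded with instructions sampled from the current pool."""
    sampled = random.sample(task_pool, num_examples)
    lines = ["Come up with a series of tasks:"]
    for i, instruction in enumerate(sampled, start=1):
        lines.append(f"Task {i}: {instruction}")
    # The model is expected to continue the numbered list with new instructions.
    lines.append(f"Task {num_examples + 1}:")
    return "\n".join(lines)
```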
Given the newly generated set of instructions, the framework also creates input-output instances for them, which can later be used to supervise the instruction tuning.
Various measures are used to prune low-quality and repeated instructions before they are added to the task pool.
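One concrete pruning measure is a similarity filter against the existing pool: the paper keeps a new instruction only when its ROUGE-L overlap with every existing instruction stays below 0.7. Below is a sketch of that check using the rouge-score package; the function name is ours.

```python
from rouge_score import rouge_scorer

# Reuse one scorer instance; it handles tokenization internally.
_scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

def is_novel(candidate, task_pool, threshold=0.7):
    """Return True if the candidate instruction's ROUGE-L similarity to
    every instruction already in the pool is below the threshold."""
    for existing in task_pool:
        score = _scorer.score(existing, candidate)["rougeL"].fmeasure
        if score >= threshold:
            return False
    return True
```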
This process can be repeated for many iterations until the pool reaches a large number of tasks and diverse ways of describing them.
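Putting the steps together, the bootstrapping loop can be sketched as below. The generation helpers are passed in as functions, since they stand in for the GPT-3 prompting described above; the target pool size is a parameter (the paper grows the pool to roughly 52K instructions).

```python
def self_instruct_loop(seed_instructions, generate_instructions,
                       generate_instances, is_novel, target_size=52_000):
    """Iteratively grow a task pool from a seed set.

    generate_instructions: fn(pool) -> list of candidate instruction strings
    generate_instances:    fn(instruction) -> list of (input, output) pairs
    is_novel:              fn(instruction, pool) -> bool (quality/dedup filter)
    """
    pool = list(seed_instructions)
    training_data = []
    while len(pool) < target_size:
        for candidate in generate_instructions(pool):
            if not is_novel(candidate, pool):
                continue  # prune repeated or too-similar instructions
            instances = generate_instances(candidate)
            if instances:  # drop instructions that yield no valid instances
                pool.append(candidate)
                training_data.extend(
                    (candidate, inp, out) for inp, out in instances)
    return pool, training_data
```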
Results
We conduct experiments to measure and compare the quality of models under various instruction tuning setups.
T0 @cite2 and Tk-Instruct @cite3 are two instruction-tuned models that have been demonstrated to follow instructions for many NLP tasks.
Both models are finetuned from the T5 @cite4 checkpoints and are publicly available.
We also evaluate InstructGPT @cite5, which is developed by OpenAI on top of GPT-3 to better follow human instructions and has been found by the community to have impressive zero-shot abilities.
For our human evaluation of these models on newly written instructions, we include the 001, 002, and 003 engines for completeness.
In this work, we mainly focus on the zero-shot setup, i.e., the model is prompted with the task definition only, without in-context demonstration examples.
For all our requests to the GPT-3 variants, we use the deterministic generation mode (temperature set to 0 and no nucleus sampling) without specific stop sequences.
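For concreteness, a single zero-shot request under this setup might look like the sketch below, written against the legacy (pre-1.0) OpenAI Python client. The engine name, prompt template, and token limit are placeholders; temperature=0 with top_p=1 corresponds to the deterministic mode described above.

```python
import openai  # legacy (pre-1.0) OpenAI Python client

openai.api_key = "YOUR_API_KEY"  # placeholder

def zero_shot_completion(task_definition, instance_input, engine="davinci"):
    """Query a GPT-3 variant zero-shot: the prompt contains only the task
    definition and the instance input, with no in-context demonstrations."""
    prompt = f"{task_definition}\n\nInput: {instance_input}\nOutput:"
    response = openai.Completion.create(
        engine=engine,
        prompt=prompt,
        temperature=0,   # deterministic decoding
        top_p=1,         # no nucleus sampling
        max_tokens=256,  # illustrative limit
        # no stop sequences, per the setup above
    )
    return response["choices"][0]["text"]
```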