Generating Contextual Documents for Answering Knowledge-intensive Tasks
Generate rather than Retrieve: Large Language Models are Strong Context Generators
Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach for knowledge-intensive tasks is to employ a retrieve-then-read pipeline that first retrieves a handful of relevant contextual documents from an external knowledge source such as Wikipedia, and then predicts an answer conditioned on the retrieved documents. In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer. We conduct extensive experiments on three different knowledge-intensive tasks, including open-domain question answering, fact checking, and open-domain dialogue, and demonstrate that GenRead achieves 71.6 and 54.4 exact match scores on TriviaQA and WebQ, significantly outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9, without retrieving any documents from any external knowledge source.
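The two-step generate-then-read pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `call_llm` is a hypothetical stand-in for any large language model API, stubbed here with canned responses so the example runs end to end, and the prompt wording is an assumption.

```python
def call_llm(prompt: str) -> str:
    # Stub for a real LLM call (e.g. an API request); returns canned text
    # so the sketch is self-contained and deterministic.
    if prompt.startswith("Generate"):
        return ("The Eiffel Tower is a wrought-iron tower in Paris, France, "
                "completed in 1889.")
    return "Paris"


def generate_then_read(question: str, num_docs: int = 2) -> str:
    # Step 1 (generate): prompt the LLM for contextual documents
    # about the question, instead of retrieving them from a corpus.
    docs = [
        call_llm(f"Generate a background document to answer: {question}")
        for _ in range(num_docs)
    ]
    # Step 2 (read): condition a second prompt on the generated
    # documents to produce the final answer.
    context = "\n".join(docs)
    reader_prompt = (f"Refer to the passages below and answer the question.\n"
                     f"{context}\nQuestion: {question}")
    return call_llm(reader_prompt)


answer = generate_then_read("Where is the Eiffel Tower located?")
```

A key design point, per the abstract, is that no external knowledge source is consulted: both the "documents" and the answer come from the language model itself.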