We live in an era of rapid progress in artificial intelligence, both within the field and in the public sphere.
The more adept large language models become at mimicking human language, the more vulnerable we become to anthropomorphism, to seeing the systems in which they are embedded as more human-like than they really are.
Large language models, large neural networks trained on a simple predictive objective over a massive corpus of natural language, have reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data.
Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training on those problems.
We show how large language models can be made to perform faithful multi-step reasoning via a process whose causal structure mirrors the underlying logical structure of the problem.
Our approach works by chaining together reasoning steps, where each step results from calls to two fine-tuned large language models, one for selection and one for inference, and the chain of steps forms a valid reasoning trace.
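A minimal sketch of that chaining loop follows, assuming hypothetical select_model, infer_model, and answers_question helpers that stand in for the two fine-tuned models and a halting check; the actual prompting and halting mechanisms are not reproduced here.

    def reasoning_trace(context, question, max_steps=5):
        """Chain reasoning steps: each step selects supporting statements,
        then infers a new statement from only that selection."""
        trace = []
        for _ in range(max_steps):
            # Selection step: one fine-tuned LLM picks the context statements
            # relevant to the question at this point in the trace (hypothetical call).
            selection = select_model(context, question)
            # Inference step: a second fine-tuned LLM derives a new statement
            # from the selection alone, so the causal structure of the trace
            # mirrors the logical structure of the problem (hypothetical call).
            new_statement = infer_model(selection)
            trace.append(new_statement)
            context = context + [new_statement]
            # Hypothetical halting check: stop once the new statement answers the question.
            if answers_question(new_statement, question):
                break
        return trace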
Large language models (LLMs) are increasingly able to answer such questions accurately, but that ability does not necessarily imply a general understanding of the concepts relevant to the anchor query.
We propose conceptual consistency to measure an LLM's understanding of the relevant concepts.
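One plausible way to operationalize such a score is sketched below, purely as an illustration: given whether the model answered the anchor query correctly and whether it answered each relevant background-concept question correctly, report the fraction of background questions whose outcome agrees with the anchor outcome. The function name and the aggregation rule are assumptions, not the proposed definition.

    def conceptual_consistency(anchor_correct, background_correct):
        """Fraction of background-concept answers that agree with the model's
        behaviour on the anchor query (illustrative definition only)."""
        if not background_correct:
            return 0.0
        agree = sum(1 for b in background_correct if b == anchor_correct)
        return agree / len(background_correct)

    # Example: anchor answered correctly, 3 of 4 background questions correct -> 0.75.
    print(conceptual_consistency(True, [True, True, True, False]))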