Learning to Learn with Generative Models of Neural Network Checkpoints
We explore a data-driven approach for learning to optimize neural networks.
We construct a dataset of neural network checkpoints and train a generative
model on the parameters. In particular, our model is a conditional diffusion
transformer that, given an initial parameter vector and a prompted loss,
error, or return, predicts the distribution over parameter updates that achieve
the desired metric. At test time, it can optimize neural networks with unseen
parameters for downstream tasks in just one update. We find that our approach
successfully generates parameters for a wide range of loss prompts. Moreover,
it can sample multimodal parameter solutions and has favorable scaling
properties. We apply our method to different neural network architectures and
tasks in supervised and reinforcement learning.
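The conditional generation described above can be sketched as DDPM-style ancestral sampling over a flattened parameter vector, conditioned on the starting parameters and a prompted target metric. The denoiser below is a stub standing in for the trained diffusion transformer, and the noise schedule, function names, and step count are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def denoiser(z_t, t, theta_init, target_loss):
    # Stub for the conditional diffusion transformer: the real model is
    # trained on checkpoint data to predict the noise given the current
    # noisy parameters, timestep, starting parameters, and prompted metric.
    # Here it simply predicts zero noise (an assumption for illustration).
    return np.zeros_like(z_t)

def sample_updated_params(theta_init, target_loss, steps=50, seed=0):
    """Ancestral sampling of an updated parameter vector, conditioned on
    the initial parameters and a prompted loss."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)       # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    z = rng.standard_normal(theta_init.shape)    # start from pure noise
    for t in reversed(range(steps)):
        eps_hat = denoiser(z, t, theta_init, target_loss)
        # Posterior mean step of DDPM sampling.
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            z += np.sqrt(betas[t]) * rng.standard_normal(z.shape)
    return z  # sampled updated parameter vector

theta0 = np.zeros(8)                             # unseen input parameters
theta1 = sample_updated_params(theta0, target_loss=0.1)
```

Because the model outputs a distribution rather than a point estimate, re-sampling with different seeds yields distinct parameter solutions, which is how multimodality arises at test time.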