We introduce PIXEL, the Pixel-based Encoder of Language, a pretrained language model that renders text as images, making it possible to transfer representations across languages based on orthographic similarity or the co-activation of pixels.
We pretrain the 86M-parameter PIXEL model on the same data as the well-known BERT model and evaluate it on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts.
We find that PIXEL substantially outperforms BERT on syntactic and semantic processing tasks for scripts that are not found in the pretraining data, but is slightly weaker than BERT when working with Latin scripts.
Furthermore, we find that PIXEL is more robust to noisy text inputs than BERT, further confirming the benefits of modelling language with pixels.
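The core idea of operating on rendered text rather than tokens can be sketched in a few lines: draw the string as a pixel grid, then slice the grid into fixed-size patches for a vision-style encoder. The tiny bitmap font and patch width below are invented purely for illustration; the actual model uses a proper text renderer and its own patch dimensions.

```python
# Toy illustration of text-as-pixels: render a string to a 2D binary
# grid, then split it into fixed-width patches (the unit a ViT-style
# encoder would consume). The 3x3 font here is hypothetical.

FONT = {
    "h": ["#.#", "###", "#.#"],
    "i": [".#.", ".#.", ".#."],
}

def render(text):
    """Render a string as a 2D grid of 0/1 pixels, one column gap per glyph."""
    rows = ["", "", ""]
    for ch in text:
        glyph = FONT.get(ch, ["...", "...", "..."])
        for r in range(3):
            rows[r] += glyph[r] + "."
    return [[1 if c == "#" else 0 for c in row] for row in rows]

def patchify(grid, width):
    """Split the rendered image into full-height patches of the given width."""
    ncols = len(grid[0])
    return [[row[i:i + width] for row in grid] for i in range(0, ncols, width)]

image = render("hi")          # 3 rows x 8 columns
patches = patchify(image, 4)  # two 3x4 patches
```

Because the model sees pixels instead of vocabulary indices, any script a renderer can draw maps into the same input space, which is what enables the cross-script transfer described above.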
Authors
Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott