A sequence-to-sequence prediction approach for semantic segmentation
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
We treat semantic segmentation as a sequence-to-sequence prediction task.
Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches.
With global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
Extensive experiments show that SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes. In particular, we achieve the first position (44.42% mIoU) on the highly competitive ADE20K test server leaderboard.
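To make the described encoder-decoder design concrete, below is a minimal PyTorch sketch of the idea: an image is split into flattened patches, linearly projected into tokens, processed by a pure transformer encoder (global self-attention in every layer, no convolutional backbone, no resolution reduction), and mapped back to per-pixel logits by a simple reshape-and-upsample decoder. The class name `SETRSketch`, all layer sizes, and the naive decoder head are illustrative assumptions, not the authors' released implementation, which builds on much larger ViT backbones and also explores more elaborate decoders.

```python
# Minimal sketch of the SETR-style pipeline described in the abstract.
# NOT the authors' implementation; layer sizes and the naive decoder
# head below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SETRSketch(nn.Module):
    def __init__(self, img_size=256, patch_size=16, embed_dim=256,
                 depth=4, num_heads=8, num_classes=150):
        super().__init__()
        self.patch_size = patch_size
        self.grid = img_size // patch_size          # patches per side
        num_patches = self.grid ** 2

        # Linear projection of flattened patches: no convolutional
        # backbone, no progressive resolution reduction.
        self.patch_embed = nn.Linear(patch_size * patch_size * 3, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

        # Pure transformer encoder: global self-attention in every layer.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

        # Simple decoder: 1x1 classification followed by bilinear upsampling.
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, x):                            # x: (B, 3, H, W)
        B, C, H, W = x.shape
        p = self.patch_size
        # Split the image into non-overlapping patches and flatten each one.
        patches = x.unfold(2, p, p).unfold(3, p, p)  # (B, 3, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, 3 * p * p)
        tokens = self.patch_embed(patches) + self.pos_embed
        tokens = self.encoder(tokens)                # (B, N, D)
        # Reshape the token sequence back to a 2D feature map and decode.
        feat = tokens.transpose(1, 2).reshape(B, -1, self.grid, self.grid)
        logits = self.classifier(feat)
        return F.interpolate(logits, size=(H, W), mode="bilinear",
                             align_corners=False)


# Example: a 256x256 image yields per-pixel logits over 150 ADE20K classes.
out = SETRSketch()(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 150, 256, 256])
```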
Authors
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H.S. Torr, Li Zhang