Text-to-Image Generation Grounded by Fine-Grained User Attention