A bounded-memory approach to transformer attention - 42Papers