We present a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers.
We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop.
We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
Authors
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer